[DocDB] CppCassandraDriverTest.TestBackfillBatchingEnabled fails in master #25737

Closed
1 task done
spolitov opened this issue Jan 24, 2025 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@spolitov
Contributor

spolitov commented Jan 24, 2025

Jira Link: DB-15019

Description

The test fails because it times out.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@spolitov spolitov added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jan 24, 2025
@spolitov spolitov self-assigned this Jan 24, 2025
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Jan 24, 2025
spolitov added a commit that referenced this issue Jan 26, 2025
Summary:
The test uncovered the following kind of deadlock:
T1:
1) lock table for read
2) shared lock on catalog manager

T2 (create table):
1) unique lock on catalog manager
2) lock table (indexed) for write and then commit

Step 1 completes on both threads. T1 then tries to acquire the shared lock on the catalog manager but cannot, since T2 holds the unique lock on it. Meanwhile, T2 tries to acquire the commit lock on the table but cannot, since the reader count never drops to 0 while T1 holds its read lock.
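
The interleaving above can be reproduced in a small standalone sketch. This is not YugabyteDB code: `DemonstrateLockInversion` and `InversionResult` are hypothetical names, a `std::shared_timed_mutex` stands in for both the catalog manager mutex and the table's RwcLock, and the commit lock is simplified to an exclusive lock that needs the reader count to reach 0. Timed lock attempts replace the blocking calls so the demo terminates instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Outcome of each thread's second lock attempt.
struct InversionResult {
  bool t1_got_catalog_shared = false;
  bool t2_got_table_commit = false;
};

// Hypothetical demo of the T1/T2 interleaving (assumed simplification:
// plain reader/writer mutexes instead of the real RwcLock).
InversionResult DemonstrateLockInversion() {
  using namespace std::chrono_literals;
  std::shared_timed_mutex catalog_mutex;
  std::shared_timed_mutex table_lock;
  std::atomic<int> first_lock_held{0};
  std::atomic<int> attempt_done{0};
  InversionResult result;

  // Spin until both threads have reached the same point.
  auto arrive_and_wait = [](std::atomic<int>& counter) {
    counter.fetch_add(1);
    while (counter.load() < 2) std::this_thread::yield();
  };

  std::thread t1([&] {
    // T1 step 1: lock table for read.
    std::shared_lock<std::shared_timed_mutex> table_read(table_lock);
    arrive_and_wait(first_lock_held);
    // T1 step 2: blocked by T2's unique lock, so this times out.
    result.t1_got_catalog_shared = catalog_mutex.try_lock_shared_for(50ms);
    if (result.t1_got_catalog_shared) catalog_mutex.unlock_shared();
    arrive_and_wait(attempt_done);  // keep the read lock until T2 has tried
  });

  std::thread t2([&] {
    // T2 step 1: unique lock on catalog manager.
    std::unique_lock<std::shared_timed_mutex> catalog_unique(catalog_mutex);
    arrive_and_wait(first_lock_held);
    // T2 step 2: blocked by T1's read lock, so this times out.
    result.t2_got_table_commit = table_lock.try_lock_for(50ms);
    if (result.t2_got_table_commit) table_lock.unlock();
    arrive_and_wait(attempt_done);  // keep the unique lock until T1 has tried
  });

  t1.join();
  t2.join();
  return result;
}
```

Both attempts fail in this sketch because each thread keeps its first lock until the other has finished trying. With blocking calls in place of the timed ones, neither thread would ever make progress.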

Fixed this deadlock.

Also added conflict detection for cases where we try to acquire the catalog manager mutex while an object read lock is already held, which can result in deadlocks like the one described above.
When acquiring a ReadLock on an RwcLock object (e.g. TableInfo / TabletInfo), we now increment a thread-local counter, which is checked when we try to take a shared lock on the catalog manager mutex.
This detection can be enabled in rwc_lock.h with the RWC_LOCK_TRACK_EXTERNAL_DEADLOCK define.
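
A minimal sketch of the thread-local counter idea, for illustration only. The names `HeldObjectReadLocks`, `TrackedObjectReadLock`, `SafeToLockCatalogManager` and `CheckedSharedLock` are hypothetical and do not come from rwc_lock.h. The commit message does not say what happens when a conflict is detected, so this sketch simply assumes the lock is refused.

```cpp
#include <mutex>
#include <shared_mutex>

// Per-thread count of object read locks currently held (hypothetical helper).
int& HeldObjectReadLocks() {
  thread_local int count = 0;
  return count;
}

// RAII stand-in for a ReadLock on an RwcLock-protected object such as
// TableInfo: increments the counter on acquire, decrements on release.
class TrackedObjectReadLock {
 public:
  TrackedObjectReadLock() { ++HeldObjectReadLocks(); }
  ~TrackedObjectReadLock() { --HeldObjectReadLocks(); }
  TrackedObjectReadLock(const TrackedObjectReadLock&) = delete;
  TrackedObjectReadLock& operator=(const TrackedObjectReadLock&) = delete;
};

// True when the calling thread holds no object read lock, i.e. taking the
// catalog manager mutex cannot produce the lock-order inversion.
bool SafeToLockCatalogManager() { return HeldObjectReadLocks() == 0; }

// Shared-locks the mutex only if the check passes. Returning an unowned lock
// on conflict is an assumption of this sketch; the real reaction is unknown.
std::shared_lock<std::shared_timed_mutex> CheckedSharedLock(
    std::shared_timed_mutex& catalog_mutex) {
  if (!SafeToLockCatalogManager()) {
    return std::shared_lock<std::shared_timed_mutex>();  // conflict detected
  }
  return std::shared_lock<std::shared_timed_mutex>(catalog_mutex);
}
```

Because the counter is thread-local, the check costs one integer comparison and needs no synchronization. It catches only the case where the same thread holds the object read lock and then asks for the catalog manager mutex.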

Also fixed all other such places found by our tests.
Jira: DB-15019

Test Plan: CppCassandraDriverTest.TestBackfillBatchingEnabled

Reviewers: zdrudi, asrivastava, xCluster, hsunder

Reviewed By: asrivastava, hsunder

Subscribers: esheng, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D41399