Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: bench_kv sticks #160

Closed
MrCroxx opened this issue May 16, 2022 · 4 comments · Fixed by #166
Closed

bug: bench_kv sticks #160

MrCroxx opened this issue May 16, 2022 · 4 comments · Fixed by #166
Assignees
Labels
bug Something isn't working

Comments

@MrCroxx
Copy link
Owner

MrCroxx commented May 16, 2022

bench_kv sticks. Maybe caused by deadlock, incorrect wait condition or incorrect CAS operations.

Can debug with:

RUSTFLAGS="--cfg tokio_unstable" RUNKV_METRICS=true RUST_BACKTRACE=1 cargo run --features console,deadlock --release --package runkv-bench --bin bench_kv -- --loop 1000
@MrCroxx MrCroxx added the bug Something isn't working label May 16, 2022
@MrCroxx MrCroxx self-assigned this May 16, 2022
@MrCroxx
Copy link
Owner Author

MrCroxx commented May 16, 2022

Weird. No deadlock detected and no unreasonable idle task.🤔

@MrCroxx
Copy link
Owner Author

MrCroxx commented May 16, 2022

Error raised in log:

2022-05-16T12:04:59.698565Z ERROR runkv_rudder::worker::compaction_detector: trigger compaction l0 error: other: compaction error: [Err(StorageError(ManifestError(InvalidVersionDiff("sst L1-433791697681 not exists")))), Err(StorageError(ManifestError(InvalidVersionDiff("sst L1-433791697690 not exists")))), Err(StorageError(ManifestError(InvalidVersionDiff("sst L1-433791697716 not exists")))), Err(StorageError(ManifestError(InvalidVersionDiff("sst L1-433791697708 not exists"))))]

Maybe caused by the error. And #149 maybe caused by the same reasion.

Not related, QwQ.

@MrCroxx
Copy link
Owner Author

MrCroxx commented May 16, 2022

New founds:

A new round of vote began when stuck:

2022-05-16T16:40:14.521787Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:14.521823Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:14.521869Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:15.806796Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:15.806835Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:15.806880Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:17.312915Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:17.312957Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:17.313005Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:19.841470Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:19.841507Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:19.841549Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:20.091152Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:20.091189Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:20.091230Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:22.891380Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:22.891420Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:22.891462Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:23.857864Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:23.857902Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:23.857946Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=25 raft node=25 group=9 namespace=raft
2022-05-16T16:40:25.661396Z  INFO runkv_common::tracing_slog_drain: starting a new election raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:25.661436Z  INFO runkv_common::tracing_slog_drain: became pre-candidate at term 1 raft_id=17 raft node=17 group=6 namespace=raft
2022-05-16T16:40:25.661479Z  INFO runkv_common::tracing_slog_drain: broadcasting vote request raft_id=17 raft node=17 group=6 namespace=raft

@MrCroxx
Copy link
Owner Author

MrCroxx commented May 19, 2022

The deadlock locates in runkv_storage::raft_log_store::log, between fn append() and fn read().

fn append()

When batch leader appending the log, it first locks the active_file:

if is_leader {
let mut file = self
.core
.active_file
.write()
.instrument(trace_span!("wait_queue"))
.await;

And when log rotation is needed, it will locks the frozen_files and append the old active file to the queue:

self.core.frozen_files.write().await.push(frozen_file);

fn read()

When reading, frozen_files are locked and be checked first to see if the given index locates in frozen log files. If not, check the active log file instead. The order to acquire the locks of frozen_files and active_file in fn read()is opposite with fn append(), and the lock of frozen_files is forgotten to be freed:

pub async fn read(&self, log_file_id: u64, offset: u64, len: usize) -> Result<Vec<u8>> {
let mut frozen_files = self.core.frozen_files.write().await;
let first_log_file_id = self.core.first_log_file_id.load(Ordering::Acquire);
let log_file_index = (log_file_id - first_log_file_id) as usize;
let buf = if log_file_index < frozen_files.len() {
frozen_files[log_file_index]
.seek(std::io::SeekFrom::Start(offset))
.await?;
let mut buf = vec![0; len];
frozen_files[log_file_index].read_exact(&mut buf).await?;
buf
} else {
let mut active_file = self.core.active_file.write().await;
active_file.seek(std::io::SeekFrom::Start(offset)).await?;
let mut buf = vec![0; len];
active_file.read_exact(&mut buf).await?;
buf
};
Ok(buf)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant