
feat(infra): concurrent materializer tests #1243

Open · wants to merge 16 commits into base: main

Conversation

@LePremierHomme (Contributor) commented on Dec 28, 2024

Introducing concurrent tests in the materializer, enabling the generation of traceable workloads without significantly altering how test scenarios are written.

Existing tests were refactored to use make_testnet instead of with_testnet.

Progress

  • scenario 1: native coin transfer (docker_tests::benches::test_native_coin_transfer)
  • scenario 2: contract deployment (docker_tests::benches::test_contract_deployment)
  • scenario 3: contract call (docker_tests::benches::test_contract_call)
  • framework: assert that block gas limit isn't the TPS bottleneck
  • framework: TPS reporting
  • framework: latency reporting
  • framework: reach 1,000 max concurrency
  • framework: reduce repeated scenario test code
  • framework: standardize report publication and regression checks
  • framework: NonceManager was introduced because get_transaction_count isn't reliable; this needs double-checking

For follow-up PRs

  • Create manifest files dynamically
  • Introduce profiling features
  • Introduce tracing / statistical analysis features


@LePremierHomme requested a review from a team as a code owner on December 28, 2024, 20:52
@LePremierHomme marked this pull request as draft on December 28, 2024, 20:56
@LePremierHomme changed the title from feat(tests): concurrent materializer tests to [WIP] feat(tests): concurrent materializer tests on Dec 28, 2024
fendermint/testing/materializer/src/concurrency.rs: outdated review threads (resolved)
@@ -59,13 +54,14 @@ fn read_manifest(file_name: &str) -> anyhow::Result<Manifest> {

/// Parse a manifest file in the `manifests` directory, clean up any corresponding
/// testnet resources, then materialize a testnet and run some tests.
-pub async fn with_testnet<F, G>(manifest_file_name: &str, alter: G, test: F) -> anyhow::Result<()>
+pub async fn with_testnet<F, G>(manifest_file_name: &str, concurrency: Option<concurrency::Config>, alter: G, test: F) -> anyhow::Result<()>
Contributor:

  1. I consider this inversion of control (IoC) a convenient, rather bespoke, utility that the original developer of the materializer used to create an initial batch of tests. I would not glorify it and turn it into a focal entrypoint for every future test.
  2. There isn't a single clear-cut way we'll want all tests that require some concurrency to behave, so I'm not sold on introducing concurrency as a framework feature.

Instead, I think we're better served if we introduced a non-IoC API here:

  1. Extract the logic that actually materializes the definition into a separate function that returns the Manifest, DockerMaterializer, DockerTestnet.
  2. The test can now call this function to materialize a definition, do whatever it wants (using whatever concurrency it desires).
  3. Something needs to have a drop guard here that destroys the materialized testnet, probably the DockerTestnet? Not sure if that's implemented.

Contributor (author):

I've inverted the inversion, as you suggested, now providing a cleanup function. I don't think it's helpful to maintain two patterns, so I migrated the previous tests.

It also helped to untangle concurrency from the framework, making it a simple test lib utility.
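
For illustration, the new shape looks roughly like this (a sketch only; the exact make_testnet/cleanup signatures and the manifest name may differ from the PR):

// Sketch: a test now materializes the testnet itself, drives whatever
// (concurrent) logic it wants, and tears everything down explicitly.
#[tokio::test]
async fn test_native_coin_transfer() -> anyhow::Result<()> {
    let (_manifest, materializer, testnet) = make_testnet("manifests/bench.yaml").await?;

    // ... run the scenario, with any concurrency the test desires ...

    cleanup(materializer, testnet).await?; // destroys the materialized testnet resources
    Ok(())
}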

@LePremierHomme changed the title from [WIP] feat(tests): concurrent materializer tests to feat(test): concurrent materializer tests on Jan 7, 2025
@LePremierHomme marked this pull request as ready for review on January 7, 2025, 12:23
@LePremierHomme changed the title from feat(test): concurrent materializer tests to feat(infra): concurrent materializer tests on Jan 7, 2025
}

pub async fn record(&mut self, label: String) {
let duration = self.start_time.unwrap().elapsed();
Contributor:

I feel expect is better here if you assume the caller knows that calling "start" should happen first.

Contributor (author):

I'll revise this API once the reporting summary is more solid.

Ok(bencher) => (Some(bencher), None),
Err(err) => (None, Some(err)),
};
step_results.lock().await.push(TestResult {
Contributor:

If this triggers a lot of threads, then everyone is waiting on this as well; and as step_results gets big, allocation might take some time. Just curious: if step_results is not updated at all, will there be a big difference?

Contributor (author):

This is unlikely to be a bottleneck; it can only impose a small delay in the after-test lifecycle of the future, which isn't recorded and isn't time-sensitive. But I'll double-check that once I get to high max-concurrency figures.

Contributor:

If this slows down, then maybe tokio::sync::mpsc might be a better solution. You can then collect all the messages after all steps have finished.
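
For example, the collection side could look like this (a sketch, with a hypothetical run_one_test helper rather than the PR's actual future type):

use tokio::sync::mpsc;

// Each task gets a cheap clone of the sender and never contends on a lock.
let (tx, mut rx) = mpsc::unbounded_channel::<TestResult>();
for _ in 0..step.max_concurrency {
    let tx = tx.clone();
    tokio::spawn(async move {
        let result = run_one_test().await; // hypothetical helper
        let _ = tx.send(result);           // non-blocking, lock-free send
    });
}
drop(tx); // close the channel so `recv` returns None once all senders are done
let mut step_results = Vec::new();
while let Some(r) = rx.recv().await {
    step_results.push(r);
}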


#[derive(Default)]
pub struct NonceManager {
nonces: Arc<Mutex<HashMap<H160, U256>>>,
Contributor:

I think this is a bottleneck as well: every address is waiting on the same lock. Maybe this might help: https://github.com/xacrimon/dashmap
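
For reference, a DashMap-backed version might look like this (a sketch, not code from this PR):

use dashmap::DashMap;
use ethers::types::{H160, U256};
use std::sync::Arc;

// Sketch: per-shard locking instead of one global Mutex around the whole map.
#[derive(Default, Clone)]
pub struct NonceManager {
    nonces: Arc<DashMap<H160, U256>>,
}

impl NonceManager {
    pub fn next_nonce(&self, addr: H160) -> U256 {
        // `entry` locks only the shard containing `addr`, so different
        // addresses no longer wait on each other.
        let mut entry = self.nonces.entry(addr).or_insert_with(U256::zero);
        let nonce = *entry;
        *entry += U256::one();
        nonce
    }
}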

Contributor (author):

Yes, this is just a temporary solution, I was hoping to remove it entirely. If not, I'll optimize it.

Contributor:

I think it would be good to try the built-in NonceManager from Ethers. Was there any problem with that?
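
For reference, the Ethers middleware is wired up roughly like this (a sketch; not code from this PR):

use ethers::middleware::NonceManagerMiddleware;
use ethers::providers::{Http, Provider};
use ethers::types::Address;

// The middleware fetches the initial nonce once, then assigns and increments
// nonces locally for every transaction it sends.
fn nonce_managed_client(
    url: &str,
    sender: Address,
) -> anyhow::Result<NonceManagerMiddleware<Provider<Http>>> {
    let provider = Provider::<Http>::try_from(url)?;
    Ok(NonceManagerMiddleware::new(provider, sender))
}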

@karlem self-requested a review on January 20, 2025, 17:48
@karlem (Contributor) commented on Jan 21, 2025

I have a suggestion about how to improve the framework design to make it cleaner and more intuitive. While the current implementation works, there are some areas where terminology and structure can be refined to improve clarity and usability. Consider the following approach:

  1. BenchmarkRunner

The BenchmarkRunner should:

  • Orchestrate the entire benchmarking process.
  • Execute each BenchmarkStep sequentially within the specified duration limit.

struct BenchmarkRunner {
    steps: Vec<BenchmarkStep>,
    max_duration: Duration,
}

impl BenchmarkRunner {
    fn new(steps: Vec<BenchmarkStep>, max_duration: Duration) -> Self;
    fn run(&self) -> BenchmarkResult;
}
  2. BenchmarkStep

The BenchmarkStep would be similar to the current ExecutionStep, but it would encapsulate a specific test function, making it more modular and allowing different functions to run within a single test.

struct BenchmarkStep<F>
where
    F: Fn(TestInput) -> TestResult + Send + Sync + 'static {
    concurrency: usize,      // Number of concurrent test executions (N)
    run_duration: Duration,  // Execution time duration (in seconds)
    test_fn: Arc<F>,
}

impl<F> BenchmarkStep<F>
where
    F: Fn(TestInput) -> TestResult + Send + Sync + 'static
{
    fn execute(&self, stop_flag: Arc<AtomicBool>) -> StepResult;
}

The stop_flag (using AtomicBool) is used to stop execution gracefully if the overall test time has expired or in case of any other issue. This is just a suggestion—other mechanisms, such as a Signal abstraction, could also be considered.

  3. Test Input and Result

The TestInput structure can remain as it is, without the current Bencher, simplifying the design.

struct TestResult {
    pub test_id: usize,
    pub step_id: usize,
    pub tx_hash: Option<H256>,
    pub tx_tracker: TransactionTracker,
    pub err: Option<anyhow::Error>,
}
  4. TransactionTracker

Instead of the existing Bencher, a TransactionTracker can be introduced to provide a clearer API. The current API has the potential for errors if start is forgotten, leading to incorrect results.

The new method should automatically set the submission time to ensure correct tracking without requiring manual intervention.

struct TransactionTracker {
    submission_time: Instant,
    mempool_time: Option<Instant>,
    block_time: Option<Instant>,
}

impl TransactionTracker {
    fn new() -> Self;
    fn mark_mempool(&mut self);
    fn mark_block(&mut self);
    fn get_mempool_latency(&self) -> Option<Duration>;
    fn get_block_latency(&self) -> Option<Duration>;
}
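
For illustration, new could capture the submission time itself, so it cannot be forgotten (a sketch):

use std::time::{Duration, Instant};

impl TransactionTracker {
    fn new() -> Self {
        // The submission time is set automatically at construction.
        Self { submission_time: Instant::now(), mempool_time: None, block_time: None }
    }
    fn mark_mempool(&mut self) {
        self.mempool_time = Some(Instant::now());
    }
    fn mark_block(&mut self) {
        self.block_time = Some(Instant::now());
    }
    fn get_mempool_latency(&self) -> Option<Duration> {
        Some(self.mempool_time? - self.submission_time)
    }
    fn get_block_latency(&self) -> Option<Duration> {
        Some(self.block_time? - self.submission_time)
    }
}
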
  5. StepResult

The StepResult should pre-calculate average latencies and other useful statistics for each step, making it equivalent to the current StepSummary.

struct StepResult {
    step_id: usize,
    tests: Vec<TestResult>,
    avg_mempool_latency: Duration,
    avg_block_latency: Duration,
    // Additional useful statistics for the step
}

impl StepResult {
    fn new(results: Vec<TestResult>) -> Self;
}
  6. Execution Engine

The execution engine should support concurrent execution of the test function for a specified duration, allowing precise control over execution time.

fn run_concurrent<F>(concurrency: usize, run_duration: Duration, test_fn: F, stop_flag: Arc<AtomicBool>)
where
    F: Fn(TestInput) -> TestResult + Send + Sync + 'static;

The stop_flag ensures the execution stops when the total benchmark duration is reached or when other termination conditions occur.
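
One possible realization of this engine, as a sketch (thread-per-worker; TestInput is assumed to have a test_id field and a Default impl):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::time::{Duration, Instant};

fn run_concurrent<F>(
    concurrency: usize,
    run_duration: Duration,
    test_fn: Arc<F>,
    stop_flag: Arc<AtomicBool>,
) -> Vec<TestResult>
where
    F: Fn(TestInput) -> TestResult + Send + Sync + 'static,
{
    let (tx, rx) = mpsc::channel();
    let start = Instant::now();
    let workers: Vec<_> = (0..concurrency)
        .map(|test_id| {
            let (test_fn, stop_flag, tx) = (test_fn.clone(), stop_flag.clone(), tx.clone());
            std::thread::spawn(move || {
                // Keep issuing tests until the step duration elapses or we're told to stop.
                while start.elapsed() < run_duration && !stop_flag.load(Ordering::Relaxed) {
                    let input = TestInput { test_id, ..Default::default() }; // assumed fields
                    let _ = tx.send(test_fn(input));
                }
            })
        })
        .collect();
    drop(tx); // the channel closes once every worker has finished
    for w in workers {
        let _ = w.join();
    }
    rx.into_iter().collect()
}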

  7. BenchmarkResult

The BenchmarkResult serves as the overall execution summary, aggregating results from all benchmark steps.

struct BenchmarkResult {
    steps: Vec<StepResult>,
}

Conclusion

This revised design primarily improves terminology and clarity, making the framework more cohesive and intuitive. The key benefits of the proposed approach include:

  • Encapsulation: Each BenchmarkStep holds its own test function, making it easier to run varied tests within a single benchmark.
  • Clarity: Replacing Bencher with TransactionTracker simplifies the API and eliminates potential misuses.
  • Intuitive Structure: The separation of responsibilities across BenchmarkRunner, BenchmarkStep, and TransactionTracker makes the design easier to understand and maintain.

Overall, this proposal aligns closely with the current design but improves cohesion, intuitiveness, and robustness.

@karlem (Contributor) left a comment:

This is the first major review batch (1/2). Tomorrow, a smaller set of reviews will follow.

Outstanding reviews:

  • The tests in benches.rs
  • Thoroughly review summary.rs



let mut results = Vec::new();
for (step_id, step) in cfg.steps.iter().enumerate() {
let semaphore = Arc::new(Semaphore::new(step.max_concurrency));
let mut handles = Vec::new();
Contributor:

I think using FuturesUnordered might be beneficial here, since we do not care about order.
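
For instance (a sketch with a hypothetical run_one_test helper; unlike the Semaphore loop, the batch would need to be refilled inside the loop to keep concurrency steady):

use futures::stream::{FuturesUnordered, StreamExt};

// Poll all in-flight tests and collect results in completion order.
let mut in_flight = FuturesUnordered::new();
for test_id in 0..step.max_concurrency {
    in_flight.push(run_one_test(test_id)); // hypothetical async helper
}
while let Some(result) = in_flight.next().await {
    step_results.push(result);
}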


pub async fn execute<F>(cfg: config::Execution, test_factory: F) -> Vec<Vec<TestResult>>
where
F: Fn(TestInput) -> Pin<Box<dyn Future<Output = anyhow::Result<TestOutput>> + Send>>,
Contributor:

Would it be possible to get rid of the Pin<Box<dyn Future...>>? Maybe Fut: Future<Output = anyhow::Result<TestOutput>> + Send, or something like that?
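
Concretely, the suggested bound could look like this (a sketch):

use std::future::Future;

// Callers can now pass `|input| async move { ... }` directly, without boxing.
pub async fn execute<F, Fut>(cfg: config::Execution, test_factory: F) -> Vec<Vec<TestResult>>
where
    F: Fn(TestInput) -> Fut,
    Fut: Future<Output = anyhow::Result<TestOutput>> + Send + 'static,
{
    // ... body unchanged: each spawned task awaits `test_factory(input)` ...
    todo!()
}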

let step_results = Arc::new(tokio::sync::Mutex::new(Vec::new()));
let execution_start = Instant::now();
loop {
if execution_start.elapsed() > step.duration {
Contributor:

Maybe:

while execution_start.elapsed() < step.duration {

sorted_data.sort_by(|a, b| a.partial_cmp(b).unwrap());

let count = sorted_data.len();
let mean: f64 = sorted_data.iter().sum::<f64>() / count as f64;
Contributor:

let (sum, min, max) = data.iter().fold((0.0, f64::INFINITY, f64::NEG_INFINITY), |(sum, min, max), &x| {
    (sum + x, f64::min(min, x), f64::max(max, x))
});
let mean = sum / count as f64;

let max = *sorted_data.last().unwrap();
let min = *sorted_data.first().unwrap();

let percentile_90_index = ((count as f64) * 0.9).ceil() as usize - 1;
Contributor:

Just to make sure it won't panic. Maybe something like this might be useful?

let percentile_90_index = (((count as f64) * 0.9).ceil() as usize).min(count.saturating_sub(1));

let percentile_90_index = ((count as f64) * 0.9).ceil() as usize - 1;
let percentile_90 = sorted_data[percentile_90_index];

Metrics {
Contributor:

I think it might be beneficial to use an external library like statrs to handle these calculations, especially if we plan to introduce more complex ones in the future.
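
For example, the hand-rolled math could be delegated like this (a sketch; the Metrics field set is assumed from this PR, and the exact statrs trait imports may vary between versions):

use statrs::statistics::{Data, OrderStatistics, Statistics};

pub fn calc_metrics(values: Vec<f64>) -> Metrics {
    // `Statistics` provides mean/min/max over iterators of f64.
    let mean = values.iter().mean();
    let min = values.iter().min();
    let max = values.iter().max();
    // `OrderStatistics` needs a mutable container for percentiles.
    let mut data = Data::new(values);
    Metrics {
        mean,
        max,
        min,
        percentile_90: data.percentile(90),
    }
}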

}
}

pub fn calc_metrics(data: Vec<f64>) -> Metrics {
Contributor:

NIT: calculate_metrics

blocks: HashMap<u64, Block<H256>>,
results: Vec<Vec<TestResult>>,
) -> Self {
let step_txs = Self::map_results_to_txs(&results);
Contributor:

NIT: txs_by_steps

.await
.unwrap();
tx = tx.gas(gas_estimation);
assert!(gas_estimation <= max_tx_gas_limit);
Contributor:

Should this fail the whole test run?

}
input.bencher.mempool();

let receipt = pending
Contributor:

Maybe we should add a timeout here in case it isn't included?
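
For example (a sketch; the 60-second bound is an arbitrary assumption):

use std::time::Duration;

// Bound the wait for inclusion instead of hanging forever; `pending` is the
// pending-transaction future from the snippet above.
let receipt = tokio::time::timeout(Duration::from_secs(60), pending)
    .await
    .map_err(|_| anyhow::anyhow!("tx not included within 60s"))??;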

text_tables::render(&mut io::stdout(), data).unwrap();
}

fn map_results_to_txs(results: &[Vec<TestResult>]) -> Vec<Vec<H256>> {
Contributor:

Maybe extract_step_transaction_hashes?

})
.collect()
}

@karlem (Contributor) commented on Jan 22, 2025:

A comment might be useful:
/// Group blocks by the "latest" step that contributed a TX to that block.

Comment on lines +9 to +12
let offset = 1; // The first block is skipped for lacking a block interval.
for i in offset..blocks.len() {
let prev_block = &blocks[i - 1];
let curr_block = &blocks[i];
@karlem (Contributor) commented on Jan 22, 2025:

for window in blocks.windows(2) {
        let prev = &window[0];
        let curr = &window[1];
        ...
}

seems cleaner.

Comment on lines +14 to +18
let interval = curr_block.timestamp.saturating_sub(prev_block.timestamp);

if interval.le(&U256::zero()) {
continue;
}
@karlem (Contributor) commented on Jan 22, 2025:

let interval = curr.timestamp.saturating_sub(prev.timestamp);
if interval.is_zero() {
    continue;
}

This is correct, as the subtraction saturates at zero.

@karlem (Contributor) commented on Jan 22, 2025

Both reviews (2/2) are now complete. That should be all for now :)

// Copyright 2022-2024 Protocol Labs
// SPDX-License-Identifier: Apache-2.0, MIT

pub struct Signal(tokio::sync::Semaphore);
Contributor:

Uses a loom::sync::Mutex internally and is hence not signal safe, since pthread_mutex_lock is not signal safe. Please correct me if I am wrong.

Commenter:

I was just browsing the code base, but I was curious what your concern is here, e.g. what signal safety means.

It is a generally working pattern to hold blocking locks that guard short sections executed in an async context. Blocking primitives are used consistently in the tokio codebase; one example is https://github.com/tokio-rs/tokio/blob/4b3da20c9847b202cf110f7b7772fd4674edaecf/tokio/src/sync/barrier.rs#L142-L148, and there is some info here https://tokio.rs/tokio/tutorial/shared-state under the "Tasks, threads, and contention" paragraph.

Specifically, in the semaphore it guards a section that doesn't yield by itself and should be very fast to complete (I expect sub-1µs), so preemption by the OS is very unlikely.

That said, there is actually no waiting in this wrapper.

Commenter:

I guess you meant that if this is used directly in a signal interrupt handler, then it doesn't protect against re-entrancy.

Contributor:

I think Signal needs some more context. My assumption, from doing a single pass, was that it handles UNIX signals.
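
For context, a semaphore-backed wrapper of this kind is typically an in-process notification, not a UNIX signal handler; a minimal sketch of how it might work (an assumption, not the exact code under review):

pub struct Signal(tokio::sync::Semaphore);

impl Signal {
    pub fn new() -> Self {
        // Zero permits: `received` pends until `send` is called.
        Self(tokio::sync::Semaphore::new(0))
    }
    pub fn send(&self) {
        // Closing the semaphore makes every pending and future `acquire`
        // return an error, waking all waiters at once.
        self.0.close();
    }
    pub async fn received(&self) {
        let _ = self.0.acquire().await;
    }
}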
