Improving cancellations in the parallel builder. #353

Open
bertmiller opened this issue Jan 8, 2025 · 0 comments
Labels: bug, enhancement

bertmiller (Member) commented Jan 8, 2025

### Summary
We need a robust way to handle cancellations (or updates) in the parallel builder, i.e. the case where drain_new_orders() returns None. Because sub-components (ConflictFinder, ResultsAggregator, etc.) store partial merges and simulation states, removing a single order is not enough: state derived from that order would remain. Instead, we must fully reset all of them, wiping stale references. The builder also spawns threads (worker pool, aggregator, block building), so the reset has to be coordinated with code that is running concurrently.

### Goals

  1. Full Reset: When cancellations occur, we must clear:
  • ConflictFinder
  • ConflictTaskGenerator's existing_groups / task_queue
  • ConflictResolvingPool's task_queue
  • SimulationCache
  • ResultsAggregator's best_results
    Then re-ingest only the orders that remain and let the system proceed as if starting fresh (e.g. find conflicts, resolve them, build blocks).
  2. Thread Safety: Ensure there are no data races or partial updates while aggregator/block-building threads hold references.
  3. Maintain High Performance: Avoid large overhead from locking or repeated resets.
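To make the Full Reset goal concrete, here is a minimal sketch of what a uniform reset interface might look like. Everything in it is an assumption for illustration: the `Resettable` trait, the `Order` type, the `Builder::step` method, and the toy fields are hypothetical stand-ins and not the actual rbuilder types. It also assumes that a `Some(..)` result from drain_new_orders() means "only additions happened".

```rust
/// Hypothetical stand-in for an order; only the id matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Order {
    pub id: u64,
}

/// Assumed common interface: every stateful sub-component can drop
/// everything it derived from previously seen orders.
pub trait Resettable {
    fn reset(&mut self);
}

/// Toy stand-in for ConflictFinder: remembers which orders it has ingested.
#[derive(Default)]
pub struct ConflictFinder {
    pub seen: Vec<u64>,
}

impl ConflictFinder {
    pub fn add_orders(&mut self, orders: &[Order]) {
        self.seen.extend(orders.iter().map(|o| o.id));
    }
}

impl Resettable for ConflictFinder {
    fn reset(&mut self) {
        self.seen.clear();
    }
}

/// Toy stand-in for ResultsAggregator and its best_results.
#[derive(Default)]
pub struct ResultsAggregator {
    pub best_results: Vec<u64>,
}

impl Resettable for ResultsAggregator {
    fn reset(&mut self) {
        self.best_results.clear();
    }
}

/// Hypothetical owner of the sub-components.
pub struct Builder {
    pub conflict_finder: ConflictFinder,
    pub aggregator: ResultsAggregator,
}

impl Builder {
    /// `drained` mirrors the Option returned by drain_new_orders():
    /// None signals a cancellation or update (per the issue text);
    /// Some(new) is assumed to mean that only additions happened.
    /// `current_orders` is the full set of orders that are still valid.
    pub fn step(&mut self, drained: Option<Vec<Order>>, current_orders: &[Order]) {
        match drained {
            // Incremental path: just feed the new orders in.
            Some(new_orders) => self.conflict_finder.add_orders(&new_orders),
            // Cancellation path: wipe all derived state, then re-ingest.
            None => {
                self.reset_all();
                self.conflict_finder.add_orders(current_orders);
            }
        }
    }

    /// Clears every sub-component so no stale state survives.
    fn reset_all(&mut self) {
        self.conflict_finder.reset();
        self.aggregator.reset();
    }
}
```

In a real implementation the same trait would presumably also cover ConflictTaskGenerator, ConflictResolvingPool, and SimulationCache, so that adding a new stateful module cannot silently bypass the reset.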

### Core Considerations

  1. Reset Mechanism: All submodules (conflict finder, aggregator, simulation cache, etc.) need a consistent way to clear their state and re-ingest orders, leaving no partial or stale references.
  2. Concurrency Approach:
    • Single-Thread Manager: A central struct (ParallelBuilder) mutates the data, and worker threads take only short-lived snapshots. Simple to reset, but limits fully parallel reads.
    • Multi-Thread Shared State: Modules live behind locks (Arc<Mutex<...>>) or receive message-based commands. More flexible for continuous concurrency, but requires locking or event loops.
  3. Data Usage Pattern: Decide whether the aggregator and block-building threads hold long-lived references (requiring concurrency control) or only run ephemeral tasks (making a single-thread reset easier).
  4. Performance vs. Simplicity: Locking or message passing keeps truly parallel modules safe but adds complexity and overhead. A single-thread manager is simpler but less concurrent.
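As one possible illustration of the shared-state option, the sketch below keeps aggregator results behind an `Arc<Mutex<...>>` and adds an epoch counter so that a worker that started before a reset cannot publish a stale result afterwards. The epoch idea and all names here (`Shared`, `Inner`, `publish`, `snapshot`) are assumptions made for this example and are not taken from the existing code.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical lock-protected state: results plus the reset epoch.
#[derive(Default)]
struct Inner {
    /// Incremented on every full reset.
    epoch: u64,
    best_results: Vec<u64>,
}

/// Cheaply clonable handle that worker threads can hold.
#[derive(Clone, Default)]
pub struct Shared {
    inner: Arc<Mutex<Inner>>,
}

impl Shared {
    /// Epoch a worker should record when it starts a task.
    pub fn current_epoch(&self) -> u64 {
        self.inner.lock().unwrap().epoch
    }

    /// Full reset: clear results and bump the epoch under the same lock,
    /// so no publisher can observe a half-reset state.
    pub fn reset(&self) {
        let mut guard = self.inner.lock().unwrap();
        guard.best_results.clear();
        guard.epoch += 1;
    }

    /// A worker publishes a result tagged with the epoch it started in.
    /// Returns false (and stores nothing) if a reset happened in between.
    pub fn publish(&self, started_epoch: u64, value: u64) -> bool {
        let mut guard = self.inner.lock().unwrap();
        if guard.epoch != started_epoch {
            return false;
        }
        guard.best_results.push(value);
        true
    }

    /// Short-lived copy for readers such as a block-building thread.
    pub fn snapshot(&self) -> Vec<u64> {
        self.inner.lock().unwrap().best_results.clone()
    }
}
```

A worker would call `current_epoch()` before starting a task and pass that value to `publish()` when done. The cost is one lock acquisition per publish and per snapshot, which is the overhead the Performance vs. Simplicity point refers to; a single-thread manager would avoid the lock by doing the same epoch check on messages it receives.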

Please weigh in on the core considerations.

bertmiller added the bug and enhancement labels Jan 8, 2025