Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes based on Griddy Chain #206

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Changes based on Griddy Chain #206

wants to merge 19 commits into from

Conversation

heemankv
Copy link
Contributor

@heemankv heemankv commented Jan 24, 2025

SNOS concurrency control and resource optimization

Core Changes

1. Orchestrator Concurrency Control

  • Added MADARA_ORCHESTRATOR_MAX_CONCURRENT_SNOS_JOBS: Controls parallel SNOS jobs per runner
  • Added MADARA_ORCHESTRATOR_UNIQUE_ID: Unique identifier for each runner pod

2. Job Processing Tracking

  • Runners record their unique ID in MongoDB job metadata
  • Prevents duplicate job processing through metadata checks
  • Enhanced job distribution across multiple runners

3. Consistency Improvements

  • Increased spawn_consumer interval: 500ms → 2s
  • Prevents race conditions in multi-pod job pickup operations
  • Ensures consistent job assignment across runners

4. Memory Optimization

  • Explicit scope blocks for immediate memory release
  • Chunked file processing for large data

@heemankv heemankv self-assigned this Jan 24, 2025
@coveralls
Copy link

coveralls commented Jan 24, 2025

Coverage Status

coverage: 67.173% (+0.09%) from 67.086%
when pulling 3c795aa on chain/griddy
into afb9afe on main.

Copy link
Contributor Author

@heemankv heemankv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self review done, lgtm

.env.example Outdated
@@ -81,6 +82,9 @@ MADARA_ORCHESTRATOR_HOST= # Server host
MADARA_ORCHESTRATOR_PORT= # Server port

#### SERVICE ####

MADARA_ORCHESTRATOR_UNIQUE_ID= # Unique ID for the orchestrator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used?


/// The maximum number of SNOS jobs to process concurrently.
#[arg(env = "MADARA_ORCHESTRATOR_MAX_CONCURRENT_SNOS_JOBS", long)]
pub max_concurrent_snos_jobs: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not do an optional u64 direcrtly?

@@ -419,4 +419,34 @@ impl Database for MongoDb {
ORCHESTRATOR_METRICS.db_calls_response_time.record(duration.as_secs_f64(), &attributes);
Ok(jobs)
}

#[tracing::instrument(skip(self, limit), fields(function_type = "db_call"), ret, err)]
async fn get_jobs_by_statuses_and_orchestrator_id(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the purpose is to differentiate between pods, I don't think it's a good design as discussed before.

return Ok(true);
}

let jobs = config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't need to be stored in the db, we can use a semaphore for this. it's simpler imo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these changes needed anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants