Add ingest/processed/bytes metric #17581

Open
wants to merge 9 commits into
base: master

@neha-ellur neha-ellur commented Dec 17, 2024

A new metric ingest/processed/bytes has been introduced to track the total number of bytes processed during ingestion tasks, including native batch ingestion, streaming ingestion, and multi-stage query (MSQ) ingestion tasks. This metric helps provide a unified view of data processing across different ingestion pathways.

Key changed/added classes in this PR

This metric is emitted from four classes:

  • IndexTask: A sequential ingestion task. The processed bytes are read from the RowIngestionMetersTotals object (buildSegmentsMeters) and emitted directly after segment publication.
  • ParallelIndexSupervisorTask: A task that supervises parallel ingestion. Processed bytes are aggregated from the subtasks' ingestion metrics (RowIngestionMetersTotals) and emitted with exception handling around the emit call.
  • SeekableStreamIndexTaskRunner: A runner for ingestion tasks that consume data from seekable streams (e.g., Kafka). The processed bytes are computed from the size of the data buffers (e.g., ByteEntity buffers) of each record, and the metric is emitted once per processed record.
  • ControllerImpl (MSQ): The ingest/processed/bytes metric is emitted by aggregating the bytes processed across all stages and workers during MSQ task execution. The controller fetches the counters from the CounterSnapshotsTree, sums the bytes from all input channels for each worker and stage using stream-based aggregation, and emits the total as ingest/processed/bytes for the entire MSQ task.
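
The stream-based aggregation described for the MSQ controller can be sketched roughly as follows. This is only an illustration, not the code in this PR: the nested map (stage -> worker -> bytes per input channel) is a hypothetical stand-in for CounterSnapshotsTree, and the class and method names are made up for the example.

```java
import java.util.List;
import java.util.Map;

public class ProcessedBytesSketch
{
  /**
   * Sums bytes over a stage -> worker -> per-input-channel map.
   * The nested map is a hypothetical stand-in for the counters snapshot tree.
   */
  public static long totalProcessedBytes(Map<Integer, Map<Integer, List<Long>>> bytesByStageAndWorker)
  {
    // Mirror the "no counters available" case by reporting zero bytes.
    if (bytesByStageAndWorker == null) {
      return 0L;
    }
    return bytesByStageAndWorker
        .values()
        .stream()
        .flatMap(workers -> workers.values().stream()) // all workers of all stages
        .flatMap(List::stream)                         // all input channels of each worker
        .mapToLong(Long::longValue)
        .sum();
  }

  public static void main(String[] args)
  {
    Map<Integer, Map<Integer, List<Long>>> counters = Map.of(
        0, Map.of(0, List.of(100L, 50L), 1, List.of(25L)),
        1, Map.of(0, List.of(10L))
    );
    System.out.println(totalProcessedBytes(counters)); // prints 185
  }
}
```

Note that summing the input channels of every stage counts bytes that flow through intermediate stages as well, so the total is not the same as the size of the external input.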

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added the labels "Area - Batch Ingestion" and "Area - MSQ" (For multi stage queries - https://github.com/apache/druid/issues/12262) Dec 18, 2024
@kfaraz kfaraz self-requested a review December 18, 2024 03:52
@apache apache deleted a comment from neha-ellur Dec 18, 2024
@@ -329,6 +331,27 @@ public void run(final QueryListener queryListener) throws Exception
}
// Call onQueryComplete after Closer is fully closed, ensuring no controller-related processing is ongoing.
queryListener.onQueryComplete(reportPayload);

long totalProcessedBytes = reportPayload.getCounters().copyMap().values().stream()
Contributor

@cryptoe cryptoe Jan 7, 2025

This seems like the wrong place to put this logic.
ingest/processed/bytes seems like an ingestion-only metric, no?
If that is the case, we should emit the metric only if the query is an ingestion query.

You could probably expose a method here https://github.com/apache/druid/blob/9bebe7f1e5ab0f40efbff620769d0413c943683c/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java#L517 that emits summary metrics, and pass the task report and the query to it.
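
One possible reading of this suggestion, as a sketch only: a single method that emits the end-of-query summary metrics and skips the ingestion-only metric for non-ingestion queries. The names emitSummaryMetrics and MetricSink, and the boolean flag standing in for "the query is an ingestion query", are hypothetical and are not taken from ControllerImpl.

```java
import java.util.ArrayList;
import java.util.List;

public class SummaryMetricsSketch
{
  /** Hypothetical stand-in for the real metrics emitter. */
  public interface MetricSink
  {
    void emit(String metricName, long value);
  }

  /**
   * Emits end-of-query summary metrics. The ingestion-only metric is skipped
   * for queries that are not ingestion queries.
   */
  public static void emitSummaryMetrics(boolean isIngestionQuery, long processedBytes, MetricSink sink)
  {
    if (!isIngestionQuery) {
      return;
    }
    sink.emit("ingest/processed/bytes", processedBytes);
  }

  public static void main(String[] args)
  {
    List<String> emitted = new ArrayList<>();
    MetricSink sink = (name, value) -> emitted.add(name + "=" + value);

    emitSummaryMetrics(false, 1024L, sink); // non-ingestion query: nothing is emitted
    emitSummaryMetrics(true, 2048L, sink);  // ingestion query: the metric is emitted

    System.out.println(emitted);
  }
}
```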

Author

Moved the logic

Contributor

cryptoe commented Jan 7, 2025

Also there are some static check failures which need to be looked at.

Author

neha-ellur commented Jan 9, 2025

Also there are some static check failures which need to be looked at.

@cryptoe fixed

try {
emitMetric(toolbox.getEmitter(), "ingest/processed/bytes", rowStatsForRunningTasks.getProcessedBytes());
}
catch (Exception e) {
Contributor

I don't think you need a try-catch here. rowStatsForRunningTasks should always be non-null afaict.

.setMetric("ingest/processed/bytes", bytesProcessed)
);
}
catch (Exception e) {
Contributor

Probably don't need a try-catch.

Comment on lines 669 to 671
ServiceMetricEvent.builder()
.setDimension("taskId", task.getId())
.setDimension("dataSource", task.getDataSource())
Contributor

Instead of this, you could use IndexTaskUtils.setTaskDimensions() to set all task related dimensions.

Author

@neha-ellur neha-ellur Jan 10, 2025

What about setting .setMetric("ingest/processed/bytes", bytesProcessed)? @kfaraz
Does this look right?

IndexTaskUtils.setTaskDimensions(new ServiceMetricEvent.Builder(), task);
toolbox.getEmitter().emit(
  ServiceMetricEvent.builder().setMetric("ingest/processed/bytes", bytesProcessed)
);

Contributor

@kfaraz kfaraz Jan 10, 2025

Yeah, you would need to set that separately. Something like:

final ServiceMetricEvent.Builder metricBuilder = new ServiceMetricEvent.Builder();
IndexTaskUtils.setTaskDimensions(metricBuilder, task);
toolbox.getEmitter().emit(metricBuilder.setMetric("segment/txn/failure", 1));

Using IndexTaskUtils.setTaskDimensions() helps avoid duplication of the code.

Author

Thanks, updated

}).sum()).sum()
: 0;

log.info("Total processed bytes: %d, query: %s", totalProcessedBytes, querySpec.getQuery());
Contributor

This can be a debug log.

Suggested change
- log.info("Total processed bytes: %d, query: %s", totalProcessedBytes, querySpec.getQuery());
+ log.debug("Processed bytes[%d] for query[%s].", totalProcessedBytes, querySpec.getQuery());

Labels
Area - Batch Ingestion, Area - Ingestion, Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262)
3 participants