-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] Temporary chrome deployment #4440
Open
vitorguidi
wants to merge
50
commits into
master
Choose a base branch
from
chrome
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We are currently logging progression when unpacking, but we're missing the last log. This can be useful to monitor what's going on in CF.
### Motivation We currently lack metrics for build retrieval and unpacking times. This PR adds that, with granularity by fuzz target and job type. There are two different implementations for build downloading/unpacking: - In the Build class, from which RegularBuild, SplitTargetBuild, FuchsiaBuild and SymbolizedBuild inherit the downloading/unpacking behavior - In the CustomBuild class, which implements its own logic There are two possible cases for downloading/unpacking: clusterfuzz either downloads the whole build and unpacks it locally, or unpacks it remotely. This is the case for all build types except CustomBuild, which does not perform remote unpacking. For build retrieval over http, we do not track download time. For all the other cases, it suffices to keep track of start/finish time for download and unpacking. Finally, a _build_type is added to the constructor of the Build class, from which all other inherit. This is used to track the build type (debug or release), and is only mutated by SymbolizedBuild when attempting to fetch a debug build. Part of #4271
For non engine fuzzers, we do not need to have the full list of fuzz targets, since those builds are only containing the APP_NAME. For that reason, and when the build is fully unpacked, lazily fetch the fuzzing targets only when requested by the user of the class. This change will tremendously speed up the unpacking step for some of our fuzzers, see https://docs.google.com/document/d/1OepfVcuG2XNXLxgZIXVwgE-uOhX9RO5RfH0TSg_wmPU/. Co-authored-by: jonathanmetzman <[email protected]>
### Motivation We currently lack awareness on how old builds are during fuzz task. This PR implements that, by making the assumption that the Last Update Time metadata field in GCS is a good proxy for build age. [Documentation reference](https://cloud.google.com/storage/docs/json_api/v1/objects#resource) ### Approach Symbolized and custom builds do not matter, thus all builds of interest will be fetched from ```build_manager.setup_regular_build```. Logic for collecting all bucket paths and the latest revision was refactored, so that ```setup_regular_build``` can also figure out the latest revision for a given build and conditionally emit the proposed metric. ### Testing strategy !Todo: test this for fuzz, analyze, progression Locally ran tasks, with instructions from #4343 and #4345 , and verified the _emmit_build_age_metric function gets invoked and produces sane output. Commands used: ``` fuzz libFuzzer libfuzzer_asan_log4j2 ``` ![image](https://github.com/user-attachments/assets/66937297-20ec-44cf-925e-0004a905c92e) ``` progression 4992158360403968 libfuzzer_asan_qt ``` ![image](https://github.com/user-attachments/assets/0e1f1199-d1d8-4da5-814e-8d8409d1f806) ``` analyze 4992158360403968 libfuzzer_asan_qt (disclaimer: build revision was overriden mid flight to force a trunk build, since this testcase was already tied to a crash revision) ``` ![image](https://github.com/user-attachments/assets/dd3d5a60-36a1-4a9e-a21b-b72177ffdecd) Part of #4271
### Motivation Small changes for the build retrieval metric, as requested per Chrome folks after initially using the feature: * Adding the time for listing fuzz targets as a step * Added the total retrieval duration * Tracking metric as minutes, for readability Part of #4271
### Motivation The Chrome team has no easy visibility into how many manually uploaded test cases flake or successfully reproduce. This PR implements a counter metric to track that. There are three possible outcomes, each represented by a string label: 'reproduces', 'one_timer' and 'does_not_reproduce' Part of #4271
…4381) ### Motivation Once a testcase is generated (or manually uploaded), followup tasks (analyze/progression) are started. This happens by publishing to a pubsub queue, both for the manually uploaded case, and for the fuzzer generated case. If for any reason the messages are not processed, the testcase gets stuck. To get better visibility into these stuck testcases, the UNTRIAGED_TESTCASE_AGE metric is introduced, to pinpoint how old these testcases that have not yet been triaged are(more precisely, gone through analyze/regression/impact/progression tasks). ### Attention points Testcase.timestamp mutates in analyze task: https://github.com/google/clusterfuzz/blob/6ed80851ad0f6f624c5b232b0460c405f0a018b5/src/clusterfuzz/_internal/bot/tasks/utasks/analyze_task.py#L589 This makes it unreliable as a source of truth for testcase creation time. To circumvent that, a new ```created``` field is added to the Testcase entity, from which we can derive the correct creation time. Since this new field will only apply for testcases created after this PR is merged, Testcase.timestamp will be used instead to calculate the testcase age when the new field is missing. ### Testing strategy Ran the triage cron locally, and verified the codepath for the metric is hit and produces sane output (reference testcase: 4505741036158976). ![image](https://github.com/user-attachments/assets/6281b44f-768a-417e-8ec1-763f132c8181) Part of #4271
Chrome security shepherds manually upload testcases through appengine, triggering analyze task and, in case of a legitimate crash, the followup progression tasks: * Minimize * Analyze * Impact * Regression * Cleanup cronjob, when updating a bug to inform the user that all above stages were finished This PR adds instrumentation to track the time elapsed between the user upload, and the completion of the above events. * TestcaseUploadMetadata.timestamp was being mutated on the preprocess stage for analyze task. This mutation was removed, so that this entity can be the source of truth for when a testcase was in fact uploaded by the user. * The job name could be retrieved from the JOB_NAME env var within the uworker, however this does not work for the cleanup use case. For this reason, the job name is fetched from datastore instead. * The ```query_testcase_upload_metadata``` method was moved from analyze_task.py to a helpers file, so it could be reused across tasks and on the cleanup cronjob Every task mentioned was executed locally, with a valid uploaded testcase. The codepath for the metric emission was hit and produced the desired output, both for the tasks and the cronjob. Part of #4271
### Motivation Some cumulative distribution metrics (build age, retrieval, testcase age, testcase triage duration) are misbehaving and capping at 1. This PR intends to aid in debugging that.
This reverts commit 5404212.
### Motivation Cumulative distribution metrics from the monitoring initiative were incorrectly set to use the fixed width bucketer, and/or width=0.05 and max_buckets=20. This caused percentile metrics to cap at 1, which was wrong behavior. This PR attempts to fix that by moving them all to Geometric Bucketer, without the aforementioned limits. It also reverts #4429 , since it apparently broke triage.py in chrome. Part of #4271
https://crbug.com/380707237 is caused by this discrepency between master and chrome branches.
Cherry picking #4441 onto chrome temp branch Co-authored-by: Ali HIJAZI <[email protected]>
…ng everything (#4486) Now that we are lazily checking for fuzzing targets, it makes sense to allow remote unpacking even when unpacking the full archive. Furthermore, it seems that remote unpacking performances are much higher than local unpacking on CF bots, so this might improve overall performances of the build_manager.
### Motivation This merges #4489, #4458 and #4483 to the chrome temporary deployment branch The purpose is to have task error rate metrics, and log what old testcases are polluting the testcase upload metrics, so we can figure out if a purge is necessary --------- Co-authored-by: jonathanmetzman <[email protected]>
This merges #4494 into the temporary chrome deployment
#4503) Fixes the following error: ``` AttributeError 'ProtoType' object has no attribute 'DESCRIPTOR' Failed to flush metrics: 'ProtoType' object has no attribute 'DESCRIPTOR' Traceback (most recent call last): File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/metrics/monitor.py", line 118, in _flush_metrics metric.monitoring_v3_time_series(series, labels, start_time, end_time, File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/metrics/monitor.py", line 350, in monitoring_v3_time_series self._set_value(point.value, value) File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/metrics/monitor.py", line 404, in _set_value point.int64_value = value ^^^^^^^^^^^^^^^^^ File "/mnt/scratch0/clusterfuzz/src/third_party/proto/message.py", line 935, in __setattr__ pb_value = marshal.to_proto(pb_type, value) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/mnt/scratch0/clusterfuzz/src/third_party/proto/marshal/marshal.py", line 229, in to_proto proto_type.DESCRIPTOR.has_options ^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'ProtoType' object has no attribute 'DESCRIPTOR' ```
cherry-pick #4411
This adds 9 archives from Fuzzilli to the general test-input archives used by fuzzers on Clusterfuzz. The Fuzzilli-side archives are refreshed every few days. We'll add a freshness metric in a follow up. This was tested locally with butler. See bug: https://crbug.com/362963886
Cherry-pick of #4526
Cherry-pick of #4531
Merging #4372 into chrome
Otherwise new issues may error out because it's a required field. Cherry pick: #4379 Co-authored-by: Oliver Chang <[email protected]>
They are unsupported on the new tracker and aren't needed anyway. Example: https://issues.oss-fuzz.com/issues/379066972#comment1 Cherrypick of #4409 Co-authored-by: maflcko <[email protected]>
In application logs for debugging and in testcase comments for user understanding. Bug: https://crbug.com/380264190 Cherry pick of #4467 Co-authored-by: Titouan Rigoudy <[email protected]>
Custom binaries are often used emulators (i.e. the binary being used by the host). This code follows the same strategy used for the ADB binary above in `get_adb_path()`. Cherry pick: #4518 Co-authored-by: Mark Teffeteller <[email protected]>
This allows manipulating the weight of individual FuzzerJob entries in the database. Complements the `fuzz-target set` command. Cherry pick: #4392 Co-authored-by: Titouan Rigoudy <[email protected]>
…#4537) ### Motivation oss_fuzz_build_status and oss_fuzz_apply_ccs are broken due to buganizer rejecting emails without associated accounts. This serves as a workaround so at least some notifications can suceeed. (google/oss-fuzz#12536) ### Why this works Buganizer throws 400 errors listing out all rejected emails in the exception, so we can just parse the non gaia emails from that and complete the execution by ignoring those. ### Testing strategy Ran both cronjobs locally with the proposed new code, and verified they ran to completion. Cherry pick of #4406 Co-authored-by: Vitor Guidi <[email protected]>
Allow instances to specify the GCS bucket location for data bundle buckets in `project.yaml` as a new key: `data_bundle_bucket_location`. This will allow creating regional buckets instead of using the default `US` multi-region which results in high data transfer costs in Chrome's instance. Cherry pick of: #4479 Co-authored-by: Titouan Rigoudy <[email protected]>
There were some hardcoded constants that needed to be updated for the Google Issue Tracker migration. We'll also stop adding the 'Project' custom field for now, because it's hard to propagate custom field values and it's not essential for build failure issues. This should hopefully fix google/oss-fuzz#12536 (comment). Cherry-pick of #4378 Co-authored-by: Oliver Chang <[email protected]>
Introduces regex-based filtering directly in the API to improve performance and reduce number of calls to ab api server. Cherry pick: #4297 Co-authored-by: aditya-wazir <[email protected]>
The device check is updated to use find instead of a literal match so that sanitized version of the devices (e.g: cheetah_hwasan) can also be used --------- Cherry pick: #4256 Co-authored-by: svasudevprasad <[email protected]>
Merging #4530 into the Chrome branch
This adds all PRs related to the terraform dashboard.
Co-authored-by: Peter Boström <[email protected]>
Co-authored-by: Paul Semel <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.