Releases: apollographql/router
v1.60.0-rc.1
v2.0.0-preview.5
v1.59.2
Important
This release contains important fixes for resource utilization regressions that impacted Router v1.59.0 and v1.59.1. These regressions took the form of:
- a small baseline increase in memory usage; and
- additional per-request CPU and memory usage for queries that referenced abstract types with a large number of implementations.
If you have enabled distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for cache keys. As a result, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.
🐛 Fixes
Improve performance of query hashing by using a precomputed schema hash (PR #6622)
The router now uses a simpler and faster query hashing algorithm with more predictable CPU and memory usage. This improvement is enabled by using a precomputed hash of the entire schema, rather than computing and hashing the subset of types and fields used by each query.
For more details on why these design decisions were made, see the PR description.
By @IvanGoncharov in #6622
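The idea behind the change can be illustrated with a minimal sketch (hypothetical names, not the router's actual implementation): instead of hashing the subset of types and fields each query touches, hash the entire schema once up front and then combine that fixed digest with the query text, so per-query cost depends only on the query itself.

```python
import hashlib

# Precompute the schema hash once, when the schema is loaded.
def schema_hash(schema_sdl: str) -> bytes:
    return hashlib.sha256(schema_sdl.encode()).digest()

# Per-query hashing then only touches the (small) query text, combining it
# with the fixed schema digest, so CPU and memory usage no longer depend on
# how many types and fields the query can reach in the schema.
def query_hash(precomputed_schema_hash: bytes, query: str) -> str:
    h = hashlib.sha256()
    h.update(precomputed_schema_hash)
    h.update(query.encode())
    return h.hexdigest()
```

Any schema change still invalidates all query hashes, which is why cache keys regenerate when updating across this release.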
Fix increased memory usage in `sysinfo` since Router 1.59.0 (PR #6634)
In version 1.59.0, Apollo Router started using the `sysinfo` crate to gather metrics about available CPUs and RAM. By default, that crate uses `rayon` internally to parallelize its handling of system processes. In turn, `rayon` creates a pool of long-lived threads.
In a particular benchmark on a 32-core Linux server, this caused resident memory use to increase by about 150 MB. This is likely a combination of stack space (which only gets freed when the thread terminates) and per-thread space reserved by the heap allocator to reduce cross-thread synchronization cost.
This regression is now fixed by:
- Disabling `sysinfo`'s use of `rayon`, so the thread pool is not created and system process information is gathered in a sequential loop.
- Making `sysinfo` not gather that information in the first place, since the router does not use it.
By @SimonSapin in #6634
v1.60.0-rc.0
v1.59.2-rc.0
v2.0.0-preview.4
v1.59.1
Important
This release was impacted by a resource utilization regression which was fixed in v1.59.2. See the release notes for that release for more details. As a result, we recommend using v1.59.2 rather than v1.59.1 or v1.59.0.
🐛 Fixes
Fix transmitted header value for Datadog priority sampling resolution (PR #6017)
The router now transmits correct values of `x-datadog-sampling-priority` to downstream services.
Previously, an `x-datadog-sampling-priority` of `-1` was incorrectly converted to `0` for downstream requests, and `2` was incorrectly converted to `1`. When propagating to downstream services, this resulted in values of `USER_REJECT` being incorrectly transmitted as `AUTO_REJECT`.
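The fix can be pictured with a small sketch (the priority constants are part of Datadog's trace protocol, but the function names here are invented for illustration):

```python
# Datadog sampling priorities (values defined by the Datadog trace protocol).
USER_REJECT, AUTO_REJECT, AUTO_KEEP, USER_KEEP = -1, 0, 1, 2

def propagate_priority_buggy(priority: int) -> int:
    # The old behavior effectively collapsed the priority to a 0/1 flag,
    # losing the USER_* variants (-1 became 0, 2 became 1).
    return 1 if priority > 0 else 0

def propagate_priority_fixed(priority: int) -> int:
    # The fix forwards the received priority unchanged.
    return priority
```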
Enable accurate Datadog APM metrics (PR #6017)
The router supports a new preview feature, the `preview_datadog_agent_sampling` option, to enable sending all spans to the Datadog Agent so that APM metrics and views are accurate.
Previously, the sampler option in `telemetry.exporters.tracing.common.sampler` wasn't Datadog-aware. To get accurate Datadog APM metrics, all spans must be sent to the Datadog Agent with a `psr` or `sampling.priority` attribute set appropriately to record the sampling decision.
The `preview_datadog_agent_sampling` option enables accurate Datadog APM metrics. It should be used when exporting to the Datadog Agent, whether via OTLP or the Datadog-native exporter.
telemetry:
exporters:
tracing:
common:
# Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
sampler: 0.1
# Send all spans to the Datadog agent.
preview_datadog_agent_sampling: true
Using these options can decrease your Datadog bill, because you will be sending only a percentage of spans from the Datadog Agent to Datadog.
Important
- Users must enable `preview_datadog_agent_sampling` to get accurate APM metrics. Users who have been using recent versions of the router will have to modify their configuration to retain full APM metrics.
- The router doesn't support in-agent ingestion control.
- Configuring `traces_per_second` in the Datadog Agent won't dynamically adjust the router's sampling rate to meet the target rate.
- Sending all spans to the Datadog Agent may require that you tweak the `batch_processor` settings in your exporter config. This applies to both the OTLP and Datadog-native exporters.
To learn more, read the updated Datadog tracing documentation for details on configuration options and their implications.
Fix non-parent sampling (PR #6481)
When a non-parent-based sampler is specified, the router should ignore the sampling information from upstream requests and use its own sampling rate. Previously, the following configuration did not work correctly:
telemetry:
  exporters:
    tracing:
      common:
        service_name: router
        sampler: 0.00001
        parent_based_sampler: false
With this configuration, all spans were sampled despite the very low sampling rate. This is now fixed, and the router correctly ignores any upstream sampling decision.
By @BrynCooke in #6481
v1.59.1-rc.0
v1.59.0
Important
This release was impacted by a resource utilization regression which was fixed in v1.59.2. See the release notes for that release for more details. As a result, we recommend using v1.59.2 rather than v1.59.1 or v1.59.0.
Important
If you have enabled distributed query plan caching, updates to the query planner in this release will result in query plan caches being regenerated rather than reused. On account of this, you should anticipate additional cache regeneration cost when updating to this router version while the new query plans come into service.
🚀 Features
General availability of native query planner
The router's native, Rust-based, query planner is now generally available and enabled by default.
The native query planner achieves better performance for a variety of graphs. In our tests, we observe:
- 10x median improvement in query planning time (observed via `apollo.router.query_planning.plan.duration`)
- 2.9x improvement in the router's CPU utilization
- 2.2x improvement in the router's memory usage
Note: you can expect generated plans and subgraph operations in the native query planner to have slight differences when compared to the legacy, JavaScript-based query planner. We've ascertained these differences to be semantically insignificant, based on comparing ~2.5 million known unique user operations in GraphOS, as well as comparing ~630 million operations across actual router deployments running in shadow mode over a four-month period.
The native query planner supports Federation v2 supergraphs. If you are using Federation v1 today, see our migration guide on how to update your composition build step. Subgraph changes are typically not needed.
The legacy, JavaScript-based query planner is deprecated in this release, but you can still switch back to it if you are still using a Federation v1 supergraph:
experimental_query_planner_mode: legacy
Note: The subgraph operations generated by the query planner are not guaranteed to be consistent from release to release. We strongly recommend against relying on the shape of planned subgraph operations, as new router features and optimizations will continuously affect them.
By @sachindshinde, @goto-bus-stop, @duckki, @TylerBloom, @SimonSapin, @dariuszkuc, @lrlna, @clenfest, and @o0Ignition0o.
Ability to skip persisted query list safelisting enforcement via plugin (PR #6403)
If safelisting is enabled, a `router_service` plugin can skip enforcement of the safelist (including the `require_id` check) by adding the key `apollo_persisted_queries::safelist::skip_enforcement` with value `true` to the request context.
Note: this doesn't affect the logging of unknown operations by the `persisted_queries.log_unknown` option.
In cases where an operation would have been denied but is allowed because this context key is set, the attribute `persisted_queries.safelist.enforcement_skipped` is set to `true` on the `apollo.router.operations.persisted_queries` metric.
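For example, a Rhai script can set this key. The following is an unverified sketch that assumes the Rhai `router_service` hook and context indexing behave as described in the router's Rhai documentation:

```rhai
// Sketch only: skip safelist enforcement for every request.
fn router_service(service) {
    const request_callback = Fn("process_request");
    service.map_request(request_callback);
}

fn process_request(request) {
    request.context["apollo_persisted_queries::safelist::skip_enforcement"] = true;
}
```

In practice you would likely set the key conditionally, for example only for requests carrying a trusted internal header.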
Add fleet awareness plugin (PR #6151)
A new `fleet_awareness` plugin has been added that reports telemetry to Apollo about the configuration and deployment of the router.
The reported telemetry includes CPU and memory usage, CPU frequency, and other deployment characteristics such as operating system and cloud provider. For more details, along with a full list of data captured and how to opt out, see our data privacy policy.
By @jonathanrainer, @nmoutschen, @loshz in #6151
Add fleet awareness schema metric (PR #6283)
The router now supports the `apollo.router.instance.schema` metric for its `fleet_detector` plugin. It has two attributes: `schema_hash` and `launch_id`.
By @loshz and @nmoutschen in #6283
Support client name for persisted query lists (PR #6198)
The persisted query manifest fetched from Apollo Uplink can now contain a `clientName` field in each operation. Two operations with the same `id` but different `clientName` values are considered distinct operations, and they may have distinct bodies.
The router resolves the client name by taking the first of the following that exists:
- the `apollo_persisted_queries::client_name` context key, which may be set by a `router_service` plugin
- the HTTP header named by `telemetry.apollo.client_name_header`, which defaults to `apollographql-client-name`
If a client name can be resolved for a request, the router first tries to find a persisted query with the specified ID and the resolved client name.
If there is no operation with that ID and client name, or if a client name cannot be resolved, the router tries to find a persisted query with the specified ID and no client name specified. This means that existing PQ lists that don't contain client names will continue to work.
To learn more, go to persisted queries docs.
🐛 Fixes
Fix coprocessor empty body object panic (PR #6398)
Previously, the router would panic if a coprocessor responded with an empty body object at the supergraph stage:
{
... // other fields
"body": {} // empty object
}
This has been fixed in this release.
Note: the previous issue didn't affect coprocessors that responded with well-formed responses.
By @BrynCooke in #6398
Ensure cost directives are picked up when not explicitly imported (PR #6328)
With the recent composition changes, importing `@cost` results in a supergraph schema with the cost specification import at the top. The `@cost` directive itself is not explicitly imported, as it's expected to be available as the default export from the cost link. In contrast, uses of `@listSize` translate to an explicit import in the supergraph.
Old SDL link
@link(
url: "https://specs.apollo.dev/cost/v0.1"
import: ["@cost", "@listSize"]
)
New SDL link
@link(url: "https://specs.apollo.dev/cost/v0.1", import: ["@listSize"])
Instead of using the directive names from the import list in the link, the directive names now come from `SpecDefinition::directive_name_in_schema`, which is equivalent to the change we made on the composition side.
By @tninesling in #6328
Fix query hashing algorithm (PR #6205)
The router includes a schema-aware query hashing algorithm designed to return the same hash across schema updates if the query remains unaffected. This update enhances the algorithm by addressing various corner cases to improve its reliability and consistency.
Fix typo in persisted query metric attribute (PR #6332)
The `apollo.router.operations.persisted_queries` metric reports an attribute when a persisted query was not found.
Previously, the attribute name was `persisted_quieries.not_found`, with one `i` too many. Now it's `persisted_queries.not_found`.
By @goto-bus-stop in #6332
Fix telemetry instrumentation using supergraph query selector (PR #6324)
Previously, router telemetry instrumentation that used query selectors could log errors with messages such as `this is a bug and should not happen`.
These errors have now been fixed, and configurations with query selectors, such as the following, work properly:
telemetry:
exporters:
metrics:
common:
views:
# Define a custom view because operation limits are different than the default latency-oriented view of OpenTelemetry
- name: oplimits.*
aggregation:
histogram:
buckets:
- 0
- 5
- 10
- 25
- 50
- 100
- 500
- 1000
instrumentation:
instruments:
supergraph:
oplimits.aliases:
value:
query: aliases
type: histogram
unit: number
description: "Aliases for an operation"
oplimits.depth:
value:
query: depth
type: histogram
unit: number
description: "Depth for an operation"
oplimits.height:
value:
query: height
type: histogram
unit: number
description: "Height for an operation"
oplimits.root_fields:
value:
query: root_fields
type: histogram
unit: number
description: "Root fields for an operation"
By @BNJ...
v1.59.0-rc.0