Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drupal packages always appearing in changes.json API #1490

Closed
drumm opened this issue Nov 19, 2024 · 14 comments
Closed

Drupal packages always appearing in changes.json API #1490

drumm opened this issue Nov 19, 2024 · 14 comments
Labels

Comments

@drumm
Copy link

drumm commented Nov 19, 2024

We are currently polling https://packagist.org/metadata/changes.json for changes to packages we are mirroring. The code for this is at https://gitlab.com/drupal-infrastructure/package-signing/packagist-signed/-/blob/main/bin/update-needed.php?ref_type=heads#L47-50

We are making requests approximately every 20 minutes like

https://packagist.org/metadata/changes.json?since=17320522510000
https://packagist.org/metadata/changes.json?since=17320540870000
https://packagist.org/metadata/changes.json?since=17320551340000
https://packagist.org/metadata/changes.json?since=17320569580000

Every time the response has updates for drupal/drupal drupal/core drupal/core-recommended

https://github.com/drupal/core-recommended for example is not updated that frequently. The last 2 webhooks sent from GitHub over 24 hours ago
image

I can’t think of anything else on our end that could be triggering these 3 packages to update so frequently. Updating our mirror every 20 minutes of these 3 packages is taking a decent amount of resources, that I’d like to reduce.

Is there anything we can do better for this situation, or is something about these 3 packages somehow looping eternally in Packagist.org internals?

@Seldaek
Copy link
Member

Seldaek commented Nov 28, 2024

Are you sure it updates every time? Here are the last 20 update jobs for drupal/core for example.

+---------------------+---------------------+-----------+-----------------+
| createdAt           | completedAt         | status    | type            |
+---------------------+---------------------+-----------+-----------------+
| 2024-11-28 00:01:34 | 2024-11-28 00:03:12 | completed | package:updates |
| 2024-11-28 00:01:33 | 2024-11-28 00:01:36 | completed | package:updates |
| 2024-11-27 23:54:33 | 2024-11-27 23:55:37 | completed | package:updates |
| 2024-11-27 23:54:32 | 2024-11-27 23:54:35 | completed | package:updates |
| 2024-11-27 23:37:22 | 2024-11-27 23:38:00 | completed | package:updates |
| 2024-11-27 23:37:21 | 2024-11-27 23:37:24 | completed | package:updates |
| 2024-11-27 23:29:47 | 2024-11-27 23:30:10 | completed | package:updates |
| 2024-11-27 23:29:46 | 2024-11-27 23:29:49 | completed | package:updates |
| 2024-11-27 23:22:22 | 2024-11-27 23:22:55 | completed | package:updates |
| 2024-11-27 23:22:20 | 2024-11-27 23:22:23 | completed | package:updates |
| 2024-11-27 23:15:04 | 2024-11-27 23:15:07 | completed | package:updates |
| 2024-11-27 23:15:04 | 2024-11-27 23:17:51 | completed | package:updates |
| 2024-11-27 22:54:20 | 2024-11-27 22:54:22 | completed | package:updates |
| 2024-11-27 22:54:16 | 2024-11-27 22:54:19 | completed | package:updates |
| 2024-11-27 22:47:24 | 2024-11-27 22:47:51 | completed | package:updates |
| 2024-11-27 22:47:22 | 2024-11-27 22:47:26 | completed | package:updates |
| 2024-11-27 18:29:58 | 2024-11-27 18:32:23 | completed | package:updates |
| 2024-11-27 18:29:56 | 2024-11-27 18:29:59 | completed | package:updates |
| 2024-11-27 18:22:51 | 2024-11-27 18:22:53 | completed | package:updates |
| 2024-11-27 18:22:51 | 2024-11-27 18:22:59 | completed | package:updates |
+---------------------+---------------------+-----------+-----------------+

It is updating quite frequently yes.. not quite every 20min but yes, often.

@Seldaek
Copy link
Member

Seldaek commented Nov 28, 2024

Now why is it updating? I'm not sure, in theory it should just be github pushes, or one of the maintainers using the API to manually trigger updates.

@drumm
Copy link
Author

drumm commented Nov 30, 2024

This looks like it lines up with GitHub push webhooks starting at 2024-11-27 22:47:22 in your logs. With a 5 hour offset, I am in US eastern time.

image

This would have been a committer reviewing a batch of merges and pushing that frequently. There is nothing for 2024-11-28 in my screenshot. Do you have a similar 29 hour gap?

@Seldaek
Copy link
Member

Seldaek commented Dec 1, 2024

Yeah I think so:

| 2024-11-29 05:06:12 | 2024-11-29 05:06:14 | completed | package:updates |
| 2024-11-29 05:06:04 | 2024-11-29 05:06:07 | completed | package:updates |
                 [ here is a 29h gap ]
| 2024-11-28 00:01:34 | 2024-11-28 00:03:12 | completed | package:updates | <= not sure why a second one here
| 2024-11-28 00:01:33 | 2024-11-28 00:01:36 | completed | package:updates | <= matches your 27th 19:01:33 (+5..)
| 2024-11-27 23:54:33 | 2024-11-27 23:55:37 | completed | package:updates | 
| 2024-11-27 23:54:32 | 2024-11-27 23:54:35 | completed | package:updates |
| 2024-11-27 23:37:22 | 2024-11-27 23:38:00 | completed | package:updates |
| 2024-11-27 23:37:21 | 2024-11-27 23:37:24 | completed | package:updates |
| 2024-11-27 23:29:47 | 2024-11-27 23:30:10 | completed | package:updates |
| 2024-11-27 23:29:46 | 2024-11-27 23:29:49 | completed | package:updates |
| 2024-11-27 23:22:22 | 2024-11-27 23:22:55 | completed | package:updates |
| 2024-11-27 23:22:20 | 2024-11-27 23:22:23 | completed | package:updates |
| 2024-11-27 23:15:04 | 2024-11-27 23:17:51 | completed | package:updates |
| 2024-11-27 23:15:04 | 2024-11-27 23:15:07 | completed | package:updates |
| 2024-11-27 22:54:20 | 2024-11-27 22:54:22 | completed | package:updates |
| 2024-11-27 22:54:16 | 2024-11-27 22:54:19 | completed | package:updates |

@drumm
Copy link
Author

drumm commented Dec 2, 2024

That's looking good and consistent.

Throughout that time period, our script checking https://packagist.org/metadata/changes.json was still seeing drupal/core being updated at least once every 10 minutes, it is on an every-10-minute cron.

For example, as I'm writing this, I requested https://packagist.org/metadata/changes.json?since=17331548900000 and got a response containing this within a couple minutes.

{"type":"update","package":"drupal\/drupal","time":1733155265},{"type":"update","package":"drupal\/drupal~dev","time":1733155265},{"type":"update","package":"drupal\/core","time":1733155265},{"type":"update","package":"drupal\/core~dev","time":1733155265},{"type":"update","package":"drupal\/core-recommended","time":1733155265},{"type":"update","package":"drupal\/core-recommended~dev","time":1733155265}

1733155265 is Dec 2 11:01:05 EST 2024 and the webhook deliveries for drupal/core are:

  • 2024-12-02 10:42:08
  • 2024-12-02 11:11:53

Right now, a committer is committing at a good pace, but this still doesn't line up as well as the above timestamps.

Also interesting, the time is the exact same for all three packages. Our webhook deliveries for drupal/core-recommended are skewed by a few seconds:

  • 2024-12-02 10:41:49
  • 2024-12-02 11:11:04

All three of these packages are derived from the same repository with subtree splits. Our splitting script doesn't coordinate pushes at the exact same time.

So it seems like something in Packagist.org is updating all three at the same timestamp for the changes.json API, unrelated to webhook pushes.

@Seldaek
Copy link
Member

Seldaek commented Dec 10, 2024

OK so I added some tracking of what triggered an update job, and this reveals something interesting maybe? :)

+----------------------------+---------------------+---------------------+-----------+-----------------+-----------------------------------------------------------+
| name                       | createdAt           | completedAt         | status    | type            | j.payload->>'$.source'                                    |
+----------------------------+---------------------+---------------------+-----------+-----------------+-----------------------------------------------------------+
| drupal/cms                 | 2024-12-10 15:18:49 | 2024-12-10 15:18:51 | completed | package:updates | manual_hook (Drupal-Infrastructure @ /api/update-package) |
| drupal/drupal              | 2024-12-10 15:17:29 | 2024-12-10 15:17:33 | completed | package:updates | github_official_hook                                      |
| drupal/legacy-project      | 2024-12-10 14:58:26 | 2024-12-10 15:01:18 | completed | package:updates | manual_hook (Drupal-Infrastructure @ /api/update-package) |
| drupal/legacy-project      | 2024-12-10 14:58:25 | 2024-12-10 14:58:28 | completed | package:updates | github_official_hook                                      |
| drupal/recommended-project | 2024-12-10 14:58:23 | 2024-12-10 15:01:18 | completed | package:updates | manual_hook (Drupal-Infrastructure @ /api/update-package) |
| drupal/recommended-project | 2024-12-10 14:58:23 | 2024-12-10 14:58:25 | completed | package:updates | github_official_hook                                      |
| drupal/core                | 2024-12-10 14:58:20 | 2024-12-10 15:01:18 | completed | package:updates | manual_hook (Drupal-Infrastructure @ /api/update-package) |
| drupal/core                | 2024-12-10 14:58:19 | 2024-12-10 14:58:22 | completed | package:updates | github_official_hook                                      |

Seems like we get pushes from our packagist-configured github hook and then just a bit after (or in one instance a minute later) another one from the Drupal-Infrastructure user using API.

The second update gets queued tho and retried after a bit I think which leads to the completedAt being usually a few minutes behind the first update.

@Seldaek
Copy link
Member

Seldaek commented Dec 10, 2024

Also note that it does not seem to be doing it for drupal/drupal (no manual hook) or drupal/cms (that one has no official hook firing but only the API one 🤷🏻‍♂️ )

@drumm
Copy link
Author

drumm commented Dec 11, 2024

Thanks for the additional tracking!

The doubled up ones must be https://bitbucket.org/drupalorg-infrastructure/subtree-splitter/src/ee8b917a1806f2ff682116e279e6f5781089f3f0/src/Command/PushCommand.php#lines-105. There's some logic to untangle, and I won't make any deployments for a few days while I'm traveling. Something along the lines of:

  • Leave the package creation logic, for when Drupal Core adds a new component
  • I believe this is shared code for core subtree splits and other uses. The core subtree splits with a GitHub repository can skip the update event, since GitHub webhooks took care of this
  • Or we could deconfigure GitHub webhooks and just rely on us sending the API requests. That would isolate ourselves from a GitHub webhook outage or delay, and avoid thinking about this code; however, making these packages as normal as possible is good and predictable

To explain the ones behaving correctly already:

  • drupal/drupal is a basic mirror without anything special, so the webhook works normally
  • drupal/cms does not have a GitHub repository, its only on our self-hosted GitLab, so we use the API. I believe the code doing this is the shared code for other uses above

So the hope is removing the double updates normalizes all of the behavior for the 3 packages in https://packagist.org/metadata/changes.json

@Seldaek
Copy link
Member

Seldaek commented Dec 11, 2024

Yeah I agree, hopefully that'll solve it. If it's not that then I currently have no clue what else could be the cause.

@drumm
Copy link
Author

drumm commented Jan 2, 2025

I deployed my first approach on December 23, https://bitbucket.org/drupalorg-infrastructure/subtree-splitter/commits/85ded723114a42c0b87ea45e44bc4ad63cf4c7f0

This has improved for us, except it’s not correlated to anything I've spotted so far. drupal/core-recommended settled down to expected behavior on November 22, reverted back to as this issue describes December 5, and got better on December 13.

More weirdly, our staging instance is currently only looping on drupal/drupal, and production is looping only on drupal/core. So somehow the two instances are interpreting responses from https://packagist.org/metadata/changes.json differently. Both instances are in the same AWS availability zone, so I expect they’re hitting the same caches/etc on Packagist.org. This seems much more like a client-side issue with this behavior. I'll have to add more logging to see what might be going on.

@drumm
Copy link
Author

drumm commented Jan 2, 2025

Actually, I'm seeing the same behavior in https://packagist.org/metadata/changes.json. Using a recent timestamp, 5 minutes ago, the same three are updated - drupal/drupal, drupal/core, drupal/core-recommended. What I’m seeing is the process that updates the package list to match is missing packages for some reason, so it isn’t looking for the right packages.

drupal/core-recommended has't actually had a push since drupal/core-recommended@2a47cc9 two weeks ago. So should not have had any recent webhook deliveries, or other API calls to initiate an update on Packagist.org. And the last update time on https://packagist.org/packages/drupal/core-recommended matches that.

Any chance something like the number of downloads being updated is bumping the last update time used by https://packagist.org/metadata/changes.json ?

@Seldaek
Copy link
Member

Seldaek commented Jan 8, 2025

The only way I would see that a file gets added to the metadata/changes.json is if its metadata changes, or you trigger an update via the PUT /packages/{name} endpoint passing ?manualUpdate=1 (which would set $forceDump = true)

if (
!$forceDump
&& file_exists($path)
&& file_get_contents($path) === $contents
// files dumped before then are susceptible to be out of sync, so force them all to be dumped once more at least
&& filemtime($path) >= 1606210609
) {
return;
}

@Seldaek
Copy link
Member

Seldaek commented Jan 8, 2025

In the metadata files, there are no number of downloads.. the only change would be from a new commit to a branch, new tag/release, or a new security advisory being published.

I guess we'd need to save some metadata file e.g. https://packagist.org/p2/drupal/core-recommended.json as a pretty-printed json file, then if it changes diff it and figure out what changed if anything.

@drumm
Copy link
Author

drumm commented Jan 16, 2025

I've been making incremental changes over the past few weeks on our end, https://gitlab.com/drupal-infrastructure/package-signing/packagist-signed/-/commits/main?ref_type=heads. I don't think I made any substantive changes to the main cron scanning for changes in https://packagist.org/metadata/changes.json, the were certainly times when I was causing issues making it do a full sync without checking Packagist.org's API. But maybe I did bump something to get it into line. Also there's a possibility our double-requests to update the packages in Packagist.org were still getting through internal queues in Packagist.org.

@drumm drumm closed this as completed Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants