Sliding Sync: Pre-populate room data for quick filtering/sorting #17512
Merged: erikjohnston merged 147 commits into `develop` from `madlittlemods/sliding-sync-pre-populate-room-meta-data` on Aug 29, 2024
MadLittleMods commented Aug 8, 2024
synapse/storage/schema/main/delta/87/01_sliding_sync_memberships.sql
…he joined rooms table
This way if the row exists, we can rely on the information in it. And only use a fallback for rows that don't exist.
…oom-meta-data Conflicts: synapse/handlers/sliding_sync/__init__.py
erikjohnston approved these changes Aug 29, 2024
Woo 🎉 🚀 🎉 🥳 🎈
erikjohnston deleted the `madlittlemods/sliding-sync-pre-populate-room-meta-data` branch August 29, 2024 15:09
@erikjohnston Super grateful that you were able to try this PR out multiple times on your own server to find all of the weird old corners in the data that we still need to deal with today 🙇 Thank you @erikjohnston and @reivilibre for all of the discussion so we can ensure that the data is reliable to use before and after the gradual migration is fully complete!
erikjohnston added a commit that referenced this pull request Aug 29, 2024
This was referenced Aug 29, 2024
erikjohnston pushed a commit that referenced this pull request Aug 30, 2024
… sync tables (#17635)

Fix outlier re-persisting causing problems with sliding sync tables

Follow-up to #17512

When running on `matrix.org`, we discovered that a remote invite is first persisted as an `outlier` and then re-persisted, at which point it is de-outliered. The first time, the `outlier` is persisted with one `stream_ordering`. When it is persisted again and de-outliered, it is assigned a different `stream_ordering` that won't end up being used. We call `_calculate_sliding_sync_table_changes()` before `_update_outliers_txn()`, which fixes this discrepancy by always using the `stream_ordering` from the first time the event was persisted. As a result, we're working with an unreliable `stream_ordering` value that may be unused and never make it into the `events` table.
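The ordering problem described in this commit message can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Synapse's code. `PersistedEvent`, `resolve_stream_ordering()` and `calculate_sliding_sync_rows()` are hypothetical stand-ins. `first_persisted_orderings` is assumed to map an event ID to the `stream_ordering` it received when it was first persisted as an outlier. The point is that orderings are resolved before any sliding sync row is built.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PersistedEvent:
    """Hypothetical minimal event record, for illustration only."""

    event_id: str
    room_id: str
    # The stream_ordering allocated for *this* persist attempt.
    stream_ordering: int


def resolve_stream_ordering(
    event: PersistedEvent, first_persisted_orderings: Dict[str, int]
) -> int:
    """Return the stream_ordering that actually ends up in the events table.

    If the event was previously persisted as an outlier, the ordering from that
    first persist wins and the freshly allocated one is never used.
    """
    return first_persisted_orderings.get(event.event_id, event.stream_ordering)


def calculate_sliding_sync_rows(
    events: List[PersistedEvent], first_persisted_orderings: Dict[str, int]
) -> Dict[str, int]:
    """Map room_id -> latest event_stream_ordering for the sliding sync tables.

    Orderings are resolved *before* rows are built, so a row never records a
    stream_ordering that will not exist in the events table.
    """
    rows: Dict[str, int] = {}
    for event in events:
        ordering = resolve_stream_ordering(event, first_persisted_orderings)
        rows[event.room_id] = max(rows.get(event.room_id, 0), ordering)
    return rows


# A remote invite first persisted as an outlier with stream_ordering 10, then
# re-persisted (de-outliered) and allocated 25, which will go unused.
invite = PersistedEvent("$invite", "!room:example.org", 25)
print(calculate_sliding_sync_rows([invite], {"$invite": 10}))  # {'!room:example.org': 10}
```

Reversing the order of the two steps (building rows from `event.stream_ordering` directly) would record `25`, which is the bug the commit describes.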
erikjohnston added a commit that referenced this pull request Sep 1, 2024
This was referenced Sep 1, 2024
erikjohnston added a commit that referenced this pull request Sep 1, 2024
Based on #17629

Utilizing the new sliding sync tables added in #17512 for fast acquisition of rooms for the user and filtering/sorting.

---------

Co-authored-by: Eric Eastwood
This was referenced Sep 3, 2024
erikjohnston added a commit that referenced this pull request Sep 5, 2024
erikjohnston added a commit that referenced this pull request Sep 10, 2024
Pre-populate room data for quick filtering/sorting in the Sliding Sync API
Spawning from #17450 (comment)
This PR is acting as the Synapse version `N+1` step in the gradual migration being tracked by #17623.

Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing meta data for rooms that the local server is still participating in. The info here can be shared across all local users whose membership is `Membership.JOIN`. Keyed on `(room_id)` and updated when the relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of room meta data at the time of the local user's membership. Keyed on `(room_id, user_id)` and only updated when a user's membership in a room changes.

Also adds background updates to populate these tables with all of the existing data.
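Here is a rough sketch of the two tables' keying and update rules, using Python's built-in `sqlite3` module (the upsert syntax needs SQLite 3.24 or later). Only the table names and key columns come from the description above. The `room_name` and `membership` columns and the helpers `on_room_event()` and `on_membership_change()` are illustrative assumptions, not the actual schema delta.

```python
import sqlite3

# Illustrative schema: only the table names and primary keys follow the PR text.
SCHEMA = """
CREATE TABLE sliding_sync_joined_rooms (
    room_id TEXT NOT NULL PRIMARY KEY,          -- keyed on (room_id)
    event_stream_ordering INTEGER NOT NULL,
    room_name TEXT
);
CREATE TABLE sliding_sync_membership_snapshots (
    room_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    membership TEXT NOT NULL,
    event_stream_ordering INTEGER NOT NULL,
    room_name TEXT,                             -- room meta data at membership time
    PRIMARY KEY (room_id, user_id)              -- keyed on (room_id, user_id)
);
"""


def on_room_event(conn, room_id, stream_ordering, room_name):
    """New event or current-state change: refresh the shared joined-room row."""
    conn.execute(
        """
        INSERT INTO sliding_sync_joined_rooms (room_id, event_stream_ordering, room_name)
        VALUES (?, ?, ?)
        ON CONFLICT (room_id) DO UPDATE SET
            event_stream_ordering = excluded.event_stream_ordering,
            room_name = excluded.room_name
        """,
        (room_id, stream_ordering, room_name),
    )


def on_membership_change(conn, room_id, user_id, membership, stream_ordering):
    """Membership change: snapshot the room meta data as it is right now."""
    row = conn.execute(
        "SELECT room_name FROM sliding_sync_joined_rooms WHERE room_id = ?",
        (room_id,),
    ).fetchone()
    conn.execute(
        """
        INSERT INTO sliding_sync_membership_snapshots
            (room_id, user_id, membership, event_stream_ordering, room_name)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (room_id, user_id) DO UPDATE SET
            membership = excluded.membership,
            event_stream_ordering = excluded.event_stream_ordering,
            room_name = excluded.room_name
        """,
        (room_id, user_id, membership, stream_ordering, row[0] if row else None),
    )
```

The snapshot is only written on membership changes. A later rename therefore updates `sliding_sync_joined_rooms` but leaves the user's snapshot with the name the room had when they joined.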
We want a guarantee that if a row exists in the sliding sync tables, we can rely on it (the data is accurate). If a row doesn't exist, we use a fallback to get the same info until the background updates fill in the rows or a new event comes in and triggers the row to be fully inserted. This means we need a couple of extra things in place until we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the `N+2` part of the gradual migration. For context on why we can't rely on the tables without these things, see [1].

- … `stream_ordering` of the `sliding_sync_joined_rooms` table (compare to the max-`stream_ordering` of the `events` table). For `sliding_sync_membership_snapshots`, we can compare to the max-`stream_ordering` of `local_current_membership`.
- … `sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables, since any new events sent in rooms would also have needed to be written to the sliding sync tables. For example, a new event needs to bump `event_stream_ordering` in the `sliding_sync_joined_rooms` table, and a change to some state in the room (like the room name) needs to be written there too. Another example is someone's membership changing in a room, which affects `sliding_sync_membership_snapshots`.
- … `events`/`local_current_membership`). The rooms that need recalculating are added to the `sliding_sync_joined_rooms_to_recalculate` table.

All of this extra functionality can be removed once `SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync tables, so people can no longer downgrade (the `N+2` part of the gradual migration).

[1]
For `sliding_sync_joined_rooms`, since we partially insert rows as state comes in, we can't rely on the existence of the row for a given `room_id`. We can't even rely on whether the background update has finished. There could still be partial rows from when someone reverted their Synapse version after the background update finished, had some state changes (or new rooms), then upgraded again, and more state changes happened, leaving a partial row.

For `sliding_sync_membership_snapshots`, we insert items as a whole except for the `forgotten` column. So we can rely on rows existing and just need to always use a fallback for the `forgotten` data. We can't use the `forgotten` column in the table for the same reasons described above for `sliding_sync_joined_rooms`: we could have an out-of-date membership from when someone reverted their Synapse version.

Discussed in an internal meeting
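The read rule from footnote [1] can be sketched as follows, again with Python's `sqlite3`. The `room_name` and `forgotten` columns, the simplified `room_memberships` table, and both helper functions are assumptions for illustration. Whether the row exists is what decides trust. `forgotten` is always read from the fallback source, never from the snapshot table.

```python
import sqlite3
from typing import Callable, Optional

DEMO_SCHEMA = """
CREATE TABLE sliding_sync_joined_rooms (room_id TEXT PRIMARY KEY, room_name TEXT);
-- Simplified stand-in for the source-of-truth membership table.
CREATE TABLE room_memberships (room_id TEXT, user_id TEXT, forgotten INTEGER NOT NULL);
"""


def get_room_name(
    conn: sqlite3.Connection,
    room_id: str,
    compute_from_current_state: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Prefer the pre-populated row; fall back to the slow path if it is missing."""
    row = conn.execute(
        "SELECT room_name FROM sliding_sync_joined_rooms WHERE room_id = ?",
        (room_id,),
    ).fetchone()
    if row is not None:
        # The row was fully inserted, so it is trusted: a NULL name here means
        # "the room has no name", not "not populated yet".
        return row[0]
    # No row: the background update (or a new event) has not filled it in yet.
    return compute_from_current_state(room_id)


def is_forgotten(conn: sqlite3.Connection, room_id: str, user_id: str) -> bool:
    """Never trust a `forgotten` column in the snapshot table; read the source."""
    row = conn.execute(
        "SELECT forgotten FROM room_memberships WHERE room_id = ? AND user_id = ?",
        (room_id, user_id),
    ).fetchone()
    return bool(row[0]) if row is not None else False
```

Keying trust on row existence, rather than on a non-NULL column, is what lets a legitimately unnamed room skip the fallback.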
TODO

- `stream_ordering`/`bump_stamp`
- `sender` so we can filter `LEAVE` memberships and distinguish them from kicks.
- `tombstone` state to help address Sliding Sync: Add `is_tombstoned: false` filter in the defaults #17540
- `tombstone_successor_room_id`
- `forgotten` status to avoid an extra lookup/table-join on `room_memberships`
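To show how the TODO columns could enable quick filtering/sorting, here is one hypothetical room-list query over the two tables, sketched with `sqlite3`. The column names (`bump_stamp`, `sender`, `tombstone_successor_room_id`, `forgotten`) follow the TODO items. The exact schema, the `list_rooms()` helper, and the filter rules are illustrative assumptions rather than the Sliding Sync defaults. The rules are: hide rooms the user left themselves but keep kicks, hide tombstoned and forgotten rooms, and sort by newest activity first.

```python
import sqlite3
from typing import List

DEMO_SCHEMA = """
CREATE TABLE sliding_sync_joined_rooms (
    room_id TEXT PRIMARY KEY,
    bump_stamp INTEGER,
    tombstone_successor_room_id TEXT
);
CREATE TABLE sliding_sync_membership_snapshots (
    room_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    sender TEXT NOT NULL,
    membership TEXT NOT NULL,
    event_stream_ordering INTEGER NOT NULL,
    tombstone_successor_room_id TEXT,
    forgotten INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (room_id, user_id)
);
"""

ROOM_LIST_SQL = """
SELECT m.room_id
FROM sliding_sync_membership_snapshots AS m
LEFT JOIN sliding_sync_joined_rooms AS j ON j.room_id = m.room_id
WHERE m.user_id = ?
  AND m.forgotten = 0
  -- Drop rooms the user left themselves; keep kicks (sender is someone else).
  AND NOT (m.membership = 'leave' AND m.sender = m.user_id)
  -- Joined rooms use the live shared row; other memberships use the snapshot.
  AND (CASE WHEN m.membership = 'join' THEN j.tombstone_successor_room_id
            ELSE m.tombstone_successor_room_id END) IS NULL
ORDER BY (CASE WHEN m.membership = 'join' THEN j.bump_stamp
               ELSE m.event_stream_ordering END) DESC
"""


def list_rooms(conn: sqlite3.Connection, user_id: str) -> List[str]:
    """Return the user's visible rooms, most recently active first."""
    return [room_id for (room_id,) in conn.execute(ROOM_LIST_SQL, (user_id,))]
```

Storing `sender` and `forgotten` on the snapshot row is what keeps this a single join, with no extra lookup on `room_memberships`.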
- `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add new tables and a background update to backfill all rows. Since this is a new table, we don't have to add any `NOT VALID` constraints and validate them when the background update completes. Read from new tables with a fallback in cases where the rows aren't filled in yet.
- `N+2`: Bump `SCHEMA_VERSION` to `88` and bump `SCHEMA_COMPAT_VERSION` to `87` because we don't want people to downgrade and miss writes while they are on an older version. Add a foreground update to finish off the backfill so we can read from new tables without the fallback. Application code can now rely on the new tables being populated.

Dev notes
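The `stream_ordering` comparison that guards `N+1` until the `N+2` compat bump might look roughly like this, sketched with `sqlite3`. The table and column names come from the PR description. The tiny demo schema and the `queue_stale_joined_rooms()` helper are assumptions made for illustration. So is the choice to queue only rooms with events newer than the table's maximum. The analogous check of `sliding_sync_membership_snapshots` against `local_current_membership` is not shown.

```python
import sqlite3
from typing import List

DEMO_SCHEMA = """
CREATE TABLE events (event_id TEXT PRIMARY KEY, room_id TEXT NOT NULL, stream_ordering INTEGER NOT NULL);
CREATE TABLE sliding_sync_joined_rooms (room_id TEXT PRIMARY KEY, event_stream_ordering INTEGER NOT NULL);
CREATE TABLE sliding_sync_joined_rooms_to_recalculate (room_id TEXT PRIMARY KEY);
"""


def queue_stale_joined_rooms(conn: sqlite3.Connection) -> List[str]:
    """Queue rooms whose events were written without updating the sliding sync table.

    A Synapse version that predates the table (i.e. after a downgrade) keeps
    appending to `events` but never bumps `sliding_sync_joined_rooms`, so any
    event newer than the table's max stream_ordering marks its room as possibly stale.
    """
    (max_table_ordering,) = conn.execute(
        "SELECT COALESCE(MAX(event_stream_ordering), 0) FROM sliding_sync_joined_rooms"
    ).fetchone()
    (max_events_ordering,) = conn.execute(
        "SELECT COALESCE(MAX(stream_ordering), 0) FROM events"
    ).fetchone()
    if max_events_ordering > max_table_ordering:
        # Over-queueing only costs an extra recalculation; missing a stale room
        # would let inaccurate rows be trusted.
        conn.execute(
            """
            INSERT OR IGNORE INTO sliding_sync_joined_rooms_to_recalculate (room_id)
            SELECT DISTINCT room_id FROM events WHERE stream_ordering > ?
            """,
            (max_table_ordering,),
        )
    return [
        room_id
        for (room_id,) in conn.execute(
            "SELECT room_id FROM sliding_sync_joined_rooms_to_recalculate ORDER BY room_id"
        )
    ]
```

Once `SCHEMA_COMPAT_VERSION` prevents downgrades, the tables can no longer fall behind `events` this way, so this check becomes unnecessary.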
Reference:

- `N + 3`: Read from column `full_user_id` rather than `user_id` of tables `profiles` and `user_filters` matrix-org/synapse#15649 (comment)
- `rooms.creator` field that needed a background update to backfill data, Populate `rooms.creator` field for easy lookup matrix-org/synapse#10697
- `rooms.room_version` that needed a background update to backfill data, Add `rooms.room_version` column matrix-org/synapse#6729
- `room_stats_state.room_type` that needed a background update to backfill data, Implement MSC3827: Filtering of `/publicRooms` by room type matrix-org/synapse#13031
- `insertion_events`, `insertion_event_edges`, `insertion_event_extremities`, `batch_events`
- `current_state_events` updated in `synapse/storage/databases/main/events.py`
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`), #17512 (comment)

SQL queries:
Both of these are equivalent and work in SQLite and Postgres.

Option 1:

Option 2:

If we don't need the `membership` condition, we could use:

Pull Request Checklist