RFC: Design notes for integration syncing #1189
Comments
I'm actually not sure about the polymorphic approach outlined above. Ecto docs say that this is not ideal and instead suggest we should define relationships directly; in this case that would be:
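Something along these lines, perhaps (a sketch with hypothetical module and field names, not the original snippet):

```elixir
# Sketch only: one explicit, nullable foreign key per syncable record type,
# instead of a single polymorphic record_id/record_type pair.
defmodule CodeCorps.Event do
  use Ecto.Schema

  schema "events" do
    field :state, :string, default: "queued"

    belongs_to :task, CodeCorps.Task        # task_id FK, nullable
    belongs_to :comment, CodeCorps.Comment  # comment_id FK, nullable

    timestamps()
  end
end
```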
Along with:
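Presumably the inverse associations on each record; again a sketch under the same assumptions:

```elixir
# Sketch of the inverse side, so each record can reach its events.
defmodule CodeCorps.Task do
  use Ecto.Schema

  schema "tasks" do
    has_many :events, CodeCorps.Event
    timestamps()
  end
end
```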
We wouldn't be able to do […]. I actually kind of like the above because it's pretty self-documenting that the […].
One thing discovered here is that […]

The design as it stands makes changes relatively challenging. Perhaps an easier approach is to, for now, remove the possibility of a […]. This also comes with some design challenges, although they are much less challenging: […]
@begedin just marked as […]
I've created #1190 as a first step.
One thing I'm concerned about (unless I'm missing something) is that we seem to be making an assumption that, for example, we can safely ignore an event tied to a […]. With the example of a GitHub issue, would it not be possible for us to receive […]? I guess we can implicitly discard them due to our sync process, in which we take all the data we get in the payload and use it for updating, regardless of event type. Still, it may be possible that there exists a scenario where we get the two events above out of order, and the one we are processing does not have all the latest data.
Another concern with explicit instead of polymorphic relationships is what happens when we add other integrations. We'll have to expand our schema significantly, and I'm not sure how that performs when the expansions are additional relationships to other schemas. I do agree that this is the more correct Ecto approach, though.
Yes, I prefer the polymorphism, but José seemed to indicate that it significantly underperforms due to the lack of foreign keys, and this table is likely to have tens of millions of rows.
This would be overcome by using the webhook event as a trigger to retrieve the information from the external API for the given data. This will get more complex with events like label and assignment changes, but for the events that impact the resource directly, I think we are safe with that assumption.
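A sketch of that idea, with the payload used only as a trigger; `GitHubAPI.fetch_issue/3` and `Sync.Records.upsert_issue/1` are hypothetical helpers, not existing code:

```elixir
# Sketch: ignore the event's own snapshot of the resource and ask GitHub for
# the latest version, so out-of-order deliveries cannot apply stale data.
defmodule Sync.Refetch do
  def handle_issue_event(%{
        "installation" => %{"id" => installation_id},
        "repository" => %{"full_name" => repo},
        "issue" => %{"number" => number}
      }) do
    # Authenticate as the installation, fetch the issue, upsert our cached copy.
    with {:ok, issue} <- GitHubAPI.fetch_issue(installation_id, repo, number) do
      Sync.Records.upsert_issue(issue)
    end
  end
end
```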
We would have a sync transaction for each type of internal record we want to sync. For example: […]

Every sync transaction record, regardless of type, would have a: […]

Then each type would have type-specific fields, e.g. a […].
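A purely illustrative sketch of one per-type schema; every name below is a guess at the common and type-specific fields described above:

```elixir
# Sketch of a per-type sync transaction, e.g. for tasks. Field names are
# illustrative: common fields every type would share, plus a type-specific
# task relationship.
defmodule Sync.TaskSyncTransaction do
  use Ecto.Schema

  schema "task_sync_transactions" do
    field :state, :string, default: "queued"     # common field (assumed)
    field :external_id, :string                  # GitHub-side id (assumed)
    field :external_updated_at, :utc_datetime    # GitHub-side timestamp (assumed)

    belongs_to :task, CodeCorps.Task             # type-specific relationship

    timestamps()
  end
end
```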
If the event is due to the resource being created, there will not be a conflict. If the resource was created from our own clients, then there is no external GitHub ID yet. The same is true of events coming in from external providers, where there is no internal record yet. I'm not yet clear as to whether we should conduct any conflict checking on these event types, but my guess is no; it should likely jump straight to […]. When an event comes in from GitHub we should (using a […]) […]
We would also need, within the logic for updating the given record, to check whether the record's updated timestamp is after the transaction's timestamp. If it is, then we need to bubble the changeset validation error and mark the transaction as `:ignored`.
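A sketch of that guard as a changeset helper; `validate_not_stale/2` is hypothetical, and it assumes timestamps are stored as `:utc_datetime`:

```elixir
# Sketch of the timestamp guard described above: reject the update when our
# record was modified after the transaction's timestamp, so stale data never wins.
defmodule Sync.Guards do
  import Ecto.Changeset

  def validate_not_stale(changeset, transaction_timestamp) do
    record_updated_at = changeset.data.updated_at

    if DateTime.compare(record_updated_at, transaction_timestamp) == :gt do
      # Bubble a changeset error; the caller marks the transaction :ignored.
      add_error(changeset, :updated_at, "record is newer than the sync transaction")
    else
      changeset
    end
  end
end
```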
I'd like to figure out the naming of the record we're tracking here. What is this even a queue of? My feeling is that it's best described as a […]
I'm also starting to wonder whether there should be separate queues for the separate record types. This might actually make more sense to me. For example, we'd have a […]
Should this be a queue?
Some upsides of the approaches above that I wanted to document, in no particular order: […]
Notes from my end
I'm assuming each transaction here will always involve fetching the latest relevant data from GitHub? Seems that way, especially from this rule, but I wanted to make it clear. Also, it seems to me like we'll have to identify and store the GitHub installation ID in the sync transaction record at the time of initially creating it. We can't really be sure who to authenticate as otherwise, unless I'm missing something.
Closing in favor of #1368
Problem
Even with only the GitHub integration, we have been having difficulties with the sync up/down to the external service. `task` and `comment` records are created […]

Our basic goals: […]
Each `Event` can have a:

- `direction` - `:inbound | :outbound`
- `integration_` fields:
  - `integration_external_id` - the `id` of the integration resource from the external provider
  - `integration_updated_at` - the last updated at timestamp of the integration resource from the external provider
  - `integration_record_id` - the `id` of our cached record for the resource
  - `integration_record_type` - the `type` of our cached record for the resource, as the table name
- `record_` fields:
  - `record_id` - the `id` of the record for the resource connected to this integration
  - `record_type` - the `type` of the record for the resource connected to this integration, as the table name
- `canceled_by` - the `id` of the `Event` that canceled this one
- `duplicate_of` - the `id` of the `Event` that this is a duplicate of
- `ignored_for_id` - the `id` of the record that caused this event to be ignored
- `ignored_for_type` - the `type` of the record (table name) that caused this event to be ignored
- `state` - `:queued | :processing | :completed | :errored | :canceled | :ignored | :duplicate | :disabled`
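As a minimal Ecto sketch of the schema above (field types and defaults are assumptions; the enum-ish values are kept as plain strings for simplicity):

```elixir
# Sketch of the Event schema described above.
defmodule Sync.Event do
  use Ecto.Schema

  schema "events" do
    field :direction, :string                    # "inbound" | "outbound"
    field :state, :string, default: "queued"     # queued | processing | completed | errored
                                                 # | canceled | ignored | duplicate | disabled

    # integration_ fields: the resource as the external provider sees it
    field :integration_external_id, :string
    field :integration_updated_at, :utc_datetime
    field :integration_record_id, :id
    field :integration_record_type, :string      # table name of our cached record

    # record_ fields: our record connected to this integration
    field :record_id, :id
    field :record_type, :string                  # table name

    # bookkeeping used by the processing rules below
    field :canceled_by, :id                      # Event that canceled this one
    field :duplicate_of, :id                     # Event this one duplicates
    field :ignored_for_id, :id
    field :ignored_for_type, :string

    timestamps()
  end
end
```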
We may want our own writes to our own records, even without integrations, to also go through this process. Not sure.
When an event comes in we should:

- look for other events with the same `integration_external_id` where:
  - the `integration_updated_at` is after our event's last updated timestamp (limit 1): mark ours `:ignored` and stop processing, setting `ignored_for_id` to the `id` of the event in the `limit 1` query and `ignored_for_type` to this event table's name
  - the `integration_updated_at` timestamp for the relevant `record_` is equal to our event's last updated timestamp (limit 1): mark ours `:duplicate` and stop processing, setting `duplicate_of` to the `id` of the event in the `limit 1` query
  - the `modified_at` timestamp for the relevant `record_` is after our event's last updated timestamp: mark ours `:ignored` and stop processing, setting `ignored_for_id` to the `record_id` and `ignored_for_type` to the `record_type`
- look for other events with the same `integration_external_id` where the `integration_updated_at` is before our event's last updated timestamp: mark each `:canceled`, setting `canceled_by` to the `id` of this event
- check whether there is a `:queued` event or `:processing` event for the `integration_external_id`:
  - if there is, leave this event `:queued`
  - otherwise, mark it `:processing` and create or update the relevant record matching `record_id` and `record_type` through the relationship on the record for `integration_record_id` and `integration_record_type`
- once `:completed`, kick off a process to look for the next `:queued` item where the `integration_updated_at` timestamp is the oldest

We would also need, within the logic for updating the given record, to check whether the record's updated timestamp is after the event's timestamp. If it is, then we need to bubble the changeset validation error and mark the event `:ignored` as above.
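Pulling the rules above together, a rough sketch of the intake decision against the schema sketched earlier. All module, function, and field names are hypothetical, the queries are simplified, and the record-level `modified_at` check and the `:completed` follow-up are omitted for brevity:

```elixir
# Sketch of the intake rules above; Repo is assumed to be an Ecto repo.
defmodule Sync.Intake do
  import Ecto.Query
  alias Sync.{Event, Repo}

  def process(%Event{} = event) do
    cond do
      # Rule 1: a later event for the same external resource already exists.
      newer = sibling(event, :gt) ->
        finish(event, "ignored", ignored_for_id: newer.id, ignored_for_type: "events")

      # Rule 2: a sibling with the exact same external timestamp is a duplicate.
      duplicate = sibling(event, :eq) ->
        finish(event, "duplicate", duplicate_of: duplicate.id)

      true ->
        # Rule 3: older pending events for this resource are superseded.
        cancel_older(event)
        maybe_process(event)
    end
  end

  # limit-1 lookup of a sibling event whose integration_updated_at compares
  # :gt / :eq to ours.
  defp sibling(event, comparison) do
    Event
    |> where(integration_external_id: ^event.integration_external_id)
    |> compare_updated_at(comparison, event.integration_updated_at)
    |> limit(1)
    |> Repo.one()
  end

  defp compare_updated_at(query, :gt, ts), do: where(query, [e], e.integration_updated_at > ^ts)
  defp compare_updated_at(query, :eq, ts), do: where(query, [e], e.integration_updated_at == ^ts)

  defp cancel_older(event) do
    Event
    |> where(integration_external_id: ^event.integration_external_id)
    |> where([e], e.integration_updated_at < ^event.integration_updated_at)
    |> Repo.update_all(set: [state: "canceled", canceled_by: event.id])
  end

  # Only one event per external resource may run at a time; otherwise stay queued.
  defp maybe_process(event) do
    busy? =
      Event
      |> where(integration_external_id: ^event.integration_external_id)
      |> where([e], e.state in ["queued", "processing"] and e.id != ^event.id)
      |> Repo.exists?()

    if busy?, do: {:ok, event}, else: finish(event, "processing")
  end

  defp finish(event, state, extra \\ []) do
    event
    |> Ecto.Changeset.change([state: state] ++ extra)
    |> Repo.update()
  end
end
```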