Do we need to support get_all with a wildcard parent_id? #1616
My position on stuff like this has always been: tests are specifications. If tests don't fail you can remove whatever you want :) |
It turns out there actually are two tests that cover |
Are you sure it's not a feature that we added to support the quota features? |
Can you give us pointers to the tests? Which PR / blame ? Any mention in CHANGELOG? Are those tests present in the original cliquet repo? |
I found its usage by searching "pattern matching" :) kinto/kinto/plugins/accounts/views.py Lines 82 to 83 in 5904085
I remember now that we suffered when implementing the accounts parent id stuff. Each account entry is isolated using There might be other ways to implement this, and I don't think the «account admin» feature is important enough to justify having lower performance on |
I also found why I implemented it in the first place: https://github.com/Kinto/kinto-webpush/pull/4/files#diff-737c25e4e87da6befa4a70fef60cc91aR38 |
Check this out. I messed with my (Note: In this database every single row belongs to the same First I run the original query from
WITH collection_filtered AS (
SELECT id, last_modified, data, deleted
FROM records
WHERE parent_id LIKE '/buckets/build-hub/collections/%'
AND collection_id = 'record'
AND NOT deleted
),
total_filtered AS (
SELECT COUNT(id) AS count_total
FROM collection_filtered
WHERE NOT deleted
),
paginated_records AS (
SELECT DISTINCT id
FROM collection_filtered
)
SELECT count_total,
a.id, as_epoch(a.last_modified) AS last_modified,
MD5(jsonb_pretty(a.data)) AS data
FROM paginated_records AS p JOIN collection_filtered AS a ON (a.id = p.id),
total_filtered
ORDER BY last_modified DESC
LIMIT 10;
and I get:
Note the two identical rows with ID
Next, change that SQL to NOT do a
WITH collection_filtered AS (
SELECT id, last_modified, data, deleted
FROM records
WHERE parent_id LIKE '/buckets/build-hub/collections/%'
AND collection_id = 'record'
AND NOT deleted
),
total_filtered AS (
SELECT COUNT(id) AS count_total
FROM collection_filtered
WHERE NOT deleted
),
paginated_records AS (
SELECT id
FROM collection_filtered
)
SELECT count_total,
a.id, as_epoch(a.last_modified) AS last_modified,
MD5(jsonb_pretty(a.data)) AS data
FROM paginated_records AS p JOIN collection_filtered AS a ON (a.id = p.id),
total_filtered
ORDER BY last_modified DESC
LIMIT 10;
Now you get this:
What the hell?! 4 records with ID
Compare this with my prototype in #1615
select id, as_epoch(last_modified) AS last_modified,
MD5(jsonb_pretty(data)) AS data
from records
WHERE parent_id LIKE '/buckets/build-hub/collections/%'
ORDER BY last_modified DESC
LIMIT 10;
You get:
Just like the original query (the one currently in master, first in this comment). Two identical rows with ID
If you really want to get distinct ID, last_modified, data combos you need to run this query:
select DISTINCT id, as_epoch(last_modified) AS last_modified,
MD5(jsonb_pretty(data)) AS data
from records
WHERE parent_id LIKE '/buckets/build-hub/collections/%'
ORDER BY last_modified DESC
LIMIT 10;
Then you get:
All truly distinct rows. |
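For anyone who wants to reproduce the 2-versus-4 row effect without a Kinto database, here is a minimal sketch using Python's built-in sqlite3 module. It is not the actual Kinto storage code: the table layout, the bucket name `b`, the collection names `c1`/`c2` and the helper `ids()` are all made up for illustration, and the query is a trimmed-down version of the CTE above. The point is only that a wildcard parent_id can match two rows sharing an id, and that the self-join then multiplies them unless paginated_records is deduplicated.

```python
import sqlite3

# Hypothetical, simplified stand-in for the records table (not Kinto's real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id TEXT, parent_id TEXT, collection_id TEXT,"
    " last_modified INTEGER, data TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO records (id, parent_id, collection_id, last_modified, data)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        # Same id under two different parents: both match the wildcard below.
        ("peter", "/buckets/b/collections/c1", "record", 1, "{}"),
        ("peter", "/buckets/b/collections/c2", "record", 1, "{}"),
        ("other", "/buckets/b/collections/c1", "record", 2, "{}"),
    ],
)

QUERY = """
WITH collection_filtered AS (
    SELECT id, last_modified, data
      FROM records
     WHERE parent_id LIKE '/buckets/b/collections/%'
       AND collection_id = 'record'
       AND NOT deleted
),
paginated_records AS (
    SELECT {distinct} id FROM collection_filtered
)
SELECT a.id
  FROM paginated_records AS p
  JOIN collection_filtered AS a ON a.id = p.id
 ORDER BY a.id
"""


def ids(distinct):
    """Run the sketch query with or without DISTINCT in paginated_records."""
    sql = QUERY.format(distinct="DISTINCT" if distinct else "")
    return [row[0] for row in conn.execute(sql)]


with_distinct = ids(True)      # one "peter" in paginated_records joins 2 rows
without_distinct = ids(False)  # two "peter" in paginated_records join 2 rows each

print(with_distinct.count("peter"), without_distinct.count("peter"))
```

With DISTINCT the join still yields both matching rows for the shared id, so DISTINCT on id alone hides the multiplication but not the underlying duplicate.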
What's weird about this is that the |
Another (perhaps unimportant) reason why I'm pessimistic towards the use of
The results:
1.8 seconds versus 1.3 milliseconds. |
Yikes! The above comment was on my little
14 seconds versus 0.2 milliseconds. |
Ok we need to fix that, thanks for investigating ❤️ |
"fixed" :) |
Thanks for digging this up, I would never have thought to look here. In both of these use cases, the intent is to support some kind of scoping (typically by user) but having some kind of special functionality to which the scoping doesn't apply. In these circumstances you use Naively, I guess I would have assumed that we use the permission backend to do this kind of scoping, because it seems cleaner and more powerful. But I guess doing that is pretty tricky because permissions are inherited -- if you have I will open a PR to document this a little better. Thanks! |
This issue title is "Do we need to support get_all with a wildcard parent_id?" So I guess the answer to that is Yes. I uncovered that if you remove the Something that's still weird is that if you have two identical rows in If you want, I can attempt to add a test that simulates that. I.e. that you get records in a list that are indistinguishable. I.e.
|
I think it's clear that we don't have robust tests on the wildcard matching. I think it would be good to add some. I think it would be prudent to figure out what the correct behavior is and write tests that reflect that correct behavior before updating with the query. |
@glasserc It sucked that removing the It's more pressing to me that, today,
If I then query for: [
{"id": "peter", "last_modified": "2018-05-01T12:00:00", "data": {}},
{"id": "peter", "last_modified": "2018-05-01T12:00:00", "data": {}},
]
(it's implicit that I don't need to know the returned records' collection_id because I just specified that in my I think what I'm saying is that, let's think about that first and then we can either fix this or add a test case. |
Yes, I think we should think about that too. That's why I wrote we should figure out what the correct behavior should be and write tests that reflect that. What do you think the correct behavior should be? |
Because parent_id with wildcards can match different objects that have the same ID, this old use of records indexed by IDs to get the union of a bunch of records no longer works. Instead, rely on the fact that the same record, matched multiple times, is the same object in memory, and use its memory location as its unique ID. This is kind of a hack, but I couldn't think of a better solution. Although this is one obvious place where we rely on every object in a given "result set" having a unique ID, there may be others hidden elsewhere. Refs: Kinto#1616
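To make the commit message above concrete, here is a rough sketch of the difference between taking a union keyed on the record's ID and one keyed on the object's memory location. This is not the code from the commit: the function names `union_by_id` and `union_by_identity` and the `parent` field are invented for illustration, and it assumes (as the commit message does) that the same record matched twice is the same Python object.

```python
def union_by_id(*result_sets):
    """Old approach: index records by their "id" field.

    Two different objects that happen to share an id collapse into one.
    """
    seen = {}
    for records in result_sets:
        for record in records:
            seen[record["id"]] = record
    return list(seen.values())


def union_by_identity(*result_sets):
    """Approach described in the commit message: index by id(record).

    The same object matched several times is kept once, while distinct
    objects sharing an "id" value are both preserved.
    """
    seen = {}
    for records in result_sets:
        for record in records:
            seen[id(record)] = record
    return list(seen.values())


# Hypothetical data: same id, different parents (the "parent" key is illustrative).
a = {"id": "peter", "parent": "c1"}
b = {"id": "peter", "parent": "c2"}
first_match = [a, b]
second_match = [a]  # the same object matched again by another filter

by_id = union_by_id(first_match, second_match)
by_identity = union_by_identity(first_match, second_match)

print(len(by_id), len(by_identity))
```

The identity-based version only works while all the result sets are alive in the same process, which is presumably part of why the commit message calls it a hack.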
While investigating #1507, @peterbe discovered that the DISTINCT doesn't really make sense unless you have a wildcard parent_id. He proved this to his own satisfaction by removing DISTINCT and observing that no tests break.
Under what circumstances do we even have a wildcard parent_id? Unlike delete, where you might want to delete a thing and all its children, I couldn't think of a way to invoke this mechanism using the HTTP API. Indeed, there's a subtle bug in the current query and the fact that nobody has reported it makes me suspect that nobody ever actually uses it. Can we get rid of it?