
Performance of plural endpoint with millions of records #1507

Closed
leplatrem opened this issue Feb 21, 2018 · 17 comments

@leplatrem
Contributor

Reaching a plural endpoint with a million records should be super fast when using pagination (?_limit=10) or filtering. Apparently, that's not the case.

I suspect #1267 to be responsible for the regression.

See mozilla-services/buildhub#350 (840K records)

For anyone interested in tackling this, here is a quick way to fill up a collection with fake records:

$ psql -h localhost -U postgres -w


DO
$$
BEGIN
    FOR i IN 1..100000 LOOP
       INSERT INTO records (id, parent_id, collection_id) VALUES ('id-' || i, '/buckets/a/collections/b', 'record');
    END LOOP;
END;
$$;
@leplatrem leplatrem added the bug label Feb 21, 2018
@leplatrem leplatrem changed the title Performance of plural endpoint without a million of objects Performance of plural endpoint with a million of objects Feb 26, 2018
@leplatrem
Contributor Author

plural endpoint = GET list of records in a collection

@peterbe
Contributor

peterbe commented Apr 24, 2018

The SELECT it does looks like this:

WITH collection_filtered AS (
    SELECT id, last_modified, data, deleted
      FROM records
     WHERE parent_id = '/buckets/build-hub/collections/releases'
       AND collection_id = 'record'
       AND NOT deleted

),
total_filtered AS (
    SELECT COUNT(id) AS count_total
      FROM collection_filtered
     WHERE NOT deleted
),
paginated_records AS (
    SELECT DISTINCT id
      FROM collection_filtered

)
 SELECT count_total,
       a.id, as_epoch(a.last_modified) AS last_modified, a.data
  FROM paginated_records AS p JOIN collection_filtered AS a ON (a.id = p.id),
       total_filtered
  ORDER BY last_modified DESC
 LIMIT 10;

Even though it's limited to the first 10, the whole query becomes a "beast" for Postgres:

                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=485454.57..485454.60 rows=10 width=80) (actual time=8778.417..8778.419 rows=10 loops=1)
   CTE collection_filtered
     ->  Seq Scan on records  (cost=0.00..197338.38 rows=772692 width=1009) (actual time=0.058..965.993 rows=772688 loops=1)
           Filter: ((NOT deleted) AND (parent_id = '/buckets/build-hub/collections/releases'::text) AND (collection_id = 'record'::text))
           Rows Removed by Filter: 4
   CTE total_filtered
     ->  Aggregate  (cost=16419.71..16419.72 rows=1 width=8) (actual time=2948.909..2948.909 rows=1 loops=1)
           ->  CTE Scan on collection_filtered  (cost=0.00..15453.84 rows=386346 width=32) (actual time=0.002..2874.297 rows=772688 loops=1)
                 Filter: (NOT deleted)
   CTE paginated_records
     ->  HashAggregate  (cost=17385.57..17387.57 rows=200 width=32) (actual time=712.816..878.339 rows=772688 loops=1)
           Group Key: collection_filtered_1.id
           ->  CTE Scan on collection_filtered collection_filtered_1  (cost=0.00..15453.84 rows=772692 width=32) (actual time=0.016..376.150 rows=772688 loops=1)
   ->  Sort  (cost=254308.91..256240.64 rows=772692 width=80) (actual time=8778.415..8778.417 rows=10 loops=1)
         Sort Key: (as_epoch(a.last_modified)) DESC
         Sort Method: top-N heapsort  Memory: 44kB
         ->  Hash Join  (cost=8.52..237611.31 rows=772692 width=80) (actual time=4539.533..8618.027 rows=772688 loops=1)
               Hash Cond: (a.id = p.id)
               ->  CTE Scan on collection_filtered a  (cost=0.00..15453.84 rows=772692 width=72) (actual time=0.060..464.148 rows=772688 loops=1)
               ->  Hash  (cost=6.02..6.02 rows=200 width=40) (actual time=4537.915..4537.915 rows=772688 loops=1)
                     Buckets: 65536 (originally 1024)  Batches: 32 (originally 1)  Memory Usage: 3585kB
                     ->  Nested Loop  (cost=0.00..6.02 rows=200 width=40) (actual time=3661.729..4242.430 rows=772688 loops=1)
                           ->  CTE Scan on total_filtered  (cost=0.00..0.02 rows=1 width=8) (actual time=2948.910..2948.911 rows=1 loops=1)
                           ->  CTE Scan on paginated_records p  (cost=0.00..4.00 rows=200 width=32) (actual time=712.817..1192.142 rows=772688 loops=1)
 Planning time: 0.949 ms
 Execution time: 8847.949 ms

~9 seconds!!

Compare with...

select id, as_epoch(last_modified) AS last_modified, data
from records
WHERE parent_id = '/buckets/build-hub/collections/releases'
  AND collection_id = 'record' and deleted=false
  order by last_modified desc limit 10;

which looks like this:

                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..11.46 rows=10 width=1008) (actual time=1.887..1.970 rows=10 loops=1)
   ->  Index Scan Backward using idx_records_last_modified_epoch on records  (cost=0.42..852510.52 rows=772692 width=1008) (actual time=1.886..1.967 rows=10 loops=1)
         Filter: ((NOT deleted) AND (parent_id = '/buckets/build-hub/collections/releases'::text) AND (collection_id = 'record'::text))
 Planning time: 2.522 ms
 Execution time: 4.152 ms

The count is quite expensive.

SELECT count(*)
  FROM records
 WHERE parent_id = '/buckets/build-hub/collections/releases'
   AND collection_id = 'record'
   AND NOT deleted;

That returns 772,688 here in my database.
And it looks like this explained:

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=192382.43..192382.44 rows=1 width=8) (actual time=412.275..412.275 rows=1 loops=1)
   ->  Gather  (cost=192382.21..192382.42 rows=2 width=8) (actual time=412.269..412.273 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=191382.21..191382.22 rows=1 width=8) (actual time=408.734..408.734 rows=1 loops=3)
               ->  Parallel Seq Scan on records  (cost=0.00..190577.33 rows=321955 width=0) (actual time=0.852..386.003 rows=257563 loops=3)
                     Filter: ((NOT deleted) AND (parent_id = '/buckets/build-hub/collections/releases'::text) AND (collection_id = 'record'::text))
                     Rows Removed by Filter: 1
 Planning time: 0.763 ms
 Execution time: 414.734 ms

Note, if I create an index...:

create index records_parent_id_collection_id_not_deleted_idx ON records (parent_id, collection_id) where not deleted;
                                                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=59101.87..59101.88 rows=1 width=8) (actual time=118.455..118.455 rows=1 loops=1)
   ->  Gather  (cost=59101.66..59101.87 rows=2 width=8) (actual time=118.435..118.448 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=58101.66..58101.67 rows=1 width=8) (actual time=114.181..114.181 rows=1 loops=3)
               ->  Parallel Index Only Scan using records_parent_id_collection_id_not_deleted_idx on records  (cost=0.42..57296.77 rows=321955 width=0) (actual time=0.101..92.983 rows=257563 loops=3)
                     Index Cond: ((parent_id = '/buckets/build-hub/collections/releases'::text) AND (collection_id = 'record'::text))
                     Heap Fetches: 18777
 Planning time: 1.149 ms
 Execution time: 122.577 ms

So it takes ~9 seconds. But if you break it up into a count and a plain select it could be 4.152 ms + 414.734 ms. And with the index, it would be 4.152 ms + 122.577 ms instead.

Basically, what takes 9 seconds can take 0.5 seconds if we break it up.
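As a sketch of that split (using Python's sqlite3 as a stand-in for Postgres; the schema and values here are made up for illustration, not Kinto's actual storage code):

```python
import sqlite3

# In-memory stand-in for the records table (hypothetical minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        id TEXT, parent_id TEXT, collection_id TEXT,
        last_modified INTEGER, deleted INTEGER DEFAULT 0,
        PRIMARY KEY (id, parent_id, collection_id)
    )
""")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, 0)",
    [(f"id-{i}", "/buckets/a/collections/b", "record", i) for i in range(1000)],
)

FILTERS = "parent_id = ? AND collection_id = ? AND NOT deleted"
params = ("/buckets/a/collections/b", "record")

# Query 1: the cheap page fetch, which an index on last_modified can serve.
rows = conn.execute(
    f"SELECT id, last_modified FROM records WHERE {FILTERS} "
    "ORDER BY last_modified DESC LIMIT 10", params
).fetchall()

# Query 2: the expensive total count, run separately instead of in a CTE.
(count_total,) = conn.execute(
    f"SELECT COUNT(*) FROM records WHERE {FILTERS}", params
).fetchone()

print(len(rows), count_total)  # → 10 1000
```

The point is only the shape of the split: the page fetch never has to touch every row, and the count is a separate statement the planner can optimize on its own.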

@peterbe
Contributor

peterbe commented Apr 24, 2018

Used best-explain-analyze.py to run the big fat query a bunch of times to see the best possible time you can get. The result is:

EXECUTION TIME
	BEST    8543.617ms  <-- the important number
	MEAN    8872.852ms
	MEDIAN  8752.921ms
PLANNING TIME
	BEST    0.935ms
	MEAN    1.148ms
	MEDIAN  1.161ms

Meaning, the best it can do (after 10 attempts) is 8543.617ms.

For the record, if I change it from LIMIT 10 to LIMIT 1000 the results are:

EXECUTION TIME
	BEST    8661.977ms  <-- the important number
	MEAN    8952.208ms
	MEDIAN  8906.576ms
PLANNING TIME
	BEST    0.933ms
	MEAN    1.149ms
	MEDIAN  1.019ms

In conclusion, doing LIMIT 10 or LIMIT 1000 doesn't make a big difference.

@peterbe
Contributor

peterbe commented Apr 24, 2018

And in the same vein, if I run the simple select

select id, as_epoch(last_modified) AS last_modified, data
from records
WHERE parent_id = '/buckets/build-hub/collections/releases'
  AND collection_id = 'record' and deleted=false
  order by last_modified desc limit 10;

100 times, I get:

EXECUTION TIME
	BEST    1.483ms  <-- the important number
	MEAN    1.746ms
	MEDIAN  1.728ms
PLANNING TIME
	BEST    0.824ms
	MEAN    0.939ms
	MEDIAN  0.903ms

So, instead of ~4ms the actual number is 1.483ms.

@peterbe
Contributor

peterbe commented Apr 24, 2018

The scary thing about this is that during those 9 seconds, it's really CPU intensive and can cause other more pressing and complex queries to be blocked and build up.

I have a question though: Why the SELECT DISTINCT id FROM collection_filtered part?
Adding that to the simple select so it becomes:

select DISTINCT id, as_epoch(last_modified) AS last_modified, data
from records
WHERE parent_id = '/buckets/build-hub/collections/releases'
  AND collection_id = 'record' and deleted=false
  order by last_modified desc limit 10;

Then you get:

EXECUTION TIME
	BEST    5803.857ms
	MEAN    5955.790ms
	MEDIAN  5901.252ms
PLANNING TIME
	BEST    1.004ms
	MEAN    1.090ms
	MEDIAN  1.070ms

It looks like this:

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1147472.86..1147472.96 rows=10 width=1008) (actual time=5590.636..5590.695 rows=10 loops=1)
   ->  Unique  (cost=1147472.86..1155199.78 rows=772692 width=1008) (actual time=5590.635..5590.693 rows=10 loops=1)
         ->  Sort  (cost=1147472.86..1149404.59 rows=772692 width=1008) (actual time=5590.634..5590.636 rows=10 loops=1)
               Sort Key: (as_epoch(last_modified)) DESC, id COLLATE "C", data
               Sort Method: external merge  Disk: 772928kB
               ->  Seq Scan on records  (cost=0.00..390511.38 rows=772692 width=1008) (actual time=0.787..1698.278 rows=772688 loops=1)
                     Filter: ((NOT deleted) AND (parent_id = '/buckets/build-hub/collections/releases'::text) AND (collection_id = 'record'::text))
                     Rows Removed by Filter: 4
 Planning time: 1.102 ms
 Execution time: 5925.728 ms

Within a single parent_id/collection_id, the DISTINCT id won't filter out anything, because every id there is already unique. The only time ids might be repeated (and DISTINCT id does something) is when the parent_id or collection_id differs. Right?

After all you can't ever have something like this:

buildhub=# select id, parent_id, collection_id from records where parent_id = '/buckets/build-hub/collections/releases' AND collection_id = 'record' and deleted=false order by last_modified desc limit 3;
                 id                  |                parent_id                | collection_id
-------------------------------------+-----------------------------------------+---------------
 firefox_beta_60-0b14rc2_win64_zh-tw | /buckets/build-hub/collections/releases | record
 firefox_beta_60-0b14rc2_win64_vi    | /buckets/build-hub/collections/releases | record
 firefox_beta_60-0b14rc2_win64_vi    | /buckets/build-hub/collections/releases | record

THIS IS IMPOSSIBLE (because of the primary key: "records_pkey" PRIMARY KEY, btree (id, parent_id, collection_id)). I faked the sample output above.
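That uniqueness claim is easy to check mechanically. A sketch with sqlite3 and a hypothetical minimal schema (not Kinto's real one), showing the composite primary key rejecting an exact duplicate row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        id TEXT, parent_id TEXT, collection_id TEXT,
        PRIMARY KEY (id, parent_id, collection_id)
    )
""")
row = ("firefox_beta_60-0b14rc2_win64_vi",
       "/buckets/build-hub/collections/releases", "record")
conn.execute("INSERT INTO records VALUES (?, ?, ?)", row)
try:
    # Inserting the same (id, parent_id, collection_id) again must fail.
    conn.execute("INSERT INTO records VALUES (?, ?, ?)", row)
    duplicated = True
except sqlite3.IntegrityError:
    duplicated = False  # the composite primary key rejects it
print(duplicated)  # → False
```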

@peterbe
Contributor

peterbe commented Apr 24, 2018

Ah! I get it. If the parent_id is a wildcard expression, you will get a benefit from the DISTINCT id:

if '*' in parent_id:

So this is what I propose:

  1. Break up the counting from the select, i.e. do two separate conn.execute(count_query) and conn.execute(select_query) calls.
  2. If the '*' in parent_id is true, use DISTINCT id in the plain select query.

I'm still a bit confused about the DISTINCT at all. If you ask to get all records in parent_id="something*" why should you only get the first one (sorted by last_modified desc)?
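A sketch of what those two proposed steps could look like as a query builder (the function name plural_queries and the parameter style are made up for illustration, not Kinto's actual API):

```python
def plural_queries(parent_id: str, limit: int = 10):
    """Return (count_query, select_query) per the two-step proposal:
    step 1 separates COUNT from SELECT; step 2 adds DISTINCT only when
    the parent_id filter is a wildcard, so ids could repeat across parents."""
    filters = ("parent_id LIKE :parent_id AND collection_id = :collection_id "
               "AND NOT deleted")
    distinct = "DISTINCT " if "*" in parent_id else ""
    count_query = f"SELECT COUNT(*) FROM records WHERE {filters}"
    select_query = (
        f"SELECT {distinct}id, as_epoch(last_modified) AS last_modified, data "
        f"FROM records WHERE {filters} "
        f"ORDER BY last_modified DESC LIMIT {limit}"
    )
    return count_query, select_query

# Wildcard parent_id: DISTINCT is needed.
print("DISTINCT" in plural_queries("/buckets/*/collections/releases")[1])  # → True
# Exact parent_id: skip DISTINCT and let the index do the work.
print("DISTINCT" in plural_queries("/buckets/build-hub/collections/releases")[1])  # → False
```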

@Natim
Member

Natim commented Apr 24, 2018

I'm still a bit confused about the DISTINCT at all. If you ask to get all records in parent_id="something*" why should you only get the first one (sorted by last_modified desc)?

I don't recall, but I am pretty sure that if you remove that DISTINCT and run the postgresql storage tests, one test will fail and explain it.

@glasserc
Contributor

I am skeptical that separating the COUNT into a separate query will be faster. It means scanning the table two times instead of only once. But please feel free to open a PR with some benchmark numbers!

@peterbe
Contributor

peterbe commented Apr 25, 2018

I am skeptical that separating the COUNT into a separate query will be faster.

The select one doesn't scan the table at all. It just uses an index based on last_modified.

But please feel free to open a PR with some benchmark numbers!

Will do! 🤠

@glasserc
Contributor

I guess I should also mention that keeping it in one query is probably better for reentrancy, although it may not matter much for this particular use case.

@peterbe
Contributor

peterbe commented Apr 26, 2018

I don't recall but I am pretty sure if you remove that DISTINCT and run the postgresql storage tests one test will fail and explain it.

Tried that. Running make tests didn't fail.

I still think it's weird to use DISTINCT at all. But I don't want to solve that as part of this issue; it can ignore that debate and work around the "problem".

@glasserc
Contributor

It seems like we're going to discuss this next week when we're all together in Orlando. In the interests of readiness, could we have some information about what the context/use case here is? It seems like these requests are happening in Buildhub, which we know has a ton of records all in one collection. What kinds of requests are being made? What are they used for?

I want to rule out the risk that we implement a "fix" but we end up still slow because of other requests that end up repeatedly filtering a large collection.

Since we acknowledge that this is largely specific to Buildhub's pathological use of Kinto, maybe the right fix is to just add Buildhub-specific indexes to the Buildhub Kinto DB.

@peterbe
Contributor

peterbe commented Nov 28, 2018

Mind you, Buildhub is being slowly replaced with Buildhub2 which doesn't use Kinto. It's a slowly dying project.

One thing I'm curious about is: are there other collections in the wild (I guess we are restricted to knowing about Mozilla projects) that have a large number of records per collection?

▶ cat ~/count.sql
select parent_id, collection_id, count(*) as count from records
group by parent_id, collection_id
order by count desc limit 20;

▶ psql buildhub < ~/count.sql
                   parent_id                   | collection_id |  count
-----------------------------------------------+---------------+---------
 /buckets/build-hub/collections/releases       | record        | 1037997
                                               | bucket        |       2
 /buckets/2ca20155-dd42-fa2b-8e18-7102d2c3af79 | collection    |       1
 /buckets/build-hub                            | collection    |       1
(4 rows)

▶ psql workon < ~/count.sql
                            parent_id                            | collection_id | count
-----------------------------------------------------------------+---------------+-------
 /buckets/22cf4aa5-9bf3-2539-da76-a7d382a7c354/collections/todos | record        |    58
 /buckets/396037c3-8d15-495d-2e0a-037068da6dfa/collections/todos | record        |    56
 /buckets/f6d78db1-bcd7-8518-2f94-01545e622769/collections/todos | record        |     8
 /buckets/4a96c0e8-bed0-5c26-dcad-3b0a7619a6a5/collections/todos | record        |     6
                                                                 | bucket        |     5
 /buckets/f6d78db1-bcd7-8518-2f94-01545e622769                   | collection    |     1
 /buckets/396037c3-8d15-495d-2e0a-037068da6dfa                   | collection    |     1
 /buckets/ff27fd9d-5a5b-5d94-ef7c-c7e8103bb25c                   | collection    |     1
 /buckets/4a96c0e8-bed0-5c26-dcad-3b0a7619a6a5                   | collection    |     1
 /buckets/22cf4aa5-9bf3-2539-da76-a7d382a7c354                   | collection    |     1
(10 rows)

If Mozilla's Buildhub1 is a globally unique instance then your point @glasserc is extremely relevant.

By the way, would it be an idea to release a plugin called "kinto-without-counts" or something that basically overrides the SELECT queries per my naive patches mentioned above? Under the circumstances, I don't think I would personally even care to use it for Buildhub, even if it existed.

@glasserc
Contributor

Per my comments in #1624, I think having two SELECT queries violates abstractions without any real benefit. If the only problem it solves is Buildhub, and if you don't want to use it for Buildhub, I would say no, let's not release it.

I'm open to trying to optimize Buildhub more generally, which is a broader topic, but in order to know how best to attack it, I'd need to have some information about what the context is. What kinds of requests are being made? What are they used for?

@peterbe peterbe changed the title Performance of plural endpoint with a million of objects Performance of plural endpoint with millions of objects Nov 28, 2018
@peterbe peterbe changed the title Performance of plural endpoint with millions of objects Performance of plural endpoint with millions of records Nov 28, 2018
@peterbe
Contributor

peterbe commented Nov 28, 2018

I think the gist of #1624 isn't about doing two SELECT queries. It's about doing one SELECT * FROM ... when there's a GET request and one SELECT COUNT(*) FROM ... when there's a HEAD request.
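A sketch of that idea as a dispatch helper (query_for is a hypothetical name for illustration, not Kinto's code):

```python
def query_for(method: str) -> str:
    """Serve GET with a plain SELECT and HEAD with a COUNT, never both at once."""
    filters = ("parent_id = :parent_id AND collection_id = :collection_id "
               "AND NOT deleted")
    if method == "HEAD":
        # A HEAD response only needs the total for its headers, so count alone.
        return f"SELECT COUNT(*) FROM records WHERE {filters}"
    # A GET returns rows; the cheap index-backed query is enough on its own.
    return (f"SELECT id, as_epoch(last_modified) AS last_modified, data "
            f"FROM records WHERE {filters} "
            "ORDER BY last_modified DESC LIMIT :limit")

print(query_for("HEAD").startswith("SELECT COUNT"))  # → True
print("LIMIT" in query_for("GET"))  # → True
```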

@glasserc
Contributor

Is this issue still outstanding, or can we close it now that #1931 landed?

@leplatrem
Contributor Author

Yes! Thanks!
