Request only nearby and recent business records #35

Open
kikuomax opened this issue Aug 12, 2021 · 26 comments

@kikuomax
Member

kikuomax commented Aug 12, 2021

Requesting all of the business records online will incur too much network traffic and too many AWS charges. We have to limit the requests.

Basic strategy,

  • Request business records that fit in the screen
  • Request at most N recent business records
@kikuomax kikuomax self-assigned this Aug 12, 2021
@kikuomax
Member Author

This means we have to be able to query business records by geolocation.
This article may help.

@kikuomax
Member Author

S2 Geometry Library looks great.
But we have to consider its compatibility with the map tile coordinate system supported by Mapbox (maplibre).

@kikuomax
Member Author

Since the search region depends on the visible area (zoom level) of the map, I think a single geohash is not sufficient for our purpose. Ideally, every business record should be indexed at each individual zoom level.
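Indexing a record per zoom level starts with turning its longitude and latitude into map tile coordinates. Below is a minimal sketch of the standard Web Mercator (XYZ) tile formula, which I assume matches the tile scheme used by Mapbox. The function name `lon_lat_to_tile` is made up for illustration.

```python
import math


def lon_lat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) of the Web Mercator tile that contains the point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    # Web Mercator is only defined up to about +/-85.05 degrees of latitude.
    lat = max(min(lat, 85.05112878), -85.05112878)
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Keep both indices inside [0, n - 1] (e.g. lon = 180 would give x = n).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

With this scheme, the tile at zoom level z that contains a point is always the parent of the tile at z+1 that contains it, so the coordinates at coarser levels can be derived from the finest one by halving.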

@kikuomax
Member Author

kikuomax commented Aug 14, 2021

According to the DynamoDB quotas, a table can have at most 20 global secondary indexes. The 23 zoom levels supported by Mapbox (0 to 22) do not all fit within this limit.

@kikuomax
Member Author

We have to determine some typical zoom levels.

@kikuomax
Member Author

kikuomax commented Aug 14, 2021

I use zoom levels 17 and 18 on my mobile phone while I walk my dog.
I think zoom level 19 is close enough to determine the precise location of the business record.
I often use zoom levels 15 and 16.
I sometimes use zoom levels 11 to 14.
I do not think zoom levels 0 to 10 make any difference.

@kikuomax
Member Author

kikuomax commented Aug 14, 2021

Zoom levels to index,

  • 18 (covers 19 to 22)
  • 17
  • 16
  • 15

The indexing above is enough for dog walking. Further indexing is necessary for browsing. Unfortunately, I have no clue about which levels matter there.

  • 10 (covers 11 to 14)
  • 6 (covers 7 to 9)
  • 3 (covers 4 and 5)
  • 0 (covers 1 and 2)

Indexing level 0 is not necessary because it is equivalent to scanning every record.

@kikuomax
Member Author

kikuomax commented Aug 14, 2021

Seven global secondary indexes should do no harm, but they will consume more WCUs and RCUs.

@kikuomax
Member Author

kikuomax commented Aug 16, 2021

To work around the limit on the number of global secondary indexes, we could store an additional item per zoom level in the table, which associates a business record with its map tile coordinates at that zoom level.
A primary key combination would be

  • PK = <zoom-level>:<x>:<y>
  • SK = timestamp

Since this method needs more put_item requests, it will be more error-prone.

@kikuomax
Member Author

I prefer global secondary indexes as long as the quota is not a problem.

@kikuomax
Member Author

To query the business records of a specific dog at a zoom level z, the index corresponding to that zoom level needs the following primary key combination,

  • partition key = <dog-id>:<tile-x>:<tile-y>
  • sort key = timestamp
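With this key layout, "at most N recent records in a tile" maps onto a single DynamoDB Query that reads the index in descending sort key order. Here is a rough sketch that only builds the request parameters. The function name and the table, index, and attribute names passed to it are placeholders, not names from our stack.

```python
def build_recent_records_query(
    table_name: str,
    index_name: str,
    partition_key_name: str,
    dog_id: str,
    tile_x: int,
    tile_y: int,
    limit: int,
) -> dict:
    """Build Query parameters for the newest `limit` records of a dog in a tile."""
    return {
        "TableName": table_name,
        "IndexName": index_name,  # the GSI for the zoom level being queried
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": partition_key_name},
        "ExpressionAttributeValues": {
            # partition key = <dog-id>:<tile-x>:<tile-y>
            ":pk": {"S": f"{dog_id}:{tile_x}:{tile_y}"},
        },
        "ScanIndexForward": False,  # descending sort key, i.e. newest first
        "Limit": limit,  # at most N items are evaluated and returned
    }
```

The resulting dictionary should be usable as `boto3.client("dynamodb").query(**params)`, assuming the timestamp sort key sorts chronologically.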

@kikuomax
Member Author

kikuomax commented Aug 17, 2021

There are two different sets of requirements for a primary key combination, one for a specific dog (private view), and the other for all dogs (public view; i.e., #38).

@kikuomax
Member Author

One solution is to create 7 more similar indexes for the public view. This consumes precious indexes.

@kikuomax
Member Author

Another solution is to create separate items for public and private views. The two views share indexes but have different prefixes.

Private item,

  • partition key: private:<dog-id>:<x>:<y>
  • sort key: timestamp

Public item,

  • partition key: public:<x>:<y>
  • sort key: timestamp

@kikuomax
Member Author

By the way, isn't it a bad idea to have a huge partition in DynamoDB? The map tile indexing I proposed here will create a huge partition especially for lower zoom levels.

@kikuomax
Member Author

When I googled the problems of having a huge partition in DynamoDB, I found a scary article saying that the partition size is limited to 10 GB! But according to the documentation, this limit applies only to a table with one or more local secondary indexes. So this should not matter to the business record table.

@kikuomax
Member Author

A huge partition may be more susceptible to the capacity cap per partition (3,000 RCUs and 1,000 WCUs). It should not matter for our app so far.

@kikuomax
Member Author

I found that CloudFormation cannot create or delete more than one global secondary index in a single update. This is painful.
aws-cloudformation/cloudformation-coverage-roadmap#229

@kikuomax
Member Author

We have to provision global secondary indexes (GSIs) one by one. I hope I am not stupid enough to edit the CDK stack every time I provision a single GSI. Could we use a context value to control which GSIs are provisioned?
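One possible shape for this, as a sketch only: a context value holds how many GSIs the stack should declare, and every deployment raises it by one so that CloudFormation sees a single new GSI per update. The context key `gsi-count` and the function name are made up. In a Python CDK stack the value would presumably come from `self.node.try_get_context("gsi-count")` and be passed on the command line as `cdk deploy -c gsi-count=3`.

```python
# Zoom levels that need a GSI, in the order they would be provisioned.
GSI_ZOOM_LEVELS = [18, 17, 16, 15, 10, 6, 3]


def gsis_to_declare(context_value) -> list[int]:
    """Return the zoom levels whose GSIs the stack should declare.

    context_value is the raw context value (a string, a number, or None
    when the context key is not set).
    """
    count = 0 if context_value is None else int(context_value)
    if not 0 <= count <= len(GSI_ZOOM_LEVELS):
        raise ValueError(f"gsi-count must be between 0 and {len(GSI_ZOOM_LEVELS)}")
    # Declaring a prefix of the list keeps previously provisioned GSIs intact.
    return GSI_ZOOM_LEVELS[:count]
```

The stack would then declare one GSI per returned zoom level. Deploying with 1, 2, ..., 7 in turn adds exactly one index each time without editing the stack code.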

@kikuomax
Member Author

How do we get the map tiles visible on the screen? We can listen for the "sourcedata" event to know which map tile is requested, but no event is fired after the map tile has been cached.

@kikuomax
Member Author

My concern is that once the business records in a map tile are queried at a "sourcedata" event, they will not be re-queried unless the Mapbox cache is cleared. But this should not matter unless you want to monitor business records of your dog friend that are updated by someone other than you, because updates made by you are immediately recorded in memory.

@kikuomax
Member Author

We will have to invent our own caching feature eventually, but let's use "sourcedata" events for now.

@kikuomax
Member Author

kikuomax commented Aug 26, 2021

Because not all zoom levels are indexed, use the following algorithm to cover a queried map tile at (x, y) at a zoom level z.

  1. Use exact z, x and y if z is an indexed zoom level.
  2. Use z-1, floor(x/2) and floor(y/2) if z-1 is an indexed zoom level.
  3. Use z-2, floor(x/4) and floor(y/4) if z-2 is an indexed zoom level.
  4. and so on

[Image: zoom-level-covering]

@kikuomax
Member Author

kikuomax commented Aug 28, 2021

One problem with depending on a "sourcedata" event is that the maximum zoom level is capped by that of the event (tile) source. It is 16 in the case of the style mapbox://styles/mapbox/streets-v11.

@kikuomax
Member Author

kikuomax commented Aug 29, 2021

A global business explorer (#38) will be added as a map tile source in the future. For finer zoom levels, I think we can count on it.

@kikuomax
Member Author

I realized that zoom level 0 also has to be indexed, because otherwise there is no index that scans all of the public records.
