Shared schema link #1055

Closed
wants to merge 8 commits into from

@Horusiath Horusiath commented Feb 20, 2024

This PR depends on #1049

It stores the relationship between a shared-schema db and the databases linked to it:

  • The relationships are stored in a shared_schema_links table in the meta store of the shared-schema db.
  • When we create a dependent db, we store a link in the shared-schema db.
  • When we destroy a dependent db, we remove its link from the shared-schema db.
  • When we destroy a shared-schema db, we first check that all dependent dbs have already been destroyed. The operation fails if any still exist.

TODO: namespace forking (for linked db) also should store the link to shared schema, but this PR doesn't solve this yet.
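
The four rules above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `LinkStore`, `LinkError`, and the method names are hypothetical, and the PR itself keeps the links in the `shared_schema_links` table of the meta store rather than in a map.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for the `shared_schema_links` table.
#[derive(Default)]
struct LinkStore {
    // shared-schema db name -> names of the dbs linked to it
    links: BTreeMap<String, BTreeSet<String>>,
}

#[derive(Debug, PartialEq)]
enum LinkError {
    /// The shared schema still has this many dependent dbs.
    HasDependents(usize),
}

impl LinkStore {
    /// Record a link when a dependent db is created.
    fn link(&mut self, shared_schema: &str, db: &str) {
        self.links
            .entry(shared_schema.to_string())
            .or_default()
            .insert(db.to_string());
    }

    /// Remove the link when a dependent db is destroyed.
    fn unlink(&mut self, shared_schema: &str, db: &str) {
        let now_empty = match self.links.get_mut(shared_schema) {
            Some(dbs) => {
                dbs.remove(db);
                dbs.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.links.remove(shared_schema);
        }
    }

    /// Destroying a shared schema is refused while any db still links to it.
    fn destroy_shared_schema(&mut self, shared_schema: &str) -> Result<(), LinkError> {
        let dependents = self.links.get(shared_schema).map_or(0, |dbs| dbs.len());
        if dependents > 0 {
            return Err(LinkError::HasDependents(dependents));
        }
        self.links.remove(shared_schema);
        Ok(())
    }
}
```

In this model, calling `destroy_shared_schema` right after `link` returns an error, and it succeeds once every dependent db has been unlinked.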

@Horusiath Horusiath requested review from MarinPostma, psarna and LucioFranco and removed request for psarna February 20, 2024 13:25
@Horusiath Horusiath marked this pull request as ready for review February 21, 2024 14:27
@Horusiath Horusiath requested a review from haaawk February 21, 2024 14:27
let ns = self
    .inner
    .make_namespace
    .create(

I don't understand why we create shared_schema_db here. Shouldn't it already exist here @Horusiath ?
If we're creating it for every link what will happen if we create 2 databases that link to the same shared schema? Wouldn't it be created twice then?

Why isn't it enough to just call let e = self.inner.store.get(shared_schema_db).await;?

haaawk commented Feb 22, 2024

@LucioFranco please review the metastore changes

}
if config.config.is_shared_schema {
    let res: rusqlite::Result<u32> = guard.conn.query_row(
        "SELECT COUNT(1) FROM shared_schema_links WHERE shared_schema_name = ?",

What's the benefit of using COUNT(1) instead of COUNT(*) here?

haaawk commented Feb 22, 2024

I think the data model for schema links is wrong. It duplicates entries:

sqlite> CREATE TABLE IF NOT EXISTS shared_schema_links (
(x1...>                 id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
(x1...>                 shared_schema_name TEXT NOT NULL,
(x1...>                 db_name TEXT NOT NULL
(x1...>             );
sqlite> INSERT OR REPLACE INTO shared_schema_links (shared_schema_name, db_name) VALUES ('a', 'b');
sqlite> INSERT OR REPLACE INTO shared_schema_links (shared_schema_name, db_name) VALUES ('a', 'b');
sqlite> INSERT OR REPLACE INTO shared_schema_links (shared_schema_name, db_name) VALUES ('a', 'b');
sqlite> select * from shared_schema_links;
1|a|b
2|a|b
3|a|b

Something's not consistent here. We're using INSERT OR REPLACE but we never replace.
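
For context, SQLite's REPLACE conflict handling only fires on a UNIQUE or PRIMARY KEY violation, and in the table above the only such key is the auto-assigned id, so no insert ever conflicts. The sketch below models that behaviour in plain Rust rather than in SQLite itself; `LinksById` and `LinksByPair` are hypothetical names, and treating (shared_schema_name, db_name) as the unique key is one possible fix, not something this PR does.

```rust
use std::collections::BTreeSet;

/// Models the table as declared in the transcript: the only unique key is
/// the auto-assigned `id`, so every insert gets a fresh key and adds a row.
#[derive(Default)]
struct LinksById {
    next_id: u64,
    rows: Vec<(u64, String, String)>, // (id, shared_schema_name, db_name)
}

impl LinksById {
    fn insert_or_replace(&mut self, shared_schema: &str, db: &str) {
        self.next_id += 1;
        self.rows
            .push((self.next_id, shared_schema.to_string(), db.to_string()));
    }
}

/// Models the assumed fix: (shared_schema_name, db_name) is the unique key,
/// so inserting a pair that is already present leaves a single row.
#[derive(Default)]
struct LinksByPair {
    rows: BTreeSet<(String, String)>,
}

impl LinksByPair {
    fn insert_or_replace(&mut self, shared_schema: &str, db: &str) {
        self.rows
            .insert((shared_schema.to_string(), db.to_string()));
    }
}
```

Inserting ('a', 'b') three times yields three rows in the first model and one row in the second, which matches the duplicated output shown in the transcript.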

@MarinPostma MarinPostma left a comment

I don't think this is the right approach. I think the linked schema should be part of the config, and then have a table/column to quickly lookup what table is using what schema, if any. This can be enforced by sqlite constraints as well.

#[derive(Debug, Default)]
struct MetaState {
    config: Arc<DatabaseConfig>,
    shared_schemas: HashMap<NamespaceName, HashSet<NamespaceName>>,

why do we store that? and why here? this is some database-specific config; why does it have a map of the namespace to sets of namespaces?

It's a map from shared_schema -> set of dbs that use that shared schema.

it's not the place to store that. One instance of that is created for every namespace. Why do we need that mapping in memory?

@haaawk haaawk Feb 22, 2024

This will be empty for everything but shared schema databases

I still don't understand why this needs to be in memory. Also, why is it mapping namespaces to sets of namespaces? Are we storing a mapping from every shared schema database to the databases that use it? Why?

yes, but:

  1. the shared schema is almost never accessed, it's not like you are doing 10 migrations per second
  2. it can get very big, if you have 100s of thousands of dbs for a single namespace

Another thing: something we should do ASAP is unload the config when the namespace is unloaded. Imagine the cost of reloading all the shared namespaces into memory every time that schema is accessed for one reason or another. This is potentially very slow, and we most likely don't care which dbs depend on that schema.

Those are fair points. So instead of caching in memory we fetch the list from the meta store each time it's being used. That's what you're proposing?

yeah, might even be an iterator, if we don't need the whole list at once

Contributor Author

@MarinPostma @haaawk for the context: this piece of code is made for compliance with HandleState::Internal which - from what I've seen - is used only for testing purposes.

Comment on lines +153 to +154
shared_schema_name TEXT NOT NULL,
db_name TEXT NOT NULL

Can we create references to the main table keys instead?

Comment on lines +356 to +362
if let Some(shared_schema) = shared_schema_name {
    app_state
        .namespaces
        .shared_schema_link(shared_schema, namespace)
        .await?;
}

That's not good: we could end up in a state with the db created but not linked. This should be done in the same transaction as storing the config.

To be fair, this is not worse than what we have already, where we could end up with a db being created but its config missing, no @MarinPostma?
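
The "same transaction" concern in this thread can be illustrated with a small all-or-nothing model. It is a sketch only: `MetaStore` and `create_linked` are hypothetical names, and staging changes on a copy merely stands in for a real SQLite transaction in the meta store.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the meta store.
#[derive(Clone, Default)]
struct MetaStore {
    configs: BTreeMap<String, String>, // namespace -> serialized config
    links: BTreeSet<(String, String)>, // (shared_schema, namespace)
}

impl MetaStore {
    /// Store the config and the link together: changes are staged on a copy
    /// and committed only if every step succeeds, so a failed link never
    /// leaves a created-but-unlinked db behind.
    fn create_linked(
        &mut self,
        namespace: &str,
        config: &str,
        shared_schema: Option<&str>,
    ) -> Result<(), String> {
        let mut staged = self.clone();
        if staged.configs.contains_key(namespace) {
            return Err(format!("namespace `{}` already exists", namespace));
        }
        staged
            .configs
            .insert(namespace.to_string(), config.to_string());
        if let Some(schema) = shared_schema {
            // The shared schema must already exist in the committed state.
            if !self.configs.contains_key(schema) {
                return Err(format!("shared schema `{}` does not exist", schema));
            }
            staged
                .links
                .insert((schema.to_string(), namespace.to_string()));
        }
        *self = staged; // commit both changes at once
        Ok(())
    }
}
```

If the link step fails, the staged copy is dropped and the store is left exactly as it was, which is the property the review comment asks for.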

haaawk commented Feb 22, 2024

I don't think this is the right approach. I think the linked schema should be part of the config, and then have a table/column to quickly lookup what table is using what schema, if any. This can be enforced by sqlite constraints as well.

WDYM @MarinPostma? Isn't @Horusiath creating such a table?

to quickly lookup what table is using what schema

You meant what namespace is using what schema, right?

@haaawk haaawk mentioned this pull request Feb 24, 2024

haaawk commented Feb 24, 2024

I've applied all the comments here -> #1080

@haaawk haaawk closed this Feb 24, 2024

3 participants