Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indices support (WIP) #335

Draft
wants to merge 4 commits into
base: travis/beam-0.8.0.0
Choose a base branch
from

Conversation

Martoon-00
Copy link

@Martoon-00 Martoon-00 commented Dec 10, 2018

Motivation

Currently, there is no way to deal with the database schema purely via Haskell code, which is not good at least because when we define our schema in raw SQL, we get table and fields names defined in several places.
I'm planning to make SQL engine know about foreign key constraints in the next PR, and these two seem to provide a basis for moving away from manual schemas definition completely.

PR Details

The following features are implemented:

  • Manual indices definition.
    Here a user can provide a list of indices on the per-table basis, each index is defined by a sequence of its fields and its settings (only most basic settings are supported for now). With syntax as the following.

  • Automatic indices detection.
    A user can make a DatabaseIndices (which is similar to DatabaseSettings but contains information about indices correspondingly) via defaultDbIndices, which gathers indices on those columns which are assumed to participate in JOINs.
    Later the user can add these indices to the custom ones via dedicated function mergeDbIndices, or immediately use addDefaultDbIndices.

  • Manual/automatic migrations support.
    Previously gathered indices are converted to TableChecks and added to CheckedDatabaseSettings.
    Two action providers are supplied: ADD INDEX and DROP INDEX.
    Herewith, unfortunately, there is no SQLite support, since there is no way to modify indices of the existing table there. A common workaround is to rename the table, create a new one with the desired indices and copy all the data, but I'm not yet ready to write copying action providers, although generally, that scheme seems to be perfectly discoverable with existing automatic migration means.

  • Testing.

    • Unit tests on indices definition.
    • Manual migrations.
    • Automatic migrations.
  • Haddock: builds.

  • Extend the manual.

Notes

I have a set of TODOs remained where I'm not sure about proper design choice, would appreciate hearing your point of view on this.

Also, I added .stylish-Haskell.yaml which corresponds to the style of this project, otherwise I get pretty mad diffs each time I save a file.

@Martoon-00 Martoon-00 force-pushed the martoon/add-indices branch 2 times, most recently from 3e9e642 to 287341f Compare December 10, 2018 20:36
@tathougies
Copy link
Collaborator

Hello!

Wow.. Thanks! This is a lot to digest, but right off the bat, I'll ask you to rebase on top of the beam-0.8.0.0 branch, since that represents the next release. There'll be a few changes you'll have to make to get it compiling. The upside is that it should trigger a gitlab CI build, which will have to pass before I merge.

Additions to the manual would be nice.

I'm still getting through the code.

Travis


import Database.Beam.Schema.Tables

-- Some day it should have more options and allow to modify them depending on the backend.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When merged, add a ticket for this to the issue reporting system

-- instances for all of them.
-- | Traverses the given table and for every field which is some 'PrimaryKey'
-- makes corresponding SQL index, this allows "JOIN"s on this table perform nicely.
instance {-# OVERLAPPABLE #-}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these instances overlappable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not sure whether is it a good choice, but I saw that there are plans to support many different entities type, while we want to have indices only for TableEntities. It looked like the one who will enable support for those other entities should not bother about indices at all, thus a default overlappable instance.
Should I define instances for each entity type explicitly instead?

-- | Provide options for an automatically created index, which is caused by a primary key
-- of one table being embedded into another.
-- This typeclass is only needed to be defined for inter-table references (see ':->').
class IndexFromReference reference where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I see right now is that tables should be able to have two foreign keys that refer to the same table type at the haskell level, but really refer to two different tables at the database level.

For example

data TwoKeyT f =
  TwoKey {
    _key1 :: PrimaryKey RefT f
    _key2 :: PrimaryKey RefT f
  }

data RefT f = Ref { _field1 :: C f Text, _field2 :: C f Int }

data Database entity = Database { _ref1 :: entity (TableEntity RefT), _ref2 :: entity (TableEntity RefT), _twoKey :: entity (TableEntity TwoKeyT) }

How would I make it so that it knows that _key1 refers to _ref2 and _key2 to _ref1 (or vice versa). I believe the way it is now, each type is uniquely linked to one database table, which is a large constraint.

One way to get around this that would also (I think) have the benefit of removing the instance overlap is to not use the type class mechanism and require the user to construct an appropriate value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a major problem. In case of this implementation via type class mechanism a workaround is to parametrize table type with a phantom type parameter, thus making different tables of the same type distinguishable in a Haskell code.
However, making user explicitly make a construct which describes options for the generated indices seems to be virtually better in several ways. Doing so may be tricky if we still want the type system to enforce the user to specify indices for all embedded PrimaryKeys present; I described my raw suggestion right below this code, we could discuss it :)

@Martoon-00
Copy link
Author

Oh, having CI checking my work would be really nice!
I see beam-0.8.0.0-fix-ci branch and multiple travis/beam-0.8.0.0-* braches, which one should I choose?

@tathougies
Copy link
Collaborator

Just regular beam-0.8.0.0. I'll get rid of the other ones. They're all merged.

@Martoon-00
Copy link
Author

That's weird, I see no plain beam-0.8.0.0 branch 🤔

@tathougies
Copy link
Collaborator

@Martoon-00
Copy link
Author

Ah, thanks. I wasn't sure it was that branch you were talking about because beam-core tests fail to compile there (713522b commit). Could you check?

@tathougies
Copy link
Collaborator

The latest commit there should be 377596e

@Martoon-00
Copy link
Author

Oh right, I searched in my forked repository. Thanks for helping with figuring this out, I'm rebasing.

@tathougies
Copy link
Collaborator

You'll need to resolve conflicts too to get CI working.

@Martoon-00
Copy link
Author

Yeah, it's gonna take a while :)

This includes:
* Manual indices definition;
* Automatic detection of indices required for performant JOINs;
* Manual and automatic index creation/deletion via `ADD/DROP INDEX`
  (without SQLite support)
@Martoon-00 Martoon-00 changed the base branch from master to travis/beam-0.8.0.0 December 20, 2018 10:15
@Martoon-00
Copy link
Author

Martoon-00 commented Dec 20, 2018

Finally, I rebased upon beam-0.8.0.0 branch. Now there are a couple of small issues:

  1. CI here is not triggered, probably because I'm proposing this change from another repo.
    Is there an option to enable CI in this case and could you do that, or I'd better relocate my code to this repository?
  2. I activated CI in my fork for now, and sometimes tests fail https://travis-ci.org/serokell/beam/jobs/470453508.
    Postgres seems to happily truncate strings after \0 character.

@tp-woven
Copy link

tp-woven commented Aug 8, 2022

Oh, wow, this looks really cool. Any chance of this getting resurrected and merged?

@kmicklas
Copy link
Member

@tp-woven You're welcome to take it over! I'm happy to review but unfortunately don't have the time for major feature development on Beam these days.

@Martoon-00
Copy link
Author

Yeah, @tp-woven I would be glad if you manage to take this work or pick it as inspiration for your own implementation, especially since @kmicklas approved this. Now I cannot allocate time to this.

If I remember correctly, this PR mostly handles the core part (not tested at all) but may need a decent update of the migration package.

@tp-woven
Copy link

Thanks, I'll try to take over. Fair warning: I'm a Haskell noob, so it may take a while... 😅

@kmicklas kmicklas marked this pull request as draft June 8, 2024 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants