Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queries that have type errors or reference missing columns should give helpful errors/warnings #129

Open
jrmuizel opened this issue May 22, 2020 · 7 comments

Comments

@jrmuizel
Copy link

STR:

  • load https://flatteningthecurve.herokuapp.com/data/canadatesting
  • execute select cumulative_testing from default where province = "Ontario"
  • you get: Type error: Function Equals is not implemented for types Type { decoded: String, codec: Some(Codec { ops: [PushDataSection(1), PushDataSection(2), DictLookup(U8)], column_name: "province", section_types: [U8, U64, U8], decoded_type: String, is_summation_preserving: false, is_order_preserving: true, is_fixed_width: true }), is_scalar: false, is_borrowed: false }, Type { decoded: Null, codec: None, is_scalar: false, is_borrowed: false }
@cswinter
Copy link
Owner

Thanks for reporting this! Queries like this should work so there's probably some bug that won't be too hard to fix. I'll take a closer look on the weekend.

@cswinter
Copy link
Owner

Actually, the problem looks like it might be caused by province being a missing/null column. What do you get when you run the :memtree command? What command did you use to load the data?

So I think there's two issues here:

  • comparison between null and String is not defined, this should probably just evaluate to false
  • querying columns that don't exist should display a warning

@jrmuizel
Copy link
Author

$ ./repl --load /tmp/canada_testing.csv
Loading /tmp/canada_testing.csv into table default.
Loaded data in 1.76ms.

# Table `default` (884 rows, 9.39KiB) #
id: 1.7KiB
province: 1.1KiB
date: 3.1KiB
cumulative_testing: 3.5KiB

# Table `_meta_tables` (0 rows, 153B) #
locustdb> :memtree
default  9.39KiB  884 rows  10.9B/row
├─ cumulative_testing       fully resident    3.5KiB   37%    4.0B/row
├─ date                     fully resident    3.1KiB   33%    3.6B/row
├─ id                       fully resident    1.7KiB   18%    2.0B/row
└─ province                 fully resident    1.1KiB   12%    1.3B/row

_meta_tables  0.000B  0 rows  NaNB/row

@cswinter
Copy link
Owner

cswinter commented May 23, 2020

Oh I think I figured out the actual problem, the parser interprets "Ontario" as an identifier for the (non-existent) Ontario column. This query should work: select cumulative_testing from default where province = 'Ontario'.

The LocustDB version on latest master actually works as you would expect for the original query and has some other fixes as well, I ought to publish a new version. I'll probably want to automate the whole release process with a github action which will take me a little while.

And of course the error message should still be improved.

@cswinter
Copy link
Owner

Correction: single quotes are in fact the correct way to delimit strings in SQL and the only reason it was working for me was because I was testing with an old version of LocustDB with less sophisticated parser.

@jrmuizel
Copy link
Author

Is it possible to give a better error message in this case? https://github.com/MaterializeInc/materialize gives: ERROR: column "Ontario" does not exist in this case.

@cswinter
Copy link
Owner

Is it possible to give a better error message in this case? https://github.com/MaterializeInc/materialize gives: ERROR: column "Ontario" does not exist in this case.

Basically yes, though it's a bit more tricky in LocustDB because its semantics actually allow for querying non-existent columns which are then assumed to be all nulls. This is because it has no fixed schema and any column might exist, even if there are no values within any particular shard.

List of action items from this issue:

  • Automate release process and release new LocustDB version
  • Improve error messages for type errors by pointing to the offending subexpression and include the simple base types in addition to the full type expression that include codecs
  • Detect when queries contain a column that is null/non-existent across all shards and print a warning
  • Detect when double quotes are used in a way that implies a string was intended and print helpful warning messge
  • Make String<->Null equality operator work

@cswinter cswinter changed the title Function Equals is not implemented for String Queries that have type errors or reference missing columns should give helpful errors/warnings May 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants