Colin Dellow
6648ff5968
add string == row group filter
...
For the statscan census set filtering on `== 'Dawson Creek'`, the query
goes from 980ms to 660ms.
The modest speedup is expected, since the data isn't sorted by that column.
I'll try adding some scaffolding to do filtering at the row level, too.
We could also try unpacking the dictionary and testing the individual
values, although we may want some heuristics to decide whether it's
worth doing, e.g. if < 10% of the rows have a unique value.
Ideally, this should be a ~1ms query.
2018-03-15 20:40:21 -04:00
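A minimal sketch of the idea behind a string `==` row group filter, assuming each row group exposes optional min/max statistics for the column. The `StringStats` struct and `mayContainEqual` function are hypothetical names for illustration, not the extension's actual code. It also suggests why unsorted data limits the gain: when values are scattered, most row groups have a min/max range that spans the target, so few groups can be skipped.

```cpp
#include <string>

// Hypothetical min/max statistics for one row group of a string column.
struct StringStats {
    bool hasMinMax;   // statistics may be absent from the file
    std::string min;
    std::string max;
};

// Returns true when the row group may contain `target` and must be read.
// Assumes the statistics use byte-wise ordering, like std::string comparison.
bool mayContainEqual(const StringStats& stats, const std::string& target) {
    if (!stats.hasMinMax) {
        return true;  // no statistics: the group cannot be ruled out
    }
    // Keep the group only if min <= target <= max.
    return !(target < stats.min) && !(stats.max < target);
}
```

A group whose range is `"Halifax"` to `"Toronto"` can be skipped for `== 'Dawson Creek'`, while one ranging from `"Abbotsford"` to `"Edmonton"` still has to be scanned.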
Colin Dellow
dc431aee20
Dispatch row group filtering based on parquet type
2018-03-15 20:25:02 -04:00
Colin Dellow
92ba5f94e0
reuse FileMetaData
...
For the statscan dataset, parsing the file metadata takes ~30-40ms,
so stash it away for future re-use.
2018-03-15 19:57:38 -04:00
Colin Dellow
769060dbcb
Add stub row group filters for text/int/dbl
...
Checkpointing to investigate why min/max stats for text aren't
present
2018-03-12 23:07:41 -04:00
Colin Dellow
110e3e3668
row group skipping for is [not] null queries
2018-03-12 21:09:00 -04:00
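One way row group skipping for `IS NULL` / `IS NOT NULL` could work, sketched under the assumption that each row group records a null count and a total value count for the column. `NullStats`, `NullCheck` and `canSkipForNullCheck` are hypothetical names, not taken from the extension.

```cpp
#include <cstdint>

// Hypothetical per-row-group null statistics for one column.
struct NullStats {
    bool hasNullCount;       // the null count may be missing
    std::int64_t nullCount;  // number of NULL values in the group
    std::int64_t numValues;  // total values in the group, NULLs included
};

enum class NullCheck { IsNull, IsNotNull };

// Returns true when no row in the group can satisfy the check.
bool canSkipForNullCheck(const NullStats& stats, NullCheck check) {
    if (!stats.hasNullCount) {
        return false;  // without statistics the group must be read
    }
    if (check == NullCheck::IsNull) {
        return stats.nullCount == 0;  // no NULLs at all
    }
    return stats.nullCount == stats.numValues;  // every value is NULL
}
```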
Colin Dellow
acc15256ec
Add rowgroup filtering for rowid
2018-03-12 20:42:50 -04:00
Colin Dellow
1f938a005d
More test cases to deal with affinity
...
I'm not sure how these manifest - whether SQLite retypes them based on
column affinity before we see them, or whether they're provided as is.
2018-03-11 19:18:44 -04:00
Colin Dellow
095b576cc2
Scaffolding for row group filters, tests
...
rowid is special since its column index is -1, so add
explicit tests around it
2018-03-11 15:44:51 -04:00
Colin Dellow
5559a7b563
Fix when last rowgroup is not same size as first
...
...change test data to use 99 rows, so that when we have
rowgroup size 10 we exercise this code.
2018-03-11 15:15:27 -04:00
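The arithmetic behind the 99-row test data can be sketched as follows, assuming a writer that fills every row group to a fixed size except possibly the last. The function names are hypothetical. Real Parquet files may have arbitrary per-group row counts, so a reader would normally take each count from the file metadata rather than compute it.

```cpp
#include <cstdint>

// Number of row groups needed to hold totalRows rows (ceiling division).
std::int64_t rowGroupCount(std::int64_t totalRows, std::int64_t groupSize) {
    return (totalRows + groupSize - 1) / groupSize;
}

// Rows in the group at groupIndex (0-based); the last group may be short.
std::int64_t rowsInGroup(std::int64_t totalRows, std::int64_t groupSize,
                         std::int64_t groupIndex) {
    std::int64_t start = groupIndex * groupSize;
    std::int64_t remaining = totalRows - start;
    return remaining < groupSize ? remaining : groupSize;
}
```

With 99 rows and a row group size of 10 there are 10 groups: the first nine hold 10 rows each and the last holds 9, which is the case the fix exercises.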
Colin Dellow
830053c1fc
Scaffolding for in-extension filtering
...
Supports IS NULL and IS NOT NULL checks
2018-03-11 13:58:10 -04:00
Colin Dellow
210f322a1c
Code to pretty print constraints
2018-03-10 10:59:53 -05:00
Colin Dellow
67005623df
`ensureColumn` catches up when rows are skipped
2018-03-04 22:29:35 -05:00
Colin Dellow
bb3a9440f7
Add query test framework, fix xFilter
2018-03-04 21:05:26 -05:00
Colin Dellow
4c54ab89ae
Don't segfault on full table scan
2018-03-04 17:49:19 -05:00
Colin Dellow
7edb5e472f
Support BLOBs
2018-03-04 17:20:59 -05:00
Colin Dellow
67b0d96967
float support
2018-03-03 20:57:09 -05:00
Colin Dellow
eb0b48f867
Boolean, INT96, INT64
2018-03-03 20:00:50 -05:00
Colin Dellow
1de843fca8
Very rough first cut
...
supports int32, double, strings.
2018-03-03 15:44:01 -05:00