16 Commits

Author SHA1 Message Date
Mikko Harju 96405b77dc Refactored the types to match the most recent Apache Arrow version 2019-11-13 15:17:29 +02:00
Colin Dellow d3ab5ff3e7 Cache clauses -> row group mapping
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.

It contains three columns:

- the clause (e.g. `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap

This papers over poorly sorted parquet files, at the cost of some disk
space. It makes interactive queries much more natural -- drilldown style
queries are much faster, as they can leverage work done by previous
queries.

e.g. `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual
results.

Fixes #6
2018-03-24 23:57:15 -04:00
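The caching scheme described in this commit can be sketched in C++17 roughly as follows. This is a minimal, hypothetical in-memory stand-in: the commit stores the mapping in a SQLite shadow table (`_stats_rowgroups`), whereas here `RowGroupCache`, `record` and `lookup` are illustrative names backed by a `std::map`, and each bitmap is modelled as a `std::vector<bool>` with one entry per row group.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One flag per row group: true means "this row group may contain matches".
using RowGroupBitmap = std::vector<bool>;

struct ClauseEntry {
  RowGroupBitmap estimate;  // row groups that survived min/max stats pruning
  RowGroupBitmap actual;    // row groups that really produced matching rows
};

// Hypothetical in-memory stand-in for the `_stats_rowgroups` shadow table.
class RowGroupCache {
 public:
  // Only remember a clause when the stats-based estimate was too generous;
  // if the estimate was already exact, caching it would gain nothing.
  void record(const std::string& clause, const RowGroupBitmap& estimate,
              const RowGroupBitmap& actual) {
    if (estimate != actual) entries_[clause] = ClauseEntry{estimate, actual};
  }

  // Returns the observed row groups for a previously seen clause, if any.
  std::optional<RowGroupBitmap> lookup(const std::string& clause) const {
    auto it = entries_.find(clause);
    if (it == entries_.end()) return std::nullopt;
    return it->second.actual;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::map<std::string, ClauseEntry> entries_;
};
```

Keying on the clause text is what lets drill-down queries reuse earlier work: a later query that repeats `city = 'Dawson Creek'` gets the observed bitmap back instead of the looser stats-based estimate.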
Colin Dellow 1f3ffce560 Row group filtering for BYTE_ARRAY 2018-03-18 15:03:08 -04:00
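A sketch of min/max pruning for a BYTE_ARRAY column, assuming the row group's statistics expose the minimum and maximum values as raw byte strings. `Op`, `ByteArrayStats` and `rowGroupMayMatch` are hypothetical names, not the extension's types. `std::string` comparison orders bytes as `unsigned char`, which matches the unsigned lexicographic order Parquet uses for UTF-8 strings.

```cpp
#include <optional>
#include <string>

// Hypothetical operator set; the extension's real constraint type may differ.
enum class Op { Eq, Lt, Le, Gt, Ge };

struct ByteArrayStats {
  std::string min;  // smallest value in the row group
  std::string max;  // largest value in the row group
};

// Returns false only when the stats prove that no row in the group can
// satisfy `column <op> target`. Missing stats mean the group must be scanned.
bool rowGroupMayMatch(const std::optional<ByteArrayStats>& stats, Op op,
                      const std::string& target) {
  if (!stats) return true;
  const std::string& lo = stats->min;
  const std::string& hi = stats->max;
  switch (op) {
    case Op::Eq: return lo <= target && target <= hi;
    case Op::Lt: return lo < target;
    case Op::Le: return lo <= target;
    case Op::Gt: return hi > target;
    case Op::Ge: return hi >= target;
  }
  return true;
}
```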
Colin Dellow f7f1ed03d1 add row filter for string ==
This gets the census `== 'Dawson Creek'` query down to ~410ms from
~650ms.

That still seems much slower than it should be. Am I accidentally
doing a copy? Now to go learn how to profile C++ code...
2018-03-15 21:37:52 -04:00
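The commit message wonders whether a copy is sneaking in. One possible source of such copies is materializing each row's value as a `std::string` before comparing it. The sketch below compares the bytes in place instead; `ByteArrayView` is an assumed (length, pointer) layout standing in for a decoded BYTE_ARRAY value, and `byteArrayEquals` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Assumed stand-in for a decoded BYTE_ARRAY value: a length plus a pointer
// into the page buffer. No ownership, no copy.
struct ByteArrayView {
  uint32_t len;
  const uint8_t* ptr;
};

// Row-level `column == target` check that compares bytes where they already
// live, rather than constructing a std::string for every row.
bool byteArrayEquals(const ByteArrayView& value, const std::string& target) {
  return value.len == target.size() &&
         (value.len == 0 ||
          std::memcmp(value.ptr, target.data(), value.len) == 0);
}
```

Checking the length first is cheap and rejects most non-matching rows before `memcmp` touches any bytes.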
Colin Dellow 769060dbcb Add stub row group filters for text/int/dbl
Checkpointing to investigate why min/max stats for text aren't
present
2018-03-12 23:07:41 -04:00
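The numeric (int/double) counterpart of min/max pruning can be written once as a template. This is a hypothetical sketch: `CmpOp` and `numericRowGroupMayMatch` are illustrative names, and absent statistics (as with the missing text stats mentioned above) are treated as "cannot prune".

```cpp
#include <optional>

enum class CmpOp { Eq, Lt, Le, Gt, Ge };

// Generic min/max pruning for numeric column types (e.g. INT32, INT64, DOUBLE).
// Returns false only when the stats prove `column <op> target` never holds.
template <typename T>
bool numericRowGroupMayMatch(std::optional<T> minStat, std::optional<T> maxStat,
                             CmpOp op, T target) {
  if (!minStat || !maxStat) return true;  // stats absent: must scan
  switch (op) {
    case CmpOp::Eq: return *minStat <= target && target <= *maxStat;
    case CmpOp::Lt: return *minStat < target;
    case CmpOp::Le: return *minStat <= target;
    case CmpOp::Gt: return *maxStat > target;
    case CmpOp::Ge: return *maxStat >= target;
  }
  return true;
}
```

A conservative placeholder that always returns `true` is also a valid filter: it never skips a row group, so results stay correct and only speed is lost.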
Colin Dellow acc15256ec Add rowgroup filtering for rowid 2018-03-12 20:42:50 -04:00
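Row group filtering on rowid can work from row counts alone, without any column statistics. The sketch assumes rowid is a row's 0-based position in the file, so row group i covers one contiguous range; the extension's actual rowid numbering may differ, and `rowGroupsForRowidRange` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumes rowid == 0-based row position in the file, so row group i covers
// rowids [first_i, first_i + numRows[i]). Returns one flag per row group:
// may it contain a row whose rowid lies in the inclusive range [lo, hi]?
std::vector<bool> rowGroupsForRowidRange(const std::vector<int64_t>& numRows,
                                         int64_t lo, int64_t hi) {
  std::vector<bool> keep(numRows.size(), false);
  int64_t first = 0;
  for (std::size_t i = 0; i < numRows.size(); ++i) {
    int64_t last = first + numRows[i] - 1;  // inclusive end of this group
    keep[i] = numRows[i] > 0 && first <= hi && lo <= last;
    first += numRows[i];
  }
  return keep;
}
```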
Colin Dellow 095b576cc2 Scaffolding for row group filters, tests
rowid is special since its column index is -1, so add
explicit tests around it
2018-03-11 15:44:51 -04:00
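SQLite's `sqlite3_index_info` reports a constraint on the rowid with `iColumn == -1`, so any code that indexes a per-column array with that value must special-case it first. A hypothetical helper, `columnNameFor`, showing the check:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Map SQLite's constraint column index to a column name. -1 means the rowid,
// which has no slot in the per-column array and must not be used as an index.
std::string columnNameFor(int iColumn, const std::vector<std::string>& names) {
  if (iColumn == -1) return "rowid";
  if (iColumn < 0 || static_cast<std::size_t>(iColumn) >= names.size())
    throw std::out_of_range("bad column index");
  return names[static_cast<std::size_t>(iColumn)];
}
```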
Colin Dellow 830053c1fc Scaffolding for in-extension filtering
Supports IS NULL and IS NOT NULL checks
2018-03-11 13:58:10 -04:00
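A minimal sketch of in-extension filtering for IS NULL and IS NOT NULL: the cursor skips rows that fail the checks instead of handing them back to SQLite. Rows are modelled as vectors of `std::optional<long long>` (empty meaning NULL); `NullConstraint`, `rowMatches` and `nextMatch` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

enum class NullCheck { IsNull, IsNotNull };

struct NullConstraint {
  int column;       // index into the row
  NullCheck check;
};

using Row = std::vector<std::optional<long long>>;

// True when the row passes every IS NULL / IS NOT NULL constraint.
bool rowMatches(const Row& row, const std::vector<NullConstraint>& constraints) {
  for (const auto& c : constraints) {
    bool isNull = !row[static_cast<std::size_t>(c.column)].has_value();
    if ((c.check == NullCheck::IsNull) != isNull) return false;
  }
  return true;
}

// Advance from `pos` to the next matching row, or rows.size() if none remain,
// the way a cursor's "next" step might skip rejected rows internally.
std::size_t nextMatch(const std::vector<Row>& rows, std::size_t pos,
                      const std::vector<NullConstraint>& constraints) {
  while (pos < rows.size() && !rowMatches(rows[pos], constraints)) ++pos;
  return pos;
}
```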
Colin Dellow 210f322a1c Code to pretty print constraints 2018-03-10 10:59:53 -05:00
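One possible shape for a constraint pretty-printer, producing text in the same form as the clause examples elsewhere in this log (e.g. `city = 'Dawson Creek'`). `Constraint`, `ConstraintOp`, `quoteSql` and `prettyPrint` are hypothetical; text values are quoted SQL-style, with embedded single quotes doubled.

```cpp
#include <sstream>
#include <string>
#include <variant>

enum class ConstraintOp { Eq, Lt, Le, Gt, Ge, IsNull, IsNotNull };

struct Constraint {
  std::string column;
  ConstraintOp op;
  // No value for IS NULL / IS NOT NULL; otherwise an integer, real or text.
  std::variant<std::monostate, long long, double, std::string> value;
};

// Wrap a text value in single quotes, doubling any quotes inside it.
std::string quoteSql(const std::string& s) {
  std::string out = "'";
  for (char ch : s) {
    if (ch == '\'') out += '\'';
    out += ch;
  }
  out += '\'';
  return out;
}

std::string prettyPrint(const Constraint& c) {
  std::ostringstream out;
  out << c.column;
  switch (c.op) {
    case ConstraintOp::IsNull:    return out.str() + " IS NULL";
    case ConstraintOp::IsNotNull: return out.str() + " IS NOT NULL";
    case ConstraintOp::Eq: out << " = "; break;
    case ConstraintOp::Lt: out << " < "; break;
    case ConstraintOp::Le: out << " <= "; break;
    case ConstraintOp::Gt: out << " > "; break;
    case ConstraintOp::Ge: out << " >= "; break;
  }
  if (const auto* s = std::get_if<std::string>(&c.value)) out << quoteSql(*s);
  else if (const auto* i = std::get_if<long long>(&c.value)) out << *i;
  else if (const auto* d = std::get_if<double>(&c.value)) out << *d;
  return out.str();
}
```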
Colin Dellow 67005623df `ensureColumn` catches up when rows are skipped 2018-03-04 22:29:35 -05:00
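The catching-up idea can be sketched as follows, assuming each column is read lazily and tracks how many rows it has consumed. When rows are rejected without this column being touched, the column falls behind, so before reading it the gap is skipped. `LazyColumn` and `catchUpAndRead` are hypothetical stand-ins, not the extension's `ensureColumn`, and rows are assumed to only move forward.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical lazily-read column that knows its own position.
struct LazyColumn {
  std::vector<long long> values;  // stand-in for the underlying column reader
  int64_t rowsRead = 0;           // rows consumed so far
  long long current = 0;          // value of the last row read

  void skip(int64_t n) { rowsRead += n; }  // advance without decoding
  void readOne() { current = values[static_cast<std::size_t>(rowsRead++)]; }
};

// Ensure `col` holds the value for 0-based `row`, skipping any rows it
// missed while other columns were being used to reject them.
long long catchUpAndRead(LazyColumn& col, int64_t row) {
  if (col.rowsRead == row + 1) return col.current;  // already on this row
  if (col.rowsRead > row) throw std::logic_error("rows must move forward");
  if (col.rowsRead < row) col.skip(row - col.rowsRead);  // catch up
  col.readOne();
  return col.current;
}
```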
Colin Dellow bb3a9440f7 Add query test framework, fix xFilter 2018-03-04 21:05:26 -05:00
Colin Dellow 4c54ab89ae Don't segfault on full table scan 2018-03-04 17:49:19 -05:00
Colin Dellow 7edb5e472f Support BLOBs 2018-03-04 17:20:59 -05:00
Colin Dellow 67b0d96967 float support 2018-03-03 20:57:09 -05:00
Colin Dellow eb0b48f867 Boolean, INT96, INT64 2018-03-03 20:00:50 -05:00
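INT96 is the legacy Parquet timestamp layout written by engines such as Impala and Hive: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. How the extension exposes INT96 values to SQLite is not stated here; the sketch below assumes a conversion to milliseconds since the Unix epoch, and `int96ToUnixMillis` is a hypothetical name.

```cpp
#include <cstdint>

// Decode a 12-byte INT96 timestamp into milliseconds since 1970-01-01.
// Bytes are assembled explicitly, so the result does not depend on host endianness.
int64_t int96ToUnixMillis(const uint8_t bytes[12]) {
  uint64_t nanosOfDay = 0;
  for (int i = 7; i >= 0; --i) nanosOfDay = (nanosOfDay << 8) | bytes[i];
  uint32_t julianDay = 0;
  for (int i = 11; i >= 8; --i) julianDay = (julianDay << 8) | bytes[i];

  const int64_t kJulianDayOfUnixEpoch = 2440588;  // Julian day of 1970-01-01
  const int64_t kMillisPerDay = 86400000;
  return (static_cast<int64_t>(julianDay) - kJulianDayOfUnixEpoch) * kMillisPerDay +
         static_cast<int64_t>(nanosOfDay / 1000000);
}
```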
Colin Dellow 1de843fca8 Very rough first cut
supports int32, double, strings.
2018-03-03 15:44:01 -05:00