9 Commits

Author SHA1 Message Date
Colin Dellow d7c5002cee Move some code out of ensureColumn
Saves ~4% on the cold census needle query (~425ms -> ~405ms)
2018-06-23 19:10:23 -04:00
Colin Dellow d3ab5ff3e7 Cache clauses -> row group mapping
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.

It contains three columns:

- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap

This papers over poorly sorted parquet files, at the cost of some disk
space. It makes interactive queries much more natural -- drilldown style
queries are much faster, as they can leverage work done by previous
queries.

eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual
results.
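
The shadow table described above can be sketched roughly as follows. This is an illustration in Python with the standard `sqlite3` module, not the extension's actual code: the column names (`clause`, `estimate`, `actual`), the bit layout of the bitmaps, and every helper function are assumptions made for the example.

```python
import sqlite3

def to_bitmap(rowgroups, n_rowgroups):
    """Pack row group indices into a bytes bitmap (bit i set = row group i)."""
    bits = bytearray((n_rowgroups + 7) // 8)
    for rg in rowgroups:
        bits[rg // 8] |= 1 << (rg % 8)
    return bytes(bits)

def from_bitmap(bitmap):
    """Unpack a bytes bitmap into a sorted list of row group indices."""
    return [i * 8 + b for i, byte in enumerate(bitmap)
            for b in range(8) if (byte >> b) & 1]

def ensure_shadow_table(db, table):
    # Hypothetical schema: one row per clause, bitmaps stored as BLOBs.
    db.execute(
        f'CREATE TABLE IF NOT EXISTS "_{table}_rowgroups" '
        "(clause TEXT PRIMARY KEY, estimate BLOB, actual BLOB)"
    )

def record(db, table, clause, estimate, actual):
    """Create an entry only when the stats-based estimate was wrong."""
    if estimate == actual:
        return
    db.execute(
        f'INSERT OR REPLACE INTO "_{table}_rowgroups" VALUES (?, ?, ?)',
        (clause, estimate, actual),
    )

def rowgroups_to_scan(db, table, clause, estimate):
    """Prefer the bitmap observed by an earlier query, else the estimate."""
    row = db.execute(
        f'SELECT actual FROM "_{table}_rowgroups" WHERE clause = ?',
        (clause,),
    ).fetchone()
    return row[0] if row else estimate

db = sqlite3.connect(":memory:")
ensure_shadow_table(db, "stats")
clause = "city = 'Dawson Creek'"
estimate = to_bitmap(range(10), 10)  # stats cannot rule out any row group
actual = to_bitmap([3], 10)          # only row group 3 held matching rows
record(db, "stats", clause, estimate, actual)
cached = from_bitmap(rowgroups_to_scan(db, "stats", clause, estimate))
```

In this sketch a repeat of the same clause scans one row group instead of ten, which is the kind of saving that would make drilldown queries cheaper after the first run.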

Fixes #6
2018-03-24 23:57:15 -04:00
Colin Dellow 92ba5f94e0 reuse FileMetaData
For the statscan dataset, parsing the file metadata takes ~30-40ms,
so stash it away for future re-use.
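
As a rough illustration of the idea (not the extension's code), the parsed metadata can be memoized per file path. `MetadataCache` and `fake_parse` are hypothetical names, and the sketch assumes the file does not change while it is cached.

```python
from typing import Callable, Dict

class MetadataCache:
    """Memoize an expensive metadata parse, keyed by file path."""

    def __init__(self, parse: Callable[[str], dict]):
        self._parse = parse
        self._cache: Dict[str, dict] = {}

    def get(self, path: str) -> dict:
        # Parse on first request only; later requests reuse the stored result.
        if path not in self._cache:
            self._cache[path] = self._parse(path)
        return self._cache[path]

calls = []

def fake_parse(path: str) -> dict:
    # Stand-in for the real parse; records each call so reuse is visible.
    calls.append(path)
    return {"path": path, "num_row_groups": 10}

cache = MetadataCache(fake_parse)
first = cache.get("stats.parquet")
second = cache.get("stats.parquet")
```

Here `fake_parse` runs once even though the metadata is requested twice.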
2018-03-15 19:57:38 -04:00
Colin Dellow 824a416f51 better debug logs for xBestIndex 2018-03-08 13:21:33 -05:00
Colin Dellow 67005623df `ensureColumn` catches up when rows are skipped 2018-03-04 22:29:35 -05:00
Colin Dellow 7edb5e472f Support BLOBs 2018-03-04 17:20:59 -05:00
Colin Dellow 18f07f4c43 More defensive, add caveats 2018-03-03 20:30:46 -05:00
Colin Dellow eb0b48f867 Boolean, INT96, INT64 2018-03-03 20:00:50 -05:00
Colin Dellow 1de843fca8 Very rough first cut
supports int32, double, strings.
2018-03-03 15:44:01 -05:00