sqlite-parquet-vtable/parquet
Colin Dellow d3ab5ff3e7 Cache clauses -> row group mapping
Create a shadow table. For a table named `stats`, it'd be `_stats_rowgroups`.

It contains three columns:

- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap
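As a rough illustration of the three columns above, here is a minimal sketch of one cache row and a bitmap encoding. The struct name `RowGroupCacheEntry`, the helpers `packBitmap` / `unpackBitmap`, and the bit layout (bit `i % 8` of byte `i / 8` marks row group `i`) are all assumptions made for this example; the commit message does not say how the bitmaps are actually serialized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory mirror of one row in the shadow table.
struct RowGroupCacheEntry {
  std::string clause;          // e.g. "city = 'Dawson Creek'"
  std::vector<bool> estimate;  // row groups the stats say might match
  std::vector<bool> actual;    // row groups that really contained matches
};

// Pack one bit per row group into bytes, suitable for storing as a BLOB.
// Assumed layout: row group i lives in bit (i % 8) of byte (i / 8).
std::vector<unsigned char> packBitmap(const std::vector<bool>& bits) {
  std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < bits.size(); i++) {
    if (bits[i]) {
      bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
  }
  return bytes;
}

// Inverse of packBitmap. The row group count is passed in because the
// packed form is padded up to a whole number of bytes.
std::vector<bool> unpackBitmap(const std::vector<unsigned char>& bytes,
                               std::size_t numRowGroups) {
  assert(bytes.size() * 8 >= numRowGroups);
  std::vector<bool> bits(numRowGroups, false);
  for (std::size_t i = 0; i < numRowGroups; i++) {
    bits[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
  }
  return bits;
}
```

With this layout, a file with nine row groups needs two bytes per bitmap, so the disk cost per cached clause stays small.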

This papers over poorly sorted Parquet files, at the cost of some disk
space. It makes interactive queries much more natural: drilldown-style
queries are much faster, as they can leverage work done by previous
queries.

eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual
results.
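The write-on-mismatch policy can be sketched as follows. This is a simplified in-memory stand-in, not the code from `parquet_cursor.cc`: the class `ClauseCache` and its methods are hypothetical names, the cache is a `std::map` rather than a SQLite shadow table, and it keeps only the observed bitmap per clause.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using RowGroupBitmap = std::vector<bool>;

// Hypothetical stand-in for the shadow table, keyed by clause text.
class ClauseCache {
 public:
  // Row groups to scan for `clause`: the observed bitmap if a previous
  // query recorded one, otherwise the stats-based estimate.
  RowGroupBitmap rowGroupsToScan(const std::string& clause,
                                 const RowGroupBitmap& estimate) const {
    auto it = observed_.find(clause);
    return it != observed_.end() ? it->second : estimate;
  }

  // Call after a scan has visited every estimated row group.
  // Stores an entry only when the estimate was wrong; returns true if
  // an entry was written.
  bool record(const std::string& clause, const RowGroupBitmap& estimate,
              const RowGroupBitmap& actual) {
    if (estimate == actual) {
      return false;  // the stats were already exact, nothing to cache
    }
    observed_[clause] = actual;
    return true;
  }

  std::size_t size() const { return observed_.size(); }

 private:
  std::map<std::string, RowGroupBitmap> observed_;
};
```

Skipping exact estimates keeps the cache small for well-sorted columns, where the row group statistics already prune as much as possible.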

Fixes #6
2018-03-24 23:57:15 -04:00
.gitignore Initial checkin of CSV table 2018-03-02 18:59:34 -05:00
Makefile Compile w/static linkages for parquet 2018-03-20 19:06:39 -04:00
cmds.txt Code to pretty print constraints 2018-03-10 10:59:53 -05:00
go Code to pretty print constraints 2018-03-10 10:59:53 -05:00
parquet.cc Cache clauses -> row group mapping 2018-03-24 23:57:15 -04:00
parquet_cursor.cc Cache clauses -> row group mapping 2018-03-24 23:57:15 -04:00
parquet_cursor.h Cache clauses -> row group mapping 2018-03-24 23:57:15 -04:00
parquet_filter.cc Cache clauses -> row group mapping 2018-03-24 23:57:15 -04:00
parquet_filter.h Cache clauses -> row group mapping 2018-03-24 23:57:15 -04:00
parquet_table.cc Cache clauses -> row group mapping 2018-03-24 23:57:15 -04:00
parquet_table.h Cache clauses -> row group mapping 2018-03-24 23:57:15 -04:00