mirror of https://github.com/cldellow/sqlite-parquet-vtable.git synced 2025-10-25 01:49:56 +00:00

Cache clauses -> row group mapping

Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.

It contains three columns:

- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap
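As a rough illustration of the layout described above, here is a minimal sketch using Python's built-in `sqlite3` module. The column names (`clause`, `estimate`, `actual`), the BLOB storage, and the one-bit-per-rowgroup encoding are assumptions made for the example; the commit message does not specify them, and the real extension is not written in Python.

```python
import sqlite3

def to_bitmap(rowgroups, num_rowgroups):
    """Pack row group indices into bytes, one bit per row group (assumed encoding)."""
    bits = bytearray((num_rowgroups + 7) // 8)
    for rg in rowgroups:
        bits[rg // 8] |= 1 << (rg % 8)
    return bytes(bits)

def from_bitmap(blob):
    """Return the sorted row group indices whose bit is set."""
    return [i for i in range(len(blob) * 8) if blob[i // 8] & (1 << (i % 8))]

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the shadow table of a virtual table named `stats`.
conn.execute(
    "CREATE TABLE _stats_rowgroups ("
    "clause TEXT PRIMARY KEY, estimate BLOB, actual BLOB)"
)

clause = "city = 'Dawson Creek'"
estimate = to_bitmap(range(10), 10)  # stats could not rule out any of 10 row groups
actual = to_bitmap([3, 7], 10)       # only row groups 3 and 7 held matching rows
conn.execute(
    "INSERT INTO _stats_rowgroups VALUES (?, ?, ?)", (clause, estimate, actual)
)

row = conn.execute(
    "SELECT actual FROM _stats_rowgroups WHERE clause = ?", (clause,)
).fetchone()
matching_rowgroups = from_bitmap(row[0])
```

With this encoding a later query carrying the same clause would only need to read the row groups listed in `matching_rowgroups`.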

This papers over poorly sorted parquet files, at the cost of some disk
space. It makes interactive queries much more natural -- drilldown style
queries are much faster, as they can leverage work done by previous
queries.

eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual
results.
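The "only store mismatches" rule can be sketched as follows. This is an illustration under assumptions, not the extension's actual code: the helper names `remember` and `rowgroups_to_scan`, the column names, and the lookup keyed on the clause text alone are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical shadow table; column names are assumed for this sketch.
conn.execute(
    "CREATE TABLE _stats_rowgroups ("
    "clause TEXT PRIMARY KEY, estimate BLOB, actual BLOB)"
)

def remember(conn, clause, estimate, actual):
    """Store a row only when the stats-based estimate was wrong."""
    if estimate == actual:
        return False  # the estimate was already exact, so nothing is cached
    conn.execute(
        "INSERT OR REPLACE INTO _stats_rowgroups VALUES (?, ?, ?)",
        (clause, estimate, actual),
    )
    return True

def rowgroups_to_scan(conn, clause, estimate):
    """Prefer the observed bitmap if one is cached, else fall back to the estimate."""
    row = conn.execute(
        "SELECT actual FROM _stats_rowgroups WHERE clause = ?", (clause,)
    ).fetchone()
    return row[0] if row else estimate

# Estimate said all 8 row groups might match; only row group 3 actually did.
stored_city = remember(conn, "city = 'Dawson Creek'", b"\xff", b"\x08")
# Estimate and observation agree, so no entry is created.
stored_qid = remember(conn, "question_id >= 1935", b"\x0f", b"\x0f")

city_scan = rowgroups_to_scan(conn, "city = 'Dawson Creek'", b"\xff")
qid_scan = rowgroups_to_scan(conn, "question_id >= 1935", b"\x0f")
cached_rows = conn.execute("SELECT COUNT(*) FROM _stats_rowgroups").fetchone()[0]
```

Skipping exact estimates keeps the shadow table small: well-sorted columns never add rows, and only clauses where the stats were too pessimistic take up disk space.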

Fixes #6
Colin Dellow
2018-03-24 23:51:15 -04:00
parent d2c736f25a
commit d3ab5ff3e7
9 changed files with 397 additions and 63 deletions


@@ -47,7 +47,7 @@ main() {
 fi
 cat "$root"/parquet-generator/*.sql > "$root"/testcase-bootstrap.sql
-rm test.db
+rm -f test.db
 "$root"/sqlite/sqlite3 test.db -init "$root"/testcase-bootstrap.sql < /dev/null
 if [ ! -v NO_DEBUG ] && [ "$(cat testcases.txt | wc -l)" == "1" ]; then
 set -x