mirror of
https://github.com/cldellow/sqlite-parquet-vtable.git
synced 2025-09-18 22:59:58 +00:00
Cache clauses -> row group mapping
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.

It contains three columns:
- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap

This papers over poorly sorted parquet files, at the cost of some disk space. It makes interactive queries much more natural -- drilldown style queries are much faster, as they can leverage work done by previous queries.

eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940` takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual results.

Fixes #6
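The shadow table described in this message could be sketched roughly as follows. This is an illustrative Python sketch using the standard `sqlite3` module, not the extension's own code: the column names (`clause`, `estimate`, `actual`), the bit layout of the bitmaps, and the helpers `to_bitmap`, `from_bitmap`, `remember`, and `lookup` are all assumptions made for the example.

```python
import sqlite3

def to_bitmap(row_groups, total):
    """Pack row group indices into bytes, one bit per row group (assumed layout)."""
    bits = bytearray((total + 7) // 8)
    for rg in row_groups:
        bits[rg // 8] |= 1 << (rg % 8)
    return bytes(bits)

def from_bitmap(blob):
    """Unpack a bitmap into a sorted list of row group indices."""
    return [i * 8 + b for i, byte in enumerate(blob) for b in range(8) if (byte >> b) & 1]

def remember(db, clause, estimate, actual, total):
    """Record a clause only when the estimate differs from the observed row groups."""
    if set(estimate) == set(actual):
        return False
    db.execute(
        "INSERT OR REPLACE INTO _stats_rowgroups VALUES (?, ?, ?)",
        (clause, to_bitmap(estimate, total), to_bitmap(actual, total)),
    )
    return True

def lookup(db, clause):
    """Return the observed row groups for a clause, or None if nothing is cached."""
    row = db.execute(
        "SELECT actual FROM _stats_rowgroups WHERE clause = ?", (clause,)
    ).fetchone()
    return from_bitmap(row[0]) if row else None

# Hypothetical schema: one row per clause, both bitmaps stored as BLOBs.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE _stats_rowgroups (clause TEXT PRIMARY KEY, estimate BLOB, actual BLOB)"
)
```

In this sketch, a later query containing the same clause would call `lookup` first and fall back to the statistics-based estimate only when it returns `None`.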
12
README.md
@@ -72,6 +72,18 @@ constraints before returning control to SQLite's virtual machine. This minimizes
the number of allocations performed when many rows are filtered out by
the user's criteria.

### Memoized slices

Individual clauses are mapped to the row groups they match.

For example, going by row group statistics, which store minimum and maximum values, a clause
like `WHERE city = 'Dawson Creek'` may match 80% of row groups.

In reality, it may only be present in one or two row groups.

This is recorded in a shadow table so future queries that contain that clause
can read only the necessary row groups.
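The gap between the statistics-based estimate and the row groups that actually contain a value can be illustrated with a small Python sketch. The row group contents below are made up, and `estimate`, `observe`, `row_groups_for`, and the in-memory `memo` dictionary (standing in for the shadow table) are hypothetical names chosen for the example, not part of the extension.

```python
# Hypothetical row groups: (min city, max city, cities actually present).
ROW_GROUPS = [
    ("Abbotsford", "Vancouver", {"Abbotsford", "Kelowna", "Vancouver"}),
    ("Burnaby", "Surrey", {"Burnaby", "Dawson Creek", "Surrey"}),
    ("Calgary", "Whistler", {"Calgary", "Nanaimo", "Whistler"}),
    ("Victoria", "Yellowknife", {"Victoria", "Yellowknife"}),
]

def estimate(value):
    """Row groups whose min/max range could contain the value."""
    return [i for i, (lo, hi, _) in enumerate(ROW_GROUPS) if lo <= value <= hi]

def observe(value):
    """Row groups that really contain the value (found by scanning them)."""
    return [i for i, (_, _, present) in enumerate(ROW_GROUPS) if value in present]

memo = {}  # stand-in for the shadow table: clause -> observed row groups

def row_groups_for(value):
    """Return the row groups a query must read for `city = value`."""
    clause = f"city = '{value}'"
    if clause in memo:
        return memo[clause]  # later runs read only the observed row groups
    est, act = estimate(value), observe(value)
    if est != act:
        memo[clause] = act  # memoize only when the estimate was wrong
    return est  # the first run still has to read every estimated row group
```

With this data, the first call to `row_groups_for("Dawson Creek")` reads three of the four row groups, while a second call reads only the one that contains the city.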
### Types

These Parquet types are supported: