`./make-linux` clones and builds:
- arrow
- brotli
- lz4
- parquet
- snappy
- zlib
- zstd
- this project
as a statically linked binary. Two Boost libs are still pulled in as
shared libs, should probably fix that, too, for ultimate portability.
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.
It contains three columns:
- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap
This papers over poorly sorted parquet files, at the cost of some disk
space. It makes interactive queries much more natural -- drilldown style
queries are much faster, as they can leverage work done by previous
queries.
eg 'SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.
We only create entries when the estimates don't match the actual
results.
Fixes#6
Very simplistics - select M fields, filters on N fields, slight bias to
use values of same type of the field it's comparing against.
No segfaults yet, but one test case that generates differing output when
run against `nulls` and `nulls1`:
```
select rowid from nulls1 where binary_9 >= '56' and ts_5 < 496886400000;
```
Regularize the parquets - nulls and nonulls each come in 3 variants,
with 1, 10 and 99 rows per rowgroup.
All test queries are written against nullsA, no_nullsA.
Next commit will introduce a tool to expand these template queries to
go against the actual tables.
- define `datetime`, `printf` fns in pg so it produces similar
output as sqlite
- tidy up input data to be less wide
To do: some fns to make it easy to generate a new test case. Probably
want to mount all the 3 parquets simultaneously and refer to the
sqlite table by the same name as the pg table.
- "fixed_size_binary" -> "binary_10"
- make null parquet use rowgroups of sie 10: first rowgroup
has no nulls, 2nd has all null, 3rd-10th have alternating
nulls
This is prep for making a Postgres layer to use as an oracle
for generating test cases so that we have good coverage before
implementing advanced `xBestIndex` and `xFilter` modes.