16 Commits

Author SHA1 Message Date
Colin Dellow 0bdcc9895e All-in-one build command
`./make-linux` clones and builds:

- arrow
- brotli
- lz4
- parquet
- snappy
- zlib
- zstd
- this project

as a statically linked binary. Two Boost libs are still pulled in as
shared libs; that should probably be fixed too, for maximum portability.
2018-06-24 21:11:07 -04:00
Colin Dellow 84a22e6e77 link to blog 2018-06-24 11:39:44 -04:00
Colin Dellow fd87c44ccd Add link to csv2parquet 2018-06-23 23:58:13 -04:00
Colin Dellow 596496c9cb rejig README 2018-03-25 00:07:56 -04:00
Colin Dellow d3ab5ff3e7 Cache clauses -> row group mapping
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.

It contains three columns:

- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap

This papers over poorly sorted parquet files, at the cost of some disk
space. It makes interactive queries much more natural -- drilldown style
queries are much faster, as they can leverage work done by previous
queries.

eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual
results.

Fixes #6
2018-03-24 23:57:15 -04:00
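
The clause cache described in commit d3ab5ff3e7 can be illustrated with a small sketch. This is not the extension's C++ code: it is a toy Python model, assuming equality clauses only, bitmaps stored as plain integers, and hypothetical column names (`clause`, `estimate`, `actual`) for the shadow table. The row groups are deliberately unsorted so that min/max statistics cannot rule any of them out.

```python
import sqlite3

# Toy stand-in for a poorly sorted parquet file: three row groups whose
# min/max ranges for `city` all overlap. The data is made up.
ROW_GROUPS = [
    [{"city": "Abbotsford"}, {"city": "Zeballos"}],
    [{"city": "Burnaby"}, {"city": "Dawson Creek"}, {"city": "Vernon"}],
    [{"city": "Comox"}, {"city": "Whistler"}],
]


def estimate_bitmap(column, value):
    """Bit i is set when row group i's min/max range could contain value."""
    bits = 0
    for i, group in enumerate(ROW_GROUPS):
        values = [row[column] for row in group]
        if min(values) <= value <= max(values):
            bits |= 1 << i
    return bits


def scan_bitmap(column, value, candidates):
    """Scan only candidate row groups; bit i is set when group i really matches."""
    bits = 0
    for i, group in enumerate(ROW_GROUPS):
        is_candidate = (candidates >> i) & 1
        if is_candidate and any(row[column] == value for row in group):
            bits |= 1 << i
    return bits


def open_cache():
    """Create the shadow table (column names are assumptions for this sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE _stats_rowgroups ("
        "clause TEXT PRIMARY KEY, estimate INTEGER, actual INTEGER)"
    )
    return db


def rowgroups_for(db, column, value):
    """Return (bitmap of row groups to read, whether it came from the cache)."""
    clause = f"{column} = '{value}'"
    hit = db.execute(
        "SELECT actual FROM _stats_rowgroups WHERE clause = ?", (clause,)
    ).fetchone()
    if hit is not None:
        return hit[0], True
    estimate = estimate_bitmap(column, value)
    actual = scan_bitmap(column, value, estimate)
    # Only store an entry when the stats-based estimate was wrong.
    if actual != estimate:
        db.execute(
            "INSERT INTO _stats_rowgroups VALUES (?, ?, ?)",
            (clause, estimate, actual),
        )
    return actual, False
```

In this model, the first call to `rowgroups_for(db, "city", "Dawson Creek")` has to scan all three row groups, while a repeat call reads the observed bitmap from `_stats_rowgroups` and would only need to touch the one row group that actually matched.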
Colin Dellow cafd087113 Update README 2018-03-24 12:49:03 -04:00
Colin Dellow 045e17da34 Note about 64-bit sqlite 2018-03-18 18:25:08 -04:00
Colin Dellow d430a45e41 Update README 2018-03-18 15:08:02 -04:00
Colin Dellow 01e8ffaba7 Row group filtering for double/float 2018-03-16 16:30:05 -04:00
Colin Dellow e87f0d0f68 Note about versions 2018-03-16 00:19:25 -04:00
Colin Dellow 7edb5e472f Support BLOBs 2018-03-04 17:20:59 -05:00
Colin Dellow f3e78408bf Update demo to use checked in parquet 2018-03-04 13:06:50 -05:00
Colin Dellow aea9469bff tweak wording 2018-03-04 13:04:58 -05:00
Colin Dellow 18f07f4c43 More defensive, add caveats 2018-03-03 20:30:46 -05:00
Colin Dellow 552da5a647 Initial checkin of CSV table
parquet.cc is a fork of the sample CSV virtual table at
https://www.sqlite.org/src/artifact?ci=trunk&filename=ext/misc/csv.c

So far the only changes are those needed to make it compile cleanly in
C++11 mode.
2018-03-02 18:59:34 -05:00
Colin Dellow 8b9b3bcc9d Initial commit 2018-03-02 18:37:08 -05:00