We can avoid eagerly computing bitmasks for the other constraints this way.
Possible future work: order the constraints so that we evaluate the
one that is cheapest/most likely to prune a row group first.
This reduces the cyclist query from ~65ms to ~60ms.
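The lazy evaluation and proposed constraint ordering could be sketched like this; the names and cost/selectivity fields are illustrative assumptions, not the extension's actual C++ structures:

```python
def order_constraints(constraints):
    """Sort constraints so the cheapest / most-likely-to-prune run first."""
    # Lower selectivity = prunes more row groups; lower cost = cheaper to check.
    return sorted(constraints, key=lambda c: (c["cost"], c["selectivity"]))

def prune_row_groups(row_groups, constraints):
    """Lazily intersect per-constraint row-group sets, stopping early if empty."""
    surviving = set(row_groups)
    for c in order_constraints(constraints):
        surviving = {rg for rg in surviving if c["matches"](rg)}
        if not surviving:
            break  # nothing left to scan; skip the remaining constraints
    return surviving

constraints = [
    {"cost": 2, "selectivity": 0.9, "matches": lambda rg: rg % 2 == 0},
    {"cost": 1, "selectivity": 0.1, "matches": lambda rg: rg < 3},
]
print(sorted(prune_row_groups(range(10), constraints)))  # [0, 2]
```

Because the cheap, highly selective constraint runs first, the expensive one is only evaluated against the three surviving row groups.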
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.
It contains three columns:
- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap
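A minimal sketch of the shadow table's shape, using Python's sqlite3 for illustration; the column names and types here are assumptions, not the extension's actual DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE _stats_rowgroups (
        clause TEXT,    -- eg "city = 'Dawson Creek'"
        estimate BLOB,  -- bitmap of row groups, estimated from parquet stats
        actual BLOB     -- bitmap of row groups actually observed to match
    )
""")
conn.execute(
    "INSERT INTO _stats_rowgroups VALUES (?, ?, ?)",
    ("city = 'Dawson Creek'", b"\xff\x0f", b"\x03\x00"),
)
print(conn.execute("SELECT clause FROM _stats_rowgroups").fetchone()[0])
```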
This papers over poorly sorted parquet files, at the cost of some disk
space. It makes interactive queries much more natural -- drilldown style
queries are much faster, as they can leverage work done by previous
queries.
eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.
We only create entries when the estimates don't match the actual
results.
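The write-only-on-mismatch rule could be sketched like this (a hypothetical helper with an assumed schema; the real logic lives in the extension itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _stats_rowgroups (clause TEXT, estimate BLOB, actual BLOB)"
)

def record_observation(conn, clause, estimate, actual):
    """Persist a row only when the stats-based estimate was wrong."""
    if estimate != actual:
        conn.execute(
            "INSERT INTO _stats_rowgroups VALUES (?, ?, ?)",
            (clause, estimate, actual),
        )

# Estimate matched reality: no entry needed.
record_observation(conn, "a = 1", b"\x01", b"\x01")
# Estimate was too broad: cache the observed bitmap for future queries.
record_observation(conn, "b = 2", b"\xff", b"\x03")
print(conn.execute("SELECT COUNT(*) FROM _stats_rowgroups").fetchone()[0])  # 1
```

Skipping exact matches keeps the shadow table small: it only spends disk space where the parquet stats were actually misleading.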
Fixes #6
- "fixed_size_binary" -> "binary_10"
- make null parquet use rowgroups of size 10: first rowgroup
  has no nulls, 2nd has all nulls, 3rd-10th have alternating
  nulls
This is prep for making a Postgres layer to use as an oracle
for generating test cases so that we have good coverage before
implementing advanced `xBestIndex` and `xFilter` modes.