42 Commits

Author SHA1 Message Date
Colin Dellow 25709ae098 exercise xDestroy 2018-07-05 20:03:30 -04:00
Colin Dellow 45ecf24a75 Support float32
Fixes #32
2018-07-05 19:48:14 -04:00
Colin Dellow fd06ec5a23 test `rowid IS NULL`
Found via coverage
2018-07-05 19:17:19 -04:00
Colin Dellow ebb0eb7710 Add test case for #30 2018-07-04 19:59:55 -04:00
Colin Dellow 2e1ac92882 Revert "Add other random test case"
This reverts commit 3bdc6f7078.
2018-07-04 19:49:42 -04:00
Colin Dellow 3bdc6f7078 Add other random test case 2018-07-04 19:48:36 -04:00
Colin Dellow 33f8dbe4f4 Add test case for #26 2018-07-04 19:45:08 -04:00
Colin Dellow 5b26a78c1f Improve random query generation
...throw in the occasional `NOT`
2018-07-04 19:14:35 -04:00
Colin Dellow 0bdcc9895e All-in-one build command
`./make-linux` clones and builds:

- arrow
- brotli
- lz4
- parquet
- snappy
- zlib
- zstd
- this project

as a statically linked binary. Two Boost libs are still pulled in as
shared libs; that should probably be fixed too, for ultimate portability.
2018-06-24 21:11:07 -04:00
Colin Dellow d3ab5ff3e7 Cache clauses -> row group mapping
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.

It contains three columns:

- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap

This papers over poorly sorted parquet files, at the cost of some disk
space. It makes interactive queries much more natural -- drilldown style
queries are much faster, as they can leverage work done by previous
queries.

eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940`
takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual
results.

Fixes #6
2018-03-24 23:57:15 -04:00
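The shadow table described in the commit above could be sketched roughly as follows. This is an illustration in Python, not the project's implementation: the column names (`clause`, `estimate`, `actual`), the helper names, and the little-endian blob encoding of the bitmaps are all assumptions; only the `_stats_rowgroups` naming pattern and the three-column layout come from the commit message.

```python
import sqlite3

def bitmap(rowgroups):
    """Pack row group indexes into an integer bitmap (bit i set = row group i)."""
    bits = 0
    for rg in rowgroups:
        bits |= 1 << rg
    return bits

def rowgroups_from(bits):
    """Unpack an integer bitmap into a sorted list of row group indexes."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

def to_blob(bits):
    # Hypothetical encoding: little-endian bytes, at least one byte long.
    return bits.to_bytes(max(1, (bits.bit_length() + 7) // 8), "little")

def open_cache(conn, table="stats"):
    # Shadow table name follows the `_<table>_rowgroups` pattern; columns are assumed.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS _{table}_rowgroups ("
        "clause TEXT PRIMARY KEY, estimate BLOB, actual BLOB)"
    )

def record(conn, clause, estimate, actual, table="stats"):
    # Entries are only created when the stats-based estimate was wrong.
    if estimate == actual:
        return
    conn.execute(
        f"INSERT OR REPLACE INTO _{table}_rowgroups VALUES (?, ?, ?)",
        (clause, to_blob(estimate), to_blob(actual)),
    )

def lookup(conn, clause, estimate, table="stats"):
    # Prefer the observed row groups from an earlier query, else the estimate.
    row = conn.execute(
        f"SELECT actual FROM _{table}_rowgroups WHERE clause = ?", (clause,)
    ).fetchone()
    return int.from_bytes(row[0], "little") if row else estimate
```

A later query with the same clause would then read only the row groups returned by `lookup`, which is where the drilldown speedup comes from.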
Colin Dellow d2c736f25a Add LIMIT/OFFSET to random queries 2018-03-24 19:02:30 -04:00
Colin Dellow 51d0f27a68 don't segfault on low memory
Fixes #8
2018-03-24 12:48:29 -04:00
Colin Dellow 6fa7bc3d0b Add harness for low memory testing 2018-03-24 11:27:06 -04:00
Colin Dellow 8bf890ab66 Fix incorrect row pruning for non-text BYTE_ARRAY 2018-03-18 19:43:09 -04:00
Colin Dellow 893e4c63f5 Add testcase generator
Very simplistic - selects M fields, filters on N fields, with a slight bias to
use values of the same type as the field it's comparing against.

No segfaults yet, but one test case that generates differing output when
run against `nulls` and `nulls1`:

```
select rowid from nulls1 where binary_9 >= '56' and ts_5 < 496886400000;
```
2018-03-18 19:11:26 -04:00
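A generator along the lines of the commit above might look like the sketch below. The schema, the operator list, the `same_type_bias` parameter and the function name are hypothetical, chosen only to illustrate "M fields, N filters, biased toward same-type values"; the project's actual generator may differ.

```python
import random

# Hypothetical schema: column name -> sample values written as SQL literals.
COLUMNS = {
    "int_col": ["0", "1", "50"],
    "text_col": ["'0'", "'abc'", "'56'"],
    "ts_col": ["0", "496886400000"],
}
OPS = ["=", "<>", "<", "<=", ">", ">="]

def random_query(rng, table, m, n, same_type_bias=0.8):
    """Select m random fields and filter on n random fields."""
    fields = rng.sample(sorted(COLUMNS), m)
    clauses = []
    for col in rng.sample(sorted(COLUMNS), n):
        if rng.random() < same_type_bias:
            # Usually compare against a value of the column's own type...
            pool = COLUMNS[col]
        else:
            # ...but sometimes against any value, to exercise type coercion.
            pool = [v for values in COLUMNS.values() for v in values]
        clauses.append(f"{col} {rng.choice(OPS)} {rng.choice(pool)}")
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT {', '.join(fields)} FROM {table}{where};"
```

Passing a seeded `random.Random` keeps a failing query reproducible.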
Colin Dellow b0c7b229dd Create queries from templates if needed 2018-03-18 17:50:39 -04:00
Colin Dellow 7f2042742b Also compare queries against SQLite itself 2018-03-18 17:49:12 -04:00
Colin Dellow e2af2a07a4 Make rowid start from 1, not 0
Unclear whether this is strictly required, but I'm going to start using
SQLite as an oracle, and it'll be simpler if our rowids match theirs.
2018-03-18 17:03:46 -04:00
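For context on the oracle comparison, an ordinary SQLite table assigns rowid 1 to the first row inserted into an empty table. A quick check using Python's built-in `sqlite3` module (the table and column names are made up for the example):

```python
import sqlite3

def first_rowids(values):
    """Insert values into a fresh SQLite table and return the rowids assigned."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v TEXT)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", [(v,) for v in values])
    return [row[0] for row in conn.execute("SELECT rowid FROM t ORDER BY rowid")]
```

Calling `first_rowids(["a", "b", "c"])` returns `[1, 2, 3]`, so a virtual table whose rowids start at 1 can be compared row for row against a plain SQLite copy of the same data.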
Colin Dellow 078754467e Generate queries from templates
Huzzah, a bunch of failures have appeared.
2018-03-18 14:28:31 -04:00
Colin Dellow e3f0dff083 Move queries/* to templates 2018-03-18 13:28:56 -04:00
Colin Dellow 65ea1b2f61 Rewrite tests for automatic generation
Regularize the parquets - nulls and nonulls each come in 3 variants,
with 1, 10 and 99 rows per rowgroup.

All test queries are written against nullsA, no_nullsA.

Next commit will introduce a tool to expand these template queries to
go against the actual tables.
2018-03-18 13:11:29 -04:00
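The template expansion mentioned above could work roughly like this sketch. The concrete table names it produces (`nulls1`, `nulls10`, `nulls99`, and the `no_nulls` equivalents) are an assumption based on the three rowgroup sizes; the real tool and its naming may differ.

```python
import re

# Matches the placeholder tables `nullsA` and `no_nullsA` as whole words.
TEMPLATE_TABLE = re.compile(r"\b(no_nulls|nulls)A\b")

def expand(template, suffixes=("1", "10", "99")):
    """Return one concrete query per rowgroup-size variant (assumed suffixes)."""
    return [
        TEMPLATE_TABLE.sub(lambda m: m.group(1) + suffix, template)
        for suffix in suffixes
    ]
```

Each template then yields three queries that should all return the same rows, whatever the rowgroup size.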
Colin Dellow 3b557f7fb0 Add explicit test for file not found
...caching the metadata moved where ParquetTable did I/O,
which introduced a segfault on not found
2018-03-18 11:58:23 -04:00
Colin Dellow a3af16eb54 Row-filtering for other string ops 2018-03-17 15:28:51 -04:00
Colin Dellow 753a490687 Tests for blobs 2018-03-16 23:53:08 -04:00
Colin Dellow cbf388698b BOOL and INT96 tests 2018-03-16 16:02:11 -04:00
Colin Dellow 110e3e3668 row group skipping for is [not] null queries 2018-03-12 21:09:00 -04:00
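A common way to skip row groups for such queries is to consult each row group's null count from the Parquet statistics. The sketch below shows that general rule; it is an assumption that the commit above does something equivalent, and the function name and parameters are hypothetical.

```python
def can_skip(op, null_count, num_rows):
    """Decide whether a row group cannot contain any matching row."""
    if op == "IS NULL":
        # No nulls in this row group, so nothing can match.
        return null_count == 0
    if op == "IS NOT NULL":
        # Every value is null, so nothing can match.
        return null_count == num_rows
    raise ValueError(f"unsupported operator: {op}")
```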
Colin Dellow acc15256ec Add rowgroup filtering for rowid 2018-03-12 20:42:50 -04:00
Colin Dellow 1f938a005d More test cases to deal with affinity
I'm not sure how these manifest - whether SQLite retypes them based on
column affinity before we see them, or whether they're provided as is.
2018-03-11 19:18:44 -04:00
Colin Dellow 095b576cc2 Scaffolding for row group filters, tests
rowid is special since its column index is -1, so add
explicit tests around it
2018-03-11 15:44:51 -04:00
Colin Dellow 5559a7b563 Fix when last rowgroup is not same size as first
...change test data to use 99 rows, so that when we have
rowgroup size 10 we exercise this code.
2018-03-11 15:15:27 -04:00
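To see why 99 rows exercise this case: with a rowgroup size of 10 there are nine full row groups and a final one of 9 rows. A sketch of locating a row without assuming every row group matches the first (0-based row indexes and the function names are choices made for the example, not taken from the project):

```python
from bisect import bisect_right
from itertools import accumulate

def rowgroup_sizes(num_rows, rowgroup_size):
    """Sizes of each row group when rows are written in fixed-size chunks."""
    full, rest = divmod(num_rows, rowgroup_size)
    return [rowgroup_size] * full + ([rest] if rest else [])

def locate(row_index, sizes):
    """Map a 0-based row index to (row group index, offset within it)."""
    ends = list(accumulate(sizes))  # exclusive end offset of each row group
    if not sizes or not 0 <= row_index < ends[-1]:
        raise IndexError(row_index)
    rg = bisect_right(ends, row_index)
    return rg, row_index - (ends[rg] - sizes[rg])
```

Using the per-row-group sizes rather than a single size keeps the lookup correct for the short final row group.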
Colin Dellow d28ae86d15 Test unusable constraints 2018-03-10 13:38:34 -05:00
Colin Dellow 96fcafcd2f Add test cases 2018-03-10 13:25:13 -05:00
Colin Dellow b7c134efc0 test-queries: can debug a testcase
`tests/test-queries regex` filters the test cases.

If the resulting set has only one test case, run it under gdb.
2018-03-10 11:54:36 -05:00
Colin Dellow 2d616c54fb More tests 2018-03-07 20:30:25 -05:00
Colin Dellow 35fcde926c Rewrite SQL oracle harness 2018-03-07 20:20:34 -05:00
Colin Dellow caefc23b1e Add a pg oracle
- define `datetime`, `printf` fns in pg so it produces output
  similar to sqlite's

- tidy up input data to be less wide

To do: some fns to make it easy to generate a new test case. Probably
want to mount all the 3 parquets simultaneously and refer to the
sqlite table by the same name as the pg table.
2018-03-07 19:40:38 -05:00
Colin Dellow 0d4806ca6f Rejig parquet generation
- "fixed_size_binary" -> "binary_10"
- make null parquet use rowgroups of size 10: first rowgroup
  has no nulls, 2nd has all null, 3rd-10th have alternating
  nulls

This is prep for making a Postgres layer to use as an oracle
for generating test cases so that we have good coverage before
implementing advanced `xBestIndex` and `xFilter` modes.
2018-03-06 21:02:26 -05:00
Colin Dellow 56245c1d3d test case for nulls 2018-03-04 22:48:39 -05:00
Colin Dellow 67005623df `ensureColumn` catches up when rows are skipped 2018-03-04 22:29:35 -05:00
Colin Dellow bb3a9440f7 Add query test framework, fix xFilter 2018-03-04 21:05:26 -05:00
Colin Dellow 7edb5e472f Support BLOBs 2018-03-04 17:20:59 -05:00
Colin Dellow a4f368af9c Add tests for unsupported types 2018-03-04 13:02:42 -05:00