Colin Dellow 
							
						 
					 
					
						
						
							
						
						d2c736f25a 
					 
					
						
						
							
							Add LIMIT/OFFSET to random queries  
						
						 
						
						
						
						
					 
					
						2018-03-24 19:02:30 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						cafd087113 
					 
					
						
						
							
							Update README  
						
						 
						
						
						
						
					 
					
						2018-03-24 12:49:03 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						51d0f27a68 
					 
					
						
						
							
							don't segfault on low memory  
						
						 
						
						
						
						
						Fixes #8
						
						
					 
					
						2018-03-24 12:48:29 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						6fa7bc3d0b 
					 
					
						
						
							
							Add harness for low memory testing  
						
						 
						
						
						
						
					 
					
						2018-03-24 11:27:06 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						599430b2f4 
					 
					
						
						
							
							Add #ifdefs around printfs  
						
						 
						
						
						
						
					 
					
						2018-03-20 19:57:12 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						5480de7fb6 
					 
					
						
						
							
							Compile w/static linkages for parquet  
						
						 
						
						
						
						
						Fixes #4. A stock Ubuntu 14.04 can now install sqlite3:amd64 and
libboost-all-dev, then use this module to read the test parquet file. 
						
						
					 
					
						2018-03-20 19:06:39 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						8bf890ab66 
					 
					
						
						
							
							Fix incorrect row pruning for non-text BYTE_ARRAY  
						
						 
						
						
						
						
					 
					
						2018-03-18 19:43:09 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						893e4c63f5 
					 
					
						
						
							
							Add testcase generator  
						
						 
						
						
						
						
						Very simplistic: selects M fields, filters on N fields, with a slight
bias toward using values of the same type as the field being compared against.
No segfaults yet, but one test case generates differing output when
run against `nulls` and `nulls1`:
```
select rowid from nulls1 where binary_9 >= '56' and ts_5 < 496886400000;
``` 
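
A minimal sketch of what such a generator might look like, written in Python for illustration. The schema, literal pools, operator list, and the 0.8 bias are all assumptions; only `binary_9`, `ts_5`, and `nulls1` come from the query above.

```python
import random

# Hypothetical schema: column name -> type tag. Only binary_9 and ts_5 appear
# in the log; the other column names are invented for illustration.
COLUMNS = {"binary_9": "text", "ts_5": "int", "int32_2": "int", "string_7": "text"}

# Hypothetical literal pools, one per type tag.
LITERALS = {"text": ["'56'", "'abc'", "''"], "int": ["0", "42", "496886400000"]}

OPS = ["=", "<>", "<", "<=", ">", ">="]


def random_query(table, rng, m=2, n=2, same_type_bias=0.8):
    """Select m random fields and filter on n random fields."""
    fields = rng.sample(sorted(COLUMNS), m)
    clauses = []
    for col in rng.sample(sorted(COLUMNS), n):
        # Usually compare against a literal of the column's own type,
        # but sometimes pick any type to exercise type-mismatch paths.
        if rng.random() < same_type_bias:
            kind = COLUMNS[col]
        else:
            kind = rng.choice(sorted(LITERALS))
        clauses.append(f"{col} {rng.choice(OPS)} {rng.choice(LITERALS[kind])}")
    return f"select {', '.join(fields)} from {table} where {' and '.join(clauses)};"


rng = random.Random(0)  # fixed seed so a failing query can be regenerated
for _ in range(3):
    print(random_query("nulls1", rng))
```

Seeding the generator is a design choice made here so that any query that exposes a difference can be reproduced later.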
						
						
					 
					
						2018-03-18 19:11:26 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						045e17da34 
					 
					
						
						
							
							Note about 64-bit sqlite  
						
						 
						
						
						
						
					 
					
						2018-03-18 18:25:08 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						b0c7b229dd 
					 
					
						
						
							
							Create queries from templates if needed  
						
						 
						
						
						
						
					 
					
						2018-03-18 17:50:39 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						7f2042742b 
					 
					
						
						
							
							Also compare queries against SQLite itself  
						
						 
						
						
						
						
					 
					
						2018-03-18 17:49:12 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						e2af2a07a4 
					 
					
						
						
							
							Make rowid start from 1, not 0  
						
						 
						
						
						
						
						Unclear whether this is strictly required, but I'm going to start using
SQLite as an oracle, and it'll be simpler if our rowids match theirs. 
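
The oracle idea can be sketched as follows, assuming the same data is loaded into an ordinary SQLite table and into the table under test. Here both are plain in-memory tables with hypothetical names; in the real setup the second one would be the parquet-backed virtual table. The helper name `compare_with_oracle` is also hypothetical.

```python
import sqlite3


def compare_with_oracle(conn, template, oracle_table, test_table):
    """Run one query template against both tables and report whether rows match."""
    expected = conn.execute(template.format(table=oracle_table)).fetchall()
    actual = conn.execute(template.format(table=test_table)).fetchall()
    return expected == actual, expected, actual


conn = sqlite3.connect(":memory:")
# Stand-in tables: test_t plays the role of the virtual table.
conn.execute("create table oracle_t (x integer)")
conn.execute("create table test_t (x integer)")
rows = [(10,), (20,), (30,)]
conn.executemany("insert into oracle_t values (?)", rows)
conn.executemany("insert into test_t values (?)", rows)

ok, expected, actual = compare_with_oracle(
    conn, "select rowid, x from {table} where x >= 20 order by rowid",
    "oracle_t", "test_t")
print(ok, expected)  # True [(2, 20), (3, 30)]
```

Ordinary SQLite tables assign rowids starting at 1, so a query that selects or filters on `rowid` only compares cleanly if the table under test numbers its rows the same way.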
						
						
					 
					
						2018-03-18 17:03:46 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						d430a45e41 
					 
					
						
						
							
							Update README  
						
						 
						
						
						
						
					 
					
						2018-03-18 15:08:02 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						1f3ffce560 
					 
					
						
						
							
							Row group filtering for BYTE_ARRAY  
						
						 
						
						
						
						
					 
					
						2018-03-18 15:03:08 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						7b302a0eb2 
					 
					
						
						
							
							Bail on rowId constraint when non-int  
						
						 
						
						
						
						
					 
					
						2018-03-18 14:31:23 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						078754467e 
					 
					
						
						
							
							Generate queries from templates  
						
						 
						
						
						
						
						Huzzah, a bunch of failures have appeared. 
						
						
					 
					
						2018-03-18 14:28:31 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						e3f0dff083 
					 
					
						
						
							
							Move queries/* to templates  
						
						 
						
						
						
						
					 
					
						2018-03-18 13:28:56 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						65ea1b2f61 
					 
					
						
						
							
							Rewrite tests for automatic generation  
						
						 
						
						
						
						
						Regularize the parquets - nulls and nonulls each come in 3 variants,
with 1, 10 and 99 rows per rowgroup.
All test queries are written against nullsA, no_nullsA.
Next commit will introduce a tool to expand these template queries to
go against the actual tables. 
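
A minimal sketch of how such an expansion tool could work, in Python. The naming scheme is an assumption: the log does not give the concrete table names, so this sketch simply swaps the placeholder suffix `A` for a rows-per-rowgroup suffix. `expand_template` and `VARIANT_SUFFIXES` are hypothetical names.

```python
import re

# Assumed naming scheme: the placeholder suffix "A" becomes a
# rows-per-rowgroup suffix. The real table names are not given in the log.
VARIANT_SUFFIXES = ["1", "10", "99"]

# Match the placeholder tables as whole words; try no_nulls first.
PLACEHOLDER = re.compile(r"\b(no_nulls|nulls)A\b")


def expand_template(sql, suffixes=VARIANT_SUFFIXES):
    """Return one concrete query per variant for a template query."""
    return [PLACEHOLDER.sub(lambda m, s=s: m.group(1) + s, sql) for s in suffixes]


for query in expand_template("select count(*) from no_nullsA;"):
    print(query)
# select count(*) from no_nulls1;
# select count(*) from no_nulls10;
# select count(*) from no_nulls99;
```

Writing each query once and expanding it means every query runs against every row group layout, which is where size-dependent bugs tend to hide.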
						
						
					 
					
						2018-03-18 13:11:29 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						3b557f7fb0 
					 
					
						
						
							
							Add explicit test for file not found  
						
						 
						
						
						
						
						...caching the metadata moved where ParquetTable did I/O,
which introduced a segfault on not found 
						
						
					 
					
						2018-03-18 11:58:23 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						4cbde9fc09 
					 
					
						
						
							
							Row filtering for doubles  
						
						 
						
						
						
						
					 
					
						2018-03-17 16:09:57 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						86e09b111e 
					 
					
						
						
							
							Add row filtering for int32/64/96/boolean  
						
						 
						
						
						
						
					 
					
						2018-03-17 16:05:38 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						a3af16eb54 
					 
					
						
						
							
							Row-filtering for other string ops  
						
						 
						
						
						
						
					 
					
						2018-03-17 15:28:51 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						03a20a9432 
					 
					
						
						
							
							LIKE row group filtering  
						
						 
						
						
						
						
						~1.7s -> ~1.0s for the census data set on `LIKE 'Dawson %'` 
						
						
					 
					
						2018-03-17 00:11:38 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						753a490687 
					 
					
						
						
							
							Tests for blobs  
						
						 
						
						
						
						
					 
					
						2018-03-16 23:53:08 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						01e8ffaba7 
					 
					
						
						
							
							Row group filtering for double/float  
						
						 
						
						
						
						
					 
					
						2018-03-16 16:30:05 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						9c22fd1f57 
					 
					
						
						
							
							Row group filters for strings, int32/64/96, bools  
						
						 
						
						
						
						
					 
					
						2018-03-16 16:07:41 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						cbf388698b 
					 
					
						
						
							
							BOOL and INT96 tests  
						
						 
						
						
						
						
					 
					
						2018-03-16 16:02:11 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						e87f0d0f68 
					 
					
						
						
							
							Note about versions  
						
						 
						
						
						
						
					 
					
						2018-03-16 00:19:25 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						1f4cebe2a6 
					 
					
						
						
							
							Don't use accessors  
						
						 
						
						
						
						
						This drops the `= 'Dawson Creek'` query from 210ms to 145ms.
Maybe inlining would have been an option here? I'm not familiar enough
with g++ to know. :( 
						
						
					 
					
						2018-03-15 23:04:11 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						8ba13f44d5 
					 
					
						
						
							
							Remove unnecessary copy  
						
						 
						
						
						
						
						Now the `== 'Dawson Creek'` query is ~210ms, which is approx the
same as a `count(*)` query. This seems maybe OK, since the row group
filter is only excluding 30% of records. 
						
						
					 
					
						2018-03-15 22:10:45 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						f7f1ed03d1 
					 
					
						
						
							
							add row filter for string ==  
						
						 
						
						
						
						
						This gets the census `== 'Dawson Creek'` query down to ~410ms from
~650ms.
That still seems much slower than it should be. Am I accidentally
doing a copy? Now to go learn how to profile C++ code... 
						
						
					 
					
						2018-03-15 21:37:52 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						6648ff5968 
					 
					
						
						
							
							add string == row group filter  
						
						 
						
						
						
						
						For the statscan census set filtering on `== 'Dawson Creek'`, the query
goes from 980ms to 660ms.
This modest gain is expected, since the data isn't sorted by that column.
I'll try adding some scaffolding to do filtering at the row level, too.
We could also try unpacking the dictionary and testing the individual
values, although we may want some heuristics to decide whether it's
worth doing, e.g. if < 10% of the rows have a unique value.
Ideally, this should be like a ~1ms query. 
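
As an illustration of the general technique (not the extension's actual C++ code), a row group filter for string equality can be sketched in Python. It assumes each row group exposes min/max statistics for the column. `RowGroupStats` and the sample values are hypothetical. Python compares strings by code point, which matches bytewise comparison for ASCII data only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group statistics for one string column."""
    min_value: Optional[str]
    max_value: Optional[str]


def may_contain_equal(stats: RowGroupStats, target: str) -> bool:
    """Return False only when the stats prove no row can equal target."""
    if stats.min_value is None or stats.max_value is None:
        return True  # no statistics available: the group must be scanned
    return stats.min_value <= target <= stats.max_value


def row_groups_to_scan(groups: List[RowGroupStats], target: str) -> List[int]:
    """Indexes of row groups that cannot be ruled out for `col = target`."""
    return [i for i, g in enumerate(groups) if may_contain_equal(g, target)]


groups = [
    RowGroupStats("Abbotsford", "Calgary"),   # entirely below the target
    RowGroupStats("Burnaby", "Edmonton"),     # range covers the target
    RowGroupStats("Fernie", "Kelowna"),       # entirely above the target
    RowGroupStats(None, None),                # stats missing
]
print(row_groups_to_scan(groups, "Dawson Creek"))  # [1, 3]
```

When the data is not sorted by the filtered column, most row groups have wide, overlapping min/max ranges, so a filter like this can rule out only a fraction of them.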
						
						
					 
					
						2018-03-15 20:40:21 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						dc431aee20 
					 
					
						
						
							
							Dispatch row group filtering based on parquet type  
						
						 
						
						
						
						
					 
					
						2018-03-15 20:25:02 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						92ba5f94e0 
					 
					
						
						
							
							reuse FileMetaData  
						
						 
						
						
						
						
						For the statscan dataset, parsing the file metadata takes ~30-40ms,
so stash it away for future re-use. 
						
						
					 
					
						2018-03-15 19:57:38 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						769060dbcb 
					 
					
						
						
							
							Add stub row group filters for text/int/dbl  
						
						 
						
						
						
						
						Checkpointing to investigate why min/max stats for text aren't
present 
						
						
					 
					
						2018-03-12 23:07:41 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						110e3e3668 
					 
					
						
						
							
							row group skipping for is [not] null queries  
						
						 
						
						
						
						
					 
					
						2018-03-12 21:09:00 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						95748a5192 
					 
					
						
						
							
							Remove bool from Constraint  
						
						 
						
						
						
						
					 
					
						2018-03-12 20:50:30 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						acc15256ec 
					 
					
						
						
							
							Add rowgroup filtering for rowid  
						
						 
						
						
						
						
					 
					
						2018-03-12 20:42:50 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						1f938a005d 
					 
					
						
						
							
							More test cases to deal with affinity  
						
						 
						
						
						
						
						I'm not sure how these manifest - whether SQLite retypes them based on
column affinity before we see them, or whether they're provided as is. 
						
						
					 
					
						2018-03-11 19:18:44 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						095b576cc2 
					 
					
						
						
							
							Scaffolding for row group filters, tests  
						
						 
						
						
						
						
						rowid is special since its column index is -1, so add
explicit tests around it 
						
						
					 
					
						2018-03-11 15:44:51 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						5559a7b563 
					 
					
						
						
							
							Fix when last rowgroup is not same size as first  
						
						 
						
						
						
						
						...change test data to use 99 rows, so that when we have
rowgroup size 10 we exercise this code. 
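
The log does not say what the bug was. One way this kind of code can go wrong is by assuming every row group has the size of the first one. A minimal Python sketch of locating a row from the actual per-group sizes (the function name `locate_row` is hypothetical):

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(row_group_sizes, row_index):
    """Map a 0-based row index to (row_group, offset_in_group)."""
    ends = list(accumulate(row_group_sizes))  # cumulative end positions
    total = ends[-1] if ends else 0
    if not 0 <= row_index < total:
        raise IndexError(row_index)
    group = bisect_right(ends, row_index)     # first group whose end exceeds the index
    start = ends[group - 1] if group else 0
    return group, row_index - start


# 99 rows with a row group size of 10: nine full groups plus a last group of 9.
sizes = [10] * 9 + [9]
print(locate_row(sizes, 9))   # (0, 9)
print(locate_row(sizes, 10))  # (1, 0)
print(locate_row(sizes, 98))  # (9, 8)
```

With 100 rows every group would be full and the short-last-group path would never run, which is why 99 rows makes the better test fixture.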
						
						
					 
					
						2018-03-11 15:15:27 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						830053c1fc 
					 
					
						
						
							
							Scaffolding for in-extension filtering  
						
						 
						
						
						
						
						Supports IS NULL and IS NOT NULL checks 
						
						
					 
					
						2018-03-11 13:58:10 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						d28ae86d15 
					 
					
						
						
							
							Test unusable constraints  
						
						 
						
						
						
						
					 
					
						2018-03-10 13:38:34 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						96fcafcd2f 
					 
					
						
						
							
							Add test cases  
						
						 
						
						
						
						
					 
					
						2018-03-10 13:25:13 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						b7c134efc0 
					 
					
						
						
							
							test-queries: can debug a testcase  
						
						 
						
						
						
						
						`tests/test-queries regex` filters the test cases.
If the resulting set has only one test case, run it under gdb. 
						
						
					 
					
						2018-03-10 11:54:36 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						210f322a1c 
					 
					
						
						
							
							Code to pretty print constraints  
						
						 
						
						
						
						
					 
					
						2018-03-10 10:59:53 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						2bc054a2cf 
					 
					
						
						
							
							Add crappy Makefile  
						
						 
						
						
						
						
					 
					
						2018-03-10 10:46:10 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						824a416f51 
					 
					
						
						
							
							better debug logs for xBestIndex  
						
						 
						
						
						
						
					 
					
						2018-03-08 13:21:33 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						2d616c54fb 
					 
					
						
						
							
							More tests  
						
						 
						
						
						
						
					 
					
						2018-03-07 20:30:25 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Colin Dellow 
							
						 
					 
					
						
						
							
						
						35fcde926c 
					 
					
						
						
							
							Rewrite SQL oracle harness  
						
						 
						
						
						
						
					 
					
						2018-03-07 20:20:34 -05:00