Colin Dellow
e3c6bad9f5
Add status badge, run tests
Fixes #16
Fixes #18
2018-07-05 09:17:19 -04:00
Colin Dellow
83db07456e
travis: revert sqlite hack
2018-07-05 09:13:37 -04:00
Colin Dellow
efb22c5c5a
travis: don't stomp CC in Makefile
2018-07-05 09:09:18 -04:00
Colin Dellow
f87d823607
travis: maybe fix cmake gcc
2018-07-05 09:01:56 -04:00
Colin Dellow
b4bf732eb1
travis: run cmake
This ensures `parquet_version.h.in` gets processed.
2018-07-05 08:55:46 -04:00
Colin Dellow
aaa07fe629
travis: force sqlite to use gcc
2018-07-04 23:46:36 -04:00
Colin Dellow
16da95ca29
Revert "travis: ensure gcc"
This reverts commit 9a0a19ea52.
...gcc is already on the box. Hmm.
2018-07-04 23:45:01 -04:00
Colin Dellow
9a0a19ea52
travis: ensure gcc
2018-07-04 23:43:28 -04:00
Colin Dellow
ea3cd39ae6
travis: --yes
2018-07-04 23:36:59 -04:00
Colin Dellow
9789a21865
travis: PPA for cmake 3.2
2018-07-04 23:35:30 -04:00
Colin Dellow
b5de2799cd
travis: run in sudo env
2018-07-04 23:25:21 -04:00
Colin Dellow
1f729e91b7
Add .travis.yml
Progress towards #16
2018-07-04 23:22:39 -04:00
Colin Dellow
100edb7015
Support prebuilt binaries
...to make Travis CI builds reasonable.
Progress towards #16
2018-07-04 23:17:57 -04:00
Colin Dellow
806d87c8e7
Derp, fix makefile
2018-07-04 22:36:43 -04:00
Colin Dellow
0aaff745f8
Stomp timestamp of arrow/pq directory
Otherwise they'll be deemed to be newer than
the prebuilt libraries, and make will gallantly
kick off a build.
2018-07-04 22:13:10 -04:00
Colin Dellow
9516955717
Separate targets for fetching/building arrow/pq
This will allow us to use pre-built libs in Travis for
linking while still having the header files available for compilation.
2018-07-04 22:01:13 -04:00
Colin Dellow
b8df4b720b
Provide `publish_libs` target
...which publishes the newly-built supporting libs to my S3 bucket, for
future use by a Travis CI build.
Progress towards #16
2018-07-04 21:45:59 -04:00
Colin Dellow
005d7a451f
Add g++ as prerequisite
Progress towards #16
2018-07-04 21:30:51 -04:00
Colin Dellow
8084f14379
Add libicu for Ubuntu 14.04
Progress towards #16
2018-07-04 21:30:05 -04:00
Colin Dellow
7e961c4802
Skip tests for arrow
Partial progress towards #16
2018-07-04 20:50:47 -04:00
Colin Dellow
ebb0eb7710
Add test case for #30
2018-07-04 19:59:55 -04:00
Colin Dellow
2e1ac92882
Revert "Add other random test case"
This reverts commit 3bdc6f7078.
2018-07-04 19:49:42 -04:00
Colin Dellow
3bdc6f7078
Add other random test case
2018-07-04 19:48:36 -04:00
Colin Dellow
33f8dbe4f4
Add test case for #26
2018-07-04 19:45:08 -04:00
Colin Dellow
5b26a78c1f
Improve random query generation
...throw in the occasional `NOT`
2018-07-04 19:14:35 -04:00
Colin Dellow
373616ad1e
Don't try to optimize IsNot
Doesn't handle NULLs correctly; will open a separate ticket
for it. Fixes the IS NOT case of #26.
2018-07-04 18:59:16 -04:00
Colin Dellow
0aa98ae1a5
Skip shared parquet/arrow libs, fix icu versions
2018-06-27 23:20:33 -04:00
Colin Dellow
2167d102b4
Add `make-linux-pgo`
Fixes #23, with perhaps some open questions about why PGO on
arrow/parquet-cpp regressed things.
2018-06-27 22:23:22 -04:00
Colin Dellow
1a4f540e18
Stub PGO code in
Incremental progress on #23: should probably add a dedicated flag that
creates the instrumented binary, runs a test suite, then creates the
optimized binary.
2018-06-26 23:50:11 -04:00
Colin Dellow
1d0d4c08b8
Build sqlite in parallel
2018-06-26 23:05:30 -04:00
Colin Dellow
76fb058dd7
Link Boost statically
Fixes #15
2018-06-26 22:44:50 -04:00
Colin Dellow
263a6af7ec
Use Arrow's compression libraries
Fixes #27
2018-06-26 08:17:18 -04:00
Colin Dellow
129ff4e694
Merge pull request #25 from evansd/makefile-fixes
Makefile fixes
2018-06-25 13:54:29 -04:00
David Evans
b7da04433b
Include header locations we need
2018-06-25 18:20:24 +01:00
David Evans
ab87b13b75
Reverse prereqs order to get build to work
2018-06-25 18:20:04 +01:00
Colin Dellow
bc0be71546
Add brotli/snappy/gzip test files
`test/test-supported` verifies they can be opened
2018-06-25 08:32:36 -04:00
Colin Dellow
0bdcc9895e
All-in-one build command
`./make-linux` clones and builds:
- arrow
- brotli
- lz4
- parquet
- snappy
- zlib
- zstd
- this project
as a statically linked binary. Two Boost libs are still pulled in as
shared libs; should probably fix that, too, for ultimate portability.
2018-06-24 21:11:07 -04:00
Colin Dellow
ec6e970bbc
Fix `order by rowid` to apply w/o clause
Fixes #12; the first screen of datasette is fast now.
2018-06-24 15:20:06 -04:00
Colin Dellow
5b59ba02fe
Make ORDER BY ROWID fast
Fixes #11
2018-06-24 15:07:27 -04:00
Colin Dellow
b774973852
Avoid row filter check when no constraints
The function call overhead is expensive!
This makes count(*) on the census data take 175ms instead
of 225ms, while not significantly impacting other use cases.
2018-06-24 14:51:54 -04:00
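The idea behind b774973852 can be sketched in C++ roughly as follows. This is not the project's code: `Constraint`, `Table`, `rowMatches` and `countMatching` are hypothetical names, and a plain in-memory table stands in for decoded Parquet columns. The sketch only shows the per-row filter call being skipped when there are no constraints to check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical equality constraint: column `column` must equal `value`.
struct Constraint {
    std::size_t column;
    int value;
};

// Hypothetical in-memory stand-in for decoded columns: table[column][row].
using Table = std::vector<std::vector<int>>;

// Per-row filter: true if the row satisfies every constraint.
bool rowMatches(const Table& t, std::size_t row,
                const std::vector<Constraint>& cs) {
    for (const Constraint& c : cs) {
        assert(c.column < t.size());
        if (t[c.column][row] != c.value) return false;
    }
    return true;
}

// Counts matching rows. The emptiness check is done once, outside the loop,
// so an unconstrained scan (e.g. count(*)) never pays for the filter call.
std::size_t countMatching(const Table& t, std::size_t numRows,
                          const std::vector<Constraint>& cs) {
    const bool needsFilter = !cs.empty();
    std::size_t n = 0;
    for (std::size_t row = 0; row < numRows; ++row) {
        if (!needsFilter || rowMatches(t, row, cs)) ++n;
    }
    return n;
}
```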
Colin Dellow
84a22e6e77
link to blog
2018-06-24 11:39:44 -04:00
Colin Dellow
16cdd70f2b
Short-circuit row group evaluation
We can avoid eagerly computing bitmasks for other constraints this way.
Possible future work: order the constraints such that we evaluate the
one that is cheapest/most likely to prune a row group first.
This reduces the cyclist query from ~65ms to ~60ms.
2018-06-24 11:08:56 -04:00
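A minimal sketch of the short-circuiting described in 16cdd70f2b, under stated assumptions: the commit talks about bitmasks, whereas this hypothetical version reduces each constraint to a yes/no test against per-column min/max statistics (`ColumnStats`, `Constraint` and `rowGroupMightMatch` are illustrative names, not the project's). Constraints are checked in order and the loop returns at the first one that rules the row group out, so the remaining constraints are never evaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-column statistics for one row group (e.g. min/max).
struct ColumnStats {
    int min;
    int max;
};

// Hypothetical equality constraint: column `column` must equal `value`.
struct Constraint {
    std::size_t column;
    int value;
};

// Returns true if the row group *might* contain a matching row.
// Evaluation stops at the first constraint whose statistics rule the row
// group out. `evaluated` (optional) reports how many constraints were checked.
bool rowGroupMightMatch(const std::vector<ColumnStats>& stats,
                        const std::vector<Constraint>& constraints,
                        std::size_t* evaluated = nullptr) {
    std::size_t n = 0;
    for (const Constraint& c : constraints) {
        assert(c.column < stats.size());
        ++n;
        const ColumnStats& s = stats[c.column];
        if (c.value < s.min || c.value > s.max) {
            if (evaluated) *evaluated = n;
            return false;  // pruned: skip the remaining constraints
        }
    }
    if (evaluated) *evaluated = n;
    return true;
}
```

Ordering the constraints by cost or selectivity, as the commit suggests for future work, would only change the order of `constraints` before the loop; the sketch does not attempt it.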
Colin Dellow
fd87c44ccd
Add link to csv2parquet
2018-06-23 23:58:13 -04:00
Colin Dellow
e1a86954e5
Revert "Don't eagerly evaluate constraints"
This reverts commit cbde3c73b6.
This regresses:
```
WITH inputs AS (
  SELECT
    geo_name,
    CASE WHEN profile_id = 1930 THEN 'total' ELSE 'cyclist' END AS mode,
    female,
    male
  FROM census
  WHERE profile_id IN ('1930', '1935') AND
    csd_type_name = 'CY' AND
    geo_name IN ('Victoria', 'Dawson Creek', 'Kitchener')
)
SELECT
  total.geo_name,
  cyclist.male,
  cyclist.female,
  100.0 * cyclist.male / total.male,
  100.0 * cyclist.female / total.female
FROM inputs AS total
JOIN inputs AS cyclist USING (geo_name)
WHERE total.mode = 'total' AND cyclist.mode = 'cyclist';
```
while improving:
```
select count(*) from census where geo_name in ('Dawson Creek', 'Kitchener', 'Victoria') and csd_type_name = 'CY' and profile_id = '1930';
```
which seems like a bad tradeoff.
2018-06-23 20:48:39 -04:00
Colin Dellow
603153c36c
avoid looking up physical type
2018-06-23 20:42:38 -04:00
Colin Dellow
cbde3c73b6
Don't eagerly evaluate constraints
...to avoid decompressing columns when we know from previous
columns that the row can't match.
Fixes #10
2018-06-23 20:31:03 -04:00
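The lazy evaluation attempted in cbde3c73b6 (later reverted in e1a86954e5) can be sketched roughly as follows. The names are hypothetical and not taken from the project: `LazyValue` stands in for a column value whose decompression runs at most once and only on first access, and `rowMatchesLazily` checks constraints in order, so a constraint that rejects the row leaves later columns undecoded.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical lazily decoded column value. `decode` stands in for the
// expensive decompress-and-read step; it runs at most once, on first access.
class LazyValue {
public:
    explicit LazyValue(std::function<int()> decode)
        : decode_(std::move(decode)) {}

    int get() {
        if (!value_) value_ = decode_();
        return *value_;
    }

    bool decoded() const { return value_.has_value(); }

private:
    std::function<int()> decode_;
    std::optional<int> value_;
};

// Hypothetical equality constraint: column `column` must equal `value`.
struct Constraint {
    std::size_t column;
    int value;
};

// Checks constraints in order and only decodes a column once every earlier
// constraint has passed, so a rejecting constraint leaves later columns untouched.
bool rowMatchesLazily(std::vector<LazyValue>& row,
                      const std::vector<Constraint>& cs) {
    for (const Constraint& c : cs) {
        assert(c.column < row.size());
        if (row[c.column].get() != c.value) return false;
    }
    return true;
}
```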
Colin Dellow
d7c5002cee
Move some code out of ensureColumn
Saves ~4% on the cold census needle query (~425ms -> ~405ms)
2018-06-23 19:10:23 -04:00
Colin Dellow
b9c58bd97e
persist row group clauses on EOF
...not on close. Fixes #9
2018-06-23 16:25:56 -04:00
Colin Dellow
6d4be61261
tweak Makefile
2018-06-23 16:13:18 -04:00
Colin Dellow
596496c9cb
rejig README
2018-03-25 00:07:56 -04:00