sqlite-parquet-vtable
A SQLite virtual table extension to expose Parquet files as SQL tables.
Caveats
I'm not an experienced C/C++ programmer. This library is definitely not bombproof. It's good enough for my use case, and may be good enough for yours, too.
- I don't use `sqlite3_malloc` and `sqlite3_free` for C++ objects
  - Maybe this doesn't matter, since portability isn't a goal
- The C -> C++ interop definitely leaks some C++ exceptions
  - Obvious cases like file not found and unsupported Parquet types are OK
  - Low memory conditions aren't handled gracefully.
Building
- Install `parquet-cpp`
- Run `./build-sqlite` to fetch and build the SQLite dev bits
- Run `./parquet/make` to build the module
  - You will need to fix up the paths in `parquet/make` to point at your local `parquet-cpp` folder.
Use
```
$ sqlite/sqlite3
sqlite> .load parquet/libparquet
sqlite> CREATE VIRTUAL TABLE demo USING parquet('parquet-generator/100-rows-1.parquet');
sqlite> SELECT * FROM demo;
...if all goes well, you'll see data here!...
```
Supported features
Index
Only full table scans are supported.
Types
These types are supported:
- INT96 timestamps (exposed as milliseconds since the epoch)
- INT8/INT16/INT32/INT64
- UTF8 strings
- BOOLEAN
- FLOAT
- DOUBLE
- Variable- and fixed-length byte arrays
These are not supported:
- UINT8/UINT16/UINT32/UINT64
- DECIMAL
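Parquet's INT96 timestamps (the legacy Impala/Hive encoding) store 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. Exposing one as milliseconds since the epoch is then a shift by the Julian day of 1970-01-01 (2440588) plus integer division of the nanoseconds. The sketch below assumes that layout; `Int96` is a local stand-in modeled on parquet-cpp's three-`uint32_t` struct, `makeInt96` and `int96ToEpochMillis` are hypothetical names, and truncating sub-millisecond precision is an assumption that may differ from what this extension actually does.

```cpp
#include <cstdint>
#include <cstring>

// Local stand-in modeled on parquet-cpp's Int96 (an assumption, not its API):
// value[0..1] hold nanoseconds within the day, value[2] the Julian day number.
struct Int96 {
  uint32_t value[3];
};

constexpr int64_t kJulianDayOfUnixEpoch = 2440588;  // Julian day of 1970-01-01
constexpr int64_t kMillisPerDay = 86400000;
constexpr int64_t kNanosPerMilli = 1000000;

// Hypothetical helper: build an Int96 from its two components.
// memcpy reproduces the on-disk byte order only on a little-endian host.
Int96 makeInt96(int64_t nanosOfDay, uint32_t julianDay) {
  Int96 ts{};
  std::memcpy(&ts.value[0], &nanosOfDay, sizeof(nanosOfDay));
  ts.value[2] = julianDay;
  return ts;
}

// Convert to milliseconds since the Unix epoch, truncating sub-millisecond
// precision. Days before 1970-01-01 yield negative values.
int64_t int96ToEpochMillis(const Int96& ts) {
  int64_t nanosOfDay;
  std::memcpy(&nanosOfDay, &ts.value[0], sizeof(nanosOfDay));
  const int64_t julianDay = static_cast<int64_t>(ts.value[2]);
  return (julianDay - kJulianDayOfUnixEpoch) * kMillisPerDay +
         nanosOfDay / kNanosPerMilli;
}
```

For example, `makeInt96(1500000000, 2440589)` (00:00:01.5 on 1970-01-02) converts to 86401500 milliseconds.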