A SQLite vtable extension to read Parquet files
Go to file
Colin Dellow 5559a7b563 Fix when last rowgroup is not same size as first
...change test data to use 99 rows, so that when we have
rowgroup size 10 we exercise this code.
2018-03-11 15:15:27 -04:00
parquet Fix when last rowgroup is not same size as first 2018-03-11 15:15:27 -04:00
parquet-generator Fix when last rowgroup is not same size as first 2018-03-11 15:15:27 -04:00
tests Fix when last rowgroup is not same size as first 2018-03-11 15:15:27 -04:00
.gitignore test-queries: can debug a testcase 2018-03-10 11:54:36 -05:00
LICENSE Initial commit 2018-03-02 18:37:08 -05:00
README.md Support BLOBs 2018-03-04 17:20:59 -05:00
build-sqlite Add script to fetch+build sqlite 2018-03-02 18:46:40 -05:00

README.md

sqlite-parquet-vtable

A SQLite virtual table extension to expose Parquet files as SQL tables.

Caveats

I'm not an experienced C/C++ programmer. This library is definitely not bombproof. It's good enough for my use case, and may be good enough for yours, too.

  • I don't use sqlite3_malloc and sqlite3_free for C++ objects
    • Maybe this doesn't matter, since portability isn't a goal
  • The C -> C++ interop definitely leaks some C++ exceptions
    • Obvious cases like file not found and unsupported Parquet types are OK
    • Low memory conditions aren't handled gracefully.

Building

  1. Install parquet-cpp
  2. Run ./build-sqlite to fetch and build the SQLite dev bits
  3. Run ./parquet/make to build the module
  4. You will need to fixup the paths in this file to point at your local parquet-cpp folder.

Use

$ sqlite/sqlite3
sqlite> .load parquet/libparquet
sqlite> CREATE VIRTUAL TABLE demo USING parquet('parquet-generator/100-rows-1.parquet');
sqlite> SELECT * FROM demo;
...if all goes well, you'll see data here!...

Supported features

Index

Only full table scans are supported.

Types

These types are supported:

  • INT96 timestamps (exposed as milliseconds since the epoch)
  • INT8/INT16/INT32/INT64
  • UTF8 strings
  • BOOLEAN
  • FLOAT
  • DOUBLE
  • Variable- and fixed-length byte arrays

These are not supported:

  • UINT8/UINT16/UINT32/UINT64
  • DECIMAL