A SQLite vtable extension to read Parquet files
Colin Dellow 67b0d96967 float support 2018-03-03 20:57:09 -05:00

sqlite-parquet-vtable

A SQLite virtual table extension to expose Parquet files as SQL tables.

Caveats

I'm not a professional C/C++ programmer. These are the caveats I'm aware of, but there are probably others:

  • I don't use sqlite3_malloc and sqlite3_free for C++ objects
    • Maybe this doesn't matter, since portability isn't a goal
  • The interop between the C layer (the SQLite API implementation) and the C++ layer (which talks to parquet-cpp) probably lets some C++ exceptions escape
    • Your process may crash due to my error. Sorry!
    • I handle the obvious cases, like file not found and unsupported Parquet types, but I suspect low-memory conditions aren't handled gracefully

Building

  1. Install parquet-cpp
  2. Run ./build-sqlite to fetch and build the SQLite dev bits
  3. Run ./parquet/make to build the module
    • You will first need to fix up the paths in that file to point at your local parquet-cpp folder

Use

$ sqlite/sqlite3
sqlite> .load parquet/libparquet
sqlite> create virtual table demo USING parquet('demo.parquet');
sqlite> select * from demo limit 1;
...if all goes well, you'll see data here!...

Supported features

Index

Only full table scans are supported.

Types

These types are supported:

  • INT96 timestamps (exposed as milliseconds since the epoch)
  • INT8/INT16/INT32/INT64
  • UTF8 strings
  • BOOLEAN
  • FLOAT
  • DOUBLE
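
To make the INT96 entry concrete, here is a rough sketch of how a 12-byte INT96 timestamp can be turned into milliseconds since the epoch. It assumes the common layout written by Impala and Hive: bytes 0 to 7 hold the nanoseconds within the day and bytes 8 to 11 hold the Julian day number, both little-endian. The function name `int96ToMillis` is hypothetical, and the extension's own conversion may differ in detail.

```cpp
#include <cstdint>

// Julian day number of 1970-01-01, the Unix epoch.
constexpr std::int64_t kJulianEpochDay = 2440588;
constexpr std::int64_t kMillisPerDay = 86400000;
constexpr std::int64_t kNanosPerMilli = 1000000;

// Converts a 12-byte INT96 value (assumed layout: little-endian nanoseconds
// of day, then little-endian Julian day) to milliseconds since the epoch.
// Bytes are decoded one at a time, so the result does not depend on the
// host's endianness. Sub-millisecond precision is truncated.
std::int64_t int96ToMillis(const unsigned char* bytes) {
  std::uint64_t nanos = 0;
  for (int i = 7; i >= 0; --i) nanos = (nanos << 8) | bytes[i];

  std::uint32_t day = 0;
  for (int i = 11; i >= 8; --i) day = (day << 8) | bytes[i];

  return (static_cast<std::int64_t>(day) - kJulianEpochDay) * kMillisPerDay +
         static_cast<std::int64_t>(nanos) / kNanosPerMilli;
}
```

With this layout, a value whose day field is 2440588 and whose nanosecond field is zero maps to 0, and each additional day adds 86400000.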

These are not supported:

  • UINT8/UINT16/UINT32/UINT64
  • Fixed-length byte arrays, including JSON and BSON subtypes
  • DECIMAL