A SQLite vtable extension to read Parquet files
Go to file
Colin Dellow 0d4806ca6f Rejig parquet generation
- "fixed_size_binary" -> "binary_10"
- make null parquet use rowgroups of sie 10: first rowgroup
  has no nulls, 2nd has all null, 3rd-10th have alternating
  nulls

This is prep for making a Postgres layer to use as an oracle
for generating test cases so that we have good coverage before
implementing advanced `xBestIndex` and `xFilter` modes.
2018-03-06 21:02:26 -05:00
parquet Rejig parquet generation 2018-03-06 21:02:26 -05:00
parquet-generator Rejig parquet generation 2018-03-06 21:02:26 -05:00
tests Rejig parquet generation 2018-03-06 21:02:26 -05:00
.gitignore `ensureColumn` catches up when rows are skipped 2018-03-04 22:29:35 -05:00
LICENSE Initial commit 2018-03-02 18:37:08 -05:00
README.md Support BLOBs 2018-03-04 17:20:59 -05:00
build-sqlite Add script to fetch+build sqlite 2018-03-02 18:46:40 -05:00

README.md

sqlite-parquet-vtable

A SQLite virtual table extension to expose Parquet files as SQL tables.

Caveats

I'm not an experienced C/C++ programmer. This library is definitely not bombproof. It's good enough for my use case, and may be good enough for yours, too.

  • I don't use sqlite3_malloc and sqlite3_free for C++ objects
    • Maybe this doesn't matter, since portability isn't a goal
  • The C -> C++ interop definitely leaks some C++ exceptions
    • Obvious cases like file not found and unsupported Parquet types are OK
    • Low memory conditions aren't handled gracefully.

Building

  1. Install parquet-cpp
  2. Run ./build-sqlite to fetch and build the SQLite dev bits
  3. Run ./parquet/make to build the module
  4. You will need to fixup the paths in this file to point at your local parquet-cpp folder.

Use

$ sqlite/sqlite3
sqlite> .load parquet/libparquet
sqlite> CREATE VIRTUAL TABLE demo USING parquet('parquet-generator/100-rows-1.parquet');
sqlite> SELECT * FROM demo;
...if all goes well, you'll see data here!...

Supported features

Index

Only full table scans are supported.

Types

These types are supported:

  • INT96 timestamps (exposed as milliseconds since the epoch)
  • INT8/INT16/INT32/INT64
  • UTF8 strings
  • BOOLEAN
  • FLOAT
  • DOUBLE
  • Variable- and fixed-length byte arrays

These are not supported:

  • UINT8/UINT16/UINT32/UINT64
  • DECIMAL