mirror of https://github.com/cldellow/sqlite-parquet-vtable.git synced 2025-04-03 09:39:47 +00:00
Colin Dellow 5559a7b563 Fix when last rowgroup is not same size as first
...change test data to use 99 rows, so that when we have
rowgroup size 10 we exercise this code.
2018-03-11 15:15:27 -04:00

sqlite-parquet-vtable

A SQLite virtual table extension to expose Parquet files as SQL tables.

Caveats

I'm not an experienced C/C++ programmer. This library is definitely not bombproof. It's good enough for my use case, and may be good enough for yours, too.

  • I don't use sqlite3_malloc and sqlite3_free for C++ objects
    • Maybe this doesn't matter, since portability isn't a goal
  • The C -> C++ interop definitely leaks some C++ exceptions
    • Obvious cases, like a missing file or an unsupported Parquet type, are handled
    • Low memory conditions aren't handled gracefully.
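
One common way to keep C++ exceptions from crossing into C code is to wrap each callback body in a catch-all guard that turns exceptions into result codes. The sketch below is illustrative only and is not this library's code: guardCallback, openParquetFile and the ResultCode enum are hypothetical names, and the enum stands in for SQLite's result codes so that the example compiles without sqlite3.h. In a real virtual table the error message would be allocated with sqlite3_mprintf rather than stored in a std::string.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Stand-ins for SQLite result codes, so the sketch compiles without sqlite3.h.
// The numeric values match SQLITE_OK, SQLITE_ERROR and SQLITE_NOMEM.
enum ResultCode { kOk = 0, kError = 1, kNoMem = 7 };

// Runs `body` and converts any C++ exception into a result code, so that
// nothing propagates across the C boundary.
template <typename Body>
int guardCallback(Body body, std::string* errMsg) {
  assert(errMsg != nullptr);  // callers must supply somewhere to put the message
  try {
    body();
    return kOk;
  } catch (const std::bad_alloc&) {
    // Must come before std::exception, which is its base class.
    *errMsg = "out of memory";
    return kNoMem;
  } catch (const std::exception& e) {
    *errMsg = e.what();
    return kError;
  } catch (...) {
    *errMsg = "unknown C++ exception";
    return kError;
  }
}

// Hypothetical callback: pretends to open a Parquet file and fails on an
// empty path, to show an exception being turned into an error code.
int openParquetFile(const std::string& path, std::string* errMsg) {
  return guardCallback([&] {
    if (path.empty()) throw std::invalid_argument("file not found");
  }, errMsg);
}
```

With this shape, openParquetFile("", &msg) returns kError and leaves "file not found" in msg, while an allocation failure inside the body would surface as kNoMem instead of unwinding through C frames.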

Building

  1. Install parquet-cpp
  2. Run ./build-sqlite to fetch and build the SQLite dev bits
  3. Run ./parquet/make to build the module
    • You will first need to fix up the paths in ./parquet/make to point at your local parquet-cpp folder

Use

$ sqlite/sqlite3
sqlite> .load parquet/libparquet
sqlite> CREATE VIRTUAL TABLE demo USING parquet('parquet-generator/100-rows-1.parquet');
sqlite> SELECT * FROM demo;
...if all goes well, you'll see data here!...

Supported features

Index

Only full table scans are supported: every query visits every row, and SQLite applies any WHERE clause to the rows afterwards.

Types

These types are supported:

  • INT96 timestamps (exposed as milliseconds since the epoch)
  • INT8/INT16/INT32/INT64
  • UTF8 strings
  • BOOLEAN
  • FLOAT
  • DOUBLE
  • Variable- and fixed-length byte arrays
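
To make the INT96 entry concrete, here is a minimal sketch of the usual conversion to milliseconds since the epoch. It assumes the layout commonly used by Parquet writers for INT96 timestamps: three little-endian 32-bit words, where the first two hold nanoseconds within the day and the third holds the Julian day number. The Int96 struct and int96ToEpochMillis are hypothetical names for illustration; the extension's own conversion may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a Parquet INT96 value: three little-endian
// 32-bit words. Words 0-1 hold nanoseconds within the day, word 2 holds
// the Julian day number.
struct Int96 {
  uint32_t value[3];
};

constexpr int64_t kJulianDayOfUnixEpoch = 2440588;  // Julian day of 1970-01-01
constexpr int64_t kMillisPerDay = 86400000;
constexpr int64_t kNanosPerMilli = 1000000;

// Converts an INT96 timestamp to milliseconds since the Unix epoch,
// truncating any sub-millisecond remainder.
int64_t int96ToEpochMillis(const Int96& ts) {
  const int64_t nanosOfDay = static_cast<int64_t>(
      (static_cast<uint64_t>(ts.value[1]) << 32) | ts.value[0]);
  const int64_t julianDay = static_cast<int64_t>(ts.value[2]);
  // Assumed precondition: the time-of-day part lies within a single day.
  assert(nanosOfDay >= 0 && nanosOfDay < kMillisPerDay * kNanosPerMilli);
  return (julianDay - kJulianDayOfUnixEpoch) * kMillisPerDay +
         nanosOfDay / kNanosPerMilli;
}
```

For example, a value with Julian day 2440589 and 1,500,000,000 nanoseconds into the day maps to 86,401,500 ms, that is, one day and 1.5 seconds after the epoch.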

These are not supported:

  • UINT8/UINT16/UINT32/UINT64
  • DECIMAL

License

Apache-2.0