# sqlite-parquet-vtable
A SQLite [virtual table](https://sqlite.org/vtab.html) extension to expose Parquet files as SQL tables.
## Caveats
I'm not an experienced C/C++ programmer. This library is definitely not bombproof. It's good enough for my use case,
and may be good enough for yours, too.
* I don't use `sqlite3_malloc` and `sqlite3_free` for C++ objects
  * Maybe this doesn't matter, since portability isn't a goal
* The C -> C++ interop definitely leaks some C++ exceptions
  * Obvious cases, like file not found and unsupported Parquet types, are handled
  * Low memory conditions aren't handled gracefully
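A common way to keep C++ exceptions from unwinding into SQLite's C frames is to wrap each virtual-table callback in a catch-all that turns exceptions into result codes. The sketch below illustrates that pattern only; it is not how this extension currently works. `callWithoutThrowing` and the `kSqlite*` constants are hypothetical names, and the constants simply mirror the values of `SQLITE_OK`, `SQLITE_ERROR` and `SQLITE_NOMEM` so the example compiles without `sqlite3.h`.

```cpp
#include <new>

// Local stand-ins that mirror SQLite's result codes (SQLITE_OK = 0,
// SQLITE_ERROR = 1, SQLITE_NOMEM = 7), so the sketch needs no sqlite3.h.
constexpr int kSqliteOk = 0;
constexpr int kSqliteError = 1;
constexpr int kSqliteNomem = 7;

// Runs a C++ callable that returns a SQLite-style result code. Any exception
// is caught here and converted to a result code, so nothing propagates back
// into SQLite's C code. `noexcept` documents (and enforces) that guarantee.
template <typename F>
int callWithoutThrowing(F&& body) noexcept {
  try {
    return static_cast<int>(body());
  } catch (const std::bad_alloc&) {
    // Out of memory: report it the way SQLite expects instead of crashing.
    return kSqliteNomem;
  } catch (...) {
    // Anything else (file errors, unsupported types, ...) becomes a generic error.
    return kSqliteError;
  }
}
```

A callback such as `xFilter` could then return `callWithoutThrowing([&] { ... })` around its real work. A fuller version would also report the error text through the virtual table's `zErrMsg` field, allocated with `sqlite3_mprintf`.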
## Building
1. Install [`parquet-cpp`](https://github.com/apache/parquet-cpp)
2. Run `./build-sqlite` to fetch and build the SQLite dev bits
3. Run `./parquet/make` to build the module
   1. You will need to fix up the paths in `parquet/make` to point at your local `parquet-cpp` folder.
## Use
```
$ sqlite/sqlite3
sqlite> .load parquet/libparquet
sqlite> CREATE VIRTUAL TABLE demo USING parquet('parquet-generator/100-rows-1.parquet');
sqlite> SELECT * FROM demo;
...if all goes well, you'll see data here!...
```
## Supported features
### Index
Only full table scans are supported.
### Types
These types are supported:
* INT96 timestamps (exposed as milliseconds since the epoch)
* INT8/INT16/INT32/INT64
* UTF8 strings
* BOOLEAN
* FLOAT
* DOUBLE

These are not supported:
* UINT8/UINT16/UINT32/UINT64
* Fixed-length byte arrays, including JSON and BSON subtypes
* DECIMAL
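INT96 timestamps are conventionally stored as 8 bytes of nanoseconds since midnight followed by a 4-byte Julian day number, both little-endian. Assuming that layout, exposing them as milliseconds since the epoch is a small piece of arithmetic: `ms = (julianDay - 2440588) * 86400000 + nanosOfDay / 1000000`, where 2440588 is the Julian day of 1970-01-01. The C++14 sketch below shows that conversion; `int96ToMillis` is a hypothetical name, and this is not necessarily the extension's own code.

```cpp
#include <cstdint>

// Julian day number of the Unix epoch, 1970-01-01.
constexpr std::int64_t kJulianDayOfUnixEpoch = 2440588;
constexpr std::int64_t kMillisPerDay = 86400000;
constexpr std::int64_t kNanosPerMilli = 1000000;

// Converts a 12-byte INT96 timestamp to milliseconds since the Unix epoch.
// Assumed layout (the common Impala/Hive convention):
//   bytes 0..7  : nanoseconds since midnight, little-endian
//   bytes 8..11 : Julian day number, little-endian
// Bytes are assembled one at a time, so the result does not depend on the
// host's endianness. Sub-millisecond precision is truncated.
constexpr std::int64_t int96ToMillis(const std::uint8_t (&bytes)[12]) {
  std::uint64_t nanosOfDay = 0;
  for (int i = 7; i >= 0; --i) {
    nanosOfDay = (nanosOfDay << 8) | bytes[i];
  }
  std::uint32_t julianDay = 0;
  for (int i = 11; i >= 8; --i) {
    julianDay = (julianDay << 8) | bytes[i];
  }
  return (static_cast<std::int64_t>(julianDay) - kJulianDayOfUnixEpoch) * kMillisPerDay +
         static_cast<std::int64_t>(nanosOfDay / kNanosPerMilli);
}
```

For example, a value whose Julian day bytes are `8C 3D 25 00` (day 2440588) and whose nanosecond bytes are all zero converts to `0`, i.e. midnight UTC on 1970-01-01.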