mirror of
https://github.com/cldellow/sqlite-parquet-vtable.git
synced 2025-09-14 22:39:59 +00:00
Cache clauses -> row group mapping
Create a shadow table. For `stats`, it'd be `_stats_rowgroups`.

It contains three columns:
- the clause (eg `city = 'Dawson Creek'`)
- the initial estimate, as a bitmap of rowgroups based on stats
- the actual observed rowgroups, as a bitmap

This papers over poorly sorted parquet files, at the cost of some disk space. It makes interactive queries much more natural -- drilldown style queries are much faster, as they can leverage work done by previous queries.

eg `SELECT * FROM stats WHERE city = 'Dawson Creek' and question_id >= 1935 and question_id <= 1940` takes ~584ms on first run, but 9ms on subsequent runs.

We only create entries when the estimates don't match the actual results.

Fixes #6
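The caching rule described in the message can be sketched in isolation. This is a minimal in-memory illustration, not the code from this commit: `RowGroupBitmap`, `shadowTableName`, `CachedClause` and `RowGroupCache` are hypothetical names, and a `std::map` stands in for the SQLite shadow table.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical bitmap type: bit i is set when row group i may hold matching rows.
using RowGroupBitmap = std::vector<bool>;

// Hypothetical helper following the naming example in the message:
// `stats` -> `_stats_rowgroups`.
std::string shadowTableName(const std::string& tableName) {
  return "_" + tableName + "_rowgroups";
}

// One cached row, mirroring the estimate and actual columns.
struct CachedClause {
  RowGroupBitmap estimate;  // row groups the stats suggested scanning
  RowGroupBitmap actual;    // row groups that really contained matches
};

// In-memory stand-in for the shadow table, keyed by the clause text.
class RowGroupCache {
  std::map<std::string, CachedClause> entries;

public:
  // Store an entry only when the estimate was wrong, as the message describes.
  void record(const std::string& clause,
              const RowGroupBitmap& estimate,
              const RowGroupBitmap& actual) {
    if (estimate != actual) {
      entries[clause] = CachedClause{estimate, actual};
    }
  }

  // Prefer the observed row groups when cached; otherwise fall back
  // to the stats-based estimate.
  RowGroupBitmap rowGroupsFor(const std::string& clause,
                              const RowGroupBitmap& estimate) const {
    auto it = entries.find(clause);
    return it == entries.end() ? estimate : it->second.actual;
  }

  std::size_t size() const { return entries.size(); }
};
```

Under these assumptions, a repeated clause skips the row groups that the first run found empty, which is presumably where the speedup on subsequent runs comes from.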
@@ -6,16 +6,19 @@
#include "parquet/api/reader.h"

class ParquetTable {
  std::string file;
  std::string tableName;
  std::vector<std::string> columnNames;
  std::shared_ptr<parquet::FileMetaData> metadata;

public:
  ParquetTable(std::string file);
  ParquetTable(std::string file, std::string tableName);
  std::string CreateStatement();
  std::string file;
  std::string columnName(int idx);
  std::shared_ptr<parquet::FileMetaData> getMetadata();
  const std::string& getFile();
  const std::string& getTableName();
};

#endif