Repo that tracks pre-prints, long abstracts and other scholarly content for easy sharing
A collection of long abstracts, pre-prints and other scholarly content that is not yet published, is in the process of being published, or will never be published, but is ready enough to be shared.
All content here is written with a "research-y vibe", so expect an academic tone, citations, etc., as opposed to the more casual approach that we take with our engineering blog posts.
If anything rings a bell, please reach out to Apo.
- DAG Lakehouse Planning with an Ephemeral and Embedded Graph Database, Luca Bigon, Jacopo Tagliabue, Semih Salihoğlu. TL;DR: we present a graph-based planning module that uses an embedded graph database to provide static checks and planning to a multi-language FaaS lakehouse.
- The Deconstructed Warehouse: an Ephemeral Query Engine Design for Apache Iceberg, Ryan Curtin, Jacopo Tagliabue. TL;DR: we present a novel design for integrating data catalogs, open formats and single-node engines into a "deconstructed warehouse" running over a FaaS-based infrastructure. We motivate a new command for DuckDB, EXPLAIN SCANS, which sits between the logical and the physical plan as an intermediate optimization.