December & January Bulletin

The work described in the previous bulletin to make Array evaluation fully lazy was released on January 6th. Arrays are now executed through an Operator model that evaluates into Vectors (fully decompressed, with zero-copy conversion to the Arrow representation). As a reminder, this work enables many more optimizations and provides unified abstractions for evaluating on different processor types (CPUs and GPUs).
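To make the idea of lazy evaluation concrete, here is a minimal Python sketch. It is not Vortex's API: the class names (RunEndEncoded, AddScalar, Filter) and the evaluate method are hypothetical, chosen only to show how building a plan can be separated from decompressing data.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Operator(Protocol):
    """Hypothetical interface: anything that can produce decompressed values."""

    def evaluate(self) -> List[int]: ...


@dataclass
class RunEndEncoded:
    """Hypothetical compressed leaf: values[i] repeats up to position ends[i]."""

    ends: List[int]
    values: List[int]

    def evaluate(self) -> List[int]:
        out, start = [], 0
        for end, value in zip(self.ends, self.values):
            out.extend([value] * (end - start))
            start = end
        return out


@dataclass
class AddScalar:
    """Lazy node: records the operation and does no work until evaluated."""

    child: Operator
    scalar: int

    def evaluate(self) -> List[int]:
        return [v + self.scalar for v in self.child.evaluate()]


@dataclass
class Filter:
    """Lazy node: keeps only the positions where the mask is True."""

    child: Operator
    mask: List[bool]

    def evaluate(self) -> List[int]:
        return [v for v, keep in zip(self.child.evaluate(), self.mask) if keep]


# Building the plan performs no decompression; only evaluate() does.
plan = Filter(
    AddScalar(RunEndEncoded(ends=[2, 5], values=[10, 20]), scalar=1),
    mask=[True, False, True, False, True],
)
result = plan.evaluate()
```

Because the plan is just a tree of operators, an optimizer could in principle rewrite it before any data is touched, for example by applying the filter to the compressed runs first.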

GPU Support

Speaking of GPUs, the focus is now on support for reading Vortex files into GPUs. To achieve this, the team is:

Adding GPU decompression for existing kernels, as well as some new encodings optimized for GPUs; and
Integrating with NVIDIA's CUDA toolkit for high-performance I/O.

The first supported output types will be Arrow Device Arrays and cuDF, and a few key PRs have already landed:

Add CUDA kernels for decoding bitpacked arrays on GPU (#6145)
GPU filter kernel for primitive types (#6188)
GPU filter kernel for decimals, strings and binary (#6196)
GPU Scans (#6199)
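For readers unfamiliar with what bit-unpacking and filter kernels compute, the following is a CPU reference sketch in Python. It is illustrative only: the function names are hypothetical, the simple LSB-first packing shown here is an assumption and not necessarily the layout Vortex uses, and the code in the PRs above is CUDA rather than Python.

```python
from typing import List


def pack_bits(values: List[int], bit_width: int) -> bytes:
    """Pack unsigned ints into `bit_width` bits each, LSB-first (assumed layout)."""
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * bit_width)
    n_bytes = (len(values) * bit_width + 7) // 8
    return word.to_bytes(n_bytes, "little")


def unpack_bits(packed: bytes, bit_width: int, count: int) -> List[int]:
    """Decode `count` unsigned ints of `bit_width` bits each."""
    mask = (1 << bit_width) - 1
    word = int.from_bytes(packed, "little")
    # Each output depends only on its own index, which is why this kind of
    # decoding maps naturally onto one GPU thread per value.
    return [(word >> (i * bit_width)) & mask for i in range(count)]


def filter_by_mask(values: List[int], mask: List[bool]) -> List[int]:
    """Stream compaction: an exclusive prefix sum assigns each kept value a slot."""
    offsets, total = [], 0
    for keep in mask:
        offsets.append(total)
        total += int(keep)
    out = [0] * total
    # With the offsets known, every kept element can be written independently.
    for i, keep in enumerate(mask):
        if keep:
            out[offsets[i]] = values[i]
    return out


packed = pack_bits([5, 0, 7, 3], bit_width=3)
decoded = unpack_bits(packed, bit_width=3, count=4)
kept = filter_by_mask(decoded, [True, False, True, True])
```

The prefix-sum formulation is a common way to express filtering on parallel hardware, since it removes the sequential dependency of appending to an output list.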

As with all of Vortex, these capabilities are fully exposed to plugins so advanced users can extend and customize for their own use.

Around the Ecosystem

DuckDB Labs published performance benchmarks comparing Vortex (using the vortex extension) with Parquet. Spoiler alert: Vortex reduced total query time by roughly 18-35%!

Spice AI began a blog series about building their data accelerator Cayenne with Vortex and DataFusion.

DataFusion: Over the past two months we've released support for DataFusion 51, and then merged support for DataFusion 52.

Expression conversion is now extensible, allowing users to push down custom UDFs or any other DataFusion expressions.
Support for schema evolution between files within the same table has improved dramatically, fixing many bugs and allowing deeper pushdown over nested struct columns.
We now support tables that have more complex Arrow types (such as Dict and REE), while also doing less work when exporting from Vortex to Arrow.
We now delegate some caching to DataFusion's built-in caching, allowing users to tune it using familiar configurations.
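As a rough illustration of what extensible expression conversion means, the sketch below uses a registry of converters keyed by expression name. Everything in it is hypothetical (the tuple-based expression format, register, convert, and the my_udf name); the real integration is written in Rust against DataFusion's expression types, and this is only a toy model of the pattern.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical expression format: nested tuples such as
# ("gt", ("col", "x"), ("lit", 3)).
Expr = Tuple
Converter = Callable[[Expr], Optional[str]]

_converters: Dict[str, Converter] = {}


def register(name: str, fn: Converter) -> None:
    """Add or override the converter for one kind of expression."""
    _converters[name] = fn


def convert(expr: Expr) -> Optional[str]:
    """Return a pushed-down form, or None if the expression cannot be pushed down."""
    fn = _converters.get(expr[0])
    return fn(expr) if fn else None


def _binary(symbol: str) -> Converter:
    def fn(expr: Expr) -> Optional[str]:
        left, right = convert(expr[1]), convert(expr[2])
        if left is None or right is None:
            return None  # one unsupported child blocks pushdown of the parent
        return f"({left} {symbol} {right})"

    return fn


# Built-in conversions.
register("col", lambda e: e[1])
register("lit", lambda e: repr(e[1]))
register("gt", _binary(">"))


# A user-supplied converter for a custom UDF (hypothetical name).
def _my_udf(expr: Expr) -> Optional[str]:
    arg = convert(expr[1])
    return None if arg is None else f"my_udf({arg})"


register("my_udf", _my_udf)

pushed = convert(("gt", ("my_udf", ("col", "x")), ("lit", 3)))
not_pushed = convert(("gt", ("unknown_fn", ("col", "x")), ("lit", 3)))
```

Returning None for anything unrecognized lets the query engine fall back to evaluating that expression itself after the scan.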

Acknowledgments

We want to thank everyone who has tried Vortex, provided feedback, asked questions, and filed issues.

The following people contributed to the December & January releases:

Joe Isaacs
Adam Gutglick
Connor Tsui
Nicholas Gates
Alexander Droste
Robert Kruszewski
Alfonso Subiotto Marqués
Andrew Duffy
Cancai Cai
Onur Satici
Dmitrii Blaginin
Dan King
Baris Palaska
godnight10061
sherlockbeard
Frederic Branczyk
paultiq
Pratham Agarwal
Hao Huaijin
Dave Bunten
Harry Scholes