
September Bulletin #2

by Community Team

Editor's note: It's been pointed out to us that while this post is published in October, it covers work done in September, so this will be the second September Bulletin.

This is the second monthly issue covering everything that has been happening in Vortex!

Development keeps moving quickly: we released 0.54.0 with many new features and improvements to the Rust API. This month we accepted 227 commits from 14 different contributors, with the relentless renovate-bot taking the crown for most changes merged.

Core

  1. Published an RFC presenting our planned changes to how arrays and compute interact with each other, introducing what we call Operators.
  2. Started work on GPU-powered kernels.
  3. Merged a few big changes to the IO APIs (#4557, #4608). These introduce a new push-based write API (in addition to the existing stream-based one), and writes now return both the file's footer (which can now be cached, reused, and even serialized and deserialized, #4598) and the file's total compressed size.
  4. Added a new encoding: FastLanes-based RLE #4588, #4789.
  5. The work on the new FixedSizeList keeps moving forward - #4590, #4601.
  6. Added a new canonical encoding for list types, see the tracking issue for full details and work.
  7. Fixed a long-standing issue where we used to underestimate the uncompressed size of arrays, causing inaccurate array stats and a consistent underestimation of the overall data size. #4963.
  8. Improved performance for very wide tables (1K+ columns), improving a new compression benchmark by over 80%. #4863 #4868 #4877
  9. Allowed users to write with different compression strategies from Python #4825
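
To illustrate the idea behind the RLE encoding in item 4, here is a minimal sketch of plain run-length encoding in Python. It is not the FastLanes layout and not Vortex's implementation; the function names `rle_encode` and `rle_decode` are hypothetical and exist only for this example.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(values: List[T]) -> Tuple[List[T], List[int]]:
    """Collapse consecutive repeats into parallel (run_values, run_lengths) lists."""
    run_values: List[T] = []
    run_lengths: List[int] = []
    for v in values:
        if run_values and run_values[-1] == v:
            # Same value as the current run: extend it.
            run_lengths[-1] += 1
        else:
            # Value changed (or first element): start a new run.
            run_values.append(v)
            run_lengths.append(1)
    return run_values, run_lengths


def rle_decode(run_values: List[T], run_lengths: List[int]) -> List[T]:
    """Expand (run_values, run_lengths) back into the original sequence."""
    out: List[T] = []
    for v, n in zip(run_values, run_lengths):
        out.extend([v] * n)
    return out
```

Data with long runs of repeated values shrinks to a handful of (value, length) pairs, while data with no repeats gains nothing, which is why RLE is usually one option among several encodings rather than a default.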

Integrations

DuckDB

  1. Landed a release of the Vortex extension for the most recent DuckDB release (1.4). As always, you can install and load the extension with:

    INSTALL vortex FROM community;
    LOAD vortex;
    
  2. Improved the testing of our DuckDB extension, making it easier and faster to run tests, using a debug build of DuckDB.

  3. Added zero-copy exporting of arrays to DuckDB #4812 #4804

Apache DataFusion

  1. Now use the built-in FilePruner to prune files based on the full expression, even for expressions we can't push down yet, such as dynamic expressions.
  2. Added support for tables with Hive-style partitioning.
  3. Updated our Apache DataFusion integration to the most recent release (v50.1.0) #4577
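
For readers unfamiliar with Hive-style partitioning, the following Python sketch shows the general idea: partition columns are encoded as `key=value` directory names, so whole files can be skipped by looking at their paths alone. This is a conceptual illustration, not how DataFusion or the Vortex integration implements it; the helper names and the example paths are hypothetical.

```python
from typing import Dict, List


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Extract key=value directory segments from a file path."""
    parts: Dict[str, str] = {}
    for segment in path.split("/")[:-1]:  # skip the file name itself
        key, sep, value = segment.partition("=")
        if sep and key:
            # Values are kept as raw strings; URL-decoding and typing are omitted.
            parts[key] = value
    return parts


def prune_files(paths: List[str], required: Dict[str, str]) -> List[str]:
    """Keep only files whose partition values match every required key."""
    kept: List[str] = []
    for p in paths:
        parts = parse_hive_partitions(p)
        if all(parts.get(k) == v for k, v in required.items()):
            kept.append(p)
    return kept


# Hypothetical table layout used only for this example.
paths = [
    "events/year=2025/month=09/part-0.vortex",
    "events/year=2025/month=10/part-0.vortex",
]
september = prune_files(paths, {"year": "2025", "month": "09"})
```

Here `september` contains only the first path, so a query filtered on `month = '09'` never has to open the October file.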

Acknowledgments

We want to thank everyone who has tried Vortex, provided feedback, asked questions, and filed issues.

Special thanks go to all the contributors who took the time and care to contribute to Vortex this month (in descending order of commit count):

    41  Adam Gutglick
    32  Connor Tsui
    22  Joe Isaacs
    21  Robert Kruszewski
    21  Alexander Droste
    20  Onur Satici
    20  Nicholas Gates
    10  Dan King
     4  Dmitrii Blaginin
     2  Andrew Duffy
     1  Will Manning
     1  Maksim Dergousov
     1  Evan Martin
     1  Alfonso Subiotto Marqués
Copyright © Vortex a Series of LF Projects, LLC.
Donated by Spiral.