feat: support rangebitmap read and write #185
fafacao86 wants to merge 13 commits into alibaba:main
Pull request overview
Adds RangeBitmap file index support (read/write) and validates it with new unit/integration tests plus embedded test datasets to close #146.
Changes:
- Implement RangeBitmap file index reader/writer and factory registration.
- Add comprehensive UTs for RangeBitmap behavior across types and edge cases, plus IT coverage using Paimon-generated datasets.
- Add ORC/Parquet test datasets (single-chunk and multi-chunk) with range-bitmap index metadata.
Reviewed changes
Copilot reviewed 31 out of 51 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/snapshot-1 | Adds Parquet multi-chunk snapshot metadata for ITs. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/LATEST | Adds Parquet multi-chunk latest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/EARLIEST | Adds Parquet multi-chunk earliest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/schema/schema-0 | Adds schema/options enabling range-bitmap with small chunk size for multi-chunk behavior. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/README | Documents Parquet multi-chunk dataset rows and index config. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/snapshot-1 | Adds Parquet single-chunk snapshot metadata for ITs. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/LATEST | Adds Parquet single-chunk latest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/EARLIEST | Adds Parquet single-chunk earliest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/schema/schema-0 | Adds schema/options enabling range-bitmap for single-chunk case. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/README | Documents Parquet single-chunk dataset rows and index config. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/snapshot-1 | Adds ORC multi-chunk snapshot metadata for ITs. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/LATEST | Adds ORC multi-chunk latest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/EARLIEST | Adds ORC multi-chunk earliest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/schema/schema-0 | Adds schema/options enabling range-bitmap + ORC format + small chunk size for multi-chunk behavior. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/README | Documents ORC multi-chunk dataset rows and index config. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/snapshot-1 | Adds ORC single-chunk snapshot metadata for ITs. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/LATEST | Adds ORC single-chunk latest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/EARLIEST | Adds ORC single-chunk earliest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/schema/schema-0 | Adds schema/options enabling range-bitmap + ORC format for single-chunk case. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/README | Documents ORC single-chunk dataset rows and index config. |
| test/inte/read_inte_with_index_test.cpp | Adds IT assertions for RangeBitmap index across predicates and data patterns (single/multi-chunk). |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_test.cpp | Adds UTs that roundtrip writer/reader and validate predicate behavior and edge cases. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_factory.h | Declares factory for RangeBitmap file index. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_factory.cpp | Implements factory creation and registration. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index.h | Declares RangeBitmap file index, reader, and writer APIs. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index.cpp | Implements index reader/writer based on RangeBitmap serialization. |
| src/paimon/common/file_index/rangebitmap/range_bitmap.h | Declares RangeBitmap query API and append/serialize builder. |
| src/paimon/common/file_index/rangebitmap/range_bitmap.cpp | Implements RangeBitmap read path and serialization format. |
| src/paimon/common/file_index/rangebitmap/dictionary/key_factory.h | Renames default chunk size constant to match naming conventions. |
| src/paimon/common/file_index/CMakeLists.txt | Adds RangeBitmap sources to build. |
| src/paimon/CMakeLists.txt | Registers new RangeBitmap unit test in test build. |
@fafacao86 Thanks for the contribution! You can refer to issue #188 and take a look at this example test as a pattern. The goal is to ensure the range index gracefully handles I/O failures (e.g., read errors, file not found), similar to how Let me know if you need any help; happy to assist! Thanks!
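One way to exercise this kind of failure handling is to inject a stream that starts failing after a fixed number of successful reads, then check that the reader reports an error instead of crashing. The sketch below is self-contained and hypothetical: `FailingStream`, `Header`, `ReadHeader`, and the two-field header layout are stand-ins invented for illustration, not Paimon APIs and not the range-bitmap file format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stream that fails once `ok_reads` reads have succeeded.
class FailingStream {
 public:
  FailingStream(std::vector<uint8_t> data, int ok_reads)
      : data_(std::move(data)), ok_reads_(ok_reads) {}

  // Returns false on an injected I/O failure or when fewer than n bytes remain.
  bool Read(void* out, std::size_t n) {
    if (ok_reads_ <= 0) return false;            // injected I/O error
    --ok_reads_;
    if (n > data_.size() - pos_) return false;   // short read
    std::memcpy(out, data_.data() + pos_, n);
    pos_ += n;
    return true;
  }

 private:
  std::vector<uint8_t> data_;
  int ok_reads_;
  std::size_t pos_ = 0;
};

// Hypothetical header, used only to give the reader something to parse.
struct Header {
  int32_t version = 0;
  int32_t row_count = 0;
};

// Reads the header with two separate reads; any failure is reported through
// `error` and a false return value rather than by aborting.
bool ReadHeader(FailingStream& in, Header* header, std::string* error) {
  if (!in.Read(&header->version, sizeof(header->version))) {
    *error = "failed to read version";
    return false;
  }
  if (!in.Read(&header->row_count, sizeof(header->row_count))) {
    *error = "failed to read row count";
    return false;
  }
  return true;
}
```

A test built on this pattern would loop over `ok_reads = 0, 1, 2, ...` and assert that every run before the first fully successful one returns an error, which covers each read call in the reader at least once.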
Sure, let me take a look.
Is my understanding of the test purpose of
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Great contribution: the range index is high quality and well designed! 👍
I plan to support the String type first, in the near future. RangeBitmap can not only answer range queries on numeric types but also supports EQ lookups on strings, which I think is quite useful. Timestamp support will come later.
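The reason one structure can serve both numeric range queries and string EQ lookups can be sketched with a sorted dictionary: each distinct value gets a dense integer code in sort order, so a predicate on values becomes a predicate on codes. The following is a minimal sketch under that assumption; `SortedDictionary` and its methods are hypothetical names, not the classes in this PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dictionary: distinct values sorted ascending, code == position.
template <typename T>
class SortedDictionary {
 public:
  explicit SortedDictionary(std::vector<T> values) : keys_(std::move(values)) {
    std::sort(keys_.begin(), keys_.end());
    keys_.erase(std::unique(keys_.begin(), keys_.end()), keys_.end());
  }

  // EQ lookup: returns the code of `v`, or -1 if `v` was never indexed.
  int32_t Find(const T& v) const {
    auto it = std::lower_bound(keys_.begin(), keys_.end(), v);
    if (it == keys_.end() || *it != v) return -1;
    return static_cast<int32_t>(it - keys_.begin());
  }

  // Range lookup helper: number of keys <= v, so every value <= v has a
  // code in [0, CountLessOrEqual(v)).
  int32_t CountLessOrEqual(const T& v) const {
    auto it = std::upper_bound(keys_.begin(), keys_.end(), v);
    return static_cast<int32_t>(it - keys_.begin());
  }

  int32_t Size() const { return static_cast<int32_t>(keys_.size()); }

 private:
  std::vector<T> keys_;
};
```

With strings, `Find("fig")` yields a single code whose row bitmap answers the EQ predicate, and a miss (`-1`) means the file can be skipped. With numerics, `CountLessOrEqual(25)` turns `value <= 25` into a bound on codes.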
Result<PAIMON_UNIQUE_PTR<Bytes>> RangeBitmap::Appender::Serialize() const {
    int32_t code = 0;
    const int32_t max_code = bitmaps_.empty() ? 0 : static_cast<int32_t>(bitmaps_.size() - 1);
    PAIMON_ASSIGN_OR_RAISE(auto bsi, BitSliceIndexBitmap::Appender::Create(0, max_code, pool_));
Please change "auto" to "std::unique_ptr<BitSliceIndexBitmap::Appender>".
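For context on why `max_code` is passed to `Create` in the hunk above: in a bit-sliced index, the largest code bounds how many bit slices are needed, and a `<=` query is answered by walking the slices from the most significant bit down. The sketch below illustrates that general technique with `std::vector<bool>` standing in for compressed bitmaps. `BitSliceSketch` is a hypothetical name, and this is not the `BitSliceIndexBitmap` implementation from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-sliced index over codes in [0, max_code].
class BitSliceSketch {
 public:
  explicit BitSliceSketch(int32_t max_code) : max_code_(max_code) {
    assert(max_code >= 0);
    int bits = 1;  // always keep at least one slice, even when max_code == 0
    while ((max_code >> bits) != 0) ++bits;
    slices_.resize(static_cast<std::size_t>(bits));
  }

  // Appends one row: bit i of `code` goes into slice i.
  void Append(int32_t code) {
    assert(code >= 0 && code <= max_code_);
    for (std::size_t i = 0; i < slices_.size(); ++i) {
      slices_[i].push_back(((code >> i) & 1) != 0);
    }
    ++rows_;
  }

  // Returns the row ids whose code is <= bound.
  std::vector<std::size_t> LessOrEqual(int32_t bound) const {
    std::vector<std::size_t> result;
    if (bound < 0) return result;
    // A bound with bits above the top slice matches every row.
    const bool match_all = (bound >> static_cast<int>(slices_.size())) != 0;
    std::vector<bool> lt(rows_, match_all), eq(rows_, !match_all);
    for (std::size_t i = slices_.size(); !match_all && i-- > 0;) {
      const bool bound_bit = ((bound >> i) & 1) != 0;
      for (std::size_t r = 0; r < rows_; ++r) {
        const bool bit = slices_[i][r];
        if (bound_bit) {
          if (eq[r] && !bit) lt[r] = true;  // equal prefix, then 0 < 1
          eq[r] = eq[r] && bit;
        } else {
          eq[r] = eq[r] && !bit;            // a 1 here makes the row larger
        }
      }
    }
    for (std::size_t r = 0; r < rows_; ++r) {
      if (lt[r] || eq[r]) result.push_back(r);
    }
    return result;
  }

  std::size_t SliceCount() const { return slices_.size(); }

 private:
  int32_t max_code_;
  std::size_t rows_ = 0;
  std::vector<std::vector<bool>> slices_;
};
```

With `max_code = 5` the sketch allocates three slices, so the cost of a range query grows with the number of bits in the largest code rather than with the number of distinct values.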
        return Status::Invalid(fmt::format(
            "Chunk size cannot be larger than 2GB, current bytes: {}", chunk_size_bytes_limit_));
    }
    PAIMON_ASSIGN_OR_RAISE(auto dictionary,
Purpose
Linked issue: close #146
Tests
UT in range_bitmap_file_index_test.cpp
IT in paimon::test::ReadInteWithIndexTest::CheckResultForRangeBitmap
Test data is generated using paimon-java v1.3.1.
The same data and the same queries are run against single-chunk and multi-chunk datasets; the results should be identical.
The tests were mainly written by AI and reviewed by a human.
Test coverage:
Coverage of range_bitmap_file_index.cpp is slightly low (82.7%) because there is no write integration test covering the CreateWriter method.
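The single-chunk versus multi-chunk expectation above can be illustrated with a small sketch: if the dictionary is split into chunks and a top-level index of first keys routes each lookup to one chunk, the resulting code should not depend on the chunk size. Everything here is an assumption made for illustration: `ChunkedDictionary` is a hypothetical name, and chunks are sized by key count for simplicity, whereas the limit in this PR is expressed in bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunked dictionary over sorted, distinct keys.
class ChunkedDictionary {
 public:
  ChunkedDictionary(const std::vector<int64_t>& sorted_keys, std::size_t keys_per_chunk) {
    assert(keys_per_chunk > 0);
    assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
    for (std::size_t i = 0; i < sorted_keys.size(); i += keys_per_chunk) {
      const std::size_t end = std::min(i + keys_per_chunk, sorted_keys.size());
      chunks_.emplace_back(sorted_keys.begin() + static_cast<std::ptrdiff_t>(i),
                           sorted_keys.begin() + static_cast<std::ptrdiff_t>(end));
      first_keys_.push_back(sorted_keys[i]);  // top-level routing index
      base_codes_.push_back(i);               // code of the chunk's first key
    }
  }

  // Returns the global code of `key`, or -1 if it is not in the dictionary.
  int32_t Find(int64_t key) const {
    // Route to the last chunk whose first key is <= key.
    auto it = std::upper_bound(first_keys_.begin(), first_keys_.end(), key);
    if (it == first_keys_.begin()) return -1;
    const std::size_t c = static_cast<std::size_t>(it - first_keys_.begin()) - 1;
    const std::vector<int64_t>& chunk = chunks_[c];
    auto pos = std::lower_bound(chunk.begin(), chunk.end(), key);
    if (pos == chunk.end() || *pos != key) return -1;
    return static_cast<int32_t>(base_codes_[c] +
                                static_cast<std::size_t>(pos - chunk.begin()));
  }

  std::size_t ChunkCount() const { return chunks_.size(); }

 private:
  std::vector<std::vector<int64_t>> chunks_;
  std::vector<int64_t> first_keys_;
  std::vector<std::size_t> base_codes_;
};
```

Building the same keys with one large chunk and with several small chunks and comparing `Find` results mirrors, in miniature, what the single-chunk and multi-chunk datasets check end to end.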
API and Format
Documentation
Generative AI tooling
Generated-by: Kimi K2.5