Releases: apache/orc
v2.0.0
Milestone
Branch
This is a new major release for which we cannot provide a changelog.
Summary of notable changes
ORC-1547: Spin-off ORC Format
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1507: Support Java 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1577: Use ZSTD as the default compression
ORC-1430: Use Hadoop 3.3.5 shaded clients
ORC-1456: Update Hadoop to 3.3.6
ORC-1251: Use Hadoop Vectored IO
ORC-1463: Support brotli codec
ORC-1100: Support vcpkg
ORC-1620: Add Apple Silicon Test Coverage
New Feature
ORC-998: Refactor compression output buffer within OutStream for better portability
ORC-1088: Support ZSTD_JNI and column compress to set compression level
ORC-1100: Support vcpkg
ORC-1251: Use Hadoop Vectored IO
ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
ORC-1440: Check for protobuf config based module
ORC-1463: Support brotli codec
ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1531: Create orc-format module and repo
ORC-1545: Use orc-format 1.0.0-SNAPSHOT
ORC-1546: Use orc-format 1.0.0-alpha
ORC-1547: Spin-off ORC Format
ORC-1551: Use orc-format 1.0.0-beta
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1585: [C++] Add orc-format_ep as a dependency of orc
Improvement
ORC-1459: Mark DataBuffer::size() and DataBuffer::capacity() as const
ORC-1460: specification: Clarify how dictionary entries are sorted
ORC-1461: Mark Int128::getHighBits() and Int128::getLowBits() as const
ORC-1472: Replace deprecated method in TestMurmur3.java
ORC-1479: Enhance example usage message to use Uber jar
ORC-1481: [C++] Better error message when TZDB is unavailable
ORC-1504: Add lower bound check in get API for DynamicIntArray
ORC-1506: Replacing deprecated valueOf() with recommended forNumber()
ORC-1509: Auto grant contributor role to first-time contributors
ORC-1520: Remove JDK 8 settings from pom
ORC-1567: Add the -ignoreExtension configuration to the sizes and count commands of orc-tools
ORC-1570: Add supportVectoredIO API to HadoopShimsCurrent and use it
ORC-1571: Supports displaying raw data size in the meta command of orc-tools
ORC-1577: Use ZSTD as the default compression
ORC-1580: Change default DataBuffer constructor to use reserve instead of resize
ORC-1595: Add a short-cut to skip tiny inputs for ZstdCodec.compress
ORC-1596: Remove redundant Zstd.isError JNI usage
ORC-1597: Set bloom filter fpp to 1%
ORC-1600: Reduce getStaticMemoryManager sync block in OrcFile
ORC-1601: Reduce get HadoopShims sync block in HadoopShimsFactory
ORC-1610: Reduce the number of hash computation in CuckooSetBytes
ORC-1613: Zstd decompression supports direct buffer
ORC-1631: Supports summary output in sizes command
ORC-1637: [C++] Port conan recipe from upstream conan center
ORC-1638: Avoid System.exit(0) in count command
ORC-1639: [C++] Reduce unnecessary compiler flags in CMake
ORC-1641: Remove sourceFileExcludes from maven-javadoc-plugin
ORC-1642: Avoid System.exit(0) in scan command
ORC-1593: Set orc.compression.zstd.level to 3 by default
Bug Fix
ORC-634: Fix the json output for double NaN and infinite
ORC-1455: [C++] Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
ORC-1473: Zero-copy zeroCopyReadRanges and releaseBuffer bugs
ORC-1476: Maven build fail with unsupported platform: protoc-3.17.3-osx-aarch_64.exe
ORC-1480: [C++] Build failed when the BUILD_CPP_ENABLE_METRICS is ON
ORC-1500: [C++] The partition field does not support English special characters
ORC-1528: When using the orc.min.disk.seek.size configuration to read extremely large ORC files, a java.nio.BufferOverflowException may occur.
ORC-1553: Reading information from Row group, where there are 0 records of SArg column
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
ORC-1568: Use readDiskRanges if orc.use.zerocopy is enabled
ORC-1575: Use ASF Archive URL instead of Download URL
ORC-1578: Fix SparkBenchmark according to SPARK-40918
ORC-1588: Fix incorrect Decimal assert in LeafFilterFactory
ORC-1602: [C++] limit compression block size
Task
ORC-1422: Setting version to 2.0.0-SNAPSHOT
ORC-1434: Remove org.apache.hadoop from dependabot.yml
ORC-1484: Use JIRA_ACCESS_TOKEN in merge_orc_pr.py
ORC-1485: Enable checkstyle checks for test classes
ORC-1486: Fix checkstyle violations for tests in orc-core module
ORC-1492: Fix checkstyle violations for tests in mapreduce, tools, bench modules
ORC-1496: Use iterator to suggest backporting branches
ORC-1515: Skip publishing orc-example module
ORC-1516: Fix minor typo in comments in IOUtils
ORC-1518: Remove findbugs folders
ORC-1529: Fix minor typos in pom.xml
ORC-1530: Rename variables in RecordReaderUtils.ChunkReader#create
ORC-1535: Remove generated Java docs from source tree
ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1540: Remove MacOS 11 from GitHub Action CI
ORC-1542: Use Pattern Matching for instanceof (JEP-394)
ORC-1549: Update libhdfspp.tar.gz by adding #include <cstdint>
ORC-1569: Remove HadoopShimsPre2_3, HadoopShimsPre2_6, HadoopShimsPre2_7 classes
ORC-1579: Add ASF Generative Tooling Guidance to PR template
ORC-1591: Lower log level from INFO ...
v1.9.2
Milestone
Changelog
Bug
ORC-1475: [C++] Fix the failure of UT when char is unsigned
ORC-1480: [C++] Fix build break w/ BUILD_CPP_ENABLE_METRICS=ON
ORC-1482: Adaptation to read ORC files created by CUDF
ORC-1489: Assign a writer id to CUDF
ORC-1525: Fix bad read in RleDecoderV2::readByte
Test
ORC-1431: Use parquet 1.13.1 in bench module
ORC-1454: Update Spark to 3.4.1
ORC-1487: Enable checkstyle on src/test with checkstyle-suppressions.xml
ORC-1498: Add Debian 12 Docker test
ORC-1502: Upgrade Maven to 3.9.4
ORC-1505: Upgrade Spark to 3.5.0
ORC-1511: Bump Avro to 1.11.3 in bench module
ORC-1513: Upgrade snappy-java to 1.1.10.4 in bench module
ORC-1517: Bump snappy-java to 1.1.10.5 in bench module
Task
ORC-1497: Bump maven-enforcer-plugin to 3.4.0
ORC-1499: Add MacOS 13 and 14 to building.md
ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
ORC-1518: Remove findbugs folders
Documentation
ORC-1503: Updated README.md with Maven version 3.9.4
v1.8.6
v1.7.10
v1.8.5
v1.9.1
Milestone
Changelog
Bug
- ORC-1455 Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
- ORC-1457 Fix ambiguous overload of Type::createRowBatch
- ORC-1462 Bump aircompressor to 0.25 to fix JDK-8081450
Test
v1.9.0
Milestone
Changelog
New Feature and Notable Changes
- ORC-961 Expose metrics of the reader
- ORC-1167 Support orc.row.batch.size configuration
- ORC-1252 Expose io metrics for write operation
- ORC-1301 Enforce C++17
- ORC-1310 allowlist Support for plugin filter
- ORC-1356 Use Intel AVX-512 instructions to accelerate the Rle-bit-packing decode
- ORC-1385 Support schema evolution from numeric to numeric
- ORC-1386 Support schema evolution from primitive to string group/decimal/timestamp
Improvement
- ORC-827 Utilize Array copyOf
- ORC-1170 Optimize the RowReader::seekToRow function
- ORC-1232 Disable metrics collector by default
- ORC-1278 Update Readme.md cmake to 3.12
- ORC-1279 Update cmake version
- ORC-1286 Replace DataBuffer with BlockBuffer in the BufferedOutputStream
- ORC-1298 Support dedicated ColumnVectorBatch of numeric types
- ORC-1302 Upgrade Github workflow to build on Windows
- ORC-1306 Fixed indented code style for Java modules
- ORC-1307 Add coding style enforcement
- ORC-1314 Remove macros defined before C++11
- ORC-1347 Use make_unique and make_shared when creating unique_ptr and shared_ptr
- ORC-1348 TimezoneImpl constructor should pass std::vector<> & instead of std::vector<>
- ORC-1349 Remove useless bufStream definition
- ORC-1352 Remove ORC_[NOEXCEPT|NULLPTR|OVERRIDE|UNIQUE_PTR] macro usages
- ORC-1355 Writer::addUserMetadata change parameter to reference
- ORC-1373 Add log when DynamicByteArray length overflow
- ORC-1401 Allow writing an intermediate footer
- ORC-1421 Use PyArrow 12.0.0 in document
Bug
- ORC-1225 Bump maven-assembly-plugin to 3.4.2
- ORC-1266 DecimalColumnVector resets the isRepeating flag in the nextVector method
- ORC-1273 Bump opencsv to 5.7.0
- ORC-1297 Bump opencsv to 5.7.1
- ORC-1304 throw ParseError when using SearchArgument with nested struct
- ORC-1315 Byte to integer conversions fail on platforms with unsigned char type
- ORC-1320 Fix build break of C++ code on docker images
- ORC-1363 Upgrade zookeeper to 3.8.1
- ORC-1368 Bump commons-csv to 1.10.0
- ORC-1398 Bump aircompressor to 0.24
- ORC-1399 Fix boolean type with useTightNumericVector enabled
- ORC-1433 Fix comment in the Vector.hh
- ORC-1447 Fix a bug in CpuInfoUtil.cc to support ARM platform
- ORC-1449 Add -Wno-unused-macros for Clang 14.0
- ORC-1450 Stop enforcing override keyword
- ORC-1453 Fix fall-through warning cases
Task
- ORC-1164 Setting version to 1.9.0-SNAPSHOT
- ORC-1218 Bump apache pom to 27
- ORC-1219 Remove redundant toString
- ORC-1237 Remove a wrong image link to article-footer.png
- ORC-1239 Upgrade maven-shade-plugin to 3.3.0
- ORC-1256 Publish test-jar to maven central
- ORC-1259 Bump slf4j to 2.0.0
- ORC-1269 Remove FindBugs
- ORC-1270 Move opencsv dependency to the tools module.
- ORC-1274 Add a checkstyle rule to ban starting LAND and LOR
- ORC-1275 Bump maven-jar-plugin to 3.3.0
- ORC-1276 Bump slf4j to 2.0.1
- ORC-1277 Bump maven-shade-plugin to 3.4.0
- ORC-1284 Add permissions to GitHub Action labeler
- ORC-1296 Bump reproducible-build-maven-plugin to 0.16
- ORC-1311 Bump maven-shade-plugin to 3.4.1
- ORC-1316 Bump slf4j.version to 2.0.4
- ORC-1334 Bump slf4j.version to 2.0.6
- ORC-1335 Bump netty-all to 4.1.86.Final
- ORC-1351 Update PR Labeler definition
- ORC-1358 Use spotless to format pom files
- ORC-1371 Remove unsupported SLF4J bindings from classpath
- ORC-1372 Bump zstd to v1.5.4
- ORC-1375 Cancel old running ci tasks when a pr has a new commit
- ORC-1377 Enforce override keyword
- ORC-1383 Upgrade aircompressor to 0.22
- ORC-1395 Enforce license check
- ORC-1396 Bump slf4j to 2.0.7
- ORC-1410 Bump zstd to v1.5.5
- ORC-1411 Remove Ubuntu18.04 from docker-based tests
- ORC-1419 Bump protobuf-java to 3.22.3
- ORC-1428 Setup GitHub Action CI on branch-1.9
- ORC-1443 Enforce Java version
- ORC-1444 Enforce JDK Bytecode version
- ORC-1446 Publish snapshot from branch-1.9
Test
- ORC-1231 Update supported OS list in building.md
- ORC-1233 Bump junit to 5.9.0
- ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
- ORC-1235 Bump avro to 1.11.1
- ORC-1240 Update site README to use apache/orc-dev
- ORC-1241 Use apache/orc-dev DockerHub repository in Docker tests
- ORC-1250 Bump mockito to 4.7.0
- ORC-1254 Add spotbugs check
- ORC-1258 Bump byte-buddy to 1.12.14
- ORC-1262 Bump maven-checkstyle-plugin to 3.2.0
- ORC-1265 Upgrade spotbugs to 4.7.2
- ORC-1267 Bump mockito to 4.8.0
- ORC-1271 Bump spotbugs-maven-plugin to 4.7.2.0
- ORC-1272 Bump byte-buddy to 1.12.16
- ORC-1300 Update Spark to 3.3.1 and its dependencies
- ORC-1303 Upgrade GoogleTest to 1.12.1
- ORC-1318 Upgrade mockito.version to 4.9.0
- ORC-1319 Upgrade byte-buddy to 1.12.19
- ORC-1321 Bump checkstyle to 10.5.0
- ORC-1322 Upgrade centos7 docker image to use gcc9
- ORC-1324 Use Java 19 instead of 18 in GHA
- ORC-1333 Bump mockito to 4.10.0
- ORC-1341 Bump mockito to 4.11.0
- ORC-1353 Bump byte-buddy to 1.12.21
- ORC-1359 Bump byte-buddy to 1.12.22
- ORC-1366 Bump checkstyle to 10.7.0
- ORC-1367 Bump maven-enforcer-plugin to 3.2.1
- ORC-1369 Bump byte-buddy to 1.12.23
- ORC-1370 Bump snappy-java to 1.1.9.1
- ORC-1374 Update Spark to 3.3.2
- ORC-1378 Add slf4j impl to avoid warning message in example module
- ORC-1379 Upgrade spotbugs to 4.7.3.2
- ORC-1380 Upgrade checkstyle to 10.8.0
- ORC-1394 Bump maven-assembly-plugin to 3.5.0
- ORC-1397 Bump checkstyle to 10.9.2
- ORC-1405 Bump spotbugs-maven-plugin to 4.7.3.4
- ORC-1406 Bump maven-enforcer-plugin to 3.3.0
- ORC-1408 Add testVectorBatchHasNull test case and comment
- ORC-1415 Add Java 20 to GitHub Action CI
- ORC-1417 Bump checkstyle to 10.10.0
- ORC-1418 Bump junit to 5.9.3
- ORC-1426 Use Java 21-ea instead of 20 in GitHub Action
- ORC-1435 Bump maven-checkstyle-plugin to 3.3.0
- ORC-1436 Bump snappy-java to 1.1.10.0
- ORC-1452 Use the latest OS versions in variant tests
v1.8.4
Milestone
Changelog
Bug
ORC-1304: [C++] Fix seeking over empty PRESENT stream
ORC-1400: Use Hadoop 3.3.5 on Java 17+ and benchmark
ORC-1413: Fix for ORC row level filter issue with ACID table
Test:
ORC-1404 Bump parquet to 1.13.0
ORC-1414 Upgrade java bench module to spark3.4
ORC-1416 Upgrade Jackson dependency to 2.14.2 in bench module
ORC-1420 Pin net.bytebuddy package to 1.12.x
Task:
ORC-1395 Enforce license check via github action
v1.7.9
v1.8.3
Milestone
Changelog
Bug
- ORC-1357 Handle missing compression block size
- ORC-1382 Fix secondary config names org.sarg.* to orc.sarg.*
- ORC-1384 Fix ArrayIndexOutOfBoundsException when reading dictionary stream bigger than dictionary
- ORC-1393 Add reset(DiskRangeList input, long length) to InStream impl class
Test
Tasks
- ORC-1358 Use spotless to format pom files