|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Apache Arrow 19.0.0 Release" |
| 4 | +date: "2025-01-16 00:00:00" |
| 5 | +author: pmc |
| 6 | +categories: [release] |
| 7 | +--- |
| 8 | +<!-- |
| 9 | +{% comment %} |
| 10 | +Licensed to the Apache Software Foundation (ASF) under one or more |
| 11 | +contributor license agreements. See the NOTICE file distributed with |
| 12 | +this work for additional information regarding copyright ownership. |
| 13 | +The ASF licenses this file to you under the Apache License, Version 2.0 |
| 14 | +(the "License"); you may not use this file except in compliance with |
| 15 | +the License. You may obtain a copy of the License at |
| 16 | + |
| 17 | +http://www.apache.org/licenses/LICENSE-2.0 |
| 18 | + |
| 19 | +Unless required by applicable law or agreed to in writing, software |
| 20 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 21 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 22 | +See the License for the specific language governing permissions and |
| 23 | +limitations under the License. |
| 24 | +{% endcomment %} |
| 25 | +--> |
| 26 | + |
| 27 | +The Apache Arrow team is pleased to announce the 19.0.0 release. This release |
| 28 | +covers over 2 months of development work and includes [**202 resolved |
| 29 | +issues**][1] on [**330 distinct commits**][2] from [**67 distinct |
| 30 | +contributors**][2]. See the [Install Page](https://arrow.apache.org/install/) to |
| 31 | +learn how to get the libraries for your platform. |
| 32 | + |
| 33 | +The release notes below are not exhaustive and only expose selected highlights |
| 34 | +of the release. Many other bugfixes and improvements have been made: we refer |
| 35 | +you to the [complete changelog][3]. |
| 36 | + |
| 37 | +## Community |
| 38 | + |
| 39 | +Since the 18.1.0 release, Adam Reeve and Laurent Goujon have been invited to |
| 40 | +become committers. Gang Wu has been invited to join the Project Management |
| 41 | +Committee (PMC). |
| 42 | + |
| 43 | +Thanks for your contributions and participation in the project! |
| 44 | + |
| 45 | +## Release Highlights |
| 46 | + |
| 47 | +A [bug](https://github.com/apache/arrow/issues/45283) has been identified in the |
| 48 | +19.0.0 versions of the C++ and Python libraries which prevents reading Parquet |
| 49 | +files written by Arrow Rust v53.0.0 or higher. The files written by Arrow Rust |
| 50 | +are correct and the bug was in the patch adding support for Parquet's |
| 51 | +[SizeStatistics](https://github.com/apache/parquet-format/pull/197) feature to |
| 52 | +Arrow C++ and Python. See [#45283](https://github.com/apache/arrow/issues/45283) |
| 53 | +for more details including a potential workaround. |
| 54 | + |
| 55 | +As a result, we plan to create a 19.0.1 release with a fix for this bug, which |
| 56 | +should be available in the next few weeks. |
| 57 | + |
| 58 | +## Columnar Format |
| 59 | + |
| 60 | +We've added a new experimental specification for representing statistics on |
| 61 | +Arrow Arrays as Arrow Arrays. This is useful for preserving and exchanging |
| 62 | +statistics between systems such as when converting Parquet data to Arrow. See |
| 63 | +[the statistics schema |
| 64 | +documentation](https://arrow.apache.org/docs/format/StatisticsSchema.html) for |
| 65 | +details. |
| 66 | + |
| 67 | +We've expanded the Arrow C Device Data Interface to include an experimental |
| 68 | +Async Device Stream Interface. While the existing Arrow C Device Data Interface |
| 69 | +is a pull-oriented API, the Async interface provides a push-oriented design for |
| 70 | +other workflows. See the |
| 71 | +[documentation](https://arrow.apache.org/docs/format/CDeviceDataInterface.html#async-device-stream-interface) |
| 72 | +for more information. It currently has implementations in the C++ and Go |
| 73 | +libraries. |
| 74 | + |
| 75 | +## Arrow Flight RPC Notes |
| 76 | + |
| 77 | +The precision of a Timestamp (used for timeouts) is now nanoseconds on all |
| 78 | +platforms; previously it was platform-dependent. This may be a breaking change |
| 79 | +depending on your use case. |
| 80 | +([#44679](https://github.com/apache/arrow/issues/44679)) |
| 81 | + |
| 82 | +The Python bindings now support various new fields that were added to |
| 83 | +FlightEndpoint/FlightInfo (like `expiration_time`). |
| 84 | +([#36954](https://github.com/apache/arrow/issues/36954)) |
| 85 | + |
| 86 | +## C++ Notes |
| 87 | + |
| 88 | +### Compute |
| 89 | + |
| 90 | +- It is now possible to cast from a struct type to another struct type with |
| 91 | +additional columns, provided the additional columns are nullable |
| 92 | +([#44555](https://github.com/apache/arrow/issues/44555)). |
| 93 | +- The compute function `expm1` has been added to compute `exp(x) - 1` with better |
| 94 | +accuracy when the input value is close to 0 |
| 95 | +([#44903](https://github.com/apache/arrow/issues/44903)). |
| 96 | +- Hyperbolic trigonometric functions and their reciprocals have also been added |
| 97 | +([#44952](https://github.com/apache/arrow/issues/44952)). |
| 98 | +- The new Decimal32 and Decimal64 types have been further supported by allowing |
| 99 | +casting between numeric, string, and other decimal types |
| 100 | +([#43956](https://github.com/apache/arrow/issues/43956)). |
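To illustrate why a dedicated `expm1` function is worth having, here is a minimal sketch that uses only Python's standard library. It relies on `math.expm1` rather than the Arrow compute kernel, so it demonstrates the numerical motivation and not Arrow's implementation; the input value `1e-12` is an arbitrary choice for illustration.

```python
import math

# Compare the naive formula exp(x) - 1 with a dedicated expm1 routine
# for an input very close to zero.
x = 1e-12

naive = math.exp(x) - 1.0   # exp(x) rounds to a value near 1.0, so the subtraction cancels digits
stable = math.expm1(x)      # computed without subtracting two nearly equal values

# For tiny x, exp(x) - 1 = x + x**2 / 2 + ..., so x itself is a good reference value.
naive_rel_err = abs(naive - x) / x
stable_rel_err = abs(stable - x) / x

print(f"naive : {naive!r} (relative error {naive_rel_err:.2e})")
print(f"expm1 : {stable!r} (relative error {stable_rel_err:.2e})")
```

The naive result loses most of its significant digits to cancellation, while the `expm1` result stays accurate to roughly machine precision.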
| 101 | + |
| 102 | +### Acero |
| 103 | + |
| 104 | +- Added AVX2 support for decoding row tables in the Swiss join specialization of |
| 105 | +hash joins, enabling up to 40% performance improvement for build-heavy |
| 106 | +workloads. ([#43693](https://github.com/apache/arrow/issues/43693)) |
| 107 | + |
| 108 | +### Filesystems |
| 109 | + |
| 110 | +- The S3 filesystem has gained support for server-side encryption with customer |
| 111 | +provided keys, aka SSE-C. |
| 112 | +([#43535](https://github.com/apache/arrow/issues/43535)) |
| 113 | +- The S3 filesystem also gained an option to disable the SIGPIPE signals that |
| 114 | +may be emitted on some network events. |
| 115 | +([#44695](https://github.com/apache/arrow/issues/44695)) |
| 116 | +- The Azure filesystem now supports SAS token authentication. |
| 117 | +([#44308](https://github.com/apache/arrow/issues/44308)) |
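For readers unfamiliar with SSE-C, the sketch below shows what the feature means at the S3 protocol level: the client sends a 256-bit key along with each request using three HTTP headers. It uses only the Python standard library and does not show the Arrow filesystem option itself (see the linked issue for that); the helper name `sse_c_headers` is hypothetical.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the SSE-C request headers S3 expects for a 256-bit customer key.

    Hypothetical helper for illustration; not part of the Arrow API.
    """
    if len(key) != 32:
        raise ValueError("SSE-C requires a 32-byte (256-bit) AES key")
    key_b64 = base64.b64encode(key).decode("ascii")
    # S3 uses the MD5 digest of the raw key as an integrity check on the key.
    key_md5_b64 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-MD5": key_md5_b64,
    }


# Generate a random key for demonstration; a real key must be stored safely,
# because S3 does not keep it and the object cannot be read without it.
key = os.urandom(32)
headers = sse_c_headers(key)
for name in headers:
    print(name)
```

Because the key travels with every request, the same key has to be supplied again when reading an object that was written with SSE-C.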
| 118 | + |
| 119 | +### Flight RPC |
| 120 | + |
| 121 | +- The precision of a Timestamp (used for timeouts) is now nanoseconds on all |
| 122 | + platforms; previously it was platform-dependent. This may be a breaking change |
| 123 | +  depending on your use case. |
| 124 | + ([#44679](https://github.com/apache/arrow/issues/44679)) |
| 125 | +- The Python bindings now support various new fields that were added to |
| 126 | + FlightEndpoint/FlightInfo (like `expiration_time`). |
| 127 | + ([#36954](https://github.com/apache/arrow/issues/36954)) |
| 128 | +- The UCX backend has been deprecated and is scheduled for removal. |
| 129 | + ([#45079](https://github.com/apache/arrow/issues/45079)) |
| 130 | + |
| 131 | +### Parquet |
| 132 | + |
| 133 | +- The initial footer read size can now be configured to reduce the number of |
| 134 | +potential round-trips on high-latency filesystems such as S3. |
| 135 | +([#45015](https://github.com/apache/arrow/issues/45015)) |
| 136 | +- The new `SizeStatistics` format feature has been implemented, though it is |
| 137 | +disabled by default when writing. |
| 138 | +([#40592](https://github.com/apache/arrow/issues/40592)) |
| 139 | +- We've added a new method to the ParquetFileReader class, |
| 140 | +[GetReadRanges](https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet17ParquetFileReader13GetReadRangesERKNSt6vectorIiEERKNSt6vectorIiEE7int64_t7int64_t), |
| 141 | +which can calculate the byte ranges necessary to read a given set of columns and |
| 142 | +row groups. This may be useful to pre-buffer file data via caching mechanisms. |
| 143 | +([#45092](https://github.com/apache/arrow/issues/45092)) |
| 144 | +- We've added `arrow::Result`-returning variants for |
| 145 | +`parquet::arrow::OpenFile()` and |
| 146 | +`parquet::arrow::FileReader::GetRecordBatchReader()`. |
| 147 | +([#44784](https://github.com/apache/arrow/issues/44784), |
| 148 | +[#44808](https://github.com/apache/arrow/issues/44808)) |
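To see why the initial footer read size affects the number of round-trips, recall that a Parquet file ends with the footer metadata, followed by a 4-byte little-endian metadata length and the 4-byte magic `PAR1`. A reader has to guess how many trailing bytes to fetch first. The sketch below is a simplified stdlib-only model of that logic, not Arrow's implementation; the helper names and the sizes used are hypothetical, and encrypted footers are ignored.

```python
import struct

MAGIC = b"PAR1"


def build_fake_file(metadata_len: int) -> bytes:
    """Build a stand-in file tail: placeholder metadata, its length, then the magic."""
    return b"\x00" * metadata_len + struct.pack("<I", metadata_len) + MAGIC


def reads_needed(file_bytes: bytes, initial_read_size: int) -> int:
    """Count the reads needed to obtain the whole footer in this simplified model."""
    if initial_read_size < 8:
        raise ValueError("the first read must cover the length field and the magic")
    # First round-trip: fetch the last `initial_read_size` bytes of the file.
    first = file_bytes[-initial_read_size:]
    if first[-4:] != MAGIC:
        raise ValueError("not a Parquet file in this model")
    (metadata_len,) = struct.unpack("<I", first[-8:-4])
    footer_len = metadata_len + 8
    # If the first read already covers the footer, no second round-trip is needed.
    return 1 if footer_len <= len(first) else 2


file_bytes = build_fake_file(metadata_len=200_000)
small_guess = reads_needed(file_bytes, 64 * 1024)
large_guess = reads_needed(file_bytes, 256 * 1024)
print(small_guess, large_guess)
```

In this model, a larger initial read trades a bit of extra transfer for one fewer request when the metadata is large, which matters most when each request has high latency.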
| 149 | + |
| 150 | +## C# Notes |
| 151 | + |
| 152 | +- The `PrimitiveArrayBuilder` constructor has been made public to allow writing |
| 153 | + custom builders. ([#23995](https://github.com/apache/arrow/issues/23995)) |
| 154 | +- Improved the performance of looking up schema fields by name. |
| 155 | + ([#44575](https://github.com/apache/arrow/issues/44575)) |
| 156 | + |
| 157 | +## Java, Go, and Rust Notes |
| 158 | + |
| 159 | +The Java, Go, and Rust projects have moved to separate repositories outside |
| 160 | +the main Arrow [monorepo](https://github.com/apache/arrow). |
| 161 | + |
| 162 | +- For notes on the latest release of the [Java |
| 163 | +implementation](https://github.com/apache/arrow-java), see the latest [Arrow |
| 164 | +Java changelog][7]. |
| 165 | +- For notes on the latest release of the [Rust |
| 166 | +  implementation](https://github.com/apache/arrow-rs), see the latest [Arrow Rust |
| 167 | + changelog][5]. |
| 168 | +- For notes on the latest release of the [Go |
| 169 | +implementation](https://github.com/apache/arrow-go), see the latest [Arrow Go |
| 170 | +changelog][6]. |
| 171 | + |
| 172 | +## Linux Packaging Notes |
| 173 | + |
| 174 | +- Debian: Fixed keyring format to support newer libapt (e.g., used by |
| 175 | + Trixie). ([#45118](https://github.com/apache/arrow/issues/45118)) |
| 176 | + |
| 177 | +## Python Notes |
| 178 | + |
| 179 | +New features: |
| 180 | + |
| 181 | +- The upcoming pandas 3.0 [string |
| 182 | + dtype](https://pandas.pydata.org/pdeps/0014-string-dtype.html) is now |
| 183 | + supported by PyArrow's `to_pandas` routine. In the future, when using pandas >=3.0, |
| 184 | + the new pandas behavior will be enabled by default. You can opt into |
| 185 | + the new behavior under pandas >=2.3 by setting `pd.options.future.infer_string |
| 186 | + = True`. This may be considered a breaking change. |
| 187 | + ([#43683](https://github.com/apache/arrow/issues/43683)) |
| 188 | +- Support for 32-bit and 64-bit decimal types was added. |
| 189 | + ([#44713](https://github.com/apache/arrow/issues/44713)) |
| 190 | +- Arrow PyCapsule stream objects are supported in `write_dataset`. |
| 191 | + ([#43410](https://github.com/apache/arrow/issues/43410)) |
| 192 | +- New Flight features have been exposed. |
| 193 | + ([#36954](https://github.com/apache/arrow/issues/36954)) |
| 194 | +- Bindings for `JsonExtensionType` and `JsonArray` were added. |
| 195 | + ([#44066](https://github.com/apache/arrow/issues/44066)) |
| 196 | +- Hyperbolic trigonometry functions added to the Arrow C++ compute kernels are |
| 197 | + also available in PyArrow. |
| 198 | + ([#44952](https://github.com/apache/arrow/issues/44952)) |
| 199 | + |
| 200 | +Other improvements: |
| 201 | + |
| 202 | +- `strings_to_categorical` keyword in `to_pandas` can now be used for string |
| 203 | + view type. ([#45175](https://github.com/apache/arrow/issues/45175)) |
| 204 | +- `from_buffers` is updated to work with `StringView`. |
| 205 | + ([#44651](https://github.com/apache/arrow/issues/44651)) |
| 206 | +- Version suffixes are also set for Arrow Python C++ (`libarrow_python*`) |
| 207 | + libraries. ([#44614](https://github.com/apache/arrow/issues/44614)) |
| 208 | + |
| 209 | +## Ruby and C GLib Notes |
| 210 | + |
| 211 | +### Ruby |
| 212 | + |
| 213 | +- Added basic support for JRuby with an implementation based on Arrow Java |
| 214 | + ([#44346](https://github.com/apache/arrow/pull/44346)). The plan is to release |
| 215 | + this as a gem once it covers a base set of features. See |
| 216 | + [#45324](https://github.com/apache/arrow/issues/45324) for more information. |
| 217 | +- Added support for 32-bit and 64-bit decimal, binary view, and string view. See |
| 218 | + [issues |
| 219 | + listed](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22) |
| 220 | + in the 19.0.0 milestone for more details. |
| 221 | +- Fixed a bug where an empty struct list couldn't be built. |
| 222 | + ([#44742](https://github.com/apache/arrow/issues/44742)) |
| 223 | +- Fixed a bug where `record_batch[:column].size` raised an exception. |
| 224 | + ([#45119](https://github.com/apache/arrow/issues/45119)) |
| 225 | + |
| 226 | +### C GLib |
| 227 | + |
| 228 | +- Added support for 32-bit and 64-bit decimal, binary view, and string view. See |
| 229 | + [issues listed in the 19.0.0 |
| 230 | + milestone](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22) |
| 231 | + for more details. |
| 232 | + |
| 233 | +[1]: https://github.com/apache/arrow/milestone/66?closed=1 |
| 234 | +[2]: {{ site.baseurl }}/release/19.0.0.html#contributors |
| 235 | +[3]: {{ site.baseurl }}/release/19.0.0.html#changelog |
| 236 | +[4]: {{ site.baseurl }}/docs/r/news/ |
| 237 | +[5]: https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md |
| 238 | +[6]: https://github.com/apache/arrow-go/releases |
| 239 | +[7]: https://github.com/apache/arrow-java/releases |