Commit 88e172d

amoeba, pitrou, kou, adamreeve, and lidavidm authored
Website: Add blog post for 19.0.0 (#580)
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: David Li <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Co-authored-by: Rossi Sun <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
1 parent 732b746 commit 88e172d

1 file changed

+239
-0
lines changed

Diff for: _posts/2025-01-16-19.0.0-release.md

@@ -0,0 +1,239 @@
1+
---
2+
layout: post
3+
title: "Apache Arrow 19.0.0 Release"
4+
date: "2025-01-16 00:00:00"
5+
author: pmc
6+
categories: [release]
7+
---
8+
<!--
9+
{% comment %}
10+
Licensed to the Apache Software Foundation (ASF) under one or more
11+
contributor license agreements. See the NOTICE file distributed with
12+
this work for additional information regarding copyright ownership.
13+
The ASF licenses this file to you under the Apache License, Version 2.0
14+
(the "License"); you may not use this file except in compliance with
15+
the License. You may obtain a copy of the License at
16+
17+
http://www.apache.org/licenses/LICENSE-2.0
18+
19+
Unless required by applicable law or agreed to in writing, software
20+
distributed under the License is distributed on an "AS IS" BASIS,
21+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
22+
See the License for the specific language governing permissions and
23+
limitations under the License.
24+
{% endcomment %}
25+
-->
26+
27+
The Apache Arrow team is pleased to announce the 19.0.0 release. This release
28+
covers over 2 months of development work and includes [**202 resolved
29+
issues**][1] on [**330 distinct commits**][2] from [**67 distinct
30+
contributors**][2]. See the [Install Page](https://arrow.apache.org/install/) to
31+
learn how to get the libraries for your platform.
32+
33+
The release notes below are not exhaustive and only expose selected highlights
34+
of the release. Many other bugfixes and improvements have been made: we refer
35+
you to the [complete changelog][3].
36+
37+
## Community
38+
39+
Since the 18.1.0 release, Adam Reeve and Laurent Goujon have been invited to
40+
become committers. Gang Wu has been invited to join the Project Management
41+
Committee (PMC).
42+
43+
Thanks for your contributions and participation in the project!
44+
45+
## Release Highlights
46+
47+
A [bug](https://github.com/apache/arrow/issues/45283) has been identified in the
48+
19.0.0 versions of the C++ and Python libraries which prevents reading Parquet
49+
files written by Arrow Rust v53.0.0 or higher. The files written by Arrow Rust
50+
are correct and the bug was in the patch adding support for Parquet's
51+
[SizeStatistics](https://github.com/apache/parquet-format/pull/197) feature to
52+
Arrow C++ and Python. See [#45283](https://github.com/apache/arrow/issues/45283)
53+
for more details including a potential workaround.
54+
55+
As a result, we plan to create a 19.0.1 release to include a fix for this which
56+
should be available in the next few weeks.
57+
58+
## Columnar Format
59+
60+
We've added a new experimental specification for representing statistics on
61+
Arrow Arrays as Arrow Arrays. This is useful for preserving and exchanging
62+
statistics between systems such as when converting Parquet data to Arrow. See
63+
[the statistics schema
64+
documentation](https://arrow.apache.org/docs/format/StatisticsSchema.html) for
65+
details.
66+
67+
We've expanded the Arrow C Device Data Interface to include an experimental
68+
Async Device Stream Interface. While the existing Arrow C Device Data Interface
69+
is a pull-oriented API, the Async interface provides a push-oriented design for
70+
other workflows. See the
71+
[documentation](https://arrow.apache.org/docs/format/CDeviceDataInterface.html#async-device-stream-interface)
72+
for more information. It currently has implementations in the C++ and Go
73+
libraries.
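To make the pull versus push distinction concrete, here is a minimal Python analogy. It does not use the Arrow C ABI or any Arrow library; `pull_stream`, `PushProducer`, and the list-of-ints `Batch` stand-in are hypothetical names chosen only for illustration.

```python
from typing import Callable, Iterator, List

Batch = List[int]  # stand-in for a record batch; not an Arrow type


def pull_stream(batches: List[Batch]) -> Iterator[Batch]:
    """Pull-oriented: the consumer asks for the next batch when it is ready."""
    for batch in batches:
        yield batch


class PushProducer:
    """Push-oriented: the producer calls the consumer's handlers per batch."""

    def __init__(self, batches: List[Batch]) -> None:
        self._batches = batches

    def run(self, on_next: Callable[[Batch], None], on_end: Callable[[], None]) -> None:
        for batch in self._batches:
            on_next(batch)  # the producer decides when data is delivered
        on_end()  # signal the end of the stream


data = [[1, 2], [3], [4, 5, 6]]

# Pull: the consumer drives the loop.
pulled = [sum(batch) for batch in pull_stream(data)]

# Push: the consumer only registers callbacks.
pushed: List[int] = []
events: List[str] = []
PushProducer(data).run(
    on_next=lambda batch: pushed.append(sum(batch)),
    on_end=lambda: events.append("end"),
)
```

In the pull style the consumer controls pacing, while in the push style the producer does, which tends to suit event-driven or asynchronous pipelines.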
74+
75+
## Arrow Flight RPC Notes
76+
77+
The precision of a Timestamp (used for timeouts) is now nanoseconds on all
78+
platforms; previously it was platform-dependent. This may be a breaking change
79+
depending on your use case.
80+
([#44679](https://github.com/apache/arrow/issues/44679))
81+
82+
The Python bindings now support various new fields that were added to
83+
FlightEndpoint/FlightInfo (like `expiration_time`).
84+
([#36954](https://github.com/apache/arrow/issues/36954))
85+
86+
## C++ Notes
87+
88+
### Compute
89+
90+
- It is now possible to cast from a struct type to another struct type with
91+
additional columns, provided the additional columns are nullable
92+
([#44555](https://github.com/apache/arrow/issues/44555)).
93+
- The compute function `expm1` has been added to compute `exp(x) - 1` with better
94+
accuracy when the input value is close to 0
95+
([#44903](https://github.com/apache/arrow/issues/44903)).
96+
- Hyperbolic trigonometric functions and their reciprocals have also been added
97+
([#44952](https://github.com/apache/arrow/issues/44952)).
98+
- The new Decimal32 and Decimal64 types have been further supported by allowing
99+
casting between numeric, string, and other decimal types
100+
([#43956](https://github.com/apache/arrow/issues/43956)).
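As a rough illustration of why a dedicated `expm1` function helps near 0, the sketch below uses `math.expm1` from the Python standard library rather than the Arrow kernel itself; the input value and the series-based reference are assumptions chosen for the demonstration.

```python
import math

x = 1e-10

naive = math.exp(x) - 1.0  # exp(x) rounds to a value near 1, then subtraction cancels digits
stable = math.expm1(x)     # computes exp(x) - 1 without forming exp(x) first

# For small x, exp(x) - 1 = x + x**2 / 2 + ..., so the first two terms
# are an accurate reference at this magnitude.
reference = x + x * x / 2.0

naive_error = abs(naive - reference) / reference
stable_error = abs(stable - reference) / reference

print(f"naive relative error: {naive_error:.3e}")
print(f"expm1 relative error: {stable_error:.3e}")
```

The naive form loses precision because `exp(x)` is rounded to the nearest double close to 1 before the subtraction, so most of the significant digits of the small result are discarded.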
101+
102+
### Acero
103+
104+
- Added AVX2 support for decoding row tables in the Swiss join specialization of
105+
hash joins, enabling up to 40% performance improvement for build-heavy
106+
workloads. ([#43693](https://github.com/apache/arrow/issues/43693))
107+
108+
### Filesystems
109+
110+
- The S3 filesystem has gained support for server-side encryption with customer
111+
provided keys, aka SSE-C.
112+
([#43535](https://github.com/apache/arrow/issues/43535))
113+
- The S3 filesystem also gained an option to disable the SIGPIPE signals that
114+
may be emitted on some network events.
115+
([#44695](https://github.com/apache/arrow/issues/44695))
116+
- The Azure filesystem now supports SAS token authentication.
117+
([#44308](https://github.com/apache/arrow/issues/44308))
118+
119+
### Flight RPC
120+
121+
- The precision of a Timestamp (used for timeouts) is now nanoseconds on all
122+
platforms; previously it was platform-dependent. This may be a breaking change
123+
depending on your use case.
124+
([#44679](https://github.com/apache/arrow/issues/44679))
125+
- The Python bindings now support various new fields that were added to
126+
FlightEndpoint/FlightInfo (like `expiration_time`).
127+
([#36954](https://github.com/apache/arrow/issues/36954))
128+
- The UCX backend has been deprecated and is scheduled for removal.
129+
([#45079](https://github.com/apache/arrow/issues/45079))
130+
131+
### Parquet
132+
133+
- The initial footer read size can now be configured to reduce the number of
134+
potential round-trips on high-latency filesystems such as S3.
135+
([#45015](https://github.com/apache/arrow/issues/45015))
136+
- The new `SizeStatistics` format feature has been implemented, though it is
137+
disabled by default when writing.
138+
([#40592](https://github.com/apache/arrow/issues/40592))
139+
- We've added a new method to the ParquetFileReader class,
140+
[GetReadRanges](https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet17ParquetFileReader13GetReadRangesERKNSt6vectorIiEERKNSt6vectorIiEE7int64_t7int64_t),
141+
which can calculate the byte ranges necessary to read a given set of columns and
142+
row groups. This may be useful to pre-buffer file data via caching mechanisms.
143+
([#45092](https://github.com/apache/arrow/issues/45092))
144+
- We've added `arrow::Result`-returning variants for
145+
`parquet::arrow::OpenFile()` and
146+
`parquet::arrow::FileReader::GetRecordBatchReader()`.
147+
([#44784](https://github.com/apache/arrow/issues/44784),
148+
[#44808](https://github.com/apache/arrow/issues/44808))
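To see why the initial footer read size affects round-trips, recall that a Parquet file ends with the footer metadata, a 4-byte little-endian footer length, and the `PAR1` magic. The sketch below simulates this with an in-memory file; `CountingFile`, `read_footer`, and `initial_read_size` are hypothetical names, the footer bytes are placeholders rather than real Thrift metadata, and this is not the Arrow C++ implementation.

```python
import struct

MAGIC = b"PAR1"


class CountingFile:
    """In-memory stand-in for a remote file that counts ranged reads."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.reads = 0

    def read_range(self, start: int, length: int) -> bytes:
        self.reads += 1
        return self.data[start:start + length]


def read_footer(f: CountingFile, initial_read_size: int) -> bytes:
    """Return the raw footer metadata bytes (hypothetical helper)."""
    size = len(f.data)
    n = min(max(initial_read_size, 8), size)
    tail = f.read_range(size - n, n)  # speculative read of the file tail
    if tail[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if footer_len + 8 <= n:
        # The speculative read already covers the whole footer.
        return tail[n - 8 - footer_len:n - 8]
    # Otherwise a second round trip is needed to fetch the full footer.
    return f.read_range(size - 8 - footer_len, footer_len)


footer = b"m" * 1000  # placeholder bytes, not real metadata
blob = MAGIC + b"d" * 5000 + footer + struct.pack("<I", len(footer)) + MAGIC

small = CountingFile(blob)
small_footer = read_footer(small, initial_read_size=64)

large = CountingFile(blob)
large_footer = read_footer(large, initial_read_size=4096)

print("reads with small initial size:", small.reads)
print("reads with large initial size:", large.reads)
```

A larger initial read trades a few extra bytes per open for one fewer request when the footer is bigger than the default guess, which matters most when each request has high latency.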
149+
150+
## C# Notes
151+
152+
- The `PrimitiveArrayBuilder` constructor has been made public to allow writing
153+
custom builders. ([#23995](https://github.com/apache/arrow/issues/23995))
154+
- Improved the performance of looking up schema fields by name.
155+
([#44575](https://github.com/apache/arrow/issues/44575))
156+
157+
## Java, Go, and Rust Notes
158+
159+
The Java, Go, and Rust projects have moved to separate repositories outside
160+
the main Arrow [monorepo](https://github.com/apache/arrow).
161+
162+
- For notes on the latest release of the [Java
163+
implementation](https://github.com/apache/arrow-java), see the latest [Arrow
164+
Java changelog][7].
165+
- For notes on the latest release of the [Rust
166+
implementation](https://github.com/apache/arrow-rs), see the latest [Arrow Rust
167+
changelog][5].
168+
- For notes on the latest release of the [Go
169+
implementation](https://github.com/apache/arrow-go), see the latest [Arrow Go
170+
changelog][6].
171+
172+
## Linux Packaging Notes
173+
174+
- Debian: Fixed keyring format to support newer libapt (e.g., used by
175+
Trixie). ([#45118](https://github.com/apache/arrow/issues/45118))
176+
177+
## Python Notes
178+
179+
New features:
180+
181+
- The upcoming pandas 3.0 [string
182+
dtype](https://pandas.pydata.org/pdeps/0014-string-dtype.html) is now
183+
supported by PyArrow's `to_pandas` routine. In the future, when using pandas >=3.0,
184+
the new pandas behavior will be enabled by default. You can opt into
185+
the new behavior under pandas >=2.3 by setting `pd.options.future.infer_string
186+
= True`. This may be considered a breaking change.
187+
([#43683](https://github.com/apache/arrow/issues/43683))
188+
- Support for 32-bit and 64-bit decimal types was added.
189+
([#44713](https://github.com/apache/arrow/issues/44713))
190+
- Arrow PyCapsule stream objects are supported in `write_dataset`.
191+
([#43410](https://github.com/apache/arrow/issues/43410))
192+
- New Flight features have been exposed.
193+
([#36954](https://github.com/apache/arrow/issues/36954))
194+
- Bindings for `JsonExtensionType` and `JsonArray` were added.
195+
([#44066](https://github.com/apache/arrow/issues/44066))
196+
- Hyperbolic trigonometry functions added to the Arrow C++ compute kernels are
197+
also available in PyArrow.
198+
([#44952](https://github.com/apache/arrow/issues/44952))
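For context on the 32-bit and 64-bit decimal types mentioned above: a fixed-width decimal stores its unscaled value in a signed integer, so the bit width bounds the usable precision. The short sketch below derives that bound in plain Python; `max_decimal_precision` is a hypothetical helper, not a PyArrow function.

```python
def max_decimal_precision(bit_width: int) -> int:
    """Number of decimal digits that always fit in a signed integer of the
    given width (the unscaled value of a fixed-width decimal)."""
    largest = 2 ** (bit_width - 1) - 1
    # Every value with one digit fewer than `largest` fits; some values
    # with the same digit count as `largest` would overflow.
    return len(str(largest)) - 1


for bits in (32, 64, 128, 256):
    print(f"decimal{bits}: up to {max_decimal_precision(bits)} digits")
```

The narrower types are useful when values are known to need few digits, since they halve or quarter the storage of a 128-bit decimal.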
199+
200+
Other improvements:
201+
202+
- The `strings_to_categorical` keyword in `to_pandas` can now be used with the
203+
string view type. ([#45175](https://github.com/apache/arrow/issues/45175))
204+
- `from_buffers` is updated to work with `StringView`.
205+
([#44651](https://github.com/apache/arrow/issues/44651))
206+
- Version suffixes are also set for Arrow Python C++ (`libarrow_python*`)
207+
libraries. ([#44614](https://github.com/apache/arrow/issues/44614))
208+
209+
## Ruby and C GLib Notes
210+
211+
### Ruby
212+
213+
- Added basic support for JRuby with an implementation based on Arrow Java
214+
([#44346](https://github.com/apache/arrow/pull/44346)). The plan is to release
215+
this as a gem once it covers a base set of features. See
216+
[#45324](https://github.com/apache/arrow/issues/45324) for more information.
217+
- Added support for 32-bit and 64-bit decimal, binary view, and string view. See
218+
[issues
219+
listed](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22)
220+
in the 19.0.0 milestone for more details.
221+
- Fixed a bug where an empty struct list couldn't be built.
222+
([#44742](https://github.com/apache/arrow/issues/44742))
223+
- Fixed a bug where `record_batch[:column].size` raised an exception.
224+
([#45119](https://github.com/apache/arrow/issues/45119))
225+
226+
### C GLib
227+
228+
- Added support for 32-bit and 64-bit decimal, binary view, and string view. See
229+
[issues listed in the 19.0.0
230+
milestone](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22)
231+
for more details.
232+
233+
[1]: https://github.com/apache/arrow/milestone/66?closed=1
234+
[2]: {{ site.baseurl }}/release/19.0.0.html#contributors
235+
[3]: {{ site.baseurl }}/release/19.0.0.html#changelog
236+
[4]: {{ site.baseurl }}/docs/r/news/
237+
[5]: https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md
238+
[6]: https://github.com/apache/arrow-go/releases
239+
[7]: https://github.com/apache/arrow-java/releases
