feat(r): Add bindings for IPC reader (#390)
This PR adds bindings to nanoarrow's IPC reader from R. The entrypoint for a user is `read_nanoarrow()`, which accepts raw vectors, connections, and file paths (a thin wrapper around connections). It also fixes a number of compiler warnings in the IPC extension.

The implementation is not particularly complicated from the R side, but the main drawback of adding IPC support is that the flatbuffers implementation (flatcc) actively does not care about gcc compiler warnings (whereas CRAN actively cares about them).

All of the read paths benchmarked below are slower than the arrow package, which has more tools at its disposal to prevent copies.

``` r
library(arrow, warn.conflicts = FALSE)
library(nanoarrow)

# Basic read example
tf <- tempfile()
write_ipc_stream(dplyr::starwars, tf)
read_nanoarrow(tf) |> tibble::as_tibble()
#> # A tibble: 87 × 14
#>    name     height  mass hair_color skin_color eye_color birth_year sex   gender
#>    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#>  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
#>  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
#>  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
#>  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
#>  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
#>  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
#>  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
#>  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
#>  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
#> 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
#> # ℹ 77 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list<chr>>,
#> #   vehicles <list<chr>>, starships <list<chr>>

df_bigish <- nanoarrow:::vec_gen(data.frame(x = character()), n = 1e6)
write_ipc_stream(df_bigish, tf)

# Wrapper because mmap is apparently not passed through from read_ipc_stream()
# and this is pretty significant
read_ipc_stream_wrap <- function(f, ..., mmap) {
  arrow::read_ipc_stream(
    arrow:::make_readable_file(f, mmap = mmap, random_access = FALSE),
    ...
  )
}

tf_raw <- brio::read_file_raw(tf)

# Slower than arrow for raw vector input because of C implementation,
# which doesn't currently share the global buffer (just shares buffers
# between columns within a single batch)
bench::mark(
  nanoarrow = read_nanoarrow(tf_raw) |> collect_array_stream(),
  arrow = read_ipc_stream(buffer(tf_raw), as_data_frame = FALSE),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 nanoarrow    1.27ms   1.84ms      439.    41.5KB     2.18
#> 2 arrow      509.26µs 528.65µs     1821.     3.6MB    79.3

# Slower than arrow, maybe because of C implementation, but definitely
# because it uses base::readBin() which necessitates an extra copy
bench::mark(
  nanoarrow = read_nanoarrow(tf) |> collect_array_stream(),
  arrow_mmap = read_ipc_stream_wrap(tf, mmap = TRUE, as_data_frame = FALSE),
  arrow = read_ipc_stream_wrap(tf, mmap = FALSE, as_data_frame = FALSE),
  check = FALSE
)
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 nanoarrow    5.18ms   5.66ms      174.    16.2MB   189.
#> 2 arrow_mmap 613.48µs  640.5µs     1526.   528.6KB    13.9
#> 3 arrow        2.18ms   2.84ms      339.   551.6KB     4.06
```

<sup>Created on 2024-02-19 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
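The description above names three input types for `read_nanoarrow()`. The following is a minimal sketch of each, not taken from the PR itself: it assumes the arrow package is installed and is used only to produce an IPC stream to read back, that `as.data.frame()` is available for the returned array stream, and that an already-open binary connection is an accepted input. The data frame `df` is a made-up example.

``` r
library(nanoarrow)

# Hypothetical example data, not from the PR
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Assumption: arrow is used here only to write an IPC stream to read back
tf <- tempfile()
arrow::write_ipc_stream(df, tf)
raw_bytes <- arrow::write_to_raw(df, format = "stream")

# 1. File path
from_path <- as.data.frame(read_nanoarrow(tf))

# 2. Raw vector holding the same IPC stream bytes
from_raw <- as.data.frame(read_nanoarrow(raw_bytes))

# 3. Connection, opened explicitly in binary mode and closed by the caller
con <- file(tf, "rb")
from_con <- as.data.frame(read_nanoarrow(con))
close(con)

unlink(tf)
```

Opening and closing the connection explicitly keeps its lifetime in the caller's hands, which sidesteps any question of when the reader would release it.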
1 parent c66ddc3, commit e4f0754
Showing 17 changed files with 797 additions and 54 deletions.
@@ -51,7 +51,7 @@ jobs:
       R_KEEP_PKG_SOURCE: yes

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4

       - uses: r-lib/actions/setup-pandoc@v2
       - uses: r-lib/actions/setup-r@v2
@@ -66,14 +66,14 @@ jobs:
         if: matrix.config.os != 'windows-latest'
         env:
           PKG_CPPFLAGS: "-DNANOARROW_DEBUG"
-          PKG_CFLAGS: "-Werror -Wall -Wextra -Wpedantic -Wconversion -Wno-unused-parameter -Wno-sign-conversion -Wno-cast-function-type"
+          PKG_CFLAGS: "-Werror -Wall -Wextra -Wpedantic -Wconversion -Wno-unused-parameter -Wno-sign-conversion -Wno-cast-function-type -Wno-misleading-indentation -Wno-conversion -Wno-unused-const-variable"
         run: |
           R CMD INSTALL r --preclean
         shell: bash

       - uses: r-lib/actions/setup-r-dependencies@v2
         with:
-          extra-packages: any::rcmdcheck, arrow=?ignore-before-r=4.0.0, github::r-lib/[email protected]
+          extra-packages: any::rcmdcheck, arrow=?ignore-before-r=4.0.0
           needs: check
           working-directory: r

@@ -17,3 +17,6 @@

 src/nanoarrow.c
 src/nanoarrow.h
+src/nanoarrow_ipc.h
+src/nanoarrow_ipc.c
+src/flatcc*