apacheGH-42146: [MATLAB] Add IPC RecordBatchFileReader and RecordBatchFileWriter MATLAB classes (apache#42201)
### Rationale for this change
To enable initial IPC I/O support in the MATLAB interface, we should add a `RecordBatchFileReader` class and a `RecordBatchFileWriter` class.
### What changes are included in this PR?
1. Added a new `arrow.io.ipc.RecordBatchFileWriter` class.
2. Added a new `arrow.io.ipc.RecordBatchFileReader` class.
**Example**
```matlab
>> city = ["Boston" "Seattle" "Denver" "Juneau" "Anchorage" "Chicago"]';
>> daylength = duration(["15:17:01" "15:59:16" "14:59:14" "19:21:23" "14:18:24" "15:13:39"])';
>> matlabTable = table(city, daylength, VariableNames=["City", "DayLength"]);
>> recordBatch1 = arrow.recordBatch(matlabTable(1:4, :));
>> recordBatch2 = arrow.recordBatch(matlabTable(5:end, :));
>> writer = arrow.io.ipc.RecordBatchFileWriter("daylight.arrow", recordBatch1.Schema);
>> writer.writeRecordBatch(recordBatch1);
>> writer.writeRecordBatch(recordBatch2);
>> writer.close();
>> reader = arrow.io.ipc.RecordBatchFileReader("daylight.arrow")
reader = 
  RecordBatchFileReader with properties:
    NumRecordBatches: 2
              Schema: [1×1 arrow.tabular.Schema]
>> reader.Schema
ans = 
  Arrow Schema with 2 fields:
    City: String | DayLength: Time64
>> rb1 = reader.read(1);
>> isequal(rb1, recordBatch1)
ans =
  logical
   1
>> rb2 = reader.read(2);
>> isequal(rb2, recordBatch2)
ans =
  logical
   1
```
### Are these changes tested?
Yes. Added two new test files:
1. `arrow/matlab/test/io/ipc/tRecordBatchFileWriter.m`
2. `arrow/matlab/test/io/ipc/tRecordBatchFileReader.m`
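For illustration only, a round-trip test in the spirit of those files might look like the hypothetical `tRoundTripSketch` class below; it is not one of the test files added in this PR. It writes two record batches to a temporary folder (via `matlab.unittest.fixtures.TemporaryFolderFixture`, so the file is cleaned up afterwards), reads them back, and compares them with `isequal`, as in the example above. `NumRecordBatches` is wrapped in `double` because its numeric class is not stated here and `verifyEqual` is class-sensitive. Save it as `tRoundTripSketch.m`.

```matlab
classdef tRoundTripSketch < matlab.unittest.TestCase
    % Hypothetical round-trip test: write record batches with
    % RecordBatchFileWriter, then read them back with RecordBatchFileReader.

    methods (Test)
        function WriteThenReadRecordBatches(testCase)
            import matlab.unittest.fixtures.TemporaryFolderFixture

            % Temporary folder is deleted when the test finishes.
            fixture = testCase.applyFixture(TemporaryFolderFixture);
            filename = fullfile(fixture.Folder, "roundtrip.arrow");

            % Small MATLAB table split into two record batches.
            matlabTable = table([1; 2; 3], ["a"; "b"; "c"], ...
                VariableNames=["Number", "Letter"]);
            rb1 = arrow.recordBatch(matlabTable(1:2, :));
            rb2 = arrow.recordBatch(matlabTable(3, :));

            % Serialize both batches using the IPC file format.
            writer = arrow.io.ipc.RecordBatchFileWriter(filename, rb1.Schema);
            writer.writeRecordBatch(rb1);
            writer.writeRecordBatch(rb2);
            writer.close();

            % Read back and compare batch by batch (1-based indices).
            reader = arrow.io.ipc.RecordBatchFileReader(filename);
            testCase.verifyEqual(double(reader.NumRecordBatches), 2);
            testCase.verifyTrue(isequal(reader.read(1), rb1));
            testCase.verifyTrue(isequal(reader.read(2), rb2));
        end
    end
end
```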
### Are there any user-facing changes?
Yes. Users can now serialize `RecordBatch`es and `Table`s to files using the Arrow IPC data format as well as read in `RecordBatch`es from Arrow IPC data files.
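The `Table` path mentioned above can be sketched as follows. This is a minimal sketch that assumes the writer exposes a `writeTable` method alongside `writeRecordBatch`; the temporary file name is arbitrary. The writer is constructed from the Arrow table's `Schema`, just as it is from a record batch's schema in the example.

```matlab
% Arbitrary temporary file name for this sketch.
filename = string(tempname) + ".arrow";

% Convert a MATLAB table into an Arrow table.
city = ["Boston" "Seattle" "Denver"]';
daylength = duration(["15:17:01" "15:59:16" "14:59:14"])';
arrowTable = arrow.table(table(city, daylength, VariableNames=["City", "DayLength"]));

% Assumption: writeTable serializes a whole arrow.tabular.Table at once.
writer = arrow.io.ipc.RecordBatchFileWriter(filename, arrowTable.Schema);
writer.writeTable(arrowTable);
writer.close();

% The file is read back as one or more record batches.
reader = arrow.io.ipc.RecordBatchFileReader(filename);
disp(reader.Schema)
```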
### Future Directions
1. Add `RecordBatchStreamWriter` and `RecordBatchStreamReader` classes.
2. Expose options for [controlling](https://github.com/apache/arrow/blob/main/cpp/src/arrow/ipc/options.h) IPC reading and writing in MATLAB.
3. Add more methods to `RecordBatchFileReader` to read multiple record batches at once and to import the data as an Arrow `Table`.
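Until such methods exist, reading a whole file can be approximated with the existing `NumRecordBatches` property and per-index `read` method. The sketch below assumes `RecordBatch` provides a `toMATLAB` method that returns a MATLAB `table`; the temporary file name is arbitrary. It recreates the two-batch file from the example, then stacks the per-batch tables.

```matlab
% Recreate the two-batch file from the example.
city = ["Boston" "Seattle" "Denver" "Juneau" "Anchorage" "Chicago"]';
daylength = duration(["15:17:01" "15:59:16" "14:59:14" "19:21:23" "14:18:24" "15:13:39"])';
matlabTable = table(city, daylength, VariableNames=["City", "DayLength"]);
recordBatch1 = arrow.recordBatch(matlabTable(1:4, :));
recordBatch2 = arrow.recordBatch(matlabTable(5:end, :));

filename = string(tempname) + ".arrow";
writer = arrow.io.ipc.RecordBatchFileWriter(filename, recordBatch1.Schema);
writer.writeRecordBatch(recordBatch1);
writer.writeRecordBatch(recordBatch2);
writer.close();

% Read every batch by its 1-based index and convert it to a MATLAB table.
reader = arrow.io.ipc.RecordBatchFileReader(filename);
numBatches = reader.NumRecordBatches;
batches = cell(numBatches, 1);
for ii = 1:numBatches
    batches{ii} = toMATLAB(reader.read(ii));
end

% Stack the per-batch tables (they share variable names) into one table.
combined = vertcat(batches{:});
```

Reading one batch at a time keeps memory use proportional to a single batch until the final `vertcat`, which is one reason a dedicated "read all" method would still be useful.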
* GitHub Issue: apache#42146
Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Sarah Gilmore <[email protected]>