Commit ce11e56

apacheGH-38659: [CI][MATLAB][Packaging] Add MATLAB packaging task to crossbow tasks.yml (apache#38660)
### Rationale for this change

Per the following mailing list discussion:

https://lists.apache.org/thread/0xyow40h7b1bptsppb0rxd4g9r1xpmh6

To integrate the MATLAB interface code with the existing Arrow release tooling, we first need to add a task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) in crossbow.

This packaging task will automatically create an [MLTBX file](https://www.mathworks.com/help/matlab/creating-help.html?s_tid=CRUX_lftnav) (the MATLAB equivalent to a Python binary wheel or Ruby gem) that can be installed via a "one-click" workflow in MATLAB. This will enable MATLAB users to install the interface without needing to build from source.

### Licensing

For more information about licensing of the MLTBX file contents, please refer to the mailing list discussion and ASF Legal ticket linked below:

1. https://lists.apache.org/thread/zlpnncgvo6l4cvkxfxn7zt4q7qhptotw
2. https://issues.apache.org/jira/browse/LEGAL-665

### What changes are included in this PR?

1. Added a `matlab` task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) in `dev/tasks/tasks.yml`.
2. Added a new GitHub Actions workflow called `dev/tasks/matlab/github.yml` which builds the MATLAB interface code on all platforms (Windows, macOS, and Ubuntu 20.04) and packages the generated build artifacts into a single MLTBX file using [`matlab.addons.toolbox.packageToolbox`](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html).
3. Changed the GitHub-hosted runner to `ubuntu-20.04` from `ubuntu-latest` for the MATLAB CI check (i.e. `.github/workflows/matlab.yml`). The rationale for this change is that we primarily develop and qualify against Debian 11 locally, but the CI check has been building against `ubuntu-latest` (i.e. `ubuntu-22.04`). There are two issues with using `ubuntu-22.04`.

   The first is that the version of `GLIBC` shipped with `ubuntu-22.04` is not fully compatible with the version of `GLIBC` shipped with `Debian 11`. This results in a runtime linker error when qualifying the packaged MATLAB interface code locally on Debian 11.

   The second issue with using `ubuntu-22.04` is that the system version of `GLIBCXX` is not fully compatible with the version of `GLIBCXX` bundled with MATLAB R2023a (this is a relatively common issue - e.g. see: https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found). Previously, we worked around this issue in GitHub Actions by using `LD_PRELOAD` before starting up MATLAB to run the unit tests.

   On the other hand, the version of `GLIBCXX` shipped with `ubuntu-20.04` **is** binary compatible with the version bundled with MATLAB R2023a. Therefore, we believe it would be better to use `ubuntu-20.04` in the MATLAB CI checks for the time being until we can qualify the MATLAB interface against `ubuntu-22.04`.

### Are these changes tested?

Yes.

1. Successfully submitted a crossbow `packaging` job for the MATLAB interface by commenting `@ github-actions crossbow submit matlab`. Example of a successful packaging job: https://github.com/ursacomputing/crossbow/actions/runs/6893506432/job/18753227453.
2. Manually installed the resulting MLTBX file on macOS, Windows, Debian 11, and Ubuntu 20.04. Ran all tests under `matlab/test` using `runtests . IncludeSubFolders 1`.

### Are there any user-facing changes?

No.

### Notes

1. While qualifying, we discovered that [MATLAB's programmatic packaging interface](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html) does not properly include symbolic link files in the packaged MLTBX file. We've reported this bug to the relevant MathWorks development team.

   As a temporary workaround, we included a step that uses `patchelf`/`install_name_tool` to change the name of the Arrow C++ library which `libarrowproxy.so`/`libarrowproxy.dylib` depends on, from `libarrow.so.1500`/`libarrow.1500.dylib` to `libarrow.so.1500.0.0`/`libarrow.1500.0.0.dylib`, respectively. Once this bug is resolved, we will remove this step from the workflow.

### Future Directions

1. Add tooling to upload release candidate (RC) MLTBX files to apache/arrow's GitHub Releases area and mark them as "Prerelease". In other words, modify https://github.com/apache/arrow/blob/main/dev/release/05-binary-upload.sh.
2. Add a post-release script to upload release MLTBX files to apache/arrow's GitHub Releases area (similar to how https://github.com/apache/arrow/blob/main/dev/release/post-09-python.sh works).
3. Enable nightly builds for the MATLAB interface.
4. Document how to qualify a MATLAB Arrow interface release.
5. Enable building and testing the MATLAB Arrow interface on multiple Ubuntu distributions simultaneously (e.g. 20.04 *and* 22.04).

* Closes: apache#38659
* GitHub Issue: apache#38659

Lead-authored-by: Sarah Gilmore <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
1 parent d32e4b0 commit ce11e56

5 files changed

+273
-27
lines changed

Diff for: .github/workflows/matlab.yml

+18-10
@@ -42,7 +42,23 @@ jobs:
 
   ubuntu:
     name: AMD64 Ubuntu 20.04 MATLAB
-    runs-on: ubuntu-latest
+    # Explicitly pin the Ubuntu version to 20.04 for the time being because:
+    #
+    # 1. The version of GLIBCXX shipped with Ubuntu 22.04 is not binary compatible
+    # with the GLIBCXX bundled with MATLAB R2023a. This is a relatively common
+    # issue.
+    #
+    # For example, see:
+    #
+    # https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found
+    #
+    # 2. The version of GLIBCXX shipped with Ubuntu 22.04 is not binary compatible with
+    # the version of GLIBCXX shipped with Debian 11. Several of the Arrow community
+    # members who work on the MATLAB bindings use Debian 11 locally for qualification.
+    # Using Ubuntu 20.04 eases development workflows for these community members.
+    #
+    # In the future, we can investigate adding support for building against more Linux (e.g. `ubuntu-22.04`) and MATLAB versions (e.g. R2023b).
+    runs-on: ubuntu-20.04
     if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
     steps:
       - name: Check out repository
@@ -74,22 +90,14 @@ jobs:
         run: ci/scripts/matlab_build.sh $(pwd)
       - name: Run MATLAB Tests
         env:
-          # libarrow.so requires a more recent version of libstdc++.so
-          # than is bundled with MATLAB under <matlabroot>/sys/os/glnxa64.
-          # Therefore, if a MEX function that depends on libarrow.so
-          # is executed within the MATLAB address space, runtime linking
-          # errors will occur. To work around this issue, we can explicitly
-          # force MATLAB to use the system libstdc++.so via LD_PRELOAD.
-          LD_PRELOAD: /usr/lib/x86_64-linux-gnu/libstdc++.so.6
-
           # Add the installation directory to the MATLAB Search Path by
           # setting the MATLABPATH environment variable.
           MATLABPATH: matlab/install/arrow_matlab
         uses: matlab-actions/run-tests@v2
         with:
           select-by-folder: matlab/test
   macos:
-    name: AMD64 macOS 11 MATLAB
+    name: AMD64 macOS 12 MATLAB
     runs-on: macos-latest
     if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
     steps:
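
The GLIBCXX mismatch that motivates pinning `ubuntu-20.04` can be inspected by comparing the highest `GLIBCXX_*` symbol version exported by two copies of `libstdc++.so.6`. The following is a minimal sketch and not part of this commit: `max_glibcxx` is a hypothetical helper, the sample symbol names are made up for the demonstration, and it assumes GNU `grep` and `sort -V` are available.

```shell
# Hypothetical helper (not part of this commit): read text on stdin and
# print the highest GLIBCXX_<version> tag that appears in it.
max_glibcxx() {
  grep -o 'GLIBCXX_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Self-contained demonstration on made-up symbol names.
sample_symbols='GLIBCXX_3.4.9
GLIBCXX_3.4.28
GLIBCXX_3.4.30
GLIBC_2.31'
highest="$(printf '%s\n' "$sample_symbols" | max_glibcxx)"
echo "highest = ${highest}"
```

On a runner, piping `strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6` into `max_glibcxx` would report the system value (assuming `strings` from binutils is installed), and the same pipeline on the copy under `<matlabroot>/sys/os/glnxa64` would report the value bundled with MATLAB.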

Diff for: dev/tasks/matlab/github.yml

+162
@@ -0,0 +1,162 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+{% import 'macros.jinja' as macros with context %}
+
+{{ macros.github_header() }}
+
+jobs:
+
+  ubuntu:
+    name: AMD64 Ubuntu 20.04 MATLAB
+    runs-on: ubuntu-20.04
+    steps:
+      {{ macros.github_checkout_arrow()|indent }}
+      - name: Install ninja-build
+        run: sudo apt-get update && sudo apt-get install ninja-build
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v1
+        with:
+          release: R2023a
+      - name: Build MATLAB Interface
+        env:
+          {{ macros.github_set_sccache_envvars()|indent(8) }}
+        run: arrow/ci/scripts/matlab_build.sh $(pwd)/arrow
+      - name: Change shared library dependency name
+        # MATLAB's programmatic packaging interface does not properly
+        # include symbolic link files in the package MLTBX - this is a
+        # bug. As a temporary workaround, change the expected name of the
+        # Arrow C++ library which libarrowproxy.so depends on. For example,
+        # change libarrow.so.1500 to libarrow.so.1500.0.0.
+        run: |
+          pushd arrow/matlab/install/arrow_matlab/+libmexclass/+proxy/
+          SYMLINK_ARROW_LIB="$(find . -name 'libarrow.so.*' -type l | xargs basename)"
+          REGULAR_ARROW_LIB="$(echo libarrow.so.*.*)"
+          echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}"
+          echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}"
+          patchelf --replace-needed $SYMLINK_ARROW_LIB $REGULAR_ARROW_LIB libarrowproxy.so
+          popd
+      - name: Compress into single artifact
+        run: tar -cvzf matlab-arrow-ubuntu.tar.gz arrow/matlab/install/arrow_matlab
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: matlab-arrow-ubuntu.tar.gz
+          path: matlab-arrow-ubuntu.tar.gz
+
+  macos:
+    name: AMD64 macOS 12 MATLAB
+    runs-on: macos-latest
+    steps:
+      {{ macros.github_checkout_arrow()|indent }}
+      - name: Install ninja-build
+        run: brew install ninja
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v1
+        with:
+          release: R2023a
+      - name: Build MATLAB Interface
+        env:
+          {{ macros.github_set_sccache_envvars()|indent(8) }}
+        run: arrow/ci/scripts/matlab_build.sh $(pwd)/arrow
+      - name: Change shared library dependency name
+        # MATLAB's programmatic packaging interface does not properly
+        # include symbolic link files in the package MLTBX - this is a
+        # bug. As a temporary workaround, change the expected name of the
+        # Arrow C++ library which libarrowproxy.dylib depends on.
+        # For example, change libarrow.1500.dylib to libarrow.1500.0.0.dylib.
+        run: |
+          pushd arrow/matlab/install/arrow_matlab/+libmexclass/+proxy
+          SYMLINK_ARROW_LIB="$(find . -name 'libarrow.*.dylib' -type l | xargs basename)"
+          REGULAR_ARROW_LIB="$(echo libarrow.*.*.dylib)"
+          echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}"
+          echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}"
+          install_name_tool -change @rpath/$SYMLINK_ARROW_LIB @rpath/$REGULAR_ARROW_LIB libarrowproxy.dylib
+          popd
+      - name: Compress into single artifact
+        run: tar -cvzf matlab-arrow-macos.tar.gz arrow/matlab/install/arrow_matlab
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: matlab-arrow-macos.tar.gz
+          path: matlab-arrow-macos.tar.gz
+
+  windows:
+    name: AMD64 Windows 2022 MATLAB
+    runs-on: windows-2022
+    steps:
+      {{ macros.github_checkout_arrow()|indent }}
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v1
+        with:
+          release: R2023a
+      - name: Install sccache
+        shell: bash
+        run: arrow/ci/scripts/install_sccache.sh pc-windows-msvc $(pwd)/sccache
+      - name: Build MATLAB Interface
+        shell: cmd
+        env:
+          {{ macros.github_set_sccache_envvars()|indent(8) }}
+        run: |
+          call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
+          bash -c "arrow/ci/scripts/matlab_build.sh $(pwd)/arrow"
+      - name: Compress into single artifact
+        shell: bash
+        run: tar -cvzf matlab-arrow-windows.tar.gz arrow/matlab/install/arrow_matlab
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: matlab-arrow-windows.tar.gz
+          path: matlab-arrow-windows.tar.gz
+
+  package-mltbx:
+    name: Package MATLAB Toolbox (MLTBX) Files
+    runs-on: ubuntu-latest
+    needs:
+      - ubuntu
+      - macos
+      - windows
+    steps:
+      {{ macros.github_checkout_arrow(fetch_depth=0)|indent }}
+      - name: Download Artifacts
+        uses: actions/download-artifact@v4
+        with:
+          path: artifacts-downloaded
+      - name: Decompress Artifacts
+        run: |
+          mv artifacts-downloaded/*/*.tar.gz .
+          tar -xzvf matlab-arrow-ubuntu.tar.gz
+          tar -xzvf matlab-arrow-macos.tar.gz
+          tar -xzvf matlab-arrow-windows.tar.gz
+      - name: Copy LICENSE.txt and NOTICE.txt for packaging
+        run: |
+          cp arrow/LICENSE.txt arrow/matlab/install/arrow_matlab/LICENSE.txt
+          cp arrow/NOTICE.txt arrow/matlab/install/arrow_matlab/NOTICE.txt
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v1
+        with:
+          release: R2023a
+      - name: Run commands
+        env:
+          MATLABPATH: arrow/matlab/tools
+          ARROW_MATLAB_TOOLBOX_FOLDER: arrow/matlab/install/arrow_matlab
+          ARROW_MATLAB_TOOLBOX_OUTPUT_FOLDER: artifacts/matlab-dist
+          ARROW_MATLAB_TOOLBOX_VERSION: {{ arrow.no_rc_version }}
+        uses: matlab-actions/run-command@v1
+        with:
+          command: packageMatlabInterface
+      {{ macros.github_upload_releases(["artifacts/matlab-dist/*.mltbx"])|indent }}
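
The "Change shared library dependency name" steps rely on two lookups: `find ... -type l` to pick out the versioned symlink, and a glob with two version dots to pick out the regular file it points to. The sketch below reproduces only those lookups on a fake directory layout so the name resolution can be seen in isolation. The scratch directory and empty placeholder files are assumptions for illustration, and `patchelf` itself is not invoked.

```shell
# Hypothetical scratch directory standing in for +libmexclass/+proxy.
orig_dir="$PWD"
workdir="$(mktemp -d)"
cd "$workdir"

# Fake library layout: one regular file plus a typical symlink chain.
touch libarrow.so.1500.0.0
ln -s libarrow.so.1500.0.0 libarrow.so.1500
ln -s libarrow.so.1500 libarrow.so

# Same lookups as the Ubuntu workflow step.
SYMLINK_ARROW_LIB="$(find . -name 'libarrow.so.*' -type l | xargs basename)"
REGULAR_ARROW_LIB="$(echo libarrow.so.*.*)"
echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}"
echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}"

cd "$orig_dir"
```

In this layout the unversioned `libarrow.so` link is skipped by the `'libarrow.so.*'` pattern because it has no trailing version component, so the `find` call yields a single name. If more than one versioned symlink were present, `xargs basename` would receive several arguments, which the workflow step does not appear to guard against.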

Diff for: dev/tasks/tasks.yml

+9
@@ -59,6 +59,7 @@ groups:
     - conan-*
     - debian-*
     - java-jars
+    - matlab
     - nuget
     - python-sdist
     - r-binary-packages
@@ -665,6 +666,14 @@ tasks:
     params:
       formula: apache-arrow.rb
 
+  ############################## MATLAB Packages ################################
+
+  matlab:
+    ci: github
+    template: matlab/github.yml
+    artifacts:
+      - matlab-arrow-{no_rc_version}.mltbx
+
   ############################## Arrow JAR's ##################################
 
   java-jars:

Diff for: matlab/CMakeLists.txt

-17
@@ -201,9 +201,6 @@ get_filename_component(ARROW_SHARED_LIB_DIR ${ARROW_SHARED_LIB} DIRECTORY)
 get_filename_component(ARROW_SHARED_LIB_FILENAME ${ARROW_SHARED_LIB} NAME_WE)
 
 if(NOT Arrow_FOUND)
-  # If Arrow_FOUND is false, Arrow is built by the arrow_shared target and needs
-  # to be copied to CMAKE_PACKAGED_INSTALL_DIR.
-
   if(APPLE)
     # Install libarrow.dylib (symlink) and the real files it points to.
     # on macOS, we need to match these files: libarrow.dylib
@@ -226,20 +223,6 @@ if(NOT Arrow_FOUND)
     set(SHARED_LIBRARY_VERSION_REGEX
         ${ARROW_SHARED_LIB_FILENAME}${CMAKE_SHARED_LIBRARY_SUFFIX})
   endif()
-
-  # The subfolders cmake and pkgconfig are excluded as they will be empty.
-  # Note: The following CMake Issue suggests enabling an option to exclude all
-  # folders that would be empty after installation:
-  # https://gitlab.kitware.com/cmake/cmake/-/issues/17122
-
-  set(CMAKE_PACKAGED_INSTALL_DIR "${CMAKE_INSTALL_DIR}/+arrow")
-
-  install(DIRECTORY "${ARROW_SHARED_LIB_DIR}/"
-          DESTINATION ${CMAKE_PACKAGED_INSTALL_DIR}
-          FILES_MATCHING
-          REGEX ${SHARED_LIBRARY_VERSION_REGEX}
-          PATTERN "cmake" EXCLUDE
-          PATTERN "pkgconfig" EXCLUDE)
 endif()
 
 # MATLAB_ADD_INSTALL_DIR_TO_STARTUP_FILE toggles whether an addpath command to add the install

Diff for: matlab/tools/packageMatlabInterface.m

+84
@@ -0,0 +1,84 @@
+% Licensed to the Apache Software Foundation (ASF) under one
+% or more contributor license agreements. See the NOTICE file
+% distributed with this work for additional information
+% regarding copyright ownership. The ASF licenses this file
+% to you under the Apache License, Version 2.0 (the
+% "License"); you may not use this file except in compliance
+% with the License. You may obtain a copy of the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing,
+% software distributed under the License is distributed on an
+% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+% KIND, either express or implied. See the License for the
+% specific language governing permissions and limitations
+% under the License.
+
+toolboxFolder = string(getenv("ARROW_MATLAB_TOOLBOX_FOLDER"));
+outputFolder = string(getenv("ARROW_MATLAB_TOOLBOX_OUTPUT_FOLDER"));
+toolboxVersionRaw = string(getenv("ARROW_MATLAB_TOOLBOX_VERSION"));
+
+appendLicenseText(fullfile(toolboxFolder, "LICENSE.txt"));
+appendNoticeText(fullfile(toolboxFolder, "NOTICE.txt"));
+
+% Output folder must exist.
+mkdir(outputFolder);
+
+disp("Toolbox Folder: " + toolboxFolder);
+disp("Output Folder: " + outputFolder);
+disp("Toolbox Version Raw: " + toolboxVersionRaw);
+
+
+% Note: This string processing heuristic may not be robust to future
+% changes in the Arrow versioning scheme.
+dotIdx = strfind(toolboxVersionRaw, ".");
+numDots = numel(dotIdx);
+if numDots >= 3
+    toolboxVersion = extractBefore(toolboxVersionRaw, dotIdx(3));
+else
+    toolboxVersion = toolboxVersionRaw;
+end
+
+disp("Toolbox Version:" + toolboxVersion);
+
+identifier = "ad1d0fe6-22d1-4969-9e6f-0ab5d0f12ce3";
+opts = matlab.addons.toolbox.ToolboxOptions(toolboxFolder, identifier);
+opts.ToolboxName = "MATLAB Arrow Interface";
+opts.ToolboxVersion = toolboxVersion;
+opts.AuthorName = "The Apache Software Foundation";
+opts.AuthorEmail = "[email protected]";
+
+% Set the SupportedPlatforms
+opts.SupportedPlatforms.Win64 = true;
+opts.SupportedPlatforms.Maci64 = true;
+opts.SupportedPlatforms.Glnxa64 = true;
+opts.SupportedPlatforms.MatlabOnline = true;
+
+% Interface is only qualified against R2023a at the moment
+opts.MinimumMatlabRelease = "R2023a";
+opts.MaximumMatlabRelease = "R2023a";
+
+opts.OutputFile = fullfile(outputFolder, compose("matlab-arrow-%s.mltbx", toolboxVersionRaw));
+disp("Output File: " + opts.OutputFile);
+matlab.addons.toolbox.packageToolbox(opts);
+
+function appendLicenseText(filename)
+    licenseText = [ ...
+        newline + "--------------------------------------------------------------------------------" + newline
+        "3rdparty dependency mathworks/libmexclass is redistributed as a dynamically"
+        "linked shared library in certain binary distributions, like the MATLAB"
+        "distribution." + newline
+        "Copyright: 2022-2024 The MathWorks, Inc. All rights reserved."
+        "Homepage: https://github.com/mathworks/libmexclass"
+        "License: 3-clause BSD" ];
+    writelines(licenseText, filename, WriteMode="append");
+end
+
+function appendNoticeText(filename)
+    noticeText = [ ...
+        newline + "---------------------------------------------------------------------------------" + newline
+        "This product includes software from The MathWorks, Inc. (Apache 2.0)"
+        " * Copyright (C) 2024 The MathWorks, Inc."];
+    writelines(noticeText, filename, WriteMode="append");
+end
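
The version heuristic in `packageMatlabInterface.m` keeps everything before the third dot of `ARROW_MATLAB_TOOLBOX_VERSION` and leaves shorter strings unchanged. A minimal shell sketch of the same truncation, for checking what a given input would map to: `truncate_version` is a hypothetical helper, and the version strings in the usage note are illustrative values rather than ones taken from this commit.

```shell
# Hypothetical helper (not part of this commit): keep at most the first
# three dot-separated fields of a version string, mirroring the
# "extract before the third dot" logic in packageMatlabInterface.m.
truncate_version() {
  printf '%s\n' "$1" | cut -d. -f1-3
}

# Illustrative inputs: one with a fourth component, one without.
with_suffix="$(truncate_version 15.0.0.dev123)"
plain="$(truncate_version 15.0.0)"
echo "with_suffix = ${with_suffix}"
echo "plain = ${plain}"
```

Both calls print `15.0.0`. Note that the output file name in the MATLAB script is built from the raw, untruncated version, so only `opts.ToolboxVersion` is affected by this truncation.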
