-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix unable to build Pandas with xlc on z/OS #35829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The test failures from the CI build seem to the same as on master. |
How common is z/OS? Is it worthwhile/feasible to test it in the CI? |
We've been seeing lots of interest from users in being able to run Pandas on z/OS, and xlc (which this PR fixes) is the most common compiler found on that platform. For CI - on z/OS we're unable to use miniconda/anaconda right now, so at least in it's current form this is not feasible yet - but it's something that we're looking into getting fixed. |
Hi Steve, I am with Rocket Software. We have a z/OS Miniconda in beta now. Should be released in a couple weeks. We also have ported Python and pandas to z/OS. If the pandas community is interested in formally adopting z/OS as a platform we would be happy to offer our sources to assist. Which python are you using on z/OS? IBM recently released their own Python port. You can reach me directly at [email protected]. -Peter |
@PeterFandelAtRocket see #33971 for some discussion. "Officially" supporting a platform would need (binary) NumPy packages and freely available CI systems ( ideally through TravisCI or azure-pipelines since that's what we're using), and a champion to handle issues as they come in. |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
There are some labels that can be added to make it ignore the PR; right now those are "Needs Review", "Blocked", and "Needs Discussion'". I'm not sure if there's a way of setting conditions based on who made the last action. |
Is there anything that I could do/provide to help get a reviewer to take a look at this PR? |
@pitmanst the content here looks pretty benign; the limiting factor is our inability to include this in the CI. If we had a good strategy to make that feasible, this would move much more quickly. Absent that, @TomAugspurger is there a plan B? |
Is there a way to automatically test support for the OS? Perhaps a docker container that can be set up in CI? #30641 may be a good template to work off of |
@pitmanst happy to take this if you can merge master, move the note to 1.3, and confirm that this works. note we won't have official support, but can mention this as experimental. |
@pitmanst can you merge master to resolve conflicts |
Thanks for taking it into consideration. The failures that we see on the checks seem to be unrelated to this change. It should be ready for review. |
I've updated this again, and looks like all checks are passing now. Would someone be able to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment, ping on green.
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -903,6 +903,9 @@ Styler | |||
|
|||
Other | |||
^^^^^ | |||
- Bug in :class:`Index` constructor sometimes silently ignorning a a specified ``dtype`` (:issue:`38879`) | |||
- | |||
======= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated entry, pls revert
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -913,6 +916,7 @@ Other | |||
- Bug in :func:`pandas.testing.assert_series_equal`, :func:`pandas.testing.assert_frame_equal`, :func:`pandas.testing.assert_index_equal` and :func:`pandas.testing.assert_extension_array_equal` incorrectly raising when an attribute has an unrecognized NA type (:issue:`39461`) | |||
- Bug in :meth:`DataFrame.equals`, :meth:`Series.equals`, :meth:`Index.equals` with object-dtype containing ``np.datetime64("NaT")`` or ``np.timedelta64("NaT")`` (:issue:`39650`) | |||
- Bug in :func:`pandas.util.show_versions` where console JSON output was not proper JSON (:issue:`39701`) | |||
- Fixed Pandas not being able to compile on z/OS when using xlc (:issue:`35826`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a link to xlc
Thanks for the review @jreback. I've addressed the comments and the checks are green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pitmanst. Added some ideas to make things more readable and be mindful of the already huge size of our setup.py
.
setup.py
Outdated
comp_macros = data.get("macros", macros) | ||
undef_macros = [] | ||
|
||
if is_platform_zos(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we need this function, why not simply if sys.platform == "zos":
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added it to be consistent with how the other platforms handled it. I've minimized the change as suggested.
setup.py
Outdated
undef_macros = [] | ||
|
||
if is_platform_zos(): | ||
language = data.get("language", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None
is already the default. Also, since you aren't using the language more than in the if
, I'd simply write if data.get('language') == 'c++':
and make this more compact.
This file is huge, so would prefer to not make this unnecessarily longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
setup.py
Outdated
if is_platform_zos(): | ||
language = data.get("language", None) | ||
if language == "c++": | ||
compiler = os.environ.get("CXX", "/bin/xlc++") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we do assume that if CXX
is not in env, the compiler is xlc++?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On z/OS, xlc/xlc++ is the default compiler that is provided, so most users will have it.
setup.py
Outdated
compiler = os.environ.get("CXX", "/bin/xlc++") | ||
compiler_name = os.path.basename(compiler) | ||
|
||
if (compiler_name == "xlc") or (compiler_name == "xlc++"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels that all these if could be just a single condition:
if sys.platform == "zos" and data.get('language') == 'c++' and os.path.basename(os.environ.get('CXX')) in ('xlc', 'xlc++'):
Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - I've made this change. Thanks
setup.py
Outdated
define_macros=data.get("macros", macros), | ||
extra_compile_args=extra_compile_args, | ||
define_macros=comp_macros, | ||
extra_compile_args=extra_comp_args, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is extra_compile_args
being used elsewhere now? Not easy to see in the diff, but just in case you can simply append to extra_compile_args
and not make a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra_compile_args
isn't used elsewhere. I've removed the copy and just appended directly to it now.
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -913,6 +913,7 @@ Other | |||
- Bug in :func:`pandas.testing.assert_series_equal`, :func:`pandas.testing.assert_frame_equal`, :func:`pandas.testing.assert_index_equal` and :func:`pandas.testing.assert_extension_array_equal` incorrectly raising when an attribute has an unrecognized NA type (:issue:`39461`) | |||
- Bug in :meth:`DataFrame.equals`, :meth:`Series.equals`, :meth:`Index.equals` with object-dtype containing ``np.datetime64("NaT")`` or ``np.timedelta64("NaT")`` (:issue:`39650`) | |||
- Bug in :func:`pandas.util.show_versions` where console JSON output was not proper JSON (:issue:`39701`) | |||
- Fixed Pandas not being able to compile on z/OS when using `xlc <https://www.ibm.com/products/xl-cpp-compiler-zos>`_ (:issue:`35826`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small personal preference, but I'd rephrase as "Let pandas compile with xlc in z/OS" or similar. I don't think it was broken, and now it's fixed, since that was never supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've swapped up the wording for this similar to as you've put it
Thanks @pitmanst, great changes, look much clearer and compact now IMO. Can you run I guess there is not CI (external to pandas) that can test this before we merge it, right? |
Thanks for the reviews @datapythonista! It didn't look like |
For CI - is there any documentation anywhere on what's required (env, any packages, how to connect) to add a machine to the CI (i.e. a z/OS machine)? I see that it's running on Azure, but any extra docs would be helpful. |
Thanks for the fix @pitmanst. I think in Python is preferred to use brackets instead os backslashes for multiline, Like: foo = (long_var
+ next_line_var)
foo = long_var + \
next_line_var And also, I think it's more readable to have a condition per line. No big deal, but if we need to iterate further, feel free to check and see if that makes the code more readable. For the CI, probably more than documentation you can have a look at |
Thanks for the pointers on CI - I'll take a look. As for the change - I've applied the suggested change. |
thanks @pitmanst |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This is an attempt at being able to build Pandas using xlc on z/OS. There's two issues being fixed here:
Raise the default standard that being used with xlc++ (
-qlanglvl
) only when compiling c++ code, and fix other macros that otherwise cause errors when compiling with that option. I put the change directly above where it's used, but an alternate spot that it could go is directly within the declarations of each cpp file + it's respective macros. But since this would need to be added to any additional c++ code that's added, and all extra options are the same, I left it here.Fix the compilation errors that occur - it's currently missing two functions that aren't in the std:: namespace. The first is that
signbit
is declared as a macro and not a function, so we cannot redeclare the function without changing the macro. The next,isnan
, is declared as both a macro and a function. According to the xlc docs, to use the function you can simply undefineisnan
, which is done here, and then it's also placed in the std namespace. To use the macro form we could take the same approach as signbit.If we don't want to touch the std namespace, an alternative could be to swap the declarations (for z/OS only) within
pandas/_libs/window/aggregations.pyx
to not use std for those two functions.