[python-package] Documentation on setting up development environment #6350

nicklamiller · 2024-03-03T18:21:29Z

Summary

Add documentation describing how to set up an environment for developing in the python-package.

Motivation

Currently there's no specification on how to set up an environment for developing in the python-package (see #6310 (comment)). Adding this would make the contribution process smoother.

Description

Add documentation to CONTRIBUTING.md with step-by-step instructions on how to set up a developer environment (as outlined in [python-package] Add feature_names_in_ attribute for scikit-learn estimators (fixes #6279) #6310 (comment)), including:
- Installing python developer dependencies
- Building C++ library
- Installing the python package into the environment once changes are made
Potentially add a dev section to the existing pyproject.toml under project.optional-dependencies with dev dependencies, so one can install via pip install '.[dev]'
Potentially reference setup for OpenMP given there's been some issues in the past setting this up.

References


jameslamb · 2024-03-03T21:41:35Z

Thanks for writing this up!

We could give better guidance here. Until a doc like that's added, please post comments on this issue with specific questions and one of us will help.

nicklamiller · 2024-03-13T05:46:12Z

Thanks for outlining the dev environment setup steps! That helped a lot and seems like a great foundation for the developer setup documentation.

Following these initial steps, running pytest tests/python_package_test resulted in segmentation fault errors. This appeared to be an OpenMP issue: the original gcc and g++ compilers on my OS didn't have OpenMP support, so I ran brew install libomp and aliased gcc to gcc-13 and g++ to g++-13. These compilers did have OpenMP support, as I could compile dummy C and C++ files that used OpenMP. However, running pytest tests/python_package_test resulted in the same segfault errors as before.

After this, I followed the steps outlined in #4229 (comment) and am now getting a TypeError: Wrong type(ChunkedArray) error for several tests, all in test_arrow.py. I'd appreciate any feedback if you know a good way to handle this error or have seen it before, and whether it's indicative of a faulty setup 🙏.

Original segmentation fault message

tests/python_package_test/test_arrow.py Fatal Python error: Segmentation fault

Thread 0x00000001fd04e080 (most recent call first):
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py", line 2377 in __init_from_csr
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py"[1]    2968 segmentation fault  pytest tests/python_package_test

Segmentation fault message in LLDB

============================= test session starts ==============================
platform darwin -- Python 3.11.8, pytest-8.0.1, pluggy-1.4.0
rootdir: /Users/nick/development/LightGBM
plugins: cov-4.1.0
collected 710 items / 1 skipped                                      		 

Process 3635 stopped
* thread #20, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr    w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop    
    0x175b1f7dc <+40>: ldr    w9, 0x175b51308  		 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add    w20, w9, #0x1
  thread #21, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr    w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop    
    0x175b1f7dc <+40>: ldr    w9, 0x175b51308  		 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add    w20, w9, #0x1
Target 0: (python) stopped.

TypeError: Wrong type(ChunkedArray) errors


========================================================================= short test summary info ==========================================================================
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params0] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params1] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params2] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params3] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params4] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params5] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fields_fuzzy - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-array-label_data0] - TypeError: Wrong type(Int8Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-array-label_data0] - TypeError: Wrong type(Int16Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-array-label_data0] - TypeError: Wrong type(Int32Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-array-label_data0] - TypeError: Wrong type(Int64Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-array-label_data0] - TypeError: Wrong type(UInt8Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-array-label_data0] - TypeError: Wrong type(UInt16Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-array-label_data0] - TypeError: Wrong type(UInt32Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-array-label_data0] - TypeError: Wrong type(UInt64Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-array-label_data0] - TypeError: Wrong type(FloatArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-array-label_data0] - TypeError: Wrong type(DoubleArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights_none - TypeError: Wrong type(Int64Array) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-array-weight_data0] - TypeError: Wrong type(FloatArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data1] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data2] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data3] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-array-weight_data0] - TypeError: Wrong type(DoubleArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data1] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data2] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data3] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-array-group_data0] - TypeError: Wrong type(Int8Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-array-group_data0] - TypeError: Wrong type(Int16Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-array-group_data0] - TypeError: Wrong type(Int32Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-array-group_data0] - TypeError: Wrong type(Int64Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-array-group_data0] - TypeError: Wrong type(UInt8Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-array-group_data0] - TypeError: Wrong type(UInt16Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-array-group_data0] - TypeError: Wrong type(UInt32Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-array-group_data0] - TypeError: Wrong type(UInt64Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_table - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_predict_regression - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_predict_binary_classification - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_predict_multiclass_classification - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_predict_ranking - TypeError: Wrong type(ChunkedArray) for label.

OS info

OS: macOS 13.5 (Ventura)
CPU: M2 chip
compiler: AppleClang 15.0.0
Python: 3.11.7
OpenMP (libomp): 18.1.1

jameslamb · 2024-03-14T21:29:52Z

Thanks for the very detailed write-up and for working through this! Sorry it isn't easier.

I'm on mobile right now so apologies for being brief, but wanted to help unblock you. Try replacing this

cmake ..

with this

cmake -DUSE_OPENMP=OFF ..

nicklamiller · 2024-03-15T00:17:13Z

Thanks for the prompt response, and no worries, uncovering these speed bumps is giving a lot of fodder for the developer env documentation.

After making that change, I'm still getting the TypeError: Wrong type(ChunkedArray) errors and they're still in test_arrow.py. I noticed that without -DUSE_OPENMP=OFF (i.e. following the instructions in #4229 (comment) exactly),

otool -L lib_lightgbm.so

returns

lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

Whereas building with -DUSE_OPENMP=OFF

otool -L lib_lightgbm.so

returns

lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

So it looks like lib_lightgbm.so no longer links to OpenMP and adding -DUSE_OPENMP=OFF was successful. This makes me wonder if this problem has to do with something outside of OpenMP. I see ChunkedArray is defined in a C++ file, and taking test_predict_ranking as an example, it looks like the Python function _list_to_1d_numpy fails because it doesn't expect a ChunkedArray. Maybe there's an incompatibility between the C++ code and the Python code, though I'm not sure how this could happen as I'm building the package from the most up-to-date code.
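For anyone repeating the comparison above, here is a minimal sketch of how the otool -L output could be checked programmatically instead of by eye. It only parses text that has already been captured from otool -L. The function names (linked_libraries, links_openmp) and the list of OpenMP runtime name prefixes are assumptions made for illustration, not part of LightGBM.

```python
import os


def linked_libraries(otool_output: str) -> list:
    """Return the library paths listed in the text printed by `otool -L`."""
    lines = otool_output.strip().splitlines()
    libraries = []
    # The first line names the inspected file (e.g. "lib_lightgbm.so:"); skip it.
    for line in lines[1:]:
        entry = line.strip()
        if entry:
            # Each entry looks like "<path> (compatibility version ..., current version ...)".
            libraries.append(entry.split(" (", 1)[0])
    return libraries


def links_openmp(otool_output: str) -> bool:
    """Report whether any listed library looks like an OpenMP runtime."""
    # Assumption: the runtime's file name starts with one of these prefixes.
    prefixes = ("libomp", "libgomp", "libiomp")
    return any(
        os.path.basename(lib).startswith(prefixes)
        for lib in linked_libraries(otool_output)
    )


# Sample text copied from the first otool listing above.
sample = """lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
"""
print(links_openmp(sample))  # True for this sample
```

Feeding it the second listing (built with -DUSE_OPENMP=OFF) should report no OpenMP runtime, which matches the manual reading.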

test_predict_ranking specific error

/Users/nick/development/LightGBM/tests/python_package_test/test_arrow.py::test_predict_ranking failed: def test_predict_ranking():
        data = generate_random_arrow_table(10, 10000, 42)
        dataset = lgb.Dataset(
            data,
            label=generate_random_arrow_array(10000, 43, generate_nulls=False, values=np.arange(4)),
            group=np.array([1000, 2000, 3000, 4000]),
            params=dummy_dataset_params(),
        )
>       booster = lgb.train(
            {"objective": "lambdarank", "num_leaves": 7},
            dataset,
            num_boost_round=5,
        )

tests/python_package_test/test_arrow.py:372: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/engine.py:260: in train
    booster = Booster(params=params, train_set=train_set)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3624: in __init__
    train_set.construct()
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2563: in construct
    self._lazy_init(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2177: in _lazy_init
    self.set_label(label)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3050: in set_label
    label_array = _list_to_1d_numpy(label, dtype=np.float32, name="label")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

data = <pyarrow.lib.ChunkedArray object at 0x288e0bab0>
[
  [
    2,
    2,
    1,
    0,
    2,
    ...
    3,
    2,
    0,
    0,
    0
  ]
]
dtype = <class 'numpy.float32'>, name = 'label'

    def _list_to_1d_numpy(
        data: Any,
        dtype: "np.typing.DTypeLike",
        name: str,
    ) -> np.ndarray:
        """Convert data to numpy 1-D array."""
        if _is_numpy_1d_array(data):
            return _cast_numpy_array_to_dtype(data, dtype)
        elif _is_numpy_column_array(data):
            _log_warning("Converting column-vector to 1d array")
            array = data.ravel()
            return _cast_numpy_array_to_dtype(array, dtype)
        elif _is_1d_list(data):
            return np.array(data, dtype=dtype, copy=False)
        elif isinstance(data, pd_Series):
            _check_for_bad_pandas_dtypes(data.to_frame().dtypes)
            return np.array(data, dtype=dtype, copy=False)  # SparseArray should be supported as well
        else:
>           raise TypeError(
                f"Wrong type({type(data).__name__}) for {name}.\n" "It should be list, numpy 1-D array or pandas Series"
            )
E           TypeError: Wrong type(ChunkedArray) for label.
E           It should be list, numpy 1-D array or pandas Series

../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:362: TypeError

jameslamb · 2024-03-15T00:58:10Z

> if this problem has to do with something outside of OpenMP.

Turning off OpenMP linking was to help with the segfaults you reported. As you found with #4229 and similar, LightGBM's Python package has some outstanding issues with OpenMP support on macOS.

> TypeError: Wrong type(ChunkedArray) errors and they're still in test_arrow.py

Please install pyarrow in your development environment and try again.

conda install -c conda-forge --yes pyarrow

Alternatively, ignore the Arrow tests if you're just working on the scikit-learn interface (as I suspect you are, given our discussion in #6310).

pytest tests/python_package_test/test_sklearn.py
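Before re-running the tests, it may help to confirm what the active environment can actually import and where lightgbm is being loaded from (the traceback above points into site-packages). The following is a minimal sketch using only the standard library. The helper names (missing_modules, module_origin) and the list of modules checked are assumptions for illustration, not an official list of LightGBM's test dependencies.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # Only pass top-level names: for dotted names, find_spec imports the parent
    # package and can raise instead of returning None.
    return [name for name in names if importlib.util.find_spec(name) is None]


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Hypothetical list of optional packages worth checking in a dev environment.
    to_check = ["pyarrow", "cffi", "pandas", "sklearn"]
    print("missing:", missing_modules(to_check))
    # Shows whether lightgbm resolves to site-packages or somewhere else.
    print("lightgbm loaded from:", module_origin("lightgbm"))
```

If the reported lightgbm path is not the one expected, or a package appears under "missing", reinstalling into the active environment is a reasonable next step before digging further.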

jameslamb added the doc label Mar 3, 2024

This was referenced Mar 15, 2024

[ci] [python-package] Python tests leave files behind #6361

Closed

[docs] [ci] encourage use of cmake --build #6368

Merged

jameslamb mentioned this issue Mar 22, 2024

[ci] raise floors on CI dependencies #6375

Merged

jameslamb mentioned this issue Apr 1, 2024

[python-package] Early stopping callback added when early_stopping_round = 0 #6401

Closed

jameslamb mentioned this issue Apr 25, 2024

[python-package] verbose does not supress warnings with custom objectives #6014

Closed

jameslamb mentioned this issue Jun 14, 2024

LightGBM contributions #6482

Closed

jameslamb mentioned this issue Jan 7, 2025

[python-package] Workflow to install locally and make contributions #6777

Closed

jameslamb added the question label Jan 7, 2025
