[python-package] Documentation on setting up development environment #6350
Comments
Thanks for writing this up! We could give better guidance here. Until a doc like that's added, please post comments on this issue with specific questions and one of us will help.
Thanks for outlining the dev environment setup steps! That helped a lot and seems like a great foundation for the developer setup documentation. Following these initial steps, running After this, I followed the steps outlined in #4229 (comment) and am now getting a
Original segmentation fault message
Segmentation fault message in LLDB
TypeError: Wrong type(ChunkedArray) errors
OS info
Thanks for the very detailed write-up and for working through this! Sorry it isn't easier. I'm on mobile right now so apologies for being brief, but wanted to help unblock you. Try replacing this cmake .. with this cmake -DUSE_OPENMP=OFF
Thanks for the prompt response, and no worries, uncovering these speed bumps is giving a lot of fodder for the developer env documentation. After making that change, I'm still getting the
returns
Whereas building with
returns
So it looks like test_predict_ranking specific error
/Users/nick/development/LightGBM/tests/python_package_test/test_arrow.py::test_predict_ranking failed:
    def test_predict_ranking():
        data = generate_random_arrow_table(10, 10000, 42)
        dataset = lgb.Dataset(
            data,
            label=generate_random_arrow_array(10000, 43, generate_nulls=False, values=np.arange(4)),
            group=np.array([1000, 2000, 3000, 4000]),
            params=dummy_dataset_params(),
        )
>       booster = lgb.train(
            {"objective": "lambdarank", "num_leaves": 7},
            dataset,
            num_boost_round=5,
        )
tests/python_package_test/test_arrow.py:372:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/engine.py:260: in train
booster = Booster(params=params, train_set=train_set)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3624: in __init__
train_set.construct()
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2563: in construct
self._lazy_init(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2177: in _lazy_init
self.set_label(label)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3050: in set_label
label_array = _list_to_1d_numpy(label, dtype=np.float32, name="label")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
data = <pyarrow.lib.ChunkedArray object at 0x288e0bab0>
[
[
2,
2,
1,
0,
2,
...
3,
2,
0,
0,
0
]
]
dtype = <class 'numpy.float32'>, name = 'label'
    def _list_to_1d_numpy(
        data: Any,
        dtype: "np.typing.DTypeLike",
        name: str,
    ) -> np.ndarray:
        """Convert data to numpy 1-D array."""
        if _is_numpy_1d_array(data):
            return _cast_numpy_array_to_dtype(data, dtype)
        elif _is_numpy_column_array(data):
            _log_warning("Converting column-vector to 1d array")
            array = data.ravel()
            return _cast_numpy_array_to_dtype(array, dtype)
        elif _is_1d_list(data):
            return np.array(data, dtype=dtype, copy=False)
        elif isinstance(data, pd_Series):
            _check_for_bad_pandas_dtypes(data.to_frame().dtypes)
            return np.array(data, dtype=dtype, copy=False)  # SparseArray should be supported as well
        else:
>           raise TypeError(
                f"Wrong type({type(data).__name__}) for {name}.\n" "It should be list, numpy 1-D array or pandas Series"
            )
E       TypeError: Wrong type(ChunkedArray) for label.
E       It should be list, numpy 1-D array or pandas Series
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:362: TypeError
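For readers following the traceback: the failure is a plain type-dispatch fall-through, where the label matches none of the accepted types and lands in the final raise. The sketch below is a simplified, standard-library-only stand-in, not LightGBM's code. The `ChunkedArray` class and the `list_to_1d` helper are hypothetical names invented for illustration.

```python
from typing import Any, List


class ChunkedArray:
    """Hypothetical stand-in for pyarrow.lib.ChunkedArray (not the real class)."""

    def __init__(self, values: List[float]) -> None:
        self.values = values


def list_to_1d(data: Any, name: str) -> List[float]:
    """Simplified stand-in for the dispatch shown in the traceback.

    Known container types are converted; anything else falls through to a
    TypeError whose message names the offending type.
    """
    if isinstance(data, (list, tuple)):
        return [float(x) for x in data]
    raise TypeError(
        f"Wrong type({type(data).__name__}) for {name}.\n"
        "It should be list or tuple"
    )


# A plain list is accepted and converted.
converted = list_to_1d([2, 2, 1, 0], name="label")
print(converted)

# An unrecognized container type reaches the final raise.
try:
    list_to_1d(ChunkedArray([2, 2, 1, 0]), name="label")
except TypeError as err:
    message = str(err)
    print(message)
```

The first line of the printed message has the same shape as the one in the traceback, because both are built from `type(data).__name__`.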
Turning off OpenMP linking was to help with the segfaults you reported. As you found with #4229 and similar, LightGBM's Python package has some outstanding issues with OpenMP support on macOS.
Please run this install command:
conda install -c conda-forge --yes pyarrow
Alternatively, ignore the Arrow tests if you're just working on the scikit-learn interface (as I suspect you are, given our discussion in #6310):
pytest tests/python_package_test/test_sklearn.py
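Before running the Arrow tests, it can help to confirm that the optional dependency is visible to the interpreter in the active environment. This is a minimal sketch using only the standard library. The helper name `is_installed` is an assumption made for illustration, and the check only reports whether the module can be found, not whether the installed build is compatible.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "pyarrow" is the package named in the install command above.
if is_installed("pyarrow"):
    print("pyarrow found in this environment")
else:
    print("pyarrow not found; Arrow tests are likely to fail or need skipping")
```

Running this with the same interpreter that runs pytest (for example, inside the activated conda environment) avoids checking the wrong environment by accident.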
Summary
Add documentation describing how to set up an environment for developing in the python-package.
Motivation
Currently there's no specification on how to set up an environment for developing in the python-package (see #6310 (comment)). Adding this would make the contribution process smoother.
Description
CONTRIBUTING.md with step-by-step instructions on how to set up a developer environment (as outlined in "[python-package] Add feature_names_in_ attribute for scikit-learn estimators (fixes #6279)" #6310 (comment)), including:
dev section in the existing pyproject.toml under project.optional-dependencies with dev dependencies, so one can install via pip install '.[dev]'
References