DEPR: Deprecate using xlrd engine for read_excel #35029
Changes from 15 commits
3a76a36
101aa97
081ecf8
ada4354
3233381
0f4c8a1
499f9a0
825c61c
88093f6
44f157b
fffbacb
bb53725
d8dcb04
f9876dd
bc3ec47
fe10a89
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,20 @@ including other versions of pandas. | |
|
||
{{ header }} | ||
|
||
.. warning:: | ||
|
||
Previously, the default argument ``engine=None`` to ``pd.read_excel`` | ||
would result in using the xlrd engine in many cases. The engine xlrd is no longer | ||
maintained, and is not supported with python >= 3.9. When ``engine=None``, the | ||
following logic is now used to determine the engine. | ||
|
||
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt), then odf will be used. | ||
Review comment: link for odf: https://pypi.org/project/odfpy/ |
||
- Otherwise if ``path_or_buffer`` is a bytes stream, the file has the extension ``.xls``, or is an xlrd Book instance, then xlrd will be used. | ||
Review comment: use double backticks for xlrd / odf (only put the docs link on L14) |
||
- Otherwise if openpyxl is installed, then openpyxl will be used. | ||
||
- Otherwise xlrd will be used and a ``FutureWarning`` will be raised. | ||
Review comment: I would maybe rearrange this list: the most important piece of information we want to convey here is that for xlsx files the default changed from xlrd to openpyxl, if installed. So I would also put that on top of the list (or keep it just to this for the whatsnew, as the other items didn't change. The full list is still in the actual docs).
Reply: We don't actually look at the extension or the file format when determining the engine in various cases, so it isn't just changing for xlsx files, right? What do you think of this:
|
||
|
||
Specifying ``engine="xlrd"`` will continue to be allowed for the indefinite future. | ||
|
||
.. --------------------------------------------------------------------------- | ||
|
||
Enhancements | ||
|
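The fallback order described in the warning above can be sketched as a small pure function. This is only an illustration under stated assumptions: `resolve_engine`, its parameters, and `ODF_EXTENSIONS` are hypothetical names, not part of pandas, and the sketch follows the documented list rather than the exact checks in `ExcelFile.__init__` (which, in this diff, tests only the `.ods` extension for paths).

```python
import warnings
from typing import Optional

# Extensions the whatsnew entry lists as OpenDocument formats.
ODF_EXTENSIONS = {".odf", ".ods", ".odt"}


def resolve_engine(
    ext: Optional[str],
    is_byte_stream: bool = False,
    is_xlrd_book: bool = False,
    openpyxl_installed: bool = True,
) -> str:
    """Hypothetical helper mirroring the documented engine=None fallback order."""
    # 1. OpenDocument files go to odf.
    if ext in ODF_EXTENSIONS:
        return "odf"
    # 2. Byte streams, .xls files, and xlrd Book instances stay on xlrd.
    if is_byte_stream or is_xlrd_book or ext == ".xls":
        return "xlrd"
    # 3. Everything else prefers openpyxl when it is available.
    if openpyxl_installed:
        return "openpyxl"
    # 4. Last resort: xlrd, with a FutureWarning aimed at the caller.
    warnings.warn(
        "Falling back to xlrd; install openpyxl or pass engine='xlrd' "
        "to silence this warning.",
        FutureWarning,
        stacklevel=2,
    )
    return "xlrd"
```

Under these assumptions, `resolve_engine(".xlsx")` yields `"openpyxl"`, while `resolve_engine(".xlsx", openpyxl_installed=False)` yields `"xlrd"` and emits the warning.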
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,17 @@ | ||
import abc | ||
import datetime | ||
import inspect | ||
from io import BufferedIOBase, BytesIO, RawIOBase | ||
import os | ||
from textwrap import fill | ||
from typing import Any, Dict, Mapping, Union, cast | ||
import warnings | ||
|
||
from pandas._config import config | ||
|
||
from pandas._libs.parsers import STR_NA_VALUES | ||
from pandas._typing import Buffer, FilePathOrBuffer, StorageOptions | ||
from pandas.compat._optional import import_optional_dependency | ||
from pandas.errors import EmptyDataError | ||
from pandas.util._decorators import Appender, deprecate_nonkeyword_arguments | ||
|
||
|
@@ -99,12 +102,29 @@ | |
of dtype conversion. | ||
engine : str, default None | ||
If io is not a buffer or path, this must be set to identify io. | ||
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb", default "xlrd". | ||
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb". | ||
Engine compatibility : | ||
|
||
- "xlrd" supports most old/new Excel file formats. | ||
|
||
- "openpyxl" supports newer Excel file formats. | ||
- "odf" supports OpenDocument file formats (.odf, .ods, .odt). | ||
- "pyxlsb" supports Binary Excel files. | ||
|
||
.. versionchanged:: 1.2.0 | ||
The engine xlrd is no longer maintained, and is not supported with | ||
Review comment: I think we need a blank line here to render (make this section the same as in the whatsnew as per formatting)
Reply: No, for versionchanged this is OK (rst .. ;-)) |
||
python >= 3.9. When ``engine=None``, the following logic will be | ||
used to determine the engine. | ||
|
||
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt), | ||
then odf will be used. | ||
- Otherwise if ``path_or_buffer`` is a bytes stream, the file has the | ||
extension ``.xls``, or is an xlrd Book instance, then xlrd will be used. | ||
- Otherwise if openpyxl is installed, then openpyxl will be used. | ||
- Otherwise xlrd will be used and a ``FutureWarning`` will be raised. | ||
|
||
Specifying ``engine="xlrd"`` will continue to be allowed for the | ||
indefinite future. | ||
|
||
converters : dict, default None | ||
Dict of functions for converting values in certain columns. Keys can | ||
either be integers or column labels, values are functions that take one | ||
|
@@ -877,13 +897,29 @@ class ExcelFile: | |
.xls, .xlsx, .xlsb, .xlsm, .odf, .ods, or .odt file. | ||
engine : str, default None | ||
If io is not a buffer or path, this must be set to identify io. | ||
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``, | ||
default ``xlrd``. | ||
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb`` | ||
Engine compatibility : | ||
|
||
- ``xlrd`` supports most old/new Excel file formats. | ||
- ``openpyxl`` supports newer Excel file formats. | ||
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt). | ||
- ``pyxlsb`` supports Binary Excel files. | ||
|
||
.. versionchanged:: 1.2.0 | ||
|
||
The engine xlrd is no longer maintained, and is not supported with | ||
python >= 3.9. When ``engine=None``, the following logic will be | ||
used to determine the engine. | ||
|
||
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt), | ||
Review comment: obviously, apply as much of the formatting as you can here as well |
||
then odf will be used. | ||
- Otherwise if ``path_or_buffer`` is a bytes stream, the file has the | ||
extension ``.xls``, or is an xlrd Book instance, then xlrd will be used. | ||
- Otherwise if openpyxl is installed, then openpyxl will be used. | ||
- Otherwise xlrd will be used and a ``FutureWarning`` will be raised. | ||
|
||
Specifying ``engine="xlrd"`` will continue to be allowed for the | ||
indefinite future. | ||
""" | ||
|
||
from pandas.io.excel._odfreader import ODFReader | ||
|
@@ -902,14 +938,59 @@ def __init__( | |
self, path_or_buffer, engine=None, storage_options: StorageOptions = None | ||
): | ||
if engine is None: | ||
engine = "xlrd" | ||
# Determine ext and use odf for ods stream/file | ||
if isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)): | ||
ext = None | ||
if _is_ods_stream(path_or_buffer): | ||
engine = "odf" | ||
else: | ||
ext = os.path.splitext(str(path_or_buffer))[-1] | ||
if ext == ".ods": | ||
engine = "odf" | ||
|
||
|
||
if ( | ||
import_optional_dependency( | ||
"xlrd", raise_on_missing=False, on_version="ignore" | ||
) | ||
is not None | ||
): | ||
from xlrd import Book | ||
|
||
if isinstance(path_or_buffer, Book): | ||
engine = "xlrd" | ||
|
||
# GH 35029 - Prefer openpyxl except for xls files | ||
if engine is None: | ||
if ext is None or isinstance(path_or_buffer, bytes) or ext == ".xls": | ||
engine = "xlrd" | ||
elif ( | ||
import_optional_dependency( | ||
"openpyxl", raise_on_missing=False, on_version="ignore" | ||
) | ||
is not None | ||
): | ||
engine = "openpyxl" | ||
else: | ||
caller = inspect.stack()[1] | ||
if ( | ||
caller.filename.endswith("pandas/io/excel/_base.py") | ||
and caller.function == "read_excel" | ||
): | ||
stacklevel = 4 | ||
else: | ||
stacklevel = 2 | ||
warnings.warn( | ||
"The xlrd engine is no longer maintained and is not " | ||
"supported when using pandas with python >= 3.9. However, " | ||
"the engine xlrd will continue to be allowed for the " | ||
"indefinite future. Beginning with pandas 1.2.0, the " | ||
"openpyxl engine will be used if it is installed and the " | ||
"engine argument is not specified. Either install openpyxl " | ||
"or specify engine='xlrd' to silence this warning.", | ||
FutureWarning, | ||
stacklevel=stacklevel, | ||
) | ||
engine = "xlrd" | ||
if engine not in self._engines: | ||
raise ValueError(f"Unknown engine: {engine}") | ||
|
||
|
Review comment: double back-tick on xlrd (alternatively, put a link to xlrd itself, e.g. https://xlrd.readthedocs.io/en/latest/)
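
The stacklevel choice in `ExcelFile.__init__` appears intended to make the `FutureWarning` point at the user's call site whether `ExcelFile` is constructed directly or through `read_excel`. A minimal sketch of that idea using only the standard library follows. `Workbook` and `read_workbook` are hypothetical stand-ins, and the extra level here is 3 rather than the diff's 4 because this toy helper has no wrapper frame in between (presumably the decorator on `read_excel` accounts for the additional frame in pandas).

```python
import inspect
import warnings


class Workbook:
    """Hypothetical stand-in for a reader class that may warn on construction."""

    def __init__(self) -> None:
        # Frame 0 is __init__ itself; frame 1 is whoever called it.
        caller = inspect.stack()[1]
        if caller.function == "read_workbook":
            # Called through the helper: skip one more frame so the
            # warning is attributed to the helper's caller.
            stacklevel = 3
        else:
            # Called directly: attribute the warning to that caller.
            stacklevel = 2
        warnings.warn(
            "falling back to the legacy engine",
            FutureWarning,
            stacklevel=stacklevel,
        )


def read_workbook() -> Workbook:
    """Hypothetical convenience wrapper around Workbook()."""
    return Workbook()
```

Both `Workbook()` and `read_workbook()` then report the warning against the line in user code that made the call, instead of a line inside the library.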