Commit 97dc088

update dependencies and prepare v1.7.0 (#486)
* update dependencies and prepare v1.7.0
* check docs
1 parent a19bb5d commit 97dc088

5 files changed, 23 additions and 9 deletions
HISTORY.md

+14
@@ -1,6 +1,20 @@
 ## History / Changelog
 
 
+### 1.7.0
+
+Extraction:
+- improved `html2txt()` function
+
+Downloads:
+- add advanced `fetch_response()` function
+→ pending deprecation for `fetch_url(decode=False)`
+
+Maintenance:
+- support for LXML v5+ (#484 by @knit-bee, #485)
+- update [htmldate](https://github.com/adbar/htmldate/releases/tag/v1.7.0)
+
+
 ### 1.6.4
 
 Maintenance:
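
The changelog entry above announces a pending deprecation without showing how callers would notice it. As a rough illustration only, the sketch below uses Python's standard `warnings` module; `legacy_fetch` and its placeholder body are hypothetical and are not taken from the Trafilatura source.

```python
import warnings


def legacy_fetch(url: str, decode: bool = True):
    """Hypothetical downloader whose decode=False mode is being phased out."""
    if not decode:
        # PendingDeprecationWarning is ignored by default outside of tests,
        # so existing callers keep working while the replacement is adopted.
        warnings.warn(
            "decode=False is pending deprecation; "
            "use a function that returns a response object instead",
            PendingDeprecationWarning,
            stacklevel=2,
        )
    body = b"<html></html>"  # placeholder body, no network access in this sketch
    return body.decode("utf-8") if decode else body
```

Running Python with `-W default` or `-W error::PendingDeprecationWarning` is one way to surface such warnings before the old call style is removed.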

README.rst

+1 -1
@@ -87,7 +87,7 @@ Evaluation and alternatives
 
 Trafilatura consistently outperforms other open-source libraries in text extraction benchmarks, showcasing its efficiency and accuracy in extracting web content. The extractor tries to strike a balance between limiting noise and including all valid parts.
 
-For more detailed results see the `benchmark <https://trafilatura.readthedocs.io/en/latest/evaluation.html>`_. The results can be reproduced, see the `evaluation readme <https://github.com/adbar/trafilatura/blob/master/tests/README.rst>_` for instructions.
+For more detailed results see the `benchmark <https://trafilatura.readthedocs.io/en/latest/evaluation.html>`_. The results can be reproduced, see the `evaluation readme <https://github.com/adbar/trafilatura/blob/master/tests/README.rst>`_ for instructions.
 
 =============================== ========= ========== ========= ========= ======
 750 documents, 2236 text & 2250 boilerplate segments (2022-05-18), Python 3.8

docs/usage-python.rst

+4 -4
@@ -313,18 +313,18 @@ The function ``bare_extraction`` can be used to bypass output conversion, it ret
 Raw HTTP response objects
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-The ``fetch_url()`` function can pass a urllib3 response object straight to the extraction by setting the optional ``decode`` argument to ``False``.
+The ``fetch_response()`` function can pass a response object straight to the extraction.
 
 This can be useful to get the final redirection URL with ``response.url`` and then pass is directly as a URL argument to the extraction function:
 
 .. code-block:: python
 
     # necessary components
-    >>> from trafilatura import fetch_url, bare_extraction
+    >>> from trafilatura import fetch_response, bare_extraction
     # load an example
-    >>> response = fetch_url("https://www.example.org", decode=False)
+    >>> response = fetch_response("https://www.example.org")
     # perform extract() or bare_extraction() on Trafilatura's response object
-    >>> bare_extraction(response, url=response.url) # here is the redirection URL
+    >>> bare_extraction(response.data, url=response.url) # here is the redirection URL
 
 
 LXML objects
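
The documentation change above passes `response.data` and `response.url` to the extractor separately. A minimal sketch of that calling pattern follows. It assumes only what the snippet shows (a response object with `.data` and `.url`); the helper `extract_with_final_url`, the `None` check for a failed download, and the stand-in objects are assumptions added so the example runs offline.

```python
from types import SimpleNamespace


def extract_with_final_url(response, extractor):
    """Run `extractor` on the raw body and pass along the post-redirect URL.

    `response` is assumed to expose `.data` and `.url`, as in the
    documentation snippet; a missing response or empty body yields None.
    """
    if response is None or not response.data:
        return None
    return extractor(response.data, url=response.url)


# Stand-in objects (both hypothetical) so the sketch needs no network access.
fake_response = SimpleNamespace(
    data=b"<html><body><p>Hello</p></body></html>",
    url="https://www.example.org/final",
)


def fake_extractor(data, url=None):
    # Mimics only the call shape of an extractor: raw data plus a url keyword.
    return {"size": len(data), "url": url}


result = extract_with_final_url(fake_response, fake_extractor)
print(result)
```

With Trafilatura installed, the same helper could presumably be called as `extract_with_final_url(fetch_response("https://www.example.org"), bare_extraction)`, mirroring the updated documentation.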

setup.py

+3 -3
@@ -30,8 +30,8 @@ def get_long_description():
     "all": [
         "brotli",
         "cchardet >= 2.1.7; python_version < '3.11'",  # build issue
-        "faust-cchardet >= 2.1.18; python_version >= '3.11'",  # fix for build
-        "htmldate[speed] >= 1.6.0",
+        "faust-cchardet >= 2.1.19; python_version >= '3.11'",
+        "htmldate[speed] >= 1.7.0",
         "py3langid >= 0.2.2",
         "pycurl >= 7.45.2",
     ],
@@ -112,7 +112,7 @@ def get_long_description():
         "charset_normalizer >= 3.0.1; python_version < '3.7'",
         "charset_normalizer >= 3.2.0; python_version >= '3.7'",
         "courlan >= 0.9.5",
-        "htmldate >= 1.6.1",
+        "htmldate >= 1.7.0",
         "importlib_metadata; python_version < '3.8'",
         "justext >= 3.0.0",
         # see tests on Github Actions

trafilatura/__init__.py

+1 -1
@@ -9,7 +9,7 @@
 __author__ = 'Adrien Barbaresi and contributors'
 __license__ = 'GNU GPL v3+'
 __copyright__ = 'Copyright 2019-2024, Adrien Barbaresi'
-__version__ = '1.6.4'
+__version__ = '1.7.0'
 
 
 import logging
