forked from Aloisius/ia-web-commons
Integrate upstream changes from iipc/webarchive-commons 3.0.0 #51
Merged
HttpClient 3 was discontinued in 2007 and frequently triggers alerts in dependency vulnerability scanners. We're also not using much of it anymore, with one big exception: the URI class is the foundation of UsableURI and central to Heritrix, which has made removing the library difficult. URIException in particular appears a lot in client code.

HttpClient 4+ has switched to java.net.URI. The main reason Heritrix was built on HttpClient's URI instead is that java.net.URI is not flexible and differs from how browsers behave. (Although how browsers behave has shifted over time.) Eventually we'll probably need to rework Heritrix's URI handling to follow the WHATWG URL spec.

However, to let us remove the dependency while keeping UsableURI working, this copies HttpClient 3's URI, URIException and ChunkedInputStream, with some small tweaks to remove their dependency on other classes in HttpClient. The HttpClient Header class is replaced with our existing HttpHeader. URI and ChunkedInputStream are marked package private for now. This is a breaking API change and will trigger a bump of the major version number.
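To illustrate the point about java.net.URI being inflexible, here is a minimal sketch using only the JDK. The class name `UriStrictnessDemo` and the sample URLs are made up for illustration and are not part of this PR. It shows that java.net.URI rejects strings with an unescaped space or pipe, both of which turn up in links found on the web.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of webarchive-commons.
final class UriStrictnessDemo {
    /** Returns true if java.net.URI accepts the string, false if it throws. */
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/path",    // plain URL
            "http://example.com/a b",     // unescaped space in the path
            "http://example.com/?q=a|b",  // unescaped pipe in the query
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (parses(s) ? "accepted" : "rejected"));
        }
    }
}
```

A crawler that has to cope with links like the last two needs either a more lenient parser or a fix-up step before parsing, which is the gap the copied URI class continues to fill for UsableURI.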
Upgrade to JUnit 5
Remove dependency on Apache Commons HttpClient 3.1
Remove deprecated code for 2.0.0 release
Using RecordingInputStream requires an awkward workaround when the API being recorded is not in the form of an InputStream, for example, if it's asynchronous. This adds a method to access the underlying RecordingOutputStream so you can write to it directly when that would be easier.
…tream Add RecordingInputStream.asOutputStream()
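As a rough sketch of why direct write access helps with push-style sources: the code below does not use the real RecordingInputStream or RecordingOutputStream. `SketchRecorder` is a hypothetical stand-in backed by a `ByteArrayOutputStream`, and only the method name `asOutputStream()` is taken from the commit title. It assumes the source delivers data as a sequence of chunks, as a callback-based or asynchronous API might.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

final class PushRecordingSketch {
    /** Hypothetical stand-in for a recorder; NOT the webarchive-commons class. */
    static final class SketchRecorder {
        private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

        /** Exposes the write side so push-style sources can record directly. */
        OutputStream asOutputStream() {
            return recorded;
        }

        byte[] recordedBytes() {
            return recorded.toByteArray();
        }
    }

    /** Simulates a callback-based API handing over chunks as they arrive. */
    static byte[] recordChunks(List<byte[]> chunks) {
        SketchRecorder recorder = new SketchRecorder();
        OutputStream out = recorder.asOutputStream();
        try {
            for (byte[] chunk : chunks) {
                out.write(chunk); // written directly, no InputStream adapter needed
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recorder.recordedBytes();
    }
}
```

Without such a method, the caller would have to wrap the chunk source in an InputStream (for example via a pipe and a second thread) just so the recorder could read from it.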
We'll do this in settings instead.
Turns out this is used quite a bit.
…lic suffixes file to avoid collisions with other jars
…parsing Fix: public suffixes tld parsing
commons-lang 2.x was last released in 2011 and has unpatched vulnerabilities.
Upgrade from commons-lang 2.6 to commons-lang3 3.18.0
Successfully tested converting 8 WARC files to WAT and WET.
This merges the upstream branch master into the code base, updating the version from 1.3.1-SNAPSHOT to 3.0.1-SNAPSHOT. Changes from upstream:
effective_tld_names.dat and move to org/archive/effective_tld_names.dat to prevent conflict with the list shipped together with "crawler-commons"