Canonicalize video IDs before checking validity #114

Open
TheTechRobo opened this issue Dec 31, 2024 · 0 comments
Labels
enhancement New feature or request module:youtube

TheTechRobo commented Dec 31, 2024

The last character of a YouTube video ID can only be one of 16 characters: an ID is 11 base64 characters (66 bits) encoding a 64-bit value, so the low two bits of the last character must be zero. Until sometime in 2020, YouTube accepted video IDs where only those two bits were off, so -sHIYAaJ7CK is invalid but may previously have been accepted as -sHIYAaJ7CI, which is valid.
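
To make the "one of 16 characters" rule concrete, here is a minimal sketch that derives the allowed final characters. It assumes the IDs use the URL-safe base64 alphabet and that only the two low bits of the last character are unused; `VALID_LAST_CHARS` and `has_canonical_last_char` are hypothetical names, not existing code in this project.

```python
import string

# URL-safe base64 alphabet, in index order (0..63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

# A character is a valid final character only if its two low bits are zero,
# i.e. its alphabet index is a multiple of 4. That leaves 64 / 4 = 16 characters.
VALID_LAST_CHARS = "".join(ch for i, ch in enumerate(ALPHABET) if i % 4 == 0)


def has_canonical_last_char(video_id: str) -> bool:
    """Return True if an 11-character ID ends in one of the 16 valid characters."""
    return len(video_id) == 11 and video_id[-1] in VALID_LAST_CHARS


print(VALID_LAST_CHARS)  # AEIMQUYcgkosw048
```

With this, `has_canonical_last_char("-sHIYAaJ7CK")` is false and `has_canonical_last_char("-sHIYAaJ7CI")` is true, matching the example above.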

The YouTube Video Finder should perform this canonicalization before searching. Maybe it should try both the canonicalized and the uncanonicalized ID, because a damaged URL may have been archived in that form before YouTube stopped accepting such IDs.

It's as easy as decoding the base64 and reencoding it, because most base64 decoders will drop the last two bits (which are the ones that can get mangled). In Python:

import base64
# "=" pads the 11-character ID to 12; the decoder discards the 2 leftover bits
raw = base64.b64decode(video_id + "=", "-_")
canonicalized_id = base64.b64encode(raw, b"-_").strip(b"=").decode()
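
For the "try both" idea, a rough sketch of how the lookup candidates could be built. The function names `canonicalize_video_id` and `candidate_ids` are made up for illustration and are not part of the existing YouTube module, and the 11-character check is an assumption about what counts as a plausible ID.

```python
import base64
import re

# Assumption: a plausible ID is exactly 11 URL-safe base64 characters.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def canonicalize_video_id(video_id: str) -> str:
    """Round-trip the ID through base64 so the two unused low bits are zeroed."""
    if not _ID_RE.fullmatch(video_id):
        raise ValueError(f"not an 11-character video ID: {video_id!r}")
    # Pad to 12 characters; Python's default (non-strict) decoder drops the
    # 2 bits left over after the 8 decoded bytes.
    raw = base64.b64decode(video_id + "=", altchars=b"-_")
    return base64.b64encode(raw, altchars=b"-_").rstrip(b"=").decode()


def candidate_ids(video_id: str) -> list[str]:
    """Return the IDs worth searching for: canonical first, then the original if it differs."""
    canonical = canonicalize_video_id(video_id)
    if canonical == video_id:
        return [canonical]
    return [canonical, video_id]


print(candidate_ids("-sHIYAaJ7CK"))  # ['-sHIYAaJ7CI', '-sHIYAaJ7CK']
```

Searching the canonical ID first keeps the common case to one lookup, and the original ID is only tried when the input was actually mangled.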

Cf. https://wiki.archiveteam.org/index.php/YouTube/Technical_details#Videos

@TheTechRobo TheTechRobo added enhancement New feature or request module:youtube labels Dec 31, 2024