
New scraper to fetch a website from web.archive.org #1243

Open
vitaly-zdanevich opened this issue Dec 21, 2024 · 2 comments
Labels: Scraper Needed (We need to build a dedicated scraper for this website)

Comments

@vitaly-zdanevich

No description provided.

@benoit74
Contributor

benoit74 commented Jan 6, 2025

This is not the purpose of Zimit, but it is definitely doable. Most probably a different scraper is needed. I will move this issue to the zim-requests repo.

I don't know whether the Internet Archive offers downloading a website as a WARC file; that would simplify things, since we would only have to download the WARC and reuse warc2zim to transform it into a ZIM.
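As a rough illustration of the alternative, here is a minimal sketch (not an existing scraper) that enumerates captures of a site via the Wayback Machine CDX API and fetches the raw archived responses using the `id_` URL modifier; the helper names are hypothetical, and the fetched responses would still need to be written into a WARC (for example with warcio) before warc2zim could turn them into a ZIM:

```python
# Sketch only: list Wayback Machine captures for a site prefix and fetch
# the un-rewritten archived content. Helper names are illustrative.
import requests

CDX_API = "https://web.archive.org/cdx/search/cdx"

def list_captures(url_prefix, limit=50):
    """Return (timestamp, original_url) pairs for captures under url_prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",       # all URLs starting with url_prefix
        "output": "json",
        "filter": "statuscode:200",  # keep only successful captures
        "collapse": "urlkey",        # one capture per distinct URL
        "limit": limit,
    }
    rows = requests.get(CDX_API, params=params, timeout=30).json()
    # First row is the header: ["urlkey", "timestamp", "original", ...]
    return [(row[1], row[2]) for row in rows[1:]]

def fetch_raw_capture(timestamp, original_url):
    """Fetch the archived response without Wayback rewriting ('id_' modifier)."""
    raw_url = f"https://web.archive.org/web/{timestamp}id_/{original_url}"
    return requests.get(raw_url, timeout=30).content

if __name__ == "__main__":
    for ts, url in list_captures("ahlalhdeeth.com/vb/", limit=5):
        body = fetch_raw_capture(ts, url)
        print(ts, url, len(body), "bytes")
```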

Note that financial sponsorship or a contributor will be needed to move this issue forward; Kiwix does not have the resources to implement this in the coming months.

Do you have website(s) in mind that we should use to test the scraper?

@benoit74 benoit74 transferred this issue from openzim/zimit Jan 6, 2025
@benoit74 benoit74 added the Scraper Needed label Jan 6, 2025
@benoit74 benoit74 changed the title Please provide a way to fetch a website from web.archive.org New scraper to fetch a website from web.archive.org Jan 6, 2025
@hamoudak

hamoudak commented Jan 24, 2025

I have some websites that vanished from the web; they were once at the top of their field, but they are in Arabic, if you don't mind.
I tried "Webrecorder ArchiveWeb.page" but failed, so maybe you need to talk to the Internet Archive about this someday.
For my part, I used the "SingleFile" extension to save as many of the most valuable topics as I could. Here are the websites:
https://ahlalhdeeth.com
https://web.archive.org/web/20140122061007/http://ahlalhdeeth.com/vb/index.php

https://www.ahlalloghah.com
https://web.archive.org/web/20111011184930/http://www.ahlalloghah.com/index.php
