fix(mangakakalot): exclude banner ads and non-CDN images from chapter pages by malaquiasdev · Pull Request #143 · malaquiasdev/scanldr

malaquiasdev · 2026-05-13T04:09:03Z

What this PR does

Adds a pure predicate isMangaPageImage(url) to src/integrations/mangakakalot/client/parser.ts and wires it into parseChapterImages. The parser now rejects images by three rules:

.gif file extension
mangakakalot.gg / www.mangakakalot.gg host (real pages come from an external CDN such as *.2xstorage.com)
/images/bns/ path segment (banner namespace)

After filtering, pages are renumbered sequentially starting at 1, so there are no gaps.
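
The three rules can be sketched as a single predicate. This is a minimal sketch based on the description above, not the code from the diff: the constant names GIF_EXTENSION and BANNER_PATH_SEGMENT are hypothetical, the contents of SITE_HOSTS are assumed from the hosts listed here, and applying the extension regex to the pathname is an assumption drawn from the reviewer notes.

```typescript
// Hosts that serve the site itself rather than chapter pages (contents assumed).
const SITE_HOSTS = new Set(["mangakakalot.gg", "www.mangakakalot.gg"]);
// Extension regex quoted in the reviewer notes; the constant name is hypothetical.
const GIF_EXTENSION = /\.gif(?:[?#].*)?$/i;
// Banner namespace named in the description; the constant name is hypothetical.
const BANNER_PATH_SEGMENT = "/images/bns/";

// Exported from parser.ts in the PR; left unexported here to keep the sketch standalone.
function isMangaPageImage(url: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // URLs that fail to parse are rejected
  }
  if (GIF_EXTENSION.test(parsed.pathname)) return false; // rule 1: .gif extension
  if (SITE_HOSTS.has(parsed.hostname)) return false; // rule 2: site host, not CDN
  if (parsed.pathname.includes(BANNER_PATH_SEGMENT)) return false; // rule 3: banner path
  return true;
}
```

Each rule returns early, so a URL that trips several rules at once (like the banner in the log line below) is rejected by the first one that matches.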

Why it exists

P0 bug (#102). Real-session log showed every mangakakalot CBZ contained at least one banner ad as a "page":

{"event":"mangakakalot.image_fetch","url":"https://www.mangakakalot.gg/images/bns/common/ehentaiai.gif"}

Root cause: $(SELECTORS.chapterReaderImage).each(...) pulled every <img> inside .container-chapter-reader and used the loop index as the page number, with no URL filter. Affected 100% of CBZs produced via the mangakakalot fallback today.

How it works internally

New helper isMangaPageImage(url: string): boolean near the bottom of parser.ts. Pure, exported for unit testing. Uses new URL(url) to extract host + pathname; URLs that fail to parse are rejected (safer default).
parseChapterImages builds the result array via a pageNum counter that only increments after the predicate passes — sequential renumbering is real, not a side-effect of the loop index.
Regression fixture tests/fixtures/mangakakalot/chapter-with-banners.html: 4 real *.2xstorage.com page imgs interleaved with 3 banners (one for each rejection rule). Test asserts exactly 4 results with pages [1,2,3,4].
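
The counter-based renumbering can be sketched as follows. The names ChapterImage and numberAcceptedImages are hypothetical, and the sketch takes a plain array of URLs plus a predicate instead of the `$(...).each(...)` selection the real parser iterates over, so it illustrates only the numbering logic.

```typescript
// Hypothetical result shape: a 1-based page number and the image URL.
interface ChapterImage {
  page: number;
  url: string;
}

// Number only the URLs the predicate accepts, so rejected entries leave no gaps.
function numberAcceptedImages(
  urls: string[],
  accept: (url: string) => boolean,
): ChapterImage[] {
  const images: ChapterImage[] = [];
  let pageNum = 0;
  for (const url of urls) {
    if (!accept(url)) continue; // rejected images never consume a page number
    pageNum += 1;
    images.push({ page: pageNum, url });
  }
  return images;
}
```

Because pageNum advances only after acceptance, a banner sitting between two real pages cannot leave a hole in the sequence, which is what the fixture with 4 pages and 3 banners checks.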

What did not change

fetchChapterImages HTTP code untouched — parser-only fix.
mangadex and fallback-http integrations untouched.
No new logger events, no scope creep into other parsers in the same file.
Existing fixture chapter-naruto-1.html and its test path remain valid.

Test checklist

bun test — 436 pass, 0 fail
bun run typecheck — clean
bun run check — clean
Predicate unit tests cover accept path (webp/jpg/jpeg/png on CDN host) and each rejection rule independently
chapter-with-banners.html fixture asserts banner exclusion + sequential renumbering [1,2,3,4]
No regression on chapter-naruto-1.html (3 pages, [1,2,3])

Reviewer notes

QA flagged two P1 test gaps (non-blocking, behavior is correct; tests just don't document it):

Uppercase .GIF — regex /\.gif(?:[?#].*)?$/i is already case-insensitive, but no test pins the flag.
.gif buried inside a query string (e.g. ?src=banner.gif): the current regex is tested against the pathname, so the extension rule does not reject this. The host or path rule may still catch it. Worth deciding whether to harden.
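
Both gaps can be pinned with a small check of the extension rule in isolation. This is a sketch under assumptions: hasGifExtension is a hypothetical helper, cdn.example.com is a placeholder host, and the regex is assumed to be tested against the URL's pathname as described above.

```typescript
// Extension regex quoted above; the constant name is hypothetical.
const GIF_EXTENSION = /\.gif(?:[?#].*)?$/i;

// Hypothetical helper: apply the extension rule to the pathname only.
function hasGifExtension(url: string): boolean {
  return GIF_EXTENSION.test(new URL(url).pathname);
}

// Gap 1: the i flag makes an uppercase extension match.
const upperCaseGif = hasGifExtension("https://cdn.example.com/ads/banner.GIF"); // true

// Gap 2: the query string is not part of the pathname, so this is not flagged.
const gifInQuery = hasGifExtension("https://cdn.example.com/page/1.jpg?src=banner.gif"); // false
```

Turning these two values into assertions would document the current behavior without changing it. Whether the second case should be rejected is the open hardening question.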

Inspector flagged two P2 forward risks (not bugs in this fix):

Relative data-src URLs would throw inside new URL(rawUrl) and be silently dropped. Resolving against the chapter page with new URL(rawUrl, chapterUrl) would survive a future DOM change.
SITE_HOSTS is an exact-match Set, so s3.mangakakalot.gg-style subdomains wouldn't be caught. A regex /(^|\.)mangakakalot\.gg$/i would be more defensive but is scope creep relative to #102.

Both items intentionally left for follow-up.
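
For the follow-up, the two forward risks could be addressed roughly as below. This is only a sketch of the suggestions in the notes above, not code from this PR: resolveImageUrl, isSiteHost, and SITE_HOST_PATTERN are hypothetical names, and the chapter URL in the usage lines is made up for illustration.

```typescript
// Regex suggested in the reviewer notes: matches the apex domain and any subdomain.
const SITE_HOST_PATTERN = /(^|\.)mangakakalot\.gg$/i;

// Resolve a possibly relative image URL against the chapter page URL.
// Returns null when the value cannot be parsed even with a base.
function resolveImageUrl(rawUrl: string, chapterUrl: string): URL | null {
  try {
    return new URL(rawUrl, chapterUrl);
  } catch {
    return null;
  }
}

// Host check that also covers subdomains such as s3.mangakakalot.gg.
function isSiteHost(hostname: string): boolean {
  return SITE_HOST_PATTERN.test(hostname);
}

// Usage with a made-up chapter URL:
const chapterUrl = "https://www.mangakakalot.gg/manga/example/chapter-1";
const resolved = resolveImageUrl("/images/page-1.jpg", chapterUrl);
const subdomainIsSite = isSiteHost("s3.mangakakalot.gg"); // true
const cdnIsSite = isSiteHost("img.2xstorage.com"); // false
```

One thing to weigh in that follow-up: a relative path resolved against a chapter URL on the site host inherits that host, so the host rule would still reject it unless the rules are revisited together.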

Closes

Closes #102

✅ Checklist

Tested locally
Self-review complete
No console errors
Code follows project conventions
Docs updated in docs/ (not applicable — bug fix, no behavior surface change)


malaquiasdev marked this pull request as ready for review May 13, 2026 04:10

malaquiasdev wants to merge 1 commit into main from fix/102-mangakakalot-banner-filter
