fix(mangakakalot): exclude banner ads and non-CDN images from chapter pages #143

Open. malaquiasdev wants to merge 1 commit into main from fix/102-mangakakalot-banner-filter

malaquiasdev (Owner) commented May 13, 2026

What this PR does

Adds a pure predicate isMangaPageImage(url) to src/integrations/mangakakalot/client/parser.ts and wires it into parseChapterImages. The parser now rejects images by three rules:

  • .gif file extension
  • mangakakalot.gg / www.mangakakalot.gg host (real pages come from an external CDN such as *.2xstorage.com)
  • /images/bns/ path segment (banner namespace)

After filtering, page numbers are reassigned sequentially starting at 1, so there are no gaps.
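
The three rules could be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `SITE_HOSTS` and the `.gif` regex are taken from the reviewer notes in this description, while `BANNER_PATH_SEGMENT` and `GIF_EXTENSION` are hypothetical names, and the sketch assumes the extension check runs against the parsed pathname.

```typescript
// Hosts that serve site assets rather than manga pages (exact match only).
const SITE_HOSTS = new Set(["mangakakalot.gg", "www.mangakakalot.gg"]);

// Hypothetical constant name for the banner namespace path segment.
const BANNER_PATH_SEGMENT = "/images/bns/";

// Hypothetical constant name; the pattern itself is quoted in the reviewer notes.
const GIF_EXTENSION = /\.gif(?:[?#].*)?$/i;

function isMangaPageImage(url: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // URLs that fail to parse are rejected (safer default)
  }
  if (SITE_HOSTS.has(parsed.hostname)) return false; // site host rule
  if (parsed.pathname.includes(BANNER_PATH_SEGMENT)) return false; // banner path rule
  if (GIF_EXTENSION.test(parsed.pathname)) return false; // .gif extension rule
  return true;
}
```

Each rule is checked independently, so the banner from the log below would be rejected by any one of the three. The `img.2xstorage.com` URLs used to exercise this sketch are made up for illustration.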

Why it exists

P0 bug (#102). Real-session log showed every mangakakalot CBZ contained at least one banner ad as a "page":

{"event":"mangakakalot.image_fetch","url":"https://www.mangakakalot.gg/images/bns/common/ehentaiai.gif"}

Root cause: $(SELECTORS.chapterReaderImage).each(...) pulled every <img> inside .container-chapter-reader and used the loop index as the page number, with no URL filter. Affected 100% of CBZs produced via the mangakakalot fallback today.

How it works internally

  • New helper isMangaPageImage(url: string): boolean near the bottom of parser.ts. Pure, exported for unit testing. Uses new URL(url) to extract host + pathname; URLs that fail to parse are rejected (safer default).
  • parseChapterImages builds the result array via a pageNum counter that only increments after the predicate passes — sequential renumbering is real, not a side-effect of the loop index.
  • Regression fixture tests/fixtures/mangakakalot/chapter-with-banners.html: 4 real *.2xstorage.com page imgs interleaved with 3 banners (one for each rejection rule). Test asserts exactly 4 results with pages [1,2,3,4].
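
The counter-based renumbering described above might look something like this. It is a sketch under assumptions, not the PR's implementation: `numberPages`, `ChapterImage`, and `isNotBanner` are hypothetical names, the predicate is a simplified stand-in for `isMangaPageImage`, and the sample URLs are illustrative rather than the contents of the real fixture.

```typescript
interface ChapterImage {
  page: number;
  url: string;
}

// Assigns page numbers from a counter that advances only for accepted URLs.
function numberPages(urls: string[], accept: (url: string) => boolean): ChapterImage[] {
  const images: ChapterImage[] = [];
  let pageNum = 0;
  for (const url of urls) {
    if (!accept(url)) continue; // rejected images never consume a page number
    pageNum += 1;
    images.push({ page: pageNum, url });
  }
  return images;
}

// Hypothetical stand-in for isMangaPageImage: only checks the banner path.
const isNotBanner = (url: string): boolean => !url.includes("/images/bns/");

// Illustrative input: 4 pages interleaved with 3 banners.
const sampleUrls = [
  "https://img.2xstorage.com/demo/1.webp",
  "https://www.mangakakalot.gg/images/bns/a.gif",
  "https://img.2xstorage.com/demo/2.webp",
  "https://img.2xstorage.com/demo/3.webp",
  "https://www.mangakakalot.gg/images/bns/b.gif",
  "https://img.2xstorage.com/demo/4.webp",
  "https://www.mangakakalot.gg/images/bns/c.gif",
];

// The four surviving images are numbered 1, 2, 3, 4.
const pages = numberPages(sampleUrls, isNotBanner).map((image) => image.page);
```

Numbering by loop position instead would leave the same four images at positions 1, 3, 4 and 6, which is why the counter is kept separate from the loop index.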

What did not change

  • fetchChapterImages HTTP code untouched — parser-only fix.
  • mangadex and fallback-http integrations untouched.
  • No new logger events, no scope creep into other parsers in the same file.
  • Existing fixture chapter-naruto-1.html and its test path remain valid.

Test checklist

  • bun test — 436 pass, 0 fail
  • bun run typecheck — clean
  • bun run check — clean
  • Predicate unit tests cover accept path (webp/jpg/jpeg/png on CDN host) and each rejection rule independently
  • chapter-with-banners.html fixture asserts banner exclusion + sequential renumbering [1,2,3,4]
  • No regression on chapter-naruto-1.html (3 pages, [1,2,3])

Reviewer notes

QA flagged two P1 test gaps (non-blocking, behavior is correct; tests just don't document it):

  • Uppercase .GIF — regex /\.gif(?:[?#].*)?$/i is already case-insensitive, but no test pins the flag.
  • .gif buried inside a query string (e.g. ?src=banner.gif) — current regex anchors on pathname, so this is not rejected by extension rule. Host or path rule may still catch it. Worth deciding whether to harden.
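
The two gaps can be pinned down with a couple of direct checks. The sketch below assumes, as the note states, that the regex is tested against the parsed pathname; `hasGifExtension` is a hypothetical helper name and the URLs are made up.

```typescript
// Pattern quoted in the reviewer note above.
const GIF_EXTENSION = /\.gif(?:[?#].*)?$/i;

// Hypothetical helper: applies the extension rule to the pathname only.
function hasGifExtension(rawUrl: string): boolean {
  return GIF_EXTENSION.test(new URL(rawUrl).pathname);
}

// Uppercase extension: matched because of the /i flag.
const upperCase = hasGifExtension("https://img.2xstorage.com/ads/banner.GIF"); // true

// .gif only in the query string: the pathname is /pages/1.jpg, so no match.
const inQuery = hasGifExtension("https://img.2xstorage.com/pages/1.jpg?src=banner.gif"); // false
```

Turning these two values into assertions would document the current behavior, whichever way the hardening decision goes.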

Inspector flagged two P2 forward risks (not bugs in this fix):

  • Relative data-src URLs would throw inside new URL(rawUrl) and be silently dropped. new URL(rawUrl, chapterUrl) would survive a future DOM change.
  • SITE_HOSTS is an exact-match Set, so s3.mangakakalot.gg-style subdomains wouldn't be caught. A regex /(^|\.)mangakakalot\.gg$/i would be more defensive but is scope creep relative to #102 (fix(mangakakalot): chapter parser includes banner ads as pages in CBZ).

Both items intentionally left for follow-up.
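
For whoever picks up the follow-up, both risks can be reproduced in isolation. This is an illustrative sketch only: `parsesWithoutBase`, `resolveImageUrl`, and `SITE_HOST_PATTERN` are hypothetical names, and the chapter URL is invented.

```typescript
// Returns false when new URL(rawUrl) throws, which is what happens for relative URLs.
function parsesWithoutBase(rawUrl: string): boolean {
  try {
    new URL(rawUrl);
    return true;
  } catch {
    return false;
  }
}

// Resolves a possibly relative URL against the chapter page URL; null if still invalid.
function resolveImageUrl(rawUrl: string, chapterUrl: string): string | null {
  try {
    return new URL(rawUrl, chapterUrl).href;
  } catch {
    return null;
  }
}

const chapterUrl = "https://www.mangakakalot.gg/manga/example/chapter-1"; // invented
const relative = "/images/page-1.jpg";

const aloneOk = parsesWithoutBase(relative); // false: no base to resolve against
const resolved = resolveImageUrl(relative, chapterUrl);
// resolved is "https://www.mangakakalot.gg/images/page-1.jpg"

// Exact-match Set versus the suffix regex suggested in the note above.
const SITE_HOSTS = new Set(["mangakakalot.gg", "www.mangakakalot.gg"]);
const SITE_HOST_PATTERN = /(^|\.)mangakakalot\.gg$/i;

const setCatchesSubdomain = SITE_HOSTS.has("s3.mangakakalot.gg"); // false
const regexCatchesSubdomain = SITE_HOST_PATTERN.test("s3.mangakakalot.gg"); // true
```

The `(^|\.)` prefix keeps the regex from matching unrelated hosts that merely end in the same characters, such as `notmangakakalot.gg`.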

Closes

Closes #102

✅ Checklist

  • Tested locally
  • Self-review complete
  • No console errors
  • Code follows project conventions
  • Docs updated in docs/ (not applicable — bug fix, no behavior surface change)

… pages

Adds isMangaPageImage predicate that rejects images from the mangakakalot.gg
host, the /images/bns/ banner namespace, or with a .gif extension. Page numbers
are renumbered sequentially after filtering. Closes #102.

malaquiasdev marked this pull request as ready for review May 13, 2026 04:10