diff --git a/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/.openspec.yaml b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/.openspec.yaml new file mode 100644 index 0000000..5aae5cf --- /dev/null +++ b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-03-04 diff --git a/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/design.md b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/design.md new file mode 100644 index 0000000..41581c2 --- /dev/null +++ b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/design.md @@ -0,0 +1,59 @@ +## Context + +The link graph is populated during `chunk_document(document_id)` by extracting markdown links and persisting unique `(from_document_id, to_document_id)` edges. Current filtering excludes self-links but still records links between different documents on the same canonical domain, which can over-amplify intra-site structures in the link-based ranking signal. + +This change is constrained to search-core link extraction and tests. No schema change is required. + +## Goals / Non-Goals + +**Goals:** +- Persist outbound edges only when source and target canonical domains differ. +- Treat relative outbound links as same-domain by resolving against the source document canonical URL before filtering. +- Preserve existing behavior for dedupe, unknown target handling (`document_id_for_url`), and lifecycle refresh semantics. +- Update tests so link graph and ranking assertions reflect cross-domain-only edge persistence. + +**Non-Goals:** +- Redefining canonical domain normalization (`netloc` remains the canonical domain key). +- Changing DB schema or link table shape. +- Introducing weighting by domain category or any other ranking formula changes beyond input edge set. + +## Decisions + +1. 
**Apply domain filtering in URL-to-target derivation before insertion** + - Decision: Extend the outbound link derivation helper to accept the source document canonical URL and skip targets whose normalized domain equals the source domain. + - Rationale: Keeps filtering in one place with self-link exclusion and dedupe, minimizing lifecycle-path drift. + - Alternative considered: Filter at SQL insert time by joining source document data per edge; rejected as more complex and less explicit for relative-link handling. + +2. **Resolve relative links against source canonical URL for domain checks** + - Decision: Use `urljoin(source_canonical_url, href)` prior to canonicalization/domain extraction when deriving targets. + - Rationale: Ensures internal relative links are consistently treated as same-domain and skipped. + - Alternative considered: Ignore relative links entirely; rejected because it changes existing link extraction semantics and may drop legitimate cross-domain absolute forms in mixed content. + +3. **Keep unknown-target behavior unchanged after filtering** + - Decision: After passing domain filter, derive `to_document_id` via `document_id_for_url(_canonicalize_url(resolved_url))` even if no target document exists. + - Rationale: Preserves current contract and avoids coupling to ingestion state. + - Alternative considered: Require target documents to exist before storing edges; rejected as scope creep and behavior regression. + +4. **Refactor tests to explicit multi-domain fixtures** + - Decision: Update link graph and link-score tests that currently rely on same-domain edges to use at least two domains. + - Rationale: Makes expectations align with the new rule and prevents accidental reliance on intra-domain edge persistence. + - Alternative considered: Keep same fixtures and loosen assertions; rejected because it obscures intended behavior. 
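Decisions 1 and 2 combine into one derivation pass. The sketch below is a minimal standalone version, assuming simplified stand-ins for the service's real helpers: `_normalize_domain` is reduced to `urlparse(...).netloc` (the canonical-domain key per Non-Goals), `document_id_for_url` is stubbed as a hash, and `_canonicalize_url` is omitted — only the names come from this design, the bodies are assumptions:

```python
# Sketch of the filtering in Decisions 1-2. The real helpers live in
# grogbot_search.service; the simplified bodies below are assumptions.
import hashlib
from urllib.parse import urljoin, urlparse


def _normalize_domain(url: str) -> str:
    # netloc is the canonical-domain key (see Non-Goals).
    return urlparse(url).netloc


def document_id_for_url(url: str) -> str:
    # Stand-in: deterministic id derived from the canonical URL.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def to_document_ids(
    source_document_id: str,
    source_canonical_url: str,
    hrefs: list[str],
) -> set[str]:
    source_domain = _normalize_domain(source_canonical_url)
    targets: set[str] = set()
    for href in hrefs:
        # Decision 2: resolve relative hrefs against the source URL first.
        resolved = urljoin(source_canonical_url, href)
        # Decision 1: skip targets whose domain equals the source domain.
        if _normalize_domain(resolved) == source_domain:
            continue
        target_id = document_id_for_url(resolved)
        if target_id == source_document_id:  # preserve self-link exclusion
            continue
        targets.add(target_id)  # the set preserves the dedupe contract
    return targets
```

Relative hrefs such as `/about` resolve to the source origin before the comparison, so they fall out as same-domain exactly as Decision 2 requires, while duplicate cross-domain hrefs still collapse to one target id.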
+ +## Risks / Trade-offs + +- **[Risk] `netloc`-based equality treats `www.example.com` and `example.com` as different domains** → **Mitigation:** Preserve current canonical-domain contract for now; document behavior in spec scenarios and revisit in a separate normalization change if needed. +- **[Risk] Relative-link resolution may expose malformed markdown href values** → **Mitigation:** Continue canonicalization and skip empty/unusable URLs; maintain deterministic helper-level filtering. +- **[Risk] Fewer stored edges may reduce link-score differentiation on single-site corpora** → **Mitigation:** Intentional trade-off to avoid intra-site self-reinforcement; ranking still uses FTS/vector signals. + +## Migration Plan + +1. Update link-derivation helper signature and call sites to include source canonical URL. +2. Implement same-domain skip with relative-link resolution. +3. Update unit tests for link persistence/lifecycle and ranking expectations. +4. Run test suite and verify no API/CLI contract changes. + +Rollback: revert helper/filter changes and corresponding test updates; schema remains unchanged so rollback is code-only. + +## Open Questions + +- None for this iteration. diff --git a/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/proposal.md b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/proposal.md new file mode 100644 index 0000000..f1f28ec --- /dev/null +++ b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/proposal.md @@ -0,0 +1,26 @@ +## Why + +Link-based ranking currently counts links between documents on the same canonical domain. This lets intra-site link structures inflate authority and can drown out cross-site signals, reducing relevance quality. + +## What Changes + +- Update outbound link extraction to skip link creation when source and target resolve to the same canonical domain. 
+- Keep existing self-link exclusion and duplicate `(from_document_id, to_document_id)` collapse behavior. +- Resolve relative links against the source document canonical URL before domain comparison so internal relative links are also skipped. +- Apply the same-domain skip rule even when the target document has not been ingested yet (compare using canonicalized target URL). +- Update link-graph and ranking tests to use multi-domain fixtures and validate same-domain exclusion. + +## Capabilities + +### New Capabilities +- `document-link-domain-filtering`: Filters outbound link graph edges so only cross-domain links are persisted for ranking. + +### Modified Capabilities +- *(none)* + +## Impact + +- Affected code: `packages/search-core/src/grogbot_search/service.py` link extraction/insertion helpers. +- Affected tests: `packages/search-core/tests/test_service.py` link-graph lifecycle tests and link-score ranking fixtures/assertions. +- API/CLI shape: no contract changes expected; link persistence and derived `link_score` values change. +- Dependencies/systems: no new dependencies; SQLite schema remains unchanged. diff --git a/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/specs/document-link-domain-filtering/spec.md b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/specs/document-link-domain-filtering/spec.md new file mode 100644 index 0000000..3914461 --- /dev/null +++ b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/specs/document-link-domain-filtering/spec.md @@ -0,0 +1,30 @@ +## ADDED Requirements + +### Requirement: Outbound link graph SHALL exclude same-canonical-domain targets +When generating outbound document links, the system SHALL compare canonical domains for source and target URLs and MUST skip persistence when both domains are equal. 
+ +#### Scenario: Absolute same-domain target is skipped +- **WHEN** a chunked document at `https://example.com/a` contains an outbound link to `https://example.com/b` +- **THEN** no `(from_document_id, to_document_id)` edge is stored for that link + +#### Scenario: Cross-domain target is persisted +- **WHEN** a chunked document at `https://example.com/a` contains an outbound link to `https://other.example/b` +- **THEN** one directed edge is stored for the source document and resolved target document id + +### Requirement: Relative outbound links MUST be resolved before domain filtering +For outbound links extracted from markdown, the system MUST resolve relative href values against the source document canonical URL before canonicalization and domain comparison. + +#### Scenario: Relative internal path is treated as same-domain +- **WHEN** a chunked document at `https://example.com/posts/1` contains `[x](/about)` +- **THEN** the target resolves to `https://example.com/about` for domain comparison and no edge is stored + +#### Scenario: Relative traversal path is treated as same-domain +- **WHEN** a chunked document at `https://example.com/posts/1` contains `[x](../archive)` +- **THEN** the resolved target domain matches the source domain and no edge is stored + +### Requirement: Cross-domain unknown targets SHALL still derive target ids +After applying same-domain filtering, the system MUST derive `to_document_id` from the canonicalized target URL even when no target document is currently ingested. 
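The relative-resolution rule required here maps directly onto `urllib.parse.urljoin`; a quick standalone check of the two relative scenario shapes (`/about`, `../archive`) against the scenario source URL:

```python
from urllib.parse import urljoin, urlparse

source = "https://example.com/posts/1"

# Root-relative href resolves against the source origin.
assert urljoin(source, "/about") == "https://example.com/about"
# Traversal href resolves against the source path's parent directory.
assert urljoin(source, "../archive") == "https://example.com/archive"
# Both resolved targets share the source netloc, so no edge is stored.
assert urlparse(urljoin(source, "/about")).netloc == urlparse(source).netloc
assert urlparse(urljoin(source, "../archive")).netloc == urlparse(source).netloc
```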
+ +#### Scenario: Unknown cross-domain URL stores derived target id +- **WHEN** a chunked document links to `https://external.site/not-ingested` and no document exists for that URL +- **THEN** the system stores the edge with `to_document_id = document_id_for_url(_canonicalize_url("https://external.site/not-ingested"))` diff --git a/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/tasks.md b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/tasks.md new file mode 100644 index 0000000..fea9f5e --- /dev/null +++ b/openspec/changes/archive/2026-03-04-skip-same-canonical-domain-links/tasks.md @@ -0,0 +1,23 @@ +## 1. Link derivation updates + +- [x] 1.1 Update outbound link target derivation helper(s) to accept the source document canonical URL. +- [x] 1.2 Resolve markdown href values with `urljoin(source_canonical_url, href)` before canonicalization/domain comparison. +- [x] 1.3 Skip target IDs whose normalized domain matches the source canonical domain while preserving self-link and dedupe behavior. +- [x] 1.4 Keep unknown cross-domain target handling unchanged (`to_document_id = document_id_for_url(_canonicalize_url(resolved_url))`). + +## 2. Service integration + +- [x] 2.1 Update `chunk_document` link insertion flow to provide source canonical URL to link-derivation logic. +- [x] 2.2 Verify no schema or API contract changes are introduced by the filtering update. + +## 3. Test updates + +- [x] 3.1 Update link-graph persistence tests to assert same-domain absolute links are skipped and cross-domain links persist. +- [x] 3.2 Add/adjust tests for relative-link resolution (`/path`, `../path`) being treated as same-domain and skipped. +- [x] 3.3 Update unknown-target tests to validate cross-domain unknown URLs still store derived `to_document_id`. +- [x] 3.4 Refactor link-score ranking fixtures/assertions to multi-domain documents so expected inbound-link ordering remains deterministic. + +## 4. 
Verification + +- [x] 4.1 Run `packages/search-core` tests and confirm all link graph/ranking assertions pass with same-domain filtering enabled. +- [x] 4.2 Manually sanity-check that link rows now represent cross-domain edges only for updated fixtures. diff --git a/packages/search-core/src/grogbot_search/service.py b/packages/search-core/src/grogbot_search/service.py index 6e0217c..db25bb3 100644 --- a/packages/search-core/src/grogbot_search/service.py +++ b/packages/search-core/src/grogbot_search/service.py @@ -34,23 +34,18 @@ class SearchScores: _BACKOFF_STATUS_CODES = {401, 403, 429, 503} _CAPTCHA_MARKERS = ( - "captcha", "cf-chl", + "recaptcha", "attention required", "verify you are human", ) _DEFAULT_HEADERS = { - "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:147.0) Gecko/20100101 Firefox/147.0", + "User-Agent": "Mozilla/5.0 (compatible; Grogbot/1.0; +https://www.hauntedspice.com)", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en-US,en;q=0.9", - "Accept-Encoding": "gzip, deflate, br, zstd", - "Connection": "keep-alive", + "Accept-Encoding": "gzip, deflate", "Upgrade-Insecure-Requests": "1", - "Sec-Fetch-Dest": "document", - "Sec-Fetch-Mode": "navigate", - "Sec-Fetch-Site": "none", - "Priority": "u=0, i", } @@ -151,10 +146,21 @@ def _extract_markdown_links(content_markdown: str) -> List[str]: return links -def _to_document_ids_from_markdown(*, source_document_id: str, content_markdown: str) -> set[str]: +def _to_document_ids_from_markdown( + *, + source_document_id: str, + source_canonical_url: str, + content_markdown: str, +) -> set[str]: to_document_ids: set[str] = set() + source_domain = _normalize_domain(_canonicalize_url(source_canonical_url)) for href in _extract_markdown_links(content_markdown): - to_document_id = document_id_for_url(_canonicalize_url(href)) + resolved_url = _canonicalize_url(urljoin(source_canonical_url, href)) + if not resolved_url: + continue + if 
_normalize_domain(resolved_url) == source_domain: + continue + to_document_id = document_id_for_url(_canonicalize_url(resolved_url)) if to_document_id == source_document_id: continue to_document_ids.add(to_document_id) @@ -467,7 +473,11 @@ def chunk_document(self, document_id: str) -> int: self.connection.execute("DELETE FROM chunks WHERE document_id = ?", (document_id,)) self.connection.execute("DELETE FROM links WHERE from_document_id = ?", (document_id,)) created = self._create_chunks(document_id, document.content_markdown) - self._insert_document_links(document_id=document_id, content_markdown=document.content_markdown) + self._insert_document_links( + document_id=document_id, + source_canonical_url=document.canonical_url, + content_markdown=document.content_markdown, + ) self.connection.commit() return len(created) @@ -491,9 +501,10 @@ def synchronize_document_chunks(self, maximum: Optional[int] = None) -> int: total_created += self.chunk_document(row["id"]) return total_created - def _insert_document_links(self, *, document_id: str, content_markdown: str) -> None: + def _insert_document_links(self, *, document_id: str, source_canonical_url: str, content_markdown: str) -> None: to_document_ids = _to_document_ids_from_markdown( source_document_id=document_id, + source_canonical_url=source_canonical_url, content_markdown=content_markdown, ) for to_document_id in sorted(to_document_ids): @@ -595,76 +606,83 @@ def _next_wordpress_url(base_url: str) -> Optional[str]: seen_feed_urls.add(normalized_url) pages_processed += 1 + start_time = time.monotonic() if paginate else None try: - feed = feedparser.parse(current_url) - except Exception: - if pages_processed == 1: - raise - break - - if pages_processed > 1: - status = getattr(feed, "status", None) - if status is not None and status >= 400: - break - if getattr(feed, "bozo", 0) and not feed.entries: + try: + feed = feedparser.parse(current_url) + except Exception: + if pages_processed == 1: + raise break - 
page_feed_name = feed.feed.get("title") - if page_feed_name: - feed_name = feed_name or page_feed_name - - for entry in feed.entries: - entry_url = entry.get("link") or entry.get("id") - if not entry_url: - continue - canonical_url = _canonicalize_url(entry_url) - canonical_domain = _normalize_domain(canonical_url) - source = self._get_source_by_domain(canonical_domain) - if not source: - source = self.upsert_source( - canonical_domain=canonical_domain, - name=feed_name, - rss_feed=feed_url, - ) - else: - updated_name = source.name or feed_name - updated_rss_feed = source.rss_feed or feed_url - if updated_name != source.name or updated_rss_feed != source.rss_feed: + if pages_processed > 1: + status = getattr(feed, "status", None) + if status is not None and status >= 400: + break + if getattr(feed, "bozo", 0) and not feed.entries: + break + + page_feed_name = feed.feed.get("title") + if page_feed_name: + feed_name = feed_name or page_feed_name + + for entry in feed.entries: + entry_url = entry.get("link") or entry.get("id") + if not entry_url: + continue + canonical_url = _canonicalize_url(entry_url) + canonical_domain = _normalize_domain(canonical_url) + source = self._get_source_by_domain(canonical_domain) + if not source: source = self.upsert_source( canonical_domain=canonical_domain, - name=updated_name, - rss_feed=updated_rss_feed, + name=feed_name, + rss_feed=feed_url, + ) + else: + updated_name = source.name or feed_name + updated_rss_feed = source.rss_feed or feed_url + if updated_name != source.name or updated_rss_feed != source.rss_feed: + source = self.upsert_source( + canonical_domain=canonical_domain, + name=updated_name, + rss_feed=updated_rss_feed, + ) + content = None + if entry.get("content"): + content = entry.content[0].value + content = content or entry.get("summary") or "" + content_markdown = html_to_markdown(content) + if not content_markdown or not content_markdown.strip(): + continue + title = entry.get("title") + published_at = 
_parse_datetime(entry.get("published") or entry.get("updated")) + documents.append( + self.upsert_document( + source_id=source.id, + canonical_url=canonical_url, + title=title, + published_at=published_at, + content_markdown=content_markdown, ) - content = None - if entry.get("content"): - content = entry.content[0].value - content = content or entry.get("summary") or "" - content_markdown = html_to_markdown(content) - if not content_markdown or not content_markdown.strip(): - continue - title = entry.get("title") - published_at = _parse_datetime(entry.get("published") or entry.get("updated")) - documents.append( - self.upsert_document( - source_id=source.id, - canonical_url=canonical_url, - title=title, - published_at=published_at, - content_markdown=content_markdown, ) - ) - if not paginate: - break - if pages_processed >= 100: - break + if not paginate: + break + if pages_processed >= 100: + break - next_url = _next_feed_url(feed, current_url) - if not next_url and _is_wordpress_feed(feed): - next_url = _next_wordpress_url(current_url) - if not next_url: - break - current_url = next_url + next_url = _next_feed_url(feed, current_url) + if not next_url and _is_wordpress_feed(feed): + next_url = _next_wordpress_url(current_url) + if not next_url: + break + current_url = next_url + finally: + if start_time is not None: + elapsed = time.monotonic() - start_time + if elapsed < 1.0: + time.sleep(1.0 - elapsed) return documents diff --git a/packages/search-core/tests/conftest.py b/packages/search-core/tests/conftest.py index 6aff749..415ee87 100644 --- a/packages/search-core/tests/conftest.py +++ b/packages/search-core/tests/conftest.py @@ -391,7 +391,7 @@ def log_message(self, format, *args): # noqa: A003 - match base signature "body": """ Attention Required - Please verify you are human (captcha challenge) + Please verify you are human (reCAPTCHA challenge) """, } diff --git a/packages/search-core/tests/test_service.py b/packages/search-core/tests/test_service.py 
index 5761428..71fb9bd 100644 --- a/packages/search-core/tests/test_service.py +++ b/packages/search-core/tests/test_service.py @@ -238,7 +238,7 @@ def test_synchronize_document_chunks_non_positive_maximum_is_noop(service: Searc # Link graph behavior -def test_chunk_document_stores_unique_outbound_links_per_target(service: SearchService): +def test_chunk_document_skips_same_domain_links_and_dedupes_cross_domain_targets(service: SearchService): source = service.upsert_source("example.com", name="Example") document = service.upsert_document( source_id=source.id, @@ -246,9 +246,10 @@ def test_chunk_document_stores_unique_outbound_links_per_target(service: SearchS title="Source", published_at=None, content_markdown=( - "[one](https://example.com/target) " - "[two](https://example.com/target) " - "[three](https://example.com/other-target)" + "[same](https://example.com/target) " + "[cross-one](https://other.example/target) " + "[cross-one-duplicate](https://other.example/target) " + "[cross-two](https://third.example/other-target)" ), ) @@ -267,15 +268,41 @@ def test_chunk_document_stores_unique_outbound_links_per_target(service: SearchS assert len(links) == 2 assert [row["to_document_id"] for row in links] == sorted( [ - service_module.document_id_for_url(service_module._canonicalize_url("https://example.com/target")), - service_module.document_id_for_url(service_module._canonicalize_url("https://example.com/other-target")), + service_module.document_id_for_url(service_module._canonicalize_url("https://other.example/target")), + service_module.document_id_for_url(service_module._canonicalize_url("https://third.example/other-target")), ] ) +def test_chunk_document_resolves_relative_links_before_domain_filtering(service: SearchService): + source = service.upsert_source("example.com", name="Example") + document = service.upsert_document( + source_id=source.id, + canonical_url="https://example.com/posts/entry", + title="Entry", + published_at=None, + content_markdown=( + 
"[root](/about) " + "[parent](../archive) " + "[cross](https://external.example/outbound)" + ), + ) + + service.chunk_document(document.id) + + links = service.connection.execute( + "SELECT to_document_id FROM links WHERE from_document_id = ? ORDER BY to_document_id", + (document.id,), + ).fetchall() + + assert [row["to_document_id"] for row in links] == [ + service_module.document_id_for_url(service_module._canonicalize_url("https://external.example/outbound")) + ] + + def test_chunk_document_stores_unknown_targets_by_canonicalized_url(service: SearchService): source = service.upsert_source("example.com", name="Example") - target_url = "https://example.com/not-ingested" + target_url = "https://external.site/not-ingested" document = service.upsert_document( source_id=source.id, canonical_url="https://example.com/source", @@ -305,7 +332,7 @@ def test_outbound_links_ignore_self_and_follow_content_delete_and_refresh_lifecy canonical_url=canonical_url, title="Lifecycle", published_at=None, - content_markdown=f"[self]({canonical_url}) [other](https://example.com/other)", + content_markdown=f"[self]({canonical_url}) [other](https://other.example/other)", ) service.chunk_document(document.id) @@ -315,7 +342,7 @@ def test_outbound_links_ignore_self_and_follow_content_delete_and_refresh_lifecy (document.id,), ).fetchall() assert [row["to_document_id"] for row in links] == [ - service_module.document_id_for_url(service_module._canonicalize_url("https://example.com/other")) + service_module.document_id_for_url(service_module._canonicalize_url("https://other.example/other")) ] updated = service.upsert_document( @@ -334,7 +361,7 @@ def test_outbound_links_ignore_self_and_follow_content_delete_and_refresh_lifecy service.connection.execute( "UPDATE documents SET content_markdown = ? 
WHERE id = ?", - ("[refreshed](https://example.com/refreshed)", updated.id), + ("[refreshed](https://external.example/refreshed)", updated.id), ) service.connection.commit() service.chunk_document(updated.id) @@ -344,7 +371,7 @@ def test_outbound_links_ignore_self_and_follow_content_delete_and_refresh_lifecy (updated.id,), ).fetchall() assert [row["to_document_id"] for row in refreshed_links] == [ - service_module.document_id_for_url(service_module._canonicalize_url("https://example.com/refreshed")) + service_module.document_id_for_url(service_module._canonicalize_url("https://external.example/refreshed")) ] assert service.delete_document(updated.id) is True @@ -456,32 +483,35 @@ def test_search_returns_empty_for_blank_query_or_non_positive_limit(service: Sea def test_search_includes_link_score_with_deterministic_ties_and_zero_fill(service: SearchService): - source = service.upsert_source("example.com", name="Example") + source_a = service.upsert_source("alpha.example", name="Alpha") + source_b = service.upsert_source("beta.example", name="Beta") + source_c = service.upsert_source("gamma.example", name="Gamma") + source_d = service.upsert_source("delta.example", name="Delta") doc_a = service.upsert_document( - source_id=source.id, - canonical_url="https://example.com/a", + source_id=source_a.id, + canonical_url="https://alpha.example/a", title="A", published_at=None, content_markdown="alpha", ) doc_b = service.upsert_document( - source_id=source.id, - canonical_url="https://example.com/b", + source_id=source_b.id, + canonical_url="https://beta.example/b", title="B", published_at=None, content_markdown=f"alpha [a]({doc_a.canonical_url})", ) doc_c = service.upsert_document( - source_id=source.id, - canonical_url="https://example.com/c", + source_id=source_c.id, + canonical_url="https://gamma.example/c", title="C", published_at=None, content_markdown=f"alpha [a]({doc_a.canonical_url})", ) doc_d = service.upsert_document( - source_id=source.id, - 
canonical_url="https://example.com/d", + source_id=source_d.id, + canonical_url="https://delta.example/d", title="D", published_at=None, content_markdown=f"alpha [b]({doc_b.canonical_url}) [c]({doc_c.canonical_url})", @@ -618,7 +648,7 @@ def test_create_document_from_url_raises_on_backoff_signals(service: SearchServi ("backoff-429", "status_code=429"), ("backoff-503", "status_code=503"), ("backoff-retry-after", "retry-after-header"), - ("backoff-captcha", "body-marker=captcha"), + ("backoff-captcha", "body-marker=recaptcha"), ], ) def test_create_document_from_url_backoff_error_includes_reason( @@ -684,6 +714,23 @@ def test_create_documents_from_feed_pagination_enabled(service: SearchService, h assert f"{http_server}/feed-paginated-entry-2" in urls +def test_create_documents_from_feed_pagination_applies_minimum_one_second_interval( + service: SearchService, + http_server, + monkeypatch, +): + monotonic_values = iter([0.0, 0.2, 1.0, 1.8]) + sleep_calls: list[float] = [] + + monkeypatch.setattr(service_module.time, "monotonic", lambda: next(monotonic_values)) + monkeypatch.setattr(service_module.time, "sleep", lambda seconds: sleep_calls.append(seconds)) + + documents = service.create_documents_from_feed(f"{http_server}/feed-paginated", paginate=True) + + assert len(documents) == 2 + assert sleep_calls == pytest.approx([0.8, 0.2]) + + def test_create_documents_from_feed_wordpress_pagination(service: SearchService, http_server): documents = service.create_documents_from_feed(f"{http_server}/wp-feed", paginate=True)
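The paginated-fetch pacing introduced in `create_documents_from_feed` (the `time.monotonic()` capture plus the `finally` sleep) follows a standard minimum-interval pattern. A standalone sketch of the same logic, with `fetch_page` and `paced_pages` as illustrative names rather than the service's API:

```python
import time


def paced_pages(fetch_page, urls, minimum_interval: float = 1.0):
    """Fetch each URL, sleeping so consecutive fetches start at least
    `minimum_interval` seconds apart. Mirrors the try/finally pacing in
    the diff: the sleep runs even when fetching raises, and also after
    the final page. `fetch_page` is an illustrative callable."""
    results = []
    for url in urls:
        start = time.monotonic()
        try:
            results.append(fetch_page(url))
        finally:
            elapsed = time.monotonic() - start
            if elapsed < minimum_interval:
                time.sleep(minimum_interval - elapsed)
    return results
```

With the clock sequence used in the new test (0.0, 0.2, 1.0, 1.8) this sleeps 0.8 s and then 0.2 s, the same values asserted via `pytest.approx([0.8, 0.2])`.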