Feature: Research & Learning AI Agent with Educational Source Integrations

### 🧠 Feature: Research & Learning AI Agent with Educational Source Integrations

**Summary**
Build a **Research & Learning AI Agent** that retrieves, ranks, and synthesizes answers from **academic APIs** (Semantic Scholar, arXiv, CrossRef, PubMed, OpenAlex) **and** from **trusted educational sources** (Khan Academy, Wikipedia/Wikimedia, OpenStax, MIT OCW, PhET, NASA, etc.). The agent will return **citation-grounded** answers with links/DOIs and optional “learning mode” explanations, quizzes, and follow-ups.

---

**Motivation**
Users need **authoritative yet accessible** explanations. Academic papers provide rigor; educational sources provide pedagogy. Combining both yields reliable answers that are also teachable, making PDR AI v2 more valuable for **onboarding, training, and education** in enterprises and classrooms.

---

**Scope of Sources (Phase 1 → Phase 2)**

**Academic (Phase 1):**

* Semantic Scholar, arXiv, CrossRef, PubMed, OpenAlex

**Educational (Phase 1):**

* **Wikipedia REST API** (summary + page HTML), **Wikimedia Commons** (media)
* **OpenStax** (open textbooks; OER, API/endpoints where available)
* **NASA** (factsheets, articles, imagery: open data)
* **PhET** Interactive Simulations (concept explanations, educator pages where allowed)

**Educational (Phase 2 / backlog):**

* **Khan Academy** content: the public API was retired in 2020, so integrate only through endpoints that are currently documented and permitted, and **respect the TOS**. For videos, rely on **YouTube Data API** metadata (and captions only where the API grants access) for the **official Khan Academy channel**.
* **MIT OpenCourseWare** (CC BY-NC-SA; no official API, so HTML fetch + cache with license compliance, including the non-commercial terms)
* **CK-12**, **OpenLearn**, **Saylor**, **OpenIntro** (OER—evaluate per-site API/TOS)
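* **Stanford Encyclopedia of Philosophy** (free to read but copyrighted, not openly licensed; no API, so link and cite with attribution rather than republish text, and confirm terms before fetching)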

> Note: Only integrate sources with **explicitly allowed API use or OER licenses**. Add per-source adapters with license notes and rate-limit guards.

---

**Core Capabilities**

1. **Multi-source Retrieval & Ranking**

   * Generate queries → call source adapters → normalize results → score by **authority**, **recency**, **pedagogical clarity**, and **topical match**.
2. **Grounded Answers with Citations**

   * Inline numbered citations `[1]` linking to DOI/URL; add a **References** section with title, year, authors, and license/attribution when required (e.g., Wikipedia/CC BY-SA).
3. **Learning Mode**

   * Simplified explanation, key takeaways, quick quiz (2–3 questions), and suggested next steps/readings from OER sources.
4. **Research Mode**

   * Concise synthesis with limitations/uncertainties and direct links to papers/sections.
5. **Caching & Dedup**

   * Cache normalized records (Redis/Supabase) by query hash + source; deduplicate by DOI/URL.
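
A minimal sketch of capabilities 1 and 5, assuming a linear weighted score and a DOI-first dedup key. The weights, the 20-year recency decay, the `SourceRecord` shape, and the FNV-1a hash (a stand-in for whatever hash the cache layer ends up using) are all illustrative assumptions, not decisions made in this issue.

```typescript
// Hypothetical record shape; only the fields this sketch needs.
interface SourceRecord {
  title: string;
  url: string;
  doi?: string;
  year?: number;
  source: string;
  // Per-criterion scores in [0, 1]; how each is computed is left open.
  authority: number;
  clarity: number;
  topicalMatch: number;
}

// Assumed weights; tune per deployment.
const WEIGHTS = { authority: 0.3, recency: 0.2, clarity: 0.2, topicalMatch: 0.3 };

// Linear decay to zero over 20 years (an arbitrary illustrative choice).
function recencyScore(year: number | undefined, currentYear: number): number {
  if (year === undefined) return 0;
  return Math.max(0, 1 - Math.max(0, currentYear - year) / 20);
}

function score(r: SourceRecord, currentYear: number): number {
  return (
    WEIGHTS.authority * r.authority +
    WEIGHTS.recency * recencyScore(r.year, currentYear) +
    WEIGHTS.clarity * r.clarity +
    WEIGHTS.topicalMatch * r.topicalMatch
  );
}

// DOIs are case-insensitive, so lowercase them; URLs only lose fragment and trailing slash.
function dedupKey(r: SourceRecord): string {
  if (r.doi) {
    return "doi:" + r.doi.trim().toLowerCase().replace(/^https?:\/\/(dx\.)?doi\.org\//, "");
  }
  return "url:" + r.url.trim().replace(/#.*$/, "").replace(/\/+$/, "");
}

// Keep the best-scoring record per dedup key, then return the top maxSources.
function rankAndDedup(records: SourceRecord[], currentYear: number, maxSources: number): SourceRecord[] {
  const best = new Map<string, SourceRecord>();
  for (const r of records) {
    const key = dedupKey(r);
    const prev = best.get(key);
    if (!prev || score(r, currentYear) > score(prev, currentYear)) best.set(key, r);
  }
  return Array.from(best.values())
    .sort((a, b) => score(b, currentYear) - score(a, currentYear))
    .slice(0, maxSources);
}

// 32-bit FNV-1a; non-cryptographic, used here only to build a compact cache key.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Cache key = source + hash of the normalized query.
function cacheKey(query: string, source: string): string {
  return "research:" + source + ":" + fnv1a(query.trim().toLowerCase());
}
```

Keeping the score a pure function of the record makes the ranking easy to unit test before the embedding re-rank is layered on top.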

---

**Proposed Implementation**

* **Agent:** `research_learning_agent` within LangGraph (planner → retriever → ranker → synthesizer → (optional) quizzer).
* **Adapters (`/server/research/adapters/*`):**

  * `semantic_scholar.ts`, `arxiv.ts`, `crossref.ts`, `pubmed.ts`, `openalex.ts`
  * `wikipedia.ts` (REST + page summary), `wikimedia.ts` (media), `openstax.ts`, `nasa.ts`, `phet.ts`
  * (Phase 2) `khan_academy.ts`, `mit_ocw.ts`, `ck12.ts`, etc.
* **Normalizer:** Common schema `{ title, authors, year, url, doi?, snippet, license?, source, weight }`.
* **Ranker:** Heuristic + embedding re-rank (optionally via Qdrant Cloud if enabled).
* **Synthesizer:** Model composes answer; enforces citation injection and **license-aware attribution**.
* **UI/API Toggle:**

  ```json
  { "agent": "research_learning", "mode": "learning", "max_sources": 8 }
  ```
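
To make the normalizer contract concrete, here is a rough sketch of the common schema as TypeScript types, plus one adapter-side mapping for a Wikipedia page summary. The field types, the `SourceAdapter` interface, the `normalizeWikipediaSummary` helper, and the subset of summary fields read here (`title`, `extract`, `timestamp`, `content_urls.desktop.page`) are assumptions for illustration; check them against the live API response before relying on them.

```typescript
// Common schema from the Normalizer bullet; the concrete types are assumed.
interface NormalizedRecord {
  title: string;
  authors: string[];
  year: number | null;
  url: string;
  doi?: string;
  snippet: string;
  license?: string;
  source: string;
  weight: number;
}

// Hypothetical adapter contract: every file under adapters/ would export one of these.
interface SourceAdapter {
  readonly name: string;
  search(query: string, limit: number): Promise<NormalizedRecord[]>;
}

// Subset of the page-summary response this sketch reads (assumed; verify against the API).
interface WikipediaSummary {
  title: string;
  extract?: string;
  timestamp?: string;
  content_urls?: { desktop?: { page?: string } };
}

// Assumed license label; confirm the current license version before shipping.
const WIKIPEDIA_LICENSE = "CC BY-SA 4.0";

function normalizeWikipediaSummary(s: WikipediaSummary, weight = 0.5): NormalizedRecord {
  const parsed = s.timestamp ? new Date(s.timestamp) : null;
  const year = parsed && !isNaN(parsed.getTime()) ? parsed.getUTCFullYear() : null;
  // Fall back to a constructed article URL when the response carries none.
  const fallbackUrl =
    "https://en.wikipedia.org/wiki/" + encodeURIComponent(s.title.replace(/ /g, "_"));
  return {
    title: s.title,
    authors: ["Wikipedia contributors"],
    year,
    url: s.content_urls?.desktop?.page ?? fallbackUrl,
    snippet: s.extract ?? "",
    license: WIKIPEDIA_LICENSE,
    source: "wikipedia",
    weight,
  };
}
```

Note that `year` here is the last-edit year taken from `timestamp`, which is only a proxy for recency on a continuously edited page.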

---

**API & Config (examples)**

```bash
RESEARCH_AGENT_ENABLED=true
RESEARCH_AGENT_MODE_DEFAULT=learning

# Academic
SEMANTIC_SCHOLAR_API_KEY=...
OPENALEX_API_KEY=...
ARXIV_BASE_URL=https://export.arxiv.org/api/query

# Educational
WIKIPEDIA_BASE_URL=https://en.wikipedia.org/api/rest_v1
OPENSTAX_BASE_URL=https://openstax.org/api
NASA_API_KEY=...

# Optional: YouTube for official edu channels (e.g., Khan Academy)
YOUTUBE_API_KEY=...

# Vector re-ranking / grounding
VECTOR_DB=QDRANT_CLOUD
QDRANT_URL=https://<cluster>.cloud.qdrant.io
QDRANT_API_KEY=...
```
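
One possible way to load and validate these variables at startup is sketched below. The `ResearchAgentConfig` shape and `loadResearchAgentConfig` name are hypothetical, and the environment is passed in as a plain object (for example `process.env` in Node) so the function stays testable. Only a subset of the variables above is covered.

```typescript
type AgentMode = "research" | "learning";
type Env = Record<string, string | undefined>;

// Hypothetical typed view of the environment variables listed above.
interface ResearchAgentConfig {
  enabled: boolean;
  defaultMode: AgentMode;
  arxivBaseUrl: string;
  wikipediaBaseUrl: string;
  semanticScholarApiKey?: string;
  nasaApiKey?: string;
  qdrant?: { url: string; apiKey: string };
}

function parseMode(raw: string | undefined): AgentMode {
  if (raw === undefined || raw === "learning") return "learning";
  if (raw === "research") return "research";
  throw new Error("Unsupported RESEARCH_AGENT_MODE_DEFAULT: " + raw);
}

function loadResearchAgentConfig(env: Env): ResearchAgentConfig {
  const url = env.QDRANT_URL;
  const apiKey = env.QDRANT_API_KEY;
  let qdrant: { url: string; apiKey: string } | undefined;
  // Fail fast when vector re-ranking is requested but not fully configured.
  if (env.VECTOR_DB === "QDRANT_CLOUD") {
    if (!url || !apiKey) {
      throw new Error("VECTOR_DB=QDRANT_CLOUD requires QDRANT_URL and QDRANT_API_KEY");
    }
    qdrant = { url, apiKey };
  }
  return {
    enabled: env.RESEARCH_AGENT_ENABLED === "true",
    defaultMode: parseMode(env.RESEARCH_AGENT_MODE_DEFAULT),
    // Defaults mirror the example values in this issue.
    arxivBaseUrl: env.ARXIV_BASE_URL ?? "https://export.arxiv.org/api/query",
    wikipediaBaseUrl: env.WIKIPEDIA_BASE_URL ?? "https://en.wikipedia.org/api/rest_v1",
    semanticScholarApiKey: env.SEMANTIC_SCHOLAR_API_KEY,
    nasaApiKey: env.NASA_API_KEY,
    qdrant,
  };
}
```

Validating once at startup means a missing key surfaces as a clear boot error rather than as a failed adapter call mid-request.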

---

**Licensing & Compliance**

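* **Wikipedia/Wikimedia:** Provide attribution; note **CC BY-SA 4.0** (Wikipedia text since June 2023; earlier revisions were under 3.0) / **GFDL** obligations in References. Wikimedia Commons media carry per-file licenses, so record the license for each file.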
* **OpenStax:** Attribute per OER license (usually **CC BY**).
* **Khan Academy / MIT OCW / others:** Only fetch content allowed by TOS/API; attribute per license; avoid scraping protected endpoints.
* Create `/docs/data-sources.md` listing each source, license, and usage limits.
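
As a rough illustration of license-aware attribution, the sketch below renders a References entry that prefers a DOI link and appends the license when one is recorded. The `CitedSource` shape and the exact output format are assumptions; the real format should follow whatever citation style the synthesizer settles on.

```typescript
// Hypothetical subset of the normalized record needed to render a reference.
interface CitedSource {
  title: string;
  authors: string[];
  year: number | null;
  url: string;
  doi?: string;
  license?: string;
  source: string;
}

// Render one numbered entry; the index matches the inline [n] citation.
function formatReference(index: number, s: CitedSource): string {
  // Fall back to the source name when no authors are known.
  const authors = s.authors.length > 0 ? s.authors.join(", ") : s.source;
  const year = s.year !== null ? " (" + s.year + ")" : "";
  // Prefer the DOI resolver link when a DOI is available.
  const link = s.doi ? "https://doi.org/" + s.doi : s.url;
  // Only sources with a recorded license get an attribution suffix.
  const license = s.license ? " (license: " + s.license + ")" : "";
  return "[" + index + "] " + authors + year + ". " + s.title + ". " + link + license;
}

function formatReferences(sources: CitedSource[]): string {
  return sources.map((s, i) => formatReference(i + 1, s)).join("\n");
}
```

Because the license travels with each record, the synthesizer does not need per-source special cases to satisfy attribution requirements.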

---

**Rate Limits & Reliability**

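* Per-adapter **retry with exponential backoff** on rate-limit and server errors; respect `Retry-After` when the source sends it.
* Fallback order when a source fails: other academic sources → educational sources → cached results.
* Per-adapter circuit breaker so a repeatedly failing source is skipped instead of adding latency to the UI.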
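
The retry and circuit-breaker behavior could look roughly like the sketch below. It assumes the adapter wraps its HTTP call in a function returning a small `HttpResult`; that shape, `withRetry`, `CircuitBreaker`, and all thresholds are hypothetical names and values chosen for illustration.

```typescript
// Minimal result shape so the sketch does not depend on a specific HTTP client.
interface HttpResult {
  status: number;
  retryAfter?: string | null; // raw Retry-After header value, if any
}

type Sleep = (ms: number) => Promise<void>;
const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

// Retry-After is either delta-seconds or an HTTP-date.
function retryAfterMs(header: string | null | undefined, nowMs: number): number | null {
  if (!header) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  return isNaN(dateMs) ? null : Math.max(0, dateMs - nowMs);
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: Sleep; // injectable for tests
}

async function withRetry(attempt: () => Promise<HttpResult>, opts: RetryOptions): Promise<HttpResult> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let i = 0; i < opts.maxAttempts; i++) {
    const result = await attempt();
    const retryable = result.status === 429 || result.status >= 500;
    if (!retryable || i === opts.maxAttempts - 1) return result;
    const hinted = retryAfterMs(result.retryAfter, Date.now());
    // If the server asks for a longer wait than we tolerate, give up rather than retry early.
    if (hinted !== null && hinted > opts.maxDelayMs) return result;
    const backoff = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** i);
    await sleep(hinted !== null ? hinted : backoff);
  }
  throw new Error("maxAttempts must be at least 1");
}

// Opens after `threshold` consecutive failures; allows a trial call after `cooldownMs`.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canRequest(nowMs: number): boolean {
    return this.openedAt === null || nowMs - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(nowMs: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = nowMs;
  }
}
```

An adapter would check `canRequest` before calling `withRetry` and fall through to the next source (or the cache) when the breaker is open.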

---

**Acceptance Criteria**

* [ ] End-to-end: question → multi-source retrieval → ranked synthesis → grounded citations.
* [ ] **Two modes** supported: `research` and `learning` (pedagogical tone + quiz).
* [ ] At least **5 sources integrated** (≥3 academic, ≥2 educational) in Phase 1.
* [ ] Citations include DOI/URL and, where applicable, **license/attribution**.
* [ ] Caching implemented; duplicate suppression by DOI/URL.
* [ ] Configuration & docs added: `/docs/agents/research_learning_agent.md` and `/docs/data-sources.md`.
* [ ] Unit tests for adapters; integration tests for ranking + synthesis pipeline.

---

**Nice-to-Have (Backlog)**

* Per-source **trust scores** surfaced in UI (hover for source details).
* **Section-level grounding** (quote + link to exact section/anchor).
* **Learning paths** built from OpenStax chapters / Khan topic trees.
* Instructor dashboard with anonymized analytics.

