BlogCollector is a lightweight AI/Tech blog aggregator that supports RSS feeds and web scraping, perfect for personal knowledge tracking and information monitoring.
- ✅ Multiple sources: RSS / Atom feeds & any webpage via custom CSS selectors
- ✅ Category & filter: Organization / Individual, one-click filtering + search
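The one-click category filter and the search box can be pictured as a single filter pass over the collected articles. This is a minimal sketch, not the code in `docs/script.js`: the article shape `{ title, description, source, category }` and the name `filterArticles` are assumptions for illustration.

```javascript
// Hypothetical article shape: { title, description, source, category },
// where category is 'organization' or 'individual' (as in the source config).
function filterArticles(articles, { category = 'all', query = '' } = {}) {
  const q = query.trim().toLowerCase();
  return articles.filter((a) => {
    // Category filter: 'all' keeps every article.
    if (category !== 'all' && a.category !== category) return false;
    if (!q) return true;
    // Case-insensitive substring search over title, description and source name.
    const haystack = [a.title, a.description, a.source]
      .filter(Boolean)
      .join(' ')
      .toLowerCase();
    return haystack.includes(q);
  });
}

// Usage with two made-up articles.
const articles = [
  { title: 'Scaling laws', description: 'Notes on compute', source: 'OpenAI', category: 'organization' },
  { title: 'Diffusion models', description: '', source: 'Lilian Weng', category: 'individual' },
];
console.log(filterArticles(articles, { category: 'individual' }).map((a) => a.title));
// → [ 'Diffusion models' ]
```

Combining both filters in one pass keeps the search scoped to the selected category, which is usually what a one-click filter plus search box is expected to do.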
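Clone the repo:

```bash
$ git clone https://github.com/<yourname>/BlogCollector.git
$ cd BlogCollector
```

Start the backend:

```bash
$ cd backend
$ npm install
$ npm start   # default PORT=3000, can be overridden
```

Serve the frontend from a second terminal, starting at the repo root:

```bash
$ cd docs               # static assets now live in docs/
$ npx serve . -l 8080   # or: python3 -m http.server 8080
```

Then visit http://localhost:8080.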
💡 VS Code users can also install Live Server and choose Open with Live Server.
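The `npm start` step notes that the default port (3000) can be overridden. A common Express pattern is to read `PORT` from the environment with a fallback; the sketch below assumes `backend/server.js` does something similar (check your copy), and `resolvePort` is a hypothetical helper name.

```javascript
// Hypothetical helper: read PORT from the environment, falling back to 3000.
function resolvePort(env = process.env, fallback = 3000) {
  // Number() rejects strings like '4000abc' (NaN) and turns '' into 0.
  const port = Number(env.PORT);
  // Fall back when PORT is unset or not a valid TCP port number.
  if (!Number.isInteger(port) || port < 1 || port > 65535) return fallback;
  return port;
}

// In an Express app this would be used as:
// app.listen(resolvePort(), () => console.log('listening'));
console.log(resolvePort({ PORT: '4000' })); // → 4000
console.log(resolvePort({}));               // → 3000
```

On macOS/Linux you would then run `PORT=4000 npm start`. Hosts such as Render inject their own `PORT`, so reading it from the environment instead of hard-coding 3000 also matters for deployment.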
```
BlogCollector/
├─ backend/          # Node.js / Express backend
│  ├─ server.js
│  └─ ...
├─ docs/             # static frontend (published via GitHub Pages)
│  ├─ index.html
│  ├─ script.js
│  └─ style.css
└─ README.md
```
| Part | Platform | Steps |
|---|---|---|
| Backend | Render | New → Web Service → connect the repo → Root Directory `backend` → Build Command `npm install`, Start Command `npm start` → note the service URL `https://<app>.onrender.com` |
| Frontend | GitHub Pages | Settings → Pages → Source: branch `main`, folder `/docs` → Save → the site is served at `https://<user>.github.io/<repo>/` |
Update `docs/script.js`:

```js
const API_BASE_URL = 'https://<app>.onrender.com/api';
```

Now anyone can open the GitHub Pages URL and the site will call your Render API.
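If you also want the same `docs/script.js` to work against your local backend during development, one option is to choose the base URL from the page's hostname. `pickApiBase` and `RENDER_API` are hypothetical names, and `http://localhost:3000/api` assumes the default port and the `/api` prefix shown above.

```javascript
// Placeholder: replace <app> with your Render service name.
const RENDER_API = 'https://<app>.onrender.com/api';

// Hypothetical helper: local backend when the page is served from this machine,
// the deployed Render API everywhere else (e.g. on GitHub Pages).
function pickApiBase(hostname) {
  const isLocal = hostname === 'localhost' || hostname === '127.0.0.1';
  return isLocal ? 'http://localhost:3000/api' : RENDER_API;
}

// In the browser:
// const API_BASE_URL = pickApiBase(window.location.hostname);
```

This avoids editing `API_BASE_URL` back and forth, at the cost of hard-coding the local port; if you run the backend on another `PORT`, adjust the local URL to match.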
Append entries to the `rssSources` array in `backend/server.js`:

```js
const rssSources = [
  { name: 'OpenAI', url: 'https://openai.com/blog/rss.xml', category: 'organization' },
  // new source
  { name: 'Example Blog', url: 'https://example.com/rss.xml', category: 'individual' },
];
```

Append entries to the `scrapingTargets` array in `backend/server.js`:
```js
const scrapingTargets = [
  {
    name: 'Lilian Weng',
    url: 'https://lilianweng.github.io/',
    category: 'individual',
    selectors: {
      articleContainer: 'article.post-entry',
      title: '.entry-header h2',
      link: 'a.entry-link',
      description: 'section.entry-content p',
      time: 'footer.entry-footer',
    },
  },
  // new source example
  {
    name: 'Karpathy',
    url: 'https://karpathy.bearblog.dev/blog/',
    category: 'individual',
    selectors: {
      articleContainer: 'ul.blog-posts li',
      title: 'a',
      link: 'a',
      description: '', // this site has no summary
      time: 'time',
    },
  },
];
```

After editing sources, restart the backend:
```bash
$ cd backend
# stop the running server first (Ctrl+C); a second instance would hit "port in use"
$ npm start
```
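Since a typo in a source entry usually only shows up as a missing feed after the restart, a quick check of the entries can save a round trip. The sketch below assumes the field names used in the examples above (`name`, `url`, `category`, `selectors`) and treats `description` and `time` as optional, as the Karpathy example suggests; `validateSource` is a hypothetical helper, not part of BlogCollector.

```javascript
// Assumption: categories and selector keys mirror the examples in backend/server.js.
const CATEGORIES = new Set(['organization', 'individual']);
const REQUIRED_SELECTORS = ['articleContainer', 'title', 'link'];
const OPTIONAL_SELECTORS = ['description', 'time']; // '' allowed, e.g. no summary

// Hypothetical helper: returns a list of problems (empty when the entry looks fine).
function validateSource(source, { scraping = false } = {}) {
  const errors = [];
  if (!source.name) errors.push('missing name');
  try {
    new URL(source.url); // throws on malformed or relative URLs
  } catch {
    errors.push(`invalid url: ${source.url}`);
  }
  if (!CATEGORIES.has(source.category)) {
    errors.push(`category must be 'organization' or 'individual', got '${source.category}'`);
  }
  if (scraping) {
    const sel = source.selectors || {};
    for (const key of REQUIRED_SELECTORS) {
      if (typeof sel[key] !== 'string' || sel[key].trim() === '') {
        errors.push(`selectors.${key} is required`);
      }
    }
    for (const key of OPTIONAL_SELECTORS) {
      if (sel[key] !== undefined && typeof sel[key] !== 'string') {
        errors.push(`selectors.${key} must be a string`);
      }
    }
  }
  return errors;
}

// Usage: the Karpathy entry from above passes.
const problems = validateSource(
  {
    name: 'Karpathy',
    url: 'https://karpathy.bearblog.dev/blog/',
    category: 'individual',
    selectors: { articleContainer: 'ul.blog-posts li', title: 'a', link: 'a', description: '', time: 'time' },
  },
  { scraping: true },
);
console.log(problems); // → []
```

Running something like this over `rssSources` and `scrapingTargets` at startup would surface config typos immediately; it cannot tell you whether a selector still matches the live page, which remains a runtime check.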
| Issue | Solution |
|---|---|
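| Port in use | Set a different `PORT` env var, or stop the process already using port 3000 |
| CORS error | CORS is enabled globally; if you restrict it to a whitelist, add your frontend's origin (e.g. GitHub Pages or a CDN domain) |
| Scraping fails | The site may block bots; also verify that your CSS selectors still match its markup |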
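If you do switch from global CORS to a whitelist, the `cors` middleware for Express accepts a function as its `origin` option; it is called with the request's `Origin` and a `callback(err, value)`, where `true` allows the request origin. This sketch assumes the backend uses `cors`, and the origins listed are placeholders to adapt. An origin is scheme + host (+ port) only, so the GitHub Pages entry has no `/<repo>/` path.

```javascript
// Placeholder origins: replace <user> and add any CDN or custom domain you use.
const ALLOWED_ORIGINS = new Set([
  'https://<user>.github.io', // GitHub Pages: origin only, no /<repo>/ path
  'http://localhost:8080',    // local static server from the quick start
]);

// Shape expected by the `cors` package's `origin` option: (origin, callback).
function corsOrigin(origin, callback) {
  // Requests without an Origin header (curl, server-to-server) are let through.
  if (!origin) return callback(null, true);
  // true: allow this origin; false: send no CORS headers, so the browser blocks the response.
  return callback(null, ALLOWED_ORIGINS.has(origin));
}

// In backend/server.js (assumption: `cors` is installed):
// const cors = require('cors');
// app.use(cors({ origin: corsOrigin }));
```

Answering `false` rather than passing an `Error` keeps disallowed requests from turning into 500 responses; the browser still refuses to expose the response to the page.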
- Pull requests, issues and stars are welcome! 🌟
- Released under the MIT License — free for personal & commercial use.