Feature Description
Add functionality to download and convert web pages to markdown, similar to browser extensions like Markdown Web Clipper. This would expand markmv's capabilities beyond file operations to include content acquisition, making it a more complete markdown workflow tool.
Proposed Solution
Add a new command, `clip` or `fetch`, that downloads web pages and converts them to clean markdown:
Example Usage
```bash
# Basic usage - download and save with auto-generated filename
npx markmv clip https://example.com/article
# Specify output filename
npx markmv clip https://example.com/article -o article.md
# Download multiple URLs
npx markmv clip urls.txt --batch
# Download with specific options
npx markmv clip https://example.com/article \
  --format clean \
  --images download \
  --output-dir docs/articles/
# Extract only article content (using readability)
npx markmv clip https://example.com/article --article-only
# Include metadata in frontmatter
npx markmv clip https://example.com/article --metadata
```
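The proposal leaves the auto-generated filename and the `--batch` input format open. As a rough sketch, assume the filename is a slug of the last URL path segment (or of the hostname for a bare domain), and that `urls.txt` lists one URL per line with blank lines and `#` comments ignored. `slugFromUrl` and `parseUrlList` are hypothetical names, not existing markmv functions.

```typescript
// Hypothetical helpers for `markmv clip`; not part of markmv's current API.

/** Derive a filesystem-safe markdown filename from a page URL (assumed scheme). */
function slugFromUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  // Prefer the last path segment (minus an extension such as .html);
  // fall back to the hostname when the URL has no path.
  const base =
    segments.length > 0
      ? segments[segments.length - 1].replace(/\.[a-z0-9]+$/i, "")
      : url.hostname;
  const slug = base
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of unsafe characters into one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return `${slug || "index"}.md`;
}

/** Parse a --batch input file: one URL per line; blank lines and # comments are skipped. */
function parseUrlList(contents: string): string[] {
  return contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}
```

Under these assumptions, `https://example.com/article` would be saved as `article.md`. A real implementation would also need a rule for collisions when two URLs produce the same slug.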
Core Features
- Clean markdown extraction
  - Remove unnecessary HTML elements
  - Preserve article structure
  - Convert common HTML patterns to markdown
- Image handling
  - Download images locally
  - Update image paths in markdown
  - Optionally skip image downloads for text-only clips
- Metadata preservation
  - Title, author, date
  - Optional frontmatter generation
  - Source URL tracking
- Content extraction modes
  - Full page conversion
  - Article extraction (Readability-style)
  - Custom selectors for specific content
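For metadata preservation, one possible frontmatter layout is sketched below. It assumes the keys `title`, `author`, `date`, and `source`; the proposal names the fields but not the key names. `ClipMetadata` and `buildFrontmatter` are illustrative only. Values are written with `JSON.stringify`, relying on JSON string syntax being accepted by YAML 1.2 parsers as a double-quoted scalar, which avoids hand-written escaping for titles that contain colons or quotes.

```typescript
// Hypothetical metadata shape; the key names are an assumption, not a spec.
interface ClipMetadata {
  title?: string;
  author?: string;
  date?: string; // e.g. an ISO 8601 date read from the page, if present
  source: string; // the clipped URL, always recorded
}

/** Render metadata as a YAML frontmatter block, skipping missing or empty fields. */
function buildFrontmatter(meta: ClipMetadata): string {
  const keys = ["title", "author", "date", "source"] as const;
  const lines: string[] = ["---"];
  for (const key of keys) {
    const value = meta[key];
    if (value !== undefined && value !== "") {
      // A JSON string doubles as a YAML double-quoted scalar,
      // so colons, quotes, and newlines in values are escaped safely.
      lines.push(`${key}: ${JSON.stringify(value)}`);
    }
  }
  lines.push("---", ""); // closing delimiter plus a trailing newline
  return lines.join("\n");
}
```

The converted markdown body would then be appended after the returned block; omitting absent fields keeps the frontmatter clean for pages that lack an author or date.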
Implementation Suggestions
Dependencies to Consider
- `@mozilla/readability` - For article extraction
- `turndown` - HTML to markdown conversion
- `node-fetch` or `axios` - HTTP requests
- `cheerio` - HTML parsing if needed
Configuration Options
```yaml
# .markmvrc (YAML); a markmv.config.js would export the same keys as a JS object
clip:
  output_dir: "./clipped"
  image_dir: "./clipped/images"
  frontmatter: true
  format: "clean" # clean, raw, article
  timeout: 30000
  user_agent: "Mozilla/5.0..."
  ignore_patterns:
    - "*.pdf"
    - "mailto:*"
```
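The `ignore_patterns` entries read like shell-style globs. A minimal sketch of how they could be applied to candidate URLs, assuming only `*` acts as a wildcard (matching any run of characters) and that matching is case-insensitive; `globToRegExp` and `isIgnored` are hypothetical helpers, and a real implementation might reuse an existing glob library instead.

```typescript
// Hypothetical helpers for applying `ignore_patterns` to URLs.
// Only `*` is a wildcard here; `?`, `[...]`, and `**` are treated literally.

/** Convert a simple `*` glob into an anchored, case-insensitive RegExp. */
function globToRegExp(pattern: string): RegExp {
  // Escape every regex metacharacter except `*`, which is handled next.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`, "i");
}

/** True if the URL matches any configured ignore pattern. */
function isIgnored(url: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(url));
}
```

With the sample config, a link to a PDF or a `mailto:` address would be skipped before any request is made.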
Benefits
- Complete workflow - From content discovery to organization
- Consistency - Same tool for acquiring and managing markdown
- Integration - Clipped content automatically benefits from markmv's link management
- Automation - Script documentation gathering from multiple sources
Additional Features to Consider
- Authentication support - For paywalled content (cookies, headers)
- Rate limiting - Respectful crawling with delays
- CSS selector support - Extract specific page sections
- Template system - Custom markdown output formats
- Link preservation - Convert relative to absolute URLs
- Code block detection - Properly format code snippets
- Table support - Convert HTML tables to markdown tables
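Link preservation could be a post-processing pass over the converted markdown. The sketch below resolves relative targets in inline links and images against the page URL using the standard `URL` constructor. It assumes targets contain no spaces or parentheses, and it ignores reference-style links and link titles; `absolutizeLinks` is a hypothetical name.

```typescript
// Hypothetical post-processing pass for the "Link preservation" idea.
// Matches inline [text](target) and ![alt](target) with a simple target only.

/** Resolve relative link and image targets against the clipped page's URL. */
function absolutizeLinks(markdown: string, pageUrl: string): string {
  const inlineLink = /(!?\[[^\]]*\]\()([^)\s]+)(\))/g;
  return markdown.replace(
    inlineLink,
    (match: string, open: string, target: string, close: string): string => {
      if (target.startsWith("#")) return match; // keep in-page anchors as-is
      try {
        return open + new URL(target, pageUrl).href + close;
      } catch {
        return match; // leave targets that cannot be resolved untouched
      }
    },
  );
}
```

When `--images download` is used, image targets would presumably be rewritten to local paths under `image_dir` instead, so a real implementation would apply this pass only to targets it does not download.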
Use Cases
- Documentation aggregation - Collect API docs, guides, tutorials
- Research compilation - Save articles for offline reading
- Knowledge base building - Archive important web content
- Blog migration - Convert HTML posts to markdown
- Tutorial collection - Save programming tutorials locally
Integration with Existing Features
After clipping content, users could:
- Use `validate` to check all links in clipped content
- Use `move` to organize clipped files
- Use `index` to generate navigation for clipped content
- Use `convert` to standardize link formats
This feature would position markmv as a comprehensive markdown toolkit, handling the full lifecycle from content acquisition to maintenance.