WikiSeek is a Go-based web application that serves as a local Wikipedia browser and search engine. It allows you to host and browse Wikipedia content from a compressed Wikipedia XML dump file, providing fast search capabilities and article rendering.
![Screenshot 2025-01-31 at 4 20 58 PM](https://private-user-images.githubusercontent.com/1565303/408760025-30d07043-38df-4bd6-832f-46fc49e38104.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NjUwMzQsIm5iZiI6MTczOTY2NDczNCwicGF0aCI6Ii8xNTY1MzAzLzQwODc2MDAyNS0zMGQwNzA0My0zOGRmLTRiZDYtODMyZi00NmZjNDllMzgxMDQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTZUMDAxMjE0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MzU2YTgyNDNkNjc4MWFmMTkzYzUxMjM5MjY5NGU4YmExNTYwMTc3ZWFhYmQyZDZkYmQ0M2UyN2VjZDQxZWZlOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.yRxGTjHPs0ELRwXJBTIOXZvR8TNu4KCNUoe_jB8jKIo)
- Browse Wikipedia articles with rendered HTML output
- Fast search across article titles
- Random article suggestions
- Clean, responsive web interface
- Efficient handling of large compressed Wikipedia dumps
- Conversion of Wikipedia markup (wikitext) to HTML using Pandoc
- Create a `dumps` directory in your project root:

      mkdir dumps

- Place your Wikipedia dump files in the `dumps` directory:
  - The main Wikipedia XML dump (e.g., `enwiki-20241201-pages-articles-multistream.xml.bz2`)
  - The index file (e.g., `enwiki-20241201-pages-articles-multistream-index.txt.bz2`)
- Run using docker cli:
docker run -p 8080:8080 -v ./dumps:/dumps xanderstrike/wikiseek -file /dumps/enwiki-20241201-pages-articles-multistream.xml.bz2 -index /dumps/enwiki-20241201-pages-articles-multistream-index.txt.bz2
Or compose:
version: '3'
services:
  wikiseek:
    image: xanderstrike/wikiseek
    ports:
      - "8080:8080"
    volumes:
      - ./dumps:/dumps
    command: -file /dumps/enwiki-20241201-pages-articles-multistream.xml.bz2 -index /dumps/enwiki-20241201-pages-articles-multistream-index.txt.bz2
Install Pandoc with apt or brew or what have you.
Run the server directly with Go:
go run main.go -file path/to/wiki.xml.bz2 -index path/to/index.bz2 -port 8080
Then visit http://localhost:8080 in your browser.
- `-file`: Path to the Wikipedia XML dump file (bzip2 compressed)
- `-index`: Path to the index file (bzip2 compressed)
- `-port`: Port to run the server on (default: 8080)
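
For readers curious how options like these are typically wired up in Go, here is a minimal sketch using the standard `flag` package. It is not WikiSeek's actual code: the `config` type, the `parseArgs` helper, and the rule that both `-file` and `-index` must be set are assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// config holds the three documented command-line options.
// (Hypothetical type, not taken from the WikiSeek source.)
type config struct {
	file  string
	index string
	port  int
}

// parseArgs parses the documented flags from args (program name excluded).
// Treating -file and -index as required is an assumption of this sketch.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("wikiseek", flag.ContinueOnError)
	fs.StringVar(&c.file, "file", "", "path to the Wikipedia XML dump (bzip2 compressed)")
	fs.StringVar(&c.index, "index", "", "path to the index file (bzip2 compressed)")
	fs.IntVar(&c.port, "port", 8080, "port to run the server on")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.file == "" || c.index == "" {
		return c, errors.New("both -file and -index are required")
	}
	return c, nil
}

func main() {
	// A fixed argument list keeps the example self-contained;
	// a real program would pass os.Args[1:] instead.
	c, err := parseArgs([]string{"-file", "path/to/wiki.xml.bz2", "-index", "path/to/index.bz2"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("file=%s index=%s port=%d\n", c.file, c.index, c.port)
}
```

Using a dedicated `flag.FlagSet` rather than the package-level flags makes the parsing easy to exercise with different argument lists.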
- Articles are rendered with full HTML formatting
- Internal links are preserved and clickable
- Clean typography and layout
- Fast title-based search
- Search results show article titles with direct links
- Case-insensitive matching
- Shows 10 random articles for discovery
- Search box for quick access
- Clean, minimal interface
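
The case-insensitive title search listed above could be approximated with a simple substring scan. This is only a sketch under assumptions: the `searchTitles` function, its `limit` parameter, and the in-memory slice of titles are hypothetical, and WikiSeek may match or rank titles differently.

```go
package main

import (
	"fmt"
	"strings"
)

// searchTitles returns titles that contain query, ignoring case.
// It stops after limit matches; a limit <= 0 means no limit.
// (Hypothetical helper for illustration only.)
func searchTitles(titles []string, query string, limit int) []string {
	q := strings.ToLower(strings.TrimSpace(query))
	if q == "" {
		return nil
	}
	var out []string
	for _, t := range titles {
		if strings.Contains(strings.ToLower(t), q) {
			out = append(out, t)
			if len(out) == limit {
				break
			}
		}
	}
	return out
}

func main() {
	titles := []string{
		"Go (programming language)",
		"Gopher",
		"Python (programming language)",
	}
	for _, t := range searchTitles(titles, "GO", 10) {
		fmt.Println(t)
	}
}
```

A linear scan like this is easy to reason about; for a full dump with millions of titles, a precomputed lowercase copy of each title would avoid repeating the conversion on every query.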
WikiSeek uses:
- Go's built-in HTTP server
- bzip2 compression handling
- XML parsing for Wikipedia dump format
- Pandoc for markup conversion
- HTML templating
- Static file serving
This project is open source and available under the MIT License.