やがて夜は明ける (Eventually, the night gives way to dawn).
graph TD
A["Images of Books (e.g., 国立国会図書館デジタルコレクション / National Diet Library Digital Collections)"] -->|OCR| B
B["Text File (MD? XML?; UTF-8? Shift-JIS?)"] -->|"Parser? (Should I do this?)"| C
B --> |"Iterate + Modify by Human (Editor in Browser or Git; cf. Wiki, Qiita, Zenn)"| B
C["Aozora Bunko File Format?"] -->| | D
D["Publish to Aozora Bunko?"]
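The "UTF-8? Shift-JIS?" question in the diagram could be settled at the edge of the pipeline: keep the working text in UTF-8 and convert only when producing an Aozora Bunko file. The sketch below assumes that Aozora Bunko text files are Shift_JIS with CRLF line endings, which should be checked against the current Aozora Bunko guidelines. The function names `to_aozora_bytes`, `from_aozora_bytes`, and `unencodable_chars` are hypothetical and not part of any existing tool.

```python
def to_aozora_bytes(text: str) -> bytes:
    """Encode text as Shift_JIS with CRLF line endings (assumed target format)."""
    # Normalize every line-ending style to LF first, then expand to CRLF.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "\r\n").encode("shift_jis")


def from_aozora_bytes(data: bytes) -> str:
    """Decode Shift_JIS bytes back to a str with LF line endings."""
    return data.decode("shift_jis").replace("\r\n", "\n")


def unencodable_chars(text: str) -> list[str]:
    """List characters that Python's shift_jis codec cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode("shift_jis")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad
```

Running `unencodable_chars` on OCR output before conversion would flag characters that need special handling instead of letting the encoder fail partway through a file.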
- Use Existing Aozora Bunko Files as Training Data
  - Each Aozora Bunko work cites the source edition ("底本") it was transcribed from, so the original printed text can be located.
  - Run supervised learning on this data
- Text Recognition
  - OCR with Python
  - Aim to recognize text accurately and quickly, including vertically written Japanese text
- Viewer/Editor
  - Simple, fast viewer and editor that runs in the browser
  - Anyone can correct the generated text, either in the built-in editor or on GitHub (can the original images and the generated text be compared side by side?)
  - Can this editor be built with Python as well?
- Text Matching Game
  - Matching game for Japanese text
  - Aims to improve OCR accuracy (and to be fun, of course!)
  - The game could double as learning material for learners of Japanese (like the original concept of Duolingo)
  - cf. Google reCAPTCHA
- aozorahack
  - Web Page
  - ideathon: There are many ideas similar to this project!
  - kosakuin: Aozora Bunko Editor (MIT License)
  - aozora-cli: Aozora Bunko CLI (MIT License)
  - aozora-parser.js
  - aozoraflow
- kyukyunyorituryo/AozoraEditor: Aozora Bunko editor (青空文庫エディタ)
- kyukyunyorituryo/html2aozora
- gearsns/AozoraJavaScriptParser
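To use existing Aozora Bunko files as ground truth and to measure OCR accuracy, the markup has to be removed before comparing against OCR output. The following is a minimal sketch, not a full parser like the projects listed above. It assumes only the common notation: ruby readings in `《…》`, the ruby start marker `｜`, and editor annotations in `［＃…］`. The names `strip_aozora_markup`, `edit_distance`, and `character_error_rate` are hypothetical, and character error rate is one possible accuracy metric, not one the project has settled on.

```python
import re

RUBY = re.compile(r"《[^》]*》")          # ruby reading, e.g. 《わがはい》
ANNOTATION = re.compile(r"［＃[^］]*］")  # editor annotation, e.g. ［＃改ページ］


def strip_aozora_markup(text: str) -> str:
    """Remove ruby readings, ruby start markers, and annotations."""
    text = ANNOTATION.sub("", text)
    text = RUBY.sub("", text)
    # Simplification: treats every "｜" as a ruby start marker.
    return text.replace("｜", "")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = strip_aozora_markup("吾輩《わがはい》は｜猫《ねこ》である。［＃改ページ］")
ocr_output = "吾輩は描である。"  # hypothetical OCR result with one wrong character
print(reference)                                     # 吾輩は猫である。
print(character_error_rate(reference, ocr_output))   # 0.125
```

The same per-character comparison could also drive the matching game, since the positions where OCR output and the reference disagree are the ones worth showing to a human.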
Look at the Nuxt documentation to learn more.
Make sure to install dependencies:
# npm
npm install
# pnpm
pnpm install
# yarn
yarn install
# bun
bun install
Start the development server on http://localhost:3000:
# npm
npm run dev
# pnpm
pnpm dev
# yarn
yarn dev
# bun
bun run dev
Build the application for production:
# npm
npm run build
# pnpm
pnpm build
# yarn
yarn build
# bun
bun run build
Locally preview production build:
# npm
npm run preview
# pnpm
pnpm preview
# yarn
yarn preview
# bun
bun run preview
Check out the deployment documentation for more information.