RNNsearch

An implementation of RNNsearch using TensorFlow. The model is the same as the one in GroundHog. A Theano version is also available.

Note

This repository is deprecated. Please switch to the new repository.

Usage

Data Preprocessing

Preprocessing scripts can be found here: https://github.com/XMUNLP/RNNsearch

  1. Build vocabulary
  • Build source vocabulary
python scripts/buildvocab.py --corpus zh.txt --output vocab.zh.pkl \
                             --limit 30000 --groundhog
  • Build target vocabulary
python scripts/buildvocab.py --corpus en.txt --output vocab.en.pkl \
                             --limit 30000 --groundhog
  2. Shuffle corpus (optional)
python scripts/shuffle.py --corpus zh.txt en.txt
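
The two preprocessing steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in `scripts/buildvocab.py` or `scripts/shuffle.py`: the function names `build_vocab` and `shuffle_parallel` are hypothetical, the README does not describe the pickle layout or what `--groundhog` changes, and the real scripts may reserve ids for special tokens. The point worth noting is that both sides of a parallel corpus have to be shuffled with the same permutation, otherwise the sentence pairs fall out of alignment.

```python
import collections
import random


def build_vocab(lines, limit):
    """Map the `limit` most frequent whitespace-separated tokens to integer ids.

    Hypothetical helper: the real script may add special tokens or use a
    different id order.
    """
    counts = collections.Counter(tok for line in lines for tok in line.split())
    # Sort by descending frequency, then by token so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: idx for idx, (tok, _) in enumerate(ranked[:limit])}


def shuffle_parallel(source_lines, target_lines, seed=1234):
    """Shuffle two aligned corpora with one shared permutation.

    Hypothetical helper: the seed value is an arbitrary choice for
    reproducibility, not something the README specifies for shuffling.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("parallel corpora must have the same number of lines")
    order = list(range(len(source_lines)))
    random.Random(seed).shuffle(order)
    return [source_lines[i] for i in order], [target_lines[i] for i in order]


if __name__ == "__main__":
    zh = ["我 爱 你", "你 好", "我 好"]
    en = ["i love you", "hello", "i am fine"]
    print(build_vocab(en, limit=3))
    zh_shuf, en_shuf = shuffle_parallel(zh, en)
    # Each shuffled source line is still paired with its original translation.
    for src, tgt in zip(zh_shuf, en_shuf):
        print(src, "|||", tgt)
```

Sorting ties by token keeps the vocabulary reproducible across runs, which matters because the training and decoding commands must see the same token-to-id mapping.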

Training

  python rnnsearch.py train --model nmt --corpus zh.txt.shuf en.txt.shuf \
      --vocab vocab.zh.pkl vocab.en.pkl --embdim 620 --hidden 1000 \
      --attention 1000 --alpha 5e-4 --norm 1.0 --batch 128 --maxepoch 5 \
      --seed 1234 --freq 1000 --vfreq 1500 --dfreq 50 --sort 20 \
      --references nist02.ref0 nist02.ref1 nist02.ref2 nist02.ref3 \
      --validation nist02.src
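
The README does not document the training flags. Assuming `--norm 1.0` is a gradient-norm clipping threshold, as is common when training recurrent translation models, the operation it would control can be sketched as below. This is an assumption about the flag, not a description of what `rnnsearch.py` does, and `clip_by_global_norm` here is a hypothetical pure-Python helper rather than a call into TensorFlow.

```python
import math


def clip_by_global_norm(gradients, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    if global_norm <= max_norm:
        # Already within the threshold: return copies unchanged.
        return [list(grad) for grad in gradients], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in grad] for grad in gradients], global_norm


if __name__ == "__main__":
    grads = [[3.0, 0.0], [0.0, 4.0]]  # joint L2 norm is 5.0
    clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
    print(norm, clipped)
```

Scaling all gradients by one shared factor preserves the update direction and only limits its length, which is why the norm is computed jointly over every parameter rather than per vector.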

Decoding

  python rnnsearch.py translate --model nmt.best.pkl < input > translation