From cf676ebf6a91e95ad904a26a5e097a4ac42d55b6 Mon Sep 17 00:00:00 2001
From: "Michael R. Bernstein"
Date: Sat, 15 Jul 2017 10:32:33 -0600
Subject: [PATCH] Made step 1 in the workflow more explicit.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c2ae03a..c689d30 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ This project has two purposes. First of all, I'd like to share some of my experi
 * Go get various English word vectors [here](https://github.com/3Top/word2vec-api) if needed.
 
 ## Work Flow
-* STEP 1. Download the [wikipedia database backup dumps](https://dumps.wikimedia.org/backup-index.html) of the language you want.
+* STEP 1. Download the [wikipedia database backup dumps](https://dumps.wikimedia.org/backup-index.html) of the language you want (for example, for english wiki go to `https://dumps.wikimedia.org/enwiki/` click the latest timestamp, and download the `enwiki-YYYYMMDD-pages-articles-multistream.xml.bz2` file).
 * STEP 2. Extract running texts to `data/` folder.
 * STEP 3. Run `build_corpus.py`.
 * STEP 4-1. Run `make_wordvector.sh` to get Word2Vec word vectors.
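
The file-naming pattern that the new STEP 1 text spells out can also be sketched in Python, the language of `build_corpus.py`. This is only an illustration and not part of the patch: the helper name `dump_url` is hypothetical, and the `<wiki>/<YYYYMMDD>/` directory layout is an assumption inferred from the "click the latest timestamp" instruction. The sketch builds the URL and does not contact the server.

```python
import re

# Root of the Wikimedia dumps site named in the README.
DUMPS_ROOT = "https://dumps.wikimedia.org"


def dump_url(lang: str, stamp: str) -> str:
    """Build the URL of a pages-articles-multistream dump.

    `lang` is a wiki language prefix such as "en" or "ko"; `stamp` is a
    dump date written as YYYYMMDD. Both names are illustrative, and the
    directory layout is an assumption (see the note above).
    """
    if not re.fullmatch(r"[a-z][a-z_]*", lang):
        raise ValueError(f"unexpected language prefix: {lang!r}")
    if not re.fullmatch(r"\d{8}", stamp):
        raise ValueError(f"dump date must look like YYYYMMDD: {stamp!r}")
    wiki = f"{lang}wiki"  # e.g. "enwiki"
    filename = f"{wiki}-{stamp}-pages-articles-multistream.xml.bz2"
    return f"{DUMPS_ROOT}/{wiki}/{stamp}/{filename}"


if __name__ == "__main__":
    # The date here is a placeholder; use a timestamp actually listed on the site.
    print(dump_url("en", "20170701"))
```

The resulting string can be handed to any downloader, for example `urllib.request.urlretrieve` from the standard library. The date still has to be one that is actually listed under the wiki's dump directory.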