This repository contains a BiLSTM Recurrent Neural Network model which classifies gender using names.
virtualenv <env-name>
source <env-name>/bin/activate
Run deactivate to deactivate.
Creating a virtual environment gives the project an isolated Python environment. You can read more about it here.
pip install -r requirements.txt
This installs all the required packages.
Make sure you install all the required packages before running anything. If requirements.txt is missing a package, you can install it manually with pip install <package-name>.
python <file-name>.py
To run the API server:
python runserver.py [port-number]
Example: python runserver.py 8000
After running the server you can use the following URI:
http://localhost:5000/api/v1.0/classify?name=john
The API should also be reachable from other machines on the local network:
http://machine-ip:5000/api/v1.0/classify?name=john
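As a rough illustration, the URI above can be built and called from Python with the standard library. This is a minimal sketch: the helper names `build_classify_url` and `classify` are hypothetical and not part of this repository, and because the response format is not described here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_classify_url(name, host="localhost", port=5000):
    """Build the classify URI for a name (hypothetical helper, not in the repo)."""
    # urlencode escapes spaces and other unsafe characters in the name.
    query = urlencode({"name": name})
    return f"http://{host}:{port}/api/v1.0/classify?{query}"


def classify(name, host="localhost", port=5000, timeout=5):
    """Send a GET request to the running server and return the raw response text."""
    url = build_classify_url(name, host, port)
    with urlopen(url, timeout=timeout) as response:
        # The response format is not documented here, so no parsing is attempted.
        return response.read().decode("utf-8")
```

With the server running, `classify("john")` would return whatever the endpoint responds with. Pass `host` and `port` to match the machine and port the server was started on.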
- Initially we used this to build the model. It gave an accuracy of 84% to 85%, but it only worked on Indian names.
- It used features such as n-grams, sonorants, the ratio of syllables, and a boolean for whether the last letter is a vowel.
- We also considered features such as name frequency or date of birth, but that didn't seem like a good path.
- Even after we increased the size of the dataset, we couldn't improve the accuracy of the Machine Learning models, so we decided to use an RNN.
- We used an RNN because it handles sequential data well. Note that this model only works for first names, and the API cleans the input down to a first name before predicting.
- We used hyperas for hyperparameter tuning.
- All the data collected, scripts used and the py files are included in this repository.
- There was a TensorFlow session problem, which was solved using this.
- We used Flask to run the API. You can read all about it here.
- The cleaning done for first names in the API first removes all Unicode symbols, then converts the string to lower case, then removes any surnames it recognizes. Surnames that are not in the repository's list are left as they are. Finally, if the cleaned name still has more than two words, it keeps only the first word.
- If you want to add more surnames, you can add them to ./data/indian_surnames.txt.
- If the server is not accessible, you can use the following to fix it. We got this fix here.
sudo ufw enable
sudo ufw allow 5000/tcp  # allow the server to handle requests on port 5000
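The first-name cleaning steps described in the notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual code: `load_surnames` and `clean_first_name` are hypothetical names, "removes Unicode symbols" is read as dropping non-ASCII characters, and the surname file is assumed to hold one surname per line.

```python
def load_surnames(path="./data/indian_surnames.txt"):
    """Read the surname list, assuming one surname per line (format is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def clean_first_name(raw_name, surnames):
    """Reduce a raw name to a first name, following the cleaning steps literally."""
    # Step 1: drop non-ASCII characters (one reading of "removes Unicode symbols").
    ascii_only = raw_name.encode("ascii", "ignore").decode("ascii")
    # Step 2: lower-case the string and split it into words.
    words = ascii_only.lower().split()
    # Step 3: remove known surnames; surnames missing from the list stay in place.
    words = [word for word in words if word not in surnames]
    # Step 4: keep only the first word when more than two words remain.
    if len(words) > 2:
        words = words[:1]
    return " ".join(words)
```

For example, `clean_first_name("Rahul Sharma", {"sharma"})` yields `"rahul"`. Passing the surnames in as a set keeps the function easy to exercise without the data file.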