amoghng7/name-to-gender

Name to Gender Classifier

This repository contains a BiLSTM recurrent neural network model that classifies gender from first names.

Setup

  1. Local Environment Setup
  2. Running The Application
  3. Running The Server
  4. Using The API
  5. Logs
  6. Notes

Local environment setup

Create a virtual environment

  1. virtualenv <env-name>
  2. source <env-name>/bin/activate
  3. Run deactivate when you are done to leave the environment.

A virtual environment gives us an isolated Python environment, so the packages installed for this project do not interfere with other projects. You can read more about it here.

Install requirements

  1. pip install -r requirements.txt

This installs all the required packages.

Running the application

Make sure you install all the required packages before running anything. If requirements.txt is missing a package, you can install it manually with pip install <package-name>.

  1. python <file-name>.py

Running the server

To run the API server:

  1. python runserver.py [port-number]

    ex: python runserver.py 8000

Using the API

After starting the server you can use the following URI (the examples use port 5000; substitute the port you started the server on):

http://localhost:5000/api/v1.0/classify?name=john

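The API should also be reachable from other machines on the local network. Use http://machine-ip:5000/api/v1.0/classify?name=john, where machine-ip is the IP address of the machine running the server.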
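
As a rough illustration of calling the endpoint from code, here is a minimal sketch using only the Python standard library. The helper names build_classify_url and classify are hypothetical and not part of this repository, and since the response format is not documented here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_classify_url(name, host="localhost", port=5000):
    """Build the classify URI shown above for a given name (hypothetical helper)."""
    # urlencode escapes spaces and other special characters in the name.
    query = urlencode({"name": name})
    return f"http://{host}:{port}/api/v1.0/classify?{query}"


def classify(name, host="localhost", port=5000, timeout=5):
    """Send the request and return the raw response body as text.

    The response format is an unknown here, so nothing is parsed.
    """
    with urlopen(build_classify_url(name, host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage, with the server already running on port 5000:
#   print(classify("john"))
```

To reach a server on another machine, pass its address as host, for example classify("john", host="machine-ip").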

Logs

  • Initially we used this to build the model. Although it gave an accuracy of 84% to 85%, it only worked on Indian names.
  • In that model we used features such as n-grams, sonorants, ratio of syllables and a boolean for whether the last letter is a vowel.
  • We also considered features such as frequency or date of birth, but that did not seem like a good path.
  • Even after increasing the size of the dataset we could not increase the accuracy of the machine learning models, so we decided to use an RNN.
  • We used an RNN because it is good for sequential data. Note that this model only works for first names, and the API cleans the input down to a first name before predicting.
  • We used hyperas for hyperparameter tuning.
  • All the data collected, scripts used and the py files are included in this repository.
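
To make the hand-crafted features mentioned in the logs above more concrete, here is a minimal sketch of two of them: character n-grams and the last-letter-vowel boolean. It is an illustration only and not the code from this repository; the function names and the feature-key format are assumptions. Sonorants and the syllable ratio are left out because their exact definitions are not given here.

```python
VOWELS = set("aeiou")


def char_ngrams(name, n=2):
    """Return all character n-grams of the lower-cased name, in order."""
    s = name.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def last_letter_is_vowel(name):
    """Boolean feature: does the name end in a vowel?"""
    s = name.lower()
    return bool(s) and s[-1] in VOWELS


def extract_features(name, n=2):
    """Build a feature dict: one count per n-gram plus the boolean feature."""
    features = {}
    for gram in char_ngrams(name, n):
        key = f"ngram={gram}"  # hypothetical key format
        features[key] = features.get(key, 0) + 1
    features["last_letter_vowel"] = last_letter_is_vowel(name)
    return features
```

A dict like this can be fed to a classical classifier after vectorization, which is the kind of approach the logs describe moving away from in favour of the RNN.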

Notes

  • There was a TensorFlow session problem, which was solved using this.
  • I used Flask to run the API. You can read all about it here.
  • The cleaning done for first names in the API works as follows: it first removes all the unicode symbols, then converts the string to lower case and removes any surnames. A surname that is not in the repository's surname list is left as it is. Finally, if the cleaned name still has more than two words, it takes the first word.
  • If you want to add more surnames, you can add them to ./data/indian_surnames.txt.
  • If the server is not accessible, you can use the following to fix it. We got this fix here.
  1. sudo ufw enable
  2. sudo ufw allow 5000/tcp (allows the server to handle requests on port 5000)
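
The first-name cleaning described in the notes above can be sketched roughly as follows. This is one reading of those steps and not the repository's actual code: the function names are hypothetical, "unicode symbols" is taken to mean non-ASCII characters, and the surname file is assumed to hold one surname per line.

```python
def load_surnames(path="./data/indian_surnames.txt"):
    """Read surnames into a lower-cased set (assumes one surname per line)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def clean_first_name(raw, surnames):
    """Apply the cleaning steps from the notes to a raw name string."""
    # 1. Drop non-ASCII characters (assumed meaning of "unicode symbols").
    ascii_only = raw.encode("ascii", errors="ignore").decode("ascii")
    # 2. Convert to lower case and split into words.
    words = ascii_only.lower().split()
    # 3. Remove known surnames; words not in the set are left as they are.
    kept = [w for w in words if w not in surnames]
    # 4. If more than two words remain, keep only the first one.
    if len(kept) > 2:
        kept = kept[:1]
    return " ".join(kept)
```

For example, with surnames = {"sharma"}, clean_first_name("Priya Sharma", surnames) gives "priya".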
