A web scraper and doc2vec project that identifies similar companies based on the text of their websites.

I have included a `requirements.txt` file that lists the dependent libraries and their versions. As long as you have Python installed, you should be able to open this repo in your IDE and run `pip install -r requirements.txt` from the root level.
The datasets are only a small sample of the originals so that they fit on GitHub. Feel free to add a larger list of URLs to train the model and make it more robust and accurate.
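
To give a rough idea of the kind of preprocessing involved in turning website text into model input, here is a minimal sketch that pulls the visible text out of an HTML page and splits it into lowercase tokens. It is not the code in this repo: the class `_TextExtractor`, the function `visible_tokens`, and the sample HTML are hypothetical names made up for illustration, and it uses only the Python standard library rather than whatever is pinned in `requirements.txt`.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_tokens(html):
    """Return lowercase alphanumeric tokens from the visible text of a page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return re.findall(r"[a-z0-9]+", text.lower())


if __name__ == "__main__":
    # Hypothetical sample page, not taken from the datasets in this repo.
    sample = "<html><body><h1>Acme Corp</h1><p>We build rockets.</p></body></html>"
    print(visible_tokens(sample))
```

Token lists like this are the usual shape of input for doc2vec implementations. For example, gensim's `TaggedDocument` takes a list of words plus a list of tags, so each company's tokens could be tagged with its URL.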