A web scraper and doc2vec project that identifies company similarity based on website text data. I have included a `requirements.txt` file listing the dependent libraries and their versions, so as long as you have Python installed you should be able to open this repo in your IDE and run `pip install -r requirements.txt` from the root level.
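
To make the "company similarity" idea concrete, here is a minimal sketch of how companies could be ranked once each one has a document vector, for example one produced by a doc2vec model. It is an illustration under stated assumptions, not the code from this repo: the function names, the toy vectors, and the choice of cosine similarity as the measure are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, vectors, topn=3):
    """Rank every other company by cosine similarity to `target`, highest first."""
    scores = [
        (name, cosine_similarity(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors; real doc2vec vectors are much longer.
    company_vectors = {
        "company_a": [0.9, 0.1, 0.0],
        "company_b": [0.8, 0.2, 0.1],
        "company_c": [0.0, 0.1, 0.9],
    }
    for name, score in most_similar("company_a", company_vectors, topn=2):
        print(f"{name}: {score:.3f}")
```

In practice the vectors would come from the trained model rather than being typed in by hand; the ranking step stays the same.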

The datasets are only a small sample of the originals so that they fit on GitHub. Feel free to add a larger list of URLs to train the model to be more robust and accurate.
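
If you do extend the URL list, the text for each site has to be collected before training. The sketch below shows one way to fetch many pages concurrently and reduce them to plain text, using only the standard library. It is not the repo's `concurrent_scraper copy.py`; the names `scrape_all`, `fetch_html`, and `html_to_text`, the worker count, and the user-agent string are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the page's text content as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url, timeout=10):
    """Download one page and decode it as UTF-8, replacing undecodable bytes."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(urls, fetch=fetch_html, max_workers=8):
    """Map each URL to its extracted text, or to None if fetching or parsing failed."""

    def scrape_one(url):
        try:
            return url, html_to_text(fetch(url))
        except Exception:
            return url, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(scrape_one, urls))
```

Threads suit this job because the work is dominated by waiting on the network. Passing `fetch` as a parameter lets you swap in a different downloader, or a stub when testing without network access.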

Repository files (1 commit):

- DATASETS
- README.md
- concurrent_scraper copy.py
- podium_d2v_urls_copy.ipynb
- requirements.txt

Justinbenfit23/podium_url_d2v
