Web Scraping and Social Network Analysis (SNA) to analyze the research interest(s) of arbitrary Iranian university professors

This repository contains the code (and a sample of the data) for a research project in Social Network Analysis (SNA). We used the requests module (in Python) to scrape information on the theses supervised by a given university professor, and then used SNA to analyze different aspects of that professor's research.
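The scraping step can be pictured as a small, polite fetch loop. This is a minimal sketch, not the project's actual code. It assumes that one professor's theses are spread across paginated search-result pages. The helper name `fetch_pages` is hypothetical. The session is passed in as an argument, so either a `requests.Session()` or a test stub can be used.

```python
import time
from typing import Any, Iterable


def fetch_pages(session: Any, url: str, param_sets: Iterable[dict],
                delay: float = 1.0, timeout: float = 30.0) -> list[str]:
    """Fetch one page per query-parameter set and return the raw response texts.

    `session` is expected to behave like requests.Session: its .get() returns an
    object with .raise_for_status() and .text. (Hypothetical helper.)
    """
    pages = []
    for i, params in enumerate(param_sets):
        if i:
            time.sleep(delay)  # pause between requests to avoid hammering the server
        resp = session.get(url, params=params, timeout=timeout)
        resp.raise_for_status()  # stop early on HTTP 4xx/5xx errors
        pages.append(resp.text)
    return pages
```

With requests installed, a call might look like `fetch_pages(requests.Session(), SEARCH_URL, ({"supervisor": name, "page": p} for p in range(1, 6)))`. Here `SEARCH_URL` and the `supervisor`/`page` parameter names are placeholders for whatever the IranDoc search form actually uses. The returned HTML still has to be parsed into thesis records (title, keywords, and so on) before the analysis steps.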

This research consists of the following steps:

Scraping information on a professor's supervised theses from IranDoc.
Conducting Exploratory Data Analysis (EDA) on the crawled data.
Cleaning the data.

e.g., cleaning the keywords and titles, removing unsupported characters and duplicate records, etc.
Converting the data into an adjacency matrix.
Converting the adjacency matrix into a graph (using igraph).
Analyzing and visualizing the network by calculating different centrality measures.
Performing community detection (CD) to cluster theses into similar groups.

CD methods: Label Propagation, Eigenvector, Infomap, and Components.
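The cleaning, adjacency-matrix, centrality, and Components steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code. Each thesis is treated as a node, and two theses are linked when they share at least one cleaned keyword, with the edge weight equal to the number of shared keywords. The keyword separators, helper names, and sample data are all hypothetical. The project itself builds the graph with igraph.

```python
import re
from itertools import combinations


def clean_keywords(raw: str) -> set[str]:
    """Split a raw keyword string and normalise each keyword (case, whitespace)."""
    # Assumed separators: comma, semicolon, Arabic/Persian comma (U+060C), or "|".
    parts = re.split(r"[,;\u060c|]", raw)
    return {" ".join(p.split()).lower() for p in parts if p.strip()}


def adjacency_matrix(keyword_sets: list[set[str]]) -> list[list[int]]:
    """Symmetric weighted adjacency matrix; entry (i, j) counts shared keywords."""
    n = len(keyword_sets)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = len(keyword_sets[i] & keyword_sets[j])
        matrix[i][j] = matrix[j][i] = shared
    return matrix


def degree_centrality(matrix: list[list[int]]) -> list[float]:
    """Fraction of the other theses that each thesis is linked to."""
    n = len(matrix)
    if n < 2:
        return [0.0] * n
    return [sum(1 for w in row if w > 0) / (n - 1) for row in matrix]


def connected_components(matrix: list[list[int]]) -> list[int]:
    """Give every node a component id using an iterative depth-first search."""
    labels = [-1] * len(matrix)
    current = 0
    for start in range(len(matrix)):
        if labels[start] != -1:
            continue  # already reached from an earlier start node
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr, weight in enumerate(matrix[node]):
                if weight > 0 and labels[nbr] == -1:
                    labels[nbr] = current
                    stack.append(nbr)
        current += 1
    return labels


if __name__ == "__main__":
    # Hypothetical sample records: thesis id -> raw keyword string.
    theses = {
        "T1": "Social network analysis, Graph theory",
        "T2": "graph theory; community detection",
        "T3": "Web scraping, Python",
    }
    sets = [clean_keywords(kw) for kw in theses.values()]
    m = adjacency_matrix(sets)
    print(m)                        # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(degree_centrality(m))     # [0.5, 0.5, 0.0]
    print(connected_components(m))  # [0, 0, 1]
```

With python-igraph, a matrix like this could instead be loaded with something like `Graph.Weighted_Adjacency(m, mode="undirected")`. After that, `betweenness()`, `closeness()`, and `eigenvector_centrality()` give further centrality measures. The methods `community_label_propagation()`, `community_leading_eigenvector()`, `community_infomap()`, and `connected_components()` correspond to the four CD methods listed above.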

Requirements

Project Dir Structure

.
├── data
├── images
│   └── logos
├── logs
├── outputs
│   ├── csv
│   ├── json
│   └── plots
│       ├── 1. networks visualization
│       ├── 2. network with centrality measures
│       └── 3. community detection visualization
│           └── gephi
├── reports
└── utils

14 directories

If you have any questions, feel free to contact TekBoArt (@tekboart).

License


See the LICENSE file for more information about this repository's license.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
