Web Scraping and Social Network Analysis (SNA) to analyze the research interest(s) of arbitrary Iranian university professors
This repository contains the code (and a sample of the data) for a research project in the field of Social Network Analysis (SNA). We used the requests
module (in Python) to scrape information about the theses supervised by a given university professor, then used SNA to analyze different aspects of that professor's research.
The research consists of the following steps:
- Scraping the information of a professor from irandoc.
- Conducting Exploratory Data Analysis (EDA) on the crawled data.
- Cleaning the data.
e.g., cleaning keywords and titles, removing unsupported characters and duplicate records, etc.
- Converting data into an Adjacency matrix.
- Converting the Adjacency matrix into a graph (using igraph).
- Analyzing and visualizing the network by calculating different centrality measures.
- Doing community detection (CD) to cluster theses into similar groups.
CD methods: Label Propagation, Eigenvector, Infomap, and Components
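
The steps above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the toy records, the function names, and the rule that two theses are linked when they share at least one keyword are all assumptions made for illustration. It covers cleaning, building the adjacency matrix, a simple degree centrality, and the Components method only.

```python
from collections import deque

# Hypothetical toy records; the real project works on theses scraped from irandoc.
theses = [
    {"title": "Thesis A", "keywords": ["Social Network ", "graph theory"]},
    {"title": "Thesis B", "keywords": ["graph theory", "Centrality"]},
    {"title": "Thesis C", "keywords": ["centrality", "social network"]},
    {"title": "Thesis D", "keywords": ["Hydrology"]},
    {"title": "Thesis A", "keywords": ["social network", "graph theory"]},  # duplicate
]


def clean(records):
    """Normalize whitespace and case, and drop records with a repeated title."""
    seen, cleaned = set(), []
    for rec in records:
        title = " ".join(rec["title"].split())
        if title.lower() in seen:
            continue  # keep only the first occurrence of each title
        seen.add(title.lower())
        keywords = {" ".join(k.split()).lower() for k in rec["keywords"] if k.strip()}
        cleaned.append({"title": title, "keywords": keywords})
    return cleaned


def adjacency_matrix(records):
    """Assumed edge rule: weight = number of keywords two theses share."""
    n = len(records)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(records[i]["keywords"] & records[j]["keywords"])
            matrix[i][j] = matrix[j][i] = shared
    return matrix


def degree_centrality(matrix):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(matrix)
    if n < 2:
        return [0.0] * n
    return [sum(1 for w in row if w > 0) / (n - 1) for row in matrix]


def connected_components(matrix):
    """Group nodes that are reachable from one another (breadth-first search)."""
    unvisited = set(range(len(matrix)))
    components = []
    for start in range(len(matrix)):
        if start not in unvisited:
            continue
        unvisited.discard(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour, weight in enumerate(matrix[node]):
                if weight > 0 and neighbour in unvisited:
                    unvisited.discard(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


records = clean(theses)
matrix = adjacency_matrix(records)
print(degree_centrality(matrix))
print(connected_components(matrix))
```

The project itself builds the graph with igraph and also applies Label Propagation, Eigenvector, and Infomap; those methods are not reproduced in this sketch.
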
.
├── data
├── images
│   └── logos
├── logs
├── outputs
│   ├── csv
│   ├── json
│   └── plots
│       ├── 1. networks visualization
│       ├── 2. network with centrality measures
│       └── 3. community detection visualization
│           └── gephi
├── reports
└── utils

14 directories
If you have any questions, feel free to contact TekBoArt @tekboart.
- Refer to the LICENSE file for more information regarding the license of this repository.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.