We aim to understand how both sides of the Russian-Ukrainian conflict used computational propaganda on Twitter and how their influence and strategies differ. The complete report can be found here.
We used the "Ukraine Conflict Twitter Dataset" on Kaggle.
The "code" folder contains all the code for data processing and network analysis:
- 0_DCol.ipynb: Random data sampling and regrouping
- 1_DPP.ipynb: Data processing
  - Hashtag sorting
  - Political stance categorisation of tweets and users
  - Tweet text processing
- 2_SEDA.ipynb: Simple exploratory data analysis (EDA)
  - Discussion trend by political stance
  - Word clouds for both stances at different timeframes
- 3_NCon_N.ipynb: Network construction
  - Bipartite network construction
  - Projected one-mode user network
  - Export of edgelist and nodelist for Gephi visualisation
  - Node attributes: political orientation index and eigenvector centrality
- 4_NAna.ipynb: Bot detection
  - Bot detection using the Botometer API
- 5_bipartite_network_analysis.ipynb: Network analysis
  - Network properties: size, density, diameter, average shortest path length, centrality, etc.
- 6_Propaganda.ipynb: Trend and information flow of Russian propaganda
  - Discussion trend of the following keywords by time and political stance
    - "special military operation"
    - "Neo-Nazis" and "fascists"
  - Discussion trend of the following keywords by time and political stance
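
As an illustration of the keyword trend analysis, the following is a minimal sketch of counting tweets that mention a keyword per day and per political stance. It is not the notebook's actual code: the column names `created_at`, `text`, and `stance`, the stance labels, and the sample rows are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data; the real column names and labels may differ.
tweets = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2022-03-01 10:00", "2022-03-01 12:00",
        "2022-03-02 09:00", "2022-03-02 18:00",
    ]),
    "text": [
        "The special military operation continues",
        "Stop the war",
        "Special Military Operation update",
        "Calling it a special military operation hides a war",
    ],
    "stance": ["pro_russia", "pro_ukraine", "pro_russia", "pro_ukraine"],
})


def keyword_trend(df, keyword):
    """Count tweets containing `keyword` per day and stance."""
    # Case-insensitive literal match (no regular expression).
    mask = df["text"].str.contains(keyword, case=False, regex=False)
    matched = df[mask]
    # Use the calendar day as a string so the index is easy to read.
    day = matched["created_at"].dt.strftime("%Y-%m-%d").rename("date")
    # Rows are days, columns are stances, missing combinations become 0.
    return matched.groupby([day, matched["stance"]]).size().unstack(fill_value=0)


trend = keyword_trend(tweets, "special military operation")
print(trend)
```

The resulting table can be plotted directly with `trend.plot()` to compare how often each side used the term over time.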
The code folder also includes the most important datasets:
- 20220502_resampled_dataset.csv: raw tweets randomly sampled from the Kaggle dataset.
- preprocessed_data.pkl: output of the data processing step, with all tweet-related and user-related information plus cleaned hashtags and political stance categorisation.
- labelled.txt: the final list of the most frequent hashtags, manually labelled for political stance.
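
To show how manually labelled hashtags can drive stance categorisation, here is a minimal sketch that assigns a stance to a tweet by majority vote over its labelled hashtags. The rule itself, the label names, and the example hashtags in `HASHTAG_STANCE` are assumptions for illustration; the format of labelled.txt and the exact rule used in the notebooks are not reproduced here.

```python
from collections import Counter

# Hypothetical hashtag labels (lowercased); not the contents of labelled.txt.
HASHTAG_STANCE = {
    "standwithukraine": "pro_ukraine",
    "stoprussia": "pro_ukraine",
    "istandwithrussia": "pro_russia",
    "istandwithputin": "pro_russia",
}


def tweet_stance(hashtags):
    """Return the majority stance among a tweet's labelled hashtags."""
    counts = Counter(
        HASHTAG_STANCE[tag.lower()]
        for tag in hashtags
        if tag.lower() in HASHTAG_STANCE
    )
    if not counts:
        return "unlabelled"  # no labelled hashtag in this tweet
    ranked = counts.most_common()
    # A tie between the two most frequent stances is treated as mixed.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "mixed"
    return ranked[0][0]


print(tweet_stance(["StandWithUkraine", "news"]))
print(tweet_stance(["StandWithUkraine", "IStandWithRussia"]))
```

A user-level stance could be derived in the same way by aggregating the stances of that user's tweets.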
The Figures folder contains all the plots of Twitter trends, including word clouds, frequency histograms of hashtags, and density distributions of tweet and user creation times.
We used Gephi to visualise the networks. The Gephi visualisation folder contains:
- nodelist1.csv and nodelist2.csv: node lists exported for Gephi visualisation, containing user ID, username, user creation date, number of following and followers, total number of tweets, political orientation index, political categorisation, and eigenvector centrality in the projected user network.
- projected_w_user_edgelist_1.csv and projected_w_user_edgelist_2.csv: weighted edgelists for the projected user network, where nodes represent users, edges represent the action of sharing the same tweet, and weights represent the number of overlapping tweets.
- clusters_viz1_new.gephi and clusters_viz2_new.gephi: the updated Gephi graph files from which we produced the network visualisations in the report.
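
The projected user network described above can be sketched with NetworkX as follows. This is an illustrative example under stated assumptions, not the code in 3_NCon_N.ipynb: the `interactions` pairs are made-up data, and the `Source`, `Target`, `Weight` header is the column naming Gephi recognises on spreadsheet import, which may not match the exported files exactly.

```python
import csv
import io

import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (user, tweet) pairs: each pair means the user shared the tweet.
interactions = [
    ("u1", "t1"), ("u2", "t1"),
    ("u1", "t2"), ("u2", "t2"), ("u3", "t2"),
]
users = {user for user, _ in interactions}
tweets = {tweet for _, tweet in interactions}

# Bipartite network: users on one side, tweets on the other.
B = nx.Graph()
B.add_nodes_from(users, bipartite=0)
B.add_nodes_from(tweets, bipartite=1)
B.add_edges_from(interactions)

# One-mode projection onto users; each edge weight is the number of
# tweets that the two users have in common.
P = bipartite.weighted_projected_graph(B, users)

# Weighted eigenvector centrality as a node attribute.
centrality = nx.eigenvector_centrality(P, max_iter=1000, weight="weight")
nx.set_node_attributes(P, centrality, "eigenvector_centrality")


def edgelist_csv(graph):
    """Serialise a weighted graph as a Source,Target,Weight CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    rows = sorted(
        (min(a, b), max(a, b), data["weight"])
        for a, b, data in graph.edges(data=True)
    )
    writer.writerows(rows)
    return buffer.getvalue()


print(edgelist_csv(P))
```

If user IDs and tweet IDs are both numeric in the real data, they would need distinct prefixes (or separate ID spaces) so that a user and a tweet never collapse into the same node.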