
Propaganda Detection

This is a solution to tasks 1 and 2 of the Hack the News Datathon Case – Propaganda Detection.
The solution report, Propaganda Detection in Social Media, can be found here.

Environment setup

Python 3
TensorFlow >= 1.11.0
Install the spacy package
Install en_core_web_sm via python -m spacy download en_core_web_sm
Install vader_lexicon via nltk.download('vader_lexicon') (see the setup snippet after this list)
Download the BERT code into the bert folder
Download the BERT-Base, Uncased checkpoint into the checkpoint folder
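The two resource downloads can also be run from a single Python session; a minimal setup snippet covering the same commands as above (nothing here beyond the listed steps):

# One-time download of the NLP resources listed above.
import subprocess
import sys

import nltk

# Equivalent to: python -m spacy download en_core_web_sm
subprocess.run([sys.executable, "-m", "spacy", "download", "en_core_web_sm"], check=True)

# Fetch the VADER sentiment lexicon used for the polarity features.
nltk.download('vader_lexicon')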

Datasets

The original datasets can be downloaded from the datathon website.
The files in the data folder have already been preprocessed into BERT input format.

Machine Learning models

task1_ML.ipynb: ML models (SVM, Logistic Regression, Random Forest and KNN) for task 1, using full articles and article summaries.
task2_ML.ipynb: the same ML models for task 2, using sentence text plus supplementary features (named entities and sentiment polarities); a feature-extraction sketch follows this list.
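For illustration, the supplementary features can be produced with spaCy NER and the NLTK VADER analyzer roughly as below; the function name and exact feature layout are assumptions for illustration, not the notebooks' code:

import spacy
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nlp = spacy.load("en_core_web_sm")
sia = SentimentIntensityAnalyzer()

def supplementary_features(sentence):
    """Named-entity counts plus VADER polarity scores for one sentence (illustrative)."""
    doc = nlp(sentence)
    feats = {"num_entities": len(doc.ents)}
    # One count per entity label (PERSON, ORG, GPE, ...).
    for ent in doc.ents:
        key = "ner_" + ent.label_
        feats[key] = feats.get(key, 0) + 1
    # VADER returns neg/neu/pos scores plus a compound score in [-1, 1].
    feats.update(sia.polarity_scores(sentence))
    return feats

print(supplementary_features("President Smith praised the wonderful new policy."))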

BERT-based models

  • BERT_classifier.py: plain BERT text classification, with no supplementary features.
    Example:
python BERT_classifier.py --data_dir=data/task_2 --bert_config_file=checkpoint/uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint=checkpoint/uncased_L-12_H-768_A-12/bert_model.ckpt --vocab_file=checkpoint/uncased_L-12_H-768_A-12/vocab.txt --output_dir=./output/BERT_classifier --max_seq_length 128 --do_train --do_eval --do_predict 2>&1 | tee output/BERT_classifier/training.log
  • BERT_post_matching.py: integrates supplementary features (named entities and sentiment polarities) into BERT by post-matching; details are in the solution report. A sketch of the two post-matching modes follows the example command.
    Training parameters:
--data_dir=data/task_2
--post_matching=mean/concat (default: mean)
--use_ner=True/False (default: True)
--use_polarity=True/False (default: True)

Example:

python BERT_post_matching.py --data_dir=data/task_2 --bert_config_file=checkpoint/uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint=checkpoint/uncased_L-12_H-768_A-12/bert_model.ckpt --vocab_file=checkpoint/uncased_L-12_H-768_A-12/vocab.txt --output_dir=./output/BERT_post_matching --max_seq_length 128 --do_train --do_eval --do_predict --post_matching=mean --use_ner --use_polarity 2>&1 | tee output/BERT_post_matching/training.log
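A rough sketch of what the two --post_matching modes do, assuming the named-entity and polarity features have already been projected to the BERT hidden size; names and shapes here are illustrative, not the script's exact code:

import tensorflow as tf  # TF 1.x, per the tensorflow >= 1.11.0 requirement

def post_match(pooled_output, ner_vec, polarity_vec, mode="mean"):
    """Combine BERT's [CLS] vector with feature vectors; all inputs [batch, hidden]."""
    if mode == "mean":
        # Average the three representations; the hidden size is unchanged.
        return tf.reduce_mean(tf.stack([pooled_output, ner_vec, polarity_vec]), axis=0)
    if mode == "concat":
        # Concatenate along the feature axis; the hidden size triples.
        return tf.concat([pooled_output, ner_vec, polarity_vec], axis=-1)
    raise ValueError("mode must be 'mean' or 'concat'")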
  • BERT_ner_embedding.py: integrates named entity features into BERT through the input embeddings; polarity features can optionally be added by post-matching. Details are in the solution report. An input-embedding sketch follows the example command.
    Training parameters:
--data_dir=data/task_2
--post_matching=mean/concat (default: mean)
--use_polarity=True/False (default: True)

Example:

python BERT_ner_embedding.py --data_dir=data/task_2 --bert_config_file=checkpoint/uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint=checkpoint/uncased_L-12_H-768_A-12/bert_model.ckpt --vocab_file=checkpoint/uncased_L-12_H-768_A-12/vocab.txt --output_dir=./output/BERT_ner_embedding --max_seq_length 128 --do_train --do_eval --do_predict --post_matching=mean --use_polarity 2>&1 | tee output/BERT_ner_embedding/training.log
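An illustrative sketch of the input-embedding idea, assuming one NER tag id per WordPiece token: a learned embedding per tag is added to BERT's token/segment/position embeddings, much like segment embeddings. This is an assumption about the mechanism, not the script's exact code:

import tensorflow as tf  # TF 1.x

def add_ner_embeddings(input_embeddings, ner_tag_ids, num_ner_tags, hidden_size):
    """input_embeddings: [batch, seq_len, hidden]; ner_tag_ids: [batch, seq_len]."""
    ner_table = tf.get_variable(
        "ner_embedding_table", [num_ner_tags, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))
    # Look up one embedding per token tag and add it to the usual BERT embeddings.
    return input_embeddings + tf.nn.embedding_lookup(ner_table, ner_tag_ids)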
  • BERT_multitask.py: multi-task training that combines propaganda text classification, NER sequence labelling and sentiment polarity text classification. Details are in the solution report. A sketch of the combined loss follows the example command.
    Training parameters:
--data_dir=data/task_2_ner
--polarity_threshold=0.4 (threshold on the absolute value of the polarity compound score; default: 0.4)
--use_ner=True/False (default: True)
--use_polarity=True/False (default: True)

Example:

python BERT_multitask.py --data_dir=data/task_2_ner --bert_config_file=checkpoint/uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint=checkpoint/uncased_L-12_H-768_A-12/bert_model.ckpt --vocab_file=checkpoint/uncased_L-12_H-768_A-12/vocab.txt --output_dir=./output/BERT_multitask --max_seq_length 128 --do_train --do_eval --do_predict --use_ner --use_polarity --polarity_threshold=0.4 2>&1 | tee output/BERT_multitask/training.log
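A sketch of how such a multi-task objective can be wired up, with one head per task and the losses summed; head names, label shapes and the simple unweighted sum are assumptions for illustration, not the script's exact code:

import tensorflow as tf  # TF 1.x

def multitask_loss(pooled_output, sequence_output, prop_labels, ner_labels,
                   polarity_labels, num_ner_tags, use_ner=True, use_polarity=True):
    # Task 1: propaganda vs. non-propaganda sentence classification on [CLS].
    prop_logits = tf.layers.dense(pooled_output, 2, name="prop_head")
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=prop_labels, logits=prop_logits))
    if use_ner:
        # Task 2: NER sequence labelling, one tag prediction per token.
        ner_logits = tf.layers.dense(sequence_output, num_ner_tags, name="ner_head")
        loss += tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=ner_labels, logits=ner_logits))
    if use_polarity:
        # Task 3: sentiment polarity classification; the binary label can come
        # from thresholding the |compound| score at --polarity_threshold.
        pol_logits = tf.layers.dense(pooled_output, 2, name="polarity_head")
        loss += tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=polarity_labels, logits=pol_logits))
    return loss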
