Developed by Antonio Finocchiaro and Riccardo Cuccia, this project is part of the Technologies For Advanced Programming course at the University of Catania.
The goal of this project is to build a system that keeps an up-to-date list of the latest music releases and provides a real-time analysis of the emotions each song conveys to the listener.
- Web Scraping: BeautifulSoup and Genius API
- Centralized coordination service: ZooKeeper
- Data Ingestion: Logstash
- Data Streaming: Apache Kafka and Spark Structured Streaming
- Data Processing: Apache Spark with SparkML, spaCy and TextBlob
- Data Indexing: Elasticsearch
- Data Visualization: Kibana
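To give an idea of what the sentiment part of the processing stage computes, here is a deliberately tiny, stdlib-only stand-in. It is not TextBlob's algorithm and not the project's code. It only borrows TextBlob's convention of a polarity score in the range -1.0 to 1.0, and it uses a hypothetical mini-lexicon (`LEXICON`) and hypothetical helpers (`toy_polarity`, `label`) purely for illustration.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1.0, 1.0].
# Illustrative only; TextBlob ships its own, much larger lexicon and rules.
LEXICON = {
    "love": 0.5, "happy": 0.8, "good": 0.7, "beautiful": 0.85,
    "sad": -0.5, "hate": -0.8, "lonely": -0.6, "bad": -0.7,
}


def toy_polarity(lyrics: str) -> float:
    """Average the polarity of the lexicon words found in the lyrics.

    Returns 0.0 (neutral) when no known word appears.
    """
    words = re.findall(r"[a-z']+", lyrics.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def label(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score to a coarse label (the threshold is an arbitrary choice)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    line = "I love you but I feel so lonely"
    p = toy_polarity(line)
    print(f"{p:.2f} {label(p)}")
```

In the real pipeline this kind of scoring runs inside Spark over the streamed lyrics, and the result is indexed in Elasticsearch for Kibana to display.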
- Apache Kafka: download the `.tgz` archive and put it into the `Kafka/Setup` directory.
- Apache Spark: download the `.tgz` archive and put it into the `Spark/Setup` directory.
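Before building the images, it can help to confirm that both archives are in place. The following is a minimal sketch, assuming it is run from the main folder and that any `*.tgz` file in each `Setup` directory counts. The exact archive names or versions the Dockerfiles expect are not stated here, and `missing_archives` is a hypothetical helper, not part of `starter.py`.

```python
import sys
from pathlib import Path

# Setup directories named in the requirements; each should hold a .tgz archive.
REQUIRED = {
    "Kafka": Path("Kafka/Setup"),
    "Spark": Path("Spark/Setup"),
}


def missing_archives(root: Path = Path(".")) -> list:
    """Return the components whose Setup directory is absent or has no .tgz file."""
    missing = []
    for name, rel in REQUIRED.items():
        setup_dir = root / rel
        if not setup_dir.is_dir() or not any(setup_dir.glob("*.tgz")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_archives()
    if absent:
        print("Missing .tgz archive for: " + ", ".join(absent))
        sys.exit(1)
    print("All archives found.")
```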
Execute `python3 starter.py` in the main folder. This script builds the Docker images for the project.
Once the previous step has completed, start the project with `docker-compose up`.
Once the containers are running, the services are available at the following addresses:
- Kafka UI: http://localhost:8080
- Kafka broker: localhost:9092 (Kafka protocol, not HTTP)
- Kibana: http://localhost:5601
- Logstash: http://localhost:5092
- Elasticsearch: http://localhost:9200
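A quick way to see which of these services are up after `docker-compose up` is to try a TCP connection to each port. The sketch below takes the host/port pairs from the list above. It only checks that something is listening, not that the service is healthy. The names `SERVICES` and `is_listening` are hypothetical, not part of the project.

```python
import socket

# Host/port pairs taken from the address list above.
SERVICES = {
    "Kafka UI": ("localhost", 8080),
    "Kafka broker": ("localhost", 9092),
    "Kibana": ("localhost", 5601),
    "Logstash": ("localhost", 5092),
    "Elasticsearch": ("localhost", 9200),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if is_listening(host, port) else "down"
        print(f"{name:<14} {host}:{port}  {status}")
```

Elasticsearch and Kibana can take a while to start, so a service reported as "down" right after startup may simply not be ready yet.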