Social Media Flood Risk (SMFR) is a platform for monitoring specific flood events on social media (currently Twitter only).
The software is developed as an Exploratory Research project at the JRC - European Commission and is released under the EUPL licence.
The current version is based on:
- Twitter Stream API to filter tweets using keywords and/or bounding box criteria
- ML algorithms and models for classification of tweets (flood relevance)
- Geonames index for geocoding of relevant tweets
- Cassandra to store raw tweet data
- Kafka and MySQL as infrastructure elements
When a potentially catastrophic flood event is recorded in EFAS, SMFR is notified and starts to:
- Collect tweets with the Twitter Stream API
- Annotate and geocode the collected tweets through a sequential pipeline
The final product of SMFR is an event-related map reporting relevant tweets and affected areas.
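The collect, annotate, geocode sequence can be pictured as a Unix-style pipeline. The toy stages below are stand-ins for illustration only, not SMFR code (the real stages talk to Kafka, the ML models and the Geonames index):

```shell
# Toy stand-ins for the three pipeline stages: each stage reads tweets
# (one per line) on stdin and passes enriched lines downstream.
collect() { printf '%s\n' "$@"; }            # would call the Twitter Stream API
annotate() { sed 's/$/ [relevance:0.9]/'; }  # would run the ML classifier
geocode() { sed 's/$/ [place:geonames]/'; }  # would look up the Geonames index

collect "river overflowing in town" | annotate | geocode
```

Each stage only appends information, so the original tweet text flows through unchanged, mirroring how SMFR enriches rather than rewrites collected tweets.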
- Ensure Docker and Docker Compose are installed
- Get the source code:
$ git clone https://github.com/domeniconappo/SMFR.git
- Enter the SMFR folder and copy the .env.tpl file to .env:
$ cp .env.tpl .env
- Edit the .env file and change the defaults according to your needs (details about configuration parameters and variables are given later in this document).
- Copy all yaml.tpl files to the yaml extension and edit them according to your configuration (e.g. admin_collector.yaml, which contains the Twitter client keys and secrets).
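As an illustration, a .env might contain entries like the ones below. DOCKER_ID_USER, CASSANDRA_USER and CASSANDRA_PASSWORD are variables referenced elsewhere in this document; the values are placeholders, and .env.tpl is the authoritative list of variable names:

```shell
# Docker Hub account used by build.sh to push images (optional)
DOCKER_ID_USER=mydockerhubuser
# Cassandra credentials used when connecting with cqlsh (placeholder values)
CASSANDRA_USER=cassandra
CASSANDRA_PASSWORD=changeme
```

Keep real secrets out of version control; .env is meant to stay local while .env.tpl is committed.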
- Execute
$ ./build.sh
if you need to rebuild images. This step can take several minutes and will also push updates to the Docker registry if DOCKER_ID_USER is set and has the rights to push. In this case, log in to Docker Hub first:
$ docker login
- Execute
$ ./singlenode_up.sh
for local testing or deployment on a single server, or
$ ./swarm_up.sh
if you deploy to a Docker Swarm cluster.
- You will see all services starting:
- cassandrasmfr
- mysql
- kafka
- geonames (Gazetteer)
- persister
- annotator
- geocoder
- aggregator
- products
- restserver
- web
- Wait a minute for services to get "warm" and connect to each other.
- Connect to the web interface by pointing your browser to http://<host>:8888
- The REST Server API responds to calls at http://<host>:5555/1.0
- Swagger UI is available at http://<host>:5555/1.0/ui
- The Elasticsearch geonames instance is at http://<host>:9200
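Instead of waiting a fixed minute, a small retry helper can poll until a service answers. This is a generic sketch, not part of SMFR; the curl probe in the comment is an example usage:

```shell
# Retry a command up to N times, pausing between attempts,
# e.g. until a service endpoint starts answering.
wait_for() {
    attempts=$1; shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0   # success: stop retrying
        i=$((i + 1))
        sleep 1
    done
    return 1               # gave up after N attempts
}

# Example usage (hypothetical endpoint):
#   wait_for 30 curl -sf http://localhost:8888/
```

The helper returns the usual shell exit codes, so it composes with `&&`/`||` in startup scripts.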
Create a new SUPERUSER in Cassandra (with the same user and password you set up in your jmxremote.access file):
$ docker exec -it cassandrasmfr cqlsh -u cassandra -p cassandra
Connected to Test Cluster at 127.0.0.1:9042.
cassandra@cqlsh> CREATE USER myuser WITH PASSWORD 'mypassword' SUPERUSER;
cassandra@cqlsh> ALTER USER cassandra WITH PASSWORD 'anotherpassword';
Check that the Geonames index is up in Elasticsearch by connecting to http://localhost:9200/_cat/indices?v. You should see something like the output below (check docs.count and store.size):
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open geonames 23vFz20STbudmqktmHVOLg 1 1 11139265 0 2.7gb 2.7gb
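The same check can be scripted: docs.count is the 7th column of the _cat/indices output. The snippet below parses a sample line for illustration; in practice you would pipe in `curl -s 'http://localhost:9200/_cat/indices?v'` instead:

```shell
# Extract docs.count (7th column) for the geonames index from _cat/indices
# output. A sample line stands in for the live curl output here.
sample='yellow open geonames 23vFz20STbudmqktmHVOLg 1 1 11139265 0 2.7gb 2.7gb'
count=$(printf '%s\n' "$sample" | awk '$3 == "geonames" { print $7 }')
echo "geonames docs.count: $count"
# A zero count would mean the index exists but was never populated.
[ "$count" -gt 0 ] && echo "geonames index looks populated"
```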
Whenever new models (or new fields) are added to the MySQL schema, execute the following script to update the SMFR DB:
$ ./upgrade_db.sh
The script will:
- start MySQL container only
- install models package in a virtualenv
- execute flask alembic commands for migrations
- shutdown MySQL container
The last command, flask db upgrade, performs the actual schema migration. It is executed at each startup of the restserver image anyway, so upgrade_db.sh is only used in the DEV phase to add migration files.
Note: you have to create migrations in development and push them to the GIT repo, then apply them on all systems where SMFR runs (dev, test, prod, etc.).
When a new version of the NUTS tables is available (compressed JSON files are under the scripts/seeds/data/ repository folder), execute:
$ ./upgrade_nuts_tables.sh
Tip: from the host, connect to the MySQL DB by using docker exec:
$ docker exec -it mysql mysql -h 127.0.0.1 -p
Table migrations (i.e. new columns) are applied automatically by CQLAlchemy. Note: while CQLAlchemy adds new columns, you have to manually drop columns or alter the types of existing ones using cqlsh.
From the host, use cqlsh in the docker container to connect to the DB:
$ docker exec -it cassandrasmfr cqlsh -u $CASSANDRA_USER -p $CASSANDRA_PASSWORD
This section addresses the kinds of issues you can run into at system level when running the SMFR suite.
MySQL seems to have some problems with the ext4 filesystem. If you are on Ubuntu and MySQL operations are extremely slow, this can depend on filesystem settings. Follow this article to fix it: http://phpforus.com/how-to-make-mysql-run-fast-with-ext4-on-ubuntu/
If you see an error or warning about the vm.max_map_count variable in the Elasticsearch logs of the Geonames docker image, or in the Cassandra docker image, the vm.max_map_count setting should be set permanently in /etc/sysctl.conf on the host machine:
$ grep vm.max_map_count /etc/sysctl.conf
vm.max_map_count=1048575
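A quick numeric comparison shows whether a host needs this change. The helper below is illustrative only; on Linux the live value can be read from /proc/sys/vm/max_map_count or with `sysctl vm.max_map_count`:

```shell
# Compare a vm.max_map_count value against the recommended minimum above.
required=1048575
check_map_count() {
    current=$1
    if [ "$current" -ge "$required" ]; then
        echo "ok: vm.max_map_count=$current"
    else
        echo "too low: vm.max_map_count=$current (need >= $required)"
    fi
}

check_map_count 65530     # a typical distro default, too low for Elasticsearch
check_map_count 1048575   # the value suggested above
```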
Null values produce cell tombstones, and Cassandra will abort queries if the tombstone threshold is reached. To avoid this, set gc_grace_seconds to 0 for the tweet table:
alter table smfr_persistent.tweet with gc_grace_seconds = 0;
To remove dangling images:
docker images --no-trunc | grep '<none>' | awk '{ print $3 }' | xargs -r docker rmi
To tear down the whole stack, including its images and orphan containers:
docker-compose down --rmi all --remove-orphans
Use the following script to clean up everything at once: https://lebkowski.name/docker-volumes/
#!/bin/bash
# remove exited containers:
docker ps --filter status=dead --filter status=exited -aq | xargs -r docker rm -v
# remove unused images:
docker images --no-trunc | grep '<none>' | awk '{ print $3 }' | xargs -r docker rmi
# remove unused volumes:
find '/var/lib/docker/volumes/' -mindepth 1 -maxdepth 1 -type d | grep -vFf <(
docker ps -aq | xargs docker inspect | jq -r '.[] | .Mounts | .[] | .Name | select(.)'
) | xargs -r rm -fr
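The `grep -vFf` step in the script above keeps only the volume directories whose names do not appear in the in-use list. The same pattern can be seen in isolation with sample data, no Docker required:

```shell
# Stand-in for the `find` output: all volume names found on disk.
printf 'vol_a\nvol_b\nvol_c\n' > /tmp/all_volumes
# Stand-in for the `docker inspect | jq` output: volumes still referenced.
printf 'vol_b\n' > /tmp/in_use
# -v inverts the match, -F treats patterns as fixed strings,
# -f reads the patterns from a file.
unused=$(grep -vFf /tmp/in_use /tmp/all_volumes)
echo "$unused"
rm -f /tmp/all_volumes /tmp/in_use
```

Only the names absent from the in-use list survive the filter, which is exactly what makes the script safe to run while containers are up.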
Connect to http://localhost:8888/ for SMFR web interface.