Performance hints

hedes-gs edited this page Feb 16, 2021 · 5 revisions

Compute key for each image

When an image enters the processing chain, we compute a hash key over its first 2.5 MB. This hash key then identifies the image throughout the whole pipeline. These first 2.5 MB contain all the EXIF metadata we need to extract and save in the HBase database.
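A minimal sketch of that computation: hash only the first 2.5 MB of the stream, so bytes beyond the window never influence the key. The choice of SHA-256 and the hex encoding are assumptions for illustration, not necessarily what the pipeline uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashKey {
    // 2.5 MB window, as described above.
    static final int WINDOW = (int) (2.5 * 1024 * 1024);

    // Compute a hex hash key over at most the first 2.5 MB of the stream.
    static String hashKeyOf(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256"); // algorithm is an assumption
        byte[] buf = new byte[8192];
        int remaining = WINDOW;
        int n;
        while (remaining > 0 && (n = in.read(buf, 0, Math.min(buf.length, remaining))) != -1) {
            md.update(buf, 0, n);
            remaining -= n;
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] image = new byte[4 * 1024 * 1024];          // fake 4 MB "image"
        String k1 = hashKeyOf(new ByteArrayInputStream(image));
        image[image.length - 1] = 1;                       // change a byte *after* the 2.5 MB window
        String k2 = hashKeyOf(new ByteArrayInputStream(image));
        System.out.println(k1.equals(k2));                 // bytes past the window do not affect the key
    }
}
```

Note that two images differing only past the 2.5 MB mark would collide; the assumption behind this design is that the EXIF header region makes the prefix unique in practice.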

Filter out duplicates

We use a Kafka Streams filter to avoid processing duplicate images. This filter records the last processed hash keys and then filters out any incoming hash key that was previously recorded.
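The dedup logic can be sketched as follows. In the real pipeline the "last processed hash keys" live in a Kafka Streams state store; here a bounded `LinkedHashMap` stands in for that store (the capacity and eviction policy are assumptions) so the logic is runnable on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupFilter {
    // Remembers the last `capacity` hash keys; the oldest entry is evicted first.
    // Stand-in for the Kafka Streams state store used by the actual filter.
    private final Map<String, Boolean> seen;

    DedupFilter(int capacity) {
        this.seen = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> e) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the hash key passes (first time seen), false if it is a duplicate.
    boolean accept(String hashKey) {
        return seen.putIfAbsent(hashKey, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DedupFilter f = new DedupFilter(1000);
        System.out.println(f.accept("abc")); // true  (new key passes)
        System.out.println(f.accept("abc")); // false (duplicate is filtered out)
    }
}
```

Because the store only holds the *last* processed keys, a duplicate arriving after its key has been evicted would slip through; the window just has to be large enough to cover realistic duplicate arrival gaps.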

Ignite: a distributed cache

We use Ignite as a distributed cache to avoid transferring through Kafka the 2.5 MB buffers in which the EXIF metadata are stored: only the small hash key travels on the Kafka topics, and consumers fetch the full buffer from the cache when they need it.
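The pattern can be sketched like this. A `ConcurrentHashMap` stands in for the distributed `IgniteCache<String, byte[]>` so the sketch is self-contained; the method names `park` and `resolve` are illustrative, not the project's actual API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class PayloadCache {
    // Stand-in for the distributed Ignite cache: hash key -> first 2.5 MB of the image.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    // Producer side: park the heavy payload in the cache, then send only the hash key through Kafka.
    String park(String hashKey, byte[] firstChunk) {
        cache.put(hashKey, firstChunk);
        return hashKey; // this small key is what travels on the Kafka topic
    }

    // Consumer side: resolve the hash key received from Kafka back to the payload.
    Optional<byte[]> resolve(String hashKey) {
        return Optional.ofNullable(cache.get(hashKey));
    }

    public static void main(String[] args) {
        PayloadCache c = new PayloadCache();
        String key = c.park("deadbeef", new byte[]{1, 2, 3});
        System.out.println(c.resolve(key).isPresent()); // true
    }
}
```

Keeping Kafka messages down to a few dozen bytes instead of 2.5 MB avoids broker-side size limits and keeps topic throughput high, at the cost of one extra cache lookup per consumer.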

Coprocessors in hbase

Pagination

TBD.

Secondary indexes

TBD.