Performance hints

hedes-gs edited this page Feb 16, 2021 · 5 revisions

Compute key for each image

When an image enters the processing chain, we compute a hash key over its first 2.5 MB. This hash key then identifies the image throughout the whole pipeline. These first 2.5 MB contain all the EXIF metadata we need to extract and save in the HBase database.
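A minimal sketch of that computation: hash only the first 2.5 MB of the stream, so bytes beyond the window never influence the key. The choice of SHA-256 and the hex encoding are assumptions for illustration, not necessarily what the pipeline uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashKey {
    // 2.5 MB window, as described above.
    static final int WINDOW = (int) (2.5 * 1024 * 1024);

    // Compute a hex hash key over at most the first 2.5 MB of the stream.
    static String hashKeyOf(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256"); // algorithm is an assumption
        byte[] buf = new byte[8192];
        int remaining = WINDOW;
        int n;
        while (remaining > 0 && (n = in.read(buf, 0, Math.min(buf.length, remaining))) != -1) {
            md.update(buf, 0, n);
            remaining -= n;
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] image = new byte[4 * 1024 * 1024];          // fake 4 MB "image"
        String k1 = hashKeyOf(new ByteArrayInputStream(image));
        image[image.length - 1] = 1;                       // change a byte *after* the 2.5 MB window
        String k2 = hashKeyOf(new ByteArrayInputStream(image));
        System.out.println(k1.equals(k2));                 // bytes past the window do not affect the key
    }
}
```

Note that two images differing only past the 2.5 MB mark would collide; the assumption behind this design is that the EXIF header region makes the prefix unique in practice.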

Filter out duplicates

We use a Kafka Streams filter to avoid processing duplicate images. This filter records the last processed hash keys and then filters out any incoming hash key that was previously recorded.
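The dedup logic can be sketched as follows. In the real pipeline the "last processed hash keys" live in a Kafka Streams state store; here a bounded `LinkedHashMap` stands in for that store (the capacity and eviction policy are assumptions) so the logic is runnable on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupFilter {
    // Remembers the last `capacity` hash keys; the oldest entry is evicted first.
    // Stand-in for the Kafka Streams state store used by the actual filter.
    private final Map<String, Boolean> seen;

    DedupFilter(int capacity) {
        this.seen = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> e) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the hash key passes (first time seen), false if it is a duplicate.
    boolean accept(String hashKey) {
        return seen.putIfAbsent(hashKey, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DedupFilter f = new DedupFilter(1000);
        System.out.println(f.accept("abc")); // true  (new key passes)
        System.out.println(f.accept("abc")); // false (duplicate is filtered out)
    }
}
```

Because the store only holds the *last* processed keys, a duplicate arriving after its key has been evicted would slip through; the window just has to be large enough to cover realistic duplicate arrival gaps.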

Ignite: a distributed cache

We use Ignite as a distributed cache to avoid transferring through Kafka the 2.5 MB buffers in which the EXIF metadata are stored: only the small hash key travels on the Kafka topics, and consumers fetch the full buffer from the cache when they need it.
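The pattern can be sketched like this. A `ConcurrentHashMap` stands in for the distributed `IgniteCache<String, byte[]>` so the sketch is self-contained; the method names `park` and `resolve` are illustrative, not the project's actual API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class PayloadCache {
    // Stand-in for the distributed Ignite cache: hash key -> first 2.5 MB of the image.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    // Producer side: park the heavy payload in the cache, then send only the hash key through Kafka.
    String park(String hashKey, byte[] firstChunk) {
        cache.put(hashKey, firstChunk);
        return hashKey; // this small key is what travels on the Kafka topic
    }

    // Consumer side: resolve the hash key received from Kafka back to the payload.
    Optional<byte[]> resolve(String hashKey) {
        return Optional.ofNullable(cache.get(hashKey));
    }

    public static void main(String[] args) {
        PayloadCache c = new PayloadCache();
        String key = c.park("deadbeef", new byte[]{1, 2, 3});
        System.out.println(c.resolve(key).isPresent()); // true
    }
}
```

Keeping Kafka messages down to a few dozen bytes instead of 2.5 MB avoids broker-side size limits and keeps topic throughput high, at the cost of one extra cache lookup per consumer.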

Coprocessors in hbase

Pagination

TBD.

Secondary indexes

TBD.