This repository has been archived by the owner on Jan 23, 2025. It is now read-only.

Improvement - Use Redis As The Unique Queue For Cronjob To Populate Index #6

Open
skyhit opened this issue Mar 5, 2018 · 2 comments

Comments

@skyhit
Contributor

skyhit commented Mar 5, 2018

I would like to improve the service to use Redis as a unique queue for the cronjob that populates the index.

The refactoring would look like this:

  1. Redis sets support duplicate detection and can act as a unique queue, so if the same challenge or match needs updating in different periods, we only update it once; see https://redis.io/commands/sadd and https://redis.io/commands/spop
  2. The endpoints will push the challenge id or match id into Redis as a candidate for aggregating data and populating the index.
  3. Cronjobs will run periodically to find the changed challenge ids and match ids.
  4. Running threads will monitor Redis, pop challenge ids and match ids, and do the real aggregation and populate the index.
  5. For the initial load, we can have endpoints to trigger this purposely at any time: just load every challenge id and match id into the Redis set, and the running threads will take care of the rest.
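The unique-queue behaviour in step 1 can be sketched as follows. This is a minimal illustration only: the real service would call a Redis client (e.g. `SADD`/`SPOP` against an actual Redis instance), whereas here those semantics are simulated with an in-memory set so the sketch is self-contained.

```python
class UniqueQueue:
    """Simulates a Redis set used as a unique queue (SADD + SPOP).

    In the real service these calls would go to Redis; an in-memory
    set is used here so the sketch is self-contained.
    """

    def __init__(self):
        self._members = set()

    def sadd(self, member):
        # SADD returns 1 if the member was added, 0 if it was already
        # present, so pushing the same challenge id twice in different
        # periods queues only one update.
        if member in self._members:
            return 0
        self._members.add(member)
        return 1

    def spop(self):
        # SPOP removes and returns an arbitrary member, or None when empty.
        return self._members.pop() if self._members else None


queue = UniqueQueue()
queue.sadd("challenge:30054674")
queue.sadd("challenge:30054674")  # duplicate: not queued a second time
queue.sadd("match:17153")
```

The ids shown (`challenge:30054674`, `match:17153`) are made-up examples; the point is only that the set deduplicates for free, so the worker threads never process the same pending update twice.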

With the current architecture, we need to adjust environment variables in order to do the initial load.

@sushilshinde @ajefts Let me know your thoughts about this approach.

@sushilshinde
Contributor

sushilshinde commented Mar 5, 2018

I think we should stick to Elasticsearch because it has a REST-based interface and is tuned for search, which is what we need most.

For duplicates, there are many practices to avoid them:
https://qbox.io/blog/minimizing-document-duplication-in-elasticsearch

For the initial load, the code should check whether the index is already populated.
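One common practice along the lines of the linked article is to derive a deterministic document `_id` from the entity id, so re-indexing the same challenge overwrites the existing document instead of creating a duplicate. A minimal sketch, with hypothetical field names and the Elasticsearch index simulated by a dict (the real call would be the client's `index` API with an explicit `id`):

```python
import hashlib

def doc_id(challenge):
    # Deterministic _id: the same challenge always maps to the same id,
    # so a repeated index request overwrites instead of duplicating.
    return hashlib.sha1(str(challenge["challengeId"]).encode()).hexdigest()

# Simulated Elasticsearch index, keyed by _id; in the real service this
# would be an index request against the cluster with id=doc_id(challenge).
index = {}

def index_challenge(challenge):
    index[doc_id(challenge)] = challenge

ch = {"challengeId": 30054674, "name": "Sample Challenge"}
index_challenge(ch)
index_challenge(ch)  # re-index: overwrites, no duplicate document
```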

@cwdcwd, your comments?

@skyhit
Contributor Author

skyhit commented Mar 5, 2018

@sushilshinde you misunderstand my approach; the final goal has not changed: it is still to populate the Elasticsearch indexes.

What I am suggesting is the way to populate the Elasticsearch indexes:

  1. The main part will be a thread that runs all the time, picks up the changed challenge ids or match ids from the Redis cache, and populates the indexes.

  2. There can be different ways to put the challenge ids and match ids into the Redis cache. For example, an endpoint can be used purposely if we see that some data in the index is not updated; we force an update by pushing the challenge id into the Redis cache, and the thread in 1 will pick it up and populate the indexes.

It can be a cronjob that monitors for changes.

It can be any other mechanism that simply pushes challenge ids and match ids into the Redis cache.
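The always-running thread in point 1 could look roughly like this. It is only a sketch under stated assumptions: an in-memory set plus a lock stands in for the Redis cache of changed ids, and appending to a list stands in for the real aggregation and index population.

```python
import threading
import time

pending = set()            # stands in for the Redis set of changed ids
lock = threading.Lock()
indexed = []               # stands in for the Elasticsearch index
stop = threading.Event()

def worker():
    # Runs all the time: pop a changed id and populate the index with it.
    while not stop.is_set():
        with lock:
            item = pending.pop() if pending else None
        if item is None:
            time.sleep(0.01)   # nothing queued; back off briefly
            continue
        indexed.append(item)   # real code would aggregate + index here

t = threading.Thread(target=worker, daemon=True)
t.start()

# Any producer (endpoint, cronjob, ...) just adds ids to the set;
# the duplicate "challenge:1" is absorbed by the set.
with lock:
    pending.update({"challenge:1", "challenge:2", "challenge:1"})

time.sleep(0.2)
stop.set()
t.join()
```

The producers (endpoint, cronjob, or anything else) never talk to the index directly; they only add ids to the shared set, and the single consumer loop does all the indexing work.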
