Improving DataLayerService read performance #10
Created a new branch `feature/datalayer_lease` for this feature.
It's not a lease, because that would imply the local data layer [cache] has temporal ownership of the global data, which it doesn't - any other local data layer can write the same identifier to the global backend, which renders the local cache content invalid. It's an expiration timer, typically labelled TTL.
As an alternative to the above solution, one could utilize Redis as a caching layer in front of Riak. See https://riak.com/products/riak-kv/redis-integration/index.html. In this alternative, the DataLayerService would become a translation layer for storage operations:
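In this alternative, the DLS read/write logic could look roughly like the following. This is a minimal, hypothetical sketch: plain dicts stand in for the Redis and Riak client objects, and the class/method names are illustrative, not the actual DLS API.

```python
class DataLayerService:
    """Sketch of the DLS as a pure translation layer: Redis acts as the
    cache in front of Riak, so cache invalidation is Redis's problem,
    not the DLS's. Dicts stand in for the real client libraries."""

    def __init__(self, redis_cache, riak_store):
        self._cache = redis_cache   # stands in for a Redis client
        self._store = riak_store    # stands in for a Riak bucket/client

    def get(self, key):
        # Translate a read into a cache lookup first...
        value = self._cache.get(key)
        if value is None:
            # ...and fall through to the global store on a miss.
            value = self._store.get(key)
        return value

    def put(self, key, value):
        # Translate a write into a durable store write,
        # then keep the cache coherent.
        self._store[key] = value
        self._cache[key] = value
```

With a real deployment, the dicts would be replaced by the respective client libraries, and Redis's own expiration/eviction policies would govern the cached entries.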
As a result of these modifications, the DataLayerService will not cache anything, thus offloading cache-related problems like invalidation to Redis. Furthermore, its logic will be simplified so that it can become part of each sandbox. As an implementation optimization regarding JVMs, it can be merged with the queue service. @ruichuan, @paarijaat: Did I miss anything?
As far as I can tell, the data layer service is only used by sandboxes (including management) and the storage frontend, which is going to be removed (see #28). After that, the data layer service will be the only remaining component in the bare metal setup that requires the fluentbit server to send the logs to elasticsearch (in fact, the fluentbit server is not utilized in the knative setup anyway). If the data layer service moves inside the sandbox, we can also get rid of the extra fluentbit server in bare metal, because the sandbox logs are already sent to elasticsearch. That move would affect the way the management service is bootstrapped, though.
@iakkus thanks, it captures our discussion well. The data layer service (DLS) is a bit more than a simple translation layer that redirects requests to riak/redis for processing: DLS needs to use the respective client libraries to organize operations in certain ways for correct processing. I agree that the new `riak_redis` approach can simplify DLS. Regarding where to put DLS (either standalone or inside a sandbox), we can decide later; it is a simple move in any case.
Switched this work to the `feature/redis_caching` branch.
[Environment]: Kubernetes, bare metal
[Known affected releases]: master (includes all releases)
The current DataLayerService asks users to specify whether they want to retrieve the data from the local or the global data layer. In the future, it'd be valuable to maintain a local cache with expiration, so that: 1) the client issues a read request without choosing a source; 2) the data layer service checks its local cache:
2.1) if the value is in the cache and hasn't expired yet, it returns the value;
2.2) if it is in the cache but has expired, or if it is not in the cache, it retrieves the value from the global data layer, caches it with a new expiry timestamp, and returns that value.
The 'lease duration' is then a configurable value.
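The read path described above could be sketched as follows. This is a minimal in-memory illustration: `TTLCache` is a hypothetical name, and a dict stands in for the global data layer.

```python
import time

class TTLCache:
    """Sketch of the proposed read path: a local cache whose entries
    expire after a configurable 'lease duration' (TTL)."""

    def __init__(self, global_store, ttl=5.0):
        self._global = global_store  # stands in for the global data layer
        self._ttl = ttl              # the configurable 'lease duration'
        self._cache = {}             # key -> (value, expiry_timestamp)

    def get(self, key):
        now = time.time()
        entry = self._cache.get(key)
        # 2.1) cached and not yet expired: serve from the local cache
        if entry is not None and entry[1] > now:
            return entry[0]
        # 2.2) expired or missing: fetch from the global data layer,
        # re-cache with a fresh expiry timestamp, and return it
        value = self._global[key]
        self._cache[key] = (value, now + self._ttl)
        return value
```

Note that a stale read is possible within one lease duration if another local data layer overwrites the same identifier in the global backend, which is the trade-off discussed above.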
As another improvement, at step 2.1 (when the cached value hasn't expired yet), after returning to the client, the data layer server may make an asynchronous request to the global data layer around the expiry time to retrieve the latest value and update the expiry timestamp accordingly. This approach would help keep the caches synchronized and may add a bit of overhead, but only for items that have been accessed recently; for the others, the cache entries expire as normal. (So a cached item that is accessed once will stay in the cache for at least one more lease duration. Afterwards, if it isn't accessed again, it expires normally.)
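This refresh-ahead idea could look roughly like the following hypothetical sketch, where `threading.Timer` schedules the single asynchronous refresh near the expiry time (all names are illustrative):

```python
import threading
import time

class RefreshAheadCache:
    """Sketch: on a cache hit, serve locally and schedule ONE asynchronous
    refresh around the expiry time. Only recently accessed items get this
    extra extension; untouched items expire normally."""

    def __init__(self, global_store, ttl=1.0):
        self._global = global_store
        self._ttl = ttl
        self._cache = {}  # key -> (value, expiry_timestamp)
        self._lock = threading.Lock()

    def _refresh(self, key):
        # Asynchronous refresh: re-read from the global data layer and
        # extend the expiry by one more lease duration.
        value = self._global[key]
        with self._lock:
            self._cache[key] = (value, time.time() + self._ttl)

    def get(self, key):
        now = time.time()
        with self._lock:
            entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            # Hit: return immediately, refresh asynchronously near expiry.
            threading.Timer(entry[1] - now, self._refresh, args=(key,)).start()
            return entry[0]
        value = self._global[key]
        with self._lock:
            self._cache[key] = (value, now + self._ttl)
        return value
```

A production version would need to avoid scheduling duplicate refreshes for the same key, but that bookkeeping is omitted here for brevity.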
Then we could have a lazy thread running once in a while to clean up the cache by removing the expired values (simply iterate over the cache and remove the expired entries). This is an optimization to reduce resource consumption and may not be critical.
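Such a janitor thread could be as simple as the following sketch (hypothetical names; the cache is assumed to map keys to `(value, expiry_timestamp)` pairs and to be guarded by a lock):

```python
import threading
import time

def start_cleanup_thread(cache, lock, interval=60.0):
    """Lazily sweep the cache every `interval` seconds, dropping entries
    whose expiry timestamp has passed. Purely an optimization to free
    memory; correctness doesn't depend on it because reads already check
    expiry themselves."""
    def clean():
        while True:
            time.sleep(interval)
            now = time.time()
            with lock:
                expired = [k for k, (_, exp) in cache.items() if exp <= now]
                for key in expired:
                    del cache[key]
    t = threading.Thread(target=clean, daemon=True)
    t.start()
    return t
```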
The write synchronization stays the same: the value is written to the local cache and sent to the global data layer asynchronously. We can apply the same principle here as well: after writing to the local cache, the expiry timestamp is set accordingly (hopefully, by that time the synchronization to the global data layer will have finished). Perhaps we can use this criterion to get an idea of a suitable lease duration.
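The write path could then be sketched like this (hypothetical names; a background thread stands in for the asynchronous synchronization to the global data layer):

```python
import threading
import time

class WriteBehindCache:
    """Sketch of the unchanged write path: write to the local cache
    immediately and push to the global data layer asynchronously.
    The lease duration (ttl) should ideally exceed the typical
    synchronization delay, so an entry doesn't expire before its
    write has reached the global backend."""

    def __init__(self, global_store, ttl=5.0):
        self._global = global_store
        self._ttl = ttl
        self._cache = {}  # key -> (value, expiry_timestamp)

    def put(self, key, value):
        # Local write takes effect immediately, with a fresh expiry.
        self._cache[key] = (value, time.time() + self._ttl)
        # Global write happens asynchronously in the background.
        threading.Thread(
            target=self._global.__setitem__, args=(key, value), daemon=True
        ).start()
```

Measuring how long these background writes actually take would give a lower bound when choosing the lease duration, as suggested above.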