feat: Policy cache invalidation approach #140

Conversation
openedx_authz/engine/enforcer.py (outdated)

    None
    """
    current_timestamp = time.time()
    cache.set(cls.CACHE_KEY, current_timestamp, None)
I'm not sure that using the Django cache this way would always guarantee invalidation across processes; it may depend on the backend used. For Tutor's Redis it should, at least.
Based on a short conversation with @ormsbee, I think the short answer is that it's possible to configure several different caches, at least across the LMS and CMS, so this may not be a reliable way to do the invalidation. The most reliable way would be to put it in the database, where it's guaranteed.
Could we have different caches for the same service? #140 (comment)
Do you think it would be worthwhile to implement this using the database? It won't be as performant, since we're adding a hit to the DB, but at least it's a simple query, which seems better than reloading the whole policy on every request.
Or perhaps make a configuration switch to use either the cache or the DB?
I think we should use the DB and just solve it for good. It should just be a one-row table that MySQL should serve out of cache.
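As a rough sketch of what that one-row table could look like, here is a toy version using in-memory SQLite as a stand-in for MySQL and a Django model (the table and column names below are hypothetical, not taken from this PR):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; in the real project this would be
# a Django model (e.g. the PolicyCacheControl singleton discussed below).
conn = sqlite3.connect(":memory:")
conn.execute(
    # CHECK (id = 1) enforces that the table can only ever hold one row.
    "CREATE TABLE policy_cache_control "
    "(id INTEGER PRIMARY KEY CHECK (id = 1), version TEXT)"
)

def set_version(version):
    # Upsert: insert the single row, or overwrite its version if it exists.
    conn.execute(
        "INSERT INTO policy_cache_control (id, version) VALUES (1, ?) "
        "ON CONFLICT(id) DO UPDATE SET version = excluded.version",
        (version,),
    )

def get_version():
    row = conn.execute(
        "SELECT version FROM policy_cache_control WHERE id = 1"
    ).fetchone()
    return row[0] if row else None

set_version("abc")
set_version("def")
assert get_version() == "def"
```

Because the table has exactly one hot row, reads like `get_version()` should indeed be served from the database's buffer cache in practice.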
Minor note: We don't necessarily care about the timestamp, and comparisons with time.time() across multiple machines can potentially introduce intermittent issues from clock skew. We mostly just care about the question "does the current in-process state match the state of the database?", which could be answered by setting any random value whenever there's an invalidation and having the workers compare their "current state" variable against it.
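A minimal sketch of that idea in plain Python, with a dict standing in for the shared cache/database (all names here are hypothetical, for illustration only):

```python
import uuid

shared_store = {}  # stands in for the shared database/cache

def invalidate():
    # Any policy write stores a fresh random token; no clocks involved.
    shared_store["policy_version"] = uuid.uuid4().hex

def needs_reload(local_version):
    # Workers compare for equality only, so clock skew between machines
    # cannot cause missed or spurious invalidations.
    return shared_store.get("policy_version") != local_version

invalidate()
local = shared_store["policy_version"]
assert not needs_reload(local)   # in sync, no reload needed
invalidate()                     # another process updates the policies
assert needs_reload(local)       # this worker must now reload
```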
I do agree with @bmtcril that we'll probably need a more granular caching mechanism before we expand past libraries.
cbecf01 to ef70b2c (compare)
Thank you so much for moving this forward! I don't think we can assume all installations share the same Redis setup, but here's what we know for the MVP:
So we can only guarantee consistency within processes of the same service. It would be even better if we could use a single service, so the admin console could call the CMS, where the inline enforcements happen. Right? I asked the infra team internally, and they don't believe this is a common practice, but better safe than sorry, I guess.
mariajgrimaldi
left a comment
Can we run make format? Thanks! Everything else looks good :)
I'll be testing around in our remote environment :)
I think the problem here is the invalidation: how do the LMS processes know when the CMS invalidates the cache due to a permissions update?
Of course. This approach only works if we focus on a single service, which might be fine for the MVP. The inline enforcements happen in the CMS, so the admin could call the APIs there. But this won't hold long term, so a better option could be a singleton model that stores the invalidation data, as you mentioned.
openedx_authz/engine/enforcer.py (outdated)

    """
    last_modified_timestamp = PolicyCacheControl.get_last_modified_timestamp()

    current_timestamp = time.time()
Instead of using time.time(), as Dave suggested, we could have a sort of counter to compare local state (cache, local to the process) against global state (DB, shared by all processes). If the global invalidation counter is greater than the local invalidation counter, then an invalidation occurred.
I think a counter can introduce (probably very rare) race conditions where two processes increment to the same value and neither gets the other's updates. I'd just use something like a GUID, for safety.
ff0d841 to f50123a (compare)
    ********************

    Changed
    =======
I don't think this was on purpose?
mariajgrimaldi
left a comment
LGTM! Thank you so much for moving this forward :)
If we find any issues related to this, I think we can address them in a different PR. Again, thank you so much!
This should just need the model docstring annotated to note that it doesn't contain PII:
BryanttV
left a comment
@rodmgwgu, thank you very much for this! I tested it locally and it works very well. I just have a few minor comments.
openedx_authz/engine/enforcer.py (outdated)

    last_version = PolicyCacheControl.get_version()

    if last_version is None:
        # No timestamp in cache; initialize it
We need to update this comment
done, thanks!
openedx_authz/tests/test_enforcer.py (outdated)

    def test_load_policy_if_needed_initializes_cache_timestamp(self, mock_toggle):
        """Test that load_policy_if_needed initializes cache timestamp on first call.
I think we also need to update this according to the new UUID approach
done, thanks!
f50123a to de47a7d (compare)
BryanttV
left a comment
LGTM!


Related issue: #133
This PR intends to solve the issue of having to hard-load all Casbin permissions on every request by implementing a simple per-process caching strategy with global invalidation support.
This means that even when we have multiple copies of the Casbin policies loaded in multiple processes (uwsgi workers on a single LMS or CMS container instance, the LMS and CMS instances themselves, or multiples of these on a Kubernetes cluster), we can always be sure to have the latest policies loaded in memory, reloading them only when needed by checking an invalidation timestamp in a cross-process cache.
Note: This PR also disables the auto-reload functionality by default (by setting CASBIN_AUTO_LOAD_POLICY_INTERVAL to 0 in the defaults), as it should no longer be needed.
Approach: Handle cache invalidation via Django cache with a timestamp
It works like this (thinking of how it would work on a standard Tutor production setup):
Concerns
The invalidation mechanism implemented here works correctly when the Django cache is configured so that all LMS and CMS processes share the same cache backend, which is the case in the way Tutor deploys the system: a single Redis instance serves as the Django cache backend.
I'm not sure what other cache configurations are supported by Open edX. In theory, a shared memcached backend would also work, but if the cache is set up so that some instances connect to different, isolated backends, then cache invalidation isn't guaranteed across all processes and services.
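To illustrate the overall pattern this PR describes, here is a self-contained toy sketch of the per-process check. The real code uses Casbin's policy loading and a shared version store (cache or the PolicyCacheControl model discussed above); this sketch replaces both with a dict and a counter, and all names in it are hypothetical:

```python
# Toy model of the reload-if-needed check; the shared version store and
# Casbin's full policy reload are replaced by a dict and a counter.
class SketchEnforcer:
    def __init__(self, store):
        self.store = store         # shared version store (DB/cache in prod)
        self.local_version = None  # per-process state
        self.reload_count = 0      # counts full policy reloads

    def load_policy(self):
        self.reload_count += 1     # stands in for reloading all Casbin policies

    def load_policy_if_needed(self):
        # Cheap check on every request; expensive reload only on a mismatch.
        global_version = self.store.get("version")
        if global_version != self.local_version:
            self.load_policy()
            self.local_version = global_version

store = {"version": "v1"}
enforcer = SketchEnforcer(store)
enforcer.load_policy_if_needed()   # first call: loads policies
enforcer.load_policy_if_needed()   # unchanged: no reload
assert enforcer.reload_count == 1
store["version"] = "v2"            # another process invalidates
enforcer.load_policy_if_needed()   # mismatch detected: reloads
assert enforcer.reload_count == 2
```

The design trade-off is exactly the one debated in this thread: every request pays one cheap shared-store read, in exchange for reloading the full policy set only when something actually changed.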
Merge checklist:
Check off if complete or not applicable: