-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor models to satisfy desired invariants #49
Conversation
It turns out there are a few mistakes easy to make when defining new model classes (and we've made them on almost all classes defined so far). These mistakes can break expected usage of hash/== and a few other things. Before fixing this, let's add some tests to cover the problems. These tests will verify that model objects satisfy basic invariants such as "actual types match declared types" and hash/== functionality. As can be seen in the test, every model class right now is unable to pass the tests and so are marked as XFAIL.
The intent is for all public model classes to be immutable. However, a problem arises when a model is expected to have a list field, as lists are mutable and not hashable. Previously, the attempted fix for this was to use tuples for those fields instead. This was done since a tuple is the closest thing to an immutable list built-in to Python. Unfortunately it turns out there are some undesirable properties from using tuples. The biggest issue is that most APIs are defined in terms of lists and developers are usually picking lists as the default choice for an iterable. This means that, even if we define e.g. signing_keys to be a tuple, a dev is likely to write: Repository(signing_keys=['foo', 'bar'], ...) And we can make that "work" by adding a conversion to tuple, but unfortunately a tuple is not equal to a list of same length and contents, which results in unexpected behavior: # assertion fails! assert Repository(signing_keys=['key']).signing_keys == ['key'] It seems that if we continue using tuples, users of these classes would have to be constantly vigilant to remember whether a value is to be passed and compared as a tuple or list, which is quite user-unfriendly. So, to keep the ease-of-use of lists while avoiding the downsides of tuples, let's define our own immutable list type, FrozenList. This type is just like a list except that no mutation is allowed, and instances can be hashed (like a tuple). This class should be used on attributes like: - use default=Factory(FrozenList) if you want to default to an empty list - use type=list, and not FrozenList, because `type` acts as the documented type and we are not committing to using this specific type - use converter=FrozenList to convert from input iterables into a FrozenList The resulting behavior is as follows: # can create with a plain list repo = Repository(signing_keys=['key']) # assertion passes assert repo.signing_keys == ['key'] # attempting to modify crashes repo.signing_keys.append('other-key')
Most model classes had some bugs causing them to fail invariant checks. Fix them all. The most common issue was missing validation against type. Two larger changes include: - MaintenanceReport: some fields were declared and documented as type=datetime, but all of the code was implemented using strings (see related test update). This has been fixed to use datetime throughout. - Page previously was not frozen at all. Now it is, like the other model classes.
Recent model changes have adjusted the return type of some methods and made certain behavior more strict. Though there probably aren't affected clients right now, this is technically backwards-incompatible, so the major version should be bumped.
default=attr.Factory(tuple), | ||
type=tuple, | ||
default=attr.Factory(FrozenList), | ||
type=list, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not "FrozenList"?
In docs there's:
Please note that attrs doesn’t do anything with this metadata by itself. You can use it as part of your own code or for static type checking.
having list here makes it confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attrs docs aren't quite accurate, or at least, aren't telling the whole story. These types do actually make it into the API as they are added as Python3 type annotations, where tools like pylint, mypy and sphinx look at them. I expect them to end up in our docs eventually, though it's not quite working right now (#24).
For example, before this change the declared type of 'distributors' argument shows up in API as:
>>> inspect.signature(Repository.__init__).parameters['distributors'].annotation
<class 'tuple'>
After this change, it will be 'list'. I do believe list is more appropriate than FrozenList here because:
- setting
type=FrozenList
is exposing that type as part of our public API, but I don't want to commit to returning instances of exactly that class - given that it's used as type annotation on the constructor, the appropriate usage seems more like 'declare this as the type you accept' and not 'declare this as the type you return/store'. So, type=list is fine as the constructor does indeed accept any type of list and doesn't have to be passed exactly a FrozenList.
@@ -0,0 +1,35 @@ | |||
class FrozenList(list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the benefit of having custom class instead of tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote explanation for that in 75790f1 commit message. In summary: the previous solution was to use tuple but it has usability issues because developers choose lists as the default container and tuples aren't as compatible with them as it may seem at first glance. For instance (1, 2, 3) == [1, 2, 3]
isn't true.
That means, if it continued using tuples, developers have to get in the habit of remembering where they need to write a tuple and where they need to write a list, or writing tuples everywhere (which is a tough ask), or running into weird bugs.
Our model classes had various bugs and weaknesses. This pull request
fixes them and adds tests to ensure that the same mistakes can't happen
on new classes.
The original motivation here was to fix up inconsistent usage of tuples vs lists
for immutability, but testing found several additional bugs to be fixed, most
notably that the recently added maintenance classes had fields using an
entirely different type than what was declared in the model.
The changes made here include:
such as "actual types match declared types" and "all fields are immutable"
formerly using tuples or plain lists
(although I doubt that any current users are affected).
More information on each of these changes can be found in the commit messages.