Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mmif.serialize pythonic behavior #304

Open
wants to merge 3 commits into
base: 2.x
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions documentation/target-versions.csv
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"``mmif-python`` version","Target MMIF Specification"
1.0.20.dev1,"1.0.5"
1.0.16,"1.0.4"
1.0.15,"1.0.4"
1.0.14,"1.0.4"
Expand Down
38 changes: 38 additions & 0 deletions mmif/serialize/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,41 @@
"""
Aggregatitive summary for mmif.serialize package:

''Prerequisites'': recall CLAMS MMIF structure at https://mmif.clams.ai/1.0.5/#the-structure-of-mmif-files

The MMIF follows JSON schema which consists of two data structures: dictionary/hash and list.
Therefore, for the purpose of best practice in Python's OOP, MMIF overwrites its own 'dict-like'
and 'list-like' classes.

As a high-level overview of the package, the following parent classes are defined first:

- `MmifObject`: a base class for MMIF objects that are ''dict-like''
- `DataList`: a base class for MMIF data fields that are ''list-like''
- `DataDict`: a base class for MMIF data fields that are ''dict-like''

Then, the following classes are defined and categorized into either ''dict-like''
or ''list-like'' child classes:

'''dict-like''':
- `Mmif`: a class for MMIF objects
- `MmifMetaData`: a class for MMIF metadata
- `View`: a class for MMIF view
- `ViewMetaData`: a class for MMIF view metadata
- `ErrorDict`: a class for a specific view that contains error
- `ContainsDict`: a class for `View`'s 'contains' field
- `Annotation` & `Document`: a class for MMIF annotation and document
- `AnnotationProperties` & `DocumentProperties`: a class for MMIF annotation properties
- `Text`: a class for `Document`'s text field

'''list-like''':
- `DocumentsList`: a class for a list of `Document` objects
- `ViewsList`: a class for a list of `View` objects
- `AnnotationsList`: a class for a list of `Annotation` objects

By doing so, the any field of a ''dict-like'' class should be accessed if and
only if by either bracket `[key]` or `.get(key)`. For a ''list-like'' class, the
elements are ordered and accessed by index, for example, `[idx]`.
"""
from .annotation import *
from .annotation import __all__ as anno_all
from .mmif import *
Expand Down
188 changes: 93 additions & 95 deletions mmif/serialize/annotation.py

Large diffs are not rendered by default.

171 changes: 89 additions & 82 deletions mmif/serialize/mmif.py

Large diffs are not rendered by default.

133 changes: 89 additions & 44 deletions mmif/serialize/model.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,8 @@

import json
from datetime import datetime
from typing import Union, Any, Dict, Optional, TypeVar, Generic, Generator, Iterator, Type, Set, ClassVar
from collections import OrderedDict
from typing import Union, Any, Dict, Optional, TypeVar, Generic, Generator, Iterator, Type, Set, ClassVar, List

from deepdiff import DeepDiff

Expand Down Expand Up @@ -44,49 +45,49 @@

1. _unnamed_attributes:
Only can be either None or an empty dictionary. If it's set to None,
it means the class won't take any ``Additional Attributes`` in the
JSON schema sense. If it's an empty dict, users can throw any k-v
pairs to the class, as long as the key is not a "reserved" name,
it means the class won't take any ``Additional Attributes`` in the
JSON schema sense. If it's an empty dict, users can throw any k-v
pairs to the class, as long as the key is not a "reserved" name,
and those additional attributes will be stored in this dict while
in memory.
in memory.
2. _attribute_classes:
This is a dict from a key name to a specific python class to use for
deserialize the value. Note that a key name in this dict does NOT
have to be a *named* attribute, but is recommended to be one.
3. _required_attributes:
This is a simple list of names of attributes that are required in
the object. When serialize, an object will skip its *empty* (e.g.
zero-length, or None) attributes unless they are in this list.
Otherwise, the serialized JSON string would have empty
This is a simple list of names of attributes that are required in
the object. When serialize, an object will skip its *empty* (e.g.
zero-length, or None) attributes unless they are in this list.
Otherwise, the serialized JSON string would have empty
representations (e.g. ``""``, ``[]``).
4. _exclude_from_diff:
This is a simple list of names of attributes that should be excluded
from the diff calculation in ``__eq__``.
This is a simple list of names of attributes that should be excluded
from the diff calculation in ``__eq__``.

# TODO (krim @ 8/17/20): this dict is however, a duplicate with the type hints in the class definition.
Maybe there is a better way to utilize type hints (e.g. getting them
as a programmatically), but for now developers should be careful to
Maybe there is a better way to utilize type hints (e.g. getting them
as a programmatically), but for now developers should be careful to
add types to hints as well as to this dict.

Also note that those special attributes MUST be set in the __init__()
before calling super method, otherwise deserialization will not work.

And also, a subclass that has one or more *named* attributes, it must
set those attributes in the __init__() before calling super method.
When serializing a MmifObject, all *empty* attributes will be ignored,
so for optional named attributes, you must leave the values empty
(len == 0), but NOT None. Any None-valued named attributes will cause
set those attributes in the __init__() before calling super method.
When serializing a MmifObject, all *empty* attributes will be ignored,
so for optional named attributes, you must leave the values empty
(len == 0), but NOT None. Any None-valued named attributes will cause
issues with current implementation.

:param mmif_obj: JSON string or `dict` to initialize an object.
If not given, an empty object will be initialized, sometimes with
an ID value automatically generated, based on its parent object.
"""

view_prefix: ClassVar[str] = 'v_'
id_delimiter: ClassVar[str] = ':'

# these are the reserved names that cannot be used as attribute names, and
# these are the reserved names that cannot be used as attribute names, and
# they won't be serialized
reserved_names: Set[str] = {
'reserved_names',
Expand Down Expand Up @@ -274,9 +275,9 @@

def __len__(self) -> int:
"""
Returns number of attributes that are not *empty*.
Returns number of attributes that are not *empty*.
"""
return (sum([named in self and not self.is_empty(self[named]) for named in self._named_attributes()])
return (sum([named in self and not self.is_empty(self[named]) for named in self._named_attributes()])
+ (len(self._unnamed_attributes) if self._unnamed_attributes else 0))

def __setitem__(self, key, value) -> None:
Expand Down Expand Up @@ -307,13 +308,28 @@
value = self.__dict__[key]
elif self._unnamed_attributes is None:
raise AttributeError(f"Additional properties are disallowed by {self.__class__}: {key}")
else:
else:
value = self._unnamed_attributes[key]
if key not in self._required_attributes and self.is_empty(value):
raise KeyError(f"Property not found: {key} (is it set?)")
else:
else:
return value

def get(self, key, default=None) -> Optional[Union['MmifObject', str, datetime]]:
"""
The dict-like get(key, default) method.
If the key is not found, it returns the default value instead of raising KeyError.
Note: Although KeyError is not raised, AttributeError can be raised if the key is not disallowed.

:param key: the key to search for
:param default: the default value to return if the key is not found
:return: the value matching that key or the default value
"""
try:
return self[key]
except KeyError:
return default

Check warning on line 331 in mmif/serialize/model.py

View check run for this annotation

Codecov / codecov/patch

mmif/serialize/model.py#L330-L331

Added lines #L330 - L331 were not covered by tests


class MmifObjectEncoder(json.JSONEncoder):
"""
Expand Down Expand Up @@ -362,29 +378,35 @@
"""
Passes the input data into the internal deserializer.
"""
super().deserialize(mmif_json)
super().deserialize(mmif_json)

@staticmethod
def _load_json(json_list: Union[list, str]) -> list:
if type(json_list) is str:
json_list = json.loads(json_list)
return [MmifObject._load_json(obj) for obj in json_list]

def _deserialize(self, input_list: list) -> None:
raise NotImplementedError()

def get(self, key: str) -> Optional[T]:
def get_item(self, key: str) -> Optional[T]:
"""
Standard dictionary-style get() method, albeit with no ``default``
parameter. Relies on the implementation of __getitem__.

Will return ``None`` if the key is not found.
Mimic to standard dictionary-style get() method,
albeit with no ``default`` parameter (which will return ``None``).
However, it doesn't rely on the implementation of __getitem__,
so it can be understood as a backdoor underneath the list property of the class.

:param key: the key to search for
:return: the value matching that key
"""
key_id = None
if ":" in key:
_, key_id = key.split(":")
try:
return self[key]
if key_id:
return self._items[key_id]
else:
return self._items[key]
except KeyError:
return None

Expand All @@ -406,22 +428,45 @@
if not overwrite and key in self._items:
raise KeyError(f"Key {key} already exists")
else:
self[key] = value
self._items[key] = value

def append(self, value, overwrite):
raise NotImplementedError()

def __getitem__(self, key: str) -> T:
if key not in self.reserved_names:
return self._items.__getitem__(key)
else:
raise KeyError("Don't use __getitem__ to access a reserved name")
def __getitem__(self, index: int) -> T:
"""
Index-based access for list-like behavior.

def __setitem__(self, key: str, value: T):
if key not in self.reserved_names:
self._items.__setitem__(key, value)
else:
super().__setitem__(key, value)
FIXME:
--------
Since this implementation makes use of python's built-in list,
I didn't explicitly handle the `IndexError` manually. Instead,
I depended on the built-in list's behavior to raise the error.
--------

:param idx: the index of the item to access
:return: the item at the specified index
"""
return list(self._items.values()).__getitem__(index)

def __setitem__(self, index: int, value: T) -> None:
"""
Set up the item at the specified index.

IMPORTANT:
--------
For any subclass that needs to implement this method, it's list-like but
assumed to be read-only. That's to say, once the data list is built, the
items are not supposed to be changed. If any attempt that changes the items,
an error would be raised by this method.
--------

:param index: the index of the item to set
:param value: the value to set at the specified index

:raise TypeError: the list is assumed to be read-only
"""
raise TypeError("The list is read-only.")

def __iter__(self) -> Iterator[T]:
return self._items.values().__iter__()
Expand All @@ -432,7 +477,7 @@
def __reversed__(self) -> Iterator[T]:
return reversed(self._items.values())

def __contains__(self, item) -> bool:
def __contains__(self, item: T) -> bool:
return item in self._items

def empty(self):
Expand Down Expand Up @@ -492,6 +537,6 @@

def __contains__(self, item):
return item in self._items

def empty(self):
self._items = {}
Loading
Loading