Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create dataclass for reference datatype and improve validation #45 #59

Merged
merged 5 commits into from
Feb 11, 2025

Conversation

jacobtylerwalls
Copy link
Member

@jacobtylerwalls jacobtylerwalls commented Feb 5, 2025

As #45 mentions, we didn't have validation against unexpected keys in the reference tile data, so that's how we ended up with list_id and listid.

We could roll our own solution for this, but I'm suggesting that one approach would be to just use python dataclasses for this so that we have one central place to define the expected keys. This made the validate() method much easier to rewrite. We could also make use of the frozen feature if we want immutable python objects later.

There's a bit of fiddliness around transforming error messages into something that can be localized, but if we like this pattern we can probably factor that out into something reusable with other datatypes.

Intended to be tested with the partner lingo PR: archesproject/arches-lingo#208

Closes #45



@dataclass(kw_only=True)
class Reference:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of using this in a shell:

In [10]: label_types = TileModel.as_nodegroup("appellative_status", graph_slug="concept")

In [14]: obj = label_types.filter(appellative_status_ascribed_relation__isnull=False).last()

In [15]: obj.appellative_status_ascribed_name_content
Out[15]: 'Intentional Sampling'

In [17]: from pprint import pprint

In [18]: pprint(obj.appellative_status_ascribed_relation)
[Reference(uri='https://rdm.dev.fargeo.com/plugins/controlled-list-manager/item/2e3a2045-b44e-47fc-a27b-3078b17e08e4',
           labels=[ReferenceLabel(id='6ac8e471-476e-4fd0-b276-86e01a17bcc8',
                                  value='prefLabel',
                                  language_id='en',
                                  list_item_id='2e3a2045-b44e-47fc-a27b-3078b17e08e4',
                                  valuetype_id='prefLabel')])]

Copy link
Member Author

@jacobtylerwalls jacobtylerwalls Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this breaks calling save(), so I need to either allow that or just unwind this pattern and go back to plain dicts. EDIT: done

@jacobtylerwalls jacobtylerwalls marked this pull request as draft February 5, 2025 23:21
@jacobtylerwalls
Copy link
Member Author

jacobtylerwalls commented Feb 5, 2025

Setting back to draft, would like to:

  • also roll in a fix for validating against node config multivalue
  • unbreak calling save(), see above

edit: done

@jacobtylerwalls
Copy link
Member Author

Obviously the dataclass stuff is open for discussion, and I'm eager to hear ideas.

It might be a nice place to hang additional methods, e.g. refreshLabels(), which for the sake of argument might query the db to replace the information in the labels array. Without a dataclass, we'd do something like:

dt_instance = DataTypeFactory().get_instance("reference")
# Is it safe to call refreshLabels? I don't know, let's run some validation and find out.
try:
    dt_instance.transform_value_for_tile(tile_val ... (?)
    dt_instance.validate(tile_val, ...) (?)
except:
    pass (?)
# Okay, it should have the "labels" key by now.
dt_instance.refresh_labels(tile_val)

versus:

tile_val.refresh_labels()  # the serialization layer gave you a dataclass ready to call methods

Of course only meatier datatypes like this one would even need something like this.

@jacobtylerwalls jacobtylerwalls marked this pull request as ready for review February 6, 2025 17:01
Copy link
Contributor

@chrabyrd chrabyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good

parsed = self.to_python(value)
self.validate_pref_labels(parsed)
self.validate_multivalue(parsed, node, nodeid)
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should exceptions be caught/transformed at this level? Would it be a better pattern to do this in at the callsite?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better or worse I think this is the pattern used across datatypes in Arches. If something goes wrong, validate returns a list of errors. That list gets passed on to importers so that they can report everything that's wrong with whatever data a user is trying to load.

@jacobtylerwalls jacobtylerwalls merged commit db55989 into main Feb 11, 2025
5 checks passed
@jacobtylerwalls jacobtylerwalls deleted the jtw/improve-refdt branch February 11, 2025 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reference datatype doesn't pop extra keys
4 participants