- added defs to `ResourceRoleEnum` values
- added an `ingest_source` permissible value - to help capture which source the data was actually ingested from
- made the `RetrievalSource.resource_role` slot multivalued - to allow indicating that a particular primary or aggregator source was also the 'ingest_source'
- also implemented an alternative pattern to capture this info - that defines a separate boolean slot to flag the ingest source - `ingest_source: boolean`
- this pattern may make it easier to parse out this important data, and allow us to make capturing this info required if we want
- if we decide to capture this type of metadata, choose one of the two implemented patterns
- added an `ingest_files` slot to `RetrievalSource`, to capture the file(s) from which the data used to create the edge were retrieved. This provides more complete provenance, and supports various downstream activities:
- manual QA efforts (help reviewer organize edge types by file source)
- internal developer debugging
- identifying edges that may need to be updated/reviewed if a source updates its data/files
- more precise provenance for end users to understand where the edge came from
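
To make the "edges that may need review when a file changes" use case concrete, here is a minimal Python sketch. It is not the schema itself: the `RetrievalSource` and `Edge` dataclasses are simplified stand-ins, and the resource ids, edge ids, and file names are hypothetical placeholders. It only assumes that `ingest_files` is a list of file names on the source object.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RetrievalSource:
    # Simplified, hypothetical stand-in for the schema class.
    resource_id: str
    resource_role: List[str]
    ingest_files: List[str] = field(default_factory=list)


@dataclass
class Edge:
    # Hypothetical minimal edge: an id plus its retrieval sources.
    edge_id: str
    sources: List[RetrievalSource]


def edges_by_ingest_file(edges: List[Edge]) -> Dict[str, List[str]]:
    """Map each ingest file name to the ids of the edges built from it."""
    index = defaultdict(list)
    for edge in edges:
        for source in edge.sources:
            for file_name in source.ingest_files:
                # Avoid listing an edge twice for the same file.
                if edge.edge_id not in index[file_name]:
                    index[file_name].append(edge.edge_id)
    return dict(index)


# Placeholder data: two edges ingested from the same hypothetical source.
roles = ["primary_knowledge_source", "ingest_source"]
edges = [
    Edge("e1", [RetrievalSource("infores:example-db", roles, ["file_a.tsv"])]),
    Edge("e2", [RetrievalSource("infores:example-db", roles,
                                ["file_a.tsv", "file_b.tsv"])]),
]

file_index = edges_by_ingest_file(edges)
# If file_a.tsv is updated upstream, these edges are candidates for review.
to_review = file_index["file_a.tsv"]
```

The same index could also back the manual QA use case, since it groups edges by the file they came from.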
Exploring some modeling that would support capturing a couple of additional retrieval source provenance details on a per-edge basis. To discuss on upcoming MUTT/DINGO call:
- added an `ingest_source` permissible value - to help capture which source the data was actually ingested from (and made the `RetrievalSource.resource_role` slot multivalued - to allow indicating that a particular primary or aggregator source was also the 'ingest_source')
- also tested an alternative pattern to capture this info - that defines a separate slot to capture the ingest_source - `ingest_source: boolean`
- added an `ingest_files` slot to `RetrievalSource` - for use in the `RetrievalSource` object for the ingest_source, to report file(s) from which the data used to create the edge were retrieved. This provides more complete provenance, and supports various downstream activities: . . . If not at the edge level in the data, perhaps making it standard to put this info in the RIG for each 'EdgeType' object - as proposed in the RIG schema PR here.
- added defs to `ResourceRoleEnum` values - which I think we should keep even if we don't adopt the other proposals above
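
For the call, the practical difference between the two `ingest_source` patterns might be easier to see from the consumer side. The Python sketch below is only an illustration under stated assumptions: the two dataclasses are hypothetical, simplified stand-ins for `RetrievalSource` under each pattern, and the `infores:example-*` ids are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceWithMultivaluedRole:
    """Pattern 1 (sketch): resource_role is multivalued and may
    include 'ingest_source' alongside the primary/aggregator role."""
    resource_id: str
    resource_role: List[str]


@dataclass
class SourceWithBooleanFlag:
    """Pattern 2 (sketch): a separate boolean ingest_source slot.
    It has no default here, so it must be supplied, which mimics
    making the slot required."""
    resource_id: str
    resource_role: str
    ingest_source: bool


def ingest_ids_pattern1(sources: List[SourceWithMultivaluedRole]) -> List[str]:
    # The consumer has to look inside the list of roles.
    return [s.resource_id for s in sources if "ingest_source" in s.resource_role]


def ingest_ids_pattern2(sources: List[SourceWithBooleanFlag]) -> List[str]:
    # The consumer reads a single dedicated flag.
    return [s.resource_id for s in sources if s.ingest_source]


# Same hypothetical provenance expressed both ways: the data were
# ingested from the aggregator, not directly from the primary source.
pattern1 = [
    SourceWithMultivaluedRole("infores:example-primary",
                              ["primary_knowledge_source"]),
    SourceWithMultivaluedRole("infores:example-aggregator",
                              ["aggregator_knowledge_source", "ingest_source"]),
]
pattern2 = [
    SourceWithBooleanFlag("infores:example-primary",
                          "primary_knowledge_source", False),
    SourceWithBooleanFlag("infores:example-aggregator",
                          "aggregator_knowledge_source", True),
]

ids1 = ingest_ids_pattern1(pattern1)
ids2 = ingest_ids_pattern2(pattern2)
```

Both patterns carry the same information. The boolean flag keeps `resource_role` single-valued and can be made required, while the multivalued role avoids adding a new slot.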