Ensure that relative imports can be imported without requiring ./ in front of the import file name #350

Open
wants to merge 2 commits into
base: main
33 changes: 12 additions & 21 deletions linkml_runtime/utils/schemaview.py
@@ -108,25 +108,6 @@ def is_absolute_path(path: str) -> bool:
     drive, tail = os.path.splitdrive(norm_path)
     return bool(drive and tail)

-def _resolve_import(source_sch: str, imported_sch: str) -> str:
-    if os.path.isabs(imported_sch):
-        # Absolute import paths are not modified
-        return imported_sch
-    if urlparse(imported_sch).scheme:
Author

urlparse does not check for URL validity, which isn't obvious if you haven't read the docs for the function. A better test here would be for :, which should only occur in well-formed URLs and CURIEs, but not in file system paths.
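
To illustrate the point about `urlparse`: it splits on the first colon without validating anything, so a CURIE and a Windows-style drive path both come back with a non-empty scheme. The sketch below is a minimal demonstration; `scheme_of` and `has_colon` are hypothetical helper names and the sample strings are made up for illustration.

```python
from urllib.parse import urlparse

# Illustrative inputs (not taken from the PR): a CURIE, a URL,
# a Windows-style drive path, and two relative imports.
samples = [
    "linkml:types",
    "https://example.org/schema",
    "C:/schemas/types",
    "./types",
    "types",
]


def scheme_of(value: str) -> str:
    """Return whatever urlparse reports as the scheme; no validity check happens."""
    return urlparse(value).scheme


def has_colon(value: str) -> bool:
    """Hypothetical alternative test: anything containing ':' is URI / CURIE-like."""
    return ":" in value


for s in samples:
    print(f"{s!r}: scheme={scheme_of(s)!r}, has_colon={has_colon(s)}")
```

A Windows drive path also contains a colon, so either test would have to run after the absolute-path check.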

Author

Also, values of imported_sch starting with file:// or file:/// do not resolve, and http / https / etc. will be rejected unless there's a mapping for them in the prefixes section of the schema.

Contributor

plz add failing tests illustrating these cases and then fix :)

Author

@ialarmedalien, Apr 1, 2025

It isn't clear to me what the desired behaviour is here, which is the main reason why I didn't do anything. I thought that the point of the prefixes section was so that external resources could be referred to without using URLs - specify the URL there and the importer Does The Right Thing automatically. file:// (or file:///) don't work at all due to bugs in the underlying hbreader package. I don't know whether this is a problem throughout the codebase, but I think I would consider ditching the package and using something more standard if so.

The docs/code are a bit ambiguous as the code enforces prefixes but the docs (or at least the imports slot range, uriorcurie) suggest URLs are OK. The documentation page on imports only talks about CURIEs or local paths, so I don't think that URIs should be allowed.

Author

ETA: I've just realised it's a code comment that's leading me astray:

                    # origin schema. Imports can be a URI or Curie, and imports from the same
                    # directory don't require a ./, so if the current (sn) import is a relative

The docs are pretty clear that the field contents should be local paths or CURIEs.

Contributor

I would assume the metamodel spec of xsd:anyURI is correct over the docs, since the metamodel is a formal specification? what makes you say the docs are clear(er than the metamodel?)

Author

See https://linkml.io/linkml/schemas/imports.html#imports:

LinkML encourages modular schema development. You can split you schema into modules, and even reuse other peoples modules on the web

A schema can have a list of imports associated with it. These are specified as CURIEs or local imports.

(emphasis added)

Author

The section below describes how to import external schemas (e.g. from the web) by assigning them a prefix, giving the example of the linkml:types schema.

Contributor

@sneakers-the-rat, Apr 3, 2025

right, but in general i would think that a formal specification would be the source of truth, and docs are just a description of it that can be imprecise. for example it's not really clear what a "local import" is. Since i'm a language user i interpret that as meaning "local path," but the spec has no such ambiguity. the same thing that allows the prefix notation is what would allow URIs - the range is URIOrCurie and those curie prefixes expand into URIs.

edit: asked in chat since i'm not a core maintainer or anything and don't want to lead us off base here

Author

@ialarmedalien, Apr 3, 2025

Thanks for asking in slack.

I should emphasise that the implementation assumes that anything curie-like is treated as a prefixed entity with an entry for that prefix in the prefixes section -- i.e. if you have http://example.com/schema and ftp://my-fave-schemas.com/path/to/file in your imports, it expects there to be entries for http and ftp in the prefixes section, and throws an error if there are not.
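
The behaviour described above can be pictured roughly as follows. This is a hypothetical sketch, not the SchemaView code: `expand_import` and the `prefixes` dictionary are names assumed for illustration, and the lookup is deliberately simplified.

```python
from typing import Dict


def expand_import(imported: str, prefixes: Dict[str, str]) -> str:
    """Treat anything containing a colon as prefix:local and look the prefix up."""
    if ":" not in imported:
        return imported  # plain local import, nothing to expand
    prefix, local = imported.split(":", 1)
    if prefix not in prefixes:
        # A URL such as http://... ends up here, because "http" is read as a prefix.
        raise ValueError(f"No prefix mapping for {prefix!r} in import {imported!r}")
    return prefixes[prefix] + local


# Illustrative prefix map with a single entry.
prefixes = {"linkml": "https://w3id.org/linkml/"}

print(expand_import("linkml:types", prefixes))
try:
    expand_import("http://example.com/schema", prefixes)
except ValueError as err:
    print(err)
```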

-        # File with URL schemes are not modified
-        return imported_sch
-
-    if WINDOWS:
-        path = PurePath(os.path.normpath(PurePath(source_sch).parent / imported_sch)).as_posix()
-    else:
-        path = os.path.normpath(str(Path(source_sch).parent / imported_sch))
-
-    if imported_sch.startswith(".") and not path.startswith("."):
-        # Above condition handles cases where both source schema and imported schema are relative paths: these should remain relative
-        return f"./{path}"
Comment on lines -124 to -126
Author

@ialarmedalien, Mar 31, 2025

Why?

It looks like this was put in to fudge test results -- that is not a good reason to keep it in.

Contributor

see #368 (comment)

they are functionally equivalent. i mildly prefer having explicit relative path annotations, or some way of telling that these are relative paths/paths at all, but yes string conventions are a weak way of doing that compared to proper typing

Author

@ialarmedalien, Apr 1, 2025

I agree with your (linked) comment about wanting the explicit paths -- when I first looked at this code, I was wondering about the possibility of using Path objects but that turned out to be a non-trivial exercise to implement. More than wanting explicit paths, I would prefer everything to be treated uniformly: either keep everything as it is in the source, or convert everything to absolute paths. The previous approach altered some paths but not others, which didn't sit well with a pedant like me!


-    return path
Author

summary of logic in this function:

  • if the imported_sch path is absolute or is URI / CURIE-like, leave as-is
  • normalise the imported_sch path, assuming that source_sch and imported_sch are siblings
  • add ./ to the normalised imported_sch path if it originally started with ./ (not really necessary)

==> this can be refactored and simplified
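
One possible shape for that simplification, covering only the first two bullets, is sketched below. It assumes POSIX-style separators throughout (so it ignores the Windows branch), uses the colon test suggested earlier in this review instead of `urlparse`, and mirrors the "source has a parent folder" condition from the replacement code in this PR. The name `resolve_import_sketch` is hypothetical.

```python
import posixpath


def resolve_import_sketch(source_sch: str, imported_sch: str) -> str:
    """Resolve imported_sch relative to source_sch (POSIX-style paths only)."""
    # Absolute paths and anything URI / CURIE-like are returned unchanged.
    if posixpath.isabs(imported_sch) or ":" in imported_sch:
        return imported_sch
    # A source without a parent folder leaves the import exactly as written.
    if "/" not in source_sch:
        return imported_sch
    # Otherwise treat the import as a sibling of the source schema and normalise.
    parent = posixpath.dirname(source_sch)
    return posixpath.normpath(posixpath.join(parent, imported_sch))


print(resolve_import_sketch("subdir/subschema", "./types"))
print(resolve_import_sketch("a/b/main", "../../c/cousin"))
print(resolve_import_sketch("main", "./types"))
print(resolve_import_sketch("subdir/subschema", "linkml:types"))
```

The third bullet (re-adding `./`) is dropped on purpose, so every resolved path is treated the same way.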



@dataclass
class SchemaUsage:
@@ -319,14 +300,24 @@ def imports_closure(self, imports: bool = True, traverse: Optional[bool] = None,
                     # path, and the target import doesn't have : (as in a curie or a URI)
                     # we prepend the relative path. This WILL make the key in the `schema_map` not
                     # equal to the literal text specified in the importing schema, but this is
-                    # essential to sensible deduplication: eg. for
+                    # essential to sensible deduplication: e.g. for
                     # - main.yaml (imports ./types.yaml, ./subdir/subschema.yaml)
                     # - types.yaml
                     # - subdir/subschema.yaml (imports ./types.yaml)
                     # - subdir/types.yaml
                     # we should treat the two `types.yaml` as separate schemas from the POV of the
                     # origin schema.
-                    i = _resolve_import(sn, i)
+
+                    # if i is not a CURIE and sn looks like a path with at least one parent folder,
+                    # normalise i with respect to sn
+                    if "/" in sn and ":" not in i:
+                        if WINDOWS:
+                            # This cannot be simplified. os.path.normpath() must be called before .as_posix()
+                            i = PurePath(
+                                os.path.normpath(PurePath(sn).parent / i)
+                            ).as_posix()
+                        else:
+                            i = os.path.normpath(str(Path(sn).parent / i))
                     todo.append(i)

# add item to closure
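
A plausible reading of why the order in the WINDOWS branch matters: `normpath` on Windows rewrites separators to backslashes, so it has to run before `.as_posix()` rather than after. The sketch below reproduces this on any platform by using `ntpath` (the module behind `os.path` on Windows) and `PureWindowsPath`; the sample paths are made up.

```python
import ntpath
from pathlib import PureWindowsPath

# Made-up source schema and relative import.
joined = PureWindowsPath("schemas/sub/main") / ".." / "../types"

# pathlib never collapses "..", so as_posix() alone keeps the segments.
unnormalised = joined.as_posix()

# Calling as_posix() first and normalising afterwards brings the backslashes back.
wrong_order = ntpath.normpath(joined.as_posix())

# Normalising first and converting last gives a clean forward-slash path.
right_order = PureWindowsPath(ntpath.normpath(str(joined))).as_posix()

print(unnormalised)
print(wrong_order)
print(right_order)
```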
@@ -0,0 +1,13 @@
id: four
name: import_four
title: Import Four
description: |
  Import loaded by the StepChild class.
imports:
  - linkml:types
classes:
  Four:
    attributes:
      value:
        range: string
        ifabsent: "Four"
@@ -0,0 +1,14 @@
id: one
name: import_one
title: Import One
description: |
  Import loaded by the StepChild class.
imports:
  - linkml:types
  - two
classes:
  One:
    attributes:
      value:
        range: string
        ifabsent: "One"
@@ -0,0 +1,16 @@
id: stepchild
name: stepchild
title: stepchild
description: |
  Child class that imports files in the same directory as itself without consistently using `./` in the link notation.
imports:
  - linkml:types
  - one
  - two
  - ./three
classes:
  StepChild:
    attributes:
      value:
        range: string
        ifabsent: "StepChild"
@@ -0,0 +1,14 @@
id: three
name: import_three
title: Import Three
description: |
  Import loaded by the StepChild class.
imports:
  - linkml:types
  - ./four
classes:
  Three:
    attributes:
      value:
        range: string
        ifabsent: "Three"
@@ -0,0 +1,13 @@
id: two
name: import_two
title: Import Two
description: |
  Import loaded by the StepChild class.
imports:
  - linkml:types
classes:
  Two:
    attributes:
      value:
        range: string
        ifabsent: "Two"
@@ -8,10 +8,11 @@ imports:
   - ../../L0_1/cousin
   - ./L2_0_0_0/child
   - ./L2_0_0_1/child
+  - L2_0_0_2/stepchild
 classes:
   Main:
     description: "Our intrepid main class!"
     attributes:
       value:
         range: string
-        ifabsent: "Main"
+        ifabsent: "Main"
10 changes: 8 additions & 2 deletions tests/test_utils/test_schemaview.py
@@ -19,7 +19,7 @@

 SCHEMA_NO_IMPORTS = Path(INPUT_DIR) / 'kitchen_sink_noimports.yaml'
 SCHEMA_WITH_IMPORTS = Path(INPUT_DIR) / 'kitchen_sink.yaml'
-SCHEMA_WITH_STRUCTURED_PATTERNS = Path(INPUT_DIR) / "pattern-example.yaml"
+SCHEMA_WITH_STRUCTURED_PATTERNS = Path(INPUT_DIR) / 'pattern-example.yaml'
 SCHEMA_IMPORT_TREE = Path(INPUT_DIR) / 'imports' / 'main.yaml'
 SCHEMA_RELATIVE_IMPORT_TREE = Path(INPUT_DIR) / 'imports_relative' / 'L0_0' / 'L1_0_0' / 'main.yaml'
 SCHEMA_RELATIVE_IMPORT_TREE2 = Path(INPUT_DIR) / 'imports_relative' / 'L0_2' / 'main.yaml'
@@ -357,7 +357,7 @@ def test_caching():
     view.add_class(ClassDefinition('X'))
     assert len(['X']) == len(view.all_classes())
     view.add_class(ClassDefinition('Y'))
-    assert len(['X', 'Y']) == len(view.all_classes())
+    assert len(['X', 'Y']) == len(view.all_classes())
     # bypass view method and add directly to schema;
     # in general this is not recommended as the cache will
     # not be updated
@@ -546,6 +546,11 @@ def test_imports_relative():
         '../L1_0_1/dupe',
         './L2_0_0_0/child',
         './L2_0_0_1/child',
+        'L2_0_0_2/two',
+        'L2_0_0_2/one',
+        'L2_0_0_2/four',
+        'L2_0_0_2/three',
+        'L2_0_0_2/stepchild',
         'main'
     ]
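
The five new keys follow from applying the normalisation rule in this PR to the fixture schemas above. The sketch below walks only the `L2_0_0_2` branch with a simplified, POSIX-only stand-in for that rule; `FIXTURE_IMPORTS`, `normalise` and `closure_keys` are hypothetical names, `linkml:types` is left out for brevity, and the traversal order is not claimed to match SchemaView's.

```python
import posixpath

# Import lists copied from the fixture schemas in this PR (linkml:types omitted).
FIXTURE_IMPORTS = {
    "L2_0_0_2/stepchild": ["one", "two", "./three"],
    "L2_0_0_2/one": ["two"],
    "L2_0_0_2/two": [],
    "L2_0_0_2/three": ["./four"],
    "L2_0_0_2/four": [],
}


def normalise(sn: str, i: str) -> str:
    """Resolve import i against importing schema sn when sn has a parent folder."""
    if "/" in sn and ":" not in i:
        return posixpath.normpath(posixpath.join(posixpath.dirname(sn), i))
    return i


def closure_keys(root: str) -> list:
    """Collect every schema key reachable from root, each one only once."""
    seen, todo = [], [root]
    while todo:
        sn = todo.pop()
        if sn in seen:
            continue
        seen.append(sn)
        todo.extend(normalise(sn, i) for i in FIXTURE_IMPORTS[sn])
    return seen


print(sorted(closure_keys("L2_0_0_2/stepchild")))
```

`two` is imported both as `two` (from `stepchild` and `one`) and would be reachable as `./two` as well; after normalisation both spellings map to the single key `L2_0_0_2/two`.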

@@ -716,6 +721,7 @@ def test_slot_inheritance():
     with pytest.raises(ValueError):
         view.slot_ancestors('s5')

+
 def test_attribute_inheritance():
     """
     Tests attribute inheritance edge cases.