-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that relative imports can be imported without requiring ./
in front of the import file name
#350
base: main
Are you sure you want to change the base?
Conversation
@sierra-moxon any chance of a review? should be pretty simple. |
151aa7c
to
adb8ed1
Compare
linkml_runtime/utils/schemaview.py
Outdated
@@ -303,7 +303,7 @@ def imports_closure(self, imports: bool = True, traverse: Optional[bool] = None, | |||
# - subdir/types.yaml | |||
# we should treat the two `types.yaml` as separate schemas from the POV of the | |||
# origin schema. | |||
if sn.startswith('.') and ':' not in i: | |||
if '/' in sn and ':' not in i: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want an additional check for something being an absolute path here:
if '/' in sn and ':' not in i: | |
if '/' in sn and ':' not in i and not PurePath(sn).is_absolute(): |
but otherwise this seems fine. the ':' check still protects against this breaking other protocol schemes.
only downside i can think of are that by allowing relative paths without leading ./
or ../
, they become less explicit, but shouldn't affect the way imports are normally used now. maybe a tiny chance of some weird json <-> yaml escaped slash problem, but that would be a json parsing bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you suggest a test case to add where the importer will fail without the test for an absolute path? I tried to add as many variants of path representation as possible to the tests so I'd like to have at least one path that would fail if the suggested addition was not in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure - should be any second-level schema imported via absolute path. one second
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh! pleasantly surprised to learn that an absolute path causes the thing it's being appended to to be ignored. learned something new :). Disregard this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking - glad I could provide a surprising learning experience for the day!
It seems that it breaks Windows but you don't see it because #360 is not yet merged. Also running your PR-branch locally on Win10 gives an error: (.venv) λ pytest -k schemaview
======================================= test session starts =======================================
platform win32 -- Python 3.10.11, pytest-8.0.1, pluggy-1.4.0
rootdir: C:\Users\dlinke\MyProg_local\gh-dalito\linkml-runtime
collected 663 items / 630 deselected / 33 selected
tests\test_issues\test_linkml_runtime_issue_1317.py .s. [ 9%]
tests\test_utils\test_schemaview.py ............F.s............... [100%]
============================================ FAILURES =============================================
______________________________________ test_imports_relative ______________________________________
def test_imports_relative():
"""Relative imports from relative imports should evaluate relative to the *importing* schema."""
sv = SchemaView(SCHEMA_RELATIVE_IMPORT_TREE)
> closure = sv.imports_closure(imports=True)
tests\test_utils\test_schemaview.py:524:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
linkml_runtime\utils\schemaview.py:281: in imports_closure
imported_schema = self.load_import(sn)
linkml_runtime\utils\schemaview.py:227: in load_import
schema = load_schema_wrap(sname + '.yaml', base_dir=base_dir)
linkml_runtime\utils\schemaview.py:84: in load_schema_wrap
schema = yaml_loader.load(path, target_class=SchemaDefinition, **kwargs)
linkml_runtime\loaders\loader_root.py:76: in load
results = self.load_any(*args, **kwargs)
linkml_runtime\loaders\yaml_loader.py:41: in load_any
data_as_dict = self.load_as_dict(source, base_dir=base_dir, metadata=metadata)
linkml_runtime\loaders\yaml_loader.py:27: in load_as_dict
data = self._read_source(source, base_dir=base_dir, metadata=metadata, accept_header="text/yaml, application/yaml;q=0.9")
linkml_runtime\loaders\loader_root.py:167: in _read_source
data = hbread(source, metadata, base_dir, accept_header)
.venv\lib\site-packages\hbreader\__init__.py:260: in hbread
with hbopen(source, open_info, base_path, accept_header, is_actual_data, read_codec) as f:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
source = './four.yaml'
open_info = FileInfo(source_file=None, source_file_date=None, source_file_size=None, base_path='C:\\Users\\dlinke\\MyProg_local\\gh-dalito\\linkml-runtime\\tests\\test_utils\\input\\imports_relative\\L0_0\\L1_0_0')
base_path = 'C:\\Users\\dlinke\\MyProg_local\\gh-dalito\\linkml-runtime\\tests\\test_utils\\input\\imports_relative\\L0_0\\L1_0_0'
accept_header = 'text/yaml, application/yaml;q=0.9'
is_actual_data = <function default_str_tester at 0x00000199BCE57760>, read_codec = None
def hbopen(source: HB_TYPE,
open_info: Optional[FileInfo] = None,
base_path: Optional[str] = None,
accept_header: Optional[str] = None,
is_actual_data: Optional[Callable[[str], bool]] = default_str_tester,
read_codec: str = None) -> TextIO:
"""
Return an open IO representation of source
:param source: anything that can be construed to be a string, a URL, a file name or an open file handle
:param open_info: what we learned about source in the process of converting it
:param base_path: Base to use if source is a relative URL or file name
:param accept_header: Accept header to use if it turns out to be a URL
:param is_actual_data: Function to differentiate plain text from URL or file name
:param read_codec: Name of codec to use if bytes being read. (URL only)
:return: TextIO representation of open file
"""
source_type = detect_type(source, base_path, is_actual_data)
if source_type is HBType.STRINGABLE:
source_as_string = str(source)
elif source_type is HBType.DECODABLE:
# TODO: Tie this into the autodetect machinery
source_as_string = source.decode()
elif source_type is HBType.STRING:
source_as_string = source
else:
source_as_string = None
# source is a URL or a file name
if source_as_string:
if open_info:
assert open_info.source_file is None, "source_file parameter not allowed if data is a file or URL"
assert open_info.source_file_date is None, "source_file_date parameter not allowed if data is a file or URL"
open_info.source_file_size = len(source_as_string)
return StringIO(source_as_string)
if source_type is HBType.URL:
url = source if '://' in source else urljoin(base_path + ('' if base_path.endswith('/') else '/'),
source, allow_fragments=True)
req = Request(quote(url, '/:'))
if accept_header:
req.add_header("Accept", accept_header)
try:
response = urlopen(req, context=ssl._create_unverified_context())
except HTTPError as e:
# This is here because the message out of urllib doesn't include the file name
e.msg = f"{e.filename}"
raise e
if open_info:
open_info.source_file = response.url
open_info.source_file_date = response.headers['Last-Modified']
if not open_info.source_file_date:
open_info.source_file_date = response.headers['Date']
open_info.source_file_size = response.headers['Content-Length']
parts = urlsplit(response.url)
open_info.base_path = urlunsplit((parts.scheme, parts.netloc, os.path.dirname(parts.path),
parts.query, None))
# Auto convert byte stream to
return _to_textio(response, response.fp.mode, read_codec)
if source_type is HBType.FILENAME:
if not base_path:
fname = os.path.abspath(source)
else:
fname = source if os.path.isabs(source) else os.path.abspath(os.path.join(base_path, source))
> f = open(fname, encoding=read_codec if read_codec else 'utf-8')
E FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\dlinke\\MyProg_local\\gh-dalito\\linkml-runtime\\tests\\test_utils\\input\\imports_relative\\L0_0\\L1_0_0\\four.yaml'
.venv\lib\site-packages\hbreader\__init__.py:206: FileNotFoundError
===================================== short test summary info =====================================
FAILED tests/test_utils/test_schemaview.py::test_imports_relative - FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\dlinke\\MyProg_local\\gh-da...
============== 1 failed, 30 passed, 2 skipped, 630 deselected, 37 warnings in 9.81s =============== |
I have asked on dev-channel if #360 can be merged. |
#360 now merged! I think it's best if one of you two handle the resulting conflict... |
30b2186
to
f5d5961
Compare
@dalito it looks like the GA tests passed; would you be able to see if they pass locally on Win10? I don't have a windows box available. |
@ialarmedalien - The tests pass now on Win10. Great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for fixing this. the ./
check always felt fragile to me. always love it when the fix is so simple because the stdlib tools are better than i knew they were!
no idea why upstream tests are failing, but doesn't look related to this. approving for the substance of the PR and hopefully that's uh just a temporary glitch
I don't have merge privs so can someone who does press the magic button (or make a mental note to press the magic button if there are other PRs that depend on this)? |
Before we merge, lets make sure we address the upstream failing tests (as an out here, we could post a note to why they are failing in this PR and we can get someone to fix separately). (been burned so many times by merging without the green checks that I am def hesitant to do that - esp on Fridays! :P ) |
the failures are truly mysterious to me, it's swapping out |
I just triggered a re-run of failed jobs. Let's see. |
The issue comes from the linkml/linkml repo, and more specifically in the compliance tests, where there are tests failing on py 3.8, 3.9, and 3.10. py3.11 added support for parsing more datetime formats to datetime.fromisoformat and the tests in question assume that this support is in place. |
f5d5961
to
a2c1082
Compare
a2c1082
to
b5cbc5a
Compare
sorry, approving the idea of adding more tests for this, once they pass |
tests/test_utils/test_schemaview.py
Outdated
@@ -544,8 +544,13 @@ def test_imports_relative(): | |||
'../../L0_1/L1_1_0/index', | |||
'../../L0_1/cousin', | |||
'../L1_0_1/dupe', | |||
'./L2_0_0_0/child', | |||
'./L2_0_0_1/child', | |||
'L2_0_0_0/child', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged impl ensures ./
prepended to relative child paths in the internal representation but doesn't require them in input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I am looking over the changes that were made because I'm suspicious that the "fix" (specifically, using urlparse
and then checking for scheme
) isn't working how people think it is, but rather as a side effect. urlparse
is extremely lax in what it accepts and will parse; it does not validate whether or not what you give it is an URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, the only thing it's used for there is to check if there is a scheme component, so passing other strings is desired behavior? but yeah if you find some bug then by all means patch it
if os.path.isabs(imported_sch): | ||
# Absolute import paths are not modified | ||
return imported_sch | ||
if urlparse(imported_sch).scheme: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urlparse
does not check for URL validity, which isn't obvious if you haven't read the docs for the function. A better test here would be for :
, which should only occur in well-formed URLs and CURIEs, but not in file system paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also -- imported_sch
starting with file://
and file:///
do not resolve and http
/ https
/ etc. will be rejected unless there's a mapping in the prefixes
section of the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz add failing tests illustrating these cases and then fix :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear to me what the desired behaviour is here, which is the main reason why I didn't do anything. I thought that the point of the prefixes
section was so that external resources could be referred to without using URLs - specify the URL there and the importer Does The Right Thing automatically. file://
(or file:///
) don't work at all due to bugs in the underlying hbreader
package. I don't know whether this is a problem throughout the codebase, but I think I would consider ditching the package and using something more standard if so.
The docs/code are a bit ambiguous as the code enforces prefixes but the docs (or at least the imports
slot range, uriorcurie
) suggest URLs are OK. The documentation page on imports only talks about CURIEs or local paths, so I don't think that URIs should be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ETA: I've just realised it's a code comment that's leading me astray:
# origin schema. Imports can be a URI or Curie, and imports from the same
# directory don't require a ./, so if the current (sn) import is a relative
The docs are pretty clear that the field contents should be local paths or CURIEs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume the metamodel spec of xsd:anyURI is correct over the docs, since the metamodel is a formal specification? what makes you say the docs are clear(er than the metamodel?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://linkml.io/linkml/schemas/imports.html#imports:
LinkML encourages modular schema development. You can split you schema into modules, and even reuse other peoples modules on the web
A schema can have a list of imports associated with it. These are specified as CURIEs or local imports.
(emphasis added)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The section below describes how to import external schemas (e.g. from the web) by assigning them a prefix, giving the example of the linkml:types
schema.
if imported_sch.startswith(".") and not path.startswith("."): | ||
# Above condition handles cases where both source schema and imported schema are relative paths: these should remain relative | ||
return f"./{path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
It looks like this was put in to fudge test results -- that is not a good reason to keep it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #368 (comment)
they are functionally equivalent. i mildly prefer having explicit relative path annotations, or some way of telling that these are relative paths/paths at all, but yes string conventions are a weak way of doing that compared to proper typing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your (linked) comment about wanting the explicit paths -- when I first looked at this code, I was wondering about the possibility of using Path
objects but that turned out to be a non-trivial exercise to implement. More than wanting explicit paths, I would prefer everything to be treated uniformly: either keep everything as it is in the source, or convert everything to absolute paths. The previous approach altered some paths but not others, which didn't sit well with a pedant like me!
# Above condition handles cases where both source schema and imported schema are relative paths: these should remain relative | ||
return f"./{path}" | ||
|
||
return path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summary of logic in this function:
- if the
imported_sch
path is absolute or is URI / CURIE-like, leave as-is - normalise the
imported_sch
path, assuming thatsource_sch
andimported_sch
are siblings - add
./
to the normalisedimported_sch
path if it originally started with./
(not really necessary)
==> this can be refactored and simplified
upstream_repo: sneakers-the-rat/linkml
upstream_branch: python-3.13
fixes linkml/linkml#2422
Ensures that relative imports work correctly without having
./
in front of the import file name.