Commit 168936d

Doc fixes & more metadata in pyproject for PyPI (#147)

Authored Aug 19, 2024 · 1 parent 7405c9c

* Add more pkg metadata including link to repo & docs
* Add link to repo & other minor doc fixes
* Fix properties in pyproject for poetry
* Fixes in docs to reduce sphinx warnings. The only remaining warnings relate to the metamodels docs.
* Add utilities package to index of docs

17 files changed: +137 −136 lines

‎CONTRIBUTING.md

+2-2
@@ -29,7 +29,7 @@ For running tests, we use `pytest`.
 ## Discussion

 If you run into any issues or find certain functionality not documented/explained properly then feel free to
-raise a ticket in the project's [issue tracker](https://github.com/linkml/issues).
+raise a ticket in the project's [issue tracker](https://github.com/linkml/schema-automator/issues).
 There are issue templates to capture certain types of issues.

 ## First Time Contributors
@@ -73,7 +73,7 @@ with a 'Do Not Merge' label.
 ## How to Report a Bug

 We recommend making a new ticket for each bug that you encounter while working with KGX. Please be sure to provide
-sufficient context for a bug you are reporting. There are [Issue Templates](https://github.com/linkml/issues/new/choose)
+sufficient context for a bug you are reporting. There are [Issue Templates](https://github.com/linkml/schema-automator/issues/new/choose)
 that you can use as a starting point.

 ## How to Request an Enhancement

‎docs/_static/.gitkeep

Whitespace-only changes.

‎docs/cli.rst

+7-7
@@ -1,24 +1,24 @@
-.. cli:
+.. _cli:

-Command Line
-============
+Command Line Interface
+======================

-All Schema Automator functionality is available via the ``schemauto`` command
+All Schema Automator functionality is available via the ``schemauto`` command.

 Preamble
 --------

-.. warning ::
+.. warning::

    Previous versions had specific commands like ``tsv2linkml`` these are now deprecated.
    Instead these are now *subcommands* of the main ``schemauto`` command, and have been renamed.

-.. note ::
+.. note::

    we follow the `CLIG <https://clig.dev/>`_ guidelines as far as possible

 Main commands
----------
+-------------

 .. currentmodule:: schema_automator.cli

‎docs/index.rst

+4-3
@@ -1,13 +1,15 @@
 LinkML Schema Automator
-============================================
+=======================

 Schema Automator is a toolkit for bootstrapping and automatically enhancing schemas from a variety of sources.

+The project is open source (BSD 3-clause license) and hosted on `GitHub <https://github.com/linkml/schema-automator>`_.
+
 Use cases include:

 1. Inferring an initial schema or data dictionary from a dataset that is a collection of TSVs
 2. Automatically annotating schema elements and enumerations using the BioPortal annotator
-3. Importing from a language like RDFS/OWL
+3. Importing from a language like RDFS/OWL/SQL

 The primary output of Schema Automator is a `LinkML Schema <https://linkml.io/linkml>`_. This can be converted to other
 schema frameworks, including:
@@ -23,7 +25,6 @@ schema frameworks, including:
    :maxdepth: 3
    :caption: Contents:

-   index
    introduction
    install
    cli

‎docs/install.rst

+3-3
@@ -1,8 +1,8 @@
 Installation
-======
+============

 Direct Installation
-------------
+-------------------

 ``schema-automator`` and its components require Python 3.9 or greater.

@@ -17,7 +17,7 @@ To check this works:
    schemauto --help

 Running via Docker
-------------
+------------------

 You can use the `Schema Automator Docker Container <https://hub.docker.com/r/linkml/schema-automator>`_

‎docs/introduction.rst

+6-8
@@ -1,15 +1,13 @@
-.. _introduction:
-
 Introduction
-=======================
+============

 This is a toolkit that assists with generating and enhancing schemas and data models from a variety
 of sources.

 The primary end target is a `LinkML <https://linkml.io>`_ schema, but the framework can be used
 to generate JSON-Schema, SHACL, SQL DDL etc via the `LinkML Generator <https://linkml.io/linkml/generators>`_ framework.

-All functionality is available via a :ref:`cli`. In future there will be a web-based interface.
+All functionality is available via a :ref:`CLI <cli>`. In future there will be a web-based interface.
 The functionality is also available by using the relevant Python :ref:`packages`.

 Generalization from Instance Data
@@ -24,7 +22,7 @@ Generalizers allow you to *bootstrap* a schema by generalizing from existing dat
 * RDF instance graphs

 Importing from alternative modeling frameworks
----------------------------------
+----------------------------------------------

 See :ref:`importers`

@@ -35,7 +33,7 @@ See :ref:`importers`
 In future other frameworks will be supported

 Annotating schemas
----------------------------------
+------------------

 See :ref:`annotators`

@@ -46,7 +44,7 @@ Annotators to provide ways to automatically add metadata to your schema, includi
 * Annotate using Large Language Models (LLMs)

 General Utilities
----------------------------------
+-----------------

-See :ref:`utilitiess`
+See :ref:`utilities`

‎docs/metamodels/index.rst

-10
@@ -8,16 +8,6 @@ metamodels in order to define transformations.
    :maxdepth: 3
    :caption: Contents:

-   index
    cadsr/index
    frictionless/index
    dosdp/index
-   fhir/index
-
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`

‎docs/packages/annotators.rst

+1-3
@@ -1,7 +1,5 @@
-.. annotators:
-
 Annotators
-=========
+==========

 Importers take an existing schema and *annotate* it with information

‎docs/packages/generalizers.rst

+64-65
@@ -1,7 +1,5 @@
-.. generalizers:
-
 Generalizers
-=========
+============

 Generalizers take example data and *generalizes* to a schema
@@ -11,7 +9,7 @@ Generalizers take example data and *generalizes* to a schema
 that *semi*-automates the creation of a new schema for you.

 Generalizing from a single TSV
------------------
+------------------------------

 .. code-block::
@@ -90,65 +88,9 @@ Enums will be automatically inferred:
       Lowland Black Spruce:
         description: Lowland Black Spruce
-Chaining an annotator
------------------
-
-If you provide an ``--annotator`` option you can auto-annotate enums:
-
-.. code-block::
-
-    schemauto generalize-csv \
-      --annotator bioportal:envo \
-      tests/resources/NWT_wildfires_biophysical_2016.tsv \
-      -o wildfire.yaml
-
-.. code-block:: yaml
-
-  ecosystem_enum:
-    from_schema: https://w3id.org/MySchema
-    permissible_values:
-      Open Fen:
-        description: Open Fen
-        meaning: ENVO:00000232
-        exact_mappings:
-        - ENVO:00000232
-      Treed Fen:
-        description: Treed Fen
-        meaning: ENVO:00000232
-        exact_mappings:
-        - ENVO:00000232
-      Black Spruce:
-        description: Black Spruce
-      Poor Fen:
-        description: Poor Fen
-        meaning: ENVO:00000232
-        exact_mappings:
-        - ENVO:00000232
-      Fen:
-        description: Fen
-        meaning: ENVO:00000232
-      Lowland:
-        description: Lowland
-      Upland:
-        description: Upland
-        meaning: ENVO:00000182
-      Bog:
-        description: Bog
-        meaning: ENVO:01000534
-        exact_mappings:
-        - ENVO:01000535
-        - ENVO:00000044
-        - ENVO:01001209
-        - ENVO:01000527
-      Lowland Black Spruce:
-        description: Lowland Black Spruce
-
-The annotation can also be run as a separate step
-
-See :ref:`annotators`

 Generalizing from multiple TSVs
------------
+-------------------------------

 You can use the ``generalize-tsvs`` command to generalize from *multiple* TSVs, with
 foreign key linkages auto-inferred.
@@ -217,7 +159,7 @@ slots:
       range: string

 Generalizing from tables on the web
------------------
+-----------------------------------

 You can use ``generalize-htmltable``
@@ -274,12 +216,69 @@ Will generate:
       - TWAS P value

 Generalizing from JSON
------------
+----------------------

+tbw

+Chaining an annotator
+---------------------
+
+If you provide an ``--annotator`` option you can auto-annotate enums:
+
+.. code-block::
+
+    schemauto generalize-csv \
+      --annotator bioportal:envo \
+      tests/resources/NWT_wildfires_biophysical_2016.tsv \
+      -o wildfire.yaml
+
+.. code-block:: yaml
+
+  ecosystem_enum:
+    from_schema: https://w3id.org/MySchema
+    permissible_values:
+      Open Fen:
+        description: Open Fen
+        meaning: ENVO:00000232
+        exact_mappings:
+        - ENVO:00000232
+      Treed Fen:
+        description: Treed Fen
+        meaning: ENVO:00000232
+        exact_mappings:
+        - ENVO:00000232
+      Black Spruce:
+        description: Black Spruce
+      Poor Fen:
+        description: Poor Fen
+        meaning: ENVO:00000232
+        exact_mappings:
+        - ENVO:00000232
+      Fen:
+        description: Fen
+        meaning: ENVO:00000232
+      Lowland:
+        description: Lowland
+      Upland:
+        description: Upland
+        meaning: ENVO:00000182
+      Bog:
+        description: Bog
+        meaning: ENVO:01000534
+        exact_mappings:
+        - ENVO:01000535
+        - ENVO:00000044
+        - ENVO:01001209
+        - ENVO:01000527
+      Lowland Black Spruce:
+        description: Lowland Black Spruce
+
+The annotation can also be run as a separate step
+
+See :ref:`annotators`

-Packages
---------
+Packages for generalizing
+-------------------------

 .. currentmodule:: schema_automator.generalizers
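The "generalize from a single TSV" flow above can be illustrated with a toy sketch. This is not the real `CsvDataGeneralizer` (which also infers enums, foreign keys, etc.); the function names and the `Observation` class name here are invented for illustration, using only the standard library.

```python
# Illustrative sketch only: the kind of column-type generalization a
# TSV generalizer performs. Not the schema_automator implementation.
import csv
import io


def infer_range(values):
    """Guess a LinkML-style range for a column from its string values."""
    def all_match(cast):
        for v in values:
            try:
                cast(v)
            except ValueError:
                return False
        return True
    if all_match(int):
        return "integer"
    if all_match(float):
        return "float"
    return "string"


def generalize_tsv(text, class_name="Observation"):
    """Build a minimal class/slot sketch from a TSV string."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    slots = {col: {"range": infer_range([r[col] for r in rows])}
             for col in rows[0]}
    return {"classes": {class_name: {"slots": list(slots)}}, "slots": slots}


schema = generalize_tsv("id\tarea\tname\n1\t2.5\tFen\n2\t3.1\tBog\n")
```

Here `id` generalizes to `integer`, `area` to `float`, and `name` to `string`; the real tool then emits this structure as LinkML YAML.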

‎docs/packages/importers.rst

+7-9
@@ -1,5 +1,3 @@
-.. importers:
-
 Importers
 =========

@@ -15,7 +13,7 @@ Importers are the opposite of `Generators <https://linkml.io/linkml/generators/i
 will be created.

 Importing from JSON-Schema
---------
+--------------------------

 The ``import-json-schema`` command can be used:

@@ -24,7 +22,7 @@ The ``import-json-schema`` command can be used:
    schemauto import-json-schema tests/resources/model_card.schema.json

 Importing from Kwalify
---------
+----------------------

 The ``import-kwalify`` command can be used:

@@ -33,7 +31,7 @@ The ``import-kwalify`` command can be used:
    schemauto import-kwalify tests/resources/test.kwalify.yaml

 Importing from OWL
---------
+------------------

 You can import from a schema-style OWL ontology. This must be in functional syntax

@@ -45,7 +43,7 @@ Use robot to convert ahead of time:
    schemauto import-owl schemaorg.ofn

 Importing from SQL
---------
+------------------

 You can import a schema from a SQL database

@@ -65,7 +63,7 @@ For example, for the `RNA Central public database <https://rnacentral.org/help/p
    schemauto import-sql postgresql+psycopg2://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs

 Importing from caDSR
---------
+--------------------

 caDSR is an ISO-11179 compliant metadata registry. The ISO-11179 conceptual model can be mapped to LinkML. The
 canonical mapping maps a CDE onto a LinkML *slot*.
@@ -79,8 +77,8 @@ NCI implements a JSON serialization of ISO-11197. You can import this JSON and c
    schemauto import-cadsr "cdes/*.json"


-Packages
--------
+Packages for importing
+----------------------

 .. currentmodule:: schema_automator.importers
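The JSON-Schema importer described above can be sketched in miniature. This is a deliberately simplified stand-in for `JsonSchemaImportEngine` (which additionally handles `$ref`, nesting, OpenAPI, and more); the `TYPE_MAP` values and the sample `ModelCard` document are assumptions for illustration.

```python
# Illustrative sketch only: mapping a flat JSON-Schema object to
# LinkML-style class attributes. Not the schema_automator implementation.

# Assumed mapping from JSON-Schema primitive types to LinkML ranges.
TYPE_MAP = {"string": "string", "integer": "integer",
            "number": "float", "boolean": "boolean"}


def import_json_schema(js, class_name=None):
    """Convert one JSON-Schema object into a minimal class sketch."""
    name = class_name or js.get("title", "Root")
    required = set(js.get("required", []))
    attrs = {}
    for prop, spec in js.get("properties", {}).items():
        attrs[prop] = {
            "range": TYPE_MAP.get(spec.get("type", "string"), "string"),
            "required": prop in required,
        }
    return {"classes": {name: {"attributes": attrs}}}


model_card = {
    "title": "ModelCard",
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}, "version": {"type": "number"}},
}
schema = import_json_schema(model_card)
```

The `required` list becomes per-attribute `required` flags, and JSON-Schema `number` maps to the LinkML `float` range in this sketch.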

‎docs/packages/index.rst

+1-2
@@ -1,5 +1,3 @@
-.. packages:
-
 Packages
 ========

@@ -12,3 +10,4 @@ The code is organized into different python *packages*
    importers
    generalizers
    annotators
+   utilities

‎docs/packages/utilities.rst

-2
@@ -1,5 +1,3 @@
-.. utilities:
-
 Utilities
 =========

‎pyproject.toml

+20
@@ -6,6 +6,26 @@ authors = ["Chris Mungall", "Mark Miller", "Sierra Moxon", "Harshad Hegde"]
 license = "BSD 3-Clause"
 readme = "README.md"

+keywords = ["schema", "linked data", "data modeling", "rdf", "owl"]
+
+classifiers = [
+    "Development Status :: 5 - Production/Stable",
+    "Programming Language :: Python",
+    "Environment :: Console",
+    "Intended Audience :: Developers",
+    "Intended Audience :: Science/Research",
+    "Intended Audience :: Healthcare Industry",
+    "License :: OSI Approved :: BSD License",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3.9",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+]
+
+repository = "https://github.com/linkml/schema-automator/"
+documentation = "https://linkml.io/schema-automator/"
+
 packages = [
     { include = "schema_automator" }
 ]

‎schema_automator/cli.py

+17-17
@@ -129,7 +129,7 @@ def generalize_tsv(tsvfile, output, class_name, schema_name, pandera: bool, anno

     Example:

-        schemauto generalize-tsv --class-name Person --schema-name PersonInfo my/data/persons.tsv
+        ``schemauto generalize-tsv --class-name Person --schema-name PersonInfo my/data/persons.tsv``
     """
     kwargs = {k:v for k, v in kwargs.items() if v is not None}
     if pandera:
@@ -161,11 +161,11 @@ def generalize_tsvs(tsvfiles, output, schema_name, **kwargs):

     See :ref:`generalizers` for more on the generalization framework

-    This uses :ref:`CsvDataGeneralizer.convert_multiple`
+    This uses CsvDataGeneralizer.convert_multiple

     Example:

-        schemauto generalize-tsvs --class-name Person --schema-name PersonInfo my/data/*.tsv
+        ``schemauto generalize-tsvs --class-name Person --schema-name PersonInfo my/data/*.tsv``
     """
     ie = CsvDataGeneralizer(**kwargs)
     schema = ie.convert_multiple(tsvfiles, schema_name=schema_name)
@@ -229,7 +229,7 @@ def import_dosdps(dpfiles, output, **args):

     Example:

-        schemauto import-dosdps --range-as-enums patterns/*yaml -o my-schema.yaml
+        ``schemauto import-dosdps --range-as-enums patterns/*.yaml -o my-schema.yaml``
     """
     ie = DOSDPImportEngine()
     schema = ie.convert(dpfiles, **args)
@@ -309,7 +309,7 @@ def generalize_json(input, output, schema_name, depluralize: bool, format, omit_

     Example:

-        schemauto generalize-json my/data/persons.json -o my.yaml
+        ``schemauto generalize-json my/data/persons.json -o my.yaml``
     """
     ie = JsonDataGeneralizer(omit_null=omit_null, depluralize_class_names=depluralize)
     if inlined_map:
@@ -336,7 +336,7 @@ def generalize_toml(input, output, schema_name, omit_null, **kwargs):

     Example:

-        schemauto generalize-toml my/data/conf.toml -o my.yaml
+        ``schemauto generalize-toml my/data/conf.toml -o my.yaml``
     """
     ie = JsonDataGeneralizer(omit_null=omit_null)
     schema = ie.convert(input, format='toml', **kwargs)
@@ -365,7 +365,7 @@ def import_json_schema(input, output, import_project: bool, schema_name, format,

     Example:

-        schemauto import-json-schema my/schema/personinfo.schema.json
+        ``schemauto import-json-schema my/schema/personinfo.schema.json``
     """
     ie = JsonSchemaImportEngine(**kwargs)
     if not import_project:
@@ -390,7 +390,7 @@ def import_kwalify(input, output, schema_name, **kwargs):

     Example:

-        schemauto import-kwalify my/schema/personinfo.kwalify.yaml
+        ``schemauto import-kwalify my/schema/personinfo.kwalify.yaml``
     """
     ie = KwalifyImportEngine(**kwargs)
     schema = ie.convert(input, output, name=schema_name, format=format)
@@ -409,7 +409,7 @@ def import_frictionless(input, output, schema_name, schema_id, **kwargs):

     Example:

-        schemauto import-frictionless cfde.package.json
+        ``schemauto import-frictionless cfde.package.json``
     """
     ie = FrictionlessImportEngine(**kwargs)
     schema = ie.convert(input, name=schema_name, id=schema_id)
@@ -429,7 +429,7 @@ def import_cadsr(input, output, schema_name, schema_id, **kwargs):

     Example:

-        schemauto import-cadsr "cdes/*.json"
+        ``schemauto import-cadsr "cdes/*.json"``
     """
     ie = CADSRImportEngine()
     paths = [str(gf.absolute()) for gf in Path().glob(input) if gf.is_file()]
@@ -460,7 +460,7 @@ def import_owl(owlfile, output, **args):

     Example:

-        schemauto import-owl prov.ofn -o my.yaml
+        ``schemauto import-owl prov.ofn -o my.yaml``
     """
     sie = OwlImportEngine()
     schema = sie.convert(owlfile, **args)
@@ -509,7 +509,7 @@ def generalize_rdf(rdffile, dir, output, **args):

     Example:

-        schemauto generalize-rdf my/data/persons.ttl
+        ``schemauto generalize-rdf my/data/persons.ttl``
     """
     sie = RdfDataGeneralizer()
     if not os.path.exists(dir):
@@ -539,13 +539,13 @@ def annotate_schema(schema: str, input: str, output: str, **kwargs):

     Example:

-        schemauto annotate-schema -i bioportal: my-schema.yaml -o annotated.yaml
+        ``schemauto annotate-schema -i bioportal: my-schema.yaml -o annotated.yaml``

     This will require you setting the API key via OAK - see OAK docs.

     You can specify a specific ontology

-        schemauto annotate-schema -i bioportal:ncbitaxon my-schema.yaml -o annotated.yaml
+        ``schemauto annotate-schema -i bioportal:ncbitaxon my-schema.yaml -o annotated.yaml``

     In future OAK will support a much wider variety of annotators including:
@@ -594,13 +594,13 @@ def enrich_using_ontology(schema: str, input: str, output: str, annotate: bool,

     Example:

-        schemauto enrich-using-ontology -i bioportal: my-schema.yaml -o my-enriched.yaml
+        ``schemauto enrich-using-ontology -i bioportal: my-schema.yaml -o my-enriched.yaml``

     If your schema has no mappings you can use --annotate to add them

     Example:

-        schemauto enrich-using-ontology -i so.obo --annotate my-schema.yaml -o my-enriched.yaml --annotate
+        ``schemauto enrich-using-ontology -i so.obo --annotate my-schema.yaml -o my-enriched.yaml --annotate``
     """
     impl = get_implementation_from_shorthand(input)
     annr = SchemaAnnotator(impl)
@@ -630,7 +630,7 @@ def enrich_using_llm(schema: str, model: str, output: str, **args):

     Example:

-        pip install schema-automator[llm]
+        ``pip install schema-automator[llm]``

     """
     logging.info(f"Enriching: {schema}")

‎schema_automator/generalizers/csv_data_generalizer.py

+3-3
@@ -118,9 +118,9 @@ def infer_linkages(self, files: List[str], **kwargs) -> List[ForeignKey]:

         This procedure can generate false positives, so additional heuristics are applied. Each potential
         foreign key relationship gets an ad-hoc score:
-            - links across tables score more highly than within
-            - suffixes such as _id are more likely on PK and FK tables
-            - the foreign key column table is likely to start with the base column name
+        - links across tables score more highly than within
+        - suffixes such as _id are more likely on PK and FK tables
+        - the foreign key column table is likely to start with the base column name
         In addition, if there are competing primary keys for a table, the top scoring one is selected
         """
         fks: List[ForeignKey] = []
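The scoring heuristics listed in that docstring can be sketched as a toy function. The weights, function name, and example tables below are invented for illustration; the real `infer_linkages` scoring in `csv_data_generalizer.py` is more involved.

```python
# Illustrative sketch only: an ad-hoc scorer for candidate foreign-key
# links, following the three heuristics in the docstring above.
# Weights are arbitrary, not taken from schema_automator.

def score_candidate_fk(src_table, src_column, tgt_table, tgt_column):
    """Score a candidate link src_table.src_column -> tgt_table.tgt_column."""
    score = 0
    if src_table != tgt_table:
        score += 2  # links across tables score more highly than within
    if src_column.endswith("_id") or tgt_column.endswith("_id"):
        score += 1  # _id suffixes are typical of PK/FK columns
    base = src_column.removesuffix("_id")
    if tgt_table.lower().startswith(base.lower()):
        score += 1  # FK column often names the target table
    return score


# Pick the best-scoring candidate among hypothetical links.
best = max(
    [("order", "customer_id", "customer", "id"),
     ("order", "note", "order", "id")],
    key=lambda c: score_candidate_fk(*c),
)
```

Here the cross-table `order.customer_id -> customer.id` candidate wins, matching the intuition the docstring describes: competing candidates are ranked and the top scorer is kept.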
fks: List[ForeignKey] = []

‎schema_automator/importers/__init__.py

+1-1
@@ -2,4 +2,4 @@
 from schema_automator.importers.owl_import_engine import OwlImportEngine
 from schema_automator.importers.dosdp_import_engine import DOSDPImportEngine
 from schema_automator.importers.frictionless_import_engine import FrictionlessImportEngine
-
+from schema_automator.importers.cadsr_import_engine import CADSRImportEngine

‎schema_automator/importers/jsonschema_import_engine.py

+1-1
@@ -35,7 +35,7 @@ def json_schema_from_open_api(oa: Dict) -> Dict:
 @dataclass
 class JsonSchemaImportEngine(ImportEngine):
     """
-    A :ref:`ImportEngine` that imports a JSON-Schema representation to a LinkML Schema
+    An ImportEngine that imports a JSON-Schema representation to a LinkML Schema
     """
     use_attributes: bool = False
     is_openapi: bool = False
