Update README.md

ccb-hms · Jun 5, 2024 · ae99a2a
1 parent 133be13
commit ae99a2a
Showing 1 changed file with 25 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -46,7 +46,7 @@ dfd = text2term.map_terms(source_terms={"asthma":"disease", "acute bronchitis":[
   <summary><b>Examples of Programmatic Caching</b></summary>
 
 ### Examples of Programmatic Caching
-text2term supports caching an ontology for repeated use. The next example caches an ontology and gives it a name for use later on
+text2term supports caching an ontology for repeated use. Here we cache an ontology and give it a name for later use:
 ```python
 mondo = text2term.cache_ontology(ontology_url="http://purl.obolibrary.org/obo/mondo.owl", 
                                  ontology_acronym="MONDO")
@@ -107,7 +107,7 @@ python text2term.py -s test/unstruct_terms.txt -t test/mondo.owl -iris http://pu
 While MONDO uses terms from other ontologies such as CHEBI and Uberon, the tool only considers terms whose IRIs start either with "http://purl.obolibrary.org/obo/mondo" or "http://identifiers.org/hgnc".
 
 ---
-Cache an ontology for repeated use, by first running the tool as usual while instructing it to cache the ontology using `-c <name>`:
+Cache an ontology for repeated use by running the tool while instructing it to cache the ontology via `-c <name>`:
 ```shell
 python text2term -s test/unstruct_terms.txt -t http://purl.obolibrary.org/obo/mondo.owl -c MONDO
 ```
@@ -157,9 +157,14 @@ The function returns a pandas `DataFrame` containing the generated ontology mapp
    - Unmapped terms can still be included in the output if `incl_unmapped` is True
 
 `target_ontology`&mdash;Path, URL or name of 'target' ontology to map the source terms to
-: Ontology names can be given as values to `target_ontology` (eg "EFO" or "CL")--text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names.
-: When using BioPortal or Zooma, this should be a comma-separated list of ontology acronyms (eg 'EFO,HPO') or **'all'** to search all ontologies.
-: When the target ontology has been cached, this should be the ontology name given when it was first cached.
+
+> [!TIP]
+> Ontology names can be given as values to `target_ontology` (e.g., "EFO" or "CL"); text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names.
+>
+> Similarly, when the target ontology has been cached, use the name it was given when it was cached.
+
+> [!NOTE]
+> When using BioPortal or Zooma, this should be a comma-separated list of ontology acronyms (e.g., 'EFO,HPO') or **'all'** to search all ontologies.
 
 `base_iris`&mdash;Map only to ontology terms whose IRIs start with one of the strings given in this tuple
 
@@ -171,8 +176,7 @@ The function returns a pandas `DataFrame` containing the generated ontology mapp
 
 `separator`&mdash;Character that separates columns when input is a table (eg '\t' for TSV) 
 
-`mapper`&mdash;Method used to compare source terms with ontology terms
-    : One of levenshtein, jaro, jarowinkler, jaccard, fuzzy, tfidf, zooma, bioportal
+`mapper`&mdash;Method used to compare source terms with ontology terms. One of `levenshtein, jaro, jarowinkler, jaccard, fuzzy, tfidf, zooma, bioportal` (see [Supported Mappers](#supported-mappers))
 
 `max_mappings`&mdash;Maximum number of top-ranked mappings returned per source term
 
@@ -307,18 +311,22 @@ To display a help message with descriptions of tool arguments do:
 
 The mapping score associated with each mapping is indicative of how similar an input term is to an ontology term (via its labels or synonyms). The mapping/similarity scores generated by text2term are the result of applying one of the following "mappers":
 
-TF-IDF-based mapper
-: [TF-IDF](https://en.wikipedia.org/wiki/Tf–idf), a statistical measure often used in information retrieval, measures how important a word is to a document in a corpus of documents. We first generate TF-IDF-based vectors of the source terms and of labels and synonyms of ontology terms. Then we compute the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between vectors to determine how similar a source term is to a target term (label or synonym).
+**TF-IDF-based mapper**&mdash;[TF-IDF](https://en.wikipedia.org/wiki/Tf–idf) is a statistical measure often used in information retrieval that measures how important a word is to a document in a corpus of documents. We first generate TF-IDF-based vectors of the source terms and of labels and synonyms of ontology terms. Then we compute the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between vectors to determine how similar a source term is to a target term (label or synonym).
+
+**BioPortal Web API-based mapper**&mdash;uses an interface to the [BioPortal Annotator](https://bioportal.bioontology.org/annotator) that we built to allow mapping terms to ontologies in the [BioPortal](https://bioportal.bioontology.org) repository.
+
+> [!IMPORTANT]
+> Make sure to specify the target ontology name(s) as they appear in BioPortal.
 
-BioPortal Web API-based mapper 
-: uses an interface to the [BioPortal Annotator](https://bioportal.bioontology.org/annotator) that we built to allow mapping terms to ontologies in the [BioPortal](https://bioportal.bioontology.org) repository. To use it, make sure to specify the target ontology name(s) as they appear in BioPortal. 
+> [!WARNING]
+> There are no confidence scores associated with BioPortal annotations, so the mapping score of all mappings is set to 1.
 
-: _Note_: there are no confidence scores associated with BioPortal annotations, so we decided to set the mapping score of all mappings to 1.
+**Zooma Web API-based mapper**&mdash;uses a [Zooma](https://www.ebi.ac.uk/spot/zooma/) interface that we built to allow mapping terms to ontologies in the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols4) repository. 
 
-Zooma Web API-based mapper
-: uses a [Zooma](https://www.ebi.ac.uk/spot/zooma/) interface that we built to allow mapping terms to ontologies in the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols4) repository. To use it, make sure to specify the target ontology name(s) as they appear in OLS. 
+> [!IMPORTANT]
+> Make sure to specify the target ontology name(s) as they appear in OLS.
 
-Syntactic distance-based mappers
-: text2term provides support for commonly used and popular syntactic (edit) distance metrics. Specifically, we implemented support for Levenshtein, Jaro, Jaro-Winkler, Jaccard, and Indel metrics. We use the [nltk](https://pypi.org/project/nltk/) package to compute Jaccard distances, and [rapidfuzz](https://pypi.org/project/rapidfuzz/) for all others.  
+**Syntactic distance-based mappers**&mdash;text2term supports commonly used syntactic (edit) distance metrics: Levenshtein, Jaro, Jaro-Winkler, Jaccard, and Indel. We use the [nltk](https://pypi.org/project/nltk/) package to compute Jaccard distances and [rapidfuzz](https://pypi.org/project/rapidfuzz/) to compute all others.  
 
-_Note_: syntactic distance-based mappers and Web API-based mappers perform slowly (much slower than the TF-IDF mapper). The former because they do pairwise comparisons between each input string and each ontology term label/synonym. In the Web API-based approaches there are networking and API load overheads.
+> [!NOTE]
+> Syntactic distance-based mappers and Web API-based mappers are much slower than the TF-IDF mapper. The former perform pairwise comparisons between each input string and each ontology term label/synonym; the latter incur networking and API load overheads.
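
For readers of the revised "TF-IDF-based mapper" paragraph, the idea of vectorizing terms and scoring them by cosine similarity can be sketched in plain Python. This is a minimal illustration, not text2term's implementation: the character-trigram tokenization, the smoothed IDF formula, and the helper names (`char_ngrams`, `tfidf_vectors`, `cosine`, `best_match`) are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded string into overlapping character n-grams.

    Assumption: character trigrams are used here purely for illustration.
    """
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict of token -> weight) per document."""
    tokenized = [Counter(char_ngrams(doc)) for doc in documents]
    # Document frequency: in how many documents each token occurs.
    doc_freq = Counter(token for counts in tokenized for token in counts)
    n_docs = len(documents)
    vectors = []
    for counts in tokenized:
        total = sum(counts.values())
        vectors.append({
            # Term frequency times a smoothed inverse document frequency
            # (the smoothing is an assumption of this sketch).
            token: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1)
            for token, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(source_term, target_labels):
    """Return the target label most similar to source_term, with its score."""
    vectors = tfidf_vectors([source_term] + list(target_labels))
    source_vec, target_vecs = vectors[0], vectors[1:]
    scores = [cosine(source_vec, target_vec) for target_vec in target_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return target_labels[best], scores[best]


# Hypothetical labels standing in for ontology term labels/synonyms.
labels = ["asthma", "acute bronchitis", "chronic obstructive pulmonary disease"]
match, score = best_match("asthmatic", labels)
print(match, round(score, 3))
```

Because every term is vectorized once and then compared by a dot product, this approach avoids computing a separate edit distance for every source/target pair, which is consistent with the note above that the syntactic distance-based mappers are slower.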