|
| 1 | +# The Terms WOT manage structure explained |
| 2 | + |
| 3 | +To generate our static content site on the GitHub project page of this [WOT-terms page](https://weboftrust.github.io/WOT-terms/) repo, we have: |
| 4 | + |
| 5 | +1. wiki resources, terms in separate `.md` files in a Docusaurus directory structure |
| 6 | +2. a sheet called _Terms WOT manage_ (`.xls`) in a central location with strict editing rights |
| 7 | +3. a "comma-separated" file _Terms WOT manage_ (`.csv`), exported from Excel to any local directory. In fact, you might get a semicolon-separated text file, so check the result. |
| 8 | + |
| 9 | +<img src="./images/Terms-wot-manage-screen-example.png" alt="Terms-wot-manage-screen-example" width="800" /> |
| 10 | +<img src="./images/csv-utf-8-save-as.png" alt="csv-utf-8-save-as" width="600" /> |
| 11 | + |
| 12 | +4. a comma-separated text file _Terms WOT manage_ (`.txt`) in the root directory of the `gh-pages` branch of the `WOT-terms` repo. Use a plain text editor. |
| 13 | + |
| 14 | +<img src="./images/rename-csv-to-txt1.png" alt="Rename-csv-to-txt step1" width="400" /> |
| 15 | +<img src="./images/rename-csv-to-txt2.png" alt="Rename-csv-to-txt step2" width="400" /> |
| 16 | + |
| 17 | +#### Why not export straight from Excel to a semicolon-separated text file? |
| 18 | + |
| 19 | +In Excel the _"save as"_ option does not provide a text export with semicolons.\ |
| 20 | +The third and fourth steps have proven necessary to generate a semicolon-separated text file. |
| 21 | + |
| 22 | +> Note by Henk van Cann |
| 23 | +> Beware of non-ASCII characters still present on the first line of the file coming from Excel. |
| 24 | +> I trimmed them out with the `tr` command in my bash-tools to handle the sheet. |
| 25 | +
|
| 26 | +<code> |
| 27 | +cat "${SOURCE}" | tr -cd '\11\12\40-\176' > "${INPUT}" # keep only tab, newline and printable ASCII; this removes the non-printable characters Excel leaves in the text export |
| 28 | +</code> |
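
For readers who prefer a script over the manual rename, the clean-up and the switch to semicolons can be sketched in Python. This is only a sketch under assumptions: the stray bytes are taken to be something like a UTF-8 byte-order mark, the export is taken to be comma-separated, and the function names and the `Term` column are hypothetical. Like the `tr` call, it keeps tab, newline and printable ASCII only, so legitimate non-ASCII characters in a term would be dropped as well.

```python
import csv
import io


def clean_export(raw: bytes) -> str:
    """Keep tab, newline and printable ASCII, the same set as tr -cd '\\11\\12\\40-\\176'."""
    allowed = {0x09, 0x0A} | set(range(0x20, 0x7F))
    return bytes(b for b in raw if b in allowed).decode("ascii")


def to_semicolon_separated(text: str, source_delimiter: str = ",") -> str:
    """Re-write delimited text with ';' as the field separator."""
    reader = csv.reader(io.StringIO(text), delimiter=source_delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample: a byte-order mark followed by two rows with Windows line endings
sample = b"\xef\xbb\xbfKey,Term\r\n1,key-event-log\r\n"
result = to_semicolon_separated(clean_export(sample))
```

If the export already turns out to be semicolon-separated, pass `source_delimiter=";"` and the function only normalises the quoting.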
| 29 | + |
| 30 | +## Why a sheet and why is it called Terms WOT manage? |
| 31 | + |
| 32 | +We need a place where terms are defined and declared. A sheet of terms is very practical: |
| 33 | + |
| 34 | +- lots of software available to amend and manage sheets |
| 35 | +- many people have the skills to manage sheets with this type of software (Excel, Google Sheets, Numbers, etc.) |
| 36 | +- a sheet can enforce a notion of a unique value ('Key') in a column, a meaningless long-living identifier |
| 37 | +- sheets can be flexibly expanded so that the content gets richer: tags, categories, dictionaries, etc. |
| 38 | +- sheets can be exported to comma-separated files (CSVs) |
| 39 | +- resulting CSVs can be imported into markdown, front matter, YAML files, etc. |
| 40 | + |
| 41 | +The reason it's called '_Terms WOT manage_': |
| 42 | + |
| 43 | +1. It's the management tool for our unique term identifiers. |
| 44 | +2. It covers all the concepts, terms, categories, dictionaries and tags in the WebofTrust (WOT) field. |
| 45 | +3. The smallest unit of declaration is a 'Term'. |
| 46 | + |
| 47 | +## Term life cycle |
| 48 | + |
| 49 | +A term is hard to pin down. It might have one or more abbreviations that are already assimilated (e.g. 'PTEL' and 'Public TEL' for public-transaction-event-log). It can be lower case, upper case or mixed case (e.g. 'vLEI'). It can be singular or plural (e.g. 'OOBIs' is used more than 'OOBI'). So using the term itself as its identifying `Key` (self-referential) is a pain as long as this process hasn't completed, that is, as long as the term hasn't _hardened_ yet. This process is very different for every single term. For example, `icp: tag` doesn't even have a proper term name yet (mid 2022). |
| 50 | + |
| 51 | +### Lifecycle phases |
| 52 | + |
| 53 | +- Start: We need a 'Key' field to identify a term that will most probably change, and to prevent the database from getting polluted with duplicate terms. |
| 54 | +- Midlife: We need a 'Key' field to uniquely identify a term that might have various names used in the same (!) context. |
| 55 | +- End: A term is well-known, agreed upon by the community, therefore assimilated and unique, and we don't need 'Key' anymore. |
| 56 | + |
| 57 | +### Three simple rules (for now, feel free to comment!) |
| 58 | + |
| 59 | +1. We prefer **a singular expression** over a plural expression. So, for example, the term is OOBI, not OOBIs, unless the singular is grammatically incorrect (can't think of an example right now). |
| 60 | +2. We use **lower case** as much as we can in the long (identifying) terms. The abbreviated terms are linkers: each gets a `## See` header and a link to the longer, more meaningful term. |
| 61 | +3. **Longer compound terms take precedence** over the parts. `public TEL` or `public-transaction-event-log` is apparently worth explaining as a special form of `TEL` or `transaction-event-log`. So we first look for a hit on the longer term while parsing texts. |
| 62 | + |
| 63 | +## Conventions and comparison to database |
| 64 | + |
| 65 | +Columns are comparable to _fields_ in a database. The rows are the _records_ in a database. |
| 66 | + |
| 67 | +Field _names_ are in the first row. A few columns maintain our database-like structure: |
| 68 | + |
| 69 | +- Key: a unique, incremental, meaningless numeric identifier. The uniqueness is not enforced by code but by the user interface: conditional formatting colours cells with the same value red. The Key field becomes redundant as soon as the term itself is a well-known meaningful Key and Term at the same time, like a [country code](https://www.countrycode.org). |
| 70 | +- TTTTT_FKey: this column contains foreign keys into another table or sheet. TTTTT can be a file that has terms mentioned in a video ("PhilVid") or another glossary ("eSSIF-lab") that are related to the matching term on a specific row in our sheet. |
| 71 | +- Cat_CCCCC: this column contains categories. We consider a term to belong to a certain category when it's mentioned regularly in the content of a certain repository (e.g. 'KERI' or 'OOBI') of the WebofTrust GitHub site. |
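
Because the uniqueness of `Key` is only flagged visually in the sheet, a small check on the exported text file could catch duplicates before the site is generated. The following is a minimal sketch, assuming a semicolon-separated export with the field names in the first row; the function name `duplicate_keys`, the `Term` column and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_keys(sheet_text: str, key_column: str = "Key", delimiter: str = ";") -> list[str]:
    """Return every value that occurs more than once in the key column."""
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter=delimiter)
    counts = Counter(row[key_column] for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical export in which Key 2 was accidentally used twice
sample_sheet = "Key;Term\n1;key-event-log\n2;KEL\n2;out-of-band-introduction\n"
duplicates = duplicate_keys(sample_sheet)
```

An empty list means every `Key` is unique; anything else lists the values that need a second look in the sheet.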
| 72 | + |
| 73 | +One term per row. We give **an extra row to the abbreviation** of a term. The reason for this is a ToIP convention: |
| 74 | + |
| 75 | +- the term is lowercase and has '-' between the words of the term, e.g. 'key-event-log' |
| 76 | +- the abbreviation is uppercase and can have a hyphen, e.g. 'VC-TEL' |
| 77 | +- the term always has a corresponding .md file in the ToIP Glossary, its .md file has a '## Definition' header |
| 78 | +- the abbreviation (if relevant, which is a subjective guess by the team) also has a corresponding .md file in the ToIP Glossary, its .md file ONLY has a **'## See' header**. The 'See' contains a link to the matching term. |
| 79 | + |
| 80 | +KEL.md: |
| 81 | + |
| 82 | +``` |
| 83 | +## See |
| 84 | +[Key event log](key-event-log) |
| 85 | +``` |
| 86 | + |
| 87 | +key-event-log.md: |
| 88 | + |
| 89 | +``` |
| 90 | +## Definition |
| 91 | +A verifiable data structure that is a backward and forward chained ... |
| 92 | +``` |
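
The two kinds of file above follow a fixed shape, so they could be generated from the sheet. The sketch below only illustrates that shape; the function names are hypothetical, and deriving the link label by replacing hyphens with spaces and capitalising the first letter is an assumption based on the `KEL.md` example.

```python
def abbreviation_page(term: str) -> str:
    """Content of an abbreviation file: only a '## See' header plus a link to the term."""
    label = term.replace("-", " ").capitalize()
    return f"## See\n[{label}]({term})\n"


def term_page(definition: str) -> str:
    """Content of a term file: a '## Definition' header followed by the definition text."""
    return f"## Definition\n{definition}\n"


kel_md = abbreviation_page("key-event-log")
```

Here `kel_md` holds the same two lines as the `KEL.md` example.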
| 93 | + |
| 94 | +### Why not a database? |
| 95 | + |
| 96 | +We're generating static websites, for good reasons. [More info](https://www.cloudflare.com/en-gb/learning/performance/static-site-generator#Pros). But feel free to search for comparisons and evaluations and decide which direction you consider best, e.g. [wpamelia.com](https://wpamelia.com/static-vs-dynamic-website/?gclid=Cj0KCQjwxveXBhDDARIsAI0Q0x234PSArlYOfbriIL6u0g3RUlRST8zfdAnYtkrRSs-GJ3RdgwaCSaEaArioEALw_wcB). |
| 97 | + |
| 98 | +Because we've chosen a static site generator ([Docusaurus](https://docusaurus.io) for the time being, though there are other open-source options), **a database would be ballast**. |
| 99 | + |
| 100 | +## Counting tool |
| 101 | + |
| 102 | +The counting tool is offered by Blockchainbird.org and was developed in 2019 as a means to assess the level of real expertise in blockchain publications. It crawled through a PDF, based on dictionaries of terms and very simple business rules. |
| 103 | + |
| 104 | +> E.g. if a PDF mentions 'bitcoin' in conjunction with words like 'scam', 'tax evasion', etc., we considered the writers as not being too well informed about the true nature of the bitcoin / blockchain innovation. |
| 105 | +
|
| 106 | +### Why would we need a terms counting tool? |
| 107 | + |
| 108 | +The actual presence of a certain glossary term in documents and webpages is a strong indication of whether the term at hand is relevant in a certain section. |
| 109 | + |
| 110 | +> The term 'out-of-band' has lots of 'hits' in the OOBI repo, but much less so in the KERI repo. |
| 111 | +
|
| 112 | +Based on this relevance, expressed in an objective count, we can automatically add certain tags and categories to the term. |
| 113 | + |
| 114 | +> The term 'out-of-band' will have an impressive count in the column `Cat_OOBI`. We might offer a high-level menu item for the term in the sidebar of the WebofTrust Glossary 'OOBI'. |
| 115 | +
|
| 116 | +The other reason is that a manual check for terms in documents is a very strenuous and time-consuming effort. And the result is always outdated by definition: once you change the source, the glossary needs to be updated too. |
| 117 | + |
| 118 | +**In brief:\ |
| 119 | +We count, so we're lazily up to date.** |
| 120 | + |
| 121 | +### Redesign |
| 122 | + |
| 123 | +Recently the tool has been re-engineered towards the WOT-terms challenge: |
| 124 | + |
| 125 | +- it crawls any GitHub page and also PDFs (if necessary) |
| 126 | +- the tool uses the 'Terms WOT manage' sheet to match terms |
| 127 | +- the scores are based on a combination of parameters: |
| 128 | + 1. level of understanding needed for the term |
| 129 | + 2. number of appearances, the actual count |
| 130 | + |
| 131 | +#### Regular expression to hit |
| 132 | + |
| 133 | +- Some abbreviations are too short: acronyms such as "AN", "AID" and "SAID" also match non-relevant things in the text. We need to look for "(AN)", " AN " or, at the beginning of a paragraph, "\nAN " instead. |
| 134 | +- Multiple-word expressions like "virtual-credential-transaction-event-log" should be looked for as "virtual credential transaction event log". |
| 135 | +- The longest combination that matches exactly takes precedence in the count: |
| 136 | + "virtual-credential-transaction-event-log", then "credential transaction event log" and lastly "transaction event log". No double counts here. The same goes for acronyms: an exact match for "VC-TEL" implies that there's no count for "TEL". Lastly, this also applies within words: a hit for "keridemlia" doesn't count as a hit for "keri". |
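
The matching rules above can be sketched with one regular expression. This is not the counting tool's actual implementation, which is still being re-developed; it is a minimal illustration. It assumes case-sensitive matching, treats a hyphen in a term as "hyphen or whitespace" in the text, and uses hypothetical function names and sample text.

```python
import re
from collections import Counter


def build_pattern(terms: list[str]) -> re.Pattern:
    """One alternation with the longest terms first, so they win over their parts."""
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = [
        # a hyphen in the term may appear as hyphen(s) or whitespace in the text
        r"[-\s]+".join(re.escape(part) for part in term.split("-"))
        for term in ordered
    ]
    # the lookarounds reject hits that are embedded in a longer word
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)")


def count_terms(text: str, terms: list[str]) -> Counter:
    """Count non-overlapping hits, reported under the hyphenated form of each term."""
    counts = Counter()
    for match in build_pattern(terms).finditer(text):
        canonical = re.sub(r"[-\s]+", "-", match.group(0))
        counts[canonical] += 1
    return counts


terms = [
    "virtual-credential-transaction-event-log",
    "credential-transaction-event-log",
    "transaction-event-log",
    "VC-TEL",
    "TEL",
    "keri",
]
text = "The virtual credential transaction event log (VC-TEL) is a TEL. keridemlia is not keri."
counts = count_terms(text, terms)
```

Because matches never overlap and the longest alternative is tried first at each position, the long form is counted once and its shorter parts are not counted again. The word-boundary lookarounds keep "keri" from matching inside "keridemlia" and let a short acronym such as "AN" match in "(AN)" but not inside another word.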
| 137 | + |
| 138 | +### Results |
| 139 | + |
| 140 | +The counts of terms are in the Cat_CCCCC columns after a (re)run of the counting tool. |
| 141 | + |
| 142 | +| TBW prio 1: the tool is currently being re-developed, August 18 2022 | |
| 143 | + |
| 144 | +## Why do we need this? |
| 145 | + |
| 146 | +- Key: We might need a Key field **to be able to** have a unique long-living identifier for a term in the WebofTrust domain. However, any term goes through a life cycle, with the end state of a term being well-known, unchanged for a while and unique. The `Key` field has become superfluous by then. |
| 147 | +- TTTTT_FKey / TTTTT_start: We use this Foreign Key to link to other educational resources for this term, like YouTube footage\*, webpages and other glossaries. |
| 148 | +- level: We assess the [level of understanding](../README.md#levels-of-understanding) needed to meaningfully study a term. Regardless of this subjective and personal judgement, the filtering options are numerous: |
| 149 | + |
| 150 | +1. offer everything (a glossary) |
| 151 | +2. offer a learning trajectory |
| 152 | +3. filter in the opposite direction: exclude terms for experts; don't bother them with Noob answers. |
| 153 | +4. etc. |
| 154 | + |
| 155 | +- Cat_CCCCC: we are now able to store the [counts](#counting-tool) and then offer the term in various relevant contexts at the front end of the site. |
| 156 | + |
| 157 | +\* YouTube footage: plus the start time of where the term is mentioned first or most extensively. |
| 158 | + |
| 159 | +## Why not a term-content file per level of understanding? |
| 160 | + |
| 161 | +Per term, various levels of explanation (plus related further reading) are offered within one `.md` source file. The reason for this is that every individual learner is different. Within the source file of a term we can assign "stars" to both questions and answers, in line with what's explained in this section of the README.md file: [Levels of Understanding](../README.md#levels-of-understanding) |
| 162 | + |
| 163 | +By offering "everything we have" about a certain term in one file, a reader is able to identify herself / himself with a certain level in a specific context and "filter the stars" in the blink of an eye. |
| 164 | + |
| 165 | +## What's the whole point of managing WOT terms in a sheet? |
| 166 | + |
| 167 | +Three major applications: |
| 168 | + |
| 169 | +The sheet is the home of our terms maintenance: from it we [load ToIP glossary](./load-toip-glossary-in-weboftrust-github-page.md) and generate our Docusaurus [static content site](https://weboftrust.github.io/WOT-terms/) on GitHub. This whole process is steered by the content in the _Terms WOT manage sheet_. |
| 170 | + |
| 171 | +Any resource that mentions WebofTrust terms can be enriched much more easily with the use of _Terms WOT manage sheet_. |
| 172 | +For example, we can create a [terms link table for any footage](https://github.com/WebOfTrust/WOT-terms/blob/gh-pages/howto/create-terms-link-table.md) from the _Terms WOT manage sheet_. |
| 173 | + |
| 174 | +Integration and synchronisation with other glossaries and destination information sources are possible by maintaining Key and Foreign Keys in _Terms WOT manage sheet_. |
| 175 | + |
| 176 | +This is a non-exhaustive list of application options. |