Create a psake task to generate the dictionaries #2

Open
abhi18av opened this issue Jul 18, 2022 · 3 comments

@abhi18av

Hi @Mxrcon,

I noticed that https://github.com/Mxrcon/BioNameGenerator/blob/main/BioNameGenerator/Databases/Generate-Databases.ps1 is basically a small task that generates the SQLite database from the TSV files.

I was wondering whether it makes sense to

  • Add a task to download these TSV files into a folder: This way the sources would be self-documenting
  • Add another task to generate the database

Nothing urgent, but something nice to have as it'll guide the collaborators.

@Mxrcon
Owner

Mxrcon commented Jul 18, 2022

I completely agree, Abhinav. I was worried about the TSVs living outside the repository. I don't want to distribute the TSV files within the package, but for development purposes I can add them to the root folder of the repository and then create a psake task as you mentioned.

@Mxrcon Mxrcon self-assigned this Jul 18, 2022
@Mxrcon
Owner

Mxrcon commented Jul 19, 2022

Hey @abhi18av, the TSV sources are actually modified from different web sources, and some of them had inconsistencies that I had to resolve manually, such as names with u instead of ü, and cleanups of some unnecessary compound names like Iron-II vs Iron-III.

What do you think about adding the TSV files to the root folder of this repository with this structure:

Dictionaries/
├── Generate-Databases.ps1
└── TsvDictionaries
    ├── 28kAdjectives.tsv
    ├── 5kColors.tsv
    ├── Aminoacids.tsv
    ├── Animals.tsv
    ├── BacterialGeneras.tsv
    ├── BacterialSpecies.tsv
    ├── BiologicalBooks.tsv
    ├── BrazilianScientists.tsv
    ├── ChemicalCompounds.tsv
    ├── Colors.tsv
    ├── ComputationKeywords.tsv
    ├── Dictionaries.db
    ├── FieldsWinners.tsv
    ├── LaboratoryKeywords.tsv
    ├── MetalsAndAlloys.tsv
    ├── NF-Adjectives.tsv
    ├── NF-Names.tsv
    ├── NobelLaureates.tsv
    ├── NucleicAcids.tsv
    ├── PeriodicTableElements.tsv
    └── RPGKeywords.tsv

This script would then be called by the psake task to generate a database, and we would have 2 tasks:

  1. Generate database (use all the TSVs to generate a SQLite database)
  2. Update database (generate a new database and update Bionamegenerator/Databases/Dictionaries.db)
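
A rough sketch of how these two tasks could look in a psake build file. This is only an illustration under assumptions: the task names, the `psakefile.ps1` file name, the variable names, and the idea that `Generate-Databases.ps1` runs without parameters and writes `Dictionaries.db` into `TsvDictionaries` are all hypothetical and would need to be adapted to the real script.

```powershell
# psakefile.ps1 (hypothetical sketch, not the project's actual build file)

Properties {
    # psake exposes the directory of the build script; assumed here to be the repository root.
    $RepoRoot        = $psake.build_script_dir
    # Assumed paths, following the folder layout proposed above.
    $DictionariesDir = Join-Path $RepoRoot 'Dictionaries'
    $GenerateScript  = Join-Path $DictionariesDir 'Generate-Databases.ps1'
    $GeneratedDb     = Join-Path (Join-Path $DictionariesDir 'TsvDictionaries') 'Dictionaries.db'
    $PackagedDb      = Join-Path (Join-Path (Join-Path $RepoRoot 'BioNameGenerator') 'Databases') 'Dictionaries.db'
}

Task Default -Depends GenerateDatabase

Task GenerateDatabase -Description 'Build a SQLite database from all TSV files' {
    Assert (Test-Path $GenerateScript) "Generator script not found: $GenerateScript"

    # Assumption: the script needs no arguments and produces Dictionaries.db on its own.
    & $GenerateScript

    Assert (Test-Path $GeneratedDb) "Expected database was not created: $GeneratedDb"
}

Task UpdateDatabase -Depends GenerateDatabase -Description 'Replace the packaged database with a fresh build' {
    # Overwrite the database shipped with the module.
    Copy-Item -Path $GeneratedDb -Destination $PackagedDb -Force
}
```

With a file like this in the repository root, `Invoke-psake ./psakefile.ps1 -taskList UpdateDatabase` would regenerate the database first (through the `-Depends` chain) and then copy it into the package folder, so the TSVs themselves never need to ship with the module.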

Unfortunately, I'm not sure I'll be able to write a pwsh script that completely reproduces the process of fetching the Wikipedia pages and then formatting them into the TSVs.

Kindly, Davi

4 participants
@abhi18av @Mxrcon and others