generated from linkml/linkml-template
basic_slots.yaml
id: https://w3id.org/nmdc/basic_slots
name: NMDC-Basic-Slots
title: Basic Slots for NMDC Schema
description: >-
Basic LinkML slots that are used across the schema for the National Microbiome Data Collaborative (NMDC).
Examples include "id", "name", "description". These slots have primitive data types (e.g., string) as ranges.
license: https://creativecommons.org/publicdomain/zero/1.0/
prefixes:
dcterms: http://purl.org/dc/terms/
skos: http://www.w3.org/2004/02/skos/core#
linkml: https://w3id.org/linkml/
nmdc: https://w3id.org/nmdc/
imports:
- nmdc_types
default_prefix: nmdc
default_range: string
slots:
qc_comment:
range: string
description: >-
Slot to store additional comments about laboratory or workflow output. For workflow output,
it may describe the particular workflow stage that failed (e.g., failed at call-stage due to a malformed fastq file).
objective:
range: string
description: >-
The scientific objectives associated with the entity.
It SHOULD correspond to scientific norms for objectives field in a structured abstract.
mappings:
- SIO:000337
md5_checksum:
range: string
description: MD5 checksum of file (pre-compressed)
data_object_type:
range: FileTypeEnum
description: The type of file represented by the data object.
examples:
- value: FT ICR-MS Analysis Results
- value: GC-MS Metabolomics Results
structured_aliases:
data_object_type:
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
predicate: EXACT_SYNONYM
data_category:
range: DataCategoryEnum
description: The category of the file, such as instrument data from data generation or processed data from a workflow execution.
compression_type:
range: string
description: If provided, specifies the compression type
examples:
- value: gzip
todos:
- consider setting the range to an enum
started_at_time:
range: string
# range: datetime
pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
notes:
- 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/ It may not be complete, but it is good enough for now.'
mappings:
- prov:startedAtTime
ended_at_time:
# range: datetime
pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
notes:
- 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/ It may not be complete, but it is good enough for now.'
mappings:
- prov:endedAtTime
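# Illustrative only (not part of the schema): timestamp strings that the ISO-8601
# pattern shared by started_at_time and ended_at_time is expected to accept.
# The specific values below are made up for illustration.
#   started_at_time: "2021-01-10T00:00:00+00:00"
#   ended_at_time: "2021-01-10T00:05:30Z"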
git_url:
description: The URL that points to the exact GitHub location of a workflow.
range: string
examples:
- value: "https://github.com/microbiomedata/mg_annotation/releases/tag/0.1"
- value: "https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py"
execution_resource:
range: ExecutionResourceEnum
description: The computing resource or facility where the workflow was executed.
examples:
- value: NERSC-Cori
websites:
range: string
multivalued: true
pattern: ^[Hh][Tt][Tt][Pp][Ss]?:\/\/(?!.*[Dd][Oo][Ii]\.[Oo][Rr][Gg]).*$
description: A list of websites that are associated with the entity.
comments:
- DOIs should not be included as websites. Instead, use the associated_dois slot.
- A consortium's homepage website should be included in the homepage_website slot, not in websites.
- consortium is a convenience term for a Study whose study_category value is consortium
- the website slot and its subproperties are virtually identical to the url slot, except that they are multivalued and url is single-valued.
see_also:
- nmdc:url
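# Illustrative only (not part of the schema): how the websites pattern is expected
# to behave. The rejected URLs are made-up examples.
#   accepted: https://www.neonscience.org/
#   rejected: https://doi.org/10.1234/example   (contains "doi.org"; use associated_dois instead)
#   rejected: ftp://example.org/data            (scheme is neither http nor https)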
homepage_website:
is_a: websites
maximum_cardinality: 1
description: The website address (URL) of an entity's homepage.
examples:
- value: https://www.neonscience.org/
profile_image_url:
description: A URL that points to an image of a person.
range: string
orcid:
description: The ORCID of a person.
range: string
email:
description: >-
An email address for an entity such as a person.
This should be the primary email address used.
range: string
slot_uri: schema:email
display_order:
range: integer
description: When rendering information, this attribute specifies the order in which the information should be rendered.
url:
range: string
notes:
- See issue 207 - this clashes with the mixs field
language:
range: language_code
description: Should use ISO 639-1 code e.g. "en", "fr"
has_raw_value:
description: The value that was specified for an annotation in raw form, i.e. a string. E.g. "2 cm" or "2-4 cm"
range: string
has_unit:
description: Links a QuantityValue to a unit
aliases:
- scale
range: unit
mappings:
- qud:unit
- schema:unitCode
has_numeric_value:
description: Links a quantity value to a number
range: decimal
mappings:
- qud:quantityValue
- schema:value
has_minimum_numeric_value:
is_a: has_numeric_value
description: The minimum value part, expressed as number, of the quantity value when the value covers a range.
has_maximum_numeric_value:
is_a: has_numeric_value
description: The maximum value part, expressed as number, of the quantity value when the value covers a range.
has_boolean_value:
description: Links a quantity value to a boolean
range: boolean
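# Illustrative sketch (not part of the schema): how the has_* slots above might
# appear together on a QuantityValue instance for the raw value "2-4 cm".
# The instance and the unit string are assumptions made for illustration.
#   has_raw_value: "2-4 cm"
#   has_minimum_numeric_value: 2
#   has_maximum_numeric_value: 4
#   has_unit: cm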
latitude:
range: decimal_degree
description: latitude
slot_uri: wgs84:lat
examples:
- value: "-33.460524"
mappings:
- schema:latitude
longitude:
range: decimal_degree
description: longitude
slot_uri: wgs84:long
examples:
- value: "150.168149"
mappings:
- schema:longitude
infiltrations:
description: The amount of time it takes to complete each infiltration activity
examples:
- value: [ '00:01:32', '00:00:53' ]
aliases:
- infiltration_1
- infiltration_2
multivalued: true
list_elements_ordered: true
range: string
pattern: ^(?:[0-9]|[1-9][0-9]|9[0-9]|0[0-9]|0[0-5][0-9]):[0-5][0-9]:[0-5][0-9]$
see_also:
- https://www.protocols.io/view/field-sampling-protocol-kqdg3962pg25/v1
soluble_iron_micromol:
range: string
sample_collection_site:
range: string
salinity_category:
description:
"Categorical description of the sample's salinity. Examples: halophile,
halotolerant, hypersaline, euryhaline"
range: string
see_also:
- https://github.com/microbiomedata/nmdc-metadata/pull/297
notes:
- "maps to gold:salinity"
proport_woa_temperature:
range: string
location:
range: string
host_name:
range: string
community:
range: string
embargoed:
description: >-
If true, the data are embargoed and not available for public access.
range: boolean
recommended: true
todos:
- make this required?
- first apply to Biosample
- try to apply to all Biosamples in a particular nmdc-server SubmissionMetadata?
- applying to a Study may not be granular enough
habitat:
range: string
version:
range: string
doi_value:
description: >-
A digital object identifier, which is intended to persistently identify some resource on the web.
required: true
aliases:
- DOI
- digital object identifier
range: uriorcurie
pattern: '^doi:10\.\d{2,9}/.*$'
examples:
- value: doi:10.46936/10.25585/60000880
description: The DOI links to an electronic document.
exact_mappings:
- OBI:0002110
narrow_mappings:
- edam.data:1188
doi_provider:
description: >-
The authority, or organization, the DOI is associated with.
range: DoiProviderEnum
close_mappings:
- NCIT:C74932
examples:
- value: ess_dive
description: The corresponding DOI is associated with ESS-DIVE.
doi_category:
description: >-
The resource type the corresponding doi resolves to.
range: DoiCategoryEnum
required: true
examples:
- value: dataset_doi
description: The corresponding DOI is a dataset resource type.
related_identifiers:
title: Related Identifiers
description: Identifiers assigned to a thing that is similar to that which is represented in NMDC. Related identifiers are not an identical match and may have some variation.
notes: { }
funding_sources:
multivalued: true
range: string
description: >-
A list of organizations, along with the award numbers, that underwrite financial support for projects of
a particular type. Typically, they process applications and award funds to the chosen qualified
applicants.
comments:
- Include only the name of the funding organization and the award or contract number.
examples:
- value: National Science Foundation Dimensions of Biodiversity (award no. 1342701)
- value: >-
U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research
(BER) under contract DE-AC05-00OR2275
close_mappings:
- NCIT:C39409
## GOLD PATHS
gold_path_field:
range: string
abstract: true
description: >-
This is a grouping for any of the gold path fields
annotations:
tooltip:
tag: tooltip
value: GOLD Ecosystem Classification paths describe the surroundings from which an environmental sample or an organism is collected.
annotations:
source: https://gold.jgi.doe.gov/ecosystem_classification
ecosystem:
is_a: gold_path_field
description:
An ecosystem is a combination of a physical environment (abiotic
factors) and all the organisms (biotic factors) that interact with this environment.
Ecosystem is in position 1/5 in a GOLD path.
comments:
- The abiotic factors play a profound role in the type and composition
of organisms in a given environment. The GOLD Ecosystem at the top of the five-level
classification system is aimed at capturing the broader environment from which
an organism or environmental sample is collected. The three broad groups under
Ecosystem are Environmental, Host-associated, and Engineered. They represent
samples collected from a natural environment, from another organism, or from
engineered environments like bioreactors, respectively.
see_also: https://gold.jgi.doe.gov/help
ecosystem_category:
is_a: gold_path_field
description:
Ecosystem categories represent divisions within the ecosystem based
on specific characteristics of the environment from where an organism or sample
is isolated. Ecosystem category is in position 2/5 in a GOLD path.
comments:
- The Environmental ecosystem (for example) is divided into Air, Aquatic
and Terrestrial. Ecosystem categories for Host-associated samples can be individual
hosts or phyla and for engineered samples it may be manipulated environments
like bioreactors, solid waste etc.
see_also: https://gold.jgi.doe.gov/help
ecosystem_type:
is_a: gold_path_field
description:
Ecosystem types represent things having common characteristics within
the Ecosystem Category. This grouping by common characteristics is still
broad but specific to the characteristics of a given environment. Ecosystem
type is in position 3/5 in a GOLD path.
comments:
- The Aquatic ecosystem category (for example) may have ecosystem types
like Marine or Thermal springs etc. Ecosystem category Air may have Indoor air
or Outdoor air as different Ecosystem Types. In the case of Host-associated
samples, ecosystem type can represent Respiratory system, Digestive system,
Roots etc.
see_also: https://gold.jgi.doe.gov/help
ecosystem_subtype:
is_a: gold_path_field
description:
Ecosystem subtypes represent further subdivision of Ecosystem types
into more distinct subtypes. Ecosystem subtype is in position 4/5 in a GOLD
path.
comments:
- Ecosystem Type Marine (Environmental -> Aquatic -> Marine) is further
divided (for example) into Intertidal zone, Coastal, Pelagic
etc. in the Ecosystem subtype category.
see_also: https://gold.jgi.doe.gov/help
specific_ecosystem:
is_a: gold_path_field
description:
Specific ecosystems represent specific features of the environment
like aphotic zone in an ocean or gastric mucosa within a host digestive system.
Specific ecosystem is in position 5/5 in a GOLD path.
comments:
- Specific ecosystems help to define samples based on very specific characteristics
of an environment under the five-level classification system.
see_also: https://gold.jgi.doe.gov/help
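# Illustrative sketch (not part of the schema): one five-level GOLD path assembled
# from the example terms mentioned in the comments above. The combination is an
# assumption for illustration; consult GOLD for the valid paths.
#   ecosystem: Environmental
#   ecosystem_category: Aquatic
#   ecosystem_type: Marine
#   ecosystem_subtype: Pelagic
#   specific_ecosystem: Aphotic zone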
add_date:
range: string
description: The date on which the information was added to the database.
mod_date:
range: string
description: The last date on which the database information was modified.
ncbi_taxonomy_name:
range: string
ncbi_project_name:
range: string
processing_institution:
range: ProcessingInstitutionEnum
description: The organization that processed the sample.
qc_status:
description: Stores information about the result of a process (e.g., the process of sequencing a library may have a qc_status of 'fail' if not enough data was generated)
range: StatusEnum
file_size_bytes:
range: bytes
description: Size of the file in bytes
analyte_category:
required: true
description: >
The type of analyte(s) that were measured in the data generation process and analyzed
in the Workflow Chain
type:
required: true
range: uriorcurie
slot_uri: rdf:type
description: the class_uri of the class that has been instantiated
notes:
- replaces legacy nmdc:type slot
- makes it easier to read example data files
- required for polymorphic MongoDB collections
see_also:
- https://github.com/microbiomedata/nmdc-schema/issues/1048
- https://github.com/microbiomedata/nmdc-schema/issues/1233
- https://github.com/microbiomedata/nmdc-schema/issues/248
structured_aliases:
workflow_execution_class:
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
predicate: NARROW_SYNONYM
examples:
- value: nmdc:Biosample
- value: nmdc:Study
designates_type: true
external_database_identifiers:
abstract: true
description: Link to corresponding identifier in external database
is_a: alternative_identifiers
multivalued: true
pattern: '^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$'
range: external_identifier
comments:
- The value of this field is always a registered CURIE
notes:
- "had tried ranges of external identifier and string"
close_mappings:
- skos:closeMatch
dna_concentration:
see_also:
- nmdc:nucleic_acid_concentration
title: DNA concentration in ng/ul
comments:
- Units must be in ng/uL. Enter the numerical part only. Must be calculated using
a fluorometric method. Acceptable values are 0-2000.
examples:
- value: '100'
from_schema: https://example.com/nmdc_dh
rank: 5
range: float
slot_group: JGI-Metagenomics
recommended: true
minimum_value: 0
maximum_value: 2000
extraction_targets:
description: Provides the target biomolecule that has been separated from a sample during an extraction process.
rank: 1000
multivalued: true
range: ExtractionTargetEnum
notes:
- todos, remove nucl_acid_ext from OmicsProcessing (DataGeneration)
narrow_mappings:
- NCIT:C177560
- MIXS:0000037
id:
required: true
identifier: true
range: uriorcurie
description: >-
A unique identifier for a thing.
Must be either a CURIE shorthand for a URI or a complete URI
structured_aliases:
workflow_execution_id:
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
predicate: NARROW_SYNONYM
data_object_id:
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
predicate: NARROW_SYNONYM
notes:
- "abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?"
- 'a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters will be accepted'
- 'typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked via per-class id slot usage assertions'
- 'minting authority shoulders should probably be enumerated and checked in the pattern'
examples:
- value: nmdc:mgmag-00-x012.1_7_c1
description: 'https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248'
pattern: '^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$'
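# Illustrative reading (an assumption, not stated in the schema) of the example id
# nmdc:mgmag-00-x012.1_7_c1 against the abstracted pattern
# prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?
#   prefix:       nmdc
#   typecode:     mgmag
#   authshoulder: 00
#   blade:        x012
#   version:      .1
#   seqsuffix:    _7_c1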
name:
range: string
description: >-
A human readable label for an entity
description:
range: string
description: >-
a human-readable description of a thing
slot_uri: dcterms:description
title:
range: string
description: >-
A name given to the entity that differs from the name/label programmatically assigned to it.
For example, when extracting study information for GOLD, the GOLD system has assigned a name/label.
However, for display purposes, we may also wish to capture the title of the proposal that was used to fund the study.
exact_mappings:
- dcterms:title
alternative_titles:
range: string
multivalued: true
description: >-
A list of alternative titles for the entity.
The distinction between title and alternative titles is application-specific.
exact_mappings:
- dcterms:alternative
alternative_names:
range: string
multivalued: true
description: >-
A list of alternative names used to refer to the entity.
The distinction between name and alternative names is application-specific.
exact_mappings:
- dcterms:alternative
- skos:altLabel
alternative_descriptions:
range: string
multivalued: true
description: >-
A list of alternative descriptions for the entity.
The distinction between description and alternative descriptions is application-specific.
alternative_identifiers:
range: uriorcurie
multivalued: true
description: >-
A list of alternative identifiers for the entity.
pattern: '^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$'
start_date:
range: string
description: The date on which any process or activity was started
comments:
- We are using string representations of dates until all components of our ecosystem can handle ISO 8601 dates
- The date should be formatted as YYYY-MM-DD
todos:
- add date string validation pattern
end_date:
range: string
description: The date on which any process or activity was ended
comments:
- We are using string representations of dates until all components of our ecosystem can handle ISO 8601 dates
- The date should be formatted as YYYY-MM-DD
todos:
- add date string validation pattern
enums:
ExecutionResourceEnum:
see_also:
- nmdc:DoiProviderEnum
- nmdc:ProcessingInstitutionEnum
- nmdc:ExecutionResourceEnum
permissible_values:
NERSC-Cori:
description: NERSC Cori supercomputer
aliases:
- Cori
NERSC-Perlmutter:
description: NERSC Perlmutter supercomputer
aliases:
- Perlmutter
- Saul
EMSL:
description: Environmental Molecular Sciences Laboratory
EMSL-RZR:
description: Environmental Molecular Sciences Laboratory RZR cluster
aliases:
- RZR
JGI:
description: Joint Genome Institute
LANL-B-div:
description: LANL Bioscience Division
aliases:
- B-div
FileTypeEnum:
permissible_values:
Virus Summary:
description: Tab separated file listing the viruses found by geNomad.
see_also:
- https://portal.nersc.gov/genomad/
annotations:
file_name_pattern: '^_virus_summary\.tsv?$'
Plasmid Summary:
description: Tab separated file listing the plasmids found by geNomad.
see_also:
- https://portal.nersc.gov/genomad/
annotations:
file_name_pattern: '^_plasmid_summary\.tsv?$'
GeNomad Aggregated Classification:
description: >-
Tab separated file which combines the results from neural network-based classification
and marker-based classification for virus and plasmid detection with geNomad.
see_also:
- https://portal.nersc.gov/genomad/
annotations:
file_name_pattern: '^_aggregated_classification\.tsv?$'
Reference Calibration File:
description: A file that contains data used to calibrate a natural organic matter or metabolomics analysis.
Metagenome Raw Reads:
description: Interleaved paired-end raw sequencing data
annotations:
file_name_pattern: '^\.fastq(\.gz)?$'
Metagenome Raw Read 1:
description: Read 1 raw sequencing data, aka forward reads
examples:
value: "BMI_H25VYBGXH_19S_31WellG1_R1.fastq.gz"
annotations:
file_name_pattern: '^.+_R1\.fastq(\.gz)?$'
Metagenome Raw Read 2:
description: Read 2 raw sequencing data, aka reverse reads
examples:
value: "BMI_H25VYBGXH_19S_31WellG1_R2.fastq.gz"
annotations:
file_name_pattern: '^.+_R2\.fastq(\.gz)?$'
FT ICR-MS Analysis Results:
description: FT ICR-MS-based molecular formula assignment results table
GC-MS Metabolomics Results:
description: GC-MS-based metabolite assignment results table
Metaproteomics Workflow Statistics:
description: Aggregate workflow statistics file
Protein Report:
description: Filtered protein report file
Peptide Report:
description: Filtered peptide report file
Unfiltered Metaproteomics Results:
description: MSGFjobs and MASIC output file
Read Count and RPKM:
description: Annotation read count and RPKM per feature JSON
QC non-rRNA R2:
description: QC removed rRNA reads (R2) fastq
QC non-rRNA R1:
description: QC removed rRNA reads (R1) fastq
Metagenome Bins:
description: Metagenome bin contigs fasta
Metagenome HQMQ Bins Compression File:
description: Compressed file containing high quality and medium quality metagenome bins and associated files
annotations:
file_name_pattern:
tag: file_name_pattern
value: "[mag_wf_activity_id]_hqmq_bin.zip"
Metagenome LQ Bins Compression File:
description: Compressed file containing low quality metagenome bins and associated files
annotations:
file_name_pattern:
tag: file_name_pattern
value: "[mag_wf_activity_id]_lq_bin.zip"
Metagenome Bins Info File:
description: File containing version information on the binning workflow
annotations:
file_name_pattern: "[mag_wf_activity_id]_bin.info"
CheckM Statistics:
description: CheckM statistics report
Metagenome Bins Heatmap:
description: PDF file containing a heatmap of the KO analysis results for metagenome bins
annotations:
file_name_pattern:
value: "[mag_wf_activity_id]_heatmap.pdf"
Metagenome Bins Barplot:
description: PDF file containing a bar chart of the KO analysis results for metagenome bins
annotations:
file_name_pattern:
value: "[mag_wf_activity_id]_barplot.pdf"
Metagenome Bins Krona Plot:
description: HTML file containing a Krona plot of the KO analysis results for metagenome bins
annotations:
file_name_pattern:
value: "[mag_wf_activity_id]_kronaplot.html"
Read Based Analysis Info File:
description: File containing reads based analysis information
annotations:
file_name_pattern: "profiler.info"
GTDBTK Bacterial Summary:
description: GTDBTK bacterial summary
GTDBTK Archaeal Summary:
description: GTDBTK archaeal summary
GOTTCHA2 Krona Plot:
description: GOTTCHA2 krona plot HTML file
GOTTCHA2 Classification Report:
description: GOTTCHA2 classification report file
GOTTCHA2 Report Full:
description: GOTTCHA2 report file
Kraken2 Krona Plot:
description: Kraken2 krona plot HTML file
Centrifuge Krona Plot:
description: Centrifuge krona plot HTML file
Centrifuge output report file:
description: Centrifuge output report file
Kraken2 Classification Report:
description: Kraken2 output report file
Kraken2 Taxonomic Classification:
description: Kraken2 output read classification file
Centrifuge Classification Report:
description: Centrifuge output report file
Centrifuge Taxonomic Classification:
description: Centrifuge output read classification file
Structural Annotation GFF:
description: GFF3 format file with structural annotations
annotations:
file_name_pattern: "[GOLD-AP]_structural_annotation.gff"
Structural Annotation Stats Json:
description: Structural annotations stats json
annotations:
file_name_pattern: "[GOLD-AP]_structural_annotation_stats.json"
Functional Annotation GFF:
description: GFF3 format file with functional annotations
annotations:
file_name_pattern: "[GOLD-AP]_functional_annotation.gff"
Annotation Info File:
description: File containing annotation info
annotations:
file_name_pattern: "[GOLD-AP]_imgap.info"
Annotation Amino Acid FASTA:
description: FASTA amino acid file for annotated proteins
annotations:
file_name_pattern: "[GOLD-AP]_proteins.faa"
Annotation Enzyme Commission:
description: Tab delimited file for EC annotation
annotations:
file_name_pattern: "[GOLD-AP]_ec.tsv"
Annotation KEGG Orthology:
description: Tab delimited file for KO annotation
annotations:
file_name_pattern: "[GOLD-AP]_ko.tsv"
Assembly Info File:
description: File containing assembly info
annotations:
file_name_pattern: "README.txt"
Assembly Coverage BAM:
description: Sorted bam file of reads mapping back to the final assembly
annotations:
file_name_pattern: "[GOLD-AP]_pairedMapped.sam.gz"
Assembly AGP:
description: An AGP format file that describes the assembly
Assembly Scaffolds:
description: Final assembly scaffolds fasta
annotations:
file_name_pattern: "[GOLD-AP]_assembly.contigs.fasta"
Assembly Contigs:
description: Final assembly contigs fasta
annotations:
file_name_pattern: "assembly.contigs.fasta"
Assembly Coverage Stats:
description: Assembled contigs coverage information
annotations:
file_name_pattern: "[GOLD-AP]_pairedMapped_sorted.bam.cov"
Contig Mapping File:
description: Contig mappings between contigs and scaffolds
annotations:
file_name_pattern: "[GOLD-AP]_contig_names_mapping.tsv"
Error Corrected Reads:
description: Error corrected reads fastq
annotations:
file_name_pattern: "input.corr.fastq.gz"
Filtered Sequencing Reads:
description: Reads QC result fastq (clean data)
annotations:
file_name_pattern: "/.+?(?=filter)/filter-METAGENOME.fastq.gz"
Read Filtering Info File:
description: File containing read filtering information
annotations:
file_name_pattern: "[rqc_wf_activity_id]_readsQC.info"
QC Statistics Extended:
description: Extended report including methods and results for read filtering
annotations:
file_name_pattern: "/.+?(?=filter)/filtered-report.txt"
QC Statistics:
description: Reads QC summary statistics
annotations:
file_name_pattern: "[rqc_wf_activity_id]_filterStats2.txt"
TIGRFam Annotation GFF:
description: GFF3 format file with TIGRfam
annotations:
file_name_pattern: "[GOLD-AP]_tigrfam.gff"
CRT Annotation GFF:
description: GFF3 format file with CRT
annotations:
file_name_pattern: "[GOLD-AP]_crt.gff"
Genemark Annotation GFF:
description: GFF3 format file with Genemark
annotations:
file_name_pattern: "[GOLD-AP]_genemark.gff"
Prodigal Annotation GFF:
description: GFF3 format file with Prodigal
annotations:
file_name_pattern: "[GOLD-AP]_prodigal.gff"
TRNA Annotation GFF:
description: GFF3 format file with TRNA
annotations:
file_name_pattern: "[GOLD-AP]_trna.gff"
Misc Annotation GFF:
description: GFF3 format file with Misc
annotations:
file_name_pattern: "[GOLD-AP]_rfam_misc_bind_misc_feature_regulatory.gff"
RFAM Annotation GFF:
description: GFF3 format file with RFAM
annotations:
file_name_pattern: "[GOLD-AP]_rfam.gff"
TMRNA Annotation GFF:
description: GFF3 format file with TMRNA
annotations:
file_name_pattern: "[GOLD-AP]_rfam_ncrna_tmrna.gff"
Crispr Terms:
description: Crispr Terms
annotations:
file_name_pattern: "[GOLD-AP]_crt.crisprs"
Product Names:
description: Product names file
annotations:
file_name_pattern: "[GOLD-AP]_product_names.tsv"
Gene Phylogeny tsv:
description: Gene Phylogeny tsv
annotations:
file_name_pattern: "[GOLD-AP]_gene_phylogeny.tsv"
Scaffold Lineage tsv:
description: phylogeny at the scaffold level
annotations:
file_name_pattern: "[GOLD-AP]_scaffold_lineage.tsv"
Clusters of Orthologous Groups (COG) Annotation GFF:
description: GFF3 format file with COGs
annotations:
file_name_pattern: "[GOLD-AP]_cog.gff"
KO_EC Annotation GFF:
description: GFF3 format file with KO_EC
annotations:
file_name_pattern: "[GOLD-AP]_ko_ec.gff"
CATH FunFams (Functional Families) Annotation GFF:
description: GFF3 format file with CATH FunFams
annotations:
file_name_pattern: "[GOLD-AP]_cath_funfam.gff"
SUPERFam Annotation GFF:
description: GFF3 format file with SUPERFam
annotations:
file_name_pattern: "[GOLD-AP]_supfam.gff"
SMART Annotation GFF:
description: GFF3 format file with SMART
annotations:
file_name_pattern: "[GOLD-AP]_smart.gff"
Pfam Annotation GFF:
description: GFF3 format file with Pfam
annotations:
file_name_pattern: "[GOLD-AP]_pfam.gff"
Annotation Statistics:
description: Annotation statistics report
Direct Infusion FT ICR-MS Raw Data:
description:
Direct infusion 21 Tesla Fourier Transform ion cyclotron resonance
mass spectrometry raw data acquired in broadband full scan mode
LC-DDA-MS/MS Raw Data:
description: Liquid chromatographically separated MS1 and Data-Dependent MS2 binary instrument file
GC-MS Raw Data:
description: Gas chromatography-mass spectrometry raw data, full scan mode.
Configuration toml:
description: >-
A configuration toml file used by various programs to store settings that are specific to their
respective software.
broad_mappings:
- edam.format:4005
LC-MS Lipidomics Results:
description: >-
LC-MS-based lipidomics analysis results table
LC-MS Lipidomics Processed Data:
description: >-
Processed data for the LC-MS-based lipidomics analysis in hdf5 format
Contaminants Amino Acid FASTA:
description: FASTA amino acid file for contaminant proteins commonly observed in proteomics data.
Analysis Tool Parameter File:
description: A configuration file used by a single computational software tool that stores settings that are specific to that tool.
Workflow Operation Summary:
description: A human readable record of analysis steps applied during an instance of a workflow operation.
Metatranscriptome Expression:
description: Metatranscriptome expression values and read counts for gene features predicted on contigs
annotations:
file_name_pattern: "*.rnaseq_gea.txt"
Metatranscriptome Expression Intergenic:
description: Metatranscriptome expression values and read counts for intergenic regions.