
Commit 893373b

Update development with hot fix and changelog (#44)
* Add CL items for 1.3 release
* Try to find existing MS sample in case of replicate data
* Add CL information for patch 1.3.1

  Affects the register-mzml-dropbox. Checks the given project for existing samples of type "Q_MS_RUN" with the provided sample code. A new sample is only created in openBIS if no such sample exists yet.

* Provide format documentation for single-end / paired-end data registration (#40)

  This CL introduces documentation for the data structure required when preparing data for transfer to QBiC and registration in qPortal.

* Update Changelog

Co-authored-by: Sven Fillinger <[email protected]>
Co-authored-by: Sven F <[email protected]>
1 parent 798a1e1 commit 893373b


3 files changed (+75 / -4 lines)


CHANGELOG.md

Lines changed: 14 additions & 0 deletions
```diff
@@ -1,5 +1,19 @@
 # Changelog
 
+## 1.4.0
+
+* Register unclassified pooling data of Nanopore experiments directly at the experiment level (no copies are added to sample-based datasets)
+
+## 1.3.1
+
+* Avoid sample registration for existing mass spectrometry data
+
+## 1.3
+
+* Provide metadata schema in JSON for the IMGAG dropbox
+* Register checksums for Oxford Nanopore datasets
+* Register unclassified read data for Oxford Nanopore datasets
+
 ## 1.2
 
 * Provide ETL routine for Oxford Nanopore NGS data
```

README.md

Lines changed: 52 additions & 0 deletions
```diff
@@ -2,3 +2,55 @@
 
 This repository holds a collection of Jython ETL (extract-transform-load) scripts that are used at QBiC that define the behaviour of openBIS dropboxes.
 The ETL processes combine some quality control measures for incoming data and data transformation to facilitate the registration in openBIS.
+
+## Data format guidelines
+
+These guidelines describe the necessary file structure for different
+data types to be met in order to ingest and register them correctly in
+openBIS.
+
+Formats:
+
+- [NGS single-end / paired-end data](#ngs-single-end--paired-end-data)
+
+### NGS single-end / paired-end data
+
+**Responsible dropbox:**
+[QBiC-register-fastq-dropbox](drop-boxes/register-fastq-dropbox)
+
+**Resulting data model in openBIS**
+Q_TEST_SAMPLE -> Q_NGS_RAW_DATA (with sample code) -> DataSet (directory
+with files contained)
+
+**Description**
+For paired-end sequencing reads in FASTQ format, the file structure
+needs to look like this
+
+```
+<QBIC sample code>.fastq // Directory
+|-- <QBIC sample code>_R1.fastq
+|-- <QBIC sample code>_R1.fastq.sha256sum
+|-- <QBIC sample code>_R2.fastq
+|-- <QBIC sample code>_R2.fastq.sha256sum
+```
+
+or in the case of gzipped FASTQ files:
+
+```
+<QBIC sample code>.fastq.gz // Directory
+|-- <QBIC sample code>_R1.fastq.gz
+|-- <QBIC sample code>_R1.fastq.gz.sha256sum
+|-- <QBIC sample code>_R2.fastq.gz
+|-- <QBIC sample code>_R2.fastq.gz.sha256sum
+```
+
+In the case of single-end sequencing data, the file structure needs to
+look like this:
+
+```
+<QBIC sample code>.fastq.gz // Directory
+|-- <QBIC sample code>.fastq.gz
+|-- <QBIC sample code>.fastq.gz.sha256sum
+```
+
+
```
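The directory layout described in the README can be checked mechanically before data is handed over. Below is a minimal sketch in standalone Python (not Jython dropbox code), assuming each `.sha256sum` file holds a single line in GNU `sha256sum` output format (`<hex digest>  <file name>`); the README does not specify the file contents. The helper names `sha256_of`, `write_checksum`, `expected_files` and `validate_fastq_dir`, and the sample code `QABCD001AE`, are hypothetical, and this is not how the register-fastq-dropbox itself validates incoming data.

```python
import hashlib
import os
import re
import tempfile

# Hypothetical helpers, not part of the QBiC dropboxes.
# Assumption: each ".sha256sum" file holds one line in `sha256sum` output
# format, "<64 hex digits>  <file name>".
CHECKSUM_SUFFIX = ".sha256sum"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(data_path):
    """Create the companion <data_path>.sha256sum file next to the data file."""
    with open(data_path + CHECKSUM_SUFFIX, "w") as handle:
        handle.write(sha256_of(data_path) + "  " + os.path.basename(data_path) + "\n")


def expected_files(directory):
    """List the data files the layout requires, derived from the directory name.

    Any <code>_R1 / <code>_R2 file marks the directory as paired-end; otherwise
    a single file named like the directory is expected (single-end). Accepting
    uncompressed single-end ".fastq" directories is an assumption.
    """
    name = os.path.basename(os.path.normpath(directory))
    match = re.match(r"^(.+?)(\.fastq(?:\.gz)?)$", name)
    if not match:
        raise ValueError("directory name must end in .fastq or .fastq.gz: " + name)
    code, ext = match.group(1), match.group(2)
    paired = [code + "_R1" + ext, code + "_R2" + ext]
    present = set(os.listdir(directory))
    if any(f in present for f in paired):
        return paired
    return [code + ext]


def validate_fastq_dir(directory):
    """Return a list of problems; an empty list means the layout looks valid."""
    problems = []
    for data_file in expected_files(directory):
        data_path = os.path.join(directory, data_file)
        sum_path = data_path + CHECKSUM_SUFFIX
        if not os.path.isfile(data_path):
            problems.append("missing data file: " + data_file)
        elif not os.path.isfile(sum_path):
            problems.append("missing checksum file: " + data_file + CHECKSUM_SUFFIX)
        else:
            with open(sum_path) as handle:
                fields = handle.read().split()
            # The first whitespace-separated field is the expected digest.
            if not fields or fields[0].lower() != sha256_of(data_path):
                problems.append("checksum mismatch: " + data_file)
    return problems


if __name__ == "__main__":
    # Paired-end example for the hypothetical sample code QABCD001AE.
    sample_dir = os.path.join(tempfile.mkdtemp(), "QABCD001AE.fastq")
    os.mkdir(sample_dir)
    for read in ("R1", "R2"):
        path = os.path.join(sample_dir, "QABCD001AE_" + read + ".fastq")
        with open(path, "w") as handle:
            handle.write("@read1\nACGT\n+\nIIII\n")
        write_checksum(path)
    print(validate_fastq_dir(sample_dir))  # prints []
```

The sketch treats a directory as paired-end as soon as either an `_R1` or an `_R2` file is present, so a missing mate is reported as a missing data file instead of the directory being silently accepted as single-end.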

drop-boxes/register-mzML-dropbox/register-mzml-dropbox.py

Lines changed: 9 additions & 4 deletions
```diff
@@ -81,6 +81,10 @@ def process(transaction):
 experimentIDs.append(exp.getExperimentIdentifier())
 if exp.getExperimentType() == expType:
 msExperiment = exp
+msSampleID = '/' + space + '/' + 'MS' + parentCode
+msSample = transaction.getSampleForUpdate(msSampleID)
+if msSample:
+msExperiment = msSample.getExperiment()
 # no existing experiment for samples of this sample preparation found
 if not msExperiment:
 expID = experimentIDs[0]
@@ -91,13 +95,14 @@ def process(transaction):
 expID = '/' + space + '/' + project + '/' + project + 'E' + str(expNum)
 msExperiment = transaction.createNewExperiment(expID, expType)
 
-newMSSample = transaction.createNewSample('/' + space + '/' + 'MS'+ parentCode, "Q_MS_RUN")
-newMSSample.setParentSampleIdentifiers([sa.getSampleIdentifier()])
-newMSSample.setExperiment(msExperiment)
+if not msSample:
+msSample = transaction.createNewSample('/' + space + '/' + 'MS'+ parentCode, "Q_MS_RUN")
+msSample.setParentSampleIdentifiers([sa.getSampleIdentifier()])
+msSample.setExperiment(msExperiment)
 # create new dataset
 dataSet = transaction.createNewDataSet("Q_MS_MZML_DATA")
 dataSet.setMeasuredData(False)
-dataSet.setSample(newMSSample)
+dataSet.setSample(msSample)
 
 transaction.moveFile(incomingPath, dataSet)
 
```
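The mzML change follows a get-or-create pattern: look up `/<space>/MS<parentCode>` with `getSampleForUpdate` and only call `createNewSample(..., "Q_MS_RUN")` when nothing is found, as the diff's `if msSample:` check implies. The sketch below isolates that logic in standalone Python. `FakeTransaction`, `FakeSample` and `get_or_create_ms_run` are hypothetical in-memory stand-ins that mimic only the method names used in the diff; they are not the openBIS dropbox API, and experiments are represented as plain identifier strings. The flattened diff does not show whether the parent and experiment setters run only for newly created samples; the sketch assumes they do.

```python
class FakeSample(object):
    """Hypothetical stand-in for an openBIS sample (not the real API)."""

    def __init__(self, identifier, sample_type):
        self.identifier = identifier
        self.sample_type = sample_type
        self.parents = []
        self.experiment = None

    def getExperiment(self):
        return self.experiment

    def setExperiment(self, experiment):
        self.experiment = experiment

    def setParentSampleIdentifiers(self, parents):
        self.parents = list(parents)


class FakeTransaction(object):
    """Hypothetical in-memory transaction with the two calls used in the diff."""

    def __init__(self):
        self.samples = {}
        self.created = 0

    def getSampleForUpdate(self, identifier):
        # None (falsy) means "no such sample", matching `if msSample:`.
        return self.samples.get(identifier)

    def createNewSample(self, identifier, sample_type):
        sample = FakeSample(identifier, sample_type)
        self.samples[identifier] = sample
        self.created += 1
        return sample


def get_or_create_ms_run(transaction, space, parent_code, parent_id, default_experiment):
    """Reuse an existing Q_MS_RUN sample; create one only if it is missing."""
    ms_sample_id = "/" + space + "/MS" + parent_code
    ms_sample = transaction.getSampleForUpdate(ms_sample_id)
    if ms_sample:
        # Replicate data: keep the experiment the existing sample belongs to.
        return ms_sample, ms_sample.getExperiment()
    ms_sample = transaction.createNewSample(ms_sample_id, "Q_MS_RUN")
    ms_sample.setParentSampleIdentifiers([parent_id])
    ms_sample.setExperiment(default_experiment)
    return ms_sample, default_experiment


if __name__ == "__main__":
    tx = FakeTransaction()
    first, exp_a = get_or_create_ms_run(
        tx, "MYSPACE", "QABCD004AW", "/MYSPACE/QABCD004AW", "/MYSPACE/QABCD/QABCDE5")
    again, exp_b = get_or_create_ms_run(
        tx, "MYSPACE", "QABCD004AW", "/MYSPACE/QABCD004AW", "/MYSPACE/QABCD/QABCDE9")
    print("created %d sample(s); experiment %s" % (tx.created, exp_b))
    # created 1 sample(s); experiment /MYSPACE/QABCD/QABCDE5
```

Keying the lookup on the deterministic identifier makes a second registration of replicate mzML data reuse the sample created by the first, instead of attempting to create another sample with the same code.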
