CSi-Studio/Aird-SDK

1 What is Aird?

1.1 Abstract

Aird is a new format for mass spectrometry data storage. It is an open-source, computation-oriented format with controllable precision, flexible indexing strategies, and a high compression rate for m/z, intensity, and ion mobility data. Aird provides a novel compressor called ComboComp for m/z data, which achieves a markedly higher compression rate: compared with Zlib, m/z data in Aird is about 65% smaller on average. Aird is also computation-friendly: thanks to SIMD optimization, it decodes much faster than Zlib.
The Aird SDK is a developer toolkit available in Java, C#, and Python. It lets developers read spectrum data from Aird files quickly, and its fast reading and high compression rate make it a good foundation for data visualization and analysis applications.

Aird index file suffix: .json
Aird data file suffix: .aird
The Aird index file and the Aird data file should be stored in the same directory with the same file name.
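
Because the two files are paired by name, a tool that is handed the index file can locate its data file by swapping the suffix. The sketch below relies only on the naming rule above; `AirdPaths` and `dataFileFor` are hypothetical names, not part of the Aird SDK.

```java
import java.nio.file.Path;

// Hypothetical helper (not part of the Aird SDK): derives the .aird data file path
// from a .json index file path, assuming both share the same directory and base name.
final class AirdPaths {
    private AirdPaths() {}

    static Path dataFileFor(Path indexFile) {
        String name = indexFile.getFileName().toString();
        if (!name.endsWith(".json")) {
            throw new IllegalArgumentException("Not an Aird index file: " + name);
        }
        String baseName = name.substring(0, name.length() - ".json".length());
        // resolveSibling keeps the parent directory and swaps only the file name
        return indexFile.resolveSibling(baseName + ".aird");
    }
}
```

For example, `/data/run1.json` maps to `/data/run1.aird`.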

1.2 AirdPro: Conversion Client for Vendor Files

Use the AirdPro client to convert vendor files into the Aird format.
You can download AirdPro from GitHub:
https://github.com/CSi-Studio/AirdPro/releases/
After downloading, unzip the archive and run AirdPro.exe to start the application. AirdPro is written in C# and is also an open-source project; its simple UI lets users convert vendor files to Aird files quickly.

1.3 Supported Acquisition Methods

  • DIA/SWATH
  • DDA
  • PRM
  • DIA_PASEF
  • DDA_PASEF

Demo code: see SampleCode.java in the project or the "API Document" chapter (Section 4).

1.4 Citation

  1. Lu, M., An, S., Wang, R. et al. Aird: a computation-oriented mass spectrometry data format enables a higher compression ratio and less decoding time. BMC Bioinformatics 23, 35 (2022).

  2. Wang, J. et al. StackZDPD: a novel encoding scheme for mass spectrometry data optimized for speed and compression ratio. Scientific Reports 12, 5384 (2022).

2 How to Import (Java, C#, Python)

2.1 Maven for Java SDK

<dependency>
    <groupId>net.csibio.aird</groupId>
    <artifactId>aird-sdk</artifactId>
    <version>2.5.1.1</version>
</dependency>

2.2 NuGet for C# SDK

Search for "AirdSDK" in the NuGet Package Manager.

2.3 PyPI for Python SDK

pip install AirdSDK

3 Domain Definition

3.1 AirdInfo

| Name | Type | Required | Description |
|---|---|---|---|
| version | String | True | Aird format version |
| versionCode | Integer | True | Aird format version code |
| engine | Integer | True | Compression engine type (0: row compression, 1: column compression) |
| compressors | List | True | Compression strategies for the m/z, intensity, and mobility arrays |
| instruments | List | True | General information about the MS instrument |
| dataProcessings | List | False | Description of any manipulation applied to the data, from the first conversion to Aird format until the creation of the current Aird document |
| softwares | List | False | Software used to convert the data. If the data has been processed by any additional program (e.g. profile to centroid), that software should be listed too |
| parentFiles | List | False | Paths to all ancestor files (up to the native acquisition file) used to generate the current Aird document |
| rangeList | List | False | Precursor m/z window ranges adjusted for experiment overlap. Used for DIA and PRM data |
| indexList | List | True | Index for the mass spectrometry data |
| indexStartPtr | Long | False | Start position of the compressed binary index data (versionCode >= 7) |
| indexEndPtr | Long | False | End position of the compressed binary index data (versionCode >= 7) |
| chromatogramIndex | ChromatogramIndex | False | Chromatogram information for the MRM acquisition mode |
| type | String | True | Aird type. Supported types: DIA, DDA, PRM, DIA_PASEF, DDA_PASEF, MRM, MSI_MALDI, COMMON |
| fileSize | Long | True | File size of the Aird file and the JSON file |
| totalCount | Long | True | Total number of spectra |
| airdPath | String | False | Path of the .aird file |
| activator | String | False | Activation method: CID, HCD, ETD, ECD |
| energy | Float | False | Collision energy |
| msType | String | True | Mass spectrum type: PROFILE, CENTROIDED |
| rtUnit | String | True | Retention time unit, always seconds |
| polarity | String | True | Polarity type: POSITIVE, NEGATIVE, NEUTRAL |
| filterString | String | False | Filter string for spectrum selection |
| ignoreZeroIntensityPoint | Boolean | True | Whether points with zero intensity are ignored |
| mobiInfo | MobiInfo | False | Ion mobility information |
| msiInfo | MsiInfo | False | MSI (mass spectrometry imaging) information |
| creator | String | False | File creator; this field can be set in AirdPro |
| createDate | String | False | Creation date of the Aird file |
| features | String | False | Other features, stored in "key:value;key:value" format |
| startTimeStamp | String | False | Experiment start timestamp |
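
Several domain objects (AirdInfo, WindowRange, BlockIndex) carry a `features` string in `key:value;key:value` format. A minimal parsing sketch follows. The `Features` class is hypothetical, and it assumes keys never contain `:` and no pair contains `;`, which the format description implies but does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an Aird SDK API): splits a "key:value;key:value" features
// string into an insertion-ordered map. Assumes keys contain no ':' and pairs contain no ';'.
final class Features {
    private Features() {}

    static Map<String, String> parse(String features) {
        Map<String, String> result = new LinkedHashMap<>();
        if (features == null || features.isEmpty()) {
            return result;
        }
        for (String pair : features.split(";")) {
            if (pair.isEmpty()) {
                continue; // tolerate empty segments such as ";;"
            }
            int colon = pair.indexOf(':');
            if (colon < 0) {
                result.put(pair, ""); // key without a value
            } else {
                // split on the first ':' only, so values may themselves contain ':'
                result.put(pair.substring(0, colon), pair.substring(colon + 1));
            }
        }
        return result;
    }
}
```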

3.2 Compressor

| Name | Type | Required | Description |
|---|---|---|---|
| target | String | True | Compression target: mz, intensity, mobility, rt |
| methods | List | True | Compression methods, in order, e.g. ["VB","Zstd"] |
| precision | Integer | True | Precision multiplier: 1000 keeps 3 decimal places, 10000 keeps 4, and so on |
| digit | Integer | False | Used by the StackZDPD algorithm; the number of layers is 2^digit (Python SDK only) |
| byteOrder | String | False | Byte order: LITTLE_ENDIAN (default) or BIG_ENDIAN |
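
To make the `precision` field concrete: with precision 1000, an m/z of 100.12345 is stored as the integer 100123, so three decimal places survive. The sketch below shows that scaling plus a generic delta + variable-byte step of the kind the `"VB"` method name suggests. It is an illustration under assumptions (sorted, non-negative integer m/z values and a standard 7-bits-per-byte varint), not the SDK's byte layout or the ComboComp compressor; `PrecisionVbSketch` is a hypothetical class.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: shows how a precision multiplier turns m/z values into
// integers and how a generic delta + variable-byte (VB) step could pack them.
// This is NOT the Aird SDK's byte layout or its ComboComp implementation.
final class PrecisionVbSketch {
    private PrecisionVbSketch() {}

    // precision = 1000 keeps 3 decimal places, 10000 keeps 4, and so on
    static int[] toIntegers(double[] mzs, int precision) {
        int[] out = new int[mzs.length];
        for (int i = 0; i < mzs.length; i++) {
            out[i] = (int) Math.round(mzs[i] * precision);
        }
        return out;
    }

    static double[] toDoubles(int[] values, int precision) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (double) values[i] / precision;
        }
        return out;
    }

    // Assumes the input is sorted ascending, so every delta is non-negative
    static byte[] deltaVbEncode(int[] sorted) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int previous = 0;
        for (int value : sorted) {
            int delta = value - previous;
            previous = value;
            // emit 7 bits per byte; the high bit marks "more bytes follow"
            while ((delta & ~0x7F) != 0) {
                buf.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            buf.write(delta);
        }
        return buf.toByteArray();
    }

    static int[] deltaVbDecode(byte[] bytes, int count) {
        int[] out = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta; // undo the delta step
            out[i] = previous;
        }
        return out;
    }
}
```

Delta encoding pays off because consecutive values in a sorted m/z array differ by much less than the raw values, so the deltas need fewer bytes before a general-purpose compressor such as Zstd runs.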

3.3 WindowRange

| Name | Type | Required | Description |
|---|---|---|---|
| start | Double | True | Precursor m/z start |
| end | Double | True | Precursor m/z end |
| mz | Double | True | Precursor m/z |
| charge | Integer | False | Precursor charge, 0 when empty |
| features | String | False | Other features, stored in "key:value;key:value" format |
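
For DIA and PRM data, a common task is finding which precursor window(s) cover a given m/z. The sketch below is illustrative only: `WindowRange` here is a hypothetical record mirroring the `start`/`end`/`mz` fields above (not the SDK class), and treating both bounds as inclusive is an assumption. It returns every matching window rather than assuming exactly one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the WindowRange fields documented above (not the SDK class)
record WindowRange(double start, double end, double mz) {}

final class WindowLookup {
    private WindowLookup() {}

    // Returns all windows with start <= precursorMz <= end (inclusive bounds are an assumption)
    static List<WindowRange> windowsContaining(List<WindowRange> ranges, double precursorMz) {
        List<WindowRange> hits = new ArrayList<>();
        for (WindowRange range : ranges) {
            if (range.start() <= precursorMz && precursorMz <= range.end()) {
                hits.add(range);
            }
        }
        return hits;
    }
}
```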

3.4 BlockIndex

| Name | Type | Required | Description |
|---|---|---|---|
| level | Integer | True | MS level (1: MS1, 2: MS2) |
| startPtr | Long | True | Start position of the block |
| endPtr | Long | True | End position of the block |
| num | Integer | False | Scan number in the vendor file. For a block containing a list of MS2 spectra, this is the number of the related MS1 scan |
| rangeList | List | False | Precursor m/z window ranges adjusted for experiment overlap. Used for DIA and PRM data |
| nums | List | False | Scan numbers in the block |
| rts | List | True | Retention times of all spectra in the block |
| tics | List | False | Total intensity of every spectrum in the block |
| injectionTimes | List | False | Injection time of every spectrum in the block (C# and Java SDKs only) |
| basePeakIntensities | List | True | Base peak intensity of every spectrum in the block |
| basePeakMzs | List | True | Base peak m/z of every spectrum in the block |
| filterStrings | List | False | Filter string of every spectrum in the block |
| activators | List | False | Activator of every spectrum in the block |
| energies | List | False | Collision energy of every spectrum in the block |
| polarities | List | False | Polarity of every spectrum in the block |
| msTypes | List | False | msType of every spectrum in the block |
| tags | List | False | Used by StackZDPD: the original layer of every m/z point (Python SDK only) |
| mzs | List | True | Byte size of every spectrum's compressed m/z array |
| ints | List | True | Byte size of every spectrum's compressed intensity array |
| mobilities | List | False | Byte size of every spectrum's compressed ion mobility array |
| cvList | List<List> | False | PSI controlled vocabulary (Python SDK only) |
| features | String | False | Other features, stored in "key:value;key:value" format |
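
Since `mzs` and `ints` record per-spectrum byte sizes, the position of any spectrum inside a block can be computed from `startPtr` by summing the sizes of the spectra before it. The sketch below assumes a layout that this document does not confirm: spectrum 0's m/z bytes, then its intensity bytes, then spectrum 1's, and so on, with mobility bytes ignored. `BlockOffsets` is a hypothetical class.

```java
import java.util.List;

// Illustrative sketch under an ASSUMED layout (not confirmed by this document): inside
// a block, spectrum 0's m/z bytes are followed by its intensity bytes, then spectrum 1's,
// and so on. mzs and ints hold the per-spectrum byte sizes from the BlockIndex.
// Mobility bytes, if present, are ignored here.
final class BlockOffsets {
    private BlockOffsets() {}

    /** Returns {mzStart, intStart, intEnd} absolute positions for spectrum i. */
    static long[] locate(long startPtr, List<Integer> mzs, List<Integer> ints, int i) {
        if (mzs.size() != ints.size()) {
            throw new IllegalArgumentException("mzs and ints must have the same length");
        }
        if (i < 0 || i >= mzs.size()) {
            throw new IndexOutOfBoundsException("Spectrum index out of range: " + i);
        }
        long position = startPtr;
        // skip over every earlier spectrum's m/z and intensity bytes
        for (int k = 0; k < i; k++) {
            position += (long) mzs.get(k) + ints.get(k);
        }
        long mzStart = position;
        long intStart = mzStart + mzs.get(i);
        long intEnd = intStart + ints.get(i);
        return new long[] {mzStart, intStart, intEnd};
    }
}
```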

3.5 Instrument

| Name | Type | Required | Description |
|---|---|---|---|
| manufacturer | String | False | Instrument manufacturer, e.g. "ABSciex", "Thermo Fisher" |
| ionisation | String | False | Ionisation method |
| resolution | String | False | Resolution |
| model | String | False | Instrument model |
| source | List | False | Ion source, e.g. "electrospray ionization", "electrospray inlet" |
| analyzer | List | False | Mass analyzer, e.g. "quadrupole", "orbitrap" |
| detector | List | False | Detector, e.g. "inductive detector" |

3.6 DataProcessing

| Name | Type | Required | Description |
|---|---|---|---|
| processingOperations | List | False | Any additional manipulation not included elsewhere in the dataProcessing element |

3.7 Software

| Name | Type | Required | Description |
|---|---|---|---|
| name | String | True | Software name |
| version | String | False | Software version |
| type | String | False | Software function type, e.g. "acquisition" |

3.8 ParentFile

| Name | Type | Required | Description |
|---|---|---|---|
| name | String | True | File name |
| location | String | False | File location |
| type | String | False | File type |

3.9 MobiInfo

| Name | Type | Required | Description |
|---|---|---|---|
| dictStart | long | True | Start position of the mobility array in the .aird file |
| dictEnd | long | True | End position of the mobility array in the .aird file |
| unit | String | False | Ion mobility unit |
| type | String | False | Ion mobility type; see MobilityType |

4 API Document

4.1 Parser Classes Overview

AirdSDK provides the following core Parser classes for different mass spectrometry data acquisition modes:

  • BaseParser: Abstract base class providing common spectrum reading functionality
  • DDAParser: DDA (Data-Dependent Acquisition) mode parser
  • DIAParser: DIA (Data-Independent Acquisition) mode parser
  • PRMParser: PRM (Parallel Reaction Monitoring) mode parser (inherits from DIAParser)
  • MRMParser: MRM/SRM (Multiple/Selected Reaction Monitoring) mode parser
  • MSIMaldiParser: MSI MALDI (Mass Spectrometry Imaging) mode parser

4.2 Load Aird Info into memory

    // Load DIA data
    DIAParser diaParser = new DIAParser("/FilePath/file.json");
    
    // Load DDA data
    DDAParser ddaParser = new DDAParser("/FilePath/file.json");
    
    // Load PRM data
    PRMParser prmParser = new PRMParser("/FilePath/file.json");
    
    // Load MRM data
    MRMParser mrmParser = new MRMParser("/FilePath/file.json");
    
    // Load MSI MALDI data
    MSIMaldiParser msiParser = new MSIMaldiParser("/FilePath/file.json");

4.3 Read AirdInfo

    DDAParser parser = new DDAParser(YOUR_AIRD_INDEX_FILE_PATH);
    AirdInfo airdInfo = parser.getAirdInfo();

4.4 Read Spectrum by Retention Time

    // Read a single spectrum by BlockIndex and retention time (in seconds)
    double rt = 12.3456;
    Spectrum spectrum = parser.getSpectrumByRt(blockIndex, rt);
    
    // Multi-parameter version: pass the block's start pointer, retention time list,
    // and m/z and intensity offset lists explicitly
    Spectrum spectrumByOffsets = parser.getSpectrumByRt(startPtr, rtList, mzOffsets, intOffsets, rt);
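
Retention times within a block are stored as a sorted list (`rts`), so an RT lookup can be a binary search. The sketch below is hypothetical and is not how `getSpectrumByRt` is implemented: `RtLookup` is an illustrative name, and falling back to the nearest retention time when there is no exact match (ties go to the earlier spectrum) is an assumption about desirable behavior, not documented SDK behavior.

```java
import java.util.List;

// Hypothetical sketch, not the SDK's getSpectrumByRt: maps a retention time (seconds)
// to a spectrum position in a block's ascending rts list, choosing the nearest entry.
final class RtLookup {
    private RtLookup() {}

    static int nearestIndex(List<Double> rts, double rt) {
        if (rts.isEmpty()) {
            throw new IllegalArgumentException("Empty retention time list");
        }
        int lo = 0;
        int hi = rts.size() - 1;
        // binary search for the first element >= rt (or the last element if none is)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rts.get(mid) < rt) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // compare with the previous element; ties prefer the earlier spectrum
        if (lo > 0 && Math.abs(rts.get(lo - 1) - rt) <= Math.abs(rts.get(lo) - rt)) {
            return lo - 1;
        }
        return lo;
    }
}
```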

4.5 Read Spectrum by Index

    // Read a spectrum by its sequence number
    int index = 12;
    Spectrum spectrum = parser.getSpectrum(index);
    
    // Read a spectrum by BlockIndex and its position within that block
    Spectrum spectrumInBlock = parser.getSpectrumByIndex(blockIndex, index);

4.6 Read Multiple Spectra

    // Read all spectra from the specified BlockIndex (key: retention time)
    TreeMap<Double, Spectrum> spectraMap = parser.getSpectra(blockIndex);
    
    // Read spectra within the specified retention time range
    TreeMap<Double, Spectrum> rangeMap = parser.getSpectra(start, end, rtList, mzOffsets, intOffsets);
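
When a block's spectra are already in memory as a `TreeMap` keyed by retention time, the standard `NavigableMap.subMap` view can narrow them to an RT window without copying. In this sketch `RtWindow` is a hypothetical helper and `String` values stand in for `Spectrum` objects.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch: selects an RT window from a map keyed by retention time using the standard
// NavigableMap.subMap view. String values stand in for Spectrum objects here.
final class RtWindow {
    private RtWindow() {}

    static <V> NavigableMap<Double, V> between(TreeMap<Double, V> byRt, double start, double end) {
        // both bounds inclusive; the result is a live view backed by byRt
        return byRt.subMap(start, true, end, true);
    }
}
```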

4.7 DDA-Specific Operations

    // Get MS1 spectrum index
    BlockIndex ms1Index = ddaParser.getMs1Index();
    
    // Get all MS2 spectrum indexes
    List<BlockIndex> ms2Indexes = ddaParser.getAllMs2Index();
    
    // Read all DDA data into memory (recommended for small files <200MB)
    List<DDAMs> cycleList = ddaParser.readAllToMemory();
    
    // Get MS1 spectrum mapping
    TreeMap<Double, Spectrum> ms1Map = ddaParser.getMs1SpectraMap();

4.8 DIA/SWATH Operations

    DIAParser diaParser = new DIAParser("/FilePath/file.json");
    AirdInfo airdInfo = diaParser.getAirdInfo();
    
    // Read DIA window blocks one by one
    airdInfo.getIndexList().forEach(blockIndex -> {
        TreeMap<Double, Spectrum> map = diaParser.getSpectra(blockIndex); // key is retention time
    });

4.9 MRM-Specific Operations

    MRMParser mrmParser = new MRMParser("/FilePath/file.json");
    
    // Get chromatogram index
    ChromatogramIndex chromaIndex = mrmParser.getChromatogramIndex();
    
    // Get all MRM ion pairs
    List<MrmPair> mrmPairs = mrmParser.getAllMrmPairs();
    
    // Batch get chromatogram data
    HashMap<String, Xic> chromatograms = mrmParser.getChromatograms(start, end, keyList, rtOffsets, intOffsets);
    
    // Decode the retention time and intensity arrays of a chromatogram from its compressed bytes
    double[] rtData = mrmParser.getRts4Chroma(bytes, offset, length);
    double[] intensityData = mrmParser.getInts4Chroma(bytes, start, length);

4.10 MSI MALDI Operations

    MSIMaldiParser msiParser = new MSIMaldiParser("/FilePath/file.json");
    
    // Get MS1 index for MSI data
    BlockIndex ms1Index = msiParser.getMs1Index();
    
    // Read all MSI spectra into memory
    List<Spectrum> spectra = msiParser.readAllToMemory();
    
    // Get image data
    List<ImageData> imageData = msiParser.getImageDataList(mz, tolerance);
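
Conceptually, an ion image for a target m/z assigns each pixel the intensity found within the tolerance window of that m/z. The sketch below is illustrative and is not `getImageDataList`: `Pixel` and `IonImage` are hypothetical types, and treating `tolerance` as an absolute m/z window (rather than ppm) is an assumption.

```java
import java.util.List;

// Hypothetical pixel record: grid position plus the pixel's peaks as parallel arrays
record Pixel(int x, int y, double[] mzs, double[] ints) {}

// Illustrative sketch, not MSIMaldiParser.getImageDataList: sums, for every pixel,
// the intensities whose m/z lies within [mz - tolerance, mz + tolerance] (absolute units).
final class IonImage {
    private IonImage() {}

    static double[][] build(List<Pixel> pixels, int width, int height, double mz, double tolerance) {
        double[][] image = new double[height][width];
        double low = mz - tolerance;
        double high = mz + tolerance;
        for (Pixel p : pixels) {
            double sum = 0;
            for (int i = 0; i < p.mzs().length; i++) {
                if (p.mzs()[i] >= low && p.mzs()[i] <= high) {
                    sum += p.ints()[i];
                }
            }
            image[p.y()][p.x()] += sum; // row = y, column = x
        }
        return image;
    }
}
```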

4.11 Data Processing Functions

    // Decompress m/z data
    double[] mzValues = parser.getMzs(compressedBytes);
    double[] mzSlice = parser.getMzs(compressedBytes, offset, length);
    int[] mzIntegerValues = parser.getMzsAsInteger(compressedBytes);
    
    // Decompress intensity data
    double[] intensities = parser.getInts(compressedBytes);
    double[] intensitySlice = parser.getInts(compressedBytes, start, length);
    
    // Decompress mobility data
    double[] mobilities = parser.getMobilities(compressedBytes, start, length);
    
    // Calculate extracted ion chromatogram
    Xic xic = parser.calcXic(spectraMap, mzStart, mzEnd);
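
An extracted ion chromatogram (XIC) sums, for each retention time, the intensity of peaks inside an m/z range. The sketch below shows that computation only; it is not the SDK's `calcXic`. `XicSketch` is a hypothetical class, and each spectrum is represented as a `{mzs, intensities}` pair of arrays keyed by retention time, mirroring the maps returned by `getSpectra`.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of what an XIC computes; it is not the SDK's calcXic.
// Each spectrum is a pair of parallel arrays {mzs, intensities}, keyed by retention time.
final class XicSketch {
    private XicSketch() {}

    /** Returns retention time -> summed intensity of peaks with mzStart <= m/z <= mzEnd. */
    static TreeMap<Double, Double> calc(TreeMap<Double, double[][]> spectraByRt, double mzStart, double mzEnd) {
        TreeMap<Double, Double> xic = new TreeMap<>();
        for (Map.Entry<Double, double[][]> entry : spectraByRt.entrySet()) {
            double[] mzs = entry.getValue()[0];
            double[] ints = entry.getValue()[1];
            double sum = 0;
            for (int i = 0; i < mzs.length; i++) {
                if (mzs[i] >= mzStart && mzs[i] <= mzEnd) {
                    sum += ints[i];
                }
            }
            xic.put(entry.getKey(), sum); // keep zero points so the trace stays continuous
        }
        return xic;
    }
}
```

The linear scan keeps the sketch short; since m/z arrays are sorted, a binary search for `mzStart` would avoid touching peaks outside the range.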

4.12 Resource Management

    // Close resources when done
    parser.close();

5 Detailed Documentation

5.1 Multi-language SDK Documentation

Java SDK Documentation

C# SDK Documentation

Python SDK Documentation

5.2 Project Structure

Aird-SDK/
├── CSharpSDK/          # C# SDK Source Code
├── JavaSDK/            # Java SDK Source Code
├── PyAirdSDK/          # Python SDK Source Code
├── docs/               # Documentation Directory
│   ├── Java/           # Java SDK Documentation
│   ├── CSharp/         # C# SDK Documentation
│   └── Python/         # Python SDK Documentation
└── README.md           # Project Overview

5.3 Supported Parser Classes

All SDKs support the following core Parser classes:

Base Parsers

  • BaseParser - Base class for all Parser classes, providing common functionality

Data Acquisition Mode Parsers

  • DDAParser - Data-Dependent Acquisition (DDA) mode
  • DIAParser - Data-Independent Acquisition (DIA) mode
  • MRMParser - Multiple Reaction Monitoring (MRM) mode
  • PRMParser - Parallel Reaction Monitoring (PRM) mode

Advanced Feature Parsers

  • DDAPasefParser - DDA-PASEF mode (with ion mobility)
  • DIAPasefParser - DIA-PASEF mode (with ion mobility)
  • MSIMaldiParser - MALDI imaging
  • ColumnParser - Column data parsing

Sample Code

For detailed sample code, see net.csibio.aird.sample.SampleCode.
