Skip to content

AndrewIsaac/HiTIME-CPP

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

travis

HiTIME

HiTIME is a program for identifying twin-ion signals in Liquid chromatography-mass spectrometry (LCMS) data. HiTIME is designed to be integrated with existing analysis pipelines, such as those created with OpenMS.

HiTIME filters twin-ion signals in LCMS data. This process re-weights each data point to a Z-score of how well the point matches an idealised twin-ion signal versus alternative ion signatures.

  • The hitime program takes an mzML file as input and produces an mzML file as output. Intentsity values in the input file which correspond to the lower mass in a twin-ion peak are retained and scored highly in the output, all other intensities are downweighted towards zero.

License

HiTIME is released as open source software under the terms of the 3-Clause BSD License. See the contents of the file called LICENSE in the top level of the source code repository for a copy of the terms.

Installation

HiTIME is provided as a Docker container, which can be installed like so (assuming you have docker installed on your computer):

docker pull bjpop/hitime

Docker can be installed on all modern operating systems. Please review the Docker installation instructions for more information.

Building HiTIME from source

HiTIME can be build from source by following the instructions in the notes folder:

  • Mac OS X: notes/build.osx.sh
  • Linux: notes/build.linux.sh

Builiding HiTIME using Docker

Run this command in the top-level directory of the source tree:

docker build -t bjpop/hitime .

Example docker usage

We provide a convenient wrapper script to run the HiTIME docker container.

Test data is included in the data folder within the repository. Running the following command should produce meaningful output saved in results.mzML.

./hitime-docker.sh -i data/testing.mzML -o results.mzML -- -d 6.0201 -r 17 -m 150

You might see some warnings in the output which complain about the format the of input testing.mzML file. You can safely ignore them. It is just OpenMS being strict about the format of the file.

Usage

  -h, --help            Show this help information.
  -l, --listmax         Flag, only output list of local maximum in window
                        defined by M/Z width and retention time width. Default:
                        not set
  -a, --iratio arg      Ratio of doublet intensities (isotope / parent).
                        Defaults to 1.000000
  -r, --rtwidth arg     REQUIRED: Full width at half maximum for retention
                        time in number of scans. Eg: 17
  -m, --mzwidth arg     REQUIRED: M/Z full width at half maximum in parts per
                        million. Eg: 150. If '--listmax', then upper and
                        lower M/Z offset, e.g. 0.25
  -d, --mzdelta arg     REQUIRED: M/Z delta for doublets. Eg: 6.020100
  -z, --confidence arg  Lower confidence interval to apply during scoring (In
                        standard deviations, e.g. 1.96 for a 95% CI).
                        Default: ignore confidence intervals
      --debug           Generate debugging output
      --version         Print version number and exit
  -j, --threads arg     Number of threads to use. Defaults to 1
  -c, --cache arg       Number of input spectra to retain in cache. Defaults
                        to 50
  -i, --infile arg      Input mzML file
  -o, --outfile arg     Output mzML file

For example:

hitime -j 4 -i data/testing.mzML -o results.mzML -d 6.0201 -r 17 -m 150

for a computation using 4 threads, where data/testing.mzML contains the input mass spectrometry data in mzML format, and the output file is called results.mzML.

The parameters defining the taget twin-ion signal are, -d 6.0201 the M/Z diference between the natural and heavy isotope versions of the precursor, -r 17 the retention time (RT) full width half maximum (FWHM) size in number of RT steps (scans), -m 150 the M/Z FWHM size in parts per million (ppm). These values can be determined by measurement of the precursor signal in standard visulisation software.

Local Maxima

HITIME can also be used to filter the data to only output the data point that has the largest value in a region defined by the Retention Time (RT) full width half maximum (FWHM) size, and the M/Z FWHM bounds (+/- bound). E.g.:

hitime -i results.mzML -o max.results.mzML -r 17 -m 0.25 --listmax

This will produce two files, max.results.mzML and max.results.csv. The CSV file is a comma separated text file listing the local maxima. This list can be sorted to help identify the strongest twin-ion signal matches. The fields are RT, M/Z, score.

Indexing your input mzML file:

HITIME assumes that the input mzML file is indexed. To index an input file, the OpenMS FileConverter tool can be used, eg:

FileConverter -ini openms.ini -in input.mzML -out input_indexed.mzML

An example openms.ini can be found in notes/openms.ini.

Data Sets

Small dataset (1.9 MB)

curl -sL https://melbourne.figshare.com/ndownloader/files/14131079 > BSA_SUBSET.mzML

Larger dataset (82 MB)

A pure protein that was reduced, reacted with heavy/light paracetamol and then digested. It contains lots of twin ions. Because these twin-ions are mostly peptide derivatives, they come in multiple charge states which effectively gives us a few different doublet spacings. There are many separated by 6.0201, a good number separated by 3.01005 and a few separated by 2.0067. We have removed any points with intensity <1000 to make the file smaller.

curl -sL https://melbourne.figshare.com/ndownloader/files/14131085 > BSA_FULL.mzML

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 79.2%
  • Shell 9.4%
  • Python 8.3%
  • CMake 2.5%
  • Other 0.6%