Generate entrapment database and calculate false discovery proportion (FDP)
For each protein in a target database, digest it, shuffle the residues within each peptide, and then reassemble the shuffled peptides into a protein. Each peptide is shuffled at most 10 times to obtain a unique sequence. Depending on the <number of entrapment proteins for each target protein> parameter, one target protein can generate multiple entrapment proteins.
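The per-peptide shuffling step can be sketched as follows. This is a hypothetical illustration, not EntrapBench's code, and it does not try to match the tool's output exactly. It assumes the following:
- The peptides have already been digested.
- The last residue of each peptide (the cleavage site) stays in place while the other residues are permuted. This matches the K/R positions in the example FASTA below.
- "Unique" means not present in a set of already-seen sequences that is seeded with the target peptides.

The class PeptideShuffler and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: shuffle residues within each peptide, retrying at most 10 times.
class PeptideShuffler {
    static final int MAX_ATTEMPTS = 10;

    // Permutes every residue except the last one (assumed cleavage site, kept in place).
    // Returns the first permutation not already in `seen`; if every attempt collides,
    // the last permutation is kept anyway.
    static String shuffle(String peptide, Set<String> seen, Random rng) {
        if (peptide.length() < 3) {
            return peptide; // fewer than two movable residues: nothing to permute
        }
        String candidate = peptide;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            List<Character> body = new ArrayList<>();
            for (int i = 0; i < peptide.length() - 1; i++) {
                body.add(peptide.charAt(i));
            }
            Collections.shuffle(body, rng);
            StringBuilder sb = new StringBuilder();
            for (char c : body) {
                sb.append(c);
            }
            candidate = sb.append(peptide.charAt(peptide.length() - 1)).toString();
            if (!seen.contains(candidate)) {
                break; // unique sequence found
            }
        }
        seen.add(candidate);
        return candidate;
    }

    // Shuffles each peptide and concatenates the results back into one protein sequence.
    static String shuffleProtein(List<String> peptides, Set<String> seen, Random rng) {
        StringBuilder protein = new StringBuilder();
        for (String p : peptides) {
            protein.append(shuffle(p, seen, rng));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        List<String> peptides = List.of("QQPGGR", "GGR", "SSGNK", "K", "SK");
        Set<String> seen = new HashSet<>(peptides); // register target peptides first
        System.out.println(shuffleProtein(peptides, seen, new Random(7)));
    }
}
```

A peptide such as GGR has only one distinct permutation. It therefore exhausts its 10 attempts and stays unchanged, which is also what happens to GGR in the example entrapment proteins below.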
Usage:
java -cp EntrapBench.jar entrapment.GenerateDatabase <UniProt fasta file path> <cut sites> <protect sites> <cleavage from C-term: 0 = false, 1 = true> <number of entrapment proteins for each target protein> <entrapment prefix> <add prefix>
Example: java -cp EntrapBench.jar entrapment.GenerateDatabase uniprot_human.fasta KR P 1 5 entrapment 0 # Trypsin rule (cleave after K or R, but not before P); each target protein generates 5 different shuffled entrapment proteins.
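The three digestion parameters (<cut sites>, <protect sites>, <cleavage from C-term>) describe a rule-based in-silico digestion. The sketch below shows one way to interpret them and is hypothetical.

Its main assumption concerns where a protect site blocks cleavage:
- For C-terminal cleavage, a protect site blocks the cut when it is the residue right after the cut site (the trypsin "not before P" rule).
- For N-terminal cleavage, it blocks the cut when it is the residue right before the cut site.

EntrapBench's own digestion code may differ in such details, and the class name Digester is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of rule-based digestion driven by GenerateDatabase's
// <cut sites>, <protect sites> and <cleavage from C-term> parameters.
class Digester {
    static List<String> digest(String protein, String cutSites, String protectSites, boolean cTerm) {
        List<String> peptides = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < protein.length(); i++) {
            if (cutSites.indexOf(protein.charAt(i)) < 0) {
                continue; // not a cut site
            }
            // C-term: cut after residue i, blocked by a protect site right after it.
            // N-term: cut before residue i, blocked by a protect site right before it.
            int cut = cTerm ? i + 1 : i;
            int neighbor = cTerm ? i + 1 : i - 1;
            if (cut <= start || cut >= protein.length()) {
                continue; // would produce an empty peptide
            }
            if (neighbor >= 0 && protectSites.indexOf(protein.charAt(neighbor)) >= 0) {
                continue; // cleavage blocked by a protect site
            }
            peptides.add(protein.substring(start, cut));
            start = cut;
        }
        if (start < protein.length()) {
            peptides.add(protein.substring(start)); // remaining C-terminal peptide
        }
        return peptides;
    }

    public static void main(String[] args) {
        // Trypsin-like rule from the example command: cut after K/R, not before P.
        System.out.println(digest("MSAEYGQRQQPGGRGGRSSGNKKSKKR", "KR", "P", true));
    }
}
```

With the example parameters (KR, P, 1), the start of the H2BK1 sequence below splits into MSAEYGQR, QQPGGR, GGR, SSGNK, K, SK, K, R.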
Example of the resulting target+entrapment FASTA file (one target protein followed by its 5 entrapment proteins):
>sp|A0A2R8Y619|H2BK1_HUMAN Histone H2B type 2-K1 OS=Homo sapiens OX=9606 GN=H2BK1 PE=3 SV=1
MSAEYGQRQQPGGRGGRSSGNKKSKKRCRRKESYSMYIYKVLKQVHPDIGISAKAMSIMNSFVNDVFEQLACEAARLAQYSGRTTLTSREVQTAVRLLLPGELAKHAVSEGTKAVTKYTSSK
>sp|entrapment_0_A0A2R8Y619|entrapment_0_H2BK1_HUMAN Histone H2B type 2-K1 OS=Homo sapiens OX=9606 GN=entrapment_0_H2BK1 PE=3 SV=1
MGQASYERPQGGQRGGRSNSGKKSKKRCRRKYYSMSIEYKVLKSGIDQAPHIVKCESMVQVLDSEANMANIFAFARSGALYQRLTTSTRAVVETQRALPGLLLEKATHEGSVKATVKYSTSK
>sp|entrapment_1_A0A2R8Y619|entrapment_1_H2BK1_HUMAN Histone H2B type 2-K1 OS=Homo sapiens OX=9606 GN=entrapment_1_H2BK1 PE=3 SV=1
MSAQYEGRGQPQGRGGRNGSSKKSKKRCRRKSSYIYYMEKVLKAIHDPGSIQVKCLEAANASEVFDSANIQMFMVRLAGYQSRSTLTTREVVTQARLPALLEGLKSHAGVTEKAVTKSSTYK
>sp|entrapment_2_A0A2R8Y619|entrapment_2_H2BK1_HUMAN Histone H2B type 2-K1 OS=Homo sapiens OX=9606 GN=entrapment_2_H2BK1 PE=3 SV=1
MSQEGYARGPGQQRGGRNGSSKKSKKRCRRKSYEYYMSIKVLKGHQIDISAVPKFNVASLQEASCAENMIMFAVDRLGQSYARTLSTTREVQATVRLALLEGPLKGAHEVTSKATVKSSTYK
>sp|entrapment_3_A0A2R8Y619|entrapment_3_H2BK1_HUMAN Histone H2B type 2-K1 OS=Homo sapiens OX=9606 GN=entrapment_3_H2BK1 PE=3 SV=1
MAGESQYRGQPGQRGGRSNSGKKSKKRCRRKYIMSYESYKVLKQGIDAHPVSIKADQFEIMAFNNVMVCAESSLARASQLGYRSTLTTRVAQEVTRALGPLLLEKEASVTHGKTAVKYSTSK
>sp|entrapment_4_A0A2R8Y619|entrapment_4_H2BK1_HUMAN Histone H2B type 2-K1 OS=Homo sapiens OX=9606 GN=entrapment_4_H2BK1 PE=3 SV=1
MEQSYGARQGQGPRGGRSGNSKKSKKRCRRKEYYSMSYIKVLKIHGSVQDPAIKFECQSANLIEVSAAVDNMAFMRQGLYSARTSTTLRQVETAVRLALLGPELKVATHEGSKATVKYSSTK
Given a target+entrapment database and DIA-NN's report.tsv, calculate false discovery proportion (FDP) estimates using the equations in Wen et al. (2024):
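"combined" method (FDP among all target and entrapment discoveries):
$$\widehat{\mathrm{FDP}} = \frac{N_E\,(1 + 1/r)}{N_T + N_E}$$
Lower bound:
$$\underline{\mathrm{FDP}} = \frac{N_E}{N_T + N_E}$$
"sample" method (FDP among the target discoveries only):
$$\widehat{\mathrm{FDP}}_T = \frac{N_E}{r\,N_T}$$
where $N_T$ is the number of target discoveries, $N_E$ is the number of entrapment discoveries, and $r$ is the ratio of the entrapment database size to the target database size.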
Disclaimer: the equations may differ slightly depending on the target-decoy approach and on how false matches are interpreted.
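The three estimates can be computed with a small helper. This is a hypothetical sketch, not part of EntrapBench. It assumes the Wen et al. (2024) formulas written in its comments, where N_T and N_E are the numbers of target and entrapment discoveries and r is the entrapment-to-target ratio.

```java
// Hypothetical helper for the entrapment FDP estimates (formulas assumed from Wen et al. 2024).
class EntrapmentFdp {
    // "combined": N_E * (1 + 1/r) / (N_T + N_E)
    static double combined(int nT, int nE, double r) {
        int n = nT + nE;
        return n == 0 ? 0.0 : nE * (1.0 + 1.0 / r) / n;
    }

    // Lower bound: N_E / (N_T + N_E)
    static double lowerBound(int nT, int nE) {
        int n = nT + nE;
        return n == 0 ? 0.0 : (double) nE / n;
    }

    // "sample": N_E / (r * N_T)
    static double sample(int nT, int nE, double r) {
        return nT == 0 ? 0.0 : nE / (r * nT);
    }

    public static void main(String[] args) {
        // Illustrative counts: 9,900 target and 100 entrapment discoveries, r = 1.
        System.out.printf("combined=%.4f lower=%.4f sample=%.4f%n",
                combined(9900, 100, 1.0), lowerBound(9900, 100), sample(9900, 100, 1.0));
    }
}
```

For these made-up counts the program prints combined=0.0200 lower=0.0100 sample=0.0101. With r = 1, the combined estimate is twice the lower bound.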
Usage:
java -cp EntrapBench.jar entrapment.CalculateFDP <fasta file path> <entrapment prefix> <result file path> <run precursor FDR> <global precursor FDR> <run protein group FDR> <global protein group FDR>
Example: java -cp EntrapBench.jar entrapment.CalculateFDP uniprot_human.fasta entrapment report.tsv 0.01 0.01 0.01 0.01
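Below is a hypothetical sketch of how N_T and N_E might be counted from report.tsv, not CalculateFDP's actual logic. It relies on three assumptions:
- It uses DIA-NN's standard column names: Run, Protein.Ids, Precursor.Id, Q.Value, Global.Q.Value, PG.Q.Value and Global.PG.Q.Value.
- It counts unique (run, precursor) pairs that pass all four thresholds.
- It treats a precursor as an entrapment hit only when every protein ID starts with the entrapment prefix.

CalculateFDP also reads the FASTA file, so its real classification may differ. The class name ReportCounter is made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: count target (N_T) and entrapment (N_E) precursors in a DIA-NN report.tsv.
class ReportCounter {
    private static int col(List<String> header, String name) {
        int i = header.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("Missing column: " + name);
        }
        return i;
    }

    // Returns {N_T, N_E} over unique (run, precursor) pairs passing all four q-value thresholds.
    static int[] count(List<String> lines, String prefix, double runPrec, double globalPrec,
                       double runPg, double globalPg) {
        List<String> header = Arrays.asList(lines.get(0).split("\t"));
        int run = col(header, "Run");
        int ids = col(header, "Protein.Ids");
        int prec = col(header, "Precursor.Id");
        int q = col(header, "Q.Value");
        int gq = col(header, "Global.Q.Value");
        int pgq = col(header, "PG.Q.Value");
        int gpgq = col(header, "Global.PG.Q.Value");
        Set<String> targets = new HashSet<>();
        Set<String> entrapments = new HashSet<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue;
            }
            String[] f = line.split("\t", -1);
            if (Double.parseDouble(f[q]) > runPrec || Double.parseDouble(f[gq]) > globalPrec
                    || Double.parseDouble(f[pgq]) > runPg || Double.parseDouble(f[gpgq]) > globalPg) {
                continue; // fails at least one q-value threshold
            }
            // Assumption: entrapment hit only if all of its proteins are entrapment proteins.
            boolean entrapment = Arrays.stream(f[ids].split(";")).allMatch(id -> id.startsWith(prefix));
            (entrapment ? entrapments : targets).add(f[run] + "\t" + f[prec]);
        }
        return new int[] {targets.size(), entrapments.size()};
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        int[] n = count(lines, args[1], 0.01, 0.01, 0.01, 0.01);
        System.out.println("N_T = " + n[0] + ", N_E = " + n[1]);
    }
}
```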
Calculate entrapment q-values from DIA-NN's report.tsv
Usage:
java -cp EntrapBench.jar entrapment.DiannEntrapmentQValue <entrapment prefix> <entrapment to target ratio> <run-wise precursor q-value threshold> <global precursor q-value threshold> <run-wise protein q-value threshold> <global protein q-value threshold> <result file path> <output file path>
Example: java -cp EntrapBench.jar entrapment.DiannEntrapmentQValue entrapment 1 0.01 0.01 0.01 0.01 report.tsv entrapment_q_values.csv
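One common way to turn an FDP estimate into q-values has three steps:
1. Rank the hits by score.
2. Compute the estimate at every cutoff.
3. Take the running minimum from the bottom of the list.

The sketch below applies this to the "combined" estimator. Whether DiannEntrapmentQValue uses this estimator, and which DIA-NN score it ranks by, are assumptions here. The class name EntrapmentQValue is made up.

```java
import java.util.Arrays;

// Hypothetical sketch: entrapment q-values from the "combined" FDP estimate.
class EntrapmentQValue {
    // scores: higher is better; isEntrapment[i] marks entrapment hits;
    // r: entrapment-to-target ratio. Returns q-values in the input order.
    static double[] qValues(double[] scores, boolean[] isEntrapment, double r) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a])); // best first
        double[] fdp = new double[n];
        int nE = 0;
        for (int k = 0; k < n; k++) {
            if (isEntrapment[order[k]]) {
                nE++;
            }
            // Combined estimate over the top k + 1 hits, capped at 1.
            fdp[k] = Math.min(1.0, nE * (1.0 + 1.0 / r) / (k + 1));
        }
        // q-value = smallest FDP estimate at this cutoff or any lower one.
        double[] q = new double[n];
        double running = 1.0;
        for (int k = n - 1; k >= 0; k--) {
            running = Math.min(running, fdp[k]);
            q[order[k]] = running;
        }
        return q;
    }

    public static void main(String[] args) {
        double[] q = qValues(new double[] {10, 9, 8, 7}, new boolean[] {false, true, false, false}, 1.0);
        System.out.println(Arrays.toString(q));
    }
}
```

Tied scores are treated as separate cutoffs here for simplicity. A production version would assign all tied hits the same q-value.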
Note: the "target" here is different from the term "target" in the target-decoy database search approach. To use this target+entrapment database with the target-decoy approach, decoy proteins must be generated (beforehand, or on the fly by the search tool itself) for both the target and the entrapment proteins.
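If the search tool does not generate decoys itself, they can be appended beforehand. The minimal sketch below is hypothetical. It assumes plain sequence reversal and a rev_ header prefix, a common convention (used, for example, by FragPipe), but other decoy schemes work equally well. The point it illustrates is that every entry, target and entrapment alike, receives a decoy. The class name DecoyAppender is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: append a reversed decoy for every FASTA entry (target and entrapment).
class DecoyAppender {
    // Keys are FASTA headers without '>', values are sequences; insertion order is preserved.
    static Map<String, String> withDecoys(Map<String, String> entries, String decoyPrefix) {
        Map<String, String> out = new LinkedHashMap<>(entries); // original entries first
        for (Map.Entry<String, String> e : entries.entrySet()) {
            String reversed = new StringBuilder(e.getValue()).reverse().toString();
            out.put(decoyPrefix + e.getKey(), reversed);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fasta = new LinkedHashMap<>();
        fasta.put("sp|P1|A_HUMAN", "MKPR");
        fasta.put("sp|entrapment_0_P1|entrapment_0_A_HUMAN", "MPKR");
        withDecoys(fasta, "rev_").forEach((h, s) -> System.out.println(">" + h + "\n" + s));
    }
}
```

Plain reversal keeps the amino-acid composition of each protein. Pseudo-reversal, which keeps the cleavage residue at the peptide C-terminus, is a common alternative.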