You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
On some datasets THetA appears to exclude most intervals and fails as a result. Output:
=================================================
Arguments are:
Query File: /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/theta_input.seg
k: 3
tau: 2
Output Directory: /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work
Output Prefix: theta_results
Num Processes: 1
Graph extension: .pdf
Force: True
Valid sample for THetA analysis:
Ratio Deviation: 0.1
Min Fraction of Genome Aberrated: 0.05
Program WILL cluster intervals.
=================================================
Reading in query file...
Frac with potential copy numbers: 0.364559375662
Reading SNP file at /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/normal_alleles.tsv
Reading SNP file at /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/tumour_alleles.tsv
Reading interval file at /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/theta_input.seg
Calculating BAFs
Determining heterozygosity.
Calculating BAFs.
First round of clustering...
Begin meta clustering...
Classifying clusters...
Plotting classifications...
Determining copy number bounds...
Plotting clusters...
Selecting meta-intervals...
Selected 1 intervals for analysis.
Preprocessing data...
Writing bounds file to /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/theta_results.n2.withBounds
Estimating time...
Estimated Total Time: 0 second(s)
Performing optimization...
Writing results file to /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/theta_results.n2.results
Plotting results as a .pdf...
Writing script to run N=3 to /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/theta_results.RunN3.bash
Frac with potential copy numbers: 0.364559375662
Reading SNP file at /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/normal_alleles.tsv
Reading SNP file at /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/tumour_alleles.tsv
Reading interval file at /genesis/shahlab/amcpherson/remixt_data/tmp/comparison/comparison_3/tmp/sim_id/test_proportion_subclonal_2_2/tool_name/theta/create_tool_workflow/sample_id/tumour/theta_work/theta_input.seg
Calculating BAFs
Determining heterozygosity.
Calculating BAFs.
First round of clustering...
Begin meta clustering...
Classifying clusters...
Plotting classifications...
Determining copy number bounds...
Plotting clusters...
Selecting meta-intervals...
Selected 0 intervals for analysis.
--- stderr ---
Traceback (most recent call last):
File "/genesis/shahlab/amcpherson/anaconda/envs/remixtman/src/theta/bin/../python/RunTHetA.py", line 504, in <module>
main()
File "/genesis/shahlab/amcpherson/anaconda/envs/remixtman/src/theta/bin/../python/RunTHetA.py", line 284, in main
resultsfile3, boundsfile3 = run_fixed_N(3, args, intervals, resultsfile2)
File "/genesis/shahlab/amcpherson/anaconda/envs/remixtman/src/theta/bin/../python/RunTHetA.py", line 366, in run_fixed_N
select_meta_intervals_n3(lengths, tumorCounts, normCounts, m, k, force, num_intervals, cluster_scores, lower_bounds, upper_bounds)
File "/shahlab/amcpherson/anaconda/pkgs/theta-0.7-1/src/theta/python/SelectIntervals.py", line 195, in select_meta_intervals_n3
return [column(topLines, i) for i in range(len(topLines[0])-1)]
IndexError: list index out of range
Note that this only occurs if I am providing allele counts. With just the intervals it works fine.
Any suggestions or debug output to look at would be helpful.
The text was updated successfully, but these errors were encountered:
As a preprocessing step, THetA clusters intervals together into "meta-intervals" that are estimated to have the same copy number profile. It does this iteratively, first by clustering intervals for each chromosome, then across the entire genome. The situation that you're running into indicates that all the input intervals had only one cluster. Thus, in your case, either your data doesn't have any large intervals with CNAs, or we are overclustering and merging intervals that should be distinct (or something else is going wrong).
Without the allele counts, we do not cluster so it would make sense that you do not run into this issue running without them, but if there were no copy number abberations, then the resulting solution may simply overfit the data.
We can visually inspect to see what is happening in your case. The clustering outputs a file {PREFIX}_assignment.png. This is a plot of read-depth ratio (x-axis) and average allele frequency deviation (distance from 0.5) on the y-axis. Each point is an interval. The color indicates the cluster that each mutation was assigned to. I attached an example below:
If it looks like there should indeed be more than one cluster, please let me know and I can look into why clustering is failing to produce reasonable results.
On some datasets THetA appears to exclude most intervals and fails as a result. Output:
Note that this only occurs if I am providing allele counts. With just the intervals it works fine.
Any suggestions or debug output to look at would be helpful.
The text was updated successfully, but these errors were encountered: