Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pruning] Refactor phasing #125

Open
wants to merge 55 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
c1342d4
cleanup
zining01 Apr 30, 2021
c7744ee
get rid of unused parameters
zining01 Apr 30, 2021
a25e97e
initial commit apps.R
zining01 May 1, 2021
f582cd4
initial commit gurobi wrapper
zining01 May 2, 2021
c136076
expose threads parameter
zining01 May 3, 2021
a9313a5
add gurobi option to balance
zining01 May 3, 2021
ce35a7d
bug fix in setting up parameters
zining01 May 3, 2021
c335155
UDKJF
zining01 May 3, 2021
4038312
remove Rcplex dependency
zining01 May 3, 2021
7ef4cbd
remove rcplex dependency
zining01 May 3, 2021
f1beeef
add telomeric annotation to loose ends
zining01 May 10, 2021
c82a4e0
add flag to allow cnloh in phased balance
zining01 May 17, 2021
58cb510
merge master
zining01 May 24, 2021
d63c1a5
merge marcin's latest in peel
zining01 May 24, 2021
ea54ad0
initial commit
zining01 May 24, 2021
0dcc42a
add working code for parental graph sim, add force flag to ALT edges …
zining01 May 24, 2021
6406428
fix balance code for edge marginals
zining01 May 25, 2021
b72b967
cleanup parental, expose fix.emarginal in simulate.hets
zining01 May 25, 2021
8adcaa9
merge master
zining01 May 26, 2021
f341b2e
merge master
zining01 May 26, 2021
f4f19a6
merge parental
zining01 May 27, 2021
d0ec5a3
update comments
zining01 Jun 15, 2021
c7553c7
merge master
zining01 Jun 15, 2021
8851640
add basic code for cnloh to phased.binstats
zining01 Jun 24, 2021
8fa33e5
fix bug in phased.binstats
zining01 Jun 26, 2021
03acd1c
merge master
zining01 Jun 26, 2021
be13559
update phased.binstats and balance to penalize CNLOH and reduce false…
zining01 Jul 1, 2021
7c356ad
update phased.postprocess to disjoin gGraph before compressing alleli…
zining01 Jul 5, 2021
be1d2f3
cnloh detection update
zining01 Jul 14, 2021
2d2bd2f
merge latest from master
zining01 Jul 14, 2021
d47a83d
fix no ALT edges CNLOH bug
zining01 Jul 20, 2021
7c3f94b
merge master
zining01 Jul 22, 2021
901ce1b
merge latest
zining01 Jul 22, 2021
350c6c2
merge master with marcin's latest
zining01 Jul 24, 2021
0122c0f
start new postprocessing fx
zining01 Jul 26, 2021
e6f5ed0
merge master
zining01 Jul 26, 2021
ca02541
bug fixes in postprocess
zining01 Jul 26, 2021
356b453
stash changes to apps.R
zining01 Jul 29, 2021
bca20a1
merge master
zining01 Jul 30, 2021
2a0583d
update amp, modify NA node marking in phased.postprocess
zining01 Jul 30, 2021
ff49d3e
bug fix
zining01 Aug 4, 2021
970fbd3
stash most recent for amps)
zining01 Aug 11, 2021
e216f0c
latest with pre-balance collapsed nodes
zining01 Aug 13, 2021
1ed8c74
stash new balance
zining01 Sep 1, 2021
beeaf1c
merge master
zining01 Sep 1, 2021
d3ba43b
initial commit for phasing with haplotypes
zining01 Sep 1, 2021
63592c1
stash changes for phasing
zining01 Sep 13, 2021
b00cc86
merge latest from master
zining01 Feb 10, 2022
e0e80de
update branch
zining01 Mar 22, 2022
bbc6ffd
remove some merge conflict symbols
zining01 Mar 22, 2022
a4e76e3
add everything
zining01 Apr 19, 2022
858c3a6
update binstats
zining01 Apr 21, 2022
b4e9620
fix bugs, initial working balance
zining01 Apr 21, 2022
a8929eb
check in changes for phasing
zining01 May 30, 2022
b9359b1
Update converters.R
SplitInf May 25, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1,993 changes: 1,721 additions & 272 deletions R/apps.R

Large diffs are not rendered by default.

1,856 changes: 0 additions & 1,856 deletions R/apps.txt

This file was deleted.

63 changes: 63 additions & 0 deletions R/balance.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
#' @name balance2
#' @title balance2
#'
#' @description
#'
#' beta reimplementation of balance
#'
#' @param gg (gGraph) with node fields
#' - $cn (numeric) CN guess
#' - $weight (numeric)
#' @param lambda
#' @param marginal
#' @param phased (logical) default FALSE
#' @param ism (logical) default TRUE
#' @param M (numeric) default 1000
#' @param epgap (numeric) default 0.1
#' @param trelim (numeric) max size of uncompressed tree in GB
#' @param nodefileind (numeric) one of 0 (no node file) 1 (in memory compressed) 2 (on disk uncompressed) 3 (on disk compressed) default 1
#' @return junction-balanced gGraph
balance2 = function(gg = NULL,
lambda = 100,
marginal = NULL,
phased = FALSE,
ism = TRUE,
M = 1000,
epgap = 0.1,
trelim = 16,
nodefileind = 1) {

}

#' @name check_balance_inputs
#' @title check_balance_inputs
#'
#' @description
#'
#' Validate input gGraph for balance
#'
#' @param gg (gGraph)
check_balance_inputs = function(gg) {
}

#' @name create_node_variables
#' @title create_node_variables
#'
#' @description
#' create node CN, node residual, and helper variables for nodes
#'
#' @param gg
#' @return data.table
create_node_variables = function(gg) {
}

#' @name create_edge_variables
#' @title create_edge_variables
#'
#' @description
#' edge CN, edge residual, and helper variables for edges
#'
#' @param gg
#' @return data.table
create_edge_variables = function(gg) {
}
1 change: 1 addition & 0 deletions R/converters.R
Original file line number Diff line number Diff line change
Expand Up @@ -937,6 +937,7 @@ read.juncs = function(rafile,
while (grepl("^((#)|(chrom)|(chr))", thisline)) {
headers = c(headers, thisline)
thisline = readLines(f, 1)
if(length(thisline)==0) break
}
ln = sum(length(headers), length(thisline))
while (length(thisline) > 0) {
Expand Down
26 changes: 20 additions & 6 deletions R/eventCallers.R
Original file line number Diff line number Diff line change
Expand Up @@ -2523,8 +2523,10 @@ amp = function(gg, jcn.thresh = 8, cn.thresh = 2, fbi.cn.thresh = 0.5, n.jun.hi
ploidy = gg$nodes$dt[!is.na(cn), sum(cn*as.numeric(width))/sum(as.numeric(width))]
keep = (gg$nodes$dt$cn/ploidy) > cn.thresh
gg$clusters(keep)
if (!any(!is.na(gg$nodes$dt$cluster)))

if (!any(!is.na(gg$nodes$dt$cluster))) {
return(gg)
}

tiny = gg$edges$mark(tiny = gg$edges$dt$class %in% c('DEL-like', 'DUP-like') & gg$edges$span <1e4)
ucl = gg$nodes$dt[!is.na(cluster), .(wid = sum(width)), by = cluster][wid > width.thresh, cluster] %>% sort
Expand All @@ -2536,6 +2538,7 @@ amp = function(gg, jcn.thresh = 8, cn.thresh = 2, fbi.cn.thresh = 0.5, n.jun.hi
return(NULL)
if (!length(cl.edges))
return(NULL)

data.table(cluster = cl,
nodes = paste(cl.nodes$dt$node.id,
collapse = ","),
Expand All @@ -2545,6 +2548,14 @@ amp = function(gg, jcn.thresh = 8, cn.thresh = 2, fbi.cn.thresh = 0.5, n.jun.hi
n.jun = length(cl.edges),
n.jun.high = sum(cl.edges$dt[, sum(cn > 3)]),
max.jcn = max(c(0, cl.edges$dt$cn)),
n.nodes = cl.nodes$dt[, .N],
n.inv = cl.edges$dt[class == "INV-like", .N],
n.del = cl.edges$dt[class == "DEL-like", .N],
n.dup = cl.edges$dt[class == "DUP-like", .N],
n.tra = cl.edges$dt[class == "TRA-like", .N],
width = sum(cl.nodes$dt[, width], na.rm = TRUE),
n.chr = length(unique(cl.nodes$dt[, seqnames])),
max.loose.cn = max(c(0, cl.nodes$dt[, loose.cn.left], cl.nodes$dt[, loose.cn.right]), na.rm = TRUE),
max.cn = max(cl.nodes$dt$cn),
footprint = paste(gr.string(cl.nodes$footprint),
collapse = ","))
Expand All @@ -2553,17 +2564,20 @@ amp = function(gg, jcn.thresh = 8, cn.thresh = 2, fbi.cn.thresh = 0.5, n.jun.hi
if (nrow(amps))
{
if (!mark.nos) {
amps = amps[max.jcn >= jcn.thresh,]
amps = amps[max.jcn >= jcn.thresh,
.(nodes, edges, fbi.cn, n.jun, n.jun.high, max.jcn)]
##amps[max.jcn >= jcn.thresh, ]
} else {
## keep only clusters with a sufficient number of nodes but don't filter by jcn
amps = amps[max.jcn >= jcn.thresh |
(n.jun >= min.jun &
(strsplit(nodes, ",") %>% lapply(length) %>% unlist) >= min.nodes),]
amps = amps[max.jcn >= jcn.thresh | (n.jun >= min.jun & n.nodes >= min.nodes)]
amps[, fbi.frac := fbi.cn / max.cn]
amps[, inv.frac := n.inv / n.jun]
amps[, dup.frac := n.dup / n.jun]
amps[, del.frac := n.del / n.jun]
amps[, tra.frac := n.tra / n.jun]
}
}


## implementing decision tree in https://tinyurl.com/srlbkh2
if (nrow(amps))
{
Expand Down
34 changes: 13 additions & 21 deletions R/gGnome.R
Original file line number Diff line number Diff line change
Expand Up @@ -5054,16 +5054,17 @@ gGraph = R6::R6Class("gGraph",
## cap individual signed edge.ids to their capacity
ub[1:nrow(ed)] = pmin(ub[1:nrow(ed)], ed[[cfield]])
}
## sol = Rcplex::Rcplex(Amat = Amat,
sol = Rcplex2(Amat = Amat,
lb = rep(0, ncol(Amat)),
ub = ub,
bvec = b$bvec,
sense = b$sense,
vtype = 'I',
control = list(trace = verbose > 1),
objsense = ifelse(max, 'max', 'min'),
cvec = cvec)

sol = Rcplex2(cvec = cvec,
Amat = Amat,
bvec = b$bvec,
lb = rep(0, ncol(Amat)),
ub = ub,
sense = b$sense,
vtype = 'I',
control = list(trace = verbose > 1),
objsense = ifelse(max, 'max', 'min'))


## find edges used in the solution
opted = ed[round(sol$xopt[1:.N])!=0, ]
Expand Down Expand Up @@ -7023,21 +7024,12 @@ lengths = function(x, use.names = T) {
UseMethod("lengths")
}

#' @name lengths
#' @title lengths
#' @description
#' A vector of walk lengths associated with this walk
#'
#' @param gWalk a \code{gWalk} object
#'
#' @return the number of nodes per walk in the gWalk
#' @export

`lengths.gWalk` = function(x, use.names = FALSE){
return(x$lengths)
}



#' @name length
#' @title length
#' @description
Expand Down Expand Up @@ -8352,7 +8344,7 @@ gWalk = R6::R6Class("gWalk", ## GWALKS
sense = ll$sense
vtype = ll$vtype
}
## sol = Rcplex::Rcplex(cvec = c,

sol = Rcplex2(cvec = c,
Amat = A,
bvec = b,
Expand Down
67 changes: 67 additions & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,73 @@
## appease R CMD check vs data.table
sid=side1=side2=side_1=side_2=silent=snid=splice.variant=splicevar=str1=str2=strand_1=strand_2=subject.id=suffix=tag=threep.cc=threep.coord=threep.exon=threep.frame=threep.pc=threep.sc=threep.sc.frame=to=transcript.id.x=transcript.id.y=transcript_associated=transcript_id=twidth=tx.cc=tx.ec=tx_strand=tx_strand.x=tx_strand.y=txid=type=uid=uids=val=walk.id=walk.iid=walk.iid.x=walk.iid.y=wkid=NULL

#' @name run.gurobi
#' @title run.gurobi
#'
#' @description
#' wrapper to run gurobi with CPLEX-like function call for easy switching bw optimizers
#'
#' @param cvec (numeric)
#' @param Amat (sparse matrix)
#' @param bvec (numeric)
#' @param Qmat (sparse matrix)
#' @param lb (numeric)
#' @param ub (numeric)
#' @param sense (character)
#' @param vtype (variable type)
#' @param objsense (character) default min
#' @param control (list) should have epgap, tilim, trace, ideally
#' @param threads (numeric)
#'
#' @return sol - list with names $x, $epgap, $status
run.gurobi = function(cvec = NULL,
Amat = NULL,
bvec = NULL,
Qmat = NULL,
lb = NULL,
ub = NULL,
sense = NULL,
vtype = NULL,
objsense = 'min',
control = list(epgap = 1e-2, tilim = 360, trace = 2),
threads = 32) {

## build model
model = list(
obj = cvec,
A = Amat,
rhs = bvec,
Q = Qmat,
lb = lb,
ub = ub,
vtype = vtype,
sense = c("E"="=", "G"=">", "L"="<")[sense], ## inequalities are leq, geq (e.g. not strict)
modelsense = objsense)

## params
params = list()
if (!is.null(control$epgap)) {
params$MIPGap = control$epgap
}
if (!is.null(control$tilim)) {
params$TimeLimit = control$tilim
}
if (!is.null(control$trace)) {
params$LogToConsole = ifelse(control$trace > 0, 1, 0)
}

## TODO: set up env list for running on compute cluster

## run gurobi
sol = gurobi::gurobi(model = model, params = params)

## make solution consistent with Rcplex output
## but return all the things
sol$epgap = sol$mipgap

return(sol)
}


#' @name duplicated.matrix
#' @title R-3.5.1 version of duplicated.matrix
Expand Down
3 changes: 0 additions & 3 deletions srcs/src.blank/Makevars

This file was deleted.

3 changes: 0 additions & 3 deletions srcs/src.blank/Makevars.in

This file was deleted.

7 changes: 0 additions & 7 deletions srcs/src.blank/Makevars.win

This file was deleted.

23 changes: 0 additions & 23 deletions srcs/src.blank/Rcplex2.c

This file was deleted.

6 changes: 0 additions & 6 deletions srcs/src.blank/Rcplex2.h

This file was deleted.