Potential bug found in merge.R #3

bentjordan · 2023-06-08T14:06:51Z

The original merge.R had a potential bug: the sample names of the merged matrix were taken from the file sample_names.txt. This file is provided by the user and is used directly to name the columns. If the order of samples in sample_names.txt does not match the order in which the script reads the files, the read counts will be assigned to the wrong samples.

Original merge.R script

files =list.files(path="star_align",recursive=T,pattern="ReadsPerGene.out.tab",full.names=T)

for (file in files) {
    if (!exists("rc")) {
        rc=read.table(file,header=F,row.names=1,colClasses=c(NA,"NULL",NA,"NULL"),col.names=c("gene_id","NULL",file,"NULL"))
    } else if (exists("rc")) {
        temp_rc=read.table(file,header=F,row.names=1,colClasses=c(NA,"NULL",NA,"NULL"),col.names=c("gene_id","NULL",file,"NULL"))
        rc=cbind(rc,temp_rc)
        rm(temp_rc)
    }
}
head(rc)
ncol(rc)
sample_names=read.table("sample_names.txt")
sample_names
colnames(rc)=sample_names[, 1]
head(rc)
write.csv(rc,"reads_count/reads_count.csv")
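To illustrate the mismatch described above, here is a minimal sketch in base R. The file paths and the contents of sample_names.txt are hypothetical; the only behavior relied on is that list.files() returns paths in alphabetical order, which need not be the order the user wrote in sample_names.txt.

```r
# Hypothetical paths, in the alphabetical order list.files() would return them
files <- c("star_align/sampleA/sampleAReadsPerGene.out.tab",
           "star_align/sampleB/sampleBReadsPerGene.out.tab",
           "star_align/sampleC/sampleCReadsPerGene.out.tab")

# Hypothetical contents of sample_names.txt, written in a different order
user_names <- c("sampleB", "sampleA", "sampleC")

# Names derived from the files themselves
derived_names <- sub("ReadsPerGene.out.tab", "", basename(files), fixed = TRUE)

# TRUE wherever a column would carry the wrong sample label
mismatch <- user_names != derived_names

print(data.frame(file = basename(files),
                 assigned = user_names,
                 actual = derived_names,
                 mislabeled = mismatch))
```

In this made-up case the first two columns would be swapped silently, since the original script never compares the two orderings.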

bentjordan · 2023-06-08T14:10:36Z

Updated merge.R script

The updated merge.R script extracts the sample name from each input file name, so each column of the read counts matrix is labeled with the sample it actually came from.

## library is sense stranded: column 3 of ReadsPerGene.out.tab is used
library(stringr)
files <- list.files(
        path = "star_align",
        recursive = TRUE,
        pattern = "ReadsPerGene.out.tab",
        full.names = TRUE)
# Start from a known state so a stale `rc` in the workspace is not reused
rc <- NULL
sample_names <- character(0)
for (file in files) {
        # Derive the sample name from the file name, so names follow read order
        file_name <- basename(file)
        sample <- str_replace(file_name, fixed("ReadsPerGene.out.tab"), "")
        sample_names <- c(sample_names, sample)
        # Keep gene IDs (column 1) and column 3 counts; skip columns 2 and 4
        temp_rc <- read.table(
                file, header = FALSE, row.names = 1,
                colClasses = c(NA, "NULL", NA, "NULL"),
                col.names = c("gene_id", "NULL", file, "NULL"))
        if (is.null(rc)) {
                rc <- temp_rc
        } else {
                rc <- cbind(rc, temp_rc)
        }
        rm(temp_rc)
}

colnames(rc) <- sample_names
dir.create(file.path("reads_count"), showWarnings = FALSE)
write.csv(rc, "reads_count/reads_count.csv")
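As a rough, self-contained check of the same idea, the sketch below writes two tiny mock count files to a temporary directory and merges them using only base R. The sample prefixes (s1, s2), gene IDs, and counts are made up, and the helpers write_mock and read_counts are hypothetical names, not part of merge.R. It also adds one guard the script above does not have: cbind() pairs rows by position, so the sketch stops if the files do not list the same gene IDs in the same order.

```r
# Mock STAR-style count files with made-up data (real files also begin with
# a few summary rows such as N_unmapped, which are omitted here)
tmp <- file.path(tempdir(), "star_align_demo")
dir.create(tmp, showWarnings = FALSE)
write_mock <- function(sample, counts) {
  tab <- data.frame(gene = c("geneA", "geneB"), unstranded = counts,
                    sense = counts, antisense = 0L)
  write.table(tab, file.path(tmp, paste0(sample, "ReadsPerGene.out.tab")),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}
write_mock("s1", c(5L, 7L))
write_mock("s2", c(1L, 2L))

files <- list.files(tmp, pattern = "ReadsPerGene\\.out\\.tab$",
                    full.names = TRUE)

# Read gene IDs (column 1) as row names and keep only the column 3 counts
read_counts <- function(file) {
  read.table(file, header = FALSE, row.names = 1,
             colClasses = c("character", "NULL", "integer", "NULL"))
}
tables <- lapply(files, read_counts)

# Guard: every file must list the same genes in the same order
gene_ids <- rownames(tables[[1]])
same_genes <- vapply(tables,
                     function(tab) identical(rownames(tab), gene_ids),
                     logical(1))
stopifnot(all(same_genes))

# Merge, then label each column from the file it was read from
rc <- do.call(cbind, tables)
colnames(rc) <- sub("ReadsPerGene.out.tab", "", basename(files), fixed = TRUE)
print(rc)
```

Because the names and the tables are both derived from the same files vector, the column labels cannot drift out of step with the data, whatever order the files are listed in.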

bentjordan added a commit that referenced this issue Jun 8, 2023

#3 Fix. Use name of file to extract sample name

584af3c

bentjordan added the "bug" label Jun 8, 2023
