Next-Generation Sequencing

Introduction to R Programming

Install R (for example, on macOS with Homebrew) and download RStudio:

brew install r
https://www.rstudio.com/products/rstudio/download/

Open RStudio and install the Bioconductor installer by typing the following commands in the console
panel.

source("https://bioconductor.org/biocLite.R")
biocLite()

Now install the DESeq2 package by typing the following command in the console panel.

biocLite("DESeq2")
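
On R 3.5 and later, Bioconductor packages are installed through the BiocManager package from CRAN rather than through biocLite(). A minimal sketch of the equivalent step follows; the helper name install_bioc is only illustrative and is not part of any package.

```r
# Hypothetical helper: install Bioconductor packages via BiocManager.
# Nothing is downloaded until the function is called.
install_bioc <- function(pkgs) {
  # Install BiocManager from CRAN only if it is missing
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Usage (requires an internet connection):
# install_bioc("DESeq2")
```

Wrapping the calls in a function keeps the installation from running by accident when the script is sourced.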

The installation should end with * DONE (DESeq2). We can check whether the package was installed
by loading it into the R environment with the following command.

require(DESeq2)

If the package is not installed, R reports “there is no package called ‘DESeq2’”. In that case,
check the warning or error messages at the end of the package installation. Now import
the tab-delimited text file (merged.transcripts.txt) generated with featureCounts by clicking
“Import Dataset” in the Environment panel and selecting the “From CSV” option from the dropdown
list. In the import window, change the import options “Delimiter:” to Tab and “Comment:” to # (Fig. 4).
Upon successful import, the data are shown in a panel above the console panel and the Environment
panel gains a new dataset entry.
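
The same import can also be done from the console. The sketch below uses base R's read.delim() on a small stand-in file, since the real merged.transcripts.txt is not reproduced here; the file contents and the column names normal.bam and tumor.bam are made up, and only the layout (a # comment line followed by a tab-delimited header starting with Geneid) follows featureCounts output.

```r
# Toy stand-in for merged.transcripts.txt; all values are made up.
# featureCounts writes a "#" comment line, then a tab-delimited header.
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Program:featureCounts; Command: (omitted)",
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tnormal.bam\ttumor.bam",
  "tx1\tchr1\t100\t900\t+\t801\t12\t30",
  "tx2\tchr1\t2000\t2500\t-\t501\t0\t0",
  "tx3\tchr2\t50\t450\t+\t401\t5\t1"
), toy_file)

# read.delim() assumes tab-separated columns and a header row;
# comment.char = "#" skips the featureCounts comment line.
merged_transcripts <- read.delim(toy_file, comment.char = "#",
                                 stringsAsFactors = FALSE)

dim(merged_transcripts)
```

For the real data, replace toy_file with the path to merged.transcripts.txt.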

Check the dimensions of the data frame “merged_transcripts” by typing

dim(merged_transcripts)

which returns
[1] 196678 8

There are a total of 196,678 rows and 8 columns in our data frame merged_transcripts. We only require
the count data for differential expression analysis, so let us create a new data frame “countdata”
that retains the column “Geneid” and the two columns containing the count data of our normal and
tumor samples.

countdata=merged_transcripts[,c(1,7,8)]

The above command instructs R to retain columns 1, 7 and 8 of the merged_transcripts data frame and save
them to the new data frame “countdata”. Now check the dimensions of countdata.

dim(countdata)

For DESeq2 analysis we need a matrix as input, so transform the countdata data frame into a matrix of
counts with transcript IDs as its row names.

y=as.matrix(countdata[,c(2,3)])
rownames(y)=countdata$Geneid
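
The subsetting and conversion steps can be tried on a small data frame first. This is a sketch with made-up values and assumed column names (normal.bam, tumor.bam); it also adds a sanity check that the counts are non-negative whole numbers, which is what a count-based method expects.

```r
# Toy data frame standing in for merged_transcripts (made-up values)
merged_transcripts <- data.frame(
  Geneid = c("tx1", "tx2", "tx3"),
  Chr = c("chr1", "chr1", "chr2"),
  Start = c(100, 2000, 50),
  End = c(900, 2500, 450),
  Strand = c("+", "-", "+"),
  Length = c(801, 501, 401),
  normal.bam = c(12, 0, 5),
  tumor.bam = c(30, 0, 1),
  stringsAsFactors = FALSE
)

# Keep the ID column plus the two count columns (positions 1, 7, 8)
countdata <- merged_transcripts[, c(1, 7, 8)]

# Numeric matrix of counts with transcript IDs as row names
y <- as.matrix(countdata[, c(2, 3)])
rownames(y) <- countdata$Geneid

# Sanity checks before handing the matrix to DESeq2
stopifnot(is.numeric(y), all(y >= 0), all(y == round(y)))
dim(y)
```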

Change the column names of y to the sample names “normal” and “tumor”.

colnames(y)=c("normal","tumor")

Create the comparison groups, setting the column name to condition and the row names to the sample IDs.

group=as.matrix(c("normal","tumor"))
colnames(group)="condition"
rownames(group)=c("normal1","tumor1")

If we have biological replicates, the above commands change to

group=as.matrix(c("normal", "normal", "normal", "tumor", "tumor", "tumor"))
colnames(group)="condition"
rownames(group)=c("normal1", "normal2", "normal3", "tumor1", "tumor2", "tumor3")
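
DESeq2 pairs each column of the count matrix with the corresponding row of the sample table by position, so it helps to confirm that the two agree before building the data set. The following is a sketch in base R, assuming the sample table is built as a data frame with a factor column and that its row names are chosen to match the matrix column names; the counts are made up.

```r
# Made-up counts for three normal and three tumor replicates
samples <- c("normal1", "normal2", "normal3", "tumor1", "tumor2", "tumor3")
y <- matrix(c(12, 0, 5,  10, 1, 4,  14, 0, 6,
              30, 0, 1,  28, 2, 0,  33, 0, 2),
            nrow = 3,
            dimnames = list(c("tx1", "tx2", "tx3"), samples))

# Sample table: one row per sample, condition stored as a factor
group <- data.frame(
  condition = factor(rep(c("normal", "tumor"), each = 3)),
  row.names = samples
)

# Columns of y and rows of group must describe the same samples, in order
stopifnot(identical(colnames(y), rownames(group)))
table(group$condition)
```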

Load DESeq2 if it is not already loaded and create a DESeqDataSet.

require(DESeq2)

dde=DESeqDataSetFromMatrix(countData = y, colData = group, design = ~condition)

Transcripts with low read counts need to be filtered out before differential expression analysis.
Let us retain transcripts whose total read count across all samples is greater than 1, store them in
“dds”, and compare the dimensions of the data before and after filtering.

dds=dde[ rowSums(counts(dde)) > 1, ]
dde
dds
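
The filter keeps a row when its summed count across samples exceeds 1, so a transcript with a single read in total is dropped. A small base R sketch of the same rule on a made-up matrix (counts_toy is an illustrative name) shows which rows survive:

```r
# Made-up count matrix: rows are transcripts, columns are samples
counts_toy <- matrix(c(12, 30,
                        0,  0,
                        1,  0,
                        1,  1),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(c("tx1", "tx2", "tx3", "tx4"),
                                     c("normal", "tumor")))

# Same rule as the DESeq2 subsetting step: total reads per row above 1
keep <- rowSums(counts_toy) > 1
filtered <- counts_toy[keep, , drop = FALSE]

rownames(filtered)   # "tx1" "tx4"
```

Changing the threshold to >= 1 would also keep tx3, which has exactly one read.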

We can see that only 819 transcripts out of 196,678 pass this filter.
Now specify the condition that should be considered the reference for differential expression
analysis.

dds$condition <- relevel(dds$condition, ref="normal")

Perform the differential expression (DE) analysis.

dds=DESeq(dds)
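
R treats the first level of a factor as the baseline, and fold changes are reported relative to it. The effect of relevel() can be seen on a plain factor without DESeq2; the labels below are made up, with “tumor” deliberately placed first.

```r
# Condition labels with "tumor" deliberately placed first
condition <- factor(c("tumor", "normal", "tumor", "normal"),
                    levels = c("tumor", "normal"))
levels(condition)            # "tumor" "normal"

# Move "normal" to the first position so it becomes the reference level
condition <- relevel(condition, ref = "normal")
levels(condition)            # "normal" "tumor"
```

Only the level order changes; the sample labels themselves stay the same.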