RNASeq Pipeline using edgeR
Practice
All copyrights reserved to BioNome
In this practice we use Picard tools [20] to estimate the fraction of PCR duplicates among the mapped
reads. Picard takes a coordinate-sorted BAM file as input, so we first need to convert the SAM files to
Binary Alignment/Map (BAM) format and sort them by coordinate. To work with SAM and BAM files we will
use sambamba [21]. Install both programs by typing the following commands in the terminal:
brew install sambamba
brew install picard-tools
To convert a SAM file to BAM, use the following command:
sambamba view -S -f bam -o normal.bam normal.sam
-f output file format
-S input is in SAM format
-o output file name
normal.sam input file name
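When there are several SAM files, the conversion above can be repeated in a loop. The following is a minimal sketch, assuming all SAM files sit in the current directory and end in `.sam`; the helper names `bam_name` and `convert_sam` are hypothetical and not part of sambamba.

```shell
# bam_name FILE.sam: print the matching BAM file name (FILE.bam).
# Hypothetical helper, not a sambamba feature.
bam_name() {
    printf '%s\n' "${1%.sam}.bam"
}

# convert_sam FILE.sam: run the conversion shown above for one file.
# If sambamba is not installed, report it and return a non-zero status.
convert_sam() {
    if command -v sambamba >/dev/null 2>&1; then
        sambamba view -S -f bam -o "$(bam_name "$1")" "$1"
    else
        echo "sambamba not found; skipping $1" >&2
        return 1
    fi
}

# Convert every SAM file in the current directory.
for sam in ./*.sam; do
    [ -e "$sam" ] || continue   # the pattern matched nothing
    convert_sam "$sam"
done
```

Deriving the output name from the input name keeps each BAM file next to its SAM file and avoids typing file names by hand.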
Now sort the BAM file by coordinate:
sambamba sort normal.bam
This command generates a sorted BAM file, normal.sorted.bam, and its index, normal.sorted.bam.bai.
The sorted BAM file can be visualized using the Integrative Genomics Viewer (IGV) [22]. Now use
Picard tools to estimate the percentage of duplicates:
picard MarkDuplicates I=normal.sorted.bam O=markdup.normal.sorted.bam M=markdup.normal.txt
I sorted BAM file
O duplicates marked BAM file
M metrics output file name
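The three steps (convert, sort, mark duplicates) can be chained for one sample. The sketch below assumes the file naming used in this practice; the `run` and `dedup_pipeline` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration. With `DRY_RUN=1` the commands are only printed, so the sequence can be checked before anything is executed.

```shell
set -eu

# run CMD ARGS...: execute a command, or only print it when DRY_RUN=1.
# Hypothetical helper for previewing the pipeline.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# dedup_pipeline SAMPLE: convert SAMPLE.sam to BAM, sort it by coordinate,
# then mark duplicates with Picard.
dedup_pipeline() {
    run sambamba view -S -f bam -o "$1.bam" "$1.sam"
    run sambamba sort "$1.bam"   # writes SAMPLE.sorted.bam and its .bai index
    run picard MarkDuplicates I="$1.sorted.bam" \
        O="markdup.$1.sorted.bam" M="markdup.$1.txt"
}

# Preview the commands for the "normal" sample without running them.
DRY_RUN=1 dedup_pipeline normal
```

Calling `dedup_pipeline normal` without `DRY_RUN=1` would run the real commands, which requires both sambamba and Picard to be installed.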
The metrics file contains details such as the number of duplicate read pairs, singletons, etc. In our
“normal sample”, ~13% of reads are duplicates. To check the file, open it with Excel or any other
spreadsheet app. In this practice we are not going to remove duplicates; you can compare these results
with duplicates-removed results yourselves.
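As an alternative to a spreadsheet, the duplication rate can be read from the metrics file on the command line. The sketch below assumes the usual Picard metrics layout: comment lines starting with `#`, then a tab-separated header row containing a `PERCENT_DUPLICATION` column (stored as a fraction), then one data row per library. The function name `percent_duplication` is hypothetical, and the demo file contains synthetic values with only a few columns; it is not output from the practice data.

```shell
# percent_duplication METRICS_FILE
# Print the PERCENT_DUPLICATION value of the first library as a percentage.
percent_duplication() {
    awk -F '\t' '
        /^#/ || NF == 0 { next }      # skip comment and blank lines
        col == 0 {                    # first remaining line is the header row
            for (i = 1; i <= NF; i++)
                if ($i == "PERCENT_DUPLICATION") col = i
            if (col == 0) exit 1      # column not found
            next
        }
        { printf "%.1f%%\n", $col * 100; exit }   # first data row only
    ' "$1"
}

# Demo on a synthetic, shortened metrics file (values are made up).
tmp="$(mktemp)"
printf '## METRICS CLASS\tpicard.sam.DuplicationMetrics\n' > "$tmp"
printf 'LIBRARY\tREAD_PAIRS_EXAMINED\tREAD_PAIR_DUPLICATES\tPERCENT_DUPLICATION\n' >> "$tmp"
printf 'demo\t1000\t130\t0.13\n' >> "$tmp"
percent_duplication "$tmp"   # prints 13.0%
rm -f "$tmp"
```

For the file produced in this practice, the call would be `percent_duplication markdup.normal.txt`.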