Next-Generation Sequencing


Practice

For this practice we use Picard tools [20] to estimate the fraction of PCR duplicates in the mapped
reads. We need to sort SAM files by coordinate and convert them to Binary Alignment/Map (BAM)
format, as Picard takes a coordinate-sorted BAM file as input. To work with SAM and BAM files we will
use sambamba [21], so install both programs by typing the following commands in a terminal.

brew install sambamba

brew install picard-tools
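
Before continuing, it can help to confirm that both programs are reachable from the terminal. The following is a minimal sketch, assuming the Homebrew packages provide executables named sambamba and picard on the PATH; check_tools is a hypothetical helper name, not part of either tool.

```shell
#!/bin/sh
# Report whether each named program can be found on PATH.
# check_tools is a hypothetical helper, not part of sambamba or Picard.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one program was not found
    return "$missing"
}

# Usage (assumed executable names): check_tools sambamba picard
```

If a program is reported as missing, rerun the corresponding brew install command before moving on.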

To convert the SAM file to BAM, use the following command.

sambamba view -f bam -S -o normal.bam normal.sam

-f output file format
-S input is in SAM format
-o output file name
normal.sam input file name

Now sort the BAM file by coordinate.

sambamba sort normal.bam

This command generates the sorted BAM file, normal.sorted.bam, and its index, normal.sorted.bam.bai.
The sorted BAM file can be visualized using the Integrative Genomics Viewer (IGV) [22]. Now use
Picard tools to estimate the percentage of duplicates.

picard MarkDuplicates I=normal.sorted.bam O=markdup.normal.sorted.bam M=markdup.normal.txt

I sorted BAM file
O duplicates marked BAM file
M metrics output file name
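
The three steps of this practice (convert, sort, mark duplicates) can be chained for any sample. The following is a sketch under stated assumptions rather than a definitive pipeline: run_step and mark_duplicates_pipeline are hypothetical helper names, the input is assumed to be named <sample>.sam, and -S is assumed to act as a plain flag marking the input as SAM. Setting DRY_RUN=1 only prints the commands, which is a safe way to check them before running anything.

```shell
#!/bin/sh
# Echo a command, then run it unless DRY_RUN=1 is set.
# run_step and mark_duplicates_pipeline are hypothetical helper names.
run_step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

mark_duplicates_pipeline() {
    sample=$1    # e.g. "normal" for an input file named normal.sam

    # 1. Convert SAM to BAM
    run_step sambamba view -f bam -S -o "$sample.bam" "$sample.sam" || return 1

    # 2. Sort by coordinate; writes $sample.sorted.bam and its .bai index
    run_step sambamba sort "$sample.bam" || return 1

    # 3. Mark duplicates and write the metrics file
    run_step picard MarkDuplicates I="$sample.sorted.bam" \
        O="markdup.$sample.sorted.bam" M="markdup.$sample.txt" || return 1
}

# Usage: DRY_RUN=1; mark_duplicates_pipeline normal
```

Stopping at the first failed step avoids passing a missing or partial BAM file on to Picard.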

The metrics file contains details such as the number of duplicate read pairs, singletons, etc. In our “normal
sample” ~13% of reads are duplicated. To check the file, open it with Excel or any other spreadsheet
app. In this practice we are not going to remove duplicates, but you can compare these results with
duplicates-removed results yourself.
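
As an alternative to a spreadsheet, the duplication value can be read from the metrics file in the terminal. This is a minimal sketch, assuming the usual Picard metrics layout: comment lines starting with "#", a tab-separated header row containing a PERCENT_DUPLICATION column, and one data row per library followed by a blank line. percent_duplication is a hypothetical helper name. Picard reports this value as a fraction, so ~13% would appear as roughly 0.13.

```shell
#!/bin/sh
# Print "<library><TAB><PERCENT_DUPLICATION>" for each library row in a
# Picard MarkDuplicates metrics file.
# percent_duplication is a hypothetical helper name.
percent_duplication() {
    awk -F '\t' '
        /^#/ { next }                     # skip comment lines
        seen == 0 && NF == 0 { next }     # skip blank lines before the header
        seen == 0 {                       # first remaining line is the header row
            seen = 1
            for (i = 1; i <= NF; i++)
                if ($i == "PERCENT_DUPLICATION") col = i
            next
        }
        NF == 0 || col == 0 { exit }      # end of table, or column not found
        { print $1 "\t" $col }
    ' "$1"
}

# Usage: percent_duplication markdup.normal.txt
```

Looking the column up by name rather than by position keeps the sketch usable if the number or order of columns differs between Picard versions.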