Practice
For this analysis we will use featureCounts [23], which is part of the Subread package and can be
downloaded from https://sourceforge.net/projects/subread/files/subread-1.5.1/. If you are not
familiar with compiling source code, download the binary distribution that suits your operating system.
Identification of Novel Transcripts
In the previous practice we used a reference annotation to quantify the expression of known genes and
transcripts. With the StringTie [24] program we can also assemble novel transcripts, in either
genome-guided or de novo mode. Install StringTie and identify novel transcripts in reference-guided
mode with the following steps.
brew install stringtie
stringtie -G human_grch37.gtf -o normal.transcripts.gtf normal.sorted.bam
stringtie -G human_grch37.gtf -o tumor.transcripts.gtf tumor.sorted.bam
StringTie assigns arbitrary transcript IDs to each assembled transcript, so each GTF file
(normal and tumor) may contain a different set of transcripts. The two files may share similar
transcripts, but the number of transcripts and their exact structure will differ between samples.
One solution to this problem is to merge the GTF files with StringTie's merge option and use the
merged file for expression quantification.
stringtie --merge -o merged.gtf -G human_grch37.gtf normal.transcripts.gtf tumor.transcripts.gtf
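To see how the per-sample assemblies differ from the merged one, it can help to count the transcript records in each file. The following is a minimal sketch, assuming the file names used above and that transcript records carry the value "transcript" in the third GTF column; the helper name count_transcripts is made up for illustration.

```shell
# Hypothetical helper: count "transcript" records (GTF column 3) in a file.
count_transcripts() {
  awk -F'\t' '$3 == "transcript" { n++ } END { print n + 0 }' "$1"
}

# Report counts for the per-sample and merged assemblies, skipping any
# file that has not been generated yet.
for gtf in normal.transcripts.gtf tumor.transcripts.gtf merged.gtf; do
  if [ -f "$gtf" ]; then
    printf '%s\t%s\n' "$gtf" "$(count_transcripts "$gtf")"
  fi
done
```

The merged file provides one shared set of transcript IDs for both samples, so its count will generally not match either per-sample count.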
The “merged.gtf” file can be used with featureCounts to generate transcript-level summarized read
counts in “merged.transcripts.txt”. To annotate the novel transcripts, use the gffcompare program
(https://ccb.jhu.edu/software/stringtie/gffcompare.shtml).
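The two steps just described might look like the sketch below. It assumes the file names from the earlier commands and that featureCounts and gffcompare are installed and on the PATH. The function names and the output prefix "gffcmp" are illustrative choices, not part of either tool.

```shell
# Assumed file names follow the steps above; adjust to your own data.
GTF=merged.gtf
REF=human_grch37.gtf

# Transcript-level counts: summarize exon features (-t) grouped by the
# transcript_id attribute (-g) instead of the default gene_id.
count_by_transcript() {
  featureCounts -a "$GTF" -t exon -g transcript_id \
    -o merged.transcripts.txt "$@"
}

# Compare the merged transcripts against the reference annotation (-r);
# output files are written with the prefix given to -o.
annotate_novel() {
  gffcompare -r "$REF" -o gffcmp "$GTF"
}

# Usage (requires featureCounts and gffcompare to be installed):
#   count_by_transcript normal.sorted.bam tumor.sorted.bam
#   annotate_novel
```

With a single query file, gffcompare writes an annotated GTF (here gffcmp.annotated.gtf) in which each transcript carries a class_code attribute. Code "=" marks an exact match to a reference transcript, and code "u" marks an intergenic transcript with no reference overlap, which is a common starting point when looking for novel transcripts. By default featureCounts does not count reads that overlap more than one transcript; its -O option changes that behavior.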