RNASeq Pipeline using edgeR
Quality Check
NGS QC metrics
1. Sample internal control
In general, it is best to start an NGS experiment with high-quality DNA and/or RNA samples. However, many experiments are performed with degraded nucleic acids. Spectrophotometric (Nanodrop), fluorimetric (Qubit, PicoGreen or RiboGreen assay) and electrophoretic (Bioanalyzer) methods are commonly used to QC the starting material. Agilent’s RNA Integrity Number (RIN) provides a robust measure for RNA QC, and most experiments use RIN > 7 as the standard threshold for samples (Endrullat et al., 2016).
2. Library internal control
Before sequencing starts, most NGS libraries are checked on the Bioanalyzer. This verifies the insert size and shows whether the libraries contain any contaminating adapter dimers. Adapter dimers are a significant issue with the exclusion-amplification clustering chemistry used on certain Illumina platforms.
3. Sequencing internal control
Many sequencing QC metrics are in common use. Sequencing Analysis Viewer (SAV) and FastQC reports are among the most widely used tools.
4. Sequencing Analysis Viewer (SAV)
SAV is a monitoring tool from Illumina that lets you follow a sequencing run while it is in progress, or check QC once the run is complete. Illumina’s BaseSpace is another option; it presents helpful data in the “Per Read and Per Lane Metrics” tables. These include metrics such as yield, error rate, %>=Q30, density (K/mm2), cluster PF (%) and phasing/prephasing (%).
4.1 Yield
Yield is the number of bases generated in the run. Yield is important to all users, but it is something your service provider can guarantee, so you do not need to worry about it.
4.2 Error rate
Error rate is the percentage of bases called incorrectly at any one cycle. It is calculated from the reads that align to Illumina’s PhiX control. If PhiX was not used, %>=Q30 is your best tool for judging base quality. Error rate increases along the length of the read.
4.3 %Q30
The percentage of bases with a quality score of 30 or higher. In most Illumina runs, more than 70-80% of bases reach Q30. This value is an average across the full read length, and error rate increases towards the end of the reads. Because of this, a run can “fail” at the end of a long read yet still pass Illumina’s specification for the run with respect to Q30: if a read is Q40 for bases 1-100 and Q10 for bases 101-150, it will pass the Q30 specification, but if you need the ends of the reads to be high quality, you will be disappointed (Endrullat et al., 2016).
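The Q40/Q10 example can be checked with a little arithmetic. The sketch below assumes the standard Phred definition (Q = -10 log10 of the error probability), which the text does not spell out; the function names and the simulated read are hypothetical. It only illustrates how a whole-read average can hide a poor tail, not how Illumina evaluates its specification.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability.

    Assumes the standard definition Q = -10 * log10(p).
    """
    return 10 ** (-q / 10)


def summarize_quality(quals, threshold=30):
    """Return (mean quality, fraction of bases at or above the threshold)."""
    mean_q = sum(quals) / len(quals)
    frac_at_or_above = sum(q >= threshold for q in quals) / len(quals)
    return mean_q, frac_at_or_above


# Hypothetical 150-base read from the text: Q40 for bases 1-100, Q10 for 101-150.
read_quals = [40] * 100 + [10] * 50

whole_mean, whole_frac = summarize_quality(read_quals)
tail_mean, tail_frac = summarize_quality(read_quals[100:])

print(f"whole read: mean Q = {whole_mean:.1f}, bases >= Q30 = {whole_frac:.1%}")
print(f"last 50 bases: mean Q = {tail_mean:.1f}, bases >= Q30 = {tail_frac:.1%}")
print(f"error probability at Q30: {phred_to_error_prob(30):.4f}")
print(f"error probability at Q10: {phred_to_error_prob(10):.4f}")
```

The mean quality over the whole read comes out at 30 even though every base in the last 50 positions is Q10, which is why per-position plots (such as those in a FastQC report) are worth checking alongside the run-level summary.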
4.4 Density (K/mm2)
The density of clusters on the flow cell is reported in thousands per mm2. On MiniSeq, MiSeq, NextSeq and HiSeq 2500 this is an important metric to check if the data are low quality. It should be assessed together with % PF, because the two together can diagnose problems with over- or under-loading your library. On HiSeq 4000 and HiSeq X, which use patterned flow cells, the density is fixed.
4.5 Cluster PF (%)
In Illumina’s clustering process, one molecule should generate one cluster with a clear signal for the base being sequenced. % PF is the percentage of clusters that passed Illumina’s “Chastity filter”. Clusters that do not pass this filter are typically excluded from downstream analysis. The Chastity filter works by calculating the ratio of the highest base intensity to the sum of the highest and second-highest intensities; anything below 0.6 is filtered out. If a cluster was formed from one molecule, the chastity score can be 1; if it was formed from two molecules, the two signals would be equal and the chastity score would be 0.5.
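The chastity ratio described above can be sketched in a few lines. This is a minimal single-cycle illustration, assuming four per-base intensity values per cluster; the function names and intensity values are hypothetical, and it does not reproduce how Illumina’s software applies the filter across cycles of a real run.

```python
def chastity(intensities):
    """Ratio of the brightest base intensity to the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    if total == 0:
        # No signal at all: treat as the worst possible score.
        return 0.0
    return top / total


def passes_chastity(intensities, threshold=0.6):
    """True if the chastity score reaches the 0.6 cutoff quoted in the text."""
    return chastity(intensities) >= threshold


# Hypothetical per-base intensities (A, C, G, T) for three clusters.
clean_cluster = [1000, 0, 0, 0]      # one molecule, one clear signal
mixed_cluster = [500, 500, 0, 0]     # two molecules with equal signal
noisy_cluster = [700, 300, 50, 20]   # dominant base with some background

for name, cluster in [("clean", clean_cluster),
                      ("mixed", mixed_cluster),
                      ("noisy", noisy_cluster)]:
    print(name, round(chastity(cluster), 2), passes_chastity(cluster))
```

The clean cluster scores 1.0 and the mixed cluster 0.5, matching the two cases in the text; only the mixed cluster falls below the 0.6 cutoff.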
4.6 Phas/Prephas (%)
This is an important metric to pay attention to; low numbers are what you want, e.g. 0.1/0.1. Phasing is the rate at which individual molecules in a cluster fall out of sync with one another, with some falling behind (phasing) and others jumping ahead (pre-phasing). The value given is the percentage of true signal lost in each cycle, so at 0.1% per cycle roughly 15% of the data is noise once 150 cycles have completed. Phasing is one reason why long reads are difficult. There are many possible causes of poor phasing/pre-phasing, but estimating it properly requires a sample with balanced base composition (25% of each base); if you know your sample is unbalanced, you will need to add extra PhiX control. Assuming your sample is not the problem, the most likely causes are the reagents or the flow cell. Check the expiry date of the reagents, confirm there were no problems with the fluidics, and verify that the temperature was not too high during the run (Endrullat et al., 2016).
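The 15% figure follows from simple accumulation over cycles. The sketch below assumes the reported value is a percentage lost per cycle (0.1% is written as 0.001) and compares the linear estimate used in the text with a compounded estimate; the function name is hypothetical and the model is deliberately simplified.

```python
def signal_lost(rate_per_cycle, cycles):
    """Estimate the fraction of signal lost after a number of cycles.

    Returns (linear, compounded):
      linear     -- rate * cycles, the back-of-envelope estimate
      compounded -- 1 - (1 - rate) ** cycles, assuming each cycle loses
                    the same fraction of the signal that is still in phase
    """
    linear = rate_per_cycle * cycles
    compounded = 1 - (1 - rate_per_cycle) ** cycles
    return linear, compounded


# 0.1% phasing per cycle, as in the 0.1/0.1 example.
for cycles in (50, 150, 300):
    linear, compounded = signal_lost(0.001, cycles)
    print(f"{cycles} cycles: linear {linear:.1%}, compounded {compounded:.1%}")
```

At 150 cycles the linear estimate gives 15% and the compounded estimate is slightly lower, at about 14%. Either way the loss grows with every cycle, which is why small differences in the phasing rate matter for long reads.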
5. Sequencing reads
If you are doing read-counting NGS such as RNA-seq or ChIP-seq, the number of sequence reads is a more important metric than yield. However, sequencing service providers may use the two terms interchangeably, so it is best to confirm which one matters most for your study.
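The difference between the two metrics is easy to see numerically. The sketch below assumes yield is simply the number of sequenced fragments times the read length (doubled for paired-end runs); the function name and run figures are hypothetical and chosen only for illustration.

```python
def yield_in_bases(n_fragments, read_length, paired_end=False):
    """Total bases generated: fragments x read length x number of reads per fragment."""
    reads_per_fragment = 2 if paired_end else 1
    return n_fragments * read_length * reads_per_fragment


# Two hypothetical runs with identical yield but very different read counts.
run_a_fragments = 100_000_000   # single-end 50 bp
run_b_fragments = 25_000_000    # paired-end 2 x 100 bp

run_a_yield = yield_in_bases(run_a_fragments, 50)
run_b_yield = yield_in_bases(run_b_fragments, 100, paired_end=True)

print(f"run A: {run_a_fragments:,} fragments, {run_a_yield / 1e9:.1f} Gb")
print(f"run B: {run_b_fragments:,} fragments, {run_b_yield / 1e9:.1f} Gb")
```

Both runs produce 5 Gb, yet run A provides four times as many fragments to count, which is what matters for a read-counting experiment.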