Next-Generation Sequencing


Quality Check Using FastQC and MultiQC

Data Quality Check

Because of the vast amount of data, it is practically impossible to check the quality of each base
present in the fastq file(s). Summarized quality parameters such as quality value, read length, base
distribution across the reads, and the presence of adapter sequences and duplicated sequences
provide an overall picture of data quality. To interpret these parameters, we need to understand
what they represent.

Quality value: The log-scaled probability of a base-calling error (Q = -10 log10 P) [12]. To put it
in perspective, a Q value of 30 means the probability of the base (nucleotide) being wrongly called
is 0.001, and a Q value of 20 means the probability of the base being wrongly called is 0.01.
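
The relationship between Q and P can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the quality-string decoder assumes the Phred+33 encoding used by Sanger and recent Illumina fastq files, which may not match older data.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality value Q to an error probability P."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred quality value Q."""
    return -10 * math.log10(p)

def decode_quality_string(qual, offset=33):
    """Decode a fastq quality line into Q values (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]

if __name__ == "__main__":
    for q in (20, 30):
        print(q, phred_to_error_prob(q))
    # 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20 under Phred+33
    print(decode_quality_string("I?5"))
```

Each 10-point increase in Q corresponds to a tenfold drop in the error probability.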
All copyrights reserved to BioNome
Read length distribution: The percentage of reads at each read length.
Nucleotide distribution: Shows how A, T, G, and C are distributed across all the reads at each nucleotide position. If all the reads have the same nucleotide at a given position, it could be a sequencing artifact. If all the reads have the same sequence towards their 3' end, it could be an adapter sequence.
Read duplication: A few sequences accounting for the majority of the data indicate rRNA contamination or PCR duplicates.

Now that we know the data format and which quality parameters to check, let us analyze an example
data set containing four fastq files. This data was generated by sequencing a paired-end stranded
library on the Illumina HiSeq platform.
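
To make these three parameters concrete, the sketch below computes them for a handful of made-up reads. The function names and the toy reads are hypothetical, and the duplication measure is a simple exact-match count, not the estimate FastQC itself reports.

```python
from collections import Counter

def read_length_distribution(reads):
    """Percent of reads at each read length."""
    counts = Counter(len(r) for r in reads)
    total = len(reads)
    return {length: 100 * n / total for length, n in sorted(counts.items())}

def base_composition(reads):
    """Count of each nucleotide at every position across all reads."""
    longest = max(len(r) for r in reads)
    per_position = [Counter() for _ in range(longest)]
    for read in reads:
        for i, base in enumerate(read):
            per_position[i][base] += 1
    return per_position

def duplication_percent(reads):
    """Percent of reads that are exact copies of another read."""
    unique = len(set(reads))
    return 100 * (len(reads) - unique) / len(reads)

if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGT", "TCG"]  # toy data, not real reads
    print(read_length_distribution(reads))
    print(base_composition(reads)[0])
    print(duplication_percent(reads))
```

A position where one nucleotide takes nearly all the counts, or a duplication percentage dominated by a few sequences, is the kind of pattern the parameters above are meant to expose.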

1. Installing Git
https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
2. Brew or Linuxbrew
http://linuxbrew.sh/ (for Linux) or http://brew.sh/ (for Mac). Check that brew works by typing
brew install hisat2, which installs the hisat2 aligner.
3. Install the Galaxy platform (in Bio-Linux, Galaxy is preinstalled). Go to the Galaxy project page
using the following link and check how to install the latest version.
https://new.Galaxyproject.org/admin/get-Galaxy/

To get Galaxy, look for a command like the one below.

git clone -b release_16.10 https://github.com/galaxyproject/galaxy.git

Copy and paste the “git clone” command into the terminal and press Enter to download Galaxy into
the “galaxy” folder. Once the download is complete, go to the Galaxy folder from the same terminal
window by typing cd galaxy, and start Galaxy by typing sh run.sh. Starting Galaxy for the first time
takes a while because it has to set up the environment and fetch some programs. Once the setup
completes, an http link appears at the end of the terminal output. This link is used to access the
locally installed Galaxy platform.

Copy the local Galaxy http link and paste it into the Chrome or Firefox web browser. Create an
account by clicking on “User” and providing an email and password. Next, assign Galaxy
administrator rights to the user you have registered. Close the web browser and stop Galaxy by
pressing “Control+C” in the terminal where Galaxy is running. Go to the “config” folder inside the
galaxy folder and find the “galaxy.ini” file. If the file does not exist, copy it from the sample
file “galaxy.ini.sample”. Open the file in a text editor and search for “#admin_users =”; add the
registered email after “=” and delete the “#” from the line. Save the “galaxy.ini” file. Now
restart Galaxy from the command line (terminal) by typing
sh run.sh
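
The manual edit of the admin_users line can also be scripted. The following is a minimal sketch, assuming the file contains a commented line of the form “#admin_users = ...” as described above; the function name set_admin_user and the example email address are hypothetical.

```python
import re

def set_admin_user(config_text, email):
    """Uncomment the admin_users line and set it to the given email."""
    # Match an optionally commented admin_users line (spaces/tabs only,
    # so surrounding blank lines are left untouched).
    pattern = re.compile(r"^#?[ \t]*admin_users[ \t]*=.*$", flags=re.MULTILINE)
    new_text, n = pattern.subn(lambda m: "admin_users = " + email,
                               config_text, count=1)
    if n == 0:
        raise ValueError("no admin_users line found in the config text")
    return new_text

if __name__ == "__main__":
    # Toy config text standing in for config/galaxy.ini
    sample = "[app:main]\n#admin_users = None\n#debug = False\n"
    print(set_admin_user(sample, "user@example.org"))
```

To apply it to a real installation, read the contents of config/galaxy.ini, pass them through the function, and write the result back while Galaxy is stopped.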

For data QC, the most widely used application is FastQC. Try to install it on your own (for help,
visit https://wiki.galaxyproject.org/Learn).

Quality Check Using FastQC

Since we have installed the FastQC tool in Galaxy, we will use it to check the quality of our data.
1. Open Galaxy in a web browser.
2. Import data into Galaxy by clicking on “Get Data” or the upload icon.
3. Click on FastQC in your installed tools (use the search bar at the top left).
4. Select the “Multiple datasets” option and select all the files.
5. Click “Execute” and wait for the process to complete.
6. Once the analysis is complete, check the “webpage” output (in the history panel on the right side)
by clicking on the eye icon. Check the quality parameters of all four fastq files.
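
Besides the webpage, FastQC writes a plain-text report (fastqc_data.txt) in which each module starts with a line such as “>>Per base sequence quality”, followed by a tab and pass, warn, or fail. Comparing four files is quicker if those statuses are collected programmatically. The sketch below assumes that layout; the function names and the sample text are illustrative only.

```python
def module_statuses(fastqc_data_text):
    """Map each FastQC module name to its pass/warn/fail status."""
    statuses = {}
    for line in fastqc_data_text.splitlines():
        # Module headers look like ">>Module name<TAB>status"
        if line.startswith(">>") and line.strip() != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status.strip()
    return statuses

def flagged_modules(fastqc_data_text):
    """Return the modules whose status is anything other than 'pass'."""
    return [name for name, status in module_statuses(fastqc_data_text).items()
            if status != "pass"]

if __name__ == "__main__":
    # Made-up excerpt in the assumed fastqc_data.txt layout
    sample = "\n".join([
        "##FastQC\t0.11.9",
        ">>Basic Statistics\tpass",
        "#Measure\tValue",
        ">>END_MODULE",
        ">>Per base sequence quality\twarn",
        ">>END_MODULE",
        ">>Adapter Content\tfail",
        ">>END_MODULE",
    ])
    print(flagged_modules(sample))
```

Running this on the report of each fastq file gives a short list of modules that deserve a closer look in the webpage output.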