Chapter 5 ChIP-seq data analysis

5.1 QC

cd ~/project/cutrun/0_RawData/blast
mkdir -p QC
# Run FastQC on 8 threads in the background; reports are written to QC/
nohup fastqc -t 8 -o QC/ *.fastq.gz &

5.2 Fastq file

zcat Brg1KD_1_1.fastq.gz | head -n 4
# @SRR6782542.1 NS500730:396:H5TCHBGX3:1:11101:9911:1052/1
# TNCCAGCCATGCTCGGTTATC
# +
# A#EEEEEEEEEEEEEEEEEEE
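  • the first line: the read identifier
    • starts with @
    • SRR6782542.1: SRA run accession, followed by the read number
    • NS500730:396:H5TCHBGX3: instrument ID, run number, and flowcell ID
    • 1: lane
    • 11101: tile number
    • 9911: X coordinate within the tile
    • 1052: Y coordinate within the tile
    • /1 or /2: first or second read of a pair
  • the second line: the base sequence (A, T, C, G, or N)
  • the third line: + (separator)
  • the fourth line: quality information, one ASCII character per base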
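
The field layout described above can be checked with a short shell sketch. The helper names parse_header and phred are hypothetical, not part of any tool. The quality decoding assumes the Phred+33 encoding used by current Illumina instruments, which the data shown here appear to follow but which is worth confirming in the FastQC report.

```shell
# Example header copied from the zcat output above.
header='@SRR6782542.1 NS500730:396:H5TCHBGX3:1:11101:9911:1052/1'

# parse_header (hypothetical helper): print the fields of one header line.
parse_header() {
  printf '%s\n' "$1" | awk '{
    sub(/^@/, "", $1)              # drop the leading @
    n = split($2, f, ":")          # instrument:run:flowcell:lane:tile:x:y/mate
    split(f[n], last, "/")         # separate the Y coordinate from the mate number
    printf "read=%s instrument=%s run=%s flowcell=%s lane=%s tile=%s x=%s y=%s mate=%s\n",
           $1, f[1], f[2], f[3], f[4], f[5], f[6], last[1], last[2]
  }'
}

# phred (hypothetical helper): convert one quality character to its score.
# Assumption: Phred+33 encoding (score = ASCII code - 33).
phred() {
  code=$(printf '%d' "'$1")
  echo $((code - 33))
}

parse_header "$header"
phred 'A'   # first quality character of the example read
phred '#'   # quality of the N base at position 2
```

Under that assumption, A decodes to 32 and # to 2, so the ambiguous N base in the example read carries the lowest score in the record.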

5.3 mm10

Downloading genome annotation from iGenomes

The iGenomes project provides a considerable number of genomes in a form that can be readily installed for use with aligners such as Bowtie:

https://support.illumina.com/sequencing/sequencing_software/igenome.html

In this case, however, a prebuilt mm10 index for the Bowtie 2 aligner can be downloaded directly from the Bowtie site and then unpacked.

cd ~/project
# -p creates both directory levels and does not fail if they already exist
mkdir -p indexes/mm10
cd indexes/mm10
wget ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip
unzip mm10.zip
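
Before aligning, it can help to confirm that the archive unpacked completely. The sketch below uses a hypothetical helper, check_bt2_index. It assumes the archive contains a standard (non-"large") Bowtie 2 index with the prefix mm10, that is, six non-empty files ending in .bt2.

```shell
# check_bt2_index (hypothetical helper): verify that the six index files
# for a given prefix exist and are non-empty.
# Assumption: a small Bowtie 2 index, whose files end in .bt2.
check_bt2_index() {
  prefix=$1
  missing=0
  for suffix in 1 2 3 4 rev.1 rev.2; do
    if [ ! -s "${prefix}.${suffix}.bt2" ]; then
      echo "missing: ${prefix}.${suffix}.bt2" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from ~/project/indexes/mm10:
#   check_bt2_index mm10 && echo "index looks complete"
```

The prefix passed to the helper (here mm10, without any file extension) is the same value that bowtie2 expects for its -x option.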

5.4 bowtie2 alignment