Chapter 4 uliCUT&RUN data analysis
To analyze the uliCUT&RUN datasets, we start from the bedGraph (.bdg) files.
Steps performed before obtaining the BED file (see the CUT&RUN protocol on GitHub):
Trim reads to 25 bases (keeping read pairs together).
Split reads using Novocraft (the -l option also trims off the barcode).
Align reads to the yeast or mouse genome using Bowtie2.
Remove PCR duplicate reads using Picard.
Remove low-quality reads (MAPQ < 10).
Make a size distribution file for mm10, and another for the sacCer3 spike-in.
Make size classes (1-120 bp for transcription factors, 150-500 bp for histones).
HOMER analysis:
makeTagDirectory
makeUCSCfile
Make aggregation plots or heatmaps over specific anchor locations.
Call peaks.
Find motifs.
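Several of the preprocessing steps above are plain text transformations. The bash sketch below illustrates four of them with POSIX awk; it is not the protocol's own code. The function names (trim_fastq, filter_mapq, size_histogram, size_class) are hypothetical, the fragment BED is assumed to hold one tab-separated fragment per line (chrom, start, end), and the size-class bounds are treated as inclusive. Barcode splitting (Novocraft), alignment (Bowtie2) and duplicate removal (Picard) are left to those tools.

```bash
# Hypothetical helpers sketching four preprocessing steps with POSIX awk.

# Trim every FASTQ record (4 lines) to the first n bases: the sequence
# (line 2) and quality (line 4) are cut identically, headers are untouched.
trim_fastq() {
  awk -v n="$1" '{ if (NR % 4 == 2 || NR % 4 == 0) print substr($0, 1, n); else print }'
}

# Keep SAM header lines and alignments whose MAPQ (column 5) is >= min,
# i.e. drop reads with MAPQ below the threshold.
filter_mapq() {
  awk -v min="$1" 'BEGIN { FS = "\t" } /^@/ || $5 + 0 >= min + 0'
}

# Size distribution: count fragments per length (BED is 0-based,
# half-open, so length = end - start), sorted by length.
size_histogram() {
  awk 'BEGIN { FS = "\t" }
       { len = $3 - $2; count[len]++ }
       END { for (l in count) print l "\t" count[l] }' | sort -n -k1,1
}

# Size class: keep fragments whose length lies in [lo, hi] (inclusive, assumed).
size_class() {
  awk -v lo="$1" -v hi="$2" 'BEGIN { FS = "\t" }
       { len = $3 - $2; if (len >= lo + 0 && len <= hi + 0) print }'
}

# Example usage (file names are hypothetical):
#   zcat sample_R1.fastq.gz | trim_fastq 25 > sample_R1.trim25.fastq
#   zcat sample_R2.fastq.gz | trim_fastq 25 > sample_R2.trim25.fastq
#   filter_mapq 10 < sample.dedup.sam > sample.q10.sam
#   size_histogram < fragments_mm10.bed > size_dist_mm10.txt
#   size_class 1 120 < fragments_mm10.bed > fragments_1-120.bed
```

Because trim_fastq only shortens lines 2 and 4 of each record and never drops a record, running it separately on the R1 and R2 files preserves the pairing.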
4.1 Experimental design
Biological question:
In this project we are only interested in samples from blastocysts, and the goal is to find out which NANOG binding regions are influenced by the depletion of BRG1.
| Cell type | Experiment | Antibody | Replicates |
|---|---|---|---|
| blastocysts | EGFPKD | NoAb | 4 |
| blastocysts | EGFPKD | NANOG | 4 |
| blastocysts | Brg1KD (Smarca4KD) | NoAb | 4 |
| blastocysts | Brg1KD (Smarca4KD) | NANOG | 4 |
| blastocysts | NanogKD | NoAb | 4 |
| blastocysts | NanogKD | NANOG | 4 |
Hainer, S. J., et al. (2019).
Hints:
Fig. F shows that EGFP-KD does not affect NANOG enrichment in the NANOG-antibody group, so these samples serve as the control group in the analysis. By comparing the EGFP-KD and Smarca4-KD conditions and finding the differential NANOG peaks, we may be able to tell which NANOG binding regions are BRG1-related and which are not.
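As a rough illustration of this comparison, the sketch below classifies peaks by the log2 fold change of normalized NANOG tag counts in Brg1KD relative to EGFPKD. The input layout (peak ID, EGFPKD count, Brg1KD count, tab-separated), the pseudocount of 1 and the 2-fold cutoff are all assumptions, and classify_peaks is a hypothetical name. A real differential analysis should use the four replicates and a statistical test rather than a single fold-change threshold.

```bash
# Hypothetical sketch: label NANOG peaks by log2(Brg1KD / EGFPKD).
# Input (assumed): peak_id <TAB> EGFPKD count <TAB> Brg1KD count.
# A pseudocount (pc) avoids log(0); cut = 1 means a 2-fold change.
classify_peaks() {
  awk -v pc=1 -v cut=1 'BEGIN { FS = "\t" }
    {
      lfc = log(($3 + pc) / ($2 + pc)) / log(2)
      if (lfc <= -cut) call = "reduced"        # NANOG signal lower without BRG1
      else if (lfc >= cut) call = "increased"  # NANOG signal higher without BRG1
      else call = "unchanged"
      printf "%s\t%.2f\t%s\n", $1, lfc, call
    }'
}

# Example usage (file name is hypothetical):
#   classify_peaks < nanog_peak_counts.txt > nanog_peak_calls.txt
```

Under these assumptions, peaks labelled "reduced" are the candidate BRG1-dependent NANOG binding regions.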
4.2 bedGraph
In bedGraph, the score is placed in column 4, not column 5 as in BED.
Track lines are compulsory and must include type=bedGraph. Currently, the only optional parameters supported by Ensembl are:
name: unique name to identify this track when parsing the file
description: label to be displayed under the track in Region in Detail
priority: integer defining the order in which to display tracks, if multiple tracks are defined.
graphType: either ‘bar’ or ‘points’
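The format rules above can be checked mechanically. The hypothetical bedgraph_summary function below is a sketch, not an Ensembl or UCSC tool: it rejects input whose track line lacks type=bedGraph, skips browser lines and # comments, and reads the score from column 4 to report the number of intervals, the covered bases and the maximum score. Whitespace-separated columns are assumed.

```bash
# Hypothetical helper: validate a bedGraph stream and summarize it.
bedgraph_summary() {
  awk '
    /^track/ {
      # The track line is compulsory and must declare type=bedGraph.
      if ($0 !~ /type=bedGraph/) { print "error: track line lacks type=bedGraph" > "/dev/stderr"; bad = 1; exit 1 }
      seen_track = 1; next
    }
    /^(#|browser)/ || NF == 0 { next }   # skip comments, browser lines, blanks
    {
      if (!seen_track) { print "error: data before track line" > "/dev/stderr"; bad = 1; exit 1 }
      n++
      bases += $3 - $2                   # 0-based, half-open intervals
      if (n == 1 || $4 + 0 > max) max = $4 + 0   # score is column 4
    }
    END {
      if (bad) exit 1
      if (!seen_track) { print "error: no track line" > "/dev/stderr"; exit 1 }
      printf "intervals=%d bases=%d max_score=%.2f\n", n, bases, max
    }'
}

# Example usage (file name is hypothetical):
#   zcat Brg1KD_1.bedGraph.gz | bedgraph_summary
```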
```bash
zcat Brg1KD_1.bedGraph.gz | head
# track type=bedGraph name="Brg1KD_Nanog_1_1-120 Total Tags = 1.16e+05, normalized to 1.00e+07" description="Brg1KD_Nanog_1_1-120 Total Tags = 1.16e+05, normalized to 1.00e+07" color=123,110,212 visibility=full yLineOnOff=on autoScale=on yLineMark="0.0" alwaysZero=on graphType=bar maxHeightPixels=128:75:11 windowingFunction=maximum smoothingWindow=off
# chr1 3011588 3011599 41.40
# chr1 3011599 3011692 82.80
# chr1 3011692 3011703 41.40
# chr1 3048182 3048232 41.40
# chr1 3048232 3048286 82.80
```

Genome browser tracks were generated from mapped reads using the “makeUCSCfile” command.
Mapped reads were aligned over specific regions using the “annotatePeaks” command, which makes 20 bp bins over the regions of interest and sums the reads within each bin. Peaks were called using parameters similar to those previously described (Skene et al., 2018) but implemented in HOMER with the “findPeaks” command and the following parameters: -style factor (or -style histone) -P 1 -poisson 0.01 -F 0.5 -L 2 -LP 0.001 -i noab, where noab is the NoAb control tag directory.
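To make the binning idea concrete, here is a simplified sketch of summing reads in 20 bp bins around anchor points. It is not HOMER's annotatePeaks implementation: aggregate_bins is a hypothetical function, the tab-separated input layouts (anchors as chrom and center; reads as chrom, start, end) are assumptions, each read is counted once at its midpoint, and the ±200 bp default window is an arbitrary choice.

```bash
# Hypothetical sketch of binned aggregation around anchor points.
# Usage: aggregate_bins anchors.bed reads.bed [bin_size] [half_window]
# The half window should be a multiple of the bin size.
aggregate_bins() {
  awk -v bin="${3:-20}" -v win="${4:-200}" '
    BEGIN { FS = "\t"; OFS = "\t" }
    NR == FNR { n++; achr[n] = $1; acen[n] = $2; next }   # load anchors
    {
      mid = int(($2 + $3) / 2)             # place each read at its midpoint
      for (i = 1; i <= n; i++) {
        if ($1 != achr[i]) continue
        off = mid - acen[i]                # offset from anchor center
        if (off < -win || off >= win) continue
        count[int((off + win) / bin)]++    # sum reads per bin over all anchors
      }
    }
    END {
      for (b = 0; b < 2 * win / bin; b++) {
        start = b * bin - win              # bin start relative to the anchor
        print start, count[b] + 0
      }
    }' "$1" "$2"
}

# Example usage (file names are hypothetical):
#   aggregate_bins nanog_summits.bed reads_1-120.bed 20 1000 > profile.txt
```

Each output line gives a bin's start offset from the anchor and the summed read count, which is the kind of table an aggregation plot is drawn from.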
Motifs were identified using the “findMotifs” command.