Chapter 3 Download Data
3.1 ChIP-seq
Chen, X., et al. (2008). Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell, 133(6), 1106-1117.
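The ChIP-seq reads are SRA runs, and each line of the sample sheet in 3.1.1 pairs a run accession with a sample name and a group. One way to fetch them is with the SRA Toolkit. The sketch below only prints one command per run, so the list can be reviewed before it is executed. It assumes that `prefetch` and `fasterq-dump` are installed and on the `PATH`, and that the runs are single-end, so `fasterq-dump` writes `<SRR>.fastq`. The function name `make_sra_cmds` and the default output directory `fastq` are hypothetical choices, not part of the original workflow.

```shell
#!/bin/bash
# Hypothetical helper: print one SRA Toolkit download command per run.
# Expects a sample sheet with columns: SRR accession, sample name, group.
# Assumes single-end runs, which fasterq-dump writes as <SRR>.fastq.
make_sra_cmds() {
  local samfile="$1" outdir="${2:-fastq}"
  local srr sampleName group
  while read -r srr sampleName group; do
    [ -z "$srr" ] && continue  # skip blank lines
    echo "prefetch $srr && fasterq-dump --outdir $outdir $srr && mv $outdir/$srr.fastq $outdir/$sampleName.fastq"
  done < "$samfile"
}
```

Usage: `mkdir -p fastq && make_sra_cmds samFile.txt fastq > download_chip.sh`, inspect the generated file, then run it with `bash download_chip.sh`.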
3.1.1 samFile
cat <<END > samFile.txt
SRR002004 ES_Nanog_1 ES_Nanog
SRR002005 ES_Nanog_2 ES_Nanog
SRR002011 ES_Nanog_3 ES_Nanog
SRR002010 ES_Nanog_4 ES_Nanog
SRR002009 ES_Nanog_5 ES_Nanog
SRR002008 ES_Nanog_6 ES_Nanog
SRR002007 ES_Nanog_7 ES_Nanog
SRR002006 ES_Nanog_8 ES_Nanog
SRR002012 ES_Oct4_1 ES_Oct4
SRR002013 ES_Oct4_2 ES_Oct4
SRR002014 ES_Oct4_3 ES_Oct4
SRR002015 ES_Oct4_4 ES_Oct4
SRR002025 ES_Sox2_1 ES_Sox2
SRR002024 ES_Sox2_2 ES_Sox2
SRR002023 ES_Sox2_3 ES_Sox2
SRR002026 ES_Sox2_4 ES_Sox2
SRR002021 ES_Smad1_1 ES_Smad1
SRR002020 ES_Smad1_2 ES_Smad1
SRR002022 ES_Smad1_3 ES_Smad1
SRR001991 ES_E2f1_1 ES_E2f1
SRR001990 ES_E2f1_2 ES_E2f1
SRR001989 ES_E2f1_3 ES_E2f1
SRR001988 ES_E2f1_4 ES_E2f1
SRR002034 ES_Tcfcp2I1_1 ES_Tcfcp2I1
SRR002033 ES_Tcfcp2I1_2 ES_Tcfcp2I1
SRR002032 ES_Tcfcp2I1_3 ES_Tcfcp2I1
SRR002031 ES_Tcfcp2I1_4 ES_Tcfcp2I1
SRR001987 ES_CTCF_1 ES_CTCF
SRR001986 ES_CTCF_2 ES_CTCF
SRR001985 ES_CTCF_3 ES_CTCF
SRR002035 ES_Zfx_1 ES_Zfx
SRR002036 ES_Zfx_2 ES_Zfx
SRR002037 ES_Zfx_3 ES_Zfx
SRR002038 ES_Zfx_4 ES_Zfx
SRR002019 ES_STAT3_1 ES_STAT3
SRR002018 ES_STAT3_2 ES_STAT3
SRR002017 ES_STAT3_3 ES_STAT3
SRR002016 ES_STAT3_4 ES_STAT3
SRR002000 ES_Klf4_1 ES_Klf4
SRR002001 ES_Klf4_2 ES_Klf4
SRR002002 ES_Klf4_3 ES_Klf4
SRR002003 ES_Klf4_4 ES_Klf4
SRR001992 ES_Esrrb_1 ES_Esrrb
SRR001993 ES_Esrrb_2 ES_Esrrb
SRR001994 ES_Esrrb_3 ES_Esrrb
SRR001995 ES_Esrrb_4 ES_Esrrb
SRR002039 ES_c-Myc_1 ES_c-Myc
SRR002040 ES_c-Myc_2 ES_c-Myc
SRR002041 ES_c-Myc_3 ES_c-Myc
SRR002042 ES_c-Myc_4 ES_c-Myc
SRR002046 ES_n-Myc_1 ES_n-Myc
SRR002045 ES_n-Myc_2 ES_n-Myc
SRR002044 ES_n-Myc_3 ES_n-Myc
SRR002043 ES_n-Myc_4 ES_n-Myc
SRR001996 ES_GFP_1 ES_GFP
SRR001997 ES_GFP_2 ES_GFP
SRR001998 ES_GFP_3 ES_GFP
SRR001999 ES_GFP_4 ES_GFP
SRR023866 ES_p300_1 ES_p300
SRR023867 ES_p300_2 ES_p300
SRR023868 ES_p300_3 ES_p300
SRR023869 ES_p300_4 ES_p300
SRR002027 ES_Suz12_1 ES_Suz12
SRR002028 ES_Suz12_2 ES_Suz12
SRR002029 ES_Suz12_3 ES_Suz12
SRR002030 ES_Suz12_4 ES_Suz12
END
3.2 uliCUT&RUN
Hainer, S. J., et al. (2019). Profiling of pluripotency factors in single cells and early embryos. Cell, 177(5), 1319.
For the uliCUT&RUN data, I’ll start from the processed bedGraph files, which can be downloaded from GEO (accession GSE111121).
In this project we are only interested in the blastocyst samples, of which there are 28 in total:
| Cell type | Experiment | Antibody | Replicates |
|---|---|---|---|
| blastocysts | None | NoAb | 2 |
| blastocysts | None | CTCF | 2 |
| blastocysts | EGFPKD | NoAb | 4 |
| blastocysts | EGFPKD | NANOG | 4 |
| blastocysts | Brg1KD | NoAb | 4 |
| blastocysts | Brg1KD | NANOG | 4 |
| blastocysts | NanogKD | NoAb | 4 |
| blastocysts | NanogKD | NANOG | 4 |
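Once samFile.txt from 3.2.1 has been written, the replicate counts in the table above can be checked against it. A minimal sketch, assuming the group is the last whitespace-separated column of each line (true for both sample sheets in this chapter); `count_reps` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: count samples per group in a sample sheet.
# The group is taken from the last column; blank lines are ignored.
count_reps() {
  awk 'NF { n[$NF]++ } END { for (g in n) print g, n[g] }' "$1" | sort
}
```

For the blastocyst sheet, `count_reps samFile.txt` should list eight groups whose counts add up to 28: 2 each for NoAb and CTCF, and 4 for each knockdown group.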
3.2.1 samFile
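Save the sample sheet as samFile.txt. Each line holds the GSM accession, the GEO sample label, the sample name used locally, and the sample group.
# create samFile.txt: GSM accession, GEO label, sample name, group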
cat <<END > samFile.txt
GSM3022469 blast_NoAb_rep1 NoAb_1 NoAb
GSM3022470 blast_NoAb_rep2 NoAb_2 NoAb
GSM3022471 blast_CTCF_rep1 CTCF_1 CTCF
GSM3022472 blast_CTCF_rep2 CTCF_2 CTCF
GSM3022473 blast_EGFPKD_NoAb_rep1 EGFPKD_NoAb_1 EGFPKD_NoAb
GSM3022474 blast_EGFPKD_NoAb_rep2 EGFPKD_NoAb_2 EGFPKD_NoAb
GSM3022475 blast_EGFPKD_NoAb_rep3 EGFPKD_NoAb_3 EGFPKD_NoAb
GSM3022476 blast_EGFPKD_NoAb_rep4 EGFPKD_NoAb_4 EGFPKD_NoAb
GSM3022477 blast_EGFPKD_Nanog_rep1 EGFPKD_1 EGFPKD
GSM3022478 blast_EGFPKD_Nanog_rep2 EGFPKD_2 EGFPKD
GSM3022479 blast_EGFPKD_Nanog_rep3 EGFPKD_3 EGFPKD
GSM3022480 blast_EGFPKD_Nanog_rep4 EGFPKD_4 EGFPKD
GSM3022481 blast_Brg1KD_NoAb_rep1 Brg1KD_NoAb_1 Brg1KD_NoAb
GSM3022482 blast_Brg1KD_NoAb_rep2 Brg1KD_NoAb_2 Brg1KD_NoAb
GSM3022483 blast_Brg1KD_NoAb_rep3 Brg1KD_NoAb_3 Brg1KD_NoAb
GSM3022484 blast_Brg1KD_NoAb_rep4 Brg1KD_NoAb_4 Brg1KD_NoAb
GSM3022485 blast_Brg1KD_Nanog_rep1 Brg1KD_1 Brg1KD
GSM3022486 blast_Brg1KD_Nanog_rep2 Brg1KD_2 Brg1KD
GSM3022487 blast_Brg1KD_Nanog_rep3 Brg1KD_3 Brg1KD
GSM3022488 blast_Brg1KD_Nanog_rep4 Brg1KD_4 Brg1KD
GSM3022489 blast_NanogKD_NoAb_rep1 NanogKD_NoAb_1 NanogKD_NoAb
GSM3022490 blast_NanogKD_NoAb_rep2 NanogKD_NoAb_2 NanogKD_NoAb
GSM3022491 blast_NanogKD_NoAb_rep3 NanogKD_NoAb_3 NanogKD_NoAb
GSM3022492 blast_NanogKD_NoAb_rep4 NanogKD_NoAb_4 NanogKD_NoAb
GSM3022493 blast_NanogKD_Nanog_rep1 NanogKD_1 NanogKD
GSM3022494 blast_NanogKD_Nanog_rep2 NanogKD_2 NanogKD
GSM3022495 blast_NanogKD_Nanog_rep3 NanogKD_3 NanogKD
GSM3022496 blast_NanogKD_Nanog_rep4 NanogKD_4 NanogKD
END
3.2.2 download
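Save the bash script below as downfile.sh and run it from the directory that contains samFile.txt.
The download links look like this (note that the first drops "rep" from the replicate label while the second keeps it):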
https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3022nnn/GSM3022469/suppl/GSM3022469_blast_NoAb_1_1-120.ucsc.bedGraph.gz
https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3022nnn/GSM3022473/suppl/GSM3022473_blast_EGFPKD_NoAb_rep1_1-120.ucsc.bedGraph.gz
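The two links share one pattern: a directory named after the accession with its last three digits replaced by `nnn`, then the accession, the sample label and the suffix `_1-120.ucsc.bedGraph.gz`. A small sketch that rebuilds a link from those two pieces, handy for checking a single URL by hand. `geo_bedgraph_url` is a hypothetical name, and it assumes every blastocyst file carries the same `_1-120` suffix as the two examples.

```shell
#!/bin/bash
# Hypothetical helper: build a GEO supplementary bedGraph URL.
# $1 = GSM accession, $2 = label used in the GEO file name.
geo_bedgraph_url() {
  local gsm="$1" label="$2"
  # e.g. GSM3022469 -> GSM3022nnn (last three digits replaced by "nnn")
  local dir="${gsm:0:${#gsm}-3}nnn"
  echo "https://ftp.ncbi.nlm.nih.gov/geo/samples/${dir}/${gsm}/suppl/${gsm}_${label}_1-120.ucsc.bedGraph.gz"
}
```

For example, `geo_bedgraph_url GSM3022469 blast_NoAb_1` reproduces the first link above.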
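#!/bin/bash
# Download the blastocyst uliCUT&RUN bedGraph files listed in samFile.txt.
# Columns: GSM accession, GEO sample label, sample name, group.
while read -r gsm rep sampleName experiment; do
  [ -z "$gsm" ] && continue  # skip blank lines
  # GEO directory: accession with its last three digits replaced by "nnn"
  gsmSub="${gsm:0:7}nnn"
  base="https://ftp.ncbi.nlm.nih.gov/geo/samples/${gsmSub}/${gsm}/suppl/${gsm}"
  out="${sampleName}.bedGraph.gz"
  # Some file names use the label as-is (blast_EGFPKD_NoAb_rep1 for GSM3022473);
  # others drop "rep" (blast_NoAb_1 for GSM3022469), so retry with that form.
  wget -O "$out" "${base}_${rep}_1-120.ucsc.bedGraph.gz" ||
    wget -O "$out" "${base}_${rep:0:${#rep}-4}${rep: -1}_1-120.ucsc.bedGraph.gz" ||
    echo "download failed: $gsm" >&2
done < samFile.txt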
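After the script finishes, it is worth confirming that every sample produced a non-empty, intact archive, since a failed `wget -O` can leave an empty file behind. A minimal sketch, assuming the files were saved as `<sampleName>.bedGraph.gz` in the current directory, as downfile.sh does; `check_downloads` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: report samples whose bedGraph download is missing,
# empty or not a valid gzip file. Returns non-zero if any problem is found.
check_downloads() {
  local samfile="$1" bad=0 gsm rep sampleName experiment f
  while read -r gsm rep sampleName experiment; do
    [ -z "$gsm" ] && continue  # skip blank lines
    f="${sampleName}.bedGraph.gz"
    # -s: file exists and is non-empty; gzip -t: archive integrity test
    if [ ! -s "$f" ] || ! gzip -t "$f" 2>/dev/null; then
      echo "problem: $gsm -> $f"
      bad=$((bad + 1))
    fi
  done < "$samfile"
  echo "$bad file(s) with problems"
  [ "$bad" -eq 0 ]
}
```

Run `check_downloads samFile.txt` in the download directory and re-fetch any sample it reports.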