Cite ChIP-Array2: Wang, Panwen, Jing Qin, Yiming Qin, Yun Zhu, Lily Yan Wang, Mulin Jun Li, Michael Q. Zhang, and Junwen Wang. "ChIP-Array 2: integrating multiple omics data to construct gene regulatory networks." Nucleic Acids Research (2015): gkv398. PMID: 25916854
ChIP-Array v2.0: integrating multiple omics data to construct gene regulatory network
Panwen Wang#, Jing Qin#, Yiming Qin, Yun Zhu, Lily Yan Wang, Junwen Wang*
Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China.
# The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.
ChIP-Array is a web-based tool that integrates transcription factor (TF) binding and transcriptome data to construct a gene regulatory network controlled by a TF. It detects both direct and indirect targets. Both types of targets are subsets of the differentially expressed genes (DEGs): direct targets are the intersection of DEGs and genes enriched for ChIP factor binding, and indirect targets are the intersection of DEGs and genes enriched for putative binding sites. The putative binding sites are predicted by scanning motifs in the promoter regions of genes. Besides the ChIP-seq/chip and microarray data that are the main input for ChIP-Array, more types of omics data are available and allow TF regulation to be understood more comprehensively. For example, histone modification ChIP-seq data allow us to investigate histone modifications during TF regulation.
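The two intersections described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's code: the gene names, the `motif_enriched` mapping and both function names are hypothetical, and the sketch assumes indirect targets are searched only for direct targets that are themselves TFs.

```python
# Hypothetical inputs; real sets would come from expression and binding analyses.
degs = {"Pou5f1", "Nanog", "Klf4", "Sox2", "Gata6"}   # differentially expressed genes
chip_bound = {"Pou5f1", "Nanog", "Klf4", "Myc"}       # genes enriched for ChIP binding
# Genes enriched for putative binding sites, keyed by the downstream TF (assumed layout).
motif_enriched = {"Klf4": {"Sox2", "Gata6", "Esrrb"}}


def direct_targets(degs, chip_bound):
    """Direct targets: DEGs that are also bound in the ChIP data."""
    return degs & chip_bound


def indirect_targets(degs, direct, motif_enriched):
    """Indirect targets: for each direct target that is a TF with motif data,
    the DEGs enriched for its putative binding sites."""
    return {tf: degs & genes
            for tf, genes in motif_enriched.items()
            if tf in direct}


direct = direct_targets(degs, chip_bound)
indirect = indirect_targets(degs, direct, motif_enriched)
```

Here `Myc` is bound but not differentially expressed, so it is not reported as a direct target, and `Esrrb` carries a putative site but is not a DEG, so it is not reported as an indirect target.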
ChIP-Array v2.0 inherits the basic functions of ChIP-Array and enhances them in several aspects:
1) It allows users to integrate several types of additional omics data for transcriptional regulation studies, including long-range chromatin interaction data, open chromatin regions and epigenetic marks. Long-range chromatin interaction data can be used to infer long-range enhancer-promoter regulation. The interplay between the TF and histone modifications can be studied by comparing their binding positions in the genome. Moreover, open chromatin regions and histone modifications provide information on tissue-specific cis-regulatory regions, which helps infer active TF binding sites when detecting indirect targets.
2) Besides TFs and chromatin modifiers, the new version can also construct the downstream network regulated by a histone modification or another treatment/perturbation.
3) We updated our motif database for human, mouse, rat, fruit fly, worm, yeast and Arabidopsis. The number of motifs increased from 1151 position weight matrices (PWMs) of 894 TFs from 6 species to 6548 PWMs of 4481 TFs from 7 species. We also manually curated ChIP-seq/chip data of TFs so that users can select them as input or use them instead of putative binding sites when detecting indirect targets. In addition, we incorporated the enhancer sequences from VISTA and curated experimentally validated enhancers from 180 publications.
4) We offer an additional target detection method, “rank product” (combining a differential expression rank and a rank based on the concentration of peaks around the transcription start site), to score the direct or indirect targets [2,3]. We also allow users to combine different regulatory networks to investigate co-occupied targets.
5) We provide a more user-friendly web interface. All input, built-in data and results can be visualized in JBrowse around the targets. Running time depends on the number of data files and indirect targets; the example takes about 5 minutes.
6) We used ChIP-Array v2.0 to build a network library containing the regulatory networks of 28 TFs and chromatin modifiers in mouse embryonic stem cells. Information on these networks is freely accessible.
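As an illustration of point 1, restricting putative binding sites to open chromatin can be sketched as a simple interval-overlap filter. This is a minimal sketch under assumptions: the overlap criterion, the `(chrom, start, end)` tuple layout with half-open coordinates, and the function names are hypothetical and may differ from what the server does internally.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def filter_sites(sites, open_regions):
    """Keep putative binding sites that overlap at least one open chromatin
    region on the same chromosome. Both inputs are lists of (chrom, start, end)."""
    by_chrom = {}
    for chrom, start, end in open_regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in sites
        if any(overlaps(start, end, r_start, r_end)
               for r_start, r_end in by_chrom.get(chrom, []))
    ]


# Hypothetical coordinates for illustration only.
putative_sites = [("chr1", 100, 110), ("chr1", 500, 510), ("chr2", 100, 110)]
open_chromatin = [("chr1", 90, 200), ("chr2", 300, 400)]
active_sites = filter_sites(putative_sites, open_chromatin)
```

The linear scan is adequate for a small example; genome-scale data would normally call for sorted intervals or an interval tree.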
Figure 1. ChIP-Array v2.0 workflow
Figure 1A shows how we generate the regulatory network with both direct and indirect targets. There are two general methods. The first, called “direct”, is the one used in ChIP-Array v1. With this method, we infer the targets as the intersection of the differentially expressed genes and the TF binding enriched genes; that is, a gene is regarded as a target if it is differentially expressed and has binding sites in its promoter region. The second method is called “rank product”. Each gene receives two ranks. One is the rank by differential expression (E), usually based on a statistic from the differential expression analysis such as the false discovery rate or P-value. The other is the rank by peak concentration around the gene’s transcription start site (TSS). The concentration is represented as:
Here, k is the number of peaks within a certain distance of the gene’s TSS (e.g. 10 kb), and di is the ratio of the distance from the ith peak to the TSS to that distance. For example, di=0.5 means the peak is 5 kb away from the TSS. A cutoff is then applied to keep the significant targets. If a direct target is itself a transcription factor, we then search for its indirect targets. Users can use either our curated ChIP-X data or the putative TFBSs as the binding data. If putative TFBSs are used, other types of omics data are applied to filter them. Combining the direct and indirect targets gives the complete regulatory network.
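The rank product step can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes the peak concentration takes the exponentially decaying form used by BETA [3], i.e. the sum over the k peaks of exp(-(0.5 + 4*di)), and that each rank is divided by the number of genes before the two are multiplied. The function names, the tie handling and the default window are hypothetical.

```python
import math


def peak_concentration(peak_distances_bp, window_bp=10_000):
    """Assumed BETA-style score: sum of exp(-(0.5 + 4 * d_i)) over peaks within
    window_bp of the TSS, where d_i = |distance| / window_bp."""
    score = 0.0
    for dist in peak_distances_bp:
        if abs(dist) <= window_bp:
            d_i = abs(dist) / window_bp
            score += math.exp(-(0.5 + 4.0 * d_i))
    return score


def ranks(values, descending=False):
    """1-based ranks of the keys of `values`; ties are broken arbitrarily."""
    ordered = sorted(values, key=values.get, reverse=descending)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def rank_product(fdr, concentration):
    """Normalized rank product per gene; smaller means a stronger candidate.
    Rank E: smaller FDR is better. Binding rank: larger concentration is better."""
    genes = fdr.keys() & concentration.keys()
    n = len(genes)
    e_rank = ranks({g: fdr[g] for g in genes})
    b_rank = ranks({g: concentration[g] for g in genes}, descending=True)
    return {g: (e_rank[g] / n) * (b_rank[g] / n) for g in genes}
```

With this form, a single peak 5 kb from the TSS in a 10 kb window (di=0.5) contributes exp(-2.5) to the score, and genes whose rank product falls below a chosen cutoff would be kept as targets.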
Figure 1B shows the analysis and visualization of the network. The ChIP-X data, TFBSs and other omics data can be visualized in JBrowse. Gene Ontology and pathway enrichment analyses can be performed to investigate the function of the regulatory network. MEME is used to find motifs in the ChIP-X data, and the resulting motifs can be compared against our motif database to identify possible cooperative TFs.
1. Qin J, Li MJ, Wang P, Zhang MQ, Wang J. Nucleic acids research. 2011;39(Web Server issue):W430-6.
2. Breitling R, Armengaud P, Amtmann A, Herzyk P. FEBS letters. 2004;573(1-3):83-92.
3. Wang S, Sun H, Ma J, Zang C, Wang C, Wang J, et al. Nature protocols. 2013;8(12):2502-15.