Protein-DNA interactions

Human Protein-DNA Interactome (hPDI)

What is hPDI?
The hPDI database holds experimental human protein-DNA interaction data identified by protein microarray assays. The current release of hPDI contains 17,718 preferred DNA binding sequences for 1,013 human DNA-binding proteins.

Who created the data and maintains hPDI?
This project is a collaboration between the Zhu, Qian, and Blackshaw labs at the Johns Hopkins School of Medicine. The hPDI database is maintained by the Qian bioinformatics lab at the Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, MD.

How to use hPDI?
Users may perform the following tasks on the web.

  • Protein view: users can search for a protein of interest. The protein view page provides the relevant information for the protein, such as its annotation, protein class, DNA binding sequences and logos, and position weight matrix (PWM).
  • Class view: users can browse the DNA binding profiles of a selected TF family or protein class for a quick overview.
  • Motif search: the database can be queried with user-defined DNA sequences, and the "best matching" motifs are returned. With this function, users can find the potential binding proteins, both transcription factors (TFs) and unconventional DNA-binding proteins (uDBPs), for a given DNA sequence.
  • All the DNA sequences and their binding proteins, as well as the protein sequences, can be downloaded from our server.
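
The page does not describe how the motif search scores a query, so the following is only a minimal sketch of one common approach: converting a position frequency matrix to log-odds scores and sliding it along the query sequence. The matrix values, the uniform background, the pseudocount, and all function names are assumptions made for illustration; they are not hPDI data or the hPDI implementation. The sketch scans one strand only.

```python
import math

# Hypothetical position frequency matrix (counts per base at each position).
# These numbers are invented for illustration and are not taken from hPDI.
EXAMPLE_PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 9, 0],
    "T": [0, 0, 0, 7],
}


def pfm_to_log_odds(pfm, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background (assumed)."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((pfm[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def best_match(sequence, pwm):
    """Slide the matrix along the sequence; return (best score, start) or None."""
    sequence = sequence.upper()
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def rank_motifs(sequence, motifs):
    """Rank named matrices by their best window score in the query sequence."""
    hits = []
    for name, pfm in motifs.items():
        match = best_match(sequence, pfm_to_log_odds(pfm))
        if match is not None:
            hits.append((name, match[0], match[1]))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Calling `rank_motifs("TTACGTAA", {"demo": EXAMPLE_PFM})` returns the hypothetical motif name together with its best score and the offset of the best-scoring window. A fuller version would usually also scan the reverse complement.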

What are the statistics of the current release of hPDI?
Overall statistics
No. of PDIs: 17,718
No. of DNA-binding proteins: 1,013
No. of DNA-binding logos: 437
Mean sequences bound per protein: 17

TF family                  No.
Total                      493
Zf-C2H2                     95
Homeobox                    44
Other TF families           36
HLH                         22
Hormone_recep               17
zf-CCHC                     12
Myb                         11
HMG_box                     11
Ets                         10
MH                           8
bZIP_1                       6
Fork head                    6
IRF                          6
TFs without known DBDs     209

uDBP class                       No.
Total                            520
RNA-binding proteins             207
All other categories             132
Mitochondrial proteins            97
Chromatin-associated proteins     73
Other nucleic acid-binding        50
DNA repair & replication          50
Transcriptional co-regulators     43
Protein kinases                   14
Note: Some proteins may belong to more than one protein class.
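
As a quick consistency check on the figures above, the short sketch below recomputes the totals from the listed counts. All numbers are copied from the tables; the variable and function names are only illustrative.

```python
# Counts copied from the statistics tables above.
TOTAL_PDIS = 17718
TOTAL_PROTEINS = 1013
TF_TOTAL = 493
UDBP_TOTAL = 520

TF_FAMILIES = {
    "Zf-C2H2": 95, "Homeobox": 44, "Other TF families": 36, "HLH": 22,
    "Hormone_recep": 17, "zf-CCHC": 12, "Myb": 11, "HMG_box": 11,
    "Ets": 10, "MH": 8, "bZIP_1": 6, "Fork head": 6, "IRF": 6,
    "TFs without known DBDs": 209,
}

UDBP_CLASSES = {
    "RNA-binding proteins": 207, "All other categories": 132,
    "Mitochondrial proteins": 97, "Chromatin-associated proteins": 73,
    "Other nucleic acid-binding": 50, "DNA repair & replication": 50,
    "Transcriptional co-regulators": 43, "Protein kinases": 14,
}


def mean_sequences_per_protein():
    """Average number of binding sequences per DNA-binding protein."""
    return TOTAL_PDIS / TOTAL_PROTEINS


def class_overcount(class_counts, total):
    """How far the per-class counts exceed the stated total of distinct proteins."""
    return sum(class_counts.values()) - total


if __name__ == "__main__":
    print(f"TFs + uDBPs = {TF_TOTAL + UDBP_TOTAL}")
    print(f"Mean sequences per protein = {mean_sequences_per_protein():.2f}")
    print(f"TF family overcount = {class_overcount(TF_FAMILIES, TF_TOTAL)}")
    print(f"uDBP class overcount = {class_overcount(UDBP_CLASSES, UDBP_TOTAL)}")
```

The TF and uDBP totals add up to the 1013 DNA-binding proteins, and the mean works out to about 17.49, which the table reports as 17. The uDBP class counts sum to 666 rather than 520, which is consistent with the note that a protein may belong to more than one class.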

What kind of data does each entry hold?
Protein name: The standardized Entrez gene symbol.
Locus ID: The locus ID, with an external link to the NCBI Gene website.
Class: Transcription factor (TF) or unconventional DNA-binding protein (uDBP).
Sub class:
  TFs: HLH, Homeobox, Zf-C2H2, zf-C4, bZIP_1, Ets, Myb, Fork head, Paired-box, Heat-shock, HMG_box, RHD, IRF, bZIP_2, and other categories.
  uDBPs: protein kinases, chromatin-associated proteins, RNA-binding proteins, transcriptional co-regulators, nucleic acid-binding proteins other than TFs and RNA-binding proteins, proteins associated with DNA repair & replication, mitochondrial proteins, and all other categories.
Protein sequence: The protein sequence used on the protein microarray chips.
Binding logo: A graphical representation of the consensus sequence, generated by WebLogo. See the next question for details.
Matrix: Position frequency matrix.
Binding motifs: All the binding sequences of the protein.
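
To illustrate what a position frequency matrix holds, here is a minimal sketch that counts bases per position across a set of sequences. It assumes the sequences are already aligned and of equal length, which is a simplification; the sequences and function names are hypothetical and this is not the hPDI pipeline.

```python
def position_frequency_matrix(aligned_sequences):
    """Count how often each base occurs at each position of aligned sequences."""
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    width = len(aligned_sequences[0])
    counts = {base: [0] * width for base in "ACGT"}
    for sequence in aligned_sequences:
        if len(sequence) != width:
            raise ValueError("all sequences must have the same length")
        for position, base in enumerate(sequence.upper()):
            if base not in counts:
                raise ValueError(f"unexpected base {base!r}")
            counts[base][position] += 1
    return counts


def consensus(pfm):
    """Pick the most frequent base per position (ties go to A, C, G, T order)."""
    width = len(pfm["A"])
    return "".join(
        max("ACGT", key=lambda base: pfm[base][position])
        for position in range(width)
    )


if __name__ == "__main__":
    # Hypothetical aligned binding sequences, not taken from hPDI.
    example = ["ACGT", "ACGA", "ACGT"]
    matrix = position_frequency_matrix(example)
    for base in "ACGT":
        print(base, matrix[base])
    print("consensus:", consensus(matrix))
```

Each row of the matrix corresponds to one base and each column to one motif position, which is the same layout a logo is drawn from.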

Why do only some proteins in hPDI have consensus sequences (binding logos)?
We used AlignACE to generate a consensus sequence for each protein from its binding sequences, but only for proteins that bind 3 to 29 sequences in hPDI. On the one hand, a protein must bind at least three sequences for a consensus sequence to be derived. On the other hand, proteins that bind 30 or more sequences (this cutoff is arbitrary) may bind DNA non-specifically or bind the T7 promoter present in each probe. Users should note that logos may differ slightly depending on the parameters used in the logo-generating software, such as AlignACE.
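
The selection rule described above (3 to 29 binding sequences per protein) can be sketched as a simple filter. The data structure, protein names, and function name below are hypothetical; only the two cutoffs come from the text.

```python
MIN_SEQUENCES = 3   # fewer than this gives too little data for a consensus
MAX_SEQUENCES = 29  # above this, binding may be non-specific (arbitrary cutoff)


def eligible_for_logo(binding_sequences_by_protein):
    """Keep only proteins whose number of binding sequences is within 3 to 29."""
    return {
        protein: sequences
        for protein, sequences in binding_sequences_by_protein.items()
        if MIN_SEQUENCES <= len(sequences) <= MAX_SEQUENCES
    }


if __name__ == "__main__":
    # Hypothetical proteins with placeholder sequences, not hPDI data.
    example = {
        "PROT_A": ["ACGT"] * 2,    # too few sequences
        "PROT_B": ["ACGT"] * 3,    # smallest eligible set
        "PROT_C": ["ACGT"] * 29,   # largest eligible set
        "PROT_D": ["ACGT"] * 30,   # too many sequences
    }
    print(sorted(eligible_for_logo(example)))
```

Only the proteins passing this filter would then be handed to a motif-finding tool for consensus generation.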

How does hPDI compare with other similarly scoped databases, such as TRANSFAC?
For human TFs, hPDI hosts DNA binding sequences for 493 TFs, more than 300 of which are unique to our collection when compared with TRANSFAC. TRANSFAC, in turn, holds DNA binding sequences for a number of human TFs not collected in hPDI. Even where a TF appears in both databases, users can combine our data with TRANSFAC to better define its DNA binding profile, since the sources of the two databases are completely independent. For unconventional DNA-binding proteins (uDBPs), hPDI is the only database hosting DNA binding sequences.

How to cite hPDI?
The database paper:
Xie, Z., Hu, S.H., Blackshaw, S., Zhu, H. and Qian, J. (2009) hPDI: a database of experimental human protein-DNA interactions, Bioinformatics (in press).


The original data source paper:
Hu, S.H., Xie, Z., Onishi, A., Yu, X.P., Jiang, L.Z., Lin, J., Rho, H.S., Woodard, C., Wang, H., Jeong, J.S., Long, S.Y., He, X.F., Blackshaw, S., Qian, J. and Zhu, H. (2009) Profiling the Human Protein-DNA Interactome Reveals ERK2 as a Transcriptional Repressor of Interferon Signalling, Cell, 139, 610-622.


How to contact us?
We appreciate your comments and suggestions.
For questions regarding the experiments and protein microarrays, please contact heng (dot) zhu (at) jhmi (dot) edu or sblack (at) jhmi (dot) edu.
For questions regarding the bioinformatics analysis and database, please contact jiang (dot) qian (at) jhmi (dot) edu or zhi (dot) xie (at) jhmi (dot) edu.