What is hPDI?
The hPDI database holds experimental human protein-DNA interaction data identified by protein microarray assays. The current release of hPDI contains 17,718 preferred DNA-binding sequences for 1,013 human DNA-binding proteins.
Who created the data and maintains hPDI?
This project is a collaboration between the labs of Drs. Zhu, Qian, and Seth Blackshaw at the Johns Hopkins School of Medicine. The hPDI database is maintained by Dr. Qian's bioinformatics lab at the Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, MD.
How to use hPDI?
Users may perform the following tasks on the web.
- Protein view: users can search for a protein of interest. The protein view page provides relevant information about the protein, such as the protein annotation, protein class, DNA binding sequences / logos, and the position weight matrix (PWM).
- Class view: users can browse the DNA binding profiles for a selected TF family or protein class for a quick overview.
- Motif search: the database can be queried with user-defined DNA sequences, and the "best matching" motifs are returned. Using this function, users can find the potential binding proteins (both TFs and uDBPs) for a given DNA sequence.
- All the DNA sequences and their binding proteins, as well as the protein sequences, can be downloaded from our server.
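To illustrate what a "best matching" motif search might look like, here is a minimal Python sketch. It is not hPDI's actual search algorithm, which is not described on this page. The sketch assumes each motif is available as a position frequency matrix, scores every window of the query on both strands with a log-odds score, and ranks the motifs by their best window. The protein names, the matrices, the pseudocount, and the uniform background are all made up for illustration.

```python
import math

# Toy position frequency matrices (base counts per motif position).
# Hypothetical proteins and counts, for illustration only.
MOTIFS = {
    "PROT_A": [
        {"A": 8, "C": 0, "G": 0, "T": 0},
        {"A": 0, "C": 0, "G": 8, "T": 0},
        {"A": 0, "C": 8, "G": 0, "T": 0},
        {"A": 0, "C": 0, "G": 0, "T": 8},
    ],
    "PROT_B": [
        {"A": 0, "C": 0, "G": 0, "T": 8},
        {"A": 0, "C": 0, "G": 0, "T": 8},
        {"A": 8, "C": 0, "G": 0, "T": 0},
        {"A": 8, "C": 0, "G": 0, "T": 0},
    ],
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def window_score(pfm, window, pseudocount=1.0, background=0.25):
    """Log-odds score of one window against one frequency matrix."""
    score = 0.0
    for column, base in zip(pfm, window):
        total = sum(column.values()) + 4 * pseudocount
        # Unknown bases (e.g. N) fall back to the pseudocount alone.
        prob = (column.get(base, 0) + pseudocount) / total
        score += math.log2(prob / background)
    return score


def best_score(pfm, seq):
    """Best window score over both strands; -inf if seq is too short."""
    width = len(pfm)
    scores = [
        window_score(pfm, strand[i:i + width])
        for strand in (seq, reverse_complement(seq))
        for i in range(len(strand) - width + 1)
    ]
    return max(scores) if scores else float("-inf")


def rank_motifs(seq, motifs=MOTIFS):
    """Return (protein, score) pairs sorted from best to worst match."""
    seq = seq.upper()
    scored = [(name, best_score(pfm, seq)) for name, pfm in motifs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `rank_motifs("ccAGCTgg")` places the hypothetical `PROT_A` first because the query contains its exact site.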
What are the statistics of the current release of hPDI?
Overall statistics
No. of PDIs | 17718 |
No. of DNA-binding proteins | 1013 |
No. of DNA-binding logos | 437 |
Mean sequences bound per protein | 17 |
TF family | No. |
Total | 493 |
Zf-C2H2 | 95 |
Homeobox | 44 |
Other TF families | 36 |
HLH | 22 |
Hormone_recep | 17 |
zf-CCHC | 12 |
Myb | 11 |
HMG_box | 11 |
Ets | 10 |
MH | 8 |
bZIP_1 | 6 |
Fork head | 6 |
IRF | 6 |
TFs without known DBDs | 209 |
uDBP classes | No. |
Total | 520 |
RNA-binding proteins | 207 |
All other categories | 132 |
Mitochondrial proteins | 97 |
Chromatin-associated proteins | 73 |
Other nucleic acid-binding | 50 |
DNA repair & replication | 50 |
Transcriptional co-regulators | 43 |
Protein kinases | 14 |
Note: Some proteins may belong to more than one protein class.
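As a quick arithmetic check of the tables above, the following Python sketch recomputes the totals from the listed counts. The variable names are ours. The interpretation that the uDBP surplus comes from proteins counted in more than one class is an assumption based on the note above.

```python
# Counts copied from the tables above.
overall = {"PDIs": 17718, "DNA-binding proteins": 1013}

tf_families = {
    "Zf-C2H2": 95, "Homeobox": 44, "Other TF families": 36, "HLH": 22,
    "Hormone_recep": 17, "zf-CCHC": 12, "Myb": 11, "HMG_box": 11,
    "Ets": 10, "MH": 8, "bZIP_1": 6, "Fork head": 6, "IRF": 6,
    "TFs without known DBDs": 209,
}

udbp_classes = {
    "RNA-binding proteins": 207, "All other categories": 132,
    "Mitochondrial proteins": 97, "Chromatin-associated proteins": 73,
    "Other nucleic acid-binding": 50, "DNA repair & replication": 50,
    "Transcriptional co-regulators": 43, "Protein kinases": 14,
}

TF_TOTAL = 493    # "Total" row of the TF family table
UDBP_TOTAL = 520  # "Total" row of the uDBP classes table

# Mean number of bound sequences per protein (reported as 17 above).
mean_per_protein = overall["PDIs"] / overall["DNA-binding proteins"]

tf_sum = sum(tf_families.values())
udbp_sum = sum(udbp_classes.values())

print(f"mean sequences per protein: {mean_per_protein:.2f}")
print(f"TF family rows sum to {tf_sum} (table total {TF_TOTAL})")
print(f"uDBP class rows sum to {udbp_sum} (table total {UDBP_TOTAL})")
print(f"TFs + uDBPs = {TF_TOTAL + UDBP_TOTAL}")
```

The TF rows add up exactly to the stated total, and the two totals together give the 1013 DNA-binding proteins. The uDBP rows add up to more than 520, which is what one would expect if some proteins are listed under several classes.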
What kind of data does each entry hold?
Protein name | This is the standardized Entrez gene symbol |
Locus ID | Locus ID with an external link to NCBI gene website |
Class | Transcription factors or unconventional DNA-binding proteins |
Sub class | TFs:
HLH, Homeobox, Zf-C2H2, zf-C4, bZIP_1, Ets, Myb, Fork head, Paired-box, Heat-Shock, HMG_box, RHD, IRF, bZIP_2, and other categories
uDBPs:
protein kinases, chromatin-associated proteins, RNA-binding proteins, transcriptional co-regulators, other nucleic acid-binding proteins (other than TFs and RNA-binding proteins), proteins associated with DNA repair & replication, mitochondrial proteins, and all other categories. |
Protein sequence | Protein sequence used in protein microarray chips |
Binding logo | Binding logos are graphical representations of the consensus sequences, generated by WebLogo. For details, see the next question below |
Matrix | Position frequency matrix |
Binding motifs | All the binding sequences of the protein |
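To show how the "Matrix" and "Binding motifs" fields relate, here is a minimal Python sketch that builds a position frequency matrix from a set of aligned binding sites and derives a log-odds weight matrix and a consensus string from it. The example sites are invented. The sketch also assumes the sites are already aligned and of equal length, and that a pseudocount of 0.5 and a uniform background are acceptable. None of these choices are taken from hPDI.

```python
import math

BASES = "ACGT"


def frequency_matrix(aligned_sites):
    """Count each base at each position of equal-length aligned sites."""
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    matrix = [{base: 0 for base in BASES} for _ in range(width)]
    for site in aligned_sites:
        for pos, base in enumerate(site.upper()):
            matrix[pos][base] += 1
    return matrix


def weight_matrix(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2 odds against a uniform background."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def consensus(pfm):
    """Most frequent base per position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda b: column[b]) for column in pfm)


# Invented example sites, not taken from hPDI.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pfm = frequency_matrix(sites)
pwm = weight_matrix(pfm)
print(consensus(pfm))
```

Positions where every site agrees receive the highest weights for the shared base, while variable positions receive weights closer to zero.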
Why do only some proteins in hPDI have consensus sequences (binding logos)?
We used AlignACE to generate the consensus sequence of a protein from its binding sequences, but only proteins that bind 3 to 29 sequences in hPDI were used. On the one hand, a protein needs to bind a minimum of three sequences for a consensus sequence to be derived. On the other hand, proteins that bind 30 or more sequences (this cutoff is arbitrary) may bind either to DNA sequences non-specifically or to the T7 promoter in each probe. Users should note that logos may differ slightly depending on the parameters used in the logo-generating software, such as AlignACE.
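The 3 to 29 sequence window described above can be expressed as a simple filter. This is a sketch only: the function name, the input layout (a dictionary from protein symbol to its list of binding sequences), and the example entries are assumptions made for illustration, not hPDI's actual pipeline.

```python
MIN_SITES = 3   # fewer sequences than this cannot yield a consensus
MAX_SITES = 29  # 30 or more suggests non-specific or T7 promoter binding


def eligible_for_logo(binding_sites_by_protein):
    """Keep proteins whose number of binding sequences is within 3-29."""
    return {
        protein: sites
        for protein, sites in binding_sites_by_protein.items()
        if MIN_SITES <= len(sites) <= MAX_SITES
    }


# Hypothetical input: protein symbols and sequences are placeholders.
example = {
    "PROT_FEW": ["ACGTACGT", "ACGTACGA"],
    "PROT_OK": ["TGACTCAG", "TGAGTCAT", "TGACTCAC"],
    "PROT_MANY": ["ACGTACGT"] * 30,
}

print(sorted(eligible_for_logo(example)))
```

Only `PROT_OK` passes in this example, since the other two placeholder proteins fall below and above the window.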
How does hPDI compare with other similarly scoped databases, such as TRANSFAC?
For human TFs, hPDI hosts DNA binding sequences of 493 TFs, of which more than 300 are unique to our collection compared with TRANSFAC. Conversely, TRANSFAC holds DNA binding sequences for a number of human TFs not collected in hPDI. Although some TFs appear in both databases, the sources of the two databases are completely independent, so users can combine our data collection with TRANSFAC to better define the DNA binding profiles of human TFs.
For unconventional DNA binding proteins (uDBPs), hPDI is the only database hosting DNA binding sequences.
How to cite hPDI?
The database paper:
Xie, Z., Hu, S.H., Blackshaw, S., Zhu, H. and Qian, J. (2009) hPDI: a database of experimental human protein-DNA interactions, Bioinformatics (in press).
The original data source paper:
Hu, S.H., Xie, Z., Onishi, A., Yu, X.P., Jiang, L.Z., Lin, J., Rho, H.S., Woodard, C., Wang, H., Jeong, J.S., Long, S.Y., He, X.F., Blackshaw, S., Qian, J. and Zhu, H. (2009) Profiling the Human Protein-DNA Interactome Reveals ERK2 as a Transcriptional Repressor of Interferon Signalling, Cell, 139, 610-622.
How to contact us?
We appreciate your comments and suggestions.
For questions regarding the experiments and protein microarrays, you can contact
heng (dot) zhu (at) jhmi (dot) edu or sblack (at) jhmi (dot) edu
For questions regarding the bioinformatics analysis and database, you can contact
jiang (dot) qian (at) jhmi (dot) edu or zhi (dot) xie (at) jhmi (dot) edu