The BioLiP database is constructed from known protein structures in the PDB. The overall flowchart of the database construction is shown below; it comprises four major steps.
Step 1. For each entry in the PDB, the 3D structure in mmCIF format is downloaded. For each protein chain (called the receptor), the following information, where available, is collected either from the mmCIF file or from the SIFTS project: catalytic site residues mapped from the Catalytic Site Atlas; annotated Enzyme Commission (EC) numbers; Gene Ontology (GO) terms; UniProt accessions; and the PubMed ID, with which the abstract of the research paper can be downloaded. Modified residues of proteins and nucleic acids are mapped to standard residue types, based on the _pdbx_struct_mod_residue record of the mmCIF file when possible and on atomic composition otherwise.
Step 2. Ligands, which are defined as small molecules, are extracted from the mmCIF file. Three kinds of ligands are collected in the BioLiP database: regular ligands (labeled with "." by the _atom_site.label_seq_id record), including metal ions; DNA/RNA; and peptides with fewer than 30 residues. The binding affinity of each ligand, where available, is taken from the original literature and from the Binding MOAD, PDBbind-CN, and BindingDB databases.
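The three ligand categories in Step 2 can be illustrated with a minimal Python sketch. It is not BioLiP's own code: the function name, the polymer_type labels ("DNA", "RNA", "peptide"), and the returned category strings are hypothetical choices made for illustration. Only the "." value of _atom_site.label_seq_id and the 30-residue limit come from the description above.

```python
MAX_PEPTIDE_LENGTH = 30  # peptides must have fewer residues than this


def classify_ligand(label_seq_id, polymer_type=None, n_residues=0):
    """Return a ligand category, or None if the molecule is not a ligand.

    label_seq_id : value of _atom_site.label_seq_id for the molecule's atoms
    polymer_type : hypothetical label ("DNA", "RNA", "peptide") for polymers
    n_residues   : chain length, used only for peptides
    """
    # Non-polymer molecules (including metal ions) carry "." as label_seq_id.
    if label_seq_id == ".":
        return "regular"
    # Nucleic acid chains are kept as ligands.
    if polymer_type in ("DNA", "RNA"):
        return "nucleic acid"
    # Short peptide chains are kept as ligands; longer chains are not.
    if polymer_type == "peptide" and n_residues < MAX_PEPTIDE_LENGTH:
        return "peptide"
    return None
```

For example, classify_ligand(".") yields "regular", while a 30-residue peptide chain yields None because it is not shorter than the limit.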
Step 3. The ligand binding sites on the protein receptors are identified by the following procedure. First, all inter-molecular atomic interactions (i.e., receptor-ligand atom pairs whose distance is within the sum of their van der Waals radii plus 0.5 Å) are calculated. Second, protein residues with at least two inter-molecular atomic interactions to a ligand are labeled as ligand binding residues. Third, two or more ligand binding residues for the same ligand are grouped into the same binding site.
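The procedure in Step 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BioLiP's implementation: the atom dictionaries, the function names, the Bondi radii in VDW_RADII, and the fallback radius are all assumptions, since the radii actually used by BioLiP are not given here. Only the 0.5 Å tolerance and the two-contact threshold come from the text.

```python
import math
from collections import defaultdict

# Assumed van der Waals radii in Å (Bondi values); BioLiP's table may differ.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "P": 1.80, "S": 1.80}
DEFAULT_RADIUS = 1.70  # assumed fallback for elements missing from the table
TOLERANCE = 0.5        # Å added to the sum of the two radii
MIN_CONTACTS = 2       # contacts required to call a residue ligand-binding


def in_contact(atom_a, atom_b):
    """True if two atoms lie within the sum of their radii plus TOLERANCE."""
    cutoff = (VDW_RADII.get(atom_a["element"], DEFAULT_RADIUS)
              + VDW_RADII.get(atom_b["element"], DEFAULT_RADIUS)
              + TOLERANCE)
    return math.dist(atom_a["xyz"], atom_b["xyz"]) <= cutoff


def binding_residues(receptor_atoms, ligand_atoms):
    """Return the sorted residues that form the binding site for one ligand."""
    contacts = defaultdict(int)
    # Count every receptor-ligand atom pair that is in contact, per residue.
    for r_atom in receptor_atoms:
        for l_atom in ligand_atoms:
            if in_contact(r_atom, l_atom):
                contacts[r_atom["residue"]] += 1
    # Residues with enough contacts are grouped as this ligand's binding site.
    return sorted(res for res, n in contacts.items() if n >= MIN_CONTACTS)
```

Each atom is a dictionary with an "element" string and an "xyz" coordinate tuple; receptor atoms also carry a "residue" key such as ("A", 10, "HIS"). The all-pairs loop is quadratic in the number of atoms, which keeps the sketch short; a spatial grid or k-d tree would be the usual choice for whole-PDB processing.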
Step 4. Each ligand with at least one binding site on a protein receptor is passed through a combined automated and manual procedure that assesses its biological relevance. This procedure is illustrated in the right panel of the figure below.
BioLiP can be queried through a RESTful API.
Chemical information for a ligand can be queried by its 3-letter Chemical Component Dictionary (CCD) ID used by the PDB. For example, to show chemical information for FMB (Formycin B):
Ligand-protein interactions can be searched by PDB ID, ligand, UniProt accession, EC number, GO term, and PubMed ID:
Alternatively, ligand-protein interactions can be searched by the sequence of a protein receptor or of a polymer ligand (DNA, RNA, or peptide):
Reference:
Jianyi Yang, Ambrish Roy, and Yang Zhang.
BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions, Nucleic Acids Research, 41: D1096-D1103 (2013).