PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the National Center for Biotechnology Information (NCBI), a component of the National Library of Medicine, which is part of the United States National Institutes of Health (NIH). PubChem can be accessed for free through a web user interface. Millions of compound structures and descriptive datasets can be freely downloaded via FTP. PubChem contains multiple substance descriptions and small molecules with fewer than 100 atoms and 1,000 bonds. More than 80 database vendors contribute to the growing PubChem database.[2]

PubChem
Content
Description: Chemicals and their bioassays
Organisms: Humans and other animals
Contact
Research center: NCBI
Primary citation: PMID 15879180
Access
Website: pubchem.ncbi.nlm.nih.gov
Download URL: FTP
Web service URL: PUG-View[1]
Miscellaneous
License: Public domain

History

PubChem was released in 2004 as a component of the Molecular Libraries Program (MLP) of the NIH. As of November 2015, PubChem contained more than 150 million depositor-provided substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results (from over 1 million assay experiments performed on more than 2 million small molecules covering almost 10,000 unique protein target sequences that correspond to more than 5,000 genes). It also contained RNA interference (RNAi) screening assays that target over 15,000 genes.[3]

As of August 2018, PubChem contained 247.3 million substance descriptions and 96.5 million unique chemical structures, contributed by 629 data sources from 40 countries. It also contained 237 million bioactivity test results from 1.25 million biological assays, covering more than 10,000 target protein sequences.[4]

As of 2020, with data integration from over 100 new sources, PubChem contained more than 293 million depositor-provided substance descriptions, 111 million unique chemical structures, and 271 million bioactivity data points from 1.2 million biological assay experiments.[5]

Databases

PubChem consists of three dynamically growing primary databases. As of 5 November 2020 (number of BioAssays is unchanged):

  • Compounds, 111 million entries[5] (up from 94 million entries in 2017[4]), contains pure and characterized chemical compounds.[6]
  • Substances, 293 million entries[5] (up from 236 million entries in 2017[7] and 163 million in Sept. 2014[8]), also contains mixtures, extracts, complexes, and uncharacterized substances.
  • BioAssay, bioactivity results from 1.25 million[9] (up from 6,000 in Sept. 2014[10]) high-throughput screening programs with several million values.

Searching

Searching the databases is possible for a broad range of properties including chemical structure, name fragments, chemical formula, molecular weight, XLogP, and hydrogen bond donor and acceptor count.

PubChem contains its own online molecule editor with SMILES/SMARTS and InChI support that allows the import and export of all common chemical file formats to search for structures and fragments.

Each hit provides information about synonyms, chemical properties, chemical structure including SMILES and InChI strings, bioactivity, and links to structurally related compounds and other NCBI databases like PubMed.

In the text search form, the database fields can be searched by adding the field name in square brackets to the search term. A numeric range is represented by two numbers separated by a colon. The search terms and field names are case-insensitive. Parentheses and the logical operators AND, OR, and NOT can be used. AND is assumed if no operator is used.

Example (Lipinski's Rule of Five):

0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]
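
The query syntax described above can be assembled programmatically. The following is a minimal sketch in Python that only builds the query string; the helper names `range_term` and `combine` are hypothetical and are not part of any PubChem tool. It writes the AND operator explicitly, which, per the rule above, should be equivalent to separating the terms with spaces.

```python
def range_term(low, high, field):
    """Format a numeric range restricted to one field, e.g. 0:500[mw]."""
    return f"{low}:{high}[{field}]"


def combine(terms, operator="AND"):
    """Join search terms with an explicit logical operator (AND, OR)."""
    return f" {operator} ".join(terms)


# Lipinski's Rule of Five, using the field names from the example above.
lipinski_query = combine([
    range_term(0, 500, "mw"),    # molecular weight
    range_term(0, 5, "hbdc"),    # hydrogen bond donor count
    range_term(0, 10, "hbac"),   # hydrogen bond acceptor count
    range_term(-5, 5, "logp"),   # XLogP
])

print(lipinski_query)
# prints: 0:500[mw] AND 0:5[hbdc] AND 0:10[hbac] AND -5:5[logp]
```

The resulting string can be pasted into the text search form as-is.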

Database fields

Identification numbers
Identification number in current database [UID]
Substance identification number [SID]
Compound identification number [CID]
BioAssay identification number [BAID], [AID]

General
Any database field [ALL]
Comment [CMT]
Deposition date [DDAT], [DEPDAT]
Depositor's external ID [SRID], [SRCID]
Source name [SRC], [SRCNAM], [SRCNAME]
Source release date [SRD], [SRDAT], [RLSDAT]
Medical Subject Heading (MeSH) term [MSHT], [MESHT]
MeSH tree node [MSHN], [MESHTN]
MeSH pharmacological actions [PHMA], [PHARMA]

Substance properties
Substance synonyms [SYNO]
IUPAC name [UPAC], [IUPAC]
International Chemical Identifier (InChI) [INCHI]
Molecular weight [MW], [MWT], [MOLWT]
Chemical elements [ELMT], [EL]
Non-hydrogen atoms [HAC], [HACNT]
Isotope count [IAC], [IACNT]
Total formal charge [TFC], [CHG], [CHRG]
Chiral atom count [ACC], [ACCNT]
Defined chiral atom count [ACDC], [ACDCNT]
Undefined chiral atom count [ACUC], [ACUCNT]
Hydrogen bond acceptor count [HBAC], [HBACNT]
Hydrogen bond donor count [HBDC], [HBDCNT]
Tautomer count [TC], [TCNT], [TTMC]
Rotatable bond count [RBC], [RBCNT]
XLogP[11] [XLGP], [LOGP]

Compound properties
Compound synonyms [CSYN], [CSYNO]
Component count [CC], [CCNT]
Covalent unit (molecule) count [CUC], [CUCNT]
Total bioactivity count [TAC]
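
Because several fields accept more than one tag and field names are case-insensitive, the same query can be written in many ways. The sketch below, in Python, rewrites the field tags in a query to one spelling per field so that queries can be compared or logged consistently. The alias table covers only a few of the fields listed above, and choosing the first listed tag as the "canonical" one is an assumption made for illustration, not something PubChem defines; the name `normalize_fields` is likewise hypothetical.

```python
import re

# Subset of the field tags listed above. Mapping each alias to the first
# listed tag is an illustrative choice, not a PubChem convention.
FIELD_ALIASES = {
    "MW": "MW", "MWT": "MW", "MOLWT": "MW",
    "HBDC": "HBDC", "HBDCNT": "HBDC",
    "HBAC": "HBAC", "HBACNT": "HBAC",
    "XLGP": "XLGP", "LOGP": "XLGP",
}


def normalize_fields(query):
    """Upper-case every [field] tag and map known aliases to one spelling."""
    def replace(match):
        tag = match.group(1).upper()
        # Tags missing from the table are upper-cased but otherwise kept.
        return f"[{FIELD_ALIASES.get(tag, tag)}]"

    return re.sub(r"\[([A-Za-z]+)\]", replace, query)


print(normalize_fields("0:500[molwt] 0:5[hbdcnt] -5:5[logp]"))
# prints: 0:500[MW] 0:5[HBDC] -5:5[XLGP]
```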


References

  1. Kim, Sunghwan; Thiessen, Paul A.; Cheng, Tiejun; Zhang, Jian; Gindulyte, Asta; Bolton, Evan E. (9 August 2019). "PUG-View: programmatic access to chemical annotations integrated in PubChem". Journal of Cheminformatics. 11 (1): 56. doi:10.1186/s13321-019-0375-2. PMC 6688265. PMID 31399858.
  2. "PubChem Source Information". The PubChem Project. USA: National Center for Biotechnology Information.
  3. Kim, Sunghwan; Thiessen, Paul A.; Cheng, Tiejun; Yu, Bo; Shoemaker, Benjamin A.; Wang, Jiyao; Bolton, Evan E.; Wang, Yanli; Bryant, Stephen H. (2016). "Literature information in PubChem: associations between PubChem records and scientific articles". Journal of Cheminformatics. 8: Article 32. doi:10.1186/s13321-016-0142-6. PMC 4901473. PMID 27293485.
  4. "Search Results for all compounds". Retrieved 28 January 2016.
  5. Kim, Sunghwan; Chen, Jie; Cheng, Tiejun; Gindulyte, Asta; He, Jia; He, Siqian; Li, Qingliang; Shoemaker, Benjamin A.; Thiessen, Paul A.; Yu, Bo; Zaslavsky, Leonid; Zhang, Jian; Bolton, Evan E. (8 January 2021). "PubChem in 2021: new data content and improved web interfaces". Nucleic Acids Research. 49 (D1): D1388–D1395. doi:10.1093/nar/gkaa971. PMC 7778930. PMID 33151290.
  6. "all[filt] - PubChem Compound Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
  7. "all[filt] - PubChem Substance Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 28 January 2016.
  8. "all[filt] - PubChem Substance Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
  9. "all[filt] - PubChem BioAssay Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 28 January 2016.
  10. "all[filt] - PubChem BioAssay Results". The PubChem Project. USA: National Center for Biotechnology Information. Retrieved 7 January 2011.
  11. Cheng T (Nov 2007). "Computation of octanol-water partition coefficients by guiding an additive model with knowledge". Journal of Chemical Information and Modeling. 47 (6): 2140–2148. doi:10.1021/ci700257y. PMID 17985865.