
RCSB PDB Help

Web APIs Overview

Application Programming Interfaces (APIs) provide programmatic access to the PDB archive. The APIs that power the rcsb.org website are:

  • Data API serves to retrieve data when you know the PDB identifier
  • Search API serves to find out what identifiers match a certain search condition
  • ModelServer API is a service for accessing subsets of macromolecular model data
  • VolumeServer API is a service for accessing subsets of volumetric data
  • 1D Coordinate Server API serves alignments between structural and sequence databases and integrates protein positional features from multiple resources
  • Alignment API provides a streamlined method for programmatically executing structure alignment calculations

Stay up to date with API developments by viewing (or subscribing to) the RCSB PDB API announcements Google group.

Data API

All static data that is exposed on rcsb.org is available in the Data API. The schema follows the mmCIF dictionary, extended with annotations coming from external resources. The core PDB data is split up into core objects, one per level of the structural data hierarchy, with entity subdivided into polymeric and non-polymeric subschemas (differing from the mmCIF dictionary). These are some of the core objects:

  • core_entry: data that relates to a PDB entry or Computed Structure Model (CSM). Identified by an entry_id, which can be an alphanumeric PDB-ID or a CSM-ID that starts with AF_ or MA_
  • core_polymer_entity: data for each polymeric molecular entity in an entry (e.g., protein, DNA, and RNA). Identified by entry ID and entity ID separated by a _ character, e.g. 3PQR_1
  • core_nonpolymer_entity: data for each non-polymeric small chemical entity in an entry (e.g., enzyme cofactors, ligands, ions, etc). Identified by entry ID and entity ID separated by a _ character
  • core_branched_entity: data for branched molecules (e.g., oligosaccharides). Identified by entry ID and entity ID separated by a _ character
  • core_assembly: data for each biological assembly in an entry. Identified by entry ID and assembly ID separated by a _ character
  • core_polymer_entity_instance: an instance of a certain polymeric molecular entity, also known as chain. Identified by entry ID and asym ID separated by a _ character
  • core_chem_comp: a chemical component. Identified by a unique alphanumeric code chem_comp_id

Both internal additions to the mmCIF dictionary and annotations from external resources are prefixed with rcsb_. In each core object, the rcsb_<core_object>_container_identifiers field holds the cardinal identifiers for the object and for any parent/child objects. Additionally, every core object contains a single string identifier in the field rcsb_id.

The data is available via two different interfaces:

REST API Full Reference

The REST API permits the retrieval of all data for one core object at a time.
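Retrieving one core object at a time can be sketched as below. This is a minimal illustration rather than an official client: the base URL https://data.rcsb.org/rest/v1/core and the path layout (object type followed by slash-separated identifiers) are assumptions to check against the REST API Full Reference, and the helper names core_object_url, split_entity_id, fetch_core_object and demo are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the Data API REST service; verify against the full reference.
REST_BASE = "https://data.rcsb.org/rest/v1/core"


def core_object_url(object_type, *ids):
    """Build the URL for one core object, e.g. ("polymer_entity", "3PQR", "1")."""
    return "/".join([REST_BASE, object_type, *ids])


def split_entity_id(rcsb_id):
    """Split an entity identifier such as "3PQR_1" into (entry_id, entity_id).

    The split is made at the LAST underscore, so CSM entry IDs that
    themselves contain underscores (AF_..., MA_...) stay intact.
    """
    entry_id, _, entity_id = rcsb_id.rpartition("_")
    return entry_id, entity_id


def fetch_core_object(object_type, *ids, timeout=30):
    """GET one core object and decode its JSON body (performs a network call)."""
    url = core_object_url(object_type, *ids)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def demo():
    """Fetch the polymer entity 3PQR_1 and print its rcsb_id."""
    entry_id, entity_id = split_entity_id("3PQR_1")
    entity = fetch_core_object("polymer_entity", entry_id, entity_id)
    print(entity["rcsb_id"])
```

Calling demo() issues a single request. Fetching several related objects this way takes one request per object.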

GraphQL API (GraphiQL interface)

The GraphQL interface offers more flexible data retrieval, making it possible to retrieve any piece of data from any level of the hierarchy in a single query. To use it programmatically, POST your GraphQL queries to the data.rcsb.org/graphql endpoint.

All output from both REST and GraphQL interfaces is offered in JSON format.
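A GraphQL request that spans two levels of the hierarchy (an entry and its polymer entities) might look like the sketch below. The https scheme, the JSON body with a "query" key, and the selected fields (struct.title, polymer_entities) are assumptions for illustration. The helper names are hypothetical, and the GraphiQL interface is the place to confirm which fields exist.

```python
import json
import urllib.request

# Endpoint named in the text above; the https scheme is an assumption.
GRAPHQL_URL = "https://data.rcsb.org/graphql"


def build_graphql_request(entry_id):
    """Build (but do not send) a POST request for one entry and its entities."""
    # json.dumps quotes and escapes the ID so it is a valid string literal.
    query = """
    {
      entry(entry_id: %s) {
        rcsb_id
        struct { title }
        polymer_entities { rcsb_id }
      }
    }
    """ % json.dumps(entry_id)
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,  # a request with a body is sent as POST
        headers={"Content-Type": "application/json"},
    )


def run_graphql(entry_id, timeout=30):
    """Send the request and return the decoded JSON response (network call)."""
    with urllib.request.urlopen(build_graphql_request(entry_id), timeout=timeout) as resp:
        return json.load(resp)
```

By GraphQL convention, the decoded response carries the requested fields under a top-level "data" key, so run_graphql("4HHB")["data"]["entry"] would hold the entry in this sketch.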

Data API Tutorial

Search API

The Search API programmatically exposes all search functionality available at rcsb.org. Queries with arbitrary Boolean logic can be run across all data available in the RCSB PDB Data API via a JSON-format query language. At the root level it is also possible to combine text-based searches (any text or numerical field in the RCSB PDB Data API) with protein/nucleotide sequence searches (MMseqs2 software) and structure similarity searches (BioZernike software, described in Guzenko et al., 2020). All output from the Search API is offered in JSON format.
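The Boolean JSON query language can be illustrated with a small builder. This is a sketch under assumptions: the endpoint https://search.rcsb.org/rcsbsearch/v2/query, the "terminal"/"group" node layout, the operators, and the two attributes used here (exptl.method and rcsb_entry_info.resolution_combined) should all be checked against the Search API Full Reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed Search API endpoint; verify against the full reference.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def text_node(attribute, operator, value):
    """One text-service condition on a single Data API attribute."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group(logical_operator, *nodes):
    """Combine nodes with Boolean logic ("and" / "or"); groups can be nested."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


def build_search_payload(query, return_type="entry"):
    """Wrap a query tree in the top-level request object."""
    return {"query": query, "return_type": return_type}


def extract_identifiers(body):
    """Pull identifiers out of a raw response body; an empty body means no hits."""
    if not body:
        return []
    return [hit["identifier"] for hit in json.loads(body).get("result_set", [])]


def run_search(payload, timeout=30):
    """POST the payload and return matching identifiers (network call)."""
    req = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_identifiers(resp.read())


# Example query tree: X-ray structures at 2.0 angstrom resolution or better.
EXAMPLE_PAYLOAD = build_search_payload(
    group(
        "and",
        text_node("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
        text_node("rcsb_entry_info.resolution_combined", "less_or_equal", 2.0),
    )
)
```

Passing EXAMPLE_PAYLOAD to run_search would send the query. Identifiers returned for return_type "entry" can then be fed to the Data API.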

Search API Tutorial

Search API Full Reference

This Search API example shows how to find all PDB structures and CSMs for a given UniProtKB sequence.

Python API Client

The rcsbsearchapi package provides a Python interface to the RCSB PDB Search API. You can use it to fetch lists of PDB IDs corresponding to advanced query searches. RCSB PDB maintains the current version of this package on GitHub. The package was originally developed by Spencer Bliven and updated by several others.

ModelServer API

The ModelServer is a service for accessing subsets of macromolecular model data. It delivers atomic coordinates together with annotations in the primary data files in a compressed BinaryCIF encoding (BCIF). Structure data can be served at different levels of granularity (e.g., assembly, polymer chain, ligand), and ligand data may also be delivered in popular chemical informatics formats (e.g., SDF, MOL, MOL2).

ModelServer API Specification

The specification of the BinaryCIF format can be found at: https://github.com/molstar/BinaryCIF.

VolumeServer API

The VolumeServer is a service for accessing subsets of volumetric data. It automatically downsamples the data depending on the volume of the requested region to reduce the bandwidth requirements and provide near-instant access to even the largest data sets.

VolumeServer API Specification

Both ModelServer and VolumeServer are part of Mol* (D. Sehnal, A.S. Rose, J. Koča, S.K. Burley, S. Velankar (2018) Mol*: Towards a common library and tools for web molecular graphics. MolVA/EuroVis Proceedings. doi:10.2312/molva.20181103).

1D Coordinate Server API

The RCSB PDB 1D Coordinate Server compiles alignments between structural and sequence databases and integrates protein positional features from multiple resources. Alignment data is available for NCBI RefSeq (including protein and genomic sequences), UniProt and PDB sequences. Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse.

1D Coordinate Server API Tutorial

Alignment API

The Alignment API provides a streamlined method for programmatically executing structure alignment calculations. Structure alignment seeks an optimal superposition of the 3D coordinates of biological macromolecules in order to establish a residue-residue correspondence between the sequences of related structures.

Alignment API Tutorial

API Rate Limits

The RCSB PDB APIs implement rate-limiting measures to ensure fair usage. While access to static files is not restricted, all RCSB PDB APIs have rate limits in place. We recommend starting with a handful of requests per second. If you exceed the limit, the service will respond with a 429 HTTP error code, indicating that you need to reduce your request rate as described below.

If you encounter this error, you can retry your query after a short waiting period or add a waiting period after each request. The exponential backoff strategy, which gradually increases the time between retries, is a common way to find an acceptable request rate. Additionally, if you are using a public IP address that is shared with other users, for example at some universities or when using a VPN, you might encounter rate limiting earlier because of shared resources.
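The exponential backoff strategy can be sketched as follows. The function name fetch_with_backoff and the retry count and starting delay are illustrative assumptions, not values recommended by RCSB PDB. The opener and sleep parameters exist only so the behavior can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_retries=5, base_delay=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL, retrying on HTTP 429 with exponentially growing waits.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Any other HTTP error, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Double the waiting period after every rate-limited attempt.
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a client that keeps receiving 429 responses waits 1, 2, 4, 8 and 16 seconds before giving up. Adding a small random jitter to each delay is a common refinement when many clients share one IP address.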

License

RCSB PDB Web Services are available under the same terms and conditions as the RCSB PDB Web Portal (see usage policies).

Contact RCSB PDB with questions or suggestions for specific services.



Please report any encountered broken links to info@rcsb.org
Last updated: 10/25/2024