
RCSB PDB Help

Website FAQ

Frequently Asked Questions

This page contains a number of frequently asked questions and their answers. The questions are organized into several sections to match content on the Documentation Home Page:
FAQs About Structural Biology Data
FAQs About Search Tools
FAQs About Advanced Searching
FAQs About Visualize
FAQs About Explore
FAQs About Programmatic Access
FAQs About Miscellaneous Topics
If this page does not address your questions, please visit our contact us page and share your question with us.

FAQs About Structural Biology Data

This section of FAQs lists questions about the PDB and data available from the RCSB PDB. It is grouped into four sub-sections, with questions about the PDB contents; the relationship between PDB, wwPDB, and RCSB.org; accessing data from RCSB.org; and using these data for research and education.

Questions About PDB Contents

What is the PDB?

The PDB, or Protein Data Bank, is an archive of three-dimensional structural data of biological macromolecules and their various complexes with each other and with small-molecule ligands such as ions, cofactors, inhibitors, and drugs. Managed by members of wwPDB, a worldwide consortium, the PDB provides free access to structural data, tools, and resources to explore biological macromolecules in atomic detail. Learn more about PDB history and important milestones.

What is in the PDB?

The PDB includes 3D structure coordinates, relevant experimental data files, and various metadata about the structures. Learn more about the contents of the PDB.

What can you do with PDB data?

You can visualize the 3D structures of biomolecules and their assemblies using molecular visualization tools, e.g., Mol*. Visualizing 3D structures of biomolecules can shed light on their properties, interactions, and functions. You can also use these data for computational analysis, structure prediction, drug design, and education.

Why does knowing about biomolecular structures matter?

Visualizing the shapes and analyzing the interactions of biomolecular structures can provide insights into their functions in biological processes in health and disease. These insights can be used for hypothesis generation, integration of various types of data, and design of new features and properties (e.g., in drug design).

How do I learn more about the quality of a structure?

A series of FAQs about validation of 3D structural data is available from the wwPDB website.

Questions About the PDB, wwPDB, and RCSB.org

What is the relationship between PDB and wwPDB?

The Protein Data Bank (or PDB) is an archive of primarily experimentally determined 3D structures of biological macromolecules. It is managed by the worldwide PDB (wwPDB) partnership, which was established in 2003 to ensure joint management of the PDB archive as a global public good (Berman et al. 2003). The wwPDB was co-founded by the Research Collaboratory for Structural Bioinformatics PDB (RCSB PDB), the Protein Data Bank in Europe (PDBe), and Protein Data Bank Japan (PDBj). In 2023, two specialist data resources, the Electron Microscopy Data Bank (or EMDB) and the Biological Magnetic Resonance Data Bank (or BMRB), have also joined the wwPDB. The Protein Data Bank China (or PDBc) has recently joined the wwPDB as an associate member.

The goal of this collaboration is that the data served by all of these centers remains the same. However, each resource maintains a website with unique tools and features for visualizing and analyzing the data.

What is RCSB.org?

The research-focused web portal for RCSB PDB is referred to as RCSB.org. It provides tools that support query, browsing, visualization, analysis, comparison and mapping of annotations for all 3D structures of biomolecules available from this portal.

What is the Structure Summary page?

Each structure available from RCSB.org (experimentally determined or computed structure model) has a dedicated page that presents various types of information about it as a quick snapshot of the contents of the structure and related details. Learn more about the structure summary page.

How do I link to a Structure Summary page for a PDB ID?

The correct syntax for linking to a Structure Summary page for a current experimental structure by PDB ID on the site is as follows (example 4HHB):

/structure/4HHB
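
If you generate these links in a script, the path above can be assembled from a PDB ID. The following is a minimal sketch, not an official utility: the function name `structure_summary_url` is hypothetical, the site root `https://www.rcsb.org` is assumed, and upper-casing the ID is a convenience chosen to match the 4HHB example.

```python
RCSB_BASE_URL = "https://www.rcsb.org"  # assumed site root for the relative path


def structure_summary_url(pdb_id, base_url=RCSB_BASE_URL):
    """Build a Structure Summary link of the form <base>/structure/<PDB ID>."""
    cleaned = pdb_id.strip().upper()  # normalise to match the 4HHB example
    if not cleaned:
        raise ValueError("PDB ID must not be empty")
    return f"{base_url.rstrip('/')}/structure/{cleaned}"


print(structure_summary_url("4hhb"))
```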

Does the RCSB PDB link to data from any external data resources?

The RCSB PDB combines the primary data from PDB archival files with data from external resources to enhance the query and display functionality on the RCSB PDB website. A listing of such external data is provided in the table on the external resources page.

Questions About Accessing Data available from RCSB.org

What are the URLs to download files?

You can read all about options for downloading files here.

When downloading files, how do you use a gzipped (.gz) file?

A .gz file is a compressed file similar to a .zip. To open a .gz file, simply double click it to open it in your default archiving utility. If you do not have one, free programs like 7-Zip are available which will uncompress the file for you.
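
If you prefer to uncompress files in a script rather than with a desktop utility, Python's standard `gzip` module can do the same job. This is a small sketch; the helper name `gunzip` and the file name in the usage note are illustrative only.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file; by default write next to it without the .gz suffix."""
    src_path = Path(src)
    dest_path = Path(dest) if dest is not None else src_path.with_suffix("")
    # Stream the contents so large files are not loaded into memory at once.
    with gzip.open(src_path, "rb") as f_in, open(dest_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest_path
```

For example, `gunzip("4hhb.cif.gz")` would write `4hhb.cif` in the same directory, assuming such a file has already been downloaded.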

When are new PDB entries released?

The PDB archive is updated each week on or about Wednesday 00:00 UTC (Coordinated Universal Time) with new entries, modified entries, and updated status information.

Updates are prepared on the previous Friday. Citation updates and release requests should be sent to deposit@deposit.rcsb.org by noon ET on the preceding Thursday to be included in an update; changes made after an update has been packaged will appear with the following update.

The files in the PDB archive have the Friday timestamp of the internal update packaging.

From the RCSB PDB site, the most recent release is timestamped and linked on every page from the top right header.
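
If you schedule jobs around the weekly update, the nominal release time can be computed from the schedule described above. This sketch only encodes the "Wednesday 00:00 UTC" rule; the function name is hypothetical, and because updates happen "on or about" that time, the result is an estimate rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone

WEDNESDAY = 2  # datetime.weekday() counts Monday as 0


def next_nominal_release(now):
    """Return the first Wednesday 00:00 UTC strictly after `now` (timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (WEDNESDAY - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now_utc:
        # It is already Wednesday in UTC, so the next release is a week away.
        candidate += timedelta(days=7)
    return candidate


print(next_nominal_release(datetime.now(timezone.utc)))
```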

How can I maintain a local copy of all PDB files?

Users can maintain their own local copy of all PDB files using rsync. Example scripts are available in the section "Automated Download of Data" on the File Download Services page.

Questions About Using RCSB.org Data for Research, Training, and more

What software do I need to take full advantage of the RCSB PDB website?

Almost all features of the RCSB PDB web site require a modern web browser with JavaScript and cookies enabled. If you are experiencing difficulties, please try upgrading to the latest browser version. Here is a list of tested browsers that are supported, grouped by desktop operating systems:
Microsoft Windows:
Chrome latest version
Firefox latest version
Microsoft Edge latest version (Windows 10+)
Internet Explorer 11 (Windows 7 and 8) (limited support, may be slow, use Chrome or Firefox for best experience)
Apple Mac OSX:
Chrome latest version
Firefox latest version
Safari latest version
Linux (Ubuntu, Redhat, CentOS, etc.):
The web browser should be installed from your Linux distribution's package manager and needs to be a recent version, such as Mozilla Firefox version 32 or newer, or Chrome version 45 or newer
Mobile support
We currently offer limited support for browsing the site on a mobile device:
Android 5 or newer
iPhone 5 or newer

How do I cite a structure/the RCSB PDB?

Please see our Policies & References page.

How are the 3D structural data organized in the PDB?

Since biomolecules are hierarchical structures, they are archived and accessible from the PDB at different levels of hierarchy. Learn more about hierarchies of 3D structural data - including entities, entries, assemblies, and instances. See also a short video about this.

Why do some structures have two different chain identifiers?

Some structures carry two sets of chain IDs for each polymer or ligand chain: one assigned by the PDB during biocuration of the structure (label id) and one assigned by the author (auth id). The same rationale applies to residue numbers (i.e., a sequential number is assigned during biocuration and another is specified by the author, usually to match related structures or to follow a convention used in the field of study).

For example, in the PDB entry 1cbw, the amino acid F [auth G] Leu 18 [auth 33] is a Leucine residue in a chain labeled F, which the author called chain G, and the residue number is assigned 18 but the author refers to it as 33. Learn more about chain IDs and also more about this exception in chain ID assignment.

If you would like to find a polymer chain based on what is listed in the manuscript, you should use the author-assigned chain ID. However, if you are using any of the RCSB.org bioinformatics tools, you can use the label ID.
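
To make the two numbering schemes concrete, the sketch below splits a display string such as the 1cbw example into its label and author parts. The parser, its names, and the pattern are illustrative assumptions, not an RCSB PDB utility; in particular it assumes that when no `[auth ...]` part is shown the two identifiers coincide, and it does not handle author residue numbers with insertion codes.

```python
import re
from typing import NamedTuple


class ResidueLabel(NamedTuple):
    label_chain: str   # chain ID assigned during biocuration
    auth_chain: str    # chain ID assigned by the author
    residue_name: str
    label_seq: int     # sequential residue number from biocuration
    auth_seq: int      # residue number used by the author


# Matches text like "F [auth G] Leu 18 [auth 33]" or "A Leu 18".
_PATTERN = re.compile(
    r"^(?P<label_chain>\w+)(?: \[auth (?P<auth_chain>\w+)\])? "
    r"(?P<name>\w+) "
    r"(?P<label_seq>-?\d+)(?: \[auth (?P<auth_seq>-?\d+)\])?$"
)


def parse_residue_label(text):
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised residue label: {text!r}")
    label_chain = match["label_chain"]
    label_seq = int(match["label_seq"])
    # Assumption: a missing [auth ...] part means label and author values agree.
    auth_chain = match["auth_chain"] or label_chain
    auth_seq = int(match["auth_seq"]) if match["auth_seq"] else label_seq
    return ResidueLabel(label_chain, auth_chain, match["name"], label_seq, auth_seq)


print(parse_residue_label("F [auth G] Leu 18 [auth 33]"))
```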

FAQs About Search Tools

Several options are available to search the archive using the top search box on RCSB.org (also referred to as Basic search). Learn more about Basic search options.

How do I search for structures with a particular protein name?

You can start by typing the protein name in the top search box on RCSB.org (i.e., Basic search). Note that protein names may be composed of multiple words - e.g., Insulin receptor or Succinate semialdehyde dehydrogenase. To ensure that the query is designed for the full name, make sure that you select the option in Uniprot Name from the pulldown options of the autocomplete suggestions that appear when you type the protein name in the top search box. Learn more about Basic search and ways to specify the complete protein name.

How do I include/exclude computed structure models when I search?

By default, computed structure models (CSMs) are not included in the search. To include CSMs in the search results you need to turn on the Include CSM toggle switch (located on the right of the top search box). Learn more about Basic search options related to this.

Can I search for structures using a protein sequence?

Yes, you can paste the sequence of a protein of interest (in FASTA format) in the top search box. This will be recognized as a sequence based search. Learn more about Basic search options related to this.
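
The same kind of search can also be prepared programmatically. The sketch below only builds a request body and does not send it. The field names follow the RCSB PDB Search API as we understand it and should be checked against the current API documentation; the endpoint URL, the default cutoffs, and the helper names are assumptions made for illustration.

```python
import json

# Assumed Search API endpoint; the payload would be sent here as an HTTP POST.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def fasta_to_sequence(fasta):
    """Drop FASTA header lines (starting with '>') and join the sequence lines."""
    lines = [line.strip() for line in fasta.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def sequence_query(fasta, identity_cutoff=0.9, evalue_cutoff=1.0):
    """Build a protein sequence search request body (illustrative defaults)."""
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "sequence_type": "protein",
                "value": fasta_to_sequence(fasta),
                "identity_cutoff": identity_cutoff,
                "evalue_cutoff": evalue_cutoff,
            },
        },
        "return_type": "polymer_entity",
    }


example = ">example header\nMVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSF\n"
print(json.dumps(sequence_query(example), indent=2))
```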

Can I find all structures that match a specific UniProt identifier?

Yes, you can paste the UniProt accession ID in the top search box to launch the search. Learn more about Basic search options related to this. Alternatively, in the macromolecules section of the structure summary page (SSP), underneath the grey bar titled UniProt, you can click on the "Find proteins for UniProt ID" option to launch a search.

FAQs About Advanced Searching

There are several options for searching the PDB based on its specific properties (or attributes), sequence, structure, presence of ligands or chemical components, and more. These search options can be combined using Boolean operators (AND, OR, NOT) to create complex queries.
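
For readers who script their searches, the sketch below shows one way such Boolean combinations might be expressed as a Search API request body. It is an illustration under assumptions: the helper names are hypothetical, the attribute names and operators are taken from our reading of the Search API documentation and should be verified there, and NOT is modeled with a `negation` flag on a single condition.

```python
import json


def terminal(attribute, operator, value, negation=False):
    """One attribute condition; negation=True expresses NOT for that condition."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": operator,
            "value": value,
            "negation": negation,
        },
    }


def group(logical_operator, *nodes):
    """Combine conditions (or other groups) with AND or OR."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# (X-ray OR electron microscopy) AND NOT (resolution greater than 3.0)
query = group(
    "and",
    group(
        "or",
        terminal("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
        terminal("exptl.method", "exact_match", "ELECTRON MICROSCOPY"),
    ),
    terminal("rcsb_entry_info.resolution_combined", "greater", 3.0, negation=True),
)
payload = {"query": query, "return_type": "entry"}
print(json.dumps(payload, indent=2))
```

Nesting groups in this way mirrors how conditions are grouped in the advanced search query builder.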

Where can I learn more about Search Attributes?

A listing, explanations, and examples of Attributes or properties used for organizing and searching data in the PDB are available in the Attribute Details.

How can I find a structure with a specific mutation?

Find the structure of interest using either the protein name or UniProt identifier. Now open the group sequence page showing the sequences of all polymers in the PDB that match this UniProt ID. Scroll through the page to see mutations marked as pink bubbles on the purple sequence bar. Select the desired mutations from this display. For example see mutations in the hemoglobin alpha subunit protein.
You may also search by listing the specific mutation in the top search bar.

How can I find nucleic acid containing structures in the PDB?

In the advanced search query builder you can specify the Structure Attribute > Polymer Entity Type > DNA or RNA. Learn more about Attribute Search.
You may also search for all structures and then refine the search results by selecting DNA and RNA in the Polymer Entity Type options in the refinement options listed in the left hand column.

Can I search for small molecules/drugs present in the Chemical Component Dictionary (CCD)?

If you know the chemical component ID of the drug you can type that in the top search box and select "in Chemical ID" option in the autocomplete options presented.
If you know the full name of the drug, brand name, synonym, DrugBank ID etc. you can type that in the Chemical Attributes options (in the advanced search query builder).
If you wish to find the chemical components/drugs that match the query remember to change the results return type from Structures to Molecular Definitions and run the search.
Learn more about Attribute Search.

If you do not have the name or chemical component ID you can use the formula, descriptors or a drawing to find the drug molecule in the chemical component dictionary. Learn more about Chemical Similarity Search.

How can I find structures bound to a specific ligand or drug molecule?

If you have the chemical component ID, name, formula or other descriptors for the ligand/drug you can first find it as described above. Open the ligand summary page and click on the options for finding all structures in the PDB with that molecule in it. Alternatively, if you have found at least one structure with the ligand of interest, there is an option in the Small molecules section of the structure summary page where you can run the same query (i.e., Query on ). Learn more about Attribute Search.

How can I find all structures with a US FDA drug bound to them?

This search can be done as Learn more about this query and see other search examples.

Can I find all structures that match a list of PDB identifiers that I have identified from literature?

You can type the list of PDB IDs in the top search box on RCSB.org.
Alternatively, in the advanced search query builder select Structure Attributes > ID(s) and Keywords and type in the IDs of interest.
If you have the PubMed ID of the article describing the structures, you can search by PubMed ID or citation title too.
Learn more about Attribute Search.

How can I find membrane protein structures in the PDB?

In the advanced search query builder select Structure Attributes and start typing membrane in the options box to see various options for identifying membrane proteins. Learn more about Attribute Search.

Can I browse through structures representing different classifications (e.g., enzyme classes based on E.C. or Enzyme Commission numbers)?

There are several options for Browsing the structures available from RCSB.org. Learn more about Browsing options. See subsections to learn more about browsing by E.C. classification of enzymes.

How do I search for structures determined using a specific experimental method?

In the advanced search query builder you can type "method" in the options box for Structure Attributes. From the options shortlisted, select the appropriate one for your query. Learn more about Attribute Search.
Alternatively, you can refine the search results to retain only matches that were solved using a specific experimental method of interest - by clicking on the Experimental Method options in the refinement options listed in the left hand column.

How do I find structures that are within a specific resolution range?

In the advanced search query builder you can type "resolution" in the options box for Structure Attributes. From the options shortlisted, select the appropriate one for your query. Learn more about Attribute Search.
Alternatively, you can refine the search results to retain only matches that are within a specific resolution range - by clicking on the "Refinement Resolutions" options listed in the left hand column.
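
A resolution range can also be written as a Search API request body. The sketch below is illustrative only: the attribute name `rcsb_entry_info.resolution_combined` and the shape of the `range` value reflect our understanding of the Search API and should be confirmed in the Attribute Details, and the function name is hypothetical.

```python
def resolution_range_query(low, high):
    """Build a request body for entries with resolution between low and high (inclusive)."""
    if low > high:
        raise ValueError("low must not exceed high")
    return {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "rcsb_entry_info.resolution_combined",
                "operator": "range",
                "value": {
                    "from": low,
                    "to": high,
                    "include_lower": True,
                    "include_upper": True,
                },
            },
        },
        "return_type": "entry",
    }


print(resolution_range_query(1.0, 2.0))
```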

How do I find AlphaFold structures?

In the advanced search query builder you can type "Source" in the options box for Structure Attributes. From the options shortlisted, select the appropriate one under Computed Structure Models (i.e., Source Database) for your query. Remember to turn on the Include CSMs toggle switch before launching the search. Learn more about Attribute Search.
Alternatively, you can refine the search results to retain only matches that are listed as AlphaFold - by clicking on the "CSM Source Database" options listed in the left hand column.
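
In a script, the equivalent of turning on the Include CSMs toggle is, to our understanding, a request option that asks for computational results. The sketch below builds such a request body without sending it. The attribute name `rcsb_comp_model_provenance.source_db`, the value `AlphaFoldDB`, and the `results_content_type` option are assumptions based on our reading of the Search API documentation; please verify them there before relying on this.

```python
def csm_source_query(source_db="AlphaFoldDB"):
    """Build a request body for computed structure models from one source database."""
    return {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                # Assumed attribute behind the "Source Database" option.
                "attribute": "rcsb_comp_model_provenance.source_db",
                "operator": "exact_match",
                "value": source_db,
            },
        },
        # Assumed counterpart of the Include CSMs toggle: return computed models only.
        "request_options": {"results_content_type": ["computational"]},
        "return_type": "entry",
    }


print(csm_source_query())
```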

Can I find all structures derived from a specific organism?

Open the advanced search query builder and the structure attribute options. You can specify the scientific name of the organism of interest in the Structure attributes > Polymer Molecular Features > Scientific Name of the Source Organism. Learn more about Attribute Search.

FAQs About Visualize

The default molecular visualization tool for RCSB.org is Mol*. Learn more about Mol* and how to use it in the few questions listed here and a more extensive list of scenarios listed along with Mol* documentation pages.

How do I use Mol* to display the structure of the PDB entry listed in a publication?

Open the structure summary page for the entry. Learn more about the structure summary page.

Click on the Structure tab or on the hyperlinked word "Structure" at the bottom of the thumbnail image shown in the top left corner of the structure summary page. This opens the RCSB.org Mol* viewer with the structure displayed.

Can I use Mol* to visualize 3D structural coordinates that I have on my computer?

Yes, you can do so using the standalone Mol* tool available from RCSB.org. You can also access this link by clicking on the Visualize Menu in the top blue bar (available at the top of all RCSB.org pages) and selecting "Mol* (MolStar)".

How do I change the representation of specific polymers, domains, residues and ligands in a structure?

There are several options for selecting all or parts of a structure in Mol* to change representations, color, or hide. Learn more about making selections and changing representations. Learn details about making selections.

How do I use Mol* to create images that look like those featured in RCSB PDB Molecule of the Month articles?

You can select the Miscellaneous > Illustrative option in the Components Panel Preset options for this. Learn more about Components Panel options.

Are there worked out examples of specific Mol* scenarios?

You can access many more Mol* specific FAQs and scenarios.

FAQs About Explore

The data available from RCSB.org can be explored to learn more about the structure, its symmetry, ligands bound to it, annotations gathered from other trusted data resources, and based on related structures available from RCSB.org.

What is in the Structure Summary page?

The structure summary page presents information about the specific 3D structure. Primary data include information about structural coordinates, sequences of biological macromolecules, information about any small molecules/ligands present in the structure, details about structure determination method(s), authors, and publication information.
Secondary data include information related to one or more components that is integrated from other data resources, mapped onto the 3D structure(s), and made available at RCSB.org - e.g., functional and mutational information about macromolecule(s) from UniProt. Learn more about Structure Summary pages.

What is the ligand of interest (or LOI)?

Ligands, such as cofactors, inhibitors, and ions, are molecules that are found bound to structures at structurally and/or functionally interesting locations. Learn more about Ligands of interest in PDB structures.

What is in the Ligand (or BIRD) Summary page?

All ligands and Biologically Interesting Molecule Reference Dictionary (BIRD) molecules included in the chemical component dictionary maintained by the wwPDB are assigned a 3- or 5-character identifier. Each has a summary page that displays its 2D and 3D structures, chemical formula, and other details, along with links to other data resources (e.g., DrugBank, PubChem, etc.) with information about the molecule. The page also presents a way to quickly search for all structures in the archive that include this component (ligand or BIRD molecule). See an example of the ligand summary page for ATP.

What is in the Group Summary page?

The Group Summary Pages (GSPs) provide overviews of key features, properties, sequence alignments, and annotations of any predetermined or custom group of structures. You can use the charts and sequence/structure comparison options presented here to learn about trends in conformations of a protein in a range of contexts (presence of a binding partner, presence of mutations etc.). Learn more about Group Summary pages.

What can I learn from the sequence annotation viewer?

Sequence Annotations Viewer provides graphical summaries of PDB protein biological and structural features and their relationships with UniProtKB entries. Learn more about the Sequence Annotations view.

What can I learn from the Genome viewer?

Genome View provides graphical summaries of the correspondences between PDB entity sequences and genomes. Learn more about Genome View.

FAQs About Programmatic Access

There are several ways in which you can programmatically interact with the PDB and related data available from RCSB.org - e.g., APIs, scripts, and more.

Where can I find scripts for Batch Downloads?

You can find Batch download scripts along with other file download options.

Does the RCSB PDB have APIs for accessing data programmatically?

Yes, learn more about web services and APIs. Stay up-to-date with API developments by viewing (or subscribing to) the RCSB PDB API announcements Google group.

Are there any tutorials for RCSB PDB APIs?

Learn how to use the APIs by exploring the tutorials - e.g., Data API Tutorial, Search API Tutorial, sequence alignment and positional features API tutorial, Alignment API Tutorial.

Are there any limits on the number of requests that can be made using the APIs?

Yes. The RCSB PDB APIs implement rate limiting to ensure fair usage, so we recommend starting with a handful of requests per second. If a request is rejected because the limit has been exceeded, you can retry your query after a short waiting period. Learn more about API search limits.
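
One common way to "retry after a short waiting period" is exponential backoff. The sketch below is a generic pattern, not an RCSB PDB client: `send` stands for any function of yours that performs one request and returns a `(status_code, body)` pair, and treating HTTP status 429 as the rate-limit signal is an assumption to confirm against the API documentation.

```python
import time

RATE_LIMITED = 429  # assumed status code for "too many requests"


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it is not rate-limited or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... seconds between attempts.
    Returns the last (status_code, body) pair received.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != RATE_LIMITED:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; in normal use the default `time.sleep` applies.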

FAQs About Miscellaneous Topics

These are a few questions that are not directly related to any of the sections above but may be of interest to RCSB.org users.

Is there a discussion forum for the RCSB PDB website?

Yes, pdb-l@lists.wwpdb.org is an open electronic mailing list for questions and discussions with the PDB user community about protein structure analysis and related topics. You can subscribe at https://lists.wwpdb.org/list/pdb-l.lists.wwpdb.org. An archive of this list can be found at https://lists.wwpdb.org/empathy/list/pdb-l.lists.wwpdb.org.

What is PDB-101?

PDB-101 is a view of the RCSB PDB that places educational materials front and center. It packages together the resources of interest to teachers, students, and the general public to promote exploration in the world of proteins and nucleic acids.



Please report any broken links you encounter to info@rcsb.org
Last updated: 11/5/2024