A Tour of the PSI-Nature Structural Biology Knowledgebase

The PSI-Nature Structural Biology Knowledgebase (PSI SBKB) is designed to turn the products of the Protein Structure Initiative into knowledge that is important for understanding living systems and disease. This "one-stop shop" provides users with the available genetic, structural, functional and experimental information about a particular protein of interest.

This walkthrough will introduce you to the features and search capabilities of the PSI SBKB.

Navigating the PSI SBKB

The PSI SBKB homepage makes many features available from one central place.

Features available on the PSI SBKB homepage

The central search box is the main entry point for finding information about a protein. You can search by protein or nucleotide sequence, by PDB ID (the Protein Data Bank identifier of an atomic 3D coordinate file), or by text. These search options are described in the second half of this tutorial.

Central view: the Structural Biology Update, a view of this month's research highlights.

Left Navigation Menu

Provides access to our scientific resources, the Structural Biology Update content, and information about the Protein Structure Initiative.

Right Navigation Menu

E-alerts: subscribe to Nature's email alert service, or use the RSS feeds that list the monthly content and weekly structure updates.

Functional Sleuth: shows all of the PSI structures solved by the large-scale centers that do not yet have functional annotation information.

Propose Targets: a way for groups outside the PSI to submit targets and benefit from the PSI's high-throughput structure determination pipeline.

Latest PSI Statistics: a count of the protein structures solved by the PSI efforts.

See Latest Structures: a list of the PSI structures released for public use, updated weekly (also available as an RSS feed).

A closer look at features to browse

Functional Sleuth

In Functional Sleuth, we present structures determined by the PSI efforts whose functions are still unknown. Clicking on a structure in the gallery runs a query of the Knowledgebase, providing a starting point for exploring that structure.

Propose Targets

The PSI Centers have developed high-throughput protein production and structure determination pipelines, as well as new experimental methods, to overcome a number of experimental bottlenecks. The wider scientific community is invited to benefit from these efforts as well. Each PSI Center accepts target nominations for structure determination, which are vetted for feasibility and consistency with the overall PSI goals. Investigators can create an account and submit their target proposals using this feature on the PSI SBKB; a decision is usually received within one month. Proposals accepted for structure determination must adhere to the PSI rules, most notably that structural data must be deposited in the public database, the PDB, within 4 weeks of completion of the structure.

See Latest Structures

As a rule, the PSI centers must deposit their protein structures in the Protein Data Bank within 4 weeks of completion so that these structures quickly reach the biological communities that use them for clinical and basic studies. The "See Latest Structures" feature shows all structures released that week. This list is also available as an RSS feed.
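A feed reader handles the weekly list automatically, but a feed can also be read programmatically. The following is a minimal Python sketch, assuming the feed follows the standard RSS 2.0 layout (a channel of item elements, each with a title and a link). The feed URL is not given in this tutorial, so the sample document below is a placeholder, not real SBKB data, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder RSS 2.0 document. A real feed would be downloaded from the
# SBKB feed URL, which this tutorial does not give.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Placeholder: weekly PSI structures</title>
    <item><title>Placeholder structure 1</title><link>http://example.org/1</link></item>
    <item><title>Placeholder structure 2</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def list_feed_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in list_feed_items(SAMPLE_FEED):
        print(title, link)
```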

The Structural Biology Update page

The Structural Biology Update keeps readers current on advances made by the PSI and in the fields of structural biology and structural genomics. It delivers editorials describing the latest research findings and technical highlights, an Events Calendar, recent articles from Nature News, and a Research Library categorized by experimental topic.

The Search box is also available on the right side of the page, so users can search for anything of interest they come across while browsing the articles and pages on the PSI SBKB.

Features available on the Structural Biology Update site

Research Advances

Each month, the Nature Publishing Group writes editorials about recent protein structures and new techniques and methods, focusing on topics of broad scientific interest. Research highlights published in various Nature journals that relate to proteins are also shared here.

Featured Molecule

Each month, the Featured PSI Molecule gives a detailed portrait of a biological molecule solved by the PSI efforts. Using interactive illustrations and accessible explanations, each article describes the features of these biologically significant targets for students of all ages.

Research Library

The Research Library is a catalog of all PSI publications to date, in addition to recent structural results and technological advances from the broader structural biology community. Updated monthly, this specialized resource is organized by subject (see below) so that users can find papers describing solutions to problems along the protein pipeline, from new DNA vectors for protein expression, to novel NMR or X-ray structure determination methods, to new protein function prediction resources.


To present a broader view of the latest science, we provide news from the Protein Structure Initiative, Nature News and other NPG publications. The monthly newsletter, "PSI in the Spotlight", contains PSI and NIGMS news items such as funding announcements, press releases, new or revised policies, and conference reports.

Calendar of Events

An events calendar keeps the community informed about upcoming conferences, events, and workshops that promote a structural view of biology. We also invite you to tell us about events you would like posted here; write to us at comments@sbkb.org.


Further Information about the Protein Structure Initiative

The PSI SBKB also contains site help and information regarding the PSI program, its mission, and its policies, found in the left navigation menu in the "About" links.

The "About this site" menu contains information about the SBKB: a "getting started" tutorial and classroom exercises made by OpenHelix and the SBKB group, contact information, a site map, terms of use, and references in case you wish to cite the SBKB.

The "About PSI" menu has information on the PSI's overall mission and goals and its biomedical themes. It also lists active funding opportunities, either to become part of the PSI efforts or to collaborate with current consortia, with links to the NIH/NIGMS announcements and notices.

The PSI centers link gives information about each PSI center and its research projects.


Searching the PSI SBKB

The PSI-Nature SBKB can be searched by protein sequence (one-letter code), nucleotide sequence, plain text, and Protein Data Bank identifier (PDB ID). The following section describes how to use these search options.

The PSI SBKB consists of a main searchable database linked with modules (PSI resources) that provide additional information about the query terms.

Searchable by sequence and PDB ID:

Experimental Data Tracking databases: TargetDB and PepcDB
Structures from the PDB
Annotations from external biological resources
Protein Model Portal - homology models
Materials Repository - DNA clones

Searchable by text:

Technology Portal - a repository of technical reports and methods provided by the PSI centers, searchable by center and by experimental step.
Publications Portal - a list of all articles published by the PSI centers.
PSI Centers - text from within the PSI centers' web sites

Next, we will discuss searching these features in detail.

Searching by Sequence or PDB ID

The PSI-Nature SBKB maintains a database of the sequences of PSI protein targets and the sequences of all solved protein structures released by the Protein Data Bank. Sequence searches are performed with the BLASTP program, using an E-value cutoff of 10 for sequences of 50 amino acids (150 nucleotides) or fewer, and an E-value cutoff of 0.001 for sequences of 51 amino acids or longer. To search for a particular protein sequence, enter the one-letter amino acid sequence in the search form, select the by Sequence radio button and press Search. Nucleotide sequence searches are also supported: the BLASTX program translates the query in its possible reading frames and displays closely matching protein sequences.
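The length-dependent choice of program and E-value cutoff described above can be expressed in a few lines. This is a minimal Python sketch, not the SBKB's own code: it only selects the settings and does not run BLAST, the class and function names are hypothetical, and converting a nucleotide query's length to residues by integer division by 3 is an assumption based on the 50 amino acids / 150 nucleotides equivalence stated above.

```python
from dataclasses import dataclass

# Threshold and cutoffs as stated in the tutorial text.
SHORT_QUERY_MAX_AA = 50
SHORT_QUERY_EVALUE = 10.0
LONG_QUERY_EVALUE = 0.001


@dataclass
class BlastSettings:
    program: str   # "blastp" for protein queries, "blastx" for nucleotide queries
    evalue: float  # E-value cutoff chosen from the query length


def choose_blast_settings(sequence: str, is_nucleotide: bool) -> BlastSettings:
    """Pick the BLAST program and E-value cutoff for a query (hypothetical helper)."""
    seq = "".join(sequence.split()).upper()  # drop spaces and line breaks
    # Assumption: express nucleotide length in residues (3 nucleotides per codon),
    # so 150 nucleotides maps onto the 50-residue threshold.
    length_aa = len(seq) // 3 if is_nucleotide else len(seq)
    program = "blastx" if is_nucleotide else "blastp"
    if length_aa <= SHORT_QUERY_MAX_AA:
        evalue = SHORT_QUERY_EVALUE
    else:
        evalue = LONG_QUERY_EVALUE
    return BlastSettings(program=program, evalue=evalue)


if __name__ == "__main__":
    print(choose_blast_settings("MKTAYIAKQR", is_nucleotide=False))
```

The relaxed cutoff for short queries reflects the fact that a short alignment cannot reach a very small E-value even when it is a genuine match.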

An example query is available by selecting the by Sequence radio button, pressing "example query", and then pressing the Search button. These options are highlighted in the figure below.


The PSI-Nature SBKB maintains a database of the identifier codes for all experimental structure entries released by the Protein Data Bank. To search for a particular Protein Data Bank entry, enter the structure's 4-character ID code in the search form, select the by PDB id radio button and press Search. An example query (2BEI) is available on the site to explore these features.
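Before running a PDB ID search, it can help to check that the code has the expected shape. A classic PDB ID is four characters: a digit from 1 to 9 followed by three letters or digits, as in 2BEI. The Python sketch below (the helper name is hypothetical) checks only that format; it does not confirm that the entry exists, and it does not handle any longer identifier formats.

```python
import re
from typing import Optional

# Classic four-character PDB ID: a digit 1-9, then three letters or digits.
CLASSIC_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def normalize_pdb_id(text: str) -> Optional[str]:
    """Return the upper-cased ID if `text` looks like a classic PDB ID, else None."""
    candidate = text.strip()
    if CLASSIC_PDB_ID.fullmatch(candidate):
        return candidate.upper()
    return None


if __name__ == "__main__":
    for query in ["2BEI", "2bei", "membrane", "0ABC"]:
        print(query, "->", normalize_pdb_id(query))
```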

Results of a Sequence or PDB ID Search

The results of sequence and PDB ID searches are first displayed as a summary of available records relating to the input query. An example of a Results Summary is shown below.


To view query result details individually, select the DB REPORT tab at the top of the summary page. From there, you can choose the type of information you seek:

1. Structures - displays a list of experimental structures within the PDB. The structures tab will also show all genetic, structural, and functional annotations attributed to a structure through a "notebook" view (described later)

2. Models - supplied by the Protein Model Portal (http://www.proteinmodelportal.org), displays computational models related to the sequence

3. Targets - supplied by the experimental data tracking (EDT) database, TargetDB, (http://targetdb.pdb.org), displays information on the experimental progress and status of targets selected for structure determination. Target sequences will also have annotations, even in the absence of a 3D structure.

4. Protocols - supplied by the EDT database, PepcDB, (http://pepcdb.pdb.org), displays status history, stop conditions, reusable text protocols and contact information collected from the NIH PSI and other structural genomics centers.

5. Materials - supplied by the PSI Materials Repository, (http://psimr.asu.edu/) displays DNA clones available for purchase.


The Structures Tab

The Structures tab of the DB Report provides the essential details about any structures matching the input query. For a sequence search, each matching structure entry also shows its percent sequence identity with the input sequence (I), that is, the percentage of aligned positions with identical residues, together with the E-value (E).
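Percent identity depends on how an alignment is scored, and tools differ in the denominator they use. As an illustration only, the Python sketch below counts identical positions in a pairwise alignment and divides by the full alignment length, gap columns included. This is one common convention, not necessarily the one the SBKB uses, and the function name is hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str, gap: str = "-") -> float:
    """Percent of alignment columns where both rows have the same residue.

    Both inputs must be rows of the same pairwise alignment (equal length,
    gaps written as `gap`). The denominator is the full alignment length,
    gap columns included.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != gap  # a gap never counts as an identity
    )
    return 100.0 * identical / len(aligned_query)


if __name__ == "__main__":
    print(percent_identity("MKT-AYIA", "MKTQAFIA"))
```

For example, the aligned pair MKT-AYIA / MKTQAFIA shares 6 identical residues over 8 columns, giving 75.0.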

The Structures section presents:

  • a link to the RCSB PDB Structure Explorer Page,
  • a download option for the PDB format structure data file,
  • a thumbnail of the structure, which when clicked, will launch the interactive FirstGlance molecular viewer application, and
  • a "post-it" with a list of possible annotation types, which when clicked, launches a rich "notebook" view of all annotations connected to this structure (described later).

Other reference information includes:

  • PubMed and DOI for the primary citation (when available),
  • Title of the deposited structure (may not be the same as the related publication),
  • Authors
  • Structure entry deposition and release dates, and
  • Experimental method used to obtain the model.

If the structure was solved by a PSI project, this is indicated along with the associated PSI Target identifier. A glossary of terms in the upper right-hand corner defines these headings; each tab has its own glossary.


To view the other reports, click on their tab headings (Models, Protocols, etc.)

The Annotations Notebook

Each protein target and protein structure has many biological descriptions, or annotations, attached to it. The SBKB assembles annotations from over 150 PSI and other genomic, structural, functional, and evolutionary resources to provide you with most of the information available today about a protein sequence. These annotations are organized into a "notebook" and classified by scientific topic: gene-level view, protein-level view, structural view, biological functions, cellular localization, biochemical pathways, medicinal relationships, and references.

First, you can quickly get a sense of how many annotations exist through the "quick table". When you hover the mouse over a hyperlinked chain ID, a quick table appears showing whether annotations exist in ~35 popular resources. Every database that contains an annotation is highlighted in green, and clicking on the resource name takes you directly to that record in the main "notebook" view.


The full list of annotations is available in the Notebook view. In the figure of a typical protein-level annotation notebook page below, links are provided to the databases UniProtKB (comprehensive protein database), Pfam (a protein family and motifs database), InterPro (protein family assignment), and Gene3D (predictive structural annotation).

From this view the user can see what annotation databases have data relating to the sequence, and can go directly to the record by following the link.

The Glossary of Terms, available in the top-right corner, defines these headings; in this case, the glossary describes what kind of information each linked database provides.

The Models Tab

Computational models associated with a query sequence or structure are shown in this section.

For a sequence query, the number of models predicted for this sequence is presented, along with a link to the details for each model. For a PDB ID query, the tab shows the number of computational models based on information from this experimental structure.

All of these results are obtained by a remote query to the Protein Model Portal, which collects and maintains this information. In the example below, there are 4 models available from three modeling databases. To explore them, follow the "view" link to the Protein Model Portal.

Example, using the same sequence search as before:

Step 1: Once you see the results of your search, follow the "view" link.


Step 2: Explore the available pre-computed models. The page includes a graphical overview of how the similar sequences, structures, and models relate to each other, with domain information shown in gray. It also lists the UniProt protein IDs related to the sequence and, finally, the models themselves, each with a small traffic-light icon indicating model reliability.

Part of a full Model report from the Protein Model Portal is as follows:

The Sequence Summary:
red: your query
blue: the model you are viewing.
this model consists of residues 27-357 of your query sequence.

Domain Annotation:

Reports which protein domains are recognized in your query sequence, with a link to InterPro for further information. In this example, the model is of the GDPD domain of the protein.

Structural Model:

The computational model is presented, with information about how it was created. You can also view the model interactively and download its coordinates for further evaluation.

Model Quality:

Protein structure models are computational predictions which may contain errors. Based on the sequence identity to the template, a model is assigned to one of three categories of modeling complexity (see PMP for more details).

Target-Template Alignment:

The target-template alignments provided on the model info pages are generated dynamically by structurally superposing the model and template structures with the program MAMMOTH.


The Targets Tab

Information about matching protein targets is shown in the Targets tab of the DB report.

This tab gives a summary of the work already performed on the target. Information in this summary includes:

  • the TargetID, with a link to the record in TargetDB
  • the protein sequence alignment between your query sequence and similar sequences found in the database
  • reported target status
  • source organism
  • and PSI Target Category

The annotations "post-it", quick table, and notebook views described in the Structures section are also available, as well as a Glossary of Terms in the top right corner that defines these headings.

You can read the full record by clicking on the TargetID in the report (e.g., GO.74365).


The full Targets report from TargetDB is as follows:

General information, such as when the latest update occurred, the responsible center, status information, source organism and target sequence.

If the target's experimental structure was successfully determined, a link to the RCSB PDB Structure Explorer page is also given.

Links to domain annotation and function prediction databases are provided, along with calculated biochemical and biophysical parameters for the sequence.


The Protocols Tab

The Protocols section provides links to the Protein Expression Purification and Crystallization Database (PepcDB).

The information provided in this tab expands upon the information listed in the Targets tab by providing links to the experimental protocols. Information in this summary includes:

  • the TargetID, with a link to the record in PepcDB
  • the protein sequence alignment between your query sequence and similar sequences found in the database
  • links to the protocols used at each step of protein production and structure determination

Each experimental step is a link to a detailed protocol used by the structure determination center. These protocols can suggest an experimental strategy that shortens the time needed to obtain protein samples for further research.


A Glossary of Terms is available in the top right corner that defines these headings.

You can read the full report by clicking on the TargetID, or read individual protocols used during the production of this protein by clicking on an experimental step (e.g., expression).

The full Protocols report from PepcDB is as follows:


General information, such as the TargetID, responsible center, and UniProt entry name.

Other useful information includes the CloneID and a link to purchase the target DNA clone, available through the PSI Materials Repository.

Then, it provides derived protein information that may elucidate structure and function, as in the Targets tab.

The novel feature is the experimental summary of this target - number of trials attempted, how far the trial progressed (and if work was stopped), as well as the protocols used during the protein production process.

Since the search query can begin from a protein sequence of interest, this database will show which protocols were successful (or unsuccessful) on similar sequences.

In this way, PepcDB can be used as a tool for experimental design.
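The idea of using PepcDB trial histories for experimental design can be illustrated with a simple tally: for the targets similar to your query, count how often each protocol was attempted and how often it succeeded. The Python sketch below assumes a hypothetical, much simplified trial record (target ID, protocol name, success flag); PepcDB's actual reports are richer, and the record type and function names are illustrative only.

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """Hypothetical, simplified trial record (not PepcDB's real schema)."""
    target_id: str
    protocol: str     # name of the protocol used at this step
    succeeded: bool   # whether the trial passed this step


def tally_protocol_outcomes(trials: Iterable[Trial]) -> dict[str, tuple[int, int]]:
    """Return {protocol: (successes, attempts)} over the given trials."""
    attempts = Counter()
    successes = Counter()
    for trial in trials:
        attempts[trial.protocol] += 1
        if trial.succeeded:
            successes[trial.protocol] += 1
    return {name: (successes[name], count) for name, count in attempts.items()}
```

The function returns counts rather than ratios on purpose: a protocol with one successful trial would otherwise look as reliable as one with fifty.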


The Materials Tab

The Materials tab provides information about the availability of relevant target DNA clones at the PSI Materials Repository (PSI MR). The PSI MR provides an online searchable database of archived PSI genetic materials; quality-controlled transfer, storage, and maintenance of PSI plasmids at centralized on-site and off-site locations; and facilities to distribute PSI plasmids and supporting information for research purposes within the U.S. and abroad.

From our initial search example, the PSI MR has 7 similar target clones available to order.

The information provided in this tab includes:

  • the TargetID, with a link to the record in TargetDB
  • A link to order the clone
  • A link to a detailed record about the target's DNA sequence (DNA insert).
  • A link to information about the DNA vector in which the target sequence resides.

Selecting one of the last three links will transfer you to the PSI-MR DNASU web site (http://psimr.asu.edu).

To see further information about this DNA clone and the vector, including antibiotic resistance for positive selection, click on the Clone Details link. An example of a record is shown below.


Searching the PSI SBKB using plain text

The PSI-Nature SBKB maintains a 'plain text' index of all content in web pages and documents on the PSI Center web sites, the PSI Technology and Publications Portals, and the Annotations Module.
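The tutorial does not describe how this index is built. A common way to implement plain-text search is an inverted index that maps each word to the documents containing it; the Python sketch below shows that generic technique, not the SBKB's implementation. It matches exact lower-cased words only (no stemming, so "protein" does not match "proteins") and does no ranking; the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not mutated
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    # Placeholder pages for illustration only.
    pages = {
        "page-1": "Membrane protein expression protocol",
        "page-2": "Crystallization screens for soluble proteins",
        "page-3": "NMR methods for membrane proteins",
    }
    print(sorted(search(build_index(pages), "membrane")))
```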

To search the PSI-Nature SBKB by plain text, enter the appropriate words in the search form, select the by Text radio button and press Search. An example query (the word "membrane") is available by selecting the "by plain text" radio button, selecting the example query link, and pressing the Search button.

The results of the text search are presented as a list of pages containing the input search term (e.g. membrane), as shown below.

In the Site Search, all instances of "membrane" on the PSI centers' web sites are found, including 6 highlights written for the SBKB that discuss membranes and membrane proteins.

Clicking on the Structural Publication tab will show all structural articles that contain the query term; in this case, all structural publications that contain the term membrane.

These records include links to protein structures that contain the search term as well. The PubMed identifier, DOI number, and PubMed Central links to the article are provided when available, and by selecting the "Read More" link, the full citation and abstract of the article will appear.

Clicking on the Methods tab will show all PSI-published articles and reports containing the search term that focus on methodology. By selecting the "Read More" link, the full citation will be shown. In this way, you can search for new methods developed by the PSI efforts to help your own research.

Lastly, explore the site on your own.

This tutorial has walked through the features you can use in your own research. With this "one-stop shop", you can find many kinds of assistance, from structural and annotation information about your protein to reports and protocols about how to obtain it.

If you have any questions or comments, or would like to suggest future features for the PSI SBKB, please contact us at comments@sbkb.org.