Searching  Pathway Database 

How to Use a Pathway Tools Website

Contents

    1  Overview

    2  Selecting the Database to Search

    3  Searching Pathway/Genome Databases
        3.1  Quick Search
        3.2  Search Menu: Object Searches
        3.3  Search Menu → Compounds
        3.4  Search Menu → Genes/Proteins/RNAs
        3.5  Search Menu → Reactions
        3.6  Search Menu → Pathways
        3.7  Search Menu → Advanced Search
        3.8  Ontology Searches
        3.9  Search Menu → Google This Site
        3.10  Search Menu → BLAST
        3.11  Search Menu → Search Full-text Articles

    4  Cellular Overview
        4.1  Organization of the Cellular Overview
        4.2  Getting Started
        4.3  Summary of the Mouse Commands
        4.4  Summary of the General Commands
        4.5  Organism Selection
        4.6  Searching and Highlighting
        4.7  Omics Viewer (Overlay Experimental Data)
        4.8  Getting Started
        4.9  Omics Dataset File Format
        4.10  Examples
        4.11  Color Scale
        4.12  Omics Viewer Results
        4.13  Submitting Highlight Operations via a HTTP Get (URL)
        4.14  Submitting Expression Data Via an HTTP GET.
        4.15  Submitting Expression Data Via an HTTP POST
        4.16  Submitting Expression Data Via an HTTP GET or POST for Display on Individual Pathways

    5  Regulatory Overview
        5.1   Summary of the Mouse Commands
        5.2  Organism Selection
        5.3  Layout Selection
        5.4  Highlighting Genes and Regulatory Relationship Arrows
        5.5  Redisplay Highlighted Genes Only
        5.6  Omics Viewer for Regulatory Overview

    6  Comparative Analysis

    7  BLAST Search

    8  Web Accounts

    9  Web Services
        9.1  BioPAX XML Services
        9.2  Pathway Tools XML Services

    10  How to Learn More

1  Overview

This document describes how to use Web sites based on the Pathway Tools software from SRI International. Since multiple Web sites such as BioCyc, YeastCyc, AraCyc, and MouseCyc are all based on the same underlying software, the same usage instructions apply to all. (Note that differences in configuration and in software version may introduce some variability among sites.)

2  Selecting the Database to Search

Unless otherwise indicated, all Pathway/Genome Database searches are restricted to a single database. In most cases, a database describes a single organism – although a small number of multi-organism Pathway/Genome Databases exist (examples include MetaCyc and PlantCyc). The database against which searches will be conducted is indicated below the Quick Search box in the page banner, and at the bottom of the Search and Tools pull-down menus.

To search a different database, click on the ‘change’ link (found below the Quick Search box, and at the bottom of the Search and Tools menus). In the dialog that pops up, you can either search for the organism of interest in the scrollable list, or you can start typing in its name.

When a large number of databases is available, the alphabetical index to the left of the database list provides a convenient shortcut for scrolling to a desired part of the alphabet. If you start typing an organism name, the full list of databases will be replaced by a list of databases matching the string you typed— you can use the mouse or the up/down arrows on your keyboard to select the desired database. Lists of your recently used databases and the site’s most popular databases provide shortcuts for selecting those databases.

If the site supports user accounts, and you are logged in, you may select one database as your preferred database. This database will be your default selection when starting a new web session.

Once you have selected the desired database, click OK to exit the dialog. The page will reload, and the text under the Quick Search box should now indicate the newly selected database. Note that if you are looking at a page that contains data from a particular organism, selecting a new database will not affect the contents of the current page – the new selection will apply only to your future searches.

3  Searching Pathway/Genome Databases

3.1  Quick Search

The Quick Search box in the upper right hand corner of every page is useful if you know the name (or part of the name) or database identifier of the object you are searching for. You may use this box to search for genes, proteins, compounds, RNAs, reactions, pathways, operons, and GO terms. If the query string matches a single object, the page for that object will be displayed immediately. If there are multiple matches, the full list of matches will be shown, organized by the type of object (e.g. gene, protein, etc.).

Some examples of what can be entered into the Quick Search box include:

A few additional rules govern searches:

3.2  Search Menu: Object Searches

The Search menu contains links to specialized search pages for Compounds, Genes/Proteins/RNAs, Reactions and Pathways. Each such page contains options for searching using a number of different criteria, either individually or in combination. When the page is initially loaded, only the name searches are active, but by clicking on the different search bars, you can enable or disable additional search criteria. If multiple search criteria are specified for a given search, then unless otherwise specified the results must satisfy all of them (that is, an AND connector is used to combine the different criteria).

The result of every object search is a table containing the names of all objects that satisfy the search, with hyperlinks to their corresponding data pages, along with any additional columns relevant to the particular search. The table is initially sorted alphabetically by name, but small triangles in the column headers allow the user to sort by any column, in either ascending or descending order.

The sections below describe the different search criteria that are available for each object type.

3.3  Search Menu → Compounds

3.4  Search Menu → Genes/Proteins/RNAs

3.5  Search Menu → Reactions

3.6  Search Menu → Pathways

3.7  Search Menu → Advanced Search

The Advanced Search tool facilitates generation of queries that are more complex than those supported by the object search tools described above. Using the Advanced Search tool, you can write queries that combine data from multiple organisms or multiple types of objects, and you can search fields that are not supported by the individual object search pages. Detailed instructions for using the Advanced Search tool to construct complex queries are available here.

3.8  Ontology Searches

An ontology is a carefully constructed vocabulary of terms, often called a controlled vocabulary. The terms are organized into a classification hierarchy (also called a taxonomy). Ontologies can be used to browse and search for objects by drilling down from more general categories to more specific ones. Each Pathway/Genome Database contains several ontologies. Those that can be searched are available from the Ontologies sub-menu in the Search menu. These ontologies can also be accessed from the object search page for their particular object type. The browseable ontologies are:

3.9  Search Menu → Google This Site

The Search Menu → Google This Site command uses Google to perform a full text search over this entire Web site. Searches will not be restricted to the selected database, and can locate text strings found in page comments, help pages, and other page content not queryable by other means. Submitting this form will direct the user outside this Web site to a page generated by Google. A Google full text search is also offered as an option when a Quick Search fails to return any result (or does not return the desired result).

3.10  Search Menu → BLAST

This facility (not available for MetaCyc) allows you to perform sequence-similarity searches using the BLAST program to compare your protein or nucleic acid sequence against the complete genome of the selected organism database.

3.11  Search Menu → Search Full-text Articles

Textpresso is a package for indexing and searching a corpus of biological literature. Textpresso searches over a large Escherichia coli literature corpus are available only at the BioCyc Web site, and only when EcoCyc is the selected database.

4  Cellular Overview

The Cellular Overview diagram depicts the biochemical machinery of an organism as described in a PGDB. Each node in the diagram (such as the small circles and triangles) represents a single metabolite, and each blue line represents a single bioreaction. This page describes the organization of the Cellular Overview and the operations users can perform to interrogate it. Different PGDBs will have different components of the diagram present or absent depending on what was included by the PGDB authors.

Note: The Cellular Overview has been tested on Internet Explorer 8.0, Firefox 3.5, Safari 4.0 and Chrome 2.0. Using Internet Explorer for the Cellular Overview is not recommended, since its performance can be very poor. The other three browsers perform much better.

Note: The desktop version of Pathway Tools that you can install locally provides different and additional operations on the Web Overview. Click here for more details.

4.1  Organization of the Cellular Overview

Within the cytoplasmic membrane, the small-molecule metabolism of the organism is depicted in several regions. The glycolysis and the TCA cycle pathways, if present, will be placed in the middle of the diagram to separate predominately catabolic pathways on the right from pathways of anabolism and intermediary metabolism on the left. The existence of anaplerotic pathways prevents rigid classification. The majority of pathways operate in the downward direction. Signal transduction pathways, if present, run along the bottom of the diagram. Pathways are grouped into related clusters as indicated by the shaded regions.

The large group of individual reactions at the right of the diagram represents reactions of small-molecule metabolism that have not been assigned to any pathway.

The shapes of the metabolite icons represent various compound classes. The different shapes used are as follows:

One or more cellular membranes of the organism are depicted, depending on the cellular architecture of the organism and on whether that architecture was specified when the PGDB was created. Transporters are depicted in the membrane in which they reside as blue lines whose arrowheads indicate the direction of transport. For gram-negative bacteria, periplasmic proteins are depicted when identified in the PGDB.

4.2  Getting Started

The Cellular Overview is accessible from the command Tools → Cellular Overview. The current selected organism, as displayed on the right in the banner of the Web page, is used to generate the Cellular Overview diagram. The generation of the diagram can take some time if it was not previously generated by the Web server.

Once the Cellular Overview diagram is displayed, the most common operation is to move it left, right, up or down, since sometimes the entire overview cannot fit in the Web page. This can be done by holding down your left mouse button in a blank area and then moving the mouse in the desired direction. This is called a panning operation. Panning in small increments can also be done by clicking the arrows on the widget located at the top left of the screen.

To zoom-in or zoom-out, you can use the icon in the form of a ladder on the left of the overview Web page. Each step of the ladder is a zoom level. You can select any one of them at any time. You can also click a plus or minus sign (displayed on the top and bottom of this ladder) to zoom-in (increase size) or zoom-out (decrease size) the Cellular Overview. By increasing the zoom level (i.e., going up in the ladder), names of compounds, enzymes, reactions, and pathways are eventually displayed.

Note that depending on the speed of the server, generating large Cellular Overviews (i.e., a zoom-in near the top of the ladder) might require some time.

Mousing over a Cellular Overview icon (e.g., a ‘tee’ icon for a tRNA) displays information about the object in a small tooltip popup. Click the ‘Keep Open’ button to keep that informational window open; drag the window by its title to re-position it.

Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys can be customized on your Mac via the system preferences panel.

All the commands for the Cellular Overview are available from the right-click menu or from the Cellular Overview menu in the top menu bar.

The Cellular Overview can display your experimental data. See the Omics Viewer Section 4.7 below.

4.3  Summary of the Mouse Commands

4.4  Summary of the General Commands

The commands in the Cellular Overview menu are:

The following sections describe these operations, and some others, in more detail.

4.5  Organism Selection

Selecting a new organism through the organism selector does not immediately change the Cellular Overview to this organism. At any moment you can display the complete Cellular Overview of the selected organism by selecting the command Display Cellular Overview in the menu obtained by right-clicking in a blank area, or from the menu bar via Cellular Overview → Display Cellular Overview. If the selected database has no cellular diagram available, the next invoked command will display a warning to that effect. For example, MetaCyc, which is a multi-organism database, has no cellular diagram.

4.6  Searching and Highlighting

In this document, ‘Searching’ and ‘Highlighting’ are synonymous terms. There are several commands to search for reactions, pathways, enzymes, genes, and compounds. The search commands are available from the right-click menu and from the Cellular Overview menu in the top menu bar.

When a search is done, the objects found are highlighted in the Cellular Overview diagram which also creates a new overlay. The list of overlays is shown in the Layer Switcher panel on the right of the Overview Web page. This panel might be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. You cannot delete an individual overlay. But all highlighting, i.e., all overlays, can be removed by using the command Clear All Highlighting.

Since each overlay corresponds to a search operation, an overlay is identified by the keyword you entered to do the search. This is the name of the overlay. Next to each name is a button labeled ‘List’. Clicking ‘List’ opens a small dialog window listing the objects found for the corresponding search. Each object name is a hyperlink: clicking any of these links centers the Overview on the corresponding object, and a red marker emphasizes its location.

4.7  Omics Viewer (Overlay Experimental Data)

The Pathway Tools Omics Viewer uses the Metabolic Overview for an organism to illustrate the results of high-throughput experiments in a global metabolic pathway context. The Omics Viewer can also be used for the Regulatory Overview, but only genes are involved in that case. Genes (in the case of a gene expression experiment) and proteins (in the case of a proteomics experiment) that are involved in metabolism are mapped to reaction steps in the Metabolic Overview, and the range of data values in a given experimental dataset is mapped to a spectrum of colors. Reaction steps in the Metabolic Overview are colored according to the corresponding data value. Similarly, for metabolomics experiments, compound nodes are colored according to the data value for the corresponding compound. This facility enables the user to see instantly which pathways are active or inactive under some set of experimental conditions.

The Omics Viewer can be used for:

The Omics Viewer can show absolute data values (such as the concentration of a metabolite or protein, or the absolute expression level of a gene), or it can be used to compare two sets of experimental data by computing a ratio and mapping the ratios onto a color spectrum.

The superposition of multiple sets of experimental data on the metabolic overview can also be animated to show, for example, how gene expression levels of enzymes change with time over the course of an experiment.

4.8  Getting Started

The command Overlay Experimental Data (Omics Viewer), available from the right-click menu and the top menu bar Cellular Overview, overlays experimental data over the Cellular Overview diagram.

Once the Overlay Experimental Data command is invoked, a window will open, called the Omics Form, where you can specify a data file to upload and various parameters to control the interpretation of the data. The parameters are documented in the window but more details follow on the file format and the parameters to specify.

4.9  Omics Dataset File Format

Experimental data is imported from a file provided by the user that is stored on the user’s computer. Each line of the file contains data for a single gene, protein, reaction or metabolite, and is of the form:

<name‑or‑ID> <data‑column1>...<data‑columnN>

Columns are separated by the tab character. Lines that start with # or ; are taken to be comment lines and are ignored by the program.

<name‑or‑ID> can be either a common name for an object (the BioCyc data typically includes extensive synonym lists, and every attempt is made to match a name to the appropriate target), or the BioCyc internal ID for the object. Gene IDs from sequencing projects (such as the E. coli B-numbers) are generally acceptable and unambiguous. For protein or reaction data, EC numbers may be used. You must specify whether the entities in the <name‑or‑ID> column are genes, proteins, reactions, compounds, or a mixture.

The numbers in the data columns can represent either absolute or relative values. If the data values represent absolute numbers, you may choose to visualize either a single column of absolute data values (select ‘Absolute’ and one data column), or the ratio of two data columns as relative data values (select ‘Relative’ and two data columns). If the data values themselves represent relative numbers, then you need supply only a single column number, and select ‘Relative’. An entry (a row of data for a gene or other object) may contain any number of data columns (for example, if you wish to compile measurements from several experiments or time points into a single file), but only those data columns specified will be visualized at a time—all other columns will be ignored.
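The file format and column selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the server's actual parser: the function names parse_omics_lines and select_values are hypothetical, column numbers are taken to count from 1 for the first data column (with the name in column 0), and the handling of duplicate names and malformed rows (later rows win, unusable rows are skipped) is an assumption.

```python
from typing import Dict, List, Optional


def parse_omics_lines(lines: List[str]) -> Dict[str, List[str]]:
    """Map each name-or-ID to its list of raw data-column strings."""
    rows: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and comment lines starting with '#' or ';'.
        if not line.strip() or line.startswith(("#", ";")):
            continue
        fields = line.split("\t")  # columns are tab-separated
        rows[fields[0]] = fields[1:]  # assumption: a repeated name overwrites
    return rows


def select_values(
    rows: Dict[str, List[str]],
    column: int,
    denominator: Optional[int] = None,
) -> Dict[str, float]:
    """Pick one data column, or the ratio of two data columns, per object.

    Column 0 is the name, so data column n is element n - 1 of each row.
    """
    values: Dict[str, float] = {}
    for name, data in rows.items():
        try:
            value = float(data[column - 1])
            if denominator is not None:
                value /= float(data[denominator - 1])
        except (IndexError, ValueError, ZeroDivisionError):
            continue  # assumption: rows with unusable data are skipped
        values[name] = value
    return values
```

Calling select_values(rows, 1) corresponds to choosing ‘Absolute’ with one data column, while select_values(rows, 1, 2) corresponds to choosing ‘Relative’ with two data columns.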

4.10  Examples

Single gene expression experiment: Sample datafile and brief description. See Cellular Overview for this data using the ratio of columns 11 and 12.
Time series gene expression animation: Sample datafile and brief description. See Cellular Overview for this data using columns 2 to 5.

4.11  Color Scale

The color scale used depends on the type and, by default, the range of the data. Thus, a particular color may correspond to one gene expression level for one dataset, and a different gene expression level for another dataset, depending on the range of values or the supplied maximum cutoff value for each dataset. We use the spectrum from yellow/green to red, with yellow representing the lowest expression levels or ratios in the dataset, blue representing values in the middle, and red representing the highest values. Reactions for which no data was provided are drawn in black. The legend for mapping colors to data values is shown in the key, which is drawn to the right of the overview for a single experiment, or to the left for an animation.

A maximum cutoff value is chosen. By default, this is computed from the data. Alternatively, the user may supply a maximum cutoff value to use. Supplying the same maximum cutoff value for multiple experiments ensures that the same color scale is used for each one, so that the displays are directly comparable.

The minimum cutoff value is determined based on the maximum cutoff value and the other parameters. For absolute data values, we use a minimum cutoff value of zero. For relative data values that are not logs, we use the inverse of the maximum cutoff. For relative data values that are logs, we use the negative of the maximum cutoff. The color spectrum is then mapped evenly along a log scale between the maximum cutoff and the minimum cutoff.
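The three rules for the minimum cutoff can be written as a small Python function. This is only a sketch of the rules as stated above; the function and parameter names are hypothetical.

```python
def minimum_cutoff(max_cutoff: float, relative: bool, log_values: bool) -> float:
    """Derive the minimum cutoff from the maximum cutoff.

    - absolute data:           0
    - relative data, not logs: inverse of the maximum cutoff
    - relative data, logs:     negative of the maximum cutoff
    """
    if not relative:
        return 0.0
    if log_values:
        return -max_cutoff
    return 1.0 / max_cutoff
```

For example, a maximum cutoff of 4 on non-log ratios gives a minimum cutoff of 0.25, so a four-fold increase and a four-fold decrease sit at opposite ends of the color scale.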

In many cases, several genes or proteins, each with its own expression level or concentration, will map to a single reaction. This is because the reaction might be catalyzed by an enzyme complex made up of several gene products, or the reaction might be catalyzed by several isozymes, each with its own gene or genes. Since a reaction can only be colored a single color, we must choose which data value to use. For absolute data values, we choose the maximum. For relative data values, we choose the value whose log has the greatest deviation from zero, under the assumption that the user is primarily interested in identifying the entities whose behavior differs most between the two datasets.
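The selection rule for a reaction with several data values can be sketched as follows. The function name is hypothetical, and the sketch assumes that relative values that are not logs are strictly positive ratios, and that values already on a log scale are compared directly by their absolute value.

```python
import math
from typing import Sequence


def reaction_value(
    values: Sequence[float], relative: bool, log_values: bool = False
) -> float:
    """Choose the single data value used to color a reaction."""
    if not relative:
        # Absolute data: take the maximum.
        return max(values)
    if log_values:
        # Already logs: greatest deviation from zero.
        return max(values, key=abs)
    # Ratios: compare the deviation of log(value) from zero.
    return max(values, key=lambda v: abs(math.log(v)))
```

With ratios 0.1, 2.0 and 1.5 mapped to the same reaction, the value 0.1 is chosen, because a ten-fold decrease deviates more from "no change" than a two-fold increase.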

4.12  Omics Viewer Results

Once the form to upload the data is submitted, by clicking the submit button at the bottom of the Omics Form, the data is processed by the Web server. The time to process the file depends on the speed of the server and the amount of data in the file. The results are returned to your browser in the form of highlighted objects (e.g., reactions). If several data experiments are loaded from the same file (i.e., several data columns are provided from the uploaded file), an animation is created where each step of the animation corresponds to one experiment (i.e., one column).

A small dialog window is opened to display the color scale for the experiment(s) and buttons to control the animation, if any. You can pause, restart, go forward or backward, increase or decrease the animation speed from this window.

Overlaying experimental data can be done at any zoom level. Once the data is uploaded and overlaid, you can zoom out or in, and the corresponding highlighting will be adjusted accordingly.

The tooltips for highlighted objects show the experimental data. The data displayed changes during an animation.

4.13  Submitting Highlight Operations via a HTTP Get (URL)

You can submit to a Pathway Tools server, via a Web browser or other Web-navigating software, a URL that describes which objects to highlight in a Cellular Overview diagram for a specific organism, or that names a pre-defined expression data file residing on the Web server for the Omics Viewer. This URL also typically specifies a zoom level. Submitting such a URL is an HTTP GET operation. One of the provided operations (see the table below for a list) also makes it possible to submit expression data, but the amount of data that can be sent is limited by the maximum URL length. To submit a large amount of data, say more than 50 expression values, use the HTTP GET or POST method as described in the following sections. The resulting Cellular Overview is returned as an HTML page.

Essentially, such a URL can be used instead of manually selecting an organism, visiting the Cellular Overview Web page, and performing highlighting interactively.

The general form of such a URL is:

     <host>/overviewsWeb/celOv.shtml?zoomlevel=<integer>&orgid=<orgid>&<op>=<string>

The notation <...> should not be used literally but represents a value or parameter to specify. For example, <integer> represents an integer. (See below for a URL example.)

The host depends on the server you want to access. For example, for BioCyc, the <host> is BioCyc.org.

All the parameters (after the question mark ‘?’) are optional, but specifying at least the organism id (orgid) is recommended; otherwise a default organism, which depends on the Web server, is used. The zoomlevel parameter specifies an integer value. The first zoom level is 0 (smallest Overview) and currently the highest is 6 (largest Overview).

You can specify zero, one, or more ‘op’ operations. The following table summarizes the possible operations and the corresponding highlight operations. All of them correspond to operations available from the top menu bar when a Cellular Overview diagram is displayed. The operation xnids is special in that it accepts expression data as well: an expression value can be specified after each name.

Op      Highlight Operation
rnids   Highlight reaction names or frame ids.
rsubs   Highlight reaction substrings.
recns   Highlight reaction EC numbers.
pnids   Highlight pathway names or frame ids.
psubs   Highlight pathway substrings.
gnids   Highlight gene names or frame ids.
gsubs   Highlight gene substrings.
enids   Highlight enzyme names or frame ids.
esubs   Highlight enzyme substrings.
cnids   Highlight compound names or frame ids.
csubs   Highlight compound substrings.
xnids   Highlight a mix of names and frame ids with or without expression data.

The string specified after the ‘=’ for an operation must not be quoted and any special character must be URL encoded. The string is not case-sensitive.

The following URL would open the Cellular Overview for organism Escherichia coli K-12 substr. MG1655 at zoom level 0, and create three highlighting overlays: 1) reactions with names having substring ‘hydro’, 2) reactions with names having substring ‘oxy’, 3) reactions, compounds, and proteins related to genes having a name with substring ‘arg’.

     biocyc.org/overviewsWeb/celOv.shtml?zoomlevel=0&orgid=ECOLI&rsubs=hydro&rsubs=oxy&gsubs=arg

Such URLs with highlighting operations can be automatically generated using the command Cellular Overview → Generate Bookmark for Current Cellular Overview.
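Such a URL can also be assembled by a script. The following Python sketch builds one from the general form given above, using the standard library to URL-encode the values. The function name highlight_url is hypothetical, and the http:// scheme is an assumption, since the example URL above is written without a scheme.

```python
from typing import Iterable, Tuple
from urllib.parse import urlencode

# Operation names taken from the table of highlight operations.
HIGHLIGHT_OPS = {
    "rnids", "rsubs", "recns", "pnids", "psubs", "gnids",
    "gsubs", "enids", "esubs", "cnids", "csubs", "xnids",
}


def highlight_url(
    host: str,
    orgid: str,
    zoomlevel: int,
    operations: Iterable[Tuple[str, str]],
) -> str:
    """Build a Cellular Overview URL with zero or more highlight operations."""
    if not 0 <= zoomlevel <= 6:
        raise ValueError("zoomlevel must be between 0 and 6")
    operations = list(operations)
    for op, _ in operations:
        if op not in HIGHLIGHT_OPS:
            raise ValueError(f"unknown highlight operation: {op}")
    # A list of pairs (not a dict) lets the same operation appear twice.
    params = [("zoomlevel", zoomlevel), ("orgid", orgid), *operations]
    # urlencode percent-encodes special characters in the values.
    return f"http://{host}/overviewsWeb/celOv.shtml?{urlencode(params)}"


url = highlight_url(
    "biocyc.org", "ECOLI", 0,
    [("rsubs", "hydro"), ("rsubs", "oxy"), ("gsubs", "arg")],
)
```

Here url reproduces the example URL above, prefixed with the assumed scheme.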

4.14  Submitting Expression Data Via an HTTP GET.

The Omics Viewer can be started by using a URL link with expression data that resides on a Web server. In that case, no data is sent from the browser but parameters are specified in the URL. In other words, a GET request can be done to start the Cellular Omics Viewer with data on a Web server and the parameters specified on the GET request itself.

For example, the following URL link, when clicked, would start the Cellular Overview Omics Viewer with the expression data from the file time-series.txt on the Web server BioCyc.org (in the subdirectory expr-examples) at zoom level 0, for organism ecoli, using columns 2 to 4 from the file. The file contains gene names (or frame ids) in the first column.

  biocyc.org/overviewsWeb/celOv.shtml?omics=t
     &url=http://biocyc.org/expr-examples/time-series.txt
     &zoomlevel=0&orgid=ECOLI&column1=2-4&class=gene

All parameters that can be specified, except the parameter datafile, are listed in Section 4.15, Submitting Expression Data Via an HTTP POST. The additional parameters that can be specified for a URL (that is, a GET request) are listed in the following table.

omics Its value must be the letter t. This parameter says to start the Omics Viewer.
orgid An organism identifier. This is the unique identifier of the PGDB corresponding to the desired Cellular Overview. The data file (specified below) must correspond to this identifier.
zoomlevel An integer from 0 to 6 to specify the zoom level of the Cellular Overview.
url The path to the file containing the expression data to upload in the Omics Viewer. This path uses protocol ‘file://’ or ‘http://’. The file protocol refers to a file on the Web server accessed, not your local computer. The http protocol can upload a file from any Web server, not just the Web server accessed to start the Omics Viewer.
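A GET URL of this kind can be generated as in the following Python sketch. The function name omics_get_url is hypothetical and the http:// scheme for the host is an assumption. Note that urlencode percent-encodes the value of the url parameter (for example, ‘:’ and ‘/’), which is the standard encoding for a URL nested inside a query string; the hand-written example above shows the same value unencoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def omics_get_url(
    host: str,
    orgid: str,
    data_url: str,
    columns: str,
    object_class: str,
    zoomlevel: int = 0,
) -> str:
    """Build a URL that starts the Omics Viewer with server-side data."""
    params = {
        "omics": "t",            # must be the letter t
        "url": data_url,         # file:// or http:// location of the data
        "zoomlevel": zoomlevel,  # integer from 0 to 6
        "orgid": orgid,
        "column1": columns,      # e.g. "2-4"
        "class": object_class,   # e.g. "gene"
    }
    return f"http://{host}/overviewsWeb/celOv.shtml?{urlencode(params)}"


link = omics_get_url(
    "biocyc.org", "ECOLI",
    "http://biocyc.org/expr-examples/time-series.txt",
    "2-4", "gene",
)
# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(link).query)
```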

4.15  Submitting Expression Data Via an HTTP POST

You can submit data to the Cellular Omics Viewer without using the GUI (i.e., the dialog window) by using an HTTP POST method instead. This would typically be done by users who prefer to design their own GUI to submit the expression data, or to perform expression data analysis from other software or another Web site.

An HTTP POST method is a standard mechanism to submit data to a Web server. It is composed of a URL, parameters and their value, and data.

In most cases, for a Web page, you would use a POST method when using an HTML form. Technical details about HTML forms can be found at Forms at W3C. We currently support only the application/x-www-form-urlencoded content type as the expression data is not sent as binary, but as text.

The general URL syntax to send the POST method request is

        http://biocyc.org/<orgid>/overview-expression-map

where <orgid> is the organism identifier for the database to access.

The data section of the POST data, named ‘datafile’, can contain multiple lines of data. Each line is a row of a table, and the table can have multiple columns. The columns of data are separated by the tab character. The first column, having index 0, contains the names or frame ids of the objects to consider for the Omics Viewer. They might be genes, compounds, reactions, proteins, pathways, or a mix of these.

If you are using your own HTML form, you do not need to consider the encoding details of the data sent, as the browser does the encoding automatically. But you do need to know the names of the parameters to use in your form, their possible values, and their meaning. The following table summarizes that information.

The Cellular Overview accepts the following parameters:

Parameter: datafile
Possible values: multiple lines of data forming a table
Meaning: Contains all the data to use for the Omics Viewer. It is not an error if some columns of data are not referred to by the following parameters.

Parameter: expressiontype
Possible values: absolute, relative
Meaning: In absolute mode, the range of values used for the colors is based on the data itself. In relative mode, the range of values for the colors is assumed to be symmetric around 0. That is, the maximum absolute value in your data is used as the positive maximum value, and the range is extended, if required, on the negative side to make the range symmetric.

Parameter: numcolumns
Possible values: 1, 2
Meaning: 1 means the expression value is based on one data column; 2 means two columns are used for expression data, one numerator and one denominator.

Parameter: column1
Possible values: integer(s) or range of integers
Meaning: The columns containing the data (the numerators, if numcolumns is 2).

Parameter: column2
Possible values: integer(s) or range of integers
Meaning: The columns containing the denominators, if numcolumns is 2.

Parameter: color
Possible values: default, specify, 3-color
Meaning: The color scheme to use. The default scheme uses the full color spectrum for the values. The ‘specify’ scheme uses the full color spectrum, but a maximum cutoff value can be specified. The 3-color scheme uses only three colors, and you specify a maximum threshold: values are marked red when they are above it, yellow when they are below its additive inverse, and blue when they are in between.

Parameter: maxcutoff
Possible values: a number
Meaning: The maximum value to use when the color scheme ‘specify’ is selected. The full color spectrum is used.

Parameter: threshold
Possible values: a number
Meaning: The maximum threshold value to use when 3-color is specified.

Parameter: log
Possible values: on, off
Meaning: If you want your data to be interpreted as a log scale, specify ‘on’, otherwise specify ‘off’. In ‘off’ mode, all negative data are discarded.

Parameter: class
Possible values: gene, protein, compound, reaction, nil
Meaning: The type of names or frame ids in column 0. The value ‘nil’ means any type.
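The 3-color scheme described for the color parameter can be illustrated with a short Python function. This is a sketch of the rule as stated, with a hypothetical function name; treating a value exactly equal to the threshold (or to its additive inverse) as "in between" is an assumption.

```python
def three_color(value: float, threshold: float) -> str:
    """Classify a data value under the 3-color scheme."""
    if value > threshold:
        return "red"       # above the threshold
    if value < -threshold:
        return "yellow"    # below the additive inverse of the threshold
    return "blue"          # in between (boundaries included, by assumption)
```

With a threshold of 1.0, the values 1.5, -1.5 and 0.2 are classified as red, yellow and blue respectively.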

You can also submit a POST request from any programming language: open a network connection to our Web server on port 80 and submit a correctly formed HTTP request. The data that follows the header needs to conform to the application/x-www-form-urlencoded content type encoding, and the parameter names need to be the ones listed in the table above.

An example of parameters with data follows. There are nine rows of data, the expressiontype is relative, the values are not ratios (numcolumns=1), the log scale is used, the objects named in the data section are genes, columns 1 and 2 are used to create an animation, no column2 value is given since no ratio is used, the color scheme is the default one, and neither a threshold nor a maximum cutoff value is specified.

-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="datafile"; filename="example.txt"
Content-Type: text/plain

b0468	0.104350612	0.033206048	
b0469	-0.186798562	-0.095079653	
b0470	0.013754626	-0.047423932	
b0471	-0.040057242	-0.047031817	
b0472	0.199058776	0.226302741	
b0473	-0.067916761	-0.430108036	
b0474	0.084574625	-0.11661899	
b0475	-0.067991385	0.09949957	
b0476	0	0	0.930017971	

-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="expressiontype"

relative
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="numcolumns"

1
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="log"

on
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="class"

gene
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="column1"

1-2
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="column2"
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="color"

default
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="maxcutoff"
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="threshold"
-----------------------------2120074538104283772813445706--
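
The example body above is framed with boundary lines and Content-Disposition headers, which is the multipart/form-data layout. As a minimal sketch, assuming that layout is what the server expects, such a body can be assembled in Python as follows. The function name is hypothetical, the data shown is only the first two rows of the example, and the target URL is not stated in this section, so sending the request is left to the caller.

```python
import uuid


def build_multipart(fields, file_field, filename, file_text, boundary=None):
    """Encode form fields plus one text file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    # The uploaded data file goes first, as in the example above.
    lines = [
        f"--{boundary}",
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"',
        "Content-Type: text/plain",
        "",
        file_text,
    ]
    # Each remaining parameter becomes its own part; empty values are allowed.
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines.append(f"--{boundary}--")  # closing boundary
    body = "\r\n".join(lines).encode("utf-8") + b"\r\n"
    return body, f"multipart/form-data; boundary={boundary}"


fields = {
    "expressiontype": "relative", "numcolumns": "1", "log": "on",
    "class": "gene", "column1": "1-2", "column2": "",
    "color": "default", "maxcutoff": "", "threshold": "",
}
data = "b0468\t0.104350612\t0.033206048\nb0469\t-0.186798562\t-0.095079653\n"
body, content_type = build_multipart(fields, "datafile", "example.txt", data,
                                     boundary="XYZ")
```

The returned content_type string is the value to send in the Content-Type request header, so that the server can find the boundary.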

4.16  Submitting Expression Data Via an HTTP GET or POST for Display on Individual Pathways

Instead of displaying the entire cellular overview, you may wish to see omics data superimposed on one or only a few pathways of interest. In order to do this, you must know the Pathway Tools object identifiers of the pathways you wish to display.

To submit data via the HTTP POST method for display on specified pathways, use the same syntax and parameters described in Section 4.15, Submitting Expression Data Via an HTTP POST, with an additional parameter named “pathways”. The value of this parameter should be one or more pathway identifiers, separated by whitespace or punctuation. The result will be a table containing one row for each pathway. Each row contains the pathway name, the pathway diagram with data superimposed, and a list of enzymes and genes in the pathway. If the data represents multiple timepoints, then the table will contain a diagram column for each timepoint.

To submit data via the HTTP GET method for display on specified pathways, use the same syntax and parameters as for POST, described above. However, instead of submitting data using the datafile parameter, you must specify a URL from which the data can be retrieved, using the url parameter described in Section 4.14, Submitting Expression Data Via an HTTP GET. The following example URL retrieves a table of three particular pathways with omics data superimposed:

  http://biocyc.org/ECOLI/overview-expression-map?url=http://myserver/mydata.txt
     &expressiontype=relative&numcolumns=1&column1=1&log=on&class=gene
     &color=default&pathways=GLYCOLYSIS,TCA,TRPSYN-PWY
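
A URL of this kind can also be assembled programmatically. The following Python sketch uses only the standard library; the function name is hypothetical, and the data location http://myserver/mydata.txt is the same placeholder used in the example above. Note that urlencode percent-encodes the nested data URL and the commas, which is assumed to be acceptable to the server because standard query decoding yields the same values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def pathway_table_url(base, pathways, omics=None):
    """Build a GET URL for a table of pathways, optionally with omics data."""
    params = dict(omics or {})
    params["pathways"] = ",".join(pathways)
    return f"{base}?{urlencode(params)}"


omics = {
    "url": "http://myserver/mydata.txt",  # placeholder data location
    "expressiontype": "relative",
    "numcolumns": "1",
    "column1": "1",
    "log": "on",
    "class": "gene",
    "color": "default",
}
url = pathway_table_url(
    "http://biocyc.org/ECOLI/overview-expression-map",
    ["GLYCOLYSIS", "TCA", "TRPSYN-PWY"],
    omics,
)
decoded = parse_qs(urlsplit(url).query)
```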

You can use the same URL with all the omics parameters omitted to retrieve a table of selected pathways without any omics data, for example:

  http://biocyc.org/ECOLI/overview-expression-map?pathways=GLYCOLYSIS,TCA,TRPSYN-PWY

You can also add most of the same omics parameters to the URL for a regular pathway page to see omics data added to the regular pathway diagram. For example, the URL to display the pathway page for the TCA pathway in EcoCyc at a detail level that shows all the enzyme and gene names is:

  http://biocyc.org/ECOLI/new-image?type=PATHWAY&object=TCA&detail-level=2

To add omics data to this page, supply the omics parameters, as in the following example:

  http://biocyc.org/ECOLI/new-image?type=PATHWAY&object=TCA&detail-level=2
     &url=http://myserver/mydata.txt&expressiontype=relative&numcolumns=1
     &column1=1&log=on&class=gene&color=default

There is no way to add omics data to standard pathway pages using the HTTP POST method.
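
As a sketch of the GET approach for a regular pathway page, the following Python function appends omics parameters to an existing pathway page URL while keeping the parameters already present. The function name is hypothetical, and the data URL is the placeholder from the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_omics_params(page_url, omics):
    """Return page_url with the omics parameters appended to its query string."""
    parts = urlsplit(page_url)
    # Keep the existing parameters in order, then add the omics ones.
    query = parse_qsl(parts.query) + list(omics.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


page = "http://biocyc.org/ECOLI/new-image?type=PATHWAY&object=TCA&detail-level=2"
omics_page = add_omics_params(page, {
    "url": "http://myserver/mydata.txt",  # placeholder data location
    "expressiontype": "relative",
    "numcolumns": "1",
    "column1": "1",
    "log": "on",
    "class": "gene",
    "color": "default",
})
omics_query = parse_qsl(urlsplit(omics_page).query)
```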

5  Regulatory Overview

Note: The regulatory overview has been tested on Internet Explorer 7.0, Firefox 3.3, Safari 4.0 and Chrome 2.0. We recommend not using Internet Explorer for the regulatory overview, since its performance can be very slow when manipulating a large number (more than 100) of highlighted genes. The performance of the three other browsers is much better than that of Internet Explorer.

The regulatory overview enables you to visually analyze the regulatory relationships between genes for a specific organism. These relationships are based on the regulatory data available in the database (i.e., PGDB) of the organism. Currently, the relationships are based on transcriptional regulatory data (future versions may cover other types of regulation).

The regulatory overview is represented as a network with nodes and arrows (i.e., arcs). Each node represents a gene of a specific organism. There is an arrow from gene A to gene B if and only if A regulates B.
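
The network described above is a directed graph. A minimal Python sketch of that data structure follows, assuming the regulatory relationships are available as (regulator, regulatee) pairs; the function name and the gene names are placeholders, not data from any PGDB.

```python
from collections import defaultdict


def build_network(regulations):
    """Index (regulator, regulatee) pairs in both directions."""
    regulatees = defaultdict(set)  # gene -> genes it regulates
    regulators = defaultdict(set)  # gene -> genes that regulate it
    for a, b in regulations:
        # An arrow from A to B exists if and only if A regulates B.
        regulatees[a].add(b)
        regulators[b].add(a)
    return dict(regulatees), dict(regulators)


regulatees, regulators = build_network([
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneC", "geneB"),
])
```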

When first displayed, the overview does not show any regulatory arrow relationships since, typically, their great number would clutter the overview. These arrows can be selectively added by using the highlighting commands. See the sections below for more information on highlighting commands.

Not all organisms have regulatory data in their PGDB. If the command Tools → Regulatory Overview is grayed out, no regulatory overview can be displayed for the selected organism. Otherwise, selecting the command Tools → Regulatory Overview opens a regulatory overview Web page that displays the complete regulatory overview of the selected organism. The menu Regulatory Overview is then added to the top menu bar. It has several commands specific to the regulatory overview.

It is possible to display a regulatory subnetwork of a specific organism by performing a series of highlighting operations and then using the command Redisplay Highlighted Genes Only. This command creates a new, smaller layout of the regulatory network that contains only the highlighted genes. Genes that do not regulate, or are not regulated by, any highlighted genes are not included in the subnetwork. Further operations can be done on this subnetwork as for the complete overview. See the Section Redisplay Highlighted Genes Only below for more details.

The most common operation is to move the regulatory overview left, right, up or down, since the entire network sometimes cannot fit in the Web page. To do this, hold down the left mouse button in a blank area and move the mouse in the desired direction. This is called a panning operation. You can also pan in small increments by clicking the arrows on the graphic at the top left of the screen, called the panning widget.

To zoom in or out, use the ladder-shaped icon on the left of the overview Web page. Each step of the ladder is a zoom level, and you can select any of them at any time. You can also click the plus or minus sign (displayed at the top and bottom of the ladder) to zoom in (increase size) or zoom out (decrease size) the regulatory network. At lower zoom levels (i.e., lower on the ladder), the gene names might overlap the network nodes; increasing the zoom level should remove such overlaps. The last zoom level (i.e., the last step of the ladder) will always force the display of all gene names in the network.

Note that, depending on the speed of the server, generating large regulatory network overviews (i.e., a zoom-in near the top of the ladder) may require some time. The response time also varies depending on whether the server has already generated the requested overview or must generate it on demand.

Mousing over a gene node displays a tooltip with data about the gene, its product, the possible ligand, and its direct regulatees and regulators. Left-clicking the gene node opens a new Web page containing even more data specific to the gene.

Other more complex visual commands can be reached by right-clicking on genes or in a blank area. This is discussed in detail in the following sections.

Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys to use may be customized on your Mac via the preferences panel.

5.1  Summary of the Mouse Commands

The following sections describe these operations, and some others, in more detail.

5.2  Organism Selection

Selecting a new organism through the organism selector does not immediately change the regulatory overview to this organism. The next operation, such as a zoom-in or zoom-out, will apply to the newly selected organism. At any moment you can display the complete regulatory overview of the selected organism by selecting the command Display Complete Regulatory Overview, either from the right-click menu in a blank area or from the top menu bar under Regulatory Overview → Display Complete Regulatory Overview. If the selected database has no regulatory data, the next regulatory command will display a warning to that effect.

5.3  Layout Selection

For any organism, there are two layouts available: nested ellipses or top to bottom.

The nested ellipses layout uses up to three ellipses to display the gene nodes. The innermost ellipse contains, in alphabetical order of the gene names, the genes that have the largest number of regulatees. The middle ellipse contains genes that regulate at least one gene. The outer ellipse contains the genes that have no regulatees. These genes might be displayed as groups of genes regulated by the same set of genes (a multi-regulon). Such a group is typically drawn as a triangle, or as a short straight line if the group is small.
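
The assignment of genes to the three ellipses can be sketched as follows. The text does not say how many regulatees qualify a gene for the innermost ellipse, so the cutoff inner_min is a purely hypothetical parameter (assumed to be at least 1), as are the function name, the gene names and the counts.

```python
def assign_ellipses(regulatee_counts, inner_min):
    """Split genes into (inner, middle, outer) lists, each sorted by name."""
    by_name = str.lower  # alphabetical order, ignoring case
    inner = sorted((g for g, n in regulatee_counts.items() if n >= inner_min),
                   key=by_name)
    middle = sorted((g for g, n in regulatee_counts.items() if 0 < n < inner_min),
                    key=by_name)
    outer = sorted((g for g, n in regulatee_counts.items() if n == 0),
                   key=by_name)
    return inner, middle, outer


# Made-up counts of direct regulatees per gene, for illustration only.
counts = {"geneB": 4, "geneA": 5, "geneC": 1, "geneE": 0, "geneD": 0}
inner, middle, outer = assign_ellipses(counts, inner_min=4)
```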

The top to bottom layout uses several straight rows to display the gene nodes. Each row contains genes that do not directly regulate each other. The top row contains the genes that regulate the largest number of genes. The bottom row contains genes that do not regulate any genes. The rows in between contain genes that regulate some other genes. As in the nested ellipses layout, a row might have genes grouped in straight lines or triangles.

5.4  Highlighting Genes and Regulatory Relationship Arrows

There are several commands to highlight genes and show the regulatory relationship arrows between them.

Two commands use the gene name, a substring of gene names, or a gene frame id. Both of these commands are available by right-clicking in a blank area, or from the top menu bar under Regulatory Overview. The command Highlight Gene By Name or Frame ID highlights at most one gene. It is essentially a search command, since you might not know the location of that gene in the regulatory network. Once the gene is found, the regulatory network is centered on its location. The command Highlight Genes By Substring may highlight several genes. Selecting the command opens a panel in which you can enter a string of characters. When you click the button labeled Highlight in the panel, every gene whose name contains the given string is highlighted (the search is case-insensitive). For this command it is also possible to include the regulatory relationships between the genes found.
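
The matching rule of Highlight Genes By Substring, as described above, amounts to a case-insensitive substring test. A minimal Python sketch follows; the function name and the list of gene names are illustrative only and say nothing about how the server implements the search.

```python
def genes_matching_substring(gene_names, text):
    """Return the gene names that contain text, ignoring case."""
    needle = text.lower()
    return [name for name in gene_names if needle in name.lower()]


matches = genes_matching_substring(["trpA", "trpB", "lacZ", "TrpR"], "TRP")
```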

The command Highlight Genes By Gene Ontology Terms, accessible from the right-click menu, enables you to select one or more Gene Ontology (GO) terms. The genes that produce proteins annotated with the selected GO terms will be highlighted. The option Include Relationships Arrows enables you to add relationship arrows between the highlighted genes. Note that if you are displaying a subnetwork, the organism might have genes with such products that are not in the subnetwork. In such a case, a warning is given that no genes have been highlighted.

Right-clicking on a gene opens a menu of highlighting commands specific to that gene. The menu may contain from one to seven commands. Since some genes have no regulators, no regulatees, or neither, the list of commands varies from gene to gene. Here is the list of all possible commands available from this menu, where name is the name of the gene (e.g., trpA) on which the right-click was done. Each highlighting operation uses one specific color, but that color changes from one executed highlighting command to the next.

When a highlighting operation is done, a new overlay is created. The list of overlays is shown in the Layer Switcher panel on the right of the overview Web page. This panel may be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. This is particularly useful if you use the command Redisplay Highlighted Genes Only.

All highlighting can be removed by using the command Clear All Highlighting.

For more information about highlighting, see Section Redisplay Highlighted Genes Only.

5.5  Redisplay Highlighted Genes Only

The command Redisplay Highlighted Genes Only displays a regulatory network that considers only the highlighted genes. The layout is changed to “top to bottom”, since it is usually a better layout for a small set of genes. This command is typically used after a series of highlighting operations that select a set of genes to analyze closely. The currently displayed regulatory network is removed and a new regulatory network is displayed. The active highlighting remains active. All overlays (active or not) also remain. It is useful to keep the deactivated overlays, since you may come back to the complete regulatory network and reactivate them to create a new regulatory subnetwork. Note that genes that do not regulate, or are not regulated by, any highlighted genes are not included in the subnetwork.

To redisplay the complete regulatory network, use the command Display Complete Regulatory Overview accessible when right-clicking in a blank area. The current active overlays remain active and the deactivated overlays are not removed.

The information in tooltips within a subnetwork display (produced when mousing over gene nodes) is restricted to that subnetwork. That is, the tooltip’s lists of regulatees and regulators are for the subnetwork, not for the entire regulatory network of the organism. However, when you transition from a subnetwork display back to the display of the entire network, any highlighting done on a subnetwork is expanded to the entire regulatory network to show relationships within the full network. For example, if gene A has four direct regulatees in a subnetwork, but twenty regulatees in the entire network, when the operation Highlight Gene A and its Direct Regulatees is applied in the subnetwork, only the four regulatees are highlighted, but once you redisplay the entire network, the twenty regulatees will be highlighted.
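
The difference between subnetwork and full-network results can be sketched with a small Python example. It assumes one reading of the inclusion rule: a highlighted gene is kept only if it has a regulatory relationship with another highlighted gene. The function names, gene names and relationships are hypothetical, and smaller numbers are used than in the example above.

```python
def subnetwork(edges, highlighted):
    """Keep relationships among highlighted genes and the genes they connect."""
    kept = {(a, b) for a, b in edges if a in highlighted and b in highlighted}
    genes = {gene for edge in kept for gene in edge}
    return genes, kept


def direct_regulatees(edges, gene):
    """Genes directly regulated by gene within the given set of relationships."""
    return {b for a, b in edges if a == gene}


# Made-up full network: A has five direct regulatees.
full_edges = {("A", "r1"), ("A", "r2"), ("A", "r3"), ("A", "r4"), ("A", "r5")}
highlighted = {"A", "r1", "r2", "unlinked"}

genes, sub_edges = subnetwork(full_edges, highlighted)
in_subnetwork = direct_regulatees(sub_edges, "A")    # what a subnetwork tooltip lists
in_full_network = direct_regulatees(full_edges, "A")  # what the full network shows
```

Here the gene named unlinked is highlighted but has no relationship with any other highlighted gene, so it does not appear in the subnetwork.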

5.6  Omics Viewer for Regulatory Overview

The Pathway Tools Omics Viewer for the Regulatory Overview illustrates the results of high-throughput experiments in the context of gene regulation. Genes that are involved in regulation are mapped to gene icons in the Regulatory Overview diagram, and the range of data values in a given experimental dataset is mapped to a spectrum of colors. This facility enables the user to see instantly which genes are active or inactive under some set of experimental conditions.

The Omics Viewer for the Regulatory Overview is very similar to the Omics Viewer for the Cellular Overview. The main difference is that the data file must contain in its first column gene names or frame ids. To start the Omics Viewer for the regulatory overview, use the command Overlay Experimental Data (Omics Viewer) under the Regulatory Overview menu.

See Section 4.7, Omics Viewer (Overlay Experimental Data), for more information on how to use the Omics Viewer.

6  Comparative Analysis

Comparative Analysis allows users to generate summaries of individual PGDBs, or compare statistics between PGDBs. Currently we support comparative analysis of reactions, pathways, compounds, proteins, orthologs, transporters, and transcription units. Prior to running the comparative analysis, you will be prompted to select one or more PGDBs for which to perform the analysis. To access the Comparative Analysis tool, go to: Tools Menu → Comparative Analysis.

7  BLAST Search

Pathway Tools has an optional feature that allows Pathway/Genome Databases (PGDBs) that have sequence data to be searched using NCBI BLAST.

To access the Web interface for BLAST searches, go to: Search Menu → BLAST.

Documentation on the use of the Web interface for NCBI BLAST can be found here.

8  Web Accounts

Pathway Tools Web accounts enable frequent users to enrich their experience when accessing PGDBs via the Web.

Web site accounts provide several benefits. Through your account you can:

The Web accounts system is optional for a Pathway Tools Web server. If enabled, you should see a login prompt at the upper right corner of any Web page from a Pathway Tools Web server. Please see the Pathway Tools User Guide for more information on how to set up web accounts for your website.

9  Web Services

Pathway Tools data is available in XML format via several different REST-based web services (as well as in a variety of XML and non-XML based downloadable formats). All of the URLs in this section assume that you are attempting to access a Pathway Tools web server located at http://host.domain.org. Substitute the actual web address of the Pathway Tools site you are attempting to access (such as http://websvc.biocyc.org).

9.1  BioPAX XML Services

Pathway data for an individual pathway is available in BioPAX format (both BioPAX Level 2 and Level 3). The URL to access a pathway in BioPAX format is:

http://host.domain.org/[ORGID]/pathway-biopax?type=[2|3]&object=[PATHWAY]

where

Example URLs:
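
As a sketch, a URL following the pattern above can be assembled in Python as shown below. The function name is hypothetical; ECOLI and TCA are the organism and pathway identifiers used in earlier examples in this document, and host.domain.org is the placeholder host.

```python
from urllib.parse import urlencode


def biopax_url(host, orgid, pathway, level=3):
    """Build the URL that returns one pathway in BioPAX Level 2 or 3."""
    if level not in (2, 3):
        raise ValueError("BioPAX level must be 2 or 3")
    query = urlencode({"type": level, "object": pathway})
    return f"{host}/{orgid}/pathway-biopax?{query}"


tca_biopax = biopax_url("http://host.domain.org", "ECOLI", "TCA", level=3)
```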

9.2  Pathway Tools XML Services

Any pathway, reaction, compound, gene, protein, RNA or transcription-unit object in a BioCyc database can be retrieved in ptools-xml format, an XML format that is based on and closely resembles the underlying Pathway Tools schema. A single object can be requested using its internal BioCyc identifier, or a query can be issued using the BioVelo query language to retrieve multiple objects.

For information on interpreting ptools-xml format, see the Guide to ptools-xml and the Pathway Tools Schema Guide.

The URL to access an object in ptools-xml format is

http://host.domain.org/getxml?[ORGID]:[OBJECT-ID]
or
http://host.domain.org/getxml?id=[ORGID]:[OBJECT-ID]&detail=[none|low|full]

where

Example URLs:

The URL to issue a BioVelo query that returns a list of objects in ptools-xml format is

http://host.domain.org/xmlquery?[QUERY]
or
http://host.domain.org/xmlquery?query=[QUERY]&detail=[none|low|full]

where

Example URLs:
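
Both ptools-xml URL forms described in this section can be assembled as sketched below. The function names are hypothetical, host.domain.org is the placeholder host, and ECOLI:TCA reuses identifiers from earlier examples. The BioVelo query string is only an illustration and is not taken from this document; consult the BioVelo documentation for the actual query syntax. Percent-encoding the query is an assumption made so that its special characters survive transport.

```python
from urllib.parse import quote, unquote, urlencode

DETAIL_LEVELS = ("none", "low", "full")


def getxml_url(host, orgid, object_id, detail=None):
    """Build a URL that retrieves a single object in ptools-xml format."""
    frame = f"{orgid}:{object_id}"
    if detail is None:
        return f"{host}/getxml?{quote(frame, safe=':')}"
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {DETAIL_LEVELS}")
    return f"{host}/getxml?{urlencode({'id': frame, 'detail': detail}, safe=':')}"


def xmlquery_url(host, query, detail=None):
    """Build a URL that issues a BioVelo query returning ptools-xml."""
    if detail is None:
        return f"{host}/xmlquery?{quote(query, safe='')}"
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {DETAIL_LEVELS}")
    return f"{host}/xmlquery?{urlencode({'query': query, 'detail': detail})}"


host = "http://host.domain.org"
object_short = getxml_url(host, "ECOLI", "TCA")
object_full = getxml_url(host, "ECOLI", "TCA", detail="full")
example_query = "[x:x<-ecoli^^pathways]"  # illustrative query string only
query_short = xmlquery_url(host, example_query)
```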

10  How to Learn More
