Using the Pathway Tools Overview Expression Viewer

The Pathway Tools Metabolic Overview may be used to view expression levels for all genes that code for metabolic enzymes. The color of each reaction step in the Overview will reflect the expression level of the gene that encodes the enzyme for that step. The range of expression levels in a given expression dataset is mapped to a spectrum of colors. This facility enables the user to see instantly which pathways are turned on or off under some set of experimental conditions.

In addition to showing absolute expression levels, the viewer can compare two sets of expression data by computing the ratio of their expression levels and mapping the ratios onto a color spectrum. A variety of expression comparisons can be performed this way.

The superposition of multiple sets of expression data on the metabolic overview can also be animated to show, for example, how expression levels of enzymes change with time over the course of an experiment.

At SGD

For information about the features of the Expression Viewer as it is implemented at SGD, please read the SGD help page.

Examples

Single expression experiment: Sample datafile Sample display
Time series animation: Sample datafile Sample display

Expression Dataset File Format

Expression data is imported from a file that the user supplies from his or her own computer. Each line of the file contains data for a single gene and has the form:
<gene-name-or-ID>	<data-column1>...<data-columnN>
Columns are separated by the tab character. Lines that start with # or ; are taken to be comment lines and are ignored by the program.
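As a rough illustration of the file format just described, the following Python sketch reads such lines into a dictionary. The function name parse_expression_lines is hypothetical and is not part of Pathway Tools. Skipping blank lines and recording unparseable cells as None are assumptions made for the example.

```python
def parse_expression_lines(lines):
    """Parse tab-separated expression rows into {gene: [values...]}."""
    table = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Lines that start with '#' or ';' are comments.
        # Blank lines are skipped as well (an assumption of this sketch).
        if not line.strip() or line.startswith(("#", ";")):
            continue
        fields = line.split("\t")
        gene, columns = fields[0].strip(), fields[1:]
        values = []
        for text in columns:
            try:
                values.append(float(text))
            except ValueError:
                # Missing or malformed cell: keep the slot so that
                # column numbers still line up.
                values.append(None)
        table[gene] = values
    return table
```

Calling it on an open file object, for example parse_expression_lines(open("expression.txt")), works because a file iterates line by line; the file name here is only a placeholder.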

The numbers in the data columns can represent either absolute or relative expression levels. If the data values represent absolute expression levels, you may choose to visualize either a single column of absolute expression levels (select "Absolute" and one data column), or the ratio of two data columns as relative expression levels (select "Relative" and two data columns). If the data values themselves represent relative expression levels, then you need supply only a single column number, and select "Relative". An entry (a row of data for a gene) may contain any number of data columns (for example, if you wish to compile measurements from several experiments or time points into a single file), but only those data columns specified will be visualized at a time -- all other columns will be ignored.
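The three choices above (one absolute column, the ratio of two columns, or one column that already holds relative values) can be sketched as follows. This is only an illustration: the function select_values is hypothetical, column numbers are assumed to start at 1, and the ratio is assumed to be the first selected column divided by the second, which the text above does not specify.

```python
def select_values(table, mode, columns):
    """Pick one value per gene from {gene: [values...]}.

    mode is "absolute" (one column) or "relative" (one or two columns).
    """
    def cell(values, col):
        # 1-based column numbers (assumption); out-of-range counts as missing.
        return values[col - 1] if 0 < col <= len(values) else None

    result = {}
    for gene, values in table.items():
        if mode == "relative" and len(columns) == 2:
            num = cell(values, columns[0])
            den = cell(values, columns[1])
            # First column over second (assumption). A missing or zero
            # denominator gives an undefined ratio, recorded as None.
            result[gene] = num / den if num is not None and den else None
        else:
            # A single column is used as is, whether absolute or relative.
            result[gene] = cell(values, columns[0])
    return result
```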

Color Scales

The color scale used depends on the type and, by default, the range of the data. Thus, a particular color may correspond to one expression level for one dataset, and a different expression level for another dataset, depending on the range of values or the supplied maximum cutoff value for each dataset. We use the spectrum from yellow/green to red, with yellow representing the lowest expression levels or ratios in the dataset, blue representing values in the middle, and red representing the highest values. Reactions for which no data was provided are drawn in gray. The legend for mapping colors to expression data is shown in the key, which is drawn to the right of the overview for a single expression experiment, or to the left for an animation.

A maximum cutoff value is chosen. By default, this is computed from the data. Alternatively, the user may supply a maximum cutoff value to use. Supplying the same maximum cutoff value for multiple expression experiments ensures that the same color scale is used for each one, so that the displays are directly comparable.

The minimum cutoff value is determined based on the maximum cutoff value and the other parameters. For absolute expression levels, we use a minimum cutoff value of zero. For relative expression levels that are not logs, we use the inverse of the maximum cutoff. For relative expression levels that are logs, we use the negative of the maximum cutoff. The color spectrum is then mapped evenly along a log scale between the maximum cutoff and the minimum cutoff.
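The cutoff rules above can be written out as a short Python sketch. The function names are hypothetical. The position calculation covers relative data only and is one plausible reading of "mapped evenly along a log scale"; clamping values outside the cutoffs to the ends of the spectrum is an assumption.

```python
import math

def minimum_cutoff(max_cutoff, relative, is_log):
    """Derive the minimum cutoff from the maximum cutoff."""
    if not relative:
        return 0.0                    # absolute levels start at zero
    if is_log:
        return -max_cutoff            # log ratios are symmetric around 0
    return 1.0 / max_cutoff           # plain ratios are symmetric around 1

def spectrum_position(value, max_cutoff, is_log=False):
    """Fraction in [0, 1] along the color spectrum for a relative value.

    Plain ratios must be positive and max_cutoff must exceed 1.
    Log ratios are used directly, since they are already on a log scale.
    """
    lo, hi = minimum_cutoff(max_cutoff, True, is_log), max_cutoff
    if not is_log:
        value, lo, hi = math.log(value), math.log(lo), math.log(hi)
    # Values beyond the cutoffs are clamped to the ends (assumption).
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))
```

With a maximum cutoff of 4, for example, the minimum cutoff for plain ratios is 0.25, and a ratio of 1 (no change) falls in the middle of the spectrum.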

In many cases, several genes, each with its own expression level, will map to a single reaction. This happens when the reaction is catalyzed by an enzyme complex made up of several gene products, or by several isozymes, each with its own gene or genes. Since a reaction can be drawn in only one color, we must choose which expression level to use. For absolute expression levels, we choose the maximum. For relative expression levels, we choose the value whose log has the greatest deviation from zero, under the assumption that the user is primarily interested in identifying the genes whose expression levels differ most between the two datasets.
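The selection rule for a reaction with several genes can be sketched in Python as follows. The function reaction_value is hypothetical. Ignoring genes with missing data and requiring plain ratios to be positive are assumptions of the example.

```python
import math

def reaction_value(values, relative, is_log=False):
    """Choose the single value used to color a reaction."""
    present = [v for v in values if v is not None]
    if not present:
        return None                   # no data: the reaction stays gray
    if not relative:
        return max(present)           # absolute levels: take the maximum

    def deviation(v):
        # Distance of the log value from zero. Log ratios are used
        # directly; plain ratios (which must be positive) are logged first.
        return abs(v) if is_log else abs(math.log(v))

    return max(present, key=deviation)
```

For the ratios 2.0, 0.25 and 1.0, this returns 0.25: a fourfold decrease deviates more from "no change" than a twofold increase does.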

Expression Viewer Results

After you submit your expression dataset to Pathway Tools, the Overview Expression Viewer returns several results:
  1. The Overview Diagram, colorized with expression data.
  2. The color key for the Overview.
  3. For single expression experiments, some basic statistics computed from the data file. The program counts and lists gene names that could not be resolved, or for which data was missing or malformed. Since not all genes code for enzymes, and therefore not all correspond to reactions in the Metabolic Overview, we compile separate statistics for the genes that are represented in the Overview and for the dataset as a whole. The statistics that we compute and tabulate are: number of values, minimum, maximum and median values, and mean and standard deviation of the natural logs of the values. These statistics are not computed when generating animations.
  4. A histogram that shows the distribution of values in the dataset. This histogram is displayed directly beneath the color key. The data value range is divided into 50 intervals, using the same criteria that we use for assigning colors. The number of data values in each interval is shown on the histogram, colored appropriately. To the left of the vertical axis is the histogram for the genes that are represented in the overview. To the right of the axis is the histogram for all other genes.
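The statistics listed in item 3 can be reproduced for your own data with a few lines of Python. This is a sketch, not the viewer's code: the function summarize is hypothetical, the values are assumed to be positive so that their natural logs exist, and the sample standard deviation is used because the text does not say whether the sample or population form is meant.

```python
import math
import statistics

def summarize(values):
    """Summary statistics for a list of at least two positive values."""
    logs = [math.log(v) for v in values]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "log_mean": statistics.mean(logs),
        # Sample standard deviation (assumption).
        "log_stdev": statistics.stdev(logs),
    }
```

To mirror the two sets of statistics described above, call it once on the values for genes represented in the Overview and once on the whole dataset.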

Animation Controls

An expression time series can be displayed as an animation by specifying multiple data column numbers. The result is a Dynamic HTML page that initially plays the animation in a continuous loop, showing how the expression values and histogram change with each experiment. Four buttons control the animation. They can be used to stop and restart the animation, and to step through the individual time points.

Note that older browsers that do not support Dynamic HTML will not be able to run the animation.