Using the Pathway Tools Overview Expression Viewer
The Pathway Tools Metabolic Overview may be used to view expression
levels for all genes that code for metabolic enzymes. The color of
each reaction step in the Overview will reflect the expression level
of the gene that encodes the enzyme for that step. The range of
expression levels in a given expression dataset is mapped to a
spectrum of colors. This facility enables the user to see instantly
which pathways are turned on or off under some set of experimental
conditions.
In addition to showing absolute expression levels, we can compare
two sets of expression data by computing the ratio of the expression
levels and mapping the ratios onto a color spectrum. Several kinds of
comparison can be performed this way:
- Show how expression levels change over the course of an
experiment by comparing a late time point with an early time point.
- Compare expression levels between two sets of experimental growth
conditions, e.g. minimal vs. rich medium.
- Compare expression levels with and without some gene mutation.
- Compare expression levels between two E. coli strains.
The superposition of multiple sets of expression data on the metabolic
overview can also be animated to show, for example, how expression levels
of enzymes change with time over the course of an experiment.
At SGD
For information about the features of the Expression Viewer as it is
implemented at SGD, please read the SGD help page.
Examples
Expression data is imported from a file stored on the user's
computer. Each line of the file contains data for a single gene and
is of the form:
<gene-name-or-ID> <data-column1>...<data-columnN>
Columns are separated by the tab character. Lines that
start with # or ; are taken to be comment
lines and are ignored by the program.
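As an illustration of the file format described above, the following Python sketch reads tab-separated lines and skips comment lines. The function name parse_expression_lines and the sample data are hypothetical, and skipping blank lines is an assumption; this is not the code used by Pathway Tools.

```python
def parse_expression_lines(lines):
    """Return {gene: [column strings]} from tab-separated expression lines."""
    rows = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip comment lines starting with '#' or ';'.
        # Skipping blank lines as well is an assumption of this sketch.
        if not line.strip() or line.startswith(("#", ";")):
            continue
        gene, *columns = line.split("\t")
        rows[gene] = columns
    return rows


# Hypothetical sample data: a gene name followed by two data columns.
sample = [
    "# gene\tt0\tt1\n",
    "; another comment line\n",
    "geneA\t120.5\t240.0\n",
    "geneB\t15.0\t7.5\n",
]
parsed = parse_expression_lines(sample)
```

The column values are kept as strings here, since a real file may contain missing or malformed entries that need to be checked before conversion to numbers.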
The numbers in the data columns can represent either absolute or
relative expression levels. If the data values represent absolute
expression levels, you may choose to visualize either a single column of
absolute expression levels (select "Absolute" and one data column), or
the ratio of two data columns as relative expression levels (select
"Relative" and two data columns). If the data values themselves
represent relative expression levels, then you need supply only a
single column number, and select "Relative". An entry (a row of data
for a gene) may contain any number of data columns (for example, if
you wish to compile measurements from several experiments or time
points into a single file), but only those data columns specified will be
visualized at a time -- all other columns will be ignored.
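The choice between "Absolute" and "Relative" can be sketched in Python as follows. The function select_values is hypothetical, column numbers are assumed to be 1-based, and the ratio is assumed to be the first selected column divided by the second; the text above does not specify the direction of the ratio. Zero denominators are not handled in this sketch.

```python
def select_values(rows, mode, columns):
    """rows: {gene: [float, ...]}; columns: 1-based data column numbers."""
    if mode == "absolute" and len(columns) != 1:
        raise ValueError("absolute mode takes exactly one data column")
    result = {}
    for gene, data in rows.items():
        picked = [data[c - 1] for c in columns]
        if len(picked) == 1:
            # A single column is used as is (absolute levels, or values
            # that are already relative).
            result[gene] = picked[0]
        else:
            # Assumption: ratio = first selected column / second.
            numerator, denominator = picked
            result[gene] = numerator / denominator
    return result


# Hypothetical data: two columns of absolute expression levels.
example_rows = {"geneA": [120.0, 240.0], "geneB": [15.0, 7.5]}
absolute = select_values(example_rows, "absolute", [1])
relative = select_values(example_rows, "relative", [2, 1])
```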
The color scale used depends on the type and, by default, the range of
the data. Thus, a particular color may correspond to one expression
level for one dataset, and a different expression level for another
dataset, depending on the range of values or the supplied maximum
cutoff value for each dataset. We use the
spectrum from yellow/green to red, with yellow representing the lowest
expression levels or ratios in the dataset, blue representing values
in the middle, and red representing the highest values. Reactions for
which no data was provided are drawn in gray. The legend for mapping
colors to expression data is shown in the key, which is drawn to the
right of the overview for a single expression experiment, or to the
left for an animation.
The upper end of the color scale is set by a maximum cutoff value. By
default, this is computed from the data. Alternatively, the user may
supply a maximum cutoff
value to use. Supplying the same maximum cutoff value for multiple
expression experiments ensures that the same color scale is used for
each one, so that the displays are directly comparable.
The minimum cutoff value is determined based on the maximum cutoff
value and the other parameters. For absolute expression levels, we
use a minimum cutoff value of zero. For relative expression levels
that are not logs, we use the inverse of the maximum cutoff. For
relative expression levels that are logs, we use the negative of the
maximum cutoff. The color spectrum is then mapped evenly along a log
scale between the maximum cutoff and the minimum cutoff.
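The cutoff rules above can be sketched in Python. The names minimum_cutoff and ratio_position are hypothetical. The position function covers only relative values that are not logs, and clamping values outside the cutoffs to the ends of the spectrum is an assumption of this sketch rather than documented behavior.

```python
import math


def minimum_cutoff(max_cutoff, mode, is_log=False):
    """Derive the minimum cutoff from the maximum cutoff."""
    if mode == "absolute":
        return 0.0
    if is_log:
        return -max_cutoff   # relative values that are logs
    return 1.0 / max_cutoff  # relative values that are not logs


def ratio_position(ratio, max_cutoff):
    """Position of a non-log ratio along the spectrum, from 0.0 to 1.0."""
    low = math.log(minimum_cutoff(max_cutoff, "relative"))
    high = math.log(max_cutoff)
    fraction = (math.log(ratio) - low) / (high - low)
    # Assumption: values beyond the cutoffs take the end colors.
    return min(1.0, max(0.0, fraction))
```

With a maximum cutoff of 4, for example, the minimum cutoff for ratios is 0.25, and a ratio of 1 (no change) falls at the middle of the spectrum.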
In many cases, several genes, each with its own expression level,
will map to a single reaction. This is because the reaction might be
catalyzed by an enzyme complex made up of several gene products, or
the reaction might be catalyzed by several isozymes, each with its own
gene or genes. Since a reaction can only be colored a single color,
we must choose which expression level to use. For absolute expression
levels, we choose the maximum. For relative expression levels, we
choose the value whose log has the greatest deviation from zero, under
the assumption that the user is primarily interested in identifying
the genes whose expression levels differ most between the two
datasets.
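The selection rule for a reaction with several genes can be sketched in Python as below. The function reaction_value is hypothetical; it assumes that relative values that are not logs are positive ratios.

```python
import math


def reaction_value(values, mode, is_log=False):
    """Pick the single value used to color a reaction."""
    if mode == "absolute":
        # Absolute levels: use the maximum.
        return max(values)
    if is_log:
        # Values are already logs: largest deviation from zero.
        return max(values, key=abs)
    # Ratios: the value whose log deviates most from zero.
    return max(values, key=lambda v: abs(math.log(v)))
```

For the ratios 0.2, 1.5 and 3.0, this picks 0.2: a fivefold decrease is a larger change on a log scale than a threefold increase.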
After you submit your expression dataset to Pathway Tools, the Overview
Expression Viewer returns several results:
- The Overview Diagram, colorized with expression data.
- The color key for the Overview.
- For single expression experiments, some basic statistics computed
from the data file. The program counts and lists gene names that
could not be resolved, or for which data was missing or malformed.
Since not all genes will code for enzymes, and therefore not all will
correspond to reactions in the Metabolic Overview, we compile separate
statistics for only those that are represented in the Overview and for
the dataset as a whole. The statistics that we compute and tabulate
are: number of values, minimum, maximum and median values, and mean
and standard deviation of the natural logs of the values. These
statistics are not computed when generating animations.
- A histogram that shows the distribution of values in
the dataset. This histogram is displayed directly beneath the
color key. The data value range is divided into 50 intervals, using the same
criteria that we use for assigning colors. The number of data values
in each interval is shown on the histogram, colored appropriately. To
the left of the vertical axis is the histogram for the genes that are
represented in the overview. To the
right of the axis is the histogram for all other genes.
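The statistics listed above can be reproduced with the Python standard library, as in the sketch below. The function summarize is hypothetical. It assumes all values are positive, so that natural logs are defined, and it uses the sample standard deviation; the text does not say whether the sample or population form is used.

```python
import math
import statistics


def summarize(values):
    """Basic statistics for a list of positive expression values."""
    logs = [math.log(v) for v in values]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "log_mean": statistics.mean(logs),
        # Assumption: sample standard deviation (needs two or more values).
        "log_stdev": statistics.stdev(logs),
    }


# Hypothetical values whose natural logs are 0, 1 and 2.
stats = summarize([1.0, math.e, math.e ** 2])
```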
Animation Controls
An expression time series can be displayed as an animation by
specifying multiple data column numbers. The result will be a Dynamic
HTML page that initially plays the animation in a continuous loop,
showing how the expression values and histogram change with each
experiment. Four buttons control the animation. They can be used to
stop and restart the animation, and step through the
individual timepoints.
- Stop the animation at the current timepoint.
- Start playing the animation from the current timepoint.
- Go back one timepoint.
- Go forward one timepoint.
Note that older browsers that do not support Dynamic HTML will not be able
to run the animation.