Scores for drugs based on the genes that they significantly influence. To compute a score for each drug, the scores of all influenced genes based on “GTEx (all)” (X-axis) and “CMap” (Y-axis) are averaged, with weights based on the fold change of each interaction. The position of each drug in this plot therefore reflects how ubiquitous the genes it influences are.
Note: Hover over the markers to see drug names.
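The caption describes each drug's coordinates as weighted averages of gene scores. The sketch below shows that computation under some assumptions. Fold changes are given as log2 values, and each gene's weight is the absolute log2 fold change. The exact weighting Ubigen uses is not stated here. Genes without a score in a dataset are skipped. The function names are hypothetical.

```python
from typing import Mapping


def weighted_mean(scores: Mapping[str, float],
                  log2_fold_changes: Mapping[str, float]) -> float:
    """Average gene scores, weighting each gene by |log2 fold change| (assumed weighting)."""
    total, weight_sum = 0.0, 0.0
    for gene, lfc in log2_fold_changes.items():
        if gene not in scores:
            continue  # gene has no ubiquity score in this dataset
        weight = abs(lfc)
        total += weight * scores[gene]
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no scored genes with a non-zero fold change")
    return total / weight_sum


def drug_position(gtex_scores: Mapping[str, float],
                  cmap_scores: Mapping[str, float],
                  log2_fold_changes: Mapping[str, float]) -> tuple[float, float]:
    """Hypothetical (x, y) position of one drug: GTEx-based vs. CMap-based score."""
    return (weighted_mean(gtex_scores, log2_fold_changes),
            weighted_mean(cmap_scores, log2_fold_changes))
```

Because the weights use absolute values, genes the drug strongly up- or down-regulates pull the drug's position toward their own ubiquity scores.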
The notion of ubiquitous or housekeeping genes implies that important genes are enriched among them: within groups of genes that share certain biological associations, ubiquitous genes should be overrepresented. We use Gene Ontology (GO) terms as well as some other gene set sources to represent this concept. The following plot shows the number of associated terms for consecutive, non-overlapping windows of 500 genes along the ubiquity ranking obtained with our default parameters. The terms were obtained through a gene set enrichment analysis with the tool g:Profiler. The most ubiquitous genes are associated with many more terms than any other window of genes. Genes of average ubiquity have almost no associations with GO terms, and the number of associations rises again for the least ubiquitous genes.
Note: Click on the legend items to toggle single sources. A double-click will isolate a single source of interest.
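Splitting the ranking into non-overlapping windows and counting the terms found for each window can be sketched as follows. The enrichment step is a stand-in: `enrich` is a hypothetical callback that returns the set of significant terms for a list of genes. In the analysis above, that role is played by g:Profiler. Whether a shorter final window is kept or dropped is an assumption. This sketch keeps it.

```python
from typing import Callable, Sequence


def windows(ranked_genes: Sequence[str], size: int = 500) -> list[list[str]]:
    """Split a ranked gene list into consecutive, non-overlapping windows."""
    return [list(ranked_genes[i:i + size]) for i in range(0, len(ranked_genes), size)]


def terms_per_window(ranked_genes: Sequence[str],
                     enrich: Callable[[list[str]], set[str]],
                     size: int = 500) -> list[int]:
    """Count the enriched terms reported for each window (enrich is hypothetical)."""
    return [len(enrich(window)) for window in windows(ranked_genes, size)]
```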
Ubigen is a tool for analyzing genes for their ubiquity within expression
datasets. We provide results for common sources of expression data,
including GTEx and the
Human Protein Atlas, within this interactive
web interface as well as via an HTTP API. The freely available
ubigen R package also supports analyses
of custom datasets, either from the command line or interactively.
You can easily analyze your genes of interest for their ubiquity. In the
initial state of the application, you can select genes by their HGNC symbols
from a dropdown menu. To paste your genes as a whitespace-separated list of
either HGNC symbols or Ensembl gene IDs (in the form ENSG00000164362), change
the top dropdown from “Select from list” to the desired input method. It is also
possible to select sample data (genes involved in glycolysis according to the
KEGG pathways database [KEGG:hsa00010+M00001]).
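The pasted input is a whitespace-separated list. The sketch below shows one way to tokenize such a list and recognize Ensembl human gene IDs of the form `ENSG` followed by 11 digits. The application itself asks you to pick the input type from the dropdown. This parser is a hypothetical illustration, not Ubigen's code. It also strips an optional version suffix such as `.15`, which is an assumption about input you might paste.

```python
import re

# "ENSG" + 11 digits, optionally followed by a version suffix like ".15".
ENSEMBL_GENE = re.compile(r"^(ENSG\d{11})(?:\.\d+)?$")


def parse_gene_input(text: str) -> tuple[list[str], list[str]]:
    """Split pasted text into (Ensembl gene IDs, remaining tokens treated as HGNC symbols)."""
    ensembl_ids, symbols = [], []
    for token in text.split():  # splits on any whitespace: spaces, tabs, newlines
        match = ENSEMBL_GENE.match(token)
        if match:
            ensembl_ids.append(match.group(1))  # keep the ID without its version
        else:
            symbols.append(token)  # symbols are kept as typed, e.g. "C1orf112"
    return ensembl_ids, symbols
```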
Once your genes have been selected, the user interface will automatically switch to the “Your genes” tab. The following information will be available online and for download (to download the visualizations, click on the camera button in their top right corner):
The panel on the left offers controls for the parameters of the method: the expression dataset, the options for each criterion used to compute the ubiquity score, and the weight with which each criterion contributes to that score. The criteria are:
| Criterion | Description | Default weight |
|---|---|---|
| Fraction of samples with high expression | Fraction of samples in which the gene's expression exceeds a threshold. The threshold is computed separately for each sample and corresponds to the 95th percentile of expression within that sample. Alternatively, the median (i.e., the 50th percentile) or zero can be selected as the threshold. | 50% |
| Expression level | Median (logarithmic) expression level of the gene across all samples. Alternatively, the mean expression can be selected. | 25% |
| Expression variation | Variation of the gene's expression between samples. By default, the interquartile range (IQR) normalized by the median is used. Other options are the IQR itself, the standard deviation, and the coefficient of variation. The negative weight means that higher variation lowers the ubiquity score. | -25% |
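The sketch below shows how the three default criteria could be computed from a log-scale expression matrix and combined with the default weights. Several details are assumptions rather than Ubigen's documented behavior. Each criterion is min-max rescaled to [0, 1] before weighting. Genes with a median of zero get a variation of 0. The final score is not rescaled again. The function names are hypothetical. The ubigen R package is the reference implementation.

```python
import numpy as np


def minmax(x: np.ndarray) -> np.ndarray:
    """Rescale a vector to [0, 1]; a constant vector maps to zeros (assumed normalization)."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x, dtype=float)


def ubiquity_scores(expr: np.ndarray,
                    w_high: float = 0.5,
                    w_level: float = 0.25,
                    w_var: float = -0.25) -> np.ndarray:
    """expr: log-scale expression with shape (n_samples, n_genes)."""
    # Per-sample threshold: the 95th percentile of that sample's expression values.
    thresholds = np.percentile(expr, 95, axis=1)
    # Fraction of samples in which each gene exceeds its sample's threshold.
    frac_high = (expr > thresholds[:, None]).mean(axis=0)
    # Median expression level of each gene across samples.
    level = np.median(expr, axis=0)
    # IQR normalized by the median; genes with a zero median get 0 (assumption).
    q25, q75 = np.percentile(expr, [25, 75], axis=0)
    variation = np.divide(q75 - q25, level,
                          out=np.zeros_like(level, dtype=float), where=level != 0)
    # Weighted sum of the rescaled criteria; the negative weight penalizes variation.
    return (w_high * minmax(frac_high)
            + w_level * minmax(level)
            + w_var * minmax(variation))
```

With these weights, a gene that is highly and stably expressed everywhere can reach 0.75. A strongly varying gene can score below zero.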
The “GSEA” tab provides a simple gene set enrichment analysis using g:Profiler. It shows how significantly annotated genes are over-represented in the top part of the ranking, which indicates the biological, or at least scientific, relevance of the scoring.
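Over-representation is commonly assessed with a hypergeometric test, and g:Profiler's documentation describes using one. The sketch below gives the upper-tail probability for a single term: drawing at least `k` annotated genes when `n` genes (for example, the top of the ranking) are taken from a background of `N` genes, `K` of which carry the annotation. It omits the multiple-testing correction that g:Profiler applies across terms. The function name is hypothetical.

```python
from math import comb


def overrepresentation_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    upper = min(n, K)
    if k > upper:
        return 0.0  # more annotated genes than possible in the draw
    # comb(a, b) returns 0 when b > a, so impossible configurations contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)
```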
Because of the computational requirements, we cannot provide this
service publicly. We are open to suggestions for including other general and
publicly available expression datasets in the public service.
You can always run
the algorithms, as well as the interactive web interface, on your own hardware
using the ubigen R package.