Scores for drugs based on the genes that are significantly influenced by them. To compute a score for each drug, the scores of all influenced genes based on “GTEx (all)” (X-axis) and “CMap” (Y-axis) are averaged with weights based on the fold change of the interactions. The position of each drug in this plot is therefore a result of how ubiquitous the genes that it influences are.
Note: Hover over the markers to see drug names.
The notion of ubiquitous genes, or housekeeping genes, implies some kind of enrichment of important genes: within groups of genes with certain biological associations, these genes should be overrepresented. We use GeneOntology terms as well as some other gene set sources to represent this concept. The following plot shows the number of associated terms for consecutive, non-overlapping windows (buckets) of 500 genes along the ranking of ubiquity using our default parameters. The terms were obtained from a gene set enrichment analysis with the tool g:Profiler. We observe that the most ubiquitous genes have many more known biological implications than any other bucket of genes. The genes of average ubiquity have almost no associations with GeneOntology terms. The number of associations rises again for the least ubiquitous genes.
Note: Click on the legend items to toggle single sources. A double-click will isolate a single source of interest.
Ubigen is a tool for analyzing genes for their ubiquity within expression
datasets. We provide results from common data sources for expression data
including GTEx and the Human Protein Atlas within this interactive web
interface as well as via an HTTP API. The freely available `ubigen` R package
also supports analyses within custom datasets from the command line or
interactively.
You can easily analyze your genes of interest for their ubiquity. In the
initial state of the application, you can select genes by their HGNC symbols
from a dropdown menu. To paste your genes as a whitespace-separated list of
either HGNC symbols or Ensembl gene IDs (in the form `ENSG00000164362`), change
the top dropdown from “Select from list” to the desired input method. It is also
possible to select sample data (genes involved in glycolysis according to the
KEGG pathways database [`KEGG:hsa00010+M00001`]).
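
Before pasting a mixed list, it can help to check which tokens look like Ensembl gene IDs and which do not. The following is a minimal sketch, not part of Ubigen: the function name `split_gene_input` is hypothetical, and the pattern assumes that Ensembl gene IDs always have the form `ENSG` followed by 11 digits, as in the example above.

```python
import re

# Assumption: Ensembl gene IDs look like "ENSG" + 11 digits
# (e.g. ENSG00000164362). Everything else is treated as a symbol.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")


def split_gene_input(text):
    """Split whitespace-separated input into (Ensembl IDs, other tokens)."""
    ensembl_ids, symbols = [], []
    for token in text.split():
        if ENSEMBL_GENE_ID.fullmatch(token):
            ensembl_ids.append(token)
        else:
            symbols.append(token)
    return ensembl_ids, symbols


if __name__ == "__main__":
    ids, symbols = split_gene_input("ENSG00000164362 TERT\nGAPDH")
    print("Ensembl IDs:", ids)
    print("Other tokens (possibly HGNC symbols):", symbols)
```

The sketch does not validate HGNC symbols; it only separates tokens so that each group can be pasted using the matching input method.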
Once your genes have been selected, the user interface will automatically switch to the “Your genes” tab. The following information will be available online and for download (to download the visualizations, click on the camera button in their top right corner):
The panel on the left side offers multiple controls to change the parameters of the method. These include the selection of the expression dataset and the choices for each of the different criteria used to compute the ubiquity score as well as their weight contributing to that score. These are the criteria:
Criterion | Description | Default Weighting |
---|---|---|
Fraction of samples with high expression | Fraction of samples that express the gene based on a specified threshold. The threshold is computed separately for each sample and corresponds to the 95th percentile of expression within that sample. It is possible to select the median (i.e., the 50th percentile) or zero as reference values, too. | 50% |
Expression level | The median expression level (logarithmic) is determined for each gene across all samples. Alternatively, it is possible to select the mean expression. | 25% |
Expression variation | Measure of the variation of a gene’s expression between samples. By default, the interquartile range (IQR) normalized by the median is used. Other options include the IQR itself, the standard deviation and the coefficient of variation. | -25% |
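
To make the criteria more concrete, here is a minimal sketch of the first criterion and of a weighted combination. It is an illustration under stated assumptions, not the implementation used by Ubigen: the helper names are hypothetical, the percentile uses a simple nearest-rank rule, "high expression" is taken to mean strictly above the per-sample threshold, and the final score is assumed to be a plain weighted sum of criteria that have already been brought to a comparable scale.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def fraction_above_threshold(expression, q=95):
    """Fraction of samples in which each gene exceeds the sample's threshold.

    `expression` maps gene -> list of per-sample values (same sample order
    for every gene). The threshold is computed separately for each sample
    as the q-th percentile over all genes in that sample.
    """
    genes = list(expression)
    n_samples = len(expression[genes[0]])
    thresholds = [
        percentile([expression[gene][s] for gene in genes], q)
        for s in range(n_samples)
    ]
    return {
        gene: sum(expression[gene][s] > thresholds[s] for s in range(n_samples))
        / n_samples
        for gene in genes
    }


def combined_score(cross_sample, level, variation, weights=(0.5, 0.25, -0.25)):
    """Weighted sum of the three criteria (assumed form of the score).

    The default weights mirror the default weighting in the table above;
    the negative weight penalizes genes with high expression variation.
    """
    w_cross, w_level, w_variation = weights
    return w_cross * cross_sample + w_level * level + w_variation * variation


if __name__ == "__main__":
    toy = {"A": [10, 10, 10], "B": [5, 20, 5], "C": [1, 1, 1]}
    # The median (q=50) is used here because the toy data has only three genes.
    print(fraction_above_threshold(toy, q=50))
    print(combined_score(cross_sample=1.0, level=0.5, variation=0.2))
```

With only a handful of genes, no gene can exceed the 95th percentile under the nearest-rank rule, which is why the toy example uses the median as reference value.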
The “GSEA” tab provides a simple gene set enrichment analysis using g:Profiler. It shows the significance of the over-representation of annotated genes in the top part of the ranking, which indicates the biological, or at least scientific, relevance of the scoring.
Because of the computational requirements, we cannot publicly provide this
service. We are open to suggestions for inclusion of other general and publicly
available expression datasets into the public service on request. It is always
possible to run the algorithms and also the interactive web interface on your
own hardware using the `ubigen` R package.
Ubigen provides programmatic access via an HTTP API. Please see the documentation below, which includes usage examples based on the widely available command-line tool cURL. There is also an example Python script that shows how to use the API in your own applications.
You can use the API endpoint `/ranking` to download the dataset in CSV format,
either with the default parameters or with any combination of parameters. The
following optional query parameters are supported. They work for all queries
to the API.
Parameter | Value | Meaning |
---|---|---|
`dataset` | `gtex_all` (default), `gtex_tissues` or `hpa_tissues` | Which expression dataset to use as the base for all analyses. |
`cross_sample_metric` | `above_95` (default), `above_zero` or `above_median` | How to determine which samples to include for a gene's sample proportion. |
`level_metric` | `median_expression_normalized` (default), `mean_expression_normalized`, `median_expression` or `mean_expression` | Metric for assessing the overall expression level. |
`variation_metric` | `iqr_expression_normalized` (default), `sd_expression_normalized`, `iqr_expression` or `sd_expression` | Metric for assessing the expression variation between samples. |
`cross_sample_weight` | numeric (default: `0.5`) | Weight of the cross-sample metric for the final score. |
`level_weight` | numeric (default: `0.25`) | Weight of the expression level metric for the final score. |
`variation_weight` | numeric (default: `-0.25`) | Weight of the expression variation metric for the final score. |
Example using cURL:
# Download the ranking based on default parameters.
curl "https://ubigen.uni-rostock.de/api/ranking" > ubigen_default.csv
# Use an alternative expression dataset.
curl "https://ubigen.uni-rostock.de/api/ranking?dataset=hpa_tissues" > ubigen_hpa.csv
# Ignore expression variation and focus on the mean expression level.
curl "https://ubigen.uni-rostock.de/api/ranking?level_metric=mean_expression_normalized&variation_weight=0" > ubigen_custom.csv
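
The same requests can be assembled from Python with the standard library. This is a minimal sketch rather than an official client: the helper names `ranking_url` and `download_ranking` are hypothetical, and only the base URL, the `/ranking` endpoint and the query parameters are taken from the documentation above. Using `urlencode` avoids hand-written separators between multiple parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as used in the cURL examples above.
BASE_URL = "https://ubigen.uni-rostock.de/api"


def ranking_url(**params):
    """Build the URL for the /ranking endpoint from optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}/ranking" + (f"?{query}" if query else "")


def download_ranking(path, **params):
    """Download the ranking as CSV and write it to `path` (needs network access)."""
    with urlopen(ranking_url(**params)) as response, open(path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # Only prints the URL; call download_ranking(...) to fetch the data.
    print(ranking_url(level_metric="mean_expression_normalized", variation_weight=0))
```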
Custom genes can be submitted to the API using a `POST` request. Include your
genes of interest as a whitespace-separated list in the request body. Do not
forget to add the correct `Content-Type: text/plain` header (see examples)!
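
The POST request described above can be sketched in Python using only the standard library. The function name `build_gene_request` is hypothetical and the sketch is not an official client; it only prepares the request object, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.request


def build_gene_request(genes, endpoint="ranking",
                       base_url="https://ubigen.uni-rostock.de/api"):
    """Prepare a POST request with a whitespace-separated gene list as body."""
    body = " ".join(genes).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{endpoint}",
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_gene_request(["ENSG00000168907", "ENSG00000182872"])
    print(request.get_method(), request.full_url)
    # To actually send it (needs network access):
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Passing a different value for `endpoint` targets another endpoint with the same request body and header.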
To download the ranking data for a user-defined gene set, use a `POST` request
to the `/ranking` endpoint. This supports all query parameters to customize the
ranking as defined above.
Example using cURL:
# Download the data for five random genes using default parameters.
curl -X POST \
-H "Content-Type: text/plain" \
-d "ENSG00000168907 ENSG00000182872 ENSG00000188763 ENSG00000196531 ENSG00000161638" \
"https://ubigen.uni-rostock.de/api/ranking" > results.csv
If you are interested in a summary of the properties of your custom gene set in
relation to other genes within the ranking, use a `POST` request to the
`/summary` endpoint. This supports all query parameters to customize the ranking
as defined above.
The request returns a JSON map with the following values:
Value | Meaning |
---|---|
`median_percentile` | Median percentile of the selected genes within the ranking. |
`median_score` | Median score of the selected genes. |
`median_score_reference` | Median score of all other genes. |
`p_value` | p-value for the alternative hypothesis that the selected genes have different scores than the other genes (Wilcoxon rank sum test). |
`change` | Estimate of the effect size (Wilcoxon rank sum test). |
`conf_int_lower` | Lower bound of the estimated 95% confidence interval (Wilcoxon rank sum test). |
`conf_int_upper` | Upper bound of the estimated 95% confidence interval (Wilcoxon rank sum test). |
Example using cURL:
# Summarize the results for five random genes using default parameters.
curl -X POST \
-H "Content-Type: text/plain" \
-d "ENSG00000168907 ENSG00000182872 ENSG00000188763 ENSG00000196531 ENSG00000161638" \
"https://ubigen.uni-rostock.de/api/summary"
The above command generates a JSON map like this:
{
"median_percentile": 0.8859,
"median_score": 0.2567,
"median_score_reference": 0.1934,
"p_value": 0.0013,
"change": 0.0875,
"conf_int_lower": 0.0456,
"conf_int_upper": 0.4808
}
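
A response like the one above can be read with Python's `json` module. The following is a minimal sketch; the function name `describe_summary` and the significance level of 0.05 are assumptions made for illustration and are not prescribed by the API.

```python
import json


def describe_summary(summary_json, alpha=0.05):
    """Return (significant, direction) for a /summary response.

    `alpha` is an assumed significance level, not something the API defines.
    """
    summary = json.loads(summary_json)
    significant = summary["p_value"] < alpha
    if summary["median_score"] > summary["median_score_reference"]:
        direction = "higher"
    else:
        direction = "lower or equal"
    return significant, direction


if __name__ == "__main__":
    example = """{
        "median_percentile": 0.8859,
        "median_score": 0.2567,
        "median_score_reference": 0.1934,
        "p_value": 0.0013,
        "change": 0.0875,
        "conf_int_lower": 0.0456,
        "conf_int_upper": 0.4808
    }"""
    significant, direction = describe_summary(example)
    print(f"Selected genes score {direction} than the reference; "
          f"significant at alpha=0.05: {significant}")
```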