Ubigen

Hover over the markers to see details on each gene. Click or drag within the figure to select genes of interest. Double-click removes the selection.

Click on a gene name to view the gene on the GTEx website. There, you can see its tissue-specific expression behavior, derived from the same samples that this analysis is based on.

Hover over the markers to see the HGNC symbols for the genes. Click or drag within the figure to select genes of interest. Double-click removes the selection.

Drug effects

Scores for drugs based on the genes that are significantly influenced by them. To compute a score for each drug, the scores of all influenced genes based on “GTEx (all)” (X-axis) and “CMap” (Y-axis) are averaged with weights based on the fold change of the interactions. The position of each drug in this plot is therefore a result of how ubiquitous the genes that it influences are.
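
The averaging described above can be sketched as a weighted mean. This is only an illustration, not the tool's actual implementation: the function name `drug_score`, the input layout, and the use of the absolute fold change as the weight are assumptions made for this example. The same computation would be applied once per axis.

```python
def drug_score(interactions):
    """Weighted mean of gene scores for one drug.

    `interactions` is a list of (gene_score, fold_change) pairs, one per
    gene that the drug significantly influences. Using the absolute fold
    change as the weight is an assumption made for this sketch.
    """
    weights = [abs(fold_change) for _, fold_change in interactions]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one interaction needs a non-zero fold change")
    weighted_sum = sum(score * w for (score, _), w in zip(interactions, weights))
    return weighted_sum / total_weight


# Hypothetical example: three influenced genes with made-up scores.
example = [(0.8, 2.0), (0.2, -1.0), (0.5, 1.0)]
print(drug_score(example))
```

A drug that mostly influences highly ubiquitous genes ends up with a high score, and strongly affected genes pull the position more than weakly affected ones.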

Note: Hover over the markers to see drug names.

Number of interesting genes along the ranking

The notion of ubiquitous genes, or housekeeping genes, implies some kind of enrichment of important genes: within groups of genes with certain biological associations, these genes should be overrepresented. We use Gene Ontology terms as well as some other gene set sources to represent this concept. The following plot shows the number of associated terms for consecutive, non-overlapping windows (buckets) of 500 genes along the ranking of ubiquity using our default parameters. The terms were obtained from a gene set enrichment analysis with the tool g:Profiler. We observe that the most ubiquitous genes have many more known biological associations than any other bucket of genes. The genes of average ubiquity have almost no associations with Gene Ontology terms. The number of associations rises again for the least ubiquitous genes.
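
The bucketing described above can be sketched as follows. The function name `terms_per_bucket` and the `count_terms` callable are hypothetical; in a real analysis, `count_terms` would query an enrichment tool for each bucket, which is not reproduced here.

```python
def terms_per_bucket(ranked_genes, count_terms, size=500):
    """Count enriched terms for consecutive, non-overlapping buckets.

    `ranked_genes` is ordered from most to least ubiquitous. `count_terms`
    is a hypothetical callable that takes a list of genes and returns the
    number of enriched terms for that list.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    counts = []
    for start in range(0, len(ranked_genes), size):
        bucket = ranked_genes[start:start + size]
        # Report the first and last rank (1-based) covered by this bucket.
        counts.append((start + 1, start + len(bucket), count_terms(bucket)))
    return counts


# Toy run with a stand-in counter that simply returns the bucket size.
genes = [f"GENE{i}" for i in range(1, 1201)]
for first_rank, last_rank, n_terms in terms_per_bucket(genes, len):
    print(first_rank, last_rank, n_terms)
```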

Note: Click on the legend items to toggle single sources. A double-click will isolate a single source of interest.

What is Ubigen?

Ubigen is a tool for analyzing genes for their ubiquity within expression datasets. We provide results from common data sources for expression data including GTEx and the Human Protein Atlas within this interactive web interface as well as via an HTTP API. The freely available ubigen R package also supports analyses within custom datasets from the command line or interactively.

How can I analyze my genes of interest?

You can easily analyze your genes of interest for their ubiquity. In the initial state of the application, you can select genes by their HGNC symbols from a dropdown menu. To paste your genes as a whitespace-separated list of either HGNC symbols or Ensembl gene IDs (in the form ENSG00000164362), change the top dropdown from “Select from list” to the desired input method. It is also possible to select sample data (genes involved in glycolysis according to the KEGG pathways database [KEGG:hsa00010+M00001]).
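
If you prepare such a list programmatically, a small helper can check its shape before pasting. The sketch below is not part of Ubigen; the function name `parse_gene_list` is hypothetical, and treating every token that is not an Ensembl gene ID as a candidate HGNC symbol is an assumption.

```python
import re

# Ensembl gene IDs in the form shown above: "ENSG" followed by 11 digits.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")


def parse_gene_list(text):
    """Split pasted text on whitespace and separate Ensembl IDs from symbols."""
    ensembl_ids, symbols = [], []
    for token in text.split():
        if ENSEMBL_GENE_ID.fullmatch(token):
            ensembl_ids.append(token)
        else:
            # Anything else is treated as a candidate HGNC symbol (assumption).
            symbols.append(token)
    return ensembl_ids, symbols


print(parse_gene_list("ENSG00000164362 GAPDH\nPGK1"))
```

Since the input field expects one identifier type at a time, a mixed result like the one printed here suggests splitting the list before submitting it.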

Screenshot of the gene selector

Which information does the tool give for my genes of interest?

Once your genes have been selected, the user interface will automatically switch to the “Your genes” tab. The following information will be available online and for download (to download the visualizations, click on the camera button in their top right corner):

The overview plot showing the overall distribution of ubiquity with your custom genes highlighted.
A textual summary comparing your selected genes with the overall ranking. This includes the p-value and confidence interval resulting from a Wilcoxon rank sum test with the alternative hypothesis that your genes have different scores than other genes. Please also note the given effect size.
A boxplot comparing the scores of your genes with those of all other genes.
A detailed table (downloadable as CSV) including scores, ranks and percentiles as well as the computed parameters for your genes.
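
To make the test in the textual summary more concrete, here is a minimal sketch of a two-sided Wilcoxon rank sum test using the normal approximation. It is an illustration under simplifying assumptions (average ranks for ties, no tie or continuity correction) and does not compute the confidence interval or effect size, so its output will generally not match the values reported by Ubigen exactly. The function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(selected, reference):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (U, p_value), where U is the statistic for `selected`.
    """
    n1, n2 = len(selected), len(reference)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one score")
    # Pool the scores and remember which group each one came from.
    pooled = sorted(
        [(s, 0) for s in selected] + [(s, 1) for s in reference],
        key=lambda pair: pair[0],
    )
    rank_sum = 0.0  # sum of the ranks of the selected genes
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        n_selected = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum += average_rank * n_selected
        i = j
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u, p_value


# Made-up scores: the selected genes all score higher than the reference.
print(rank_sum_test([0.9, 0.8, 0.7], [0.1, 0.2, 0.3, 0.4]))
```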

Screenshot of information on custom genes

How can I change the parameters of the method?

The panel on the left side offers multiple controls to change the parameters of the method. These include the selection of the expression dataset and the choices for each of the different criteria used to compute the ubiquity score as well as their weight contributing to that score. These are the criteria:

Criterion	Description	Default Weighting
Fraction of samples with high expression	Fraction of samples that express the gene based on a specified threshold. The threshold is computed separately for each sample and corresponds to the 95th percentile of expression within that sample. It is possible to select the median (i.e., the 50th percentile) or zero as reference values, too.	50%
Expression level	The median expression level (logarithmic) is determined for each gene across all samples. Alternatively, it is possible to select the mean expression.	25%
Expression variation	Measure of the variation of a gene’s expression between samples. By default, the interquartile range (IQR) normalized by the median is used. Other options include the IQR itself, the standard deviation and the coefficient of variation.	-25%
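
How the criteria and weights could combine is sketched below. The table only states the criteria and their default weights; that the final score is a plain weighted sum and that each criterion is scaled to the range [0, 1] beforehand are assumptions of this example, and both function names are hypothetical.

```python
def fraction_above_threshold(gene_expression, sample_thresholds):
    """Fraction of samples in which the gene exceeds that sample's threshold.

    `gene_expression[i]` is the gene's expression in sample i and
    `sample_thresholds[i]` is the threshold computed for sample i
    (for example the 95th percentile of expression within that sample).
    """
    if not gene_expression or len(gene_expression) != len(sample_thresholds):
        raise ValueError("need one threshold per sample and at least one sample")
    hits = sum(
        1 for value, limit in zip(gene_expression, sample_thresholds)
        if value > limit
    )
    return hits / len(gene_expression)


def ubiquity_score(cross_sample, level, variation,
                   cross_sample_weight=0.5, level_weight=0.25,
                   variation_weight=-0.25):
    """Weighted sum of the three criteria (assumed to be scaled to [0, 1])."""
    return (cross_sample_weight * cross_sample
            + level_weight * level
            + variation_weight * variation)


# Made-up values for one gene across four samples.
fraction = fraction_above_threshold([5.0, 1.0, 7.0, 3.0], [4.0, 4.0, 4.0, 4.0])
print(fraction, ubiquity_score(fraction, level=0.8, variation=0.2))
```

The negative default weight for expression variation means that genes whose expression fluctuates strongly between samples are penalized.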

Screenshot of the method parameter controls

What does the GSEA tab do?

The “GSEA” tab provides a simple gene set enrichment analysis using g:Profiler. It shows the significance of the over-representation of annotated genes in the top part of the ranking, which indicates the biological, or at least scientific, relevance of the scoring.

Screenshot of the GSEA plot

How can I perform analyses of ubiquity relative to my own expression dataset?

Because of the computational requirements, we cannot provide this as a public service. On request, we are open to suggestions for including other general, publicly available expression datasets in the public service. It is always possible to run the algorithms, and also the interactive web interface, on your own hardware using the ubigen R package.

API access

Ubigen provides programmatic access via an HTTP API. Please see the documentation below, which includes usage examples based on the widely available command-line tool cURL. There is also an example Python script that shows how to use the API in your own applications.

Using the API for retrieving the data

You can use the API endpoint /ranking to download the dataset in CSV format, either with the default parameters or with any combination of custom parameters. The following optional query parameters are supported; they work for all queries to the API.

Parameter	Value	Meaning
`dataset`	`gtex_all` (default), `gtex_tissues` or `hpa_tissues`	Which expression dataset to use as the base for all analyses.
`cross_sample_metric`	`above_95` (default), `above_zero` or `above_median`	How to determine which samples count toward a gene's sample proportion.
`level_metric`	`median_expression_normalized` (default), `mean_expression_normalized`, `median_expression` or `mean_expression`	Metric for assessing the overall expression level.
`variation_metric`	`iqr_expression_normalized` (default), `sd_expression_normalized`, `iqr_expression` or `sd_expression`	Metric for assessing the expression variation between samples.
`cross_sample_weight`	numeric (default: `0.5`)	Weight of the cross-sample metric for the final score.
`level_weight`	numeric (default: `0.25`)	Weight of the expression level metric for the final score.
`variation_weight`	numeric (default: `-0.25`)	Weight of the expression variation metric for the final score.

Example using cURL:

# Download the ranking based on default parameters.
curl "https://ubigen.uni-rostock.de/api/ranking" > ubigen_default.csv

# Use an alternative expression dataset.
curl "https://ubigen.uni-rostock.de/api/ranking?dataset=hpa_tissues" > ubigen_hpa.csv

# Ignore expression variation and focus on the mean expression level.
curl "https://ubigen.uni-rostock.de/api/ranking?level_metric=mean_expression_normalized&variation_weight=0" > ubigen_custom.csv
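
When several query parameters are combined, the first one follows `?` and all further ones are joined with `&`. In Python, the standard library can assemble such a URL. The helper below is a sketch, not an official client: the function name `api_url` and the validation logic are assumptions, while the parameter names and allowed values are copied from the table above.

```python
from urllib.parse import urlencode

BASE_URL = "https://ubigen.uni-rostock.de/api"

# Allowed values for the categorical parameters, as listed in the table.
ALLOWED = {
    "dataset": {"gtex_all", "gtex_tissues", "hpa_tissues"},
    "cross_sample_metric": {"above_95", "above_zero", "above_median"},
    "level_metric": {
        "median_expression_normalized", "mean_expression_normalized",
        "median_expression", "mean_expression",
    },
    "variation_metric": {
        "iqr_expression_normalized", "sd_expression_normalized",
        "iqr_expression", "sd_expression",
    },
}
NUMERIC = {"cross_sample_weight", "level_weight", "variation_weight"}


def api_url(endpoint="ranking", **params):
    """Build a URL for the given endpoint, checking parameters locally."""
    for name, value in params.items():
        if name in ALLOWED:
            if value not in ALLOWED[name]:
                raise ValueError(f"invalid value for {name}: {value!r}")
        elif name in NUMERIC:
            float(value)  # raises ValueError for non-numeric strings
        else:
            raise ValueError(f"unknown parameter: {name}")
    query = urlencode(params)
    return f"{BASE_URL}/{endpoint}" + (f"?{query}" if query else "")


print(api_url(level_metric="mean_expression_normalized", variation_weight=0))
```

The resulting URL can be passed to `urllib.request.urlretrieve` together with a file name to save the CSV locally.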

Using the API to analyze your genes

Custom genes can be submitted to the API using a POST request. Include your genes of interest as a whitespace-separated list in the request body.

Do not forget to add the correct Content-Type: text/plain header (see examples)!

Limit ranking to selected genes

To download the ranking data for a user-defined gene set, send a POST request to the /ranking endpoint. This supports all query parameters for customizing the ranking as defined above.
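
In Python, such a request could be prepared with the standard library as sketched here. The function names `gene_request` and `send` are hypothetical; only the URL, the HTTP method, the header and the body format are taken from this documentation. Nothing is sent until `send` is called.

```python
from urllib.request import Request, urlopen


def gene_request(genes, endpoint="ranking"):
    """Prepare a POST request with the genes as a whitespace-separated body."""
    body = " ".join(genes).encode("utf-8")
    return Request(
        f"https://ubigen.uni-rostock.de/api/{endpoint}",
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def send(request):
    """Send the prepared request and return the response body as text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")


request = gene_request(["ENSG00000168907", "ENSG00000182872"])
print(request.get_method(), request.full_url)
# To actually query the service: csv_text = send(request)
```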

Example using cURL:

# Download the data for five random genes using default parameters.
curl -X POST \
     -H "Content-Type: text/plain" \
     -d "ENSG00000168907 ENSG00000182872 ENSG00000188763 ENSG00000196531 ENSG00000161638" \
     "https://ubigen.uni-rostock.de/api/ranking" > results.csv

Summarize information on your genes

If you are interested in a summary of the properties of your custom gene set in relation to other genes within the ranking, use a POST request to the /summary endpoint. This supports all query parameters to customize the ranking as defined above.

The request returns a JSON map with the following values:

Value	Meaning
`median_percentile`	Median percentile of the selected genes within the ranking.
`median_score`	Median score of the selected genes.
`median_score_reference`	Median score of all other genes.
`p_value`	p-value for the alternative hypothesis that the selected genes have different scores than the other genes (Wilcoxon rank sum test).
`change`	Estimate of the effect size (Wilcoxon rank sum test).
`conf_int_lower`	Lower bound of the estimated 95% confidence interval (Wilcoxon rank sum test).
`conf_int_upper`	Upper bound of the estimated 95% confidence interval (Wilcoxon rank sum test).

Example using cURL:

# Summarize the results for five random genes using default parameters.
curl -X POST \
     -H "Content-Type: text/plain" \
     -d "ENSG00000168907 ENSG00000182872 ENSG00000188763 ENSG00000196531 ENSG00000161638" \
     "https://ubigen.uni-rostock.de/api/summary"

The above command generates a JSON map like this:

{
    "median_percentile": 0.8859,
    "median_score": 0.2567,
    "median_score_reference": 0.1934,
    "p_value": 0.0013,
    "change": 0.0875,
    "conf_int_lower": 0.0456,
    "conf_int_upper": 0.4808
}
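
A response like this can be read with any JSON parser. The following sketch interprets the example values above; the function name `interpret_summary` and the significance level of 0.05 are assumptions of this example, not something the API prescribes.

```python
import json

# The example response from above, as returned by the /summary endpoint.
EXAMPLE_RESPONSE = """
{
    "median_percentile": 0.8859,
    "median_score": 0.2567,
    "median_score_reference": 0.1934,
    "p_value": 0.0013,
    "change": 0.0875,
    "conf_int_lower": 0.0456,
    "conf_int_upper": 0.4808
}
"""


def interpret_summary(json_text, alpha=0.05):
    """Turn the JSON map into a few directly usable facts."""
    summary = json.loads(json_text)
    difference = summary["median_score"] - summary["median_score_reference"]
    if difference > 0:
        direction = "higher"
    elif difference < 0:
        direction = "lower"
    else:
        direction = "equal"
    return {
        "direction": direction,
        "significant": summary["p_value"] < alpha,
        "confidence_interval": (
            summary["conf_int_lower"],
            summary["conf_int_upper"],
        ),
    }


print(interpret_summary(EXAMPLE_RESPONSE))
```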
