SEA Documentation

sRNA Expression Atlas (SEA) is a web application for searching known and novel small RNAs (sRNAs) across ten organisms using standardized search terms and ontologies. SEA contains re-analyzed sRNA expression data for over 4200 published samples, including many disease datasets, and over 769 novel, high-quality predicted miRNAs. SEA also stores sRNA differential expression results, sRNA-based classification results, pathogenic sRNA signatures from bacteria and viruses, and pathogen differential expression results. Furthermore, SEA contains the gene targets of miRNAs and the diseases associated with them.

These 4235 samples are systematically annotated with metadata. Biological metadata, for instance, include standardized information about the organism, cell line, cell type, tissue type, potential diseases and more. Additional annotations describe experimental details such as instrument models and library strategies. The raw data from all datasets were analysed with the Oasis 2 pipelines so that small RNA expression is comparable across studies.

In summary, SEA supports interactive visualization of results at all levels, from querying and displaying sRNA expression to the mapping and quality information of each sample.

Search Options

SEA can be searched for:

  • sRNAs from miRBase and Ensembl, as well as novel miRNAs from the Oasis repository of predicted miRNAs
  • Datasets, based on ontology-linked metadata
  • Pathogenic sRNA signatures from bacteria and viruses
  • Gene targets
  • Diseases associated with a miRNA
  • miRNAs associated with a disease

A search can be performed with individual search terms or combinations of them. The central feature of SEA is its ontology-based search, which lets users easily find datasets relevant to their research. This can help answer questions such as the following:
  1. What is the expression of one or more sRNAs/pathogens in specific cell types or tissues?
  2. Is a particular sRNA/pathogen differentially expressed in a disease, e.g. Alzheimer's disease?
  3. How do sRNAs compare across different studies (tissues, diseases, cell types or cell lines)?
  4. Which sRNAs are differentially expressed between breast cancer patients and healthy women?
  5. Which sRNAs are commonly differentially expressed, or are potential sRNA-based biomarkers, in a particular disease or tissue?
  6. What is the expression of one or more novel miRNAs in known disease states?
  7. What are all the gene targets of a miRNA?
  8. Which diseases are associated with a miRNA in the literature?
  9. Which miRNAs are associated with a disease or its subtypes?
  10. What are the genomic coordinates of an sRNA or its target genes?

All searches for experiments or pathogens/sRNAs start from the search bar at the top of the SEA web pages. Search suggestions pop up while you type. Figure 1 shows the suggestions that appear when typing "human her", for example to find "Human herpesvirus 3". Because of this autocomplete functionality, only one term at a time can be pasted into the search bar.
Suggestions are shown for all search categories. Overall, the following search categories are supported: sRNA ID, pathogen, organism, cell type, tissue, cell line, disease and dataset.

Figure 1: Suggestions when typing "human her"

Note that SEA only allows searches based on suggested terms, and every suggested term is guaranteed to be in the database. If you are looking for a specific disease, tissue, organism, etc. and no matching term is suggested, the database contains no dataset that matches your criteria.

Combining several search terms

When using several search terms, datasets are found according to the following rules:

  • If there are several search terms of the same category (e.g. several different diseases), at least one of them has to match.
  • If there are search terms of different categories (sRNA ID, pathogen name, disease, tissue, cell type, ...), the criteria of all categories have to be met.
  • Combined, these two rules return the datasets that match at least one search term of EACH search category.
For example, assume we search for samples from skin or muscle tissue of human psoriasis patients (see Figure x). After pressing the Enter key in the search bar, we are taken to the search results page (Figure x). The search query is repeated at the top, showing that SEA searched for skin or muscle tissue. Since psoriasis is a skin disease, it is not surprising that no datasets containing muscle tissue are found.
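The combination rules above amount to OR within a category and AND across categories. The following minimal Python sketch illustrates that logic on made-up annotations; the dataset IDs, category names and the `matches` helper are hypothetical and do not reflect SEA's internal implementation, and ontology subterm expansion is left out.

```python
from typing import Dict, Set

# Hypothetical dataset annotations: category -> annotated terms.
Annotations = Dict[str, Set[str]]


def matches(dataset: Annotations, query: Dict[str, Set[str]]) -> bool:
    """Check a multi-term query against one dataset's annotations.

    Terms within one category are ORed (at least one must be annotated);
    categories are ANDed (every queried category must be satisfied).
    """
    return all(
        bool(dataset.get(category, set()) & terms)
        for category, terms in query.items()
    )


# Made-up datasets for illustration only.
datasets: Dict[str, Annotations] = {
    "DS1": {"tissue": {"skin"}, "disease": {"psoriasis"}},
    "DS2": {"tissue": {"muscle"}, "disease": {"myopathy"}},
    "DS3": {"tissue": {"skin"}, "disease": {"melanoma"}},
}

# (skin OR muscle) AND psoriasis
query = {"tissue": {"skin", "muscle"}, "disease": {"psoriasis"}}
hits = [name for name, ann in datasets.items() if matches(ann, query)]
print(hits)  # ['DS1']
```

DS2 satisfies the tissue criterion but not the disease criterion, so, as in the psoriasis example, no muscle dataset is returned.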

Searching with ontologies

Each experiment in the SEA database is annotated with terms that come from ontologies. Put simply, an ontology is a set of relationships between terms. For example, taking the words human and mammal, we can say that a human is a mammal. Humans are not the only mammals: mice, dogs, dolphins and pigs are mammals too. The hierarchy continues: all mammals are vertebrates, and all vertebrates are chordates.

Ontologies are not restricted to organisms; many more have been defined by independent organisations. A SEA search returns all datasets that match the search term or any of its subterms as defined in the ontologies. For example, searching for neurodegenerative disease returns results from Alzheimer's and Huntington's disease, and searching for Murinae returns datasets from both mice and rats. This way, your search can be as broad or as specific as you wish.
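Subterm matching can be pictured as a graph traversal over "is a" relationships. The Python sketch below expands a term into itself plus all of its descendants; the tiny ontology and the `expand` helper are illustrative assumptions, not the ontologies or code SEA actually uses.

```python
from collections import deque
from typing import Dict, List, Set

# A toy "is a" ontology: child term -> parent terms (illustrative edges only).
IS_A: Dict[str, List[str]] = {
    "Alzheimer's disease": ["neurodegenerative disease"],
    "Huntington's disease": ["neurodegenerative disease"],
    "neurodegenerative disease": ["nervous system disease"],
    "Mus musculus": ["Murinae"],
    "Rattus norvegicus": ["Murinae"],
}


def expand(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Return the term plus all of its transitive subterms (breadth-first)."""
    # Invert the child -> parent edges so we can walk downwards.
    children: Dict[str, List[str]] = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(expand("neurodegenerative disease", IS_A)))
# ["Alzheimer's disease", "Huntington's disease", 'neurodegenerative disease']
```

A dataset would then count as a hit for "neurodegenerative disease" if it is annotated with any term in the expanded set.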

Search Results

When working with SEA, you will most likely be in one of the following situations:

  1. You are interested in a specific sRNA/pathogen (entity) and want to obtain expression datasets that contain this entity.
  2. You are interested in a specific tissue/disease/cell line and in the sRNAs that are differentially expressed or used for classification (healthy vs. diseased patients or tissue) in this tissue/disease/cell line.
  3. You have an sRNA and a tissue/disease/cell line to research and are looking for datasets that feature this sRNA in this particular tissue/disease/cell line.
(Your starting point may also involve more than one sRNA, pathogen and/or tissue. In that case, simply add more search terms.)

Keep in mind that SEA keeps track of your search and filtering criteria as you go through the results. If you select a specific sRNA, dataset and/or tissue, all subsequent diagrams and tables are based only on datasets that fulfil these criteria.

The following examples will use the microRNA hsa-miR-1-3p and the tissue brain.
In all SEA tables, all columns are sortable using TIM sort. Click the header of a column to sort by it; columns can be deselected again from the TIM sort list (example in the Dataset figure). Tables can also be filtered by terms appearing in any column (example in the Disease Associations figure). Most tables can be downloaded as a tab-separated file; the download link is at the bottom right of the displayed table.
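Downloaded tables can be processed with any tool that reads tab-separated files. As a minimal sketch, the Python standard library's `csv` module can load such a file, reproduce a text filter and apply a multi-column sort; the column names and dataset IDs below are hypothetical placeholders, so adjust them to the header of the file you actually download.

```python
import csv
import io
from typing import Dict, List, TextIO

# Stand-in for a downloaded SEA table; the header and values are hypothetical.
DOWNLOADED_TSV = (
    "dataset_id\ttissue\tsamples\n"
    "DS1\tbrain\t12\n"
    "DS2\tmuscle\t8\n"
    "DS3\tbrain\t30\n"
)


def load_tsv(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-separated table whose first row is the header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_rows(rows: List[Dict[str, str]], text: str) -> List[Dict[str, str]]:
    """Keep rows in which any column contains `text` (case-insensitive)."""
    needle = text.lower()
    return [row for row in rows if any(needle in value.lower() for value in row.values())]


rows = load_tsv(io.StringIO(DOWNLOADED_TSV))
brain = filter_rows(rows, "brain")
# Multi-column sort with Python's stable sort: apply the secondary key first,
# then the primary key (number of samples, descending).
brain.sort(key=lambda row: row["dataset_id"])
brain.sort(key=lambda row: int(row["samples"]), reverse=True)
print([row["dataset_id"] for row in brain])  # ['DS3', 'DS1']
```

For a real download, replace the in-memory string with `open("table.tsv", newline="")`, where `table.tsv` stands for whatever file name you saved.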


Please choose your starting point:
  1. Search by sRNA(s)/pathogen(s)
  2. Search by sRNA(s)/pathogen(s) along with any terms (tissue, disease, cell type, cell line)
  3. Search only by metadata terms and their combination


  1. Search by sRNA(s)/pathogen(s)
    When you search for one or more sRNAs/pathogens, SEA presents the following results:
    1. Expression values
    2. Datasets
    3. Differential Expression
    4. sRNA-specific output:
      • Classification analysis
      • Targets
      • Disease associations
      • Chromosome locations


    1. Expression values
      The first section shows the expression of the requested sRNA/pathogen in all experiments, as a violin plot if a single entity is searched or as a heatmap if multiple sRNAs or pathogens are searched. Clicking a violin or heatmap cell shows the expression profile in that experiment (dataset): a table with the expression of the selected sRNA or pathogen in each sample, together with the metadata of all samples such as sample ID, tissue, disease, cell type, experimental process and extracted molecule. This view also shows the Oasis analysis results for the raw sequencing data.

      Violin Plot

      Violin plots show the expression of hsa-mir-xxx across all datasets. In the figure, hsa-mir-xxx is expressed in 81 datasets. Each violin shows the reads per million (RPM) values of the requested sRNA in the samples that belong to the currently selected annotation (tissue in this case). For example, the top violin, muscle-GSE66334, shows the expression in 12 samples from that dataset. Zero expression values are excluded from the plots. The violins can be subset by organism and sorted by different metrics such as maximum RPM, median RPM, mean RPM, number of samples, name and dataset ID. Hovering over a violin shows a summary in a tooltip.
      Clicking a violin shows an overview of all samples in the dataset with the expression of the queried sRNA, along with metadata such as tissue, cell type, cell line, disease and further annotations. The detailed view also includes the Oasis output for the raw sequencing data.
      If no ontology term such as tissue, cell type, cell line or disease is queried together with the sRNA, the violins are labelled by tissue by default, and the labels can be changed in the drop-down menu. If a dataset contains samples from different tissues (or from different values of the selected annotation), it appears several times, with one violin for each set of samples.

      Note: The same holds true for pathogen searches.
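To make the violin plot summaries concrete, the sketch below computes RPM values per sample, drops zero values and derives the metrics used for sorting (maximum, median, mean, number of samples). It assumes the common definition RPM = read count / total reads × 10^6; the exact normalization in Oasis 2 may differ, and all sample values and group labels are invented.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def rpm(count: int, total_reads: int) -> float:
    """Reads per million: the read count scaled to one million total reads."""
    return count * 1_000_000 / total_reads


# Hypothetical samples: (annotation-dataset label, sRNA read count, total reads).
samples: List[Tuple[str, int, int]] = [
    ("muscle-DS1", 50, 2_000_000),
    ("muscle-DS1", 0, 1_500_000),  # zero expression: left out of the violin
    ("muscle-DS1", 120, 3_000_000),
    ("brain-DS2", 10, 1_000_000),
    ("brain-DS2", 30, 2_000_000),
]


def violin_summaries(rows: List[Tuple[str, int, int]]) -> Dict[str, Dict[str, float]]:
    """Group non-zero RPM values by label and compute the sort metrics."""
    groups: Dict[str, List[float]] = {}
    for label, count, total in rows:
        if count > 0:
            groups.setdefault(label, []).append(rpm(count, total))
    return {
        label: {"n": len(values), "max": max(values),
                "median": median(values), "mean": mean(values)}
        for label, values in groups.items()
    }


summary = violin_summaries(samples)
by_median = sorted(summary, key=lambda label: summary[label]["median"], reverse=True)
print(by_median)  # ['muscle-DS1', 'brain-DS2']
```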



      If the search contains two or more entities (sRNAs or pathogens), a heatmap is displayed instead of the violin plot. The x-axis shows the dataset-annotation pairs, labelled as on the y-axis of the violin plot. Each heatmap cell represents the average RPM value of the entity on the y-axis in the samples belonging to the group on the x-axis.
      To display an annotation other than tissue, a term of the desired category has to be included in the search. For example, to display diseases, the search should contain "cancer" or "Alzheimer's disease".
      Note that the heatmap may be scrollable to the right to show more datasets.
      Hovering over a heatmap cell shows a summary in a tooltip.
      Clicking a heatmap cell shows an overview of the selected entity in the corresponding dataset.
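Each heatmap cell is a group average. A minimal sketch of that aggregation, assuming per-sample RPM values are already available (all entity names, groups and values below are hypothetical, and filling empty combinations with NaN is a choice of this sketch), is:

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical per-sample RPM values: (entity, annotation-dataset group, RPM).
records: List[Tuple[str, str, float]] = [
    ("miR-A", "brain-DS1", 10.0),
    ("miR-A", "brain-DS1", 30.0),
    ("miR-A", "liver-DS2", 5.0),
    ("miR-B", "brain-DS1", 2.0),
    ("miR-B", "liver-DS2", 8.0),
    ("miR-C", "liver-DS2", 1.0),
]


def heatmap_matrix(rows: List[Tuple[str, str, float]]):
    """Return (entities, groups, matrix) with matrix[i][j] = mean RPM.

    Entities form the rows (y-axis) and groups the columns (x-axis);
    combinations without any sample are filled with NaN.
    """
    values: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for entity, group, value in rows:
        values[(entity, group)].append(value)
    entities = sorted({entity for entity, _, _ in rows})
    groups = sorted({group for _, group, _ in rows})
    matrix = [
        [mean(values[(e, g)]) if (e, g) in values else math.nan for g in groups]
        for e in entities
    ]
    return entities, groups, matrix


entities, groups, matrix = heatmap_matrix(records)
for entity, row in zip(entities, matrix):
    print(entity, row)
```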

    2. Datasets
      The table of experiments (datasets) with samples that express the searched sRNA(s) or pathogen(s) displays their metadata, such as the number of samples in the experiment, tissue, cell line, organism, disease and cell type, and a link to the Oasis 2 output (analysis results of the raw sequencing data with detailed quality information).

    3. Differential Expression
      This section shows two tables with differential expression values (log2 fold change and adjusted p-value) from within-experiment comparisons for the searched sRNA(s) or pathogen(s). The first table shows all comparisons uploaded by the user. The second table shows all comparisons within public experiments in which the searched sRNA(s) or pathogen(s) are differentially expressed.
      The tables also show the conditions compared in each differential expression analysis (e.g. healthy vs. diseased), covariates (if any) and a link to a detail page. This page shows a t-SNE plot and, for sRNAs, the differential expression results from Oasis 2 with all diagnostic plots; for pathogens, it displays the results of an in-house pipeline as a table.
      The detail page is described in more detail in the section On the fly Analysis.
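The documentation does not state which multiple-testing correction produces the adjusted p-values. As an illustration only, the sketch below uses the widely used Benjamini-Hochberg procedure together with a log2 fold change and a typical significance filter; the thresholds (adjusted p < 0.05, |log2 fold change| ≥ 1) and the sRNA names are assumptions.

```python
import math
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(mean_case: float, mean_control: float) -> float:
    """log2 of the ratio of mean expression (case over control)."""
    return math.log2(mean_case / mean_control)


# Hypothetical results for four sRNAs.
names = ["miR-A", "miR-B", "miR-C", "miR-D"]
pvals = [0.01, 0.04, 0.03, 0.5]
lfc = [log2_fold_change(8.0, 2.0), -1.5, 0.3, 1.2]
padj = benjamini_hochberg(pvals)
significant = [n for n, q, fc in zip(names, padj, lfc) if q < 0.05 and abs(fc) >= 1.0]
print(significant)  # ['miR-A']
```

miR-B has a raw p-value below 0.05 but does not pass after adjustment, which is why tables report adjusted rather than raw p-values.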



    4. Classification analysis
      This section shows two tables with the mean decrease in Gini from within-experiment comparisons for the searched sRNA(s). The first table shows all comparisons uploaded by the user. The second table shows all comparisons within public experiments in which the searched sRNA(s) are important features for classifying two conditions (e.g. healthy vs. diseased).
      The tables also show the conditions compared in each classification analysis (e.g. healthy vs. diseased), the AUC of the whole model and a link to a detail page with a t-SNE plot and the classification results from Oasis 2, which include all diagnostic plots and options for downstream enrichment analysis.
      The detail page is described in more detail in the section On the fly Analysis.
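Mean decrease in Gini is a random-forest feature importance: it accumulates how much the splits on a feature reduce Gini impurity and averages this over the trees. The Python sketch below shows the impurity decrease for a single split and computes the AUC from classifier scores via the pairwise (Mann-Whitney) formulation. It assumes a random-forest model as the source of the Gini measure; the labels and scores are made up.

```python
from collections import Counter
from typing import List, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[str], left: Sequence[str], right: Sequence[str]) -> float:
    """Size-weighted decrease in Gini impurity achieved by one split."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def roc_auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# A split on one sRNA's expression that separates the classes perfectly.
parent = ["disease", "disease", "healthy", "healthy"]
print(impurity_decrease(parent, parent[:2], parent[2:]))  # 0.5

# Hypothetical classifier scores (probability of "disease"): 8 of 9 pairs ordered correctly.
print(roc_auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2]))
```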
    5. Targets
      This table shows all targets of an sRNA (only applicable if the sRNA is a microRNA). It lists gene names, Ensembl gene IDs and associations (publications from which each relation was obtained). All of these values are clickable links to sources such as GeneCards and PubMed. A tab is shown for each miRNA if multiple miRNAs are searched.
    6. Disease associations

      This table shows all diseases associated with an sRNA (only applicable if the sRNA is a microRNA). It lists disease names and associations (publications from which each relation was obtained). All of these values are clickable links to their sources, the Ontology Lookup Service (OLS) and PubMed, respectively. A tab is shown for each miRNA if multiple miRNAs are searched.

    7. Chromosome locations
      This section shows the genomic coordinates of the searched sRNA(s): sRNA ID, biotype (miRNA, snoRNA, piRNA, siRNA, etc.), chromosome, start, end, strand and organism. A tab is shown for each sRNA if multiple sRNAs are searched.
    8. sRNA info

  2. Search by sRNA(s)/pathogen(s) along with any terms (tissue, disease, cell type, cell line)
    If you search for one or more sRNAs/pathogens together with a term such as tissue, disease, cell type or cell line, the results are shown as described in section 1, except that the datasets are already filtered by the term. In addition, the expression plots are labelled with terms from the same category as the search term. For example, searching for cancer datasets yields an expression plot (violin plot or heatmap) with disease labels.

  3. Search only by metadata terms and their combination
    If you search for one or more terms such as tissue, disease, cell type or cell line, SEA presents the following results:
    1. Datasets
      Table of experiments (datasets) that contain samples annotated with the searched term. The table shows dataset-ID along with metadata such as number of samples in the experiment, tissue, cell line, organism, disease, cell type and a link to Oasis 2 output (analysis results of the raw sequencing data with detailed quality information).
    2. Intra dataset comparisons
      This table shows all intra-dataset comparisons for the experiments in the Datasets table. It displays the dataset ID, the conditions compared (e.g. healthy vs. diseased), covariates (if any), and links to the results of the sRNA differential expression, pathogen differential expression and sRNA classification analyses.
    3. miRNA - Disease Associations

      This table is only shown when searching for a disease. It lists disease names, miRNA IDs and associations (publications from which each relation was obtained). All of these values are clickable links to their sources, the Ontology Lookup Service (OLS) and PubMed, respectively. A tab is shown for each searched disease. Since disease terms are linked to ontologies, all subtypes of the searched disease are included in the table (for example, carcinoma, breast cancer and others when searching for the term cancer). The table holds at most 50 entries (disease-miRNA combinations), so some associations might be missing. To get more results, search for a more specific term (e.g. breast cancer instead of cancer).


On the fly Analysis

The search results offer several options for further analysis.

Overlap differential expression or classification results from different datasets
The differential expression and classification result tables provide a checkbox for each experiment. Ticking a checkbox selects the experiment for the SEA overlap analysis, which visualizes the overlap of the top differentially expressed pathogens or sRNAs between several differential expression analyses, or of the top features between several sRNA classification analyses. The analysis is started by clicking the corresponding button underneath the tables (for example "sRNA DE Overlap"). User datasets can be compared with public datasets, but classification results cannot be compared with differential expression comparisons. The following figure shows an example analysis of four sets of sRNAs that serve as features for classifying different samples.

The top line still shows the searched sRNA and tissue, but this page shows all sRNAs that play a role in the selected experiments. The filter section provides fields to adjust the results. For classification, the minimum AUC of the model and the minimum mean decrease in Gini can be set; only sRNAs that reach these values in a classification analysis are considered for the overlap. For differential expression, you can filter by an adjusted p-value threshold, a minimum absolute log2 fold change and the direction of regulation.

The first table on the page gives an overview of the selected experiments; its first column, "Set Name", identifies each experiment in the other parts of the page. The table on the right shows all entities (in this case sRNAs) that play a role in at least one of the experiments according to the filter, and its second column shows in which sets they do so. Filtering this table is useful when looking for a specific sRNA or experiment.

The plot on this page is an UpSet plot. The x-axis shows all possible combinations of the four sets, which are also encoded by the filled circles in the bottom part of the plot. Each blue bar indicates the number of sRNAs that are present in all experiments of the corresponding intersection, and clicking a bar displays the IDs of these sRNAs. Each sRNA is counted exactly once in this plot, which means that the first four bars represent the sRNAs unique to the individual experiments. The black bars on the left are not clickable; they show the total number of sRNAs that play a role in the corresponding experiment.
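The overlap logic can be sketched in a few lines. The Python example below first turns hypothetical differential expression results into one set per experiment using filter thresholds, then assigns every sRNA to the exact combination of sets that contain it, which is what makes each sRNA appear in exactly one UpSet bar. The threshold names and values are assumptions, not SEA's parameters.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical DE row: (sRNA, log2 fold change, adjusted p-value).
DERow = Tuple[str, float, float]


def de_set(rows: List[DERow], max_padj: float, min_abs_lfc: float,
           direction: Optional[str] = None) -> Set[str]:
    """Select sRNAs passing the thresholds; direction is "up", "down" or None."""
    selected = set()
    for srna, lfc, padj in rows:
        if padj > max_padj or abs(lfc) < min_abs_lfc:
            continue
        if (direction == "up" and lfc <= 0) or (direction == "down" and lfc >= 0):
            continue
        selected.add(srna)
    return selected


def exclusive_intersections(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Assign each element to the exact combination of sets that contain it."""
    membership: Dict[str, Set[str]] = {}
    for name, members in sets.items():
        for item in members:
            membership.setdefault(item, set()).add(name)
    bars: Dict[FrozenSet[str], Set[str]] = {}
    for item, names in membership.items():
        bars.setdefault(frozenset(names), set()).add(item)
    return bars


de_results: Dict[str, List[DERow]] = {
    "Set 1": [("miR-a", 2.0, 0.01), ("miR-b", -1.5, 0.02), ("miR-c", 0.2, 0.01)],
    "Set 2": [("miR-a", 1.1, 0.03), ("miR-b", -2.0, 0.20)],
}
sets = {name: de_set(rows, max_padj=0.05, min_abs_lfc=1.0) for name, rows in de_results.items()}
bars = exclusive_intersections(sets)
for combination, members in bars.items():
    print(sorted(combination), sorted(members))
```

The set sizes (`len(sets[name])`) correspond to the black per-experiment bars, while `bars` corresponds to the blue intersection bars.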



Comparison details and resubmission to Oasis 2

The comparison details page can be reached from any comparison table (classification or differential expression) in the main search results. It shows a t-SNE plot and, for sRNAs, the Oasis 2 report; for pathogens, it shows a list of all differentially expressed pathogens (figure on the right).

t-distributed Stochastic Neighbor Embedding (t-SNE) is a visualization algorithm that represents high-dimensional data in a low-dimensional space of two or three dimensions. t-SNE plots help to assess whether control and treatment groups cluster separately, and can also reveal other noticeable structure in the data. The GSE69825 data, for example, cluster into two biological conditions, with control samples shown in red and treatment samples in blue (second image on the right).
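For reference, the standard t-SNE formulation is summarized below, with x_i the high-dimensional expression profile of sample i, y_i its 2D or 3D embedding and N the number of samples. Similarities in the original space are Gaussian conditional probabilities whose bandwidths σ_i are tuned to a chosen perplexity, similarities in the embedding use a Student t-distribution with one degree of freedom, and the y_i are optimized to minimize the Kullback-Leibler divergence C. The parameters SEA uses (e.g. perplexity) are not documented here.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{aligned}
```

Because the heavy-tailed q_{ij} lets dissimilar samples sit far apart, well-separated clusters such as control vs. treatment groups tend to appear as distinct islands in the plot.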

At the top of the t-SNE plot section, a "Re-analyze" button is provided for sRNA classification and differential expression comparisons of public datasets. Samples can be selected in the t-SNE plot and resubmitted to Oasis 2 for a new analysis (for example, to remove outliers). The results can then be uploaded as user data and compared with the results already available in SEA. To resubmit, follow these steps:
  1. Press the green plus button to create a new group.
  2. Click the lasso select button.
  3. Select samples in the t-SNE plot (third figure on the right).
  4. Press the green plus button again to create a second group.
  5. Select the second group of samples in the t-SNE plot.
  6. Check that the groups are as intended.
  7. Press the blue send button (next to the green plus button).
  8. Enter the email address the results should be sent to.
  9. Click submit.

Uploading own data

One of the most useful features of SEA is that users can upload their own differential expression and classification results from Oasis 2. This allows them to compare their in-house data with over 4200 samples across different conditions. For example, if you are working on sRNA-seq data for cancer, you can perform DE or classification analysis with Oasis 2 and then upload the results to SEA. You can then directly compare your in-house differentially expressed sRNAs with all publicly available cancer datasets in SEA, and compute overlaps on the fly. This saves considerable work for scientists who need to cross-compare many studies. All your sRNA and term searches will then include the results from your in-house data as well.

To upload your own data, click the red arrow at the top right of the screen. Uploading is only possible when you are logged in. To log in, press the user icon between the upload button and the search bar, generate your own user token and save it. This token is unique to you, although you can share it with your lab members. The data you upload is linked to this token. To protect your data, the same Oasis 2 output cannot be uploaded with several different tokens. Once logged in, you can upload an Oasis 2 DE or classification result: fill in all fields of the upload form and press upload. Warning, error and success messages will guide you through the process.

Getting help: Contact us

Appendix

Small RNA identifiers
SEA uses standard small RNA identifiers for searching. Keep in mind that different types of small RNAs follow different identifier conventions. For instance, microRNA IDs usually start with the code of the species they are derived from; a human microRNA ID usually starts with hsa-. The same applies to Piwi-interacting RNAs (piRNAs), except that their identifiers use an underscore instead of a dash: hsa_. Small nucleolar RNA (snoRNA) IDs tend to start with SNO, and ribosomal RNA (rRNA) IDs usually start with a lowercase r.
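The conventions above can be turned into a rough heuristic. The Python sketch below guesses an sRNA type from an identifier's prefix; the regular expressions and example IDs are illustrative assumptions, and real identifiers vary between databases, so an actual lookup should rely on the source database.

```python
import re
from typing import Optional

# Heuristic patterns paraphrasing the naming conventions described above.
# These regexes are illustrative assumptions, not SEA's identifier rules.
MIRNA = re.compile(r"^[a-z]{3,4}-")  # species code + dash, e.g. hsa-
PIRNA = re.compile(r"^[a-z]{3,4}_")  # species code + underscore, e.g. hsa_


def guess_srna_type(srna_id: str) -> Optional[str]:
    """Guess the sRNA class from an identifier prefix, or return None."""
    # Species-code checks come first: e.g. rat miRNAs start with "rno-",
    # which would otherwise be caught by the lowercase-r rule below.
    if MIRNA.match(srna_id):
        return "miRNA"
    if PIRNA.match(srna_id):
        return "piRNA"
    if srna_id.startswith("SNO"):
        return "snoRNA"
    if srna_id.startswith("r"):
        return "rRNA"
    return None


# Illustrative identifiers only.
for example in ["hsa-miR-1-3p", "hsa_piR_000001", "SNORD116", "rRNA_example", "XYZ1"]:
    print(example, guess_srna_type(example))
```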
