sRNA Expression Atlas (SEA) is a web application for searching known and novel small RNAs across ten organisms using standardized search terms and ontologies. SEA contains re-analyzed sRNA expression information for over 4200 published samples, including many disease datasets and over 769 novel, high-quality predicted miRNAs. In addition, SEA stores sRNA differential expression, sRNA-based classification, pathogenic sRNA signatures from bacteria and viruses, and pathogen differential expression. Furthermore, SEA contains the gene targets and diseases associated with a miRNA. The 4235 samples are systematically annotated with metadata. For instance, biological metadata includes standardized information about the organism, cell line, cell type, tissue type, potential diseases and more. Additional annotations include experimental details about instrument models and library strategies. The raw data from all datasets was analysed with the Oasis 2 pipelines to achieve comparable small RNA expression across many studies. In summary, SEA supports interactive result visualization on all levels, from querying and displaying sRNA expression information to the mapping and quality information for each sample.
SEA can be searched for:
All searches for experiments or pathogen(s)/sRNA(s) start from the search bar at the top of the SEA web pages. Search suggestions appear as you type in the search bar. Figure 1 shows what it looks like when "Human herpesvirus 3" is entered into the search bar. Be aware that this autocomplete functionality restricts pasting into the search bar to one term at a time. The SEA search shows suggestions for all categories. Overall, the following search categories are supported: sRNA ID, Pathogen, Organism, Cell type, Tissue, Cell line, Disease, Dataset.
Please note that SEA only allows searches based on suggested terms, and all terms that are suggested while typing are guaranteed to be in the database. If you are looking for a specific disease, tissue, organism, etc. and no matching terms are suggested, then there is no dataset matching your criteria in the database.
When using several search terms, datasets are found according to the following rules:
Each experiment in the SEA database is annotated with terms that come from ontologies. In simple words, an ontology is a list of relationships between words. For example, if we take the words human and mammal, we can say that a human is a mammal. Not only humans are mammals: mice, dogs, dolphins and pigs are mammals too. But it does not end there. All mammals are also vertebrates, and all vertebrates are chordates. Ontologies are not restricted to organisms. Many more ontologies have been defined by independent organisations. When you use the search in SEA, it finds all datasets that match the search term itself and also all datasets that match any of its subterms as defined in the ontologies. For example, if you search for neurodegenerative disease, you will get search results from Alzheimer's and Huntington's disease. If you search for murinae, you will get datasets from mice as well as from rats. This way you can be as broad or as specific with your search as you wish.
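The subterm expansion described above can be illustrated with a minimal sketch. It assumes a toy "is a" ontology and hypothetical dataset IDs (DS1 to DS3); none of the names below come from SEA itself, and SEA's actual implementation may differ.

```python
from collections import defaultdict

# Hypothetical toy ontology as (child, parent) "is a" pairs; not SEA's real data.
IS_A = [
    ("Alzheimer's disease", "neurodegenerative disease"),
    ("Huntington's disease", "neurodegenerative disease"),
    ("mouse", "murinae"),
    ("rat", "murinae"),
    ("murinae", "mammal"),
    ("human", "mammal"),
    ("mammal", "vertebrate"),
    ("vertebrate", "chordate"),
]


def build_children(pairs):
    """Map each parent term to the set of its direct subterms."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    return children


def expand_term(term, children):
    """Return the term together with all of its direct and indirect subterms."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def search(term, annotations, children):
    """Return IDs of datasets annotated with the term or any of its subterms."""
    wanted = expand_term(term, children)
    return sorted(ds for ds, terms in annotations.items() if terms & wanted)


CHILDREN = build_children(IS_A)
# Hypothetical dataset annotations.
DATASETS = {"DS1": {"mouse"}, "DS2": {"rat"}, "DS3": {"human"}}

print(search("murinae", DATASETS, CHILDREN))  # datasets from mice and rats
print(search("mammal", DATASETS, CHILDREN))   # broader term, more datasets
```

Searching for the broader term returns a superset of the results for the narrower one, which is the behaviour the paragraph above describes.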
When working with SEA, you will most likely be in one of the following situations:
Violin plots are shown for the expression of hsa-mir-xxx across all datasets. The figure shows that hsa-mir-xxx is expressed in 81 datasets. Each violin plot shows the reads per million (RPM) values of the requested sRNA in the samples that belong to the currently selected annotation (tissue in this case). For example, the top violin plot, muscle-GSE66334, shows the expression in 12 samples from that dataset. Zero expression values are excluded from the plots. The violins can be subset by organism and sorted by different metrics such as max RPM, median RPM, mean RPM, number of samples, name and dataset ID. Hovering over a violin summarizes the information in a tooltip. Clicking on a violin shows an overview of all samples in the dataset with the expression of the queried sRNA, along with meta information such as tissue, cell type, cell line, disease and further annotations. The raw sequencing data analysis output from Oasis is also shown in this detailed view. If no ontological term such as tissue, cell type, cell line or disease is queried together with the sRNA, the tissue annotation is shown by default for the violin plots, and the user can change the labels from the dropdown menu. If a dataset contains samples from different tissues (or from different values of the selected annotation), it appears several times, with one violin for each set of samples. Note: The same holds true for pathogen searches.
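The grouping behind the violins can be sketched as follows. This is an illustration only, assuming hypothetical sample records of the form (dataset ID, annotation, RPM) and a descending sort order; the record layout, the names and the sort direction are assumptions, not taken from SEA.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample records: (dataset_id, tissue annotation, RPM of the queried sRNA).
SAMPLES = [
    ("DS_A", "muscle", 120.0),
    ("DS_A", "muscle", 80.0),
    ("DS_A", "muscle", 0.0),   # zero expression: excluded from the violin
    ("DS_A", "liver", 10.0),   # same dataset, other tissue: separate violin
    ("DS_B", "brain", 300.0),
    ("DS_B", "brain", 0.0),
]


def violin_groups(samples):
    """Group non-zero RPM values by (annotation, dataset) pair, one group per violin."""
    groups = defaultdict(list)
    for dataset_id, annotation, rpm in samples:
        if rpm > 0:
            groups[(annotation, dataset_id)].append(rpm)
    return dict(groups)


# Sort metrics named in the text, mapped to plain Python functions.
SORT_KEYS = {
    "max RPM": max,
    "median RPM": median,
    "mean RPM": mean,
    "number of samples": len,
}


def sorted_labels(groups, metric):
    """Return violin labels ordered by the chosen metric (descending is an assumption)."""
    score = SORT_KEYS[metric]
    ranked = sorted(groups, key=lambda key: score(groups[key]), reverse=True)
    return [f"{annotation}-{dataset_id}" for annotation, dataset_id in ranked]


GROUPS = violin_groups(SAMPLES)
print(sorted_labels(GROUPS, "max RPM"))
```

Note how dataset DS_A yields two violins because its samples carry two different tissue annotations.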
If the search contains two or more entities (sRNAs or pathogens), a heatmap is displayed instead of the violin plot. On the x-axis, the dataset-annotation pair is displayed as described for the y-axis of the violin plot. Each cell of the heatmap represents the average RPM value of the entity given on the y-axis in the samples belonging to the group given on the x-axis. To select an annotation other than tissue, an example of the desired category has to be included in the search. For example, if diseases should be displayed, the search should contain "cancer" or "alzheimer's disease". Be aware that the heatmap might be scrollable to the right, containing more datasets. Hovering over a heatmap cell summarizes the information in a tooltip. Clicking on a heatmap cell shows an overview of the selected entity in the corresponding dataset.
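As a rough sketch of how such heatmap cells could be computed, the example below averages RPM values per entity and dataset-annotation group. The record layout and all names are hypothetical, and whether SEA includes zero values in this average is not stated here, so the sketch simply averages every value it is given.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (entity, dataset_id, annotation, RPM).
RECORDS = [
    ("sRNA-1", "DS_A", "muscle", 10.0),
    ("sRNA-1", "DS_A", "muscle", 30.0),
    ("sRNA-2", "DS_A", "muscle", 5.0),
    ("sRNA-1", "DS_B", "brain", 8.0),
]


def heatmap(records):
    """Return row labels, column labels and a matrix of average RPM values.

    Rows are entities (y-axis), columns are annotation-dataset groups (x-axis).
    Cells without any sample are None.
    """
    values = defaultdict(list)
    for entity, dataset_id, annotation, rpm in records:
        values[(entity, f"{annotation}-{dataset_id}")].append(rpm)
    rows = sorted({entity for entity, _ in values})
    cols = sorted({group for _, group in values})
    matrix = [
        [mean(values[(r, c)]) if (r, c) in values else None for c in cols]
        for r in rows
    ]
    return rows, cols, matrix


ROWS, COLS, MATRIX = heatmap(RECORDS)
for label, row in zip(ROWS, MATRIX):
    print(label, dict(zip(COLS, row)))
```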
This table shows all the diseases associated with an sRNA (only applicable if the sRNA is a microRNA). It shows disease names and associations (the publications from which each relation was obtained). All of these values are clickable links to their sources, the Ontology Lookup Service (OLS) and PubMed respectively. If multiple miRNAs were searched, a tab is shown for each miRNA.
This table is only shown when searching for a disease. It shows disease names, miRNA IDs and associations (the publications from which each relation was obtained). All of these values are clickable links to their sources, the Ontology Lookup Service (OLS) and PubMed respectively. A tab is shown for each disease that was searched. Since disease terms are connected to ontologies, all subtypes of the searched disease are presented in the table (for example carcinoma, breast cancer and others when searching for the term cancer). This table has a maximum size of 50 entries (disease-miRNA combinations), so some associations might be missing. To get more results, search for a more specific term (e.g. breast cancer instead of cancer).
Once the search results are presented, there are several options for further analysis.
Overlap differential expression or classification results from different datasets
The tables for differential expression results and comparison results provide a checkbox per experiment. Ticking a checkbox selects the experiment for the SEA overlap analysis. The SEA overlap analysis visualizes the overlap of the top differentially expressed pathogens or sRNAs between several differential expression analyses, or of the top features between several sRNA classification analyses. It is started by clicking the corresponding button underneath the tables (for example "sRNA DE Overlap"). User datasets can be compared with public datasets, but classification datasets cannot be compared with differential expression comparisons. The following figure shows an example analysis for four different sets of sRNAs that are the features for the classification of different samples. The top line still shows the sRNA and tissue that were searched, but this page shows all sRNAs that play a role in the selected experiments. The filter section provides fields to adjust the shown results. The minimum AUC of the model used and the minimum mean decrease in Gini can be set, and only the sRNAs that reach these values in a classification analysis are considered for the overlap. In the case of differential expression, you can filter by adjusted p-value, minimum (absolute) log2 fold change and direction of regulation. The first table on the page gives an overview of the selected experiments and has the "Set Name" in the first column, which identifies the experiment in the other parts of the page. The table on the right shows all entities (in this case sRNAs) that play a role in at least one of the experiments according to the filter. The second column shows in which sets they play a role. The filter functionality of this table can be useful when looking for a specific sRNA or experiment. The plot shown on this page is an UpSet plot.
On the x-axis, all possible combinations of the four different sets are displayed. Each combination is also visualized by the filled circles in the bottom part of the plot. The blue bar indicates the number of sRNAs that are present in all experiments belonging to this intersection. Clicking on one of these bars displays the IDs of the sRNAs that are present in all the experiments of this intersection. Each sRNA is shown exactly once in this plot, which means that the first four bars represent the sRNAs that are unique to those experiments. The black bars on the left are not clickable. They represent the total number of sRNAs that play a role in the corresponding experiment.
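The counting rule behind the blue bars (each sRNA counted exactly once, in the bar for the exact combination of sets it belongs to) can be sketched as below. The set names and sRNA labels are hypothetical, and the code illustrates the general idea of exclusive intersections rather than SEA's own implementation.

```python
from collections import defaultdict


def exclusive_intersections(sets_by_name):
    """Assign every item to the exact combination of sets that contain it.

    Returns a dict mapping a frozenset of set names to the items found in
    exactly those sets, so every item appears in one bar only.
    """
    membership = defaultdict(set)
    for name, items in sets_by_name.items():
        for item in items:
            membership[item].add(name)
    bars = defaultdict(set)
    for item, names in membership.items():
        bars[frozenset(names)].add(item)
    return dict(bars)


def set_sizes(sets_by_name):
    """Total number of items per set (the non-clickable bars on the left)."""
    return {name: len(items) for name, items in sets_by_name.items()}


# Hypothetical experiments with already filtered sRNA features.
SETS = {
    "Set 1": {"a", "b", "c"},
    "Set 2": {"b", "c", "d"},
    "Set 3": {"c", "e"},
}

BARS = exclusive_intersections(SETS)
for combo, items in sorted(BARS.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(combo), sorted(items))
print(set_sizes(SETS))
```

Because every item lands in exactly one bar, the bar heights add up to the number of distinct sRNAs across all selected experiments, while the per-set totals can add up to more.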
Comparison details and resubmission to Oasis 2
Appendix
Small RNA identifiers
SEA uses standard small RNA identifiers for the search. Keep in mind that different types of small RNAs follow different identifier conventions. For instance, microRNA IDs usually start with the code of the species that they are derived from. For example, a human microRNA ID usually starts with hsa-. The situation is similar for Piwi-interacting RNAs (piRNAs), but instead of a dash their identifiers use an underscore: hsa_. Small nucleolar RNA (snoRNA) IDs tend to start with SNO, and ribosomal RNA (rRNA) IDs usually start with a lowercase r.
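The conventions above can be turned into a rough prefix heuristic, sketched below. This is an illustration only: the function name, the assumed length of the species code (three or four lowercase letters) and the example identifiers are assumptions, and since the conventions are described as what IDs "usually" or "tend to" look like, the guess can be wrong for individual identifiers.

```python
import re

# Assumption: species codes are three or four lowercase letters (e.g. "hsa").
MIRNA_PREFIX = re.compile(r"^[a-z]{3,4}-")
PIRNA_PREFIX = re.compile(r"^[a-z]{3,4}_")


def guess_srna_type(identifier):
    """Guess the small RNA type from the identifier prefix (heuristic only)."""
    # Species-code checks come first so that codes starting with "r"
    # (for example "rno-") are not mistaken for rRNA identifiers.
    if MIRNA_PREFIX.match(identifier):
        return "miRNA"
    if PIRNA_PREFIX.match(identifier):
        return "piRNA"
    if identifier.startswith("SNO"):
        return "snoRNA"
    if identifier.startswith("r"):
        return "rRNA"
    return "unknown"


# Illustrative identifiers; not all of them are guaranteed to exist in SEA.
for example in ["hsa-miR-21-5p", "hsa_piR_000001", "SNORD3A", "rRNA_5S", "XYZ"]:
    print(example, "->", guess_srna_type(example))
```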