Although the number of publicly deposited RNA-Seq datasets has increased over the past few years, incomplete annotation of the associated metadata limits their potential use. Texas A&M University researchers have now constructed a database, SFMetaDB, by curating datasets related to RNA splicing factors. Their effort focused on RNA-Seq datasets in which splicing factors were knocked down, knocked out or over-expressed, yielding 75 datasets corresponding to 56 splicing factors. These datasets can be used in differential alternative splicing analysis to identify potential targets of these splicing factors, and in other functional studies. Surprisingly, only ∼15% of all splicing factors have been studied by loss- or gain-of-function experiments using RNA-Seq. In particular, splicing factors with domains from a few dominant Pfam domain families have not been studied. This points to a significant gap that needs to be addressed to fully elucidate the splicing regulatory landscape. Mouse models are already available for ∼20 of the unstudied splicing factors, so studying these factors in vitro and in vivo using RNA-Seq could be a fruitful research direction.

A use case of SFMetaDB for the splicing factor Mbnl1

The case of the splicing factor Mbnl1 demonstrates the advantage of SFMetaDB over ArrayExpress. For the same keyword, Mbnl1, SFMetaDB returned five accurate datasets that can be used for downstream alternative splicing analyses. In contrast, ArrayExpress returned 13 datasets, 8 of which could not be used for downstream alternative splicing analyses of Mbnl1. (a) The result page in SFMetaDB for the query Mbnl1. (b) The description page of the dataset GSE39911 in GEO. (c) The result page in ArrayExpress for the query Mbnl1. (d) The description page of the dataset E-GEOD-76222 in ArrayExpress.
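
The Mbnl1 comparison can be restated as the share of returned datasets that are usable for downstream analysis. The short Python sketch below is only an illustration of that arithmetic: the helper name `usable_fraction` is hypothetical, and the only inputs are the counts quoted above (5 returned and none unusable for SFMetaDB; 13 returned and 8 unusable for ArrayExpress).

```python
from fractions import Fraction


def usable_fraction(returned: int, unusable: int) -> Fraction:
    """Share of returned datasets that are usable (hypothetical helper)."""
    if returned <= 0 or not 0 <= unusable <= returned:
        raise ValueError("counts must satisfy 0 <= unusable <= returned, returned > 0")
    return Fraction(returned - unusable, returned)


# Counts taken from the Mbnl1 query comparison described above.
sfmetadb = usable_fraction(returned=5, unusable=0)
arrayexpress = usable_fraction(returned=13, unusable=8)

print(f"SFMetaDB usable share:     {float(sfmetadb):.0%}")      # 100%
print(f"ArrayExpress usable share: {float(arrayexpress):.0%}")  # 38%
```

For this single query, every dataset returned by SFMetaDB is usable, while only 5 of the 13 returned by ArrayExpress are. One query is not a benchmark, so the figures should be read as an illustration rather than a general measure of either resource.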

Availability – Database URL: http://sfmetadb.ece.tamu.edu/