Here, we present BioInstaller, an open-source, comprehensive, and flexible bioinformatics platform that can be used to collect, manage, and share various types of bioinformatics resources and to perform interactive and reproducible data analyses. By using a simplified, standard Tom’s Obvious, Minimal Language (TOML) configuration file with extra parse functions, developers and users can freely share their public or internal bioinformatics tools/scripts and databases online through a GitHub repository or other hosts. In addition, users can easily access the pooled bioinformatics resources via the diverse interfaces of BioInstaller, which include R functions, a Shiny application ( Chang et al., 2015 ), and HTTP representational state transfer (REST) application programming interfaces (APIs) that are rarely adopted in other similar tools. As a practical demonstration, we collected 157 tools/scripts and 110 databases specifically related to genetic variant annotation using the BioInstaller-defined configuration files. Notably, we developed a Shiny application that supports system monitoring, logging, file management, a task queue, and other functions. This application can easily be reused in other Shiny applications. We expect the BioInstaller package and the practices described in this work to reduce the difficulty of constructing interactive and reproducible biological data analysis applications for R users, and to further improve the interactivity and reproducibility of bioinformatics data analysis.

To make BioInstaller more convenient for users without programming experience, a user-friendly web application was developed based on Shiny ( Chang et al., 2015 ). The user interface (UI) of BioInstaller was constructed using the R packages shinydashboard ( https://cran.r-project.org/package=shinydashboard ) and Shiny ( Chang et al., 2015 ). Output tables are generated by the R package DT ( https://CRAN.R-project.org/package=DT ), which wraps the JavaScript library DataTables ( https://datatables.net/ ). Charts are mainly generated by published R packages and by in-house scripts or R packages, all of which support interactive updates and export of plots in PDF, SVG, and PNG formats. The tab items on the left side of the navigation bar of the BioInstaller Shiny application can be used to switch among all available modules, including “Introduction,” “Dashboard,” “Upload,” “File Viewer,” “Pipeline,” “Instant,” “Installer,” and “Setting.” Detailed usage guidelines are provided on our host ( http://bioinfo.rjh.com.cn/labs/jhuang/tools/BioInstaller/ ), and R users can also use the browseVignettes function in R to access these documents.

A large number of bioinformatics tools/scripts and databases have been integrated into BioInstaller. TOML is a popular, human-readable configuration format that supports comments. We use standard TOML configuration files to store the required information for the included bioinformatics tools/scripts and databases. These configuration files can be reused in other bioinformatics software packages or data analysis pipelines, either through online access or as a file copy. We provide six directories to store different types of TOML files: “github,” “nongithub,” “database,” “web,” “docker,” and “shiny.” Owing to the broad compatibility of BioInstaller, any resource published on docker, GitHub, Zenodo ( https://zenodo.org/ ), or other platforms can be supported.

Unreliable network transfer is a common problem in bioinformatics data analysis. Mirror resources are one way to partially resolve such problems, including invalid links and network blocking. BioInstaller allows users to set any number of mirror URLs for their tools/scripts and databases to avoid problems caused by network transmission. As shown in Fig. S1C , the mirror URLs of Miniconda ( https://conda.io/miniconda.html ) are provided separately by the official host and by our host. Notably, the established mirror URLs of bioinformatics resources can also be used by spack ( Gamblin et al., 2015 ) and other similar tools to build their cache files.
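The mirror idea can be pictured with a small sketch. It is not BioInstaller's own implementation: the function name download_with_mirrors, the injected fetch argument, and the example.org URLs are all hypothetical, and the fake downloader exists only so that the example runs without network access.

```r
# Try each mirror URL in order and return the first one that downloads.
# `fetch` is injected so the sketch can be exercised offline; in real use
# it could be utils::download.file, which returns 0 on success.
download_with_mirrors <- function(urls, destfile, fetch = utils::download.file) {
  for (url in urls) {
    ok <- tryCatch({
      status <- fetch(url, destfile)
      identical(as.integer(status), 0L)
    }, error = function(e) FALSE, warning = function(w) FALSE)
    if (ok) return(url)
  }
  stop("All mirror URLs failed: ", paste(urls, collapse = ", "))
}

# Offline demonstration: a fake fetcher that treats the first mirror as blocked.
fake_fetch <- function(url, destfile) {
  if (grepl("blocked", url)) stop("connection refused")
  writeLines("payload", destfile)
  0L
}

# Hypothetical mirror list (not real hosts)
mirrors <- c("https://blocked.example.org/Miniconda.sh",
             "https://mirror.example.org/Miniconda.sh")
dest <- tempfile()
used <- download_with_mirrors(mirrors, dest, fetch = fake_fetch)
print(used)
```

Trying the mirrors strictly in order keeps the preferred (for example, official) host first and only falls back when it fails, which matches the purpose of mirrors described above.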

Querying the available versions of bioinformatics tools/scripts and databases, whether hosted on GitHub or elsewhere, is a basic function of BioInstaller. For GitHub project items, the GitHub APIs are used to access the project’s version information, such as releases, tags, and branches. All released versions are treated as available versions and returned to BioInstaller ( Fig. S1A ). The situation is more complicated if the resources have not been published on GitHub. For these, we propose two methods of parsing item versions. Method I: If the released versions are fixed, users can list them in the “version_available” field of the configuration file. Method II: Using the R packages rvest ( https://CRAN.R-project.org/package=rvest ) and RCurl ( https://CRAN.R-project.org/package=RCurl ), we established a pool of R functions that dynamically query the versions of items from the original release website ( Dataset S1 ). A demo function that queries the latest version of GMAP is shown in Fig. S1B . This is useful for automating a pipeline that builds precompiled binary versions.
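As a rough illustration of Method II, the sketch below extracts date-style version strings from the text of a release listing and picks the newest one. It deliberately uses base R regular expressions on a hard-coded, hypothetical HTML snippet instead of rvest and RCurl, so it is a stand-in for the idea rather than the function shown in Fig. S1B; the helper name parse_versions and the file names in the snippet are assumptions.

```r
# Extract version strings from page text and return them newest first.
# `pattern` is a Perl-style regex that matches only the version text.
parse_versions <- function(html, pattern) {
  hits <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]
  # ISO-style dates sort correctly as plain strings
  sort(unique(hits), decreasing = TRUE, method = "radix")
}

# Hypothetical snippet standing in for a downloaded release page
page <- paste(
  '<a href="gmap-gsnap-2017-09-11.tar.gz">gmap-gsnap-2017-09-11.tar.gz</a>',
  '<a href="gmap-gsnap-2018-03-25.tar.gz">gmap-gsnap-2018-03-25.tar.gz</a>',
  '<a href="gmap-gsnap-2017-11-15.tar.gz">gmap-gsnap-2017-11-15.tar.gz</a>',
  sep = "\n")

# The lookbehind keeps only the date that follows the file-name prefix
versions <- parse_versions(page, "(?<=gmap-gsnap-)\\d{4}-\\d{2}-\\d{2}")
latest <- versions[1]
print(versions)
```

String sorting is sufficient here only because the versions are zero-padded dates; for dotted versions such as 1.10.2, comparing with base R's numeric_version would be the safer choice.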

Bioinformatics tools, scripts, and databases are supported by BioInstaller. Bootstrap and shinydashboard are used to construct the front-end interface. R functions, the Shiny and Opencpu services, and SQLite and TOML databases are used in the back end.

BioInstaller was designed as an interactive R package to collect, manage, and share various types of bioinformatics resources and to perform interactive and reproducible data analyses. BioInstaller comprises R functions, a Shiny application ( Chang et al., 2015 ), and REST APIs ( Fig. 1 ). Users of R and of other programming platforms can both use the functions of BioInstaller, for example to download bioinformatics tools/scripts and databases and to perform statistical analysis and visualization. The R and Shiny interfaces of BioInstaller were mainly developed in the R language and also use HTML/CSS and JavaScript. To run an instance of BioInstaller, R and the dependent R packages are required. Travis CI ( https://www.travis-ci.org/ ) was used to automatically test the R functions on Linux and Mac OS X platforms. Periodically, the tested and updated BioInstaller package is submitted to the Comprehensive R Archive Network (CRAN) with an increased version number, for example, from v3.3.3 to v3.3.4. Both open and restricted bioinformatics resources can be integrated using the TOML configuration files. The configuration files can also be used on other programming language platforms to access the desired resources through a unique item name, such as “bwa,” “gatk,” “annovar,” “db_annovar_1000g,” “db_annovar_gtex,” etc. A hash value generated from the item name and version serves as the unique id of each tool/script and database. An autogenerated docker image containing all required R packages and the backend web service of BioInstaller has been deposited at DockerHub ( https://hub.docker.com/r/bioinstaller/bioinstaller ).
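The hash-based ids mentioned above can be pictured with a minimal base R sketch. The text does not state which hash algorithm or input layout BioInstaller uses, so MD5 (via tools::md5sum on a temporary file) and the "name:version" layout are assumptions made only for illustration, as is the helper name item_id.

```r
# Build a reproducible id from an item name and a version string.
# Assumptions: MD5 as the hash and "name:version" as the hashed text.
item_id <- function(name, version) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  # Write the exact bytes (no trailing newline) so the digest is stable
  writeBin(charToRaw(paste0(name, ":", version)), tmp)
  unname(tools::md5sum(tmp))
}

id_a <- item_id("bwa", "v0.7.17")
id_b <- item_id("bwa", "v0.7.16")
print(c(id_a, id_b))
```

The same name and version always give the same id, while changing either one gives a different id, which is the property needed to tell installed versions apart.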

Results

Overview and practices of BioInstaller’s functionalities

A comprehensive R package was developed that can be used to quickly construct interactive and reproducible biological data analysis applications on the R platform (Fig. 2). The functionalities of BioInstaller (Table 1; Dataset S2) were divided into six parts, each compared between workflows with and without BioInstaller: (1) deployment of resources, (2) collection of resources, (3) sharing of resources, (4) construction of pipelines, (5) construction of Shiny applications, and (6) reproducible data analysis. A real project (annovarR, https://github.com/JhuangLab/annovarR, under development) is shown in Fig. 2 to illustrate the full workflow of using BioInstaller. annovarR was designed to integrate various genetic variant annotation and visualization tools, including public command line tools, R packages, and custom annotation and visualization functions. Using the code library, the predefined TOML files (database resources and plugins), and the docker file of BioInstaller, we could easily customize the BioInstaller-established Shiny application for genetic variant annotation tasks. Without BioInstaller, we would need to develop the UI and server code of the Shiny application for a large number of universal functions, such as the file management system, background task submission and queue management, and tracking of the output logs and files. The docker image of BioInstaller also works out of the box and can be modified and applied to other projects. Based on the integrated installers (e.g., conda, spack, and BioInstaller) and the simplified TOML files of BioInstaller, users can collect, share, and deploy genetic variant annotation databases and tools in a one-stop manner.
As a real-world use of BioInstaller, we collected and shared tools/scripts and databases in the configuration pool of BioInstaller, including genetic variant annotation databases and tools; the meta information is freely available and hosted on the public GitHub website (https://github.com/JhuangLab/BioInstaller/tree/master/inst/extdata/config). The raw files are stored on the original websites (e.g., https://github.com, https://sourceforge.net/, http://annovar.openbioinformatics.org/, etc.) and on our host.

Figure 2: The relevance, applicability, and a real example of BioInstaller.

| Category | Item | With BioInstaller | Without BioInstaller |
|---|---|---|---|
| Deployment of resources | User interfaces | R functions, Shiny UI, REST APIs (Conda, Spack, and other tools/scripts) | Command-line tools (Conda, Spack, and custom tools) |
| Deployment of resources | Retrieve installed packages | Integrated Shiny dashboard page including R packages, conda and Python packages, Spack packages, and BioInstaller resources | Multiple command line operations |
| Collection of resources | Local development | Yes | No |
| Collection of resources | Need to register an account | Not needed | Needed |
| Collection of resources | Type of backend databases | TOML and SQLite by default (for portability); plugins for other types | MySQL |
| Collection of resources | Resource hosts | No limitation | Centralized |
| Collection of resources | File sizes | No limitation | Limited |
| Collection of resources | PubMed query | Integrated R code with secret key (unlimited access); Shiny UI with formatted table | Isolated R code without secret key (limited access, n ≤ 20); online version without formatted table |
| Sharing of resources | Medium | Simplified TOML format files | Form or configuration file that requires more skills |
| Sharing of resources | Download service | Local Shiny application | Centralized web service or command line tools |
| Construction of pipelines | Store of meta information (e.g., URL and version) | Pre-defined TOML file | De novo source code (e.g., ANNOVAR and fusioncatcher) |
| Construction of Shiny application | Pre-defined pages | Pre-defined Shiny UI and server (dashboard, file management, task submission, logging, export and update of plots, exception handling, setting) | Isolated examples; UI and server codes |
| Construction of Shiny application | Difficulty | Easy to construct the Shiny application (plugins + optional R code) | Relatively complicated (requires R code for UI and server) |
| Reproducible data analysis | Logging | Supported | Manual |
| Reproducible data analysis | Docker image | Pre-defined docker image with Shiny, RStudio, and Opencpu services | Mostly not available |

DOI: 10.7717/peerj.5853/table-1

Comparison of BioInstaller with existing tools for the collection and sharing of bioinformatics resources

To better understand the advance provided by BioInstaller in the collection and sharing of bioinformatics resources, we compared BioInstaller with several existing tools, including Omictools (Henry et al., 2014) and Datasets2Tools (Torre et al., 2018) (Tables 1 and 2), the two most comprehensive meta databases focused on bioinformatics tools. All three provide a web forum to update the meta database of bioinformatics resources. However, BioInstaller also offers an offline way for users to develop their own meta databases via an unlimited configuration file pool (TOML and SQLite format) that is easy to carry and share and does not require programming knowledge. In addition, the developed R functions and Shiny application can be used to query and download linked or isolated file databases, such as supplementary data from papers, annotation databases for genetic variants, genome sequences, etc. In most cases, it is appropriate to tightly combine the meta database with the file database. Therefore, we designed and shared an upload module in the Shiny application to set the meta information for all files; users can add a description, genome version, custom file types, and other customizable fields. Both Omictools and Datasets2Tools include only the items in their own databases and do not integrate external resources. BioInstaller can be used both to collect users’ own resources and to integrate external resources.
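| Category | Item | BioInstaller | Omictools | Datasets2Tools |
|---|---|---|---|---|
| Infrastructure and utilities | Programming language | R, JavaScript | HTML/CSS/JavaScript | HTML/CSS/JavaScript |
| Infrastructure and utilities | Chrome extension | No | No | Yes |
| Infrastructure and utilities | Web service | R Shiny | Web | Web |
| Infrastructure and utilities | R functions | Yes | No | No |
| Infrastructure and utilities | REST APIs | Yes | No | Yes |
| Infrastructure and utilities | Backend database | TOML and SQLite | Not available | MySQL |
| Infrastructure and utilities | Docker image | Yes | No | No |
| Functionality | Access and collect meta database | Yes | Yes | Yes |
| Functionality | Access and collect file database | Yes | No | No |
| Functionality | Integration of external resources | Yes | No | No |
| Functionality | PubMed query | Yes | No | No |
| Functionality | Dataset query | Yes | No | Yes |
| Functionality | Number of supported resources | Integrated | High | Medium |
| Functionality | Version query | Yes | No | No |
| Functionality | Download service | Yes | No | No |
| Functionality | Local branch and development | Yes | No | No |
| Input and output | Input | R functions, Web text, APIs | Web text only | Web text + APIs |
| Input and output | Output | Text, table, plots, and Web page (PNG, SVG and PDF) | Web page | Text and web page |

DOI: 10.7717/peerj.5853/table-2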

Examples of BioInstaller R functions

We have demonstrated the basic structure, functions, and web service of BioInstaller. The full help document is available at http://bioinfo.rjh.com.cn/labs/jhuang/tools/BioInstaller/articles/. Because most of the Shiny application UIs wrap R functions, we use several examples to illustrate the R functions of BioInstaller.

Example #1: Install packed or unpacked bioinformatics tools. We use the Ion Torrent Variant Caller (Zook et al., 2014) and svaba (Wala et al., 2018) to show how to install or download bioinformatics tools or scripts that are not supported by other package management tools.

> library(BioInstaller) # Load the R package
> set.biosoftwares.db("~/.BioInstaller/info.yaml") # Store the installation information
> install.bioinfo(show.all.names = TRUE) # Get the names of all items supported by BioInstaller
> install.bioinfo(name = "tvc", show.all.versions = TRUE) # Get all available versions of tvc
> install.bioinfo(name = "svaba", show.all.versions = TRUE) # Get all available versions of svaba
> install.bioinfo(name = "tvc", download.dir = "/path/tool/tvc") # One-click install of tvc
> install.bioinfo(name = "svaba", download.dir = "/path/tool/svaba") # One-click install of svaba
> show.installed() # Get all installed tools
> get.info("svaba") # Get the svaba installation information, such as update time and version

Example #2: Download genetic variant annotation databases. Genetic variant annotation is a common and high-demand task in biomedical research, especially for examining the correlations between phenotype and molecular features, such as germline and somatic mutations. The following example describes how to download the genetic variant annotation databases dbSNP, CIViC, DisGeNET, and CancerHotspot (Chang et al., 2016; Griffith et al., 2017; Piñero et al., 2017).

> install.bioinfo("db_annovar_avsnp", extra.list = list(buildver = "hg19"), download.dir = "/path/db/") # Install the latest dbSNP from the ANNOVAR website
> crawl.all.versions("db_annovar_avsnp") # Download all dbSNP versions to the current directory
> install.bioinfo("db_civic", download.dir = "/tmp/db") # Download the nightly version of the CIViC database
> install.bioinfo("db_disgenet", download.dir = "/tmp/db") # Download the DisGeNET database
> install.bioinfo("db_cancer_hotspots", download.dir = "/tmp/db") # Download the CancerHotspot database

Example #3: Download an annotation database based on the supplementary files of published papers. The following example uses an epigenetic gene classification (e.g., reader, writer, eraser) database that is available only in a paper’s supplementary file (Huether et al., 2014).

> install.bioinfo("db_annovar_epi_genes", extra.list = list(buildver = "hg19"), download.dir = "/path/db/") # Install the epigenetic genes database from our website

Portable message queue for background tasks based on SQLite

Long-running tasks are challenging in Shiny because an unfinished task blocks all other interactive operations. Here, we used the R package liteq (https://CRAN.R-project.org/package=liteq) to submit and manage background queue tasks. liteq is portable and lightweight: it does not require extra software or services from other programming platforms and can work on any cluster server running computing-intensive tasks. The queue worker developed in BioInstaller can be used for all other background tasks submitted through liteq. Every liteq-submitted task of BioInstaller is assigned a unique id. All executed commands, output logs, and other records are saved in permanent files.
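The queue pattern described above can be sketched with liteq directly. This is a minimal illustration and not the BioInstaller queue worker: the queue name "jobs", the temporary database file, and the task title and message are hypothetical, and the liteq package must be installed for the code to run.

```r
library(liteq)

# A queue lives in a single SQLite file, so no external broker is needed
db <- tempfile(fileext = ".sqlite")
q <- ensure_queue("jobs", db = db)

# Producer side (e.g., a Shiny handler): enqueue a task and return at once
publish(q, title = "install", message = "name=bwa")

# Worker side: take the next task without blocking, process it, acknowledge it
msg <- try_consume(q)
if (!is.null(msg)) {
  cat("running task:", msg$title, msg$message, "\n")
  ack(msg)  # marks the task as done so it is not delivered again
}

# The queue is now empty, so another non-blocking read returns NULL
remaining <- try_consume(q)
```

Because the producer only writes a row and returns, the interactive session stays responsive while a separate worker process handles the long-running job.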

Opencpu backend service improves reproducibility

Opencpu (Ooms, 2017) is an R package for reproducible research that exposes a web REST API for R, LaTeX, and Pandoc. The R functions of BioInstaller are invoked by an active Opencpu R process or daemon service. For users of other programming platforms, this is one possible way to use the R functions of BioInstaller (Fig. 1). Output in JSON and text formats is returned for both browser access (Fig. 5A) and simulated requests. Three of the most basic API usages of BioInstaller demonstrate how it works: (1) obtaining all supported tools/scripts and databases; (2) acquiring the available versions of a specified item; (3) installing a tool in a directory (Fig. 5B). Notably, a random string, such as “x0a469794fa,” is generated by Opencpu as the key for obtaining the output of one R session. Both JSON and text format output can be returned by the Opencpu backend APIs (Fig. 5C).

Figure 5: REST APIs of BioInstaller. (A) Workflow of the REST APIs of BioInstaller, which return JSON and text through GET/POST queries. (B) Using curl to invoke background R functions of BioInstaller. (C) The session key is supplied with the GET method to retrieve the background R session output.
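To make the request pattern concrete, the sketch below only builds the two kinds of URLs involved, following the general Opencpu path layout (a function endpoint under /library/ for POST calls and a session endpoint under /tmp/ for GET calls). The helper names and the base address http://localhost/ocpu are assumptions; the address of a real deployment may differ, and no request is actually sent here.

```r
# URL for calling an R function of an installed package (used with POST)
ocpu_call_url <- function(base, pkg, fun) {
  sprintf("%s/library/%s/R/%s", sub("/+$", "", base), pkg, fun)
}

# URL for reading the stored return value of a finished session (used with GET)
ocpu_result_url <- function(base, key, format = "json") {
  sprintf("%s/tmp/%s/R/.val/%s", sub("/+$", "", base), key, format)
}

base <- "http://localhost/ocpu"  # hypothetical server address
call_url <- ocpu_call_url(base, "BioInstaller", "install.bioinfo")
result_url <- ocpu_result_url(base, "x0a469794fa")  # example key from the text
print(call_url)
print(result_url)
```

A client in any language would POST the function arguments to the first URL, read the session key from the response, and then GET the second URL to fetch the result as JSON.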

Docker container of BioInstaller

A prebuilt docker image is available on DockerHub (https://hub.docker.com/r/bioinstaller/bioinstaller), and each new code change in the BioInstaller repository automatically triggers an update of the docker image. In the docker image, we integrated and configured three types of web services: Opencpu, Shiny (Chang et al., 2015; Ooms, 2017), and the RStudio server (https://www.rstudio.com/products/rstudio-server/). The following commands can be used to deploy and start the BioInstaller service.

$ docker pull bioinstaller/bioinstaller
$ docker run -it -p 80:80 -p 8004:8004 bioinstaller/bioinstaller

Users can deploy a new instance of BioInstaller and all other web services in a few minutes, and other tools/scripts and databases can also be embedded in this docker image using the Dockerfile (https://github.com/JhuangLab/BioInstaller/blob/master/Dockerfile).