The Bio-TDS (Bio Tools Discovery Systems) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g., genomic, proteomic, bio-imaging) the ability to query over 15,000 analytic tool descriptions integrated from well-established community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS’s scientific impact was evaluated using sample questions posed by researchers, retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems and outperformed them in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience questions with the most relevant computational toolsets required for data analysis in their knowledge discovery process.
S1 | High-level view of the Bio-TDS architecture | S2 | BETS Specification description and manipulation |
---|---|---|---|
S3 | Resource extraction and semi-automatic curation | S4 | TONER: Tools ontology-based annotation |
S5 | Bio-TDS Query processing workflow and programmatic access | S6 | Bio-TDS Evaluation and comparison |
Bioinformatics Elaborated Tools Specifications (BETS) provides a standard for analytic tool descriptions. The analytic tool descriptions (i.e., metadata) gathered from the community tool repositories integrated into the Bio-TDS are stored in JSON format using the BETS standard. This standard consists of core BETS attributes and domain- or repository-specific attributes (see Figure S1). The core BETS attributes are manually mapped to the repository attributes.
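As an illustration of the split between core and repository-specific attributes, the following minimal Python sketch maps a repository record onto a BETS-style JSON document. The attribute names (`tool_name`, `name`, `homepage`, `owner`), the `core`/`specific` layout, and the helper `to_bets` are all assumptions made for the example; the actual BETS attribute set is the one shown in Figure S1.

```python
import json

# Hypothetical manual mapping from repository attribute names to core BETS
# attribute names. The real BETS attribute names are not listed in the text.
GALAXY_TO_CORE = {
    "tool_name": "name",
    "tool_description": "description",
    "tool_url": "homepage",
}


def to_bets(record, mapping, repository):
    """Split a repository record into core and repository-specific attributes."""
    core = {}
    specific = {}
    for key, value in record.items():
        if key in mapping:
            # Attribute has a core BETS counterpart: rename it.
            core[mapping[key]] = value
        else:
            # No core counterpart: keep it as a repository-specific attribute.
            specific[key] = value
    return {"repository": repository, "core": core, "specific": specific}


# Made-up example record, not taken from any real repository.
galaxy_record = {
    "tool_name": "example_aligner",
    "tool_description": "Aligns short reads to a reference genome.",
    "tool_url": "https://example.org/example_aligner",
    "owner": "example_user",
}

bets_tool = to_bets(galaxy_record, GALAXY_TO_CORE, "galaxy")
print(json.dumps(bets_tool, indent=2))
```

Keeping the mapping in a plain dictionary mirrors the manual mapping described above: supporting another repository would only require another dictionary, under the same assumptions.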
The Bio-TDS combines bioinformatics tools from five other repositories and stores them in one central location, following BETS (Bioinformatics Elaborated Tool Specification). Six main modules convert the data from each of the five repositories into BETS tools and store the new tools in the Bio-TDS database. The BETS Checker is a Java application that tests the compatibility of a tool with the BETS specification. A tool is considered “compatible” if it is in the format expected by the corresponding BETS converter. For example, the system contains a mapper called Galaxy Converter; a tool from the Galaxy Tool Shed is “compatible” only if it matches the predefined Galaxy format.
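The compatibility test can be pictured with a short sketch. The BETS Checker itself is a Java application whose rules are not given here; the Python below is only an illustration that treats "matching the predefined format" as "containing every field the converter expects". The field names and the `REQUIRED_FIELDS` table are hypothetical.

```python
# Hypothetical per-repository formats: the fields each converter expects.
# The actual predefined formats used by the BETS Checker are not given here.
REQUIRED_FIELDS = {
    "galaxy": {"tool_name", "tool_description", "tool_url"},
}


def is_compatible(record, repository):
    """Return True if the record has every field its converter expects."""
    required = REQUIRED_FIELDS.get(repository)
    if required is None:
        # No converter is defined for this repository.
        return False
    return required.issubset(record.keys())


# Made-up example records for illustration.
complete = {
    "tool_name": "example_aligner",
    "tool_description": "Aligns short reads to a reference genome.",
    "tool_url": "https://example.org/example_aligner",
}
incomplete = {"tool_name": "example_aligner"}

print(is_compatible(complete, "galaxy"))
print(is_compatible(incomplete, "galaxy"))
```

A real checker would likely also validate value types and nesting; this sketch checks field presence only.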