BioKDE (Biomedical Knowledge Discovery Engine) is a platform that helps scientists cope with the big data challenges that have arisen in the biomedical field in recent years. BioKDE offers both integrated data and tools. The integrated data are currently mainly in oncology; data from other biomedical areas will be added in the future. We offer a wide variety of statistical and computational tools for visualization, data analysis, modeling, and prediction. Complex pipelines can be built starting from your own data or from data on the server. You can browse the left panel of the tool page to get an idea of the tools we offer. Below is a guide, in decision tree format, to help you choose the best tool for your task.
To use or not to use? - A decision tree of tools
Following the Galaxy project's design philosophy, all tool interfaces are designed so that users can run a tool without a deep understanding of the underlying theory or implementation, as long as they understand its input and output, which is always necessary. Even so, given the large number of tools on our platform, even experienced users may be unsure which tool or tools to choose for a task. We created a decision tree that leads users to the right tools through a series of questions. You can also search for a tool using the search box at the top of the tool panel on the left. Each tool page has more detailed documentation on the usage of that tool. Tools without links in the decision tree below are still under development. Please check back, or sign up for our email list to be notified when they are ready.
- The decision tree of tools
- Data tools. First question: do you want to analyze your own data?
- Yes. Please use the file uploading tool to upload your data to our server.
- No. I would like to use the existing datasets on the server. The Load data tool loads a dataset from our server into the workspace, ready for analysis. Most of the test datasets for the tools can be loaded using this tool.
- No. I would like to query a dataset from the integrated data. The Query cancer data tool is an interface for querying the integrated oncology data. Many different query conditions can be combined to retrieve highly specific datasets.
- Analysis tools. What type of analysis do you want to perform?
- Descriptive statistics. Row statistics or column statistics?
- Statistical inference and hypothesis testing.
- One variable of interest.
- One sample problem. Is the underlying distribution normal, or can the central limit theorem be assumed to hold?
- Yes. Does the inference concern the mean or the variance?
- The underlying distribution is binomial: binomial test.
- The underlying distribution is Poisson: Poisson test.
- Non-parametric method such as Wilcoxon test.
- Two sample problem.
- Normal assumption holds.
- Inference concerning means: two-sample t-test. Use an F-test first to test the equality of variances of the two groups.
- Inference concerning variances: two-sample F-test.
- Binomial distribution.
- Independent samples and all expected values >= 5: contingency table test.
- Independent samples and not all expected values >= 5: Fisher's exact test.
- Dependent samples: McNemar's test.
- Person-time data.
- One sample problem: one-sample test for incidence rate.
- Not one-sample problem, incidence rate remains constant over time.
- Two-sample problem and no confounding: two-sample test for comparison of incidence rate.
- Two-sample problem with confounding: methods for stratified person-time data.
- Interested in test of trend over more than two groups: test of trend for incidence rate.
- Not one-sample problem, incidence rate not constant over time: survival analysis.
- Comparison of survival curves of two groups with limited control of covariates: log-rank test.
- Interested in the effect of several risk factors on survival: Cox proportional hazards model.
- Non-parametric methods.
- More than two samples.
- Normal assumption holds: one-way anova.
- Categorical data: R × C contingency-table methods.
- Non-parametric methods such as Kruskal-Wallis test.
- More than one variable.
- Relationships between two continuous variables.
- Predicting one variable from another: simple linear regression.
- Correlation of two normal variables: Pearson correlation.
- Correlation of non-normal variables: rank-correlation method.
- Relationships between one continuous and one categorical variable.
- More than two variables, continuous: multiple regression.
- More than two variables, binary.
- Time of events is important: survival analysis methods.
- Time of events not important: multiple logistic regression.
- Regression.
- Survival analysis.
- Dimension reduction and variable selection.
- Machine learning.
- Graphics tools. Publication-quality graphics can be created by non-professionals.
What type of graph would you like to draw?
- Relationship.
- Distribution.
- Comparison.
- Composition.
- Clustering.
- Text manipulation tools.
Text manipulation tools can be handy when a user wants to change the format of a file
to make it suitable for a particular tool.
They often serve as glue between analysis and graphics tools.
- Manipulation of a single file.
- Transpose data.
- Convert delimiters to TAB.
- Compute an expression on every row.
- Remove beginning of a file.
- Change case of selected columns.
- Cut columns from a table.
- Add columns to an existing dataset.
- Merge columns of a dataset.
- Merge file.
- Split file.
- Other text manipulation tools.
- Genomics data analysis tools.
These tools are specific to genomics data analysis and are grouped by data type.
- Gene expression, including both microarray and next-generation sequencing data.
- Differential gene expression analysis.
- Gene set enrichment analysis.
- Pathway analysis based on gene expression data.
- Network-based differential gene expression analysis.
- DNA methylation.
- DNA copy number.
- Chromosome occupancy, histone accessibility, replication timing.
- Variant analysis.
- Network and systems biology tools.
- Gene regulatory network inference using gene expression data.
- Network analysis given a single or a group of genes.
- Text mining tools. We offer standard text mining tools for extracting relationships among
biological terms.
- Custom extraction of a given text.
- Extraction of available literature for certain information.
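To make the question-and-answer idea concrete, here is a minimal Python sketch of how part of the two-sample branch of the tree above could be encoded and walked. It is an illustration only and not how BioKDE implements the tree: the names `TWO_SAMPLE_BRANCH` and `choose_tool`, the answer keys (`"normal"`, `"binomial"`, `"other"`, `"yes"`, `"no"`), and the wording of the questions are all assumptions made for this example, and the person-time branch is left out for brevity.

```python
# Hypothetical encoding of part of the "Two sample problem" branch.
# A node is either a string (the suggested method) or a dict holding a
# question and a mapping from each allowed answer to a child node.
TWO_SAMPLE_BRANCH = {
    "question": "What kind of data do the two samples contain?",
    "answers": {
        "normal": {
            "question": "Does the inference concern means or variances?",
            "answers": {
                "means": "two-sample t-test (after an F-test for equal variances)",
                "variances": "two-sample F-test",
            },
        },
        "binomial": {
            "question": "Are the samples independent?",
            "answers": {
                "no": "McNemar's test",
                "yes": {
                    "question": "Are all expected values >= 5?",
                    "answers": {
                        "yes": "contingency table test",
                        "no": "Fisher's exact test",
                    },
                },
            },
        },
        # Assumed catch-all key for the non-parametric leaf of the tree.
        "other": "non-parametric methods",
    },
}


def choose_tool(node, answers):
    """Walk the tree, consuming one answer per question, and return the leaf."""
    remaining = iter(answers)
    while isinstance(node, dict):
        try:
            answer = next(remaining)
        except StopIteration:
            raise ValueError(f"Unanswered question: {node['question']}") from None
        if answer not in node["answers"]:
            raise ValueError(f"Unknown answer {answer!r} to: {node['question']}")
        node = node["answers"][answer]
    return node


# Binomial data, independent samples, some expected values below 5.
print(choose_tool(TWO_SAMPLE_BRANCH, ["binomial", "yes", "no"]))
# prints: Fisher's exact test
```

Keeping the tree as plain nested data means new branches can be added without touching the traversal function, and an unexpected or missing answer fails with a message that names the question involved.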