Azure Databricks provides tools and frameworks that bridge computer science and bioinformatics, designed for end-to-end analysis of genomics data at terabyte to petabyte scales.
Databricks Runtime HLS contains pre-packaged pipelines, parallelized with Apache Spark, that align reads and then detect and annotate variants in individual samples.
- DNASeq Pipeline
- RNASeq Pipeline
- Tumor/Normal Pipeline
- SnpEff Annotation Pipeline
- VEP Annotation Pipeline
Once variants have been detected in individual samples, append the data to Delta Lake tables. Then run aggregate analyses to calculate cohort-level quality control and summary statistics.
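To make "cohort-level summary statistics" concrete, here is a minimal sketch in plain Python of two common per-variant aggregates: call rate and alternate allele frequency. It deliberately does not use Spark or Delta Lake, so it only illustrates the arithmetic, not how the aggregation would be run at scale. The `calls` records, the `cohort_summary` function, and the genotype encoding (count of alternate alleles, diploid samples assumed) are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical per-sample genotype calls: (sample_id, variant_id, genotype).
# Assumption: genotype is the number of alternate alleles (0, 1, or 2)
# for a diploid sample, or None when the call is missing.
calls = [
    ("s1", "chr1:100:A:G", 0),
    ("s2", "chr1:100:A:G", 1),
    ("s3", "chr1:100:A:G", 2),
    ("s4", "chr1:100:A:G", None),
    ("s1", "chr2:200:C:T", 0),
    ("s2", "chr2:200:C:T", 0),
    ("s3", "chr2:200:C:T", 1),
    ("s4", "chr2:200:C:T", 0),
]


def cohort_summary(calls):
    """Return per-variant call rate and alternate allele frequency."""
    total = defaultdict(int)   # samples seen for each variant
    called = defaultdict(int)  # samples with a non-missing call
    alt = defaultdict(int)     # alternate alleles observed

    for _sample, variant, genotype in calls:
        total[variant] += 1
        if genotype is not None:
            called[variant] += 1
            alt[variant] += genotype

    summary = {}
    for variant in total:
        n_called = called[variant]
        summary[variant] = {
            # Fraction of samples with a non-missing genotype.
            "call_rate": n_called / total[variant],
            # Alternate alleles over all called alleles (2 per diploid sample).
            "alt_allele_freq": alt[variant] / (2 * n_called) if n_called else None,
        }
    return summary


summary = cohort_summary(calls)
for variant, stats in summary.items():
    print(variant, stats)
```

On a cluster, the same per-variant grouping would typically be expressed as a group-by aggregation over the variant table rather than a Python loop.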
Once quality control is complete, perform population-scale statistical analyses of genetic variants.
- Joint Genotyping Pipeline
- Hail 0.2
- SAIGE and SAIGE-GENE
- Parallelizing Command-Line Tools With the Pipe Transformer
Azure Databricks provides an environment for building, training, and deploying machine learning and deep learning models at scale. To learn more and find example notebooks, see the Machine Learning, Deep Learning, and MLflow guides.
This section provides examples of machine learning algorithms applied to genomics data.