Analyzing 1000 Genomes with Spark and Hail
Hail is an open-source platform built on Spark for genomic data analysis.
This tutorial will give you a sense of Hail’s basic features. It consists of a set of four notebooks:
- Deployment - how to deploy the Hail framework on Databricks.
- Overview - a broad overview of Hail’s functionality, with emphasis on manipulating and querying a genetic dataset.
- Introduction to the Expression Language - the basics of the Hail expression language, with hands-on practice in its type system, syntax, and functionality.
- Expression Language Part 2 - how to use the Hail expression language to query, filter, and annotate the same 1000 Genomes dataset used in the Overview notebook.
Download the notebook archive and import it into Databricks.