Hail is an open-source platform built on Spark for genomic data analysis.
This tutorial will give you a sense of Hail’s basic features. It consists of a set of four notebooks:
- Deployment: how to deploy the Hail framework on Databricks.
- Overview: a broad overview of Hail’s functionality, with emphasis on manipulating and querying a genetic dataset.
- Introduction to the Expression Language: the basics of the Hail expression language, with hands-on practice in its type system, syntax, and functionality.
- Expression Language Part 2: how to use the Hail expression language to query, filter, and annotate the thousand-genomes dataset from the overview.
To run the notebooks:
- Download the notebook archive and import it into Databricks.
- Download the libraries for Spark 2.1.1 and Scala 2.11. You can build Hail for other versions of Spark by following this tutorial.
- Follow the steps in the deployment notebook to upload the downloaded files and create Databricks libraries.
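
The prebuilt libraries above target Spark 2.1.1 and Scala 2.11, so it can help to confirm the cluster's versions before attaching them. The following is a minimal sketch of such a check. The helper `matches_required` is hypothetical and not part of Hail or Databricks, and it assumes that Spark must match exactly while Scala only needs to match on its major.minor version.

```python
# Versions the prebuilt Hail libraries target, per the steps above.
REQUIRED_SPARK = "2.1.1"
REQUIRED_SCALA = "2.11"


def matches_required(spark_version, scala_version):
    """Return True when both versions match the prebuilt libraries.

    Hypothetical helper, not part of Hail or Databricks.
    Assumption: Spark must match exactly; Scala is compared on
    major.minor only, so "2.11.8" is treated as "2.11".
    """
    scala_major_minor = ".".join(scala_version.split(".")[:2])
    return (spark_version == REQUIRED_SPARK
            and scala_major_minor == REQUIRED_SCALA)


# Example values; in a notebook, `spark.version` reports the
# cluster's Spark version and could be passed in instead.
print(matches_required("2.1.1", "2.11.8"))
print(matches_required("2.2.0", "2.11.8"))
```

If the check fails, the note above about building Hail for other versions of Spark applies.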