Analyzing 1000 Genomes with Spark and Hail

Hail is an open-source platform built on Spark for genomic data analysis.

This tutorial introduces Hail’s basic features. It consists of four notebooks:

  • Deployment - shows how to deploy the Hail framework on Databricks.
  • Overview - gives a broad overview of Hail’s functionality, with emphasis on manipulating and querying a genetic dataset.
  • Introduction to the Expression Language - covers the basics of the Hail expression language and builds practical experience with its type system, syntax, and functionality.
  • Expression Language Part 2 - uses the Hail expression language to query, filter, and annotate the same 1000 Genomes dataset used in the Overview notebook.

Download the notebook archive and import it into Databricks.