This section explains how to migrate single-node deep learning (DL) code written in PyTorch to distributed training code that uses Horovod, run on Databricks with HorovodRunner.
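As a rough illustration of what such a migration typically involves, the following is a minimal sketch and not the contents of the example notebook. The function names `scale_lr`, `train_hvd`, and `launch`, the placeholder linear model, the synthetic batch, and the learning-rate scaling heuristic are all assumptions made for illustration. The Horovod calls (`hvd.init`, `hvd.DistributedOptimizer`, `hvd.broadcast_parameters`) and `HorovodRunner(np=...).run(...)` are the usual entry points, but check them against the versions installed on your cluster.

```python
def scale_lr(base_lr, world_size):
    """Scale the single-node learning rate by the worker count.

    This is a common heuristic in Horovod examples, assumed here for illustration.
    """
    return base_lr * world_size


def train_hvd(base_lr=0.01, epochs=1):
    # Imports live inside the function because HorovodRunner ships the
    # function to each worker process and runs it there.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import horovod.torch as hvd

    hvd.init()  # 1. Initialize Horovod in every worker process.

    # 2. Pin each process to one GPU, if GPUs are available.
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())
    device = torch.device("cuda" if use_cuda else "cpu")

    # Placeholder model (hypothetical); substitute your single-node model.
    model = nn.Linear(784, 10).to(device)

    # 3. Wrap the optimizer so gradients are averaged across workers.
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=scale_lr(base_lr, hvd.size()))
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # 4. Start every worker from the same initial weights.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)

    model.train()
    for _ in range(epochs):
        # Synthetic batch for illustration only; real code would read a
        # shard of the dataset per worker (for example with a DistributedSampler).
        data = torch.randn(32, 784, device=device)
        target = torch.randint(0, 10, (32,), device=device)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data), target)
        loss.backward()
        optimizer.step()


def launch(num_processes=2):
    # Runs only on a Databricks ML runtime, where sparkdl is available.
    from sparkdl import HorovodRunner
    hr = HorovodRunner(np=num_processes)
    hr.run(train_hvd, base_lr=0.01, epochs=1)
```

In a notebook attached to a suitable cluster, calling `launch(num_processes=2)` would start two Horovod worker processes, each running `train_hvd`. The training loop itself is unchanged from single-node code; only the four numbered steps are Horovod-specific.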
Databricks Runtime 5.0 ML (Beta), the minimum required runtime for HorovodRunner, includes Horovod. However, to add PyTorch support on that runtime, you must reinstall Horovod. The PyTorch Init Script notebook creates an init script named pytorch-gpu-init.sh that installs the required libraries.
If you run Databricks Runtime 5.1 ML (Beta) or above, you do not need to create the PyTorch init script or configure your cluster with it.
Before running the HorovodRunner PyTorch MNIST Example notebook, you must: