Prepare Storage for Data Loading and Model Checkpointing

Data loading and model checkpointing are crucial to deep learning workloads, especially distributed deep learning. To support them, prepare a FUSE mount that lets each worker load data and write model checkpoints and logs to a shared storage location.
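As a rough illustration of writing from many workers to one shared location, the Python sketch below gives each worker its own checkpoint and log directory under a shared root, so concurrent writers never target the same file. The helper names (`worker_dirs`, `save_checkpoint`, `load_checkpoint`), the directory layout, and the use of `pickle` are assumptions for illustration only. A real training job would typically use its framework's own save function. A temporary directory stands in for the shared FUSE mount so the sketch runs anywhere.

```python
import os
import pickle
import tempfile
from typing import Tuple


def worker_dirs(shared_root: str, rank: int) -> Tuple[str, str]:
    """Return (checkpoint_dir, log_dir) for one worker, creating them if needed."""
    # Hypothetical layout: <root>/checkpoints/worker-<rank> and <root>/logs/worker-<rank>.
    ckpt_dir = os.path.join(shared_root, "checkpoints", f"worker-{rank}")
    log_dir = os.path.join(shared_root, "logs", f"worker-{rank}")
    os.makedirs(ckpt_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)
    return ckpt_dir, log_dir


def save_checkpoint(state: dict, ckpt_dir: str, step: int) -> str:
    """Pickle `state` to <ckpt_dir>/ckpt-<step>.pkl and return the final path."""
    final_path = os.path.join(ckpt_dir, f"ckpt-{step:06d}.pkl")
    tmp_path = final_path + ".tmp"
    # Write to a temporary name first, then rename over the final name.
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, final_path)
    return final_path


def load_checkpoint(path: str) -> dict:
    """Load a checkpoint written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Stand-in for a shared FUSE mount; a temp dir keeps the demo local.
    root = tempfile.mkdtemp()
    ckpt_dir, log_dir = worker_dirs(root, rank=0)
    path = save_checkpoint({"step": 100, "weights": [0.1, 0.2]}, ckpt_dir, step=100)
    with open(os.path.join(log_dir, "train.log"), "a") as log:
        log.write("step 100 saved\n")
    print(load_checkpoint(path))
```

Writing to a `.tmp` file and then calling `os.replace` avoids leaving a half-written checkpoint under its final name on a local file system. Whether that rename is atomic on a particular FUSE mount depends on the file system behind it, so treat it as best effort.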

Azure Databricks recommends using Databricks Runtime 5.3 ML or above and saving data under dbfs:/ml, which maps to file:/dbfs/ml on the driver and worker nodes. Available in Databricks Runtime 5.3 ML and above, dbfs:/ml is a special folder that provides high-performance I/O for deep learning workloads.
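Because dbfs:/ml is exposed at /dbfs/ml on every node, training code can use ordinary file APIs once a DBFS URI is translated to its local path. The helper below is a hypothetical sketch of that translation, based only on the dbfs:/ to /dbfs/ mapping described above. `dbfs_to_local` is not a Databricks API.

```python
def dbfs_to_local(uri: str) -> str:
    """Translate a DBFS URI such as dbfs:/ml/x into its FUSE path /dbfs/ml/x."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {uri!r}")
    # Strip any extra leading slashes before joining onto the /dbfs mount root.
    rest = uri[len(prefix):].lstrip("/")
    return "/dbfs/" + rest


if __name__ == "__main__":
    print(dbfs_to_local("dbfs:/ml"))  # /dbfs/ml
    print(dbfs_to_local("dbfs:/ml/my_project/checkpoints"))  # /dbfs/ml/my_project/checkpoints
```

On a cluster, a call such as `open(dbfs_to_local("dbfs:/ml/my_project/model.ckpt"), "wb")` would then write through the FUSE mount; `my_project` and `model.ckpt` are placeholders.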

If you use an older Databricks Runtime version or want to use your own storage, Azure Databricks recommends the blobfuse client, an open source project that provides a virtual file system backed by Azure Blob Storage. For more information about blobfuse, see the blobfuse GitHub website.

To mount an Azure Blob Storage container as a file system with blobfuse, you can use an init script. The example notebook below demonstrates how to generate an init script and configure a cluster to run the script.
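As a rough, hypothetical illustration of what such an init script might contain, the Python sketch below renders a bash script that installs blobfuse, writes a blobfuse connection config, and mounts one container. Several details are assumptions: the install commands assume an Ubuntu 16.04 image and the Microsoft package repository; the mount point, cache directory, config path, and helper names are placeholders; and the `-o` timeout values are example values. Check every flag against the blobfuse README for your version. The sketch embeds the storage account key in plain text, which is acceptable only for experimentation; in practice, keep the key in a secret store.

```python
import os
import shlex
import tempfile

# Hypothetical init script template. Assumes an Ubuntu 16.04 image and the
# Microsoft package repository; adjust both for your Databricks Runtime.
INIT_TEMPLATE = """#!/bin/bash
set -euo pipefail

# Install blobfuse from the Microsoft package repository.
wget -q https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb
dpkg -i packages-microsoft-prod.deb
apt-get update
apt-get install -y blobfuse fuse

# Write the connection config. The quoted 'EOF' stops the shell from expanding the key.
cat > {cfg} <<'EOF'
accountName {account}
accountKey {key}
containerName {container}
EOF
chmod 600 {cfg}

# blobfuse needs a local cache directory in addition to the mount point.
mkdir -p {tmp} {mount}
blobfuse {mount} --tmp-path={tmp} --config-file={cfg} \\
  -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 -o allow_other
"""


def make_blobfuse_init_script(account: str, container: str, key: str,
                              mount: str = "/mnt/blobfuse",
                              tmp: str = "/mnt/blobfusetmp",
                              cfg: str = "/etc/blobfuse_connection.cfg") -> str:
    """Render the init script for one storage account and container."""
    for name, value in (("account", account), ("container", container), ("key", key)):
        # Whitespace or newlines would corrupt the config file format.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must be non-empty and contain no whitespace")
    return INIT_TEMPLATE.format(
        account=account, container=container, key=key,
        mount=shlex.quote(mount), tmp=shlex.quote(tmp), cfg=shlex.quote(cfg),
    )


def write_init_script(path: str, contents: str) -> None:
    """Write the script to `path`, e.g. a DBFS location seen through /dbfs (assumed)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        f.write(contents)


if __name__ == "__main__":
    script = make_blobfuse_init_script("myaccount", "mycontainer", "<storage-account-key>")
    # A temp dir keeps the demo local; on a cluster the path might be under /dbfs.
    out = os.path.join(tempfile.mkdtemp(), "blobfuse-mount.sh")
    write_init_script(out, script)
    with open(out) as f:
        print(f.read())
```

Because an init script runs on every node when the cluster starts, each worker would see the container at the same mount point (here the placeholder `/mnt/blobfuse`), which is what lets all workers share checkpoints and logs. The generated file still has to be stored where the cluster can read it and registered as an init script in the cluster configuration.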