Azure Databricks Datasets

Azure Databricks includes a variety of datasets within the environment that you can use to either learn Spark or test out algorithms. You’ll see these throughout the documentation pages.

To browse these files, you can use Databricks Utilities. Here’s a code snippet that you can use to list all the Databricks datasets.

display(dbutils.fs.ls("/databricks-datasets"))

For any dataset in that listing, you can print its README to get more information about it. The following example prints the top-level README.

%python
# Read the top-level README through the /dbfs mount on the driver.
with open("/dbfs/databricks-datasets/README.md") as f:
    x = f.read()

print(x)
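
To find out which datasets ship with a README, a small helper along the following lines may be useful. This is only a sketch: the function names `find_readmes` and `read_readme` are hypothetical, and it assumes that DBFS is reachable on the driver under `/dbfs`, that each dataset is a top-level folder, and that a README, where one exists, sits directly in that folder with a name starting with "README". Datasets that do not match these assumptions are skipped.

```python
import os

# Assumption: DBFS is exposed on the driver's local filesystem under /dbfs.
DATASETS_ROOT = "/dbfs/databricks-datasets"


def find_readmes(root=DATASETS_ROOT):
    """Map each top-level dataset folder under root to its README path, if it has one."""
    readmes = {}
    if not os.path.isdir(root):
        # Not running where the datasets are mounted; report nothing found.
        return readmes
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # skip loose files such as the top-level README.md
        for entry in sorted(os.listdir(folder)):
            # Assumption: README files are named README, README.md, README.txt, ...
            if entry.lower().startswith("readme"):
                readmes[name] = os.path.join(folder, entry)
                break
    return readmes


def read_readme(path):
    """Return the full text of a README file."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# List every dataset folder that has a README, with the README's path.
for dataset, path in find_readmes().items():
    print(dataset, "->", path)
```

To print one of the READMEs it finds, pass the path to `read_readme`, for example `print(read_readme(path))`. Outside a Databricks cluster the root folder does not exist, so the loop prints nothing.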