Save Data to TFRecord Files with TensorFlow

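You can load data with your own program and use TensorFlow to save it to TFRecord files, which store a sequence of binary records (typically serialized tf.train.Example protocol buffers). See TensorFlow File Formats for details.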

Note

This guide is not a comprehensive TensorFlow reference. For the full API, see the TensorFlow API Guide.

To save your data to TFRecord files, the workflow is as follows:

Step 1: Load the data with your own program.

Step 2: Open a TFRecord file with tf.python_io.TFRecordWriter (in TensorFlow 2.x, tf.io.TFRecordWriter).

Step 3: Convert each data record and write it to the TFRecord file. Follow these steps:

  1. Convert your data into tf.train.Feature using tf.train.BytesList, tf.train.FloatList, or tf.train.Int64List.
  2. Create a tf.train.Features message from the converted tf.train.Feature values, keyed by feature name.
  3. Create an Example protocol buffer with tf.train.Example.
  4. Serialize the Example to a byte string with its SerializeToString() method.
  5. Write the serialized Example to the file with the TFRecordWriter opened in Step 2.
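The following is a minimal sketch of Steps 2 and 3, assuming TensorFlow 2.x, where the writer is exposed as tf.io.TFRecordWriter. The helper names (bytes_feature, float_feature, int64_feature, to_example), the feature keys, the sample rows, and the temporary output path are hypothetical placeholders for your own data. The last lines read the file back with tf.data.TFRecordDataset and tf.io.parse_single_example as a sanity check.

```python
import os
import tempfile

import tensorflow as tf


# Step 3.1: wrap Python values in tf.train.Feature messages
# (hypothetical helper names).
def bytes_feature(value: bytes) -> tf.train.Feature:
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def float_feature(values) -> tf.train.Feature:
    return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))


def int64_feature(value: int) -> tf.train.Feature:
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def to_example(name: str, scores, label: int) -> tf.train.Example:
    # Step 3.2: collect the features, keyed by name, in a tf.train.Features.
    features = tf.train.Features(feature={
        "name": bytes_feature(name.encode("utf-8")),
        "scores": float_feature(scores),
        "label": int64_feature(label),
    })
    # Step 3.3: wrap them in an Example protocol buffer.
    return tf.train.Example(features=features)


# Hypothetical rows standing in for the data loaded in Step 1.
rows = [("a", [0.5, 1.5], 0), ("b", [2.0, 3.0], 1)]
output_path = os.path.join(tempfile.mkdtemp(), "rows.tfrecords")

# Step 2: open the file. TensorFlow 1.x code uses tf.python_io.TFRecordWriter;
# the context manager closes the file when the block exits.
with tf.io.TFRecordWriter(output_path) as writer:
    for name, scores, label in rows:
        # Step 3.4: serialize the Example to a byte string.
        serialized = to_example(name, scores, label).SerializeToString()
        # Step 3.5: append the serialized Example as one record.
        writer.write(serialized)

# Sanity check: read the records back and parse them with a matching schema.
feature_description = {
    "name": tf.io.FixedLenFeature([], tf.string),
    "scores": tf.io.FixedLenFeature([2], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}
parsed = [
    tf.io.parse_single_example(record, feature_description)
    for record in tf.data.TFRecordDataset(output_path)
]
```

Opening the writer as a context manager closes the file even if a write raises. To write a GZIP-compressed file instead, you could pass options=tf.io.TFRecordOptions(compression_type="GZIP") to the writer and compression_type="GZIP" to tf.data.TFRecordDataset when reading it back.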

The example notebook below demonstrates how to convert MNIST data to TFRecord format. Before running the notebook, you must:

  1. Prepare storage mounts for distributed data loading.
  2. Set FUSE_MOUNT_LOCATION in the notebook to your mount path.
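The notebook itself is not reproduced here. As a rough, hypothetical illustration of this kind of conversion, the sketch below writes MNIST-shaped arrays (28x28 uint8 images with integer labels) to a file under FUSE_MOUNT_LOCATION, storing each image's raw pixel bytes alongside its height, width, and label, and then decodes the file again. The temporary-directory value of FUSE_MOUNT_LOCATION, the file name, the feature keys, the helper names write_images and decode, and the generated stand-in arrays are all assumptions; substitute your mount path and the real MNIST data.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical value: point this at your storage mount. A temporary
# directory is used here so the sketch runs without a mount.
FUSE_MOUNT_LOCATION = tempfile.mkdtemp()


def write_images(images: np.ndarray, labels: np.ndarray, path: str) -> None:
    """Write uint8 images of shape (N, H, W) and integer labels as Examples."""
    _, height, width = images.shape
    with tf.io.TFRecordWriter(path) as writer:
        for image, label in zip(images, labels):
            feature = {
                "height": tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
                "width": tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(label)])),
                # Raw pixel bytes in row-major order.
                "image_raw": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image.tobytes()])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def decode(record):
    """Parse one record and restore the image to its (height, width) shape."""
    parsed = tf.io.parse_single_example(record, {
        "height": tf.io.FixedLenFeature([], tf.int64),
        "width": tf.io.FixedLenFeature([], tf.int64),
        "label": tf.io.FixedLenFeature([], tf.int64),
        "image_raw": tf.io.FixedLenFeature([], tf.string),
    })
    # decode_raw returns a flat uint8 vector; reshape it with the stored sizes.
    image = tf.io.decode_raw(parsed["image_raw"], tf.uint8)
    image = tf.reshape(image, tf.stack([parsed["height"], parsed["width"]]))
    return image, parsed["label"]


# Stand-in arrays with MNIST's shape and dtype; replace them with the real
# MNIST images and labels loaded by your own program.
images = (np.arange(3 * 28 * 28) % 256).astype(np.uint8).reshape(3, 28, 28)
labels = np.array([0, 1, 2])

path = os.path.join(FUSE_MOUNT_LOCATION, "mnist_train.tfrecords")
write_images(images, labels, path)
decoded = list(tf.data.TFRecordDataset(path).map(decode))
```

Storing height and width with each record lets the reader restore the image shape after tf.io.decode_raw, which only returns a flat vector of pixels.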