How can I best deliver read-only data assets to a Kubernetes application?

HOW TO - October 18, 2021

I'm migrating an HPC app to Kubernetes, and I'm trying to determine the best way to provide read-only data assets as a configuration-managed snapshot.

Previously, my team delivered our application as a set of RPMs, but as we move to Kubernetes, we're delivering Docker images. This works fine for our application binary: instead of a pile of RPMs that all have to agree, we can deliver a single known-working image.

The problem, however, comes with our read-only data assets (similar to a game's asset files). Several different Docker images might rely on a single set of data assets, so we'd prefer not to bake them into the application image itself (and we want to be able to update the assets without rebuilding the application images).

We're unsure of the best approach. The first idea is to create a "data container" that simply runs NFS and serves out the data. This isolates the data from the application and lets us collapse a set of data RPMs into a single tagged Docker image, but it seems like it might be overkill.
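For comparison, here is roughly what the tagged-data-image idea could look like without running an NFS server at all: an init container built from the versioned data image copies its baked-in assets into a shared emptyDir, which the application container then mounts read-only. This is a sketch only; the image names, tags, and paths are hypothetical.

```yaml
# Sketch of the init-container data-image pattern.
# Image names, tags, and paths below are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-assets
spec:
  initContainers:
    - name: copy-assets
      # Versioned, data-only image; its sole job is to copy the
      # assets baked into it at /data into the shared volume.
      image: registry.example.com/team/assets:2021.10
      command: ["cp", "-a", "/data/.", "/assets/"]
      volumeMounts:
        - name: assets
          mountPath: /assets
  containers:
    - name: app
      image: registry.example.com/team/app:1.4
      volumeMounts:
        - name: assets
          mountPath: /data
          readOnly: true
  volumes:
    - name: assets
      emptyDir: {}
```

The data image needs a `cp` binary for the copy step (e.g. a busybox base rather than `FROM scratch`), and the trade-off is that the copy happens at every pod start, which may be slow for multi-gigabyte asset sets.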

I know we're essentially looking for a Kubernetes persistent volume, but the difficulty is bundling all of the data into a single package with the same delivery convenience as a Docker image.
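On bare metal, one common shape for a read-only persistent volume is static provisioning against an NFS export that holds the unpacked snapshot, with one export path per version. A rough sketch, where the server name, path, and sizes are placeholders:

```yaml
# Sketch: statically provisioned, NFS-backed, read-only PV/PVC.
# Server, export path, and capacity are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: assets-2021-10
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: fileserver.internal
    path: /exports/assets/2021.10
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: assets
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""   # bind to the static PV above, not a dynamic class
  volumeName: assets-2021-10
  resources:
    requests:
      storage: 20Gi
```

Pods would mount the claim with `readOnly: true` in their volumeMounts; versioning then amounts to publishing each snapshot under its own export path and repointing the PV.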

Is there a better way to provide this read-only data as a version-controlled snapshot?


Comments:

030: Could you indicate what you mean by "data"? E.g. text files, or blobs larger than 1 GB?

OP: A directory tree with files. Basically I'm looking for something like an ISO snapshot that we can version-control and deliver as a mount point that the application pods pick up. Our biggest dataset is on the order of 15 GB, broken into multiple files.

030: Which cloud provider do you use? E.g. AWS, GCP, Azure, or bare metal?

OP: @030 Bare metal. These are our own clusters.