DVC on Heroku
DVC is a version control system for machine learning datasets and models. It allows you to store large files outside of Git while keeping them versioned. In a few steps, you can have it working on Heroku.
This tutorial assumes you’re already using DVC and are ready to deploy your app to Heroku.
Getting Started
Use the Apt buildpack from Heroku to install DVC.
heroku buildpacks:add --index 1 heroku-community/apt
Create an Aptfile containing the URL of the latest release package:
https://github.com/iterative/dvc/releases/download/1.9.1/dvc_1.9.1_amd64.deb
Next, add your storage credentials. We recommend an Amazon S3 bucket in the same region as your Heroku dynos (us-east-1 by default) to avoid data transfer charges, since in-region transfer is free:
heroku config:set AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_REGION=...
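A missing config var usually only shows up once the pull fails, so it can help to check for the credentials first. Below is a minimal sketch, assuming the three variable names from the command above; the helper name missing_credentials is hypothetical and not part of DVC or Heroku.

```python
import os

# Variable names taken from the `heroku config:set` command above
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def missing_credentials(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

You could call missing_credentials() before running DVC and stop with a clear message if the returned list is not empty.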
Finally, configure your application to run the following commands:
dvc config core.no_scm true
dvc pull
rm -r .dvc .apt/usr/lib/dvc # reduces slug size
You can run them at build time or runtime. We recommend build time unless it causes your app to exceed the maximum slug size (500 MB compressed).
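If your app is not built on Django or Rails, the same three steps can be wrapped in a small standalone script. This is only a sketch under stated assumptions: the function name fetch_dvc_files and its run and remove parameters are hypothetical, and how you hook the script into your build or startup depends on your app.

```python
import shutil
import subprocess

# The two DVC commands from the steps above, as argument lists
DVC_COMMANDS = (
    ["dvc", "config", "core.no_scm", "true"],
    ["dvc", "pull"],
)

# Directories removed afterwards to reduce slug size
CLEANUP_PATHS = (".dvc", ".apt/usr/lib/dvc")


def fetch_dvc_files(run=subprocess.run, remove=shutil.rmtree):
    """Run the DVC commands in order, then delete the cleanup paths.

    check=True raises CalledProcessError on a non-zero exit status,
    so a failed pull stops before anything is removed.
    """
    for command in DVC_COMMANDS:
        run(command, check=True)
    for path in CLEANUP_PATHS:
        # Unlike `rm -r`, this does not fail if a path is already gone
        remove(path, ignore_errors=True)
```

Calling fetch_dvc_files() with no arguments runs the real commands; the run and remove parameters exist only so the function can be exercised without DVC installed.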
With Django, add to settings.py:
import os
import sys

# Only run on Heroku dynos, and only while the .dvc directory still exists
if "DYNO" in os.environ and os.path.isdir(".dvc"):
    print("Running DVC")
    os.system("dvc config core.no_scm true")
    if os.system("dvc pull") != 0:
        sys.exit("Pull failed")
    os.system("rm -r .dvc .apt/usr/lib/dvc")
With Rails, create an initializer with:
if Rails.env.production? && Dir.exist?(".dvc")
  puts "Running DVC"
  system "dvc config core.no_scm true"
  system "dvc pull" or abort "Pull failed"
  system "rm -r .dvc .apt/usr/lib/dvc"
end
Your DVC files are now available on Heroku.