Using Git for backups

Sometimes you are stuck with a small budget, or none at all. This was the primary motivator for setting up a backup solution using Git.

“But isn’t Git used for version control of source code?” I hear you say. Yes, you are quite right. Git absolutely excels at managing source code. It does not do very well with binary blobs, however.

So then why choose to use Git for backups? In my case the requirements were simple. I had a bunch of sites all sitting on the same host. Most of these were written in PHP, Python or a combination of both. All other content was either plain HTML, XML, CSS or JavaScript.

The remaining blob content hardly ever changed, so I could just .gitignore all of it and restore it from local copies if need be.

Git handles source code and plain text files just fine. The various databases were a bit trickier. I ended up using mysqldump to export all database content to plain SQL, which again was perfectly suitable for Git.
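
To give an idea of the dump step, here is a minimal sketch in Python. It is not the original script. The function name, the database name and the dump directory are made up for illustration, and the extra mysqldump options are my own suggestion: they write one INSERT per row and leave out the dump timestamp, so an unchanged database should produce an unchanged file.

```python
def mysqldump_command(database: str, dump_dir: str) -> list[str]:
    """Build a mysqldump command line whose output diffs well in Git.

    Both arguments are hypothetical examples, not values from the
    original setup.
    """
    return [
        "mysqldump",
        "--single-transaction",    # consistent snapshot for InnoDB tables
        "--skip-extended-insert",  # one INSERT per row, so diffs stay small
        "--skip-dump-date",        # no timestamp comment at the end of the dump
        f"--result-file={dump_dir}/{database}.sql",
        database,
    ]
```

The resulting list can be handed to subprocess.run(command, check=True). Credentials are assumed to come from an option file such as ~/.my.cnf rather than from the command line.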

I set up a single cron job that ran once a day. It temporarily mounted some remote storage, then went through all the sites one by one, dumping each site’s databases and staging and committing all changes.
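
The overall shape of such a job could look like the sketch below. It assumes the remote storage has an /etc/fstab entry so that a plain mount with the mount point is enough, and it takes the per-site work (dump, stage, commit) as a callable. All names here are hypothetical.

```python
import subprocess
from typing import Callable, Iterable


def run_daily_backup(
    sites: Iterable[str],
    mount_point: str,
    backup_site: Callable[[str], None],
    run: Callable[..., object] = subprocess.run,
) -> list[str]:
    """Mount the storage, back up every site, and always unmount again.

    Returns the names of the sites whose backup raised an error.
    """
    failed = []
    run(["mount", mount_point], check=True)
    try:
        for site in sites:
            try:
                backup_site(site)
            except Exception as error:
                # Keep going so one broken site does not block the rest.
                print(f"backup of {site} failed: {error}")
                failed.append(site)
    finally:
        # Unmount even if something unexpected went wrong above.
        run(["umount", mount_point], check=True)
    return failed
```

The run parameter is only there so the function can be tried out without touching a real mount. In the cron job it would simply stay at its default.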

Some of Git’s less commonly used arguments were important here, as storing .git folders in webroots is a big no-no. Using --git-dir and --work-tree allowed me to specify exactly where the .git folder and the webroot were located. In addition, quite often nothing had changed, and git commit normally exits with an error when there is nothing to commit. Adding --allow-empty records a commit anyway, so my script could run without causing problems.
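
As a rough illustration of how those arguments fit together, the sketch below builds the two Git commands for one site. The paths and the commit message are placeholders, and the helper names are mine, not something from the original script.

```python
def git_command(git_dir: str, work_tree: str, *args: str) -> list[str]:
    """Prefix a Git subcommand with the repository and webroot locations."""
    return ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}", *args]


def backup_commands(git_dir: str, work_tree: str, message: str) -> list[list[str]]:
    """Commands to stage everything in the webroot and commit it."""
    return [
        # Stage new, modified and deleted files in the work tree.
        git_command(git_dir, work_tree, "add", "--all"),
        # --allow-empty keeps the commit from failing on days without changes.
        git_command(git_dir, work_tree, "commit", "--allow-empty", "-m", message),
    ]
```

Each command can then be executed with subprocess.run(command, check=True). Because the repository lives outside the webroot, nothing under .git is ever served by the web server.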

Using Git came with the added bonus that I could see every single change made on a daily basis. On top of that, the script would send me a mail every day with the console log so I’d know when there was a problem.

And soon enough, there was. Several of these sites were never updated, so over time they accumulated known vulnerabilities, which were eventually exploited by less scrupulous people trying to make a quick buck.

The nice thing about Git, of course, is that I could see exactly which changes were made and revert them without fuss. It didn’t make the sites any less vulnerable, but it did make cleaning up the mess a lot easier.
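
One way such a cleanup might look, assuming the unwanted changes ended up in a single daily commit: first show what that commit touched, then revert it. The commit id and paths are placeholders and the function name is hypothetical.

```python
def review_and_revert_commands(
    git_dir: str, work_tree: str, bad_commit: str
) -> list[list[str]]:
    """Commands to inspect a suspicious commit and then undo it."""
    base = ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}"]
    return [
        # List the files the commit changed, with a short summary per file.
        base + ["show", "--stat", bad_commit],
        # Create a new commit that undoes it and update the webroot to match.
        base + ["revert", "--no-edit", bad_commit],
    ]
```

git revert expects a clean work tree. That fits this setup, since the daily job commits everything it finds. If the damage is spread over several commits, each one would need the same treatment.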

Despite all the good things, I would not recommend Git for backups unless the content you are archiving fits the criteria and you are working on a very tight budget. Especially now that everything is being virtualized, making snapshots is much faster and simpler.