Automate everything

Automation. It’s what we developers are supposed to be good at, though sometimes automating something can be pretty hard.

A lot of money is wasted testing things by hand. Of course, there are also the random errors we humans make and the inexplicable test results that follow as a direct result. I’m sure many will agree that machines are far better suited to performing tedious tasks quickly and accurately.

We simple developers love having a button that lights up green when everything is good, so we can merge our changes without a single worry. Creating this button on a budget, though, is a bit harder. It took me a few tries to get things right.

First, I started writing simple shell scripts. Such scripts are still executed and verified manually, but it was a step in the right direction. The bigger issue was that these scripts generally required very specific environments and were executed on your own development machine. Not great.

Integrating these scripts into Git was a fairly obvious step. Running arbitrary scripts is easily done with a post-receive hook on the remote. This offloads all processing away from your development machine. The problem was creating an environment to run these scripts in.
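
For illustration, such a hook can be as simple as the sketch below, with run-tests.sh standing in for whatever script actually needs to run:

    #!/bin/sh
    # hooks/post-receive on the remote repository (sketch; run-tests.sh is a placeholder)
    set -e

    # Check out the freshly pushed HEAD into a temporary work tree
    WORKTREE=$(mktemp -d)
    trap 'rm -rf "$WORKTREE"' EXIT
    git --work-tree="$WORKTREE" checkout -f

    # Run the test script inside that checkout
    cd "$WORKTREE"
    ./run-tests.sh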

Most of the projects I work on are shipped as packages, so it made sense to build the software, package it and then perform tests with said scripts. There are schroot and sbuild environments that most Debian developers are likely familiar with, but I was looking for something more generic and less restrictive.

I had heard a lot of good things about Jenkins, so I set up a Jenkins server and started migrating my scripts to it. It was a pretty painful process: even the setup required a lot of hand-holding, and my workflow just wasn’t compatible with Jenkins. Jenkins also proved to be a pretty big resource hog. More importantly, trying to use git-buildpackage in a chroot was simply not stable or maintainable.

A few months went by and Docker came into existence. Containers were now a thing. I experimented with building things in Docker containers and found it to be very simple and reliable. Creating a build environment took nothing more than a few lines in a Dockerfile. The ability to instantly rebuild and reuse any environment is very powerful and reduced build times to a fraction of what they were.
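
To give an idea, a build environment and a disposable build can be sketched roughly like this (the base image and package list are just examples, not necessarily what I used):

    # build-env/Dockerfile contains little more than:
    #
    #   FROM debian:stable
    #   RUN apt-get update && apt-get install -y build-essential devscripts git-buildpackage

    # Build the image once, reuse it for every build afterwards
    docker build -t build-env build-env/

    # Perform the actual build in a disposable container
    docker run --rm -v "$PWD:/src" -w /src build-env dpkg-buildpackage -us -uc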

As I couldn’t find a suitable build server that performed builds using Docker, I tried rolling my own. This didn’t go well, as I had neither the budget nor the time to develop a brand new platform. The project was abandoned fairly quickly.

A few more months passed and I stumbled upon GoCD, whose primary appeal is that it’s very lightweight. The setup took mere minutes and setting up pipelines was easy. No complicated shell scripts, just execute the entire build in a disposable Docker container. Done.

Test automation was still far away. No, I don’t mean build tests; those are fairly easy to do (unless you are cross-compiling). Initially I tried running tests in Docker, but I quickly ran into issues as Docker is designed for single-process containers.

Performing tests in LXD containers, however, proved to be a much better option. Being able to spawn several containers and perform functional client-server integration tests is very powerful.
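
A rough sketch of what such a test run can look like, with the image, container names and scripts all being placeholders:

    # Spawn a disposable server and client container
    lxc launch images:debian/12 test-server
    lxc launch images:debian/12 test-client
    sleep 5  # give the containers a moment to come up and get an address

    # Install the freshly built package on the server side
    lxc file push mypackage.deb test-server/root/
    lxc exec test-server -- apt-get install -y /root/mypackage.deb

    # Run the integration tests from the client against the server
    SERVER_IP=$(lxc list test-server -c 4 --format csv | cut -d' ' -f1)
    lxc file push integration-tests.sh test-client/root/
    lxc exec test-client -- /root/integration-tests.sh "$SERVER_IP"

    # Throw everything away afterwards
    lxc delete --force test-server test-client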

With the help of Puppeteer, full in-browser tests didn’t take long to develop either. Today I’m able to run hundreds of tests just by running git push. Technology is pretty amazing.

Using Git for backups

Sometimes you are stuck with a small budget, or none at all. This was the primary motivator for setting up a backup solution using Git.

“But isn’t Git used for version control of source?” I hear you say. Yes, you are quite right. Git absolutely excels at managing source code. It does not do very well with binary blobs however.

So then why choose to use Git for backups? In my case the requirements were simple. I had a bunch of sites all sitting on the same host. Most of these were written in PHP, Python or a combination of both. All other content was either plain HTML, XML, CSS or JavaScript.

The remaining blob content would hardly ever change, and if necessary I could just .gitignore all of it and restore from local copies.

Git handles source code and plain text files just fine. The various databases were a bit trickier; I ended up using mysqldump to export all database content to plain SQL, which again was perfectly suitable for Git.

I set up a single cron job that would run every day, temporarily mount some remote storage, go through all the sites, dump their databases, and stage and commit all changes.

Making use of some of Git’s less-often-used arguments was important here, as storing .git folders in webroots is a big no-no. Using --git-dir and --work-tree allowed me to specify exactly where the .git folder and the webroot were located. In addition, quite often nothing changed at all, so --allow-empty let my script run without causing problems.
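
Put together, the daily job looked roughly like the sketch below; paths, database names and the schedule are all placeholders, and the real script looped over every site:

    #!/bin/sh
    # Run daily from cron, e.g.: 0 3 * * * /usr/local/bin/backup-sites.sh
    set -e

    WEBROOT=/var/www/example-site        # the site itself, no .git folder inside it
    GITDIR=/mnt/backup/example-site.git  # repository lives on the mounted remote storage

    mount /mnt/backup   # defined in /etc/fstab

    # Dump the database to plain SQL so Git can diff and store it sensibly
    # (credentials via ~/.my.cnf or similar)
    mysqldump example_db > "$WEBROOT/example_db.sql"

    # Stage and commit everything; --allow-empty keeps quiet days from failing
    git --git-dir="$GITDIR" --work-tree="$WEBROOT" add -A
    git --git-dir="$GITDIR" --work-tree="$WEBROOT" commit --allow-empty -m "Daily backup $(date +%F)"

    umount /mnt/backup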

Using Git came with the added benefit that I could see every single change made on a daily basis. For bonus points, the script would also send me a mail every day with the console log, so I’d know when there was a problem.

And soon enough, there was. Several of these sites were never updated, so they eventually contained known vulnerabilities that were being exploited by less scrupulous people trying to make a quick buck.

The nice thing about Git, of course, is that I could see exactly which changes were made and revert them without fuss. It didn’t make the sites any less vulnerable, but it did make cleaning up the mess a lot easier.
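
With the same --git-dir and --work-tree trick, inspecting and undoing the damage boils down to something like this (paths are again placeholders):

    # What changed since yesterday's backup?
    git --git-dir=/mnt/backup/example-site.git --work-tree=/var/www/example-site diff HEAD~1 HEAD

    # Undo the unwanted changes in the work tree, i.e. the live site
    git --git-dir=/mnt/backup/example-site.git --work-tree=/var/www/example-site revert --no-edit HEAD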

Despite all the good things, I would not recommend Git for backups unless the content you are archiving fits these criteria and you are working on a very tight budget. Especially now that everything is being virtualized, making snapshots is much faster and simpler.

LVM saves the day, twice (or thrice?)

LVM is pretty great. I’ve been using it for years and you can do amazing things with it. It essentially adds an abstraction layer between physical media and filesystems. LVM has more features, such as software RAID and snapshots, but hardware abstraction is what I use it for.

Replacing disks or extending filesystems becomes trivial with LVM. Not only that, replacements can be done without ever shutting down the system or waiting for the process to finish.
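
Extending a filesystem, for example, is a one-liner these days (the volume group and LV names below are made up):

    # Grow the logical volume by 20 GiB and resize the filesystem on it in one go
    lvextend --resizefs -L +20G /dev/vg0/root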

A while back I was forced to replace my laptop. Unfortunately, the replacement laptop didn’t allow for a simple drive swap so I had to get a little creative.

Luckily I had access to one of those SATA-to-USB adapters, so I took the old drive and connected it via the adapter to one of the USB ports. A few minutes later I was booting from the old drive, albeit a bit slowly due to the USB 2.0 bottleneck.

I started by partitioning the new internal NVMe SSD, added it to the LVM group and instructed LVM to move everything from the old USB-connected drive to the new internal drive. This operation of course took some time, but I was able to continue my work as normal. The system gradually became faster as more and more blocks were moved to the internal drive.
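
In LVM terms the whole migration is only a handful of commands; the device names here are examples:

    # After partitioning the new NVMe drive, add it to the volume group
    pvcreate /dev/nvme0n1p2
    vgextend vg0 /dev/nvme0n1p2

    # Move every allocated block off the old USB-attached drive, online
    pvmove /dev/sdb2

    # Finally drop the old drive from the volume group
    vgreduce vg0 /dev/sdb2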

A few weeks passed and, albeit for different reasons, I once again had to switch laptops. This time, however, both laptops had non-removable internal drives. Or at least, not removable without undoing a hundred screws and prying glued components apart.

After a bit of Googling I found something called NBD. Network Block Device is a fairly old protocol that allows a physical drive to be shared over the network as a plain block device. So I connected the two laptops with a gigabit Ethernet cable and got to work.

A few minutes later the internal drive of the target laptop was accessible on my current laptop. I repeated the process of partitioning the drive, adding it to the LVM group and once again instructing LVM to move everything to the new drive. Meanwhile, I was able to continue working as normal.
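
The NBD side of things is equally short. The exact invocation depends on the nbd-server version, but the old-style form looks roughly like this (addresses and device names are examples):

    # On the target laptop: export the internal drive over the network
    nbd-server 10809 /dev/nvme0n1

    # On the current laptop: attach it as a local block device
    modprobe nbd
    nbd-client 192.168.1.2 10809 /dev/nbd0

    # From here it's the same dance as before: partition, pvcreate,
    # vgextend, pvmove everything across, vgreduce the old drive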

Once the process finished, I shut down the laptop I was working on and rebooted the new laptop. It booted right away with all my data there, like nothing ever happened.

I found out later that this third laptop had a second internal drive. So I created a degraded RAID 1 array using software RAID, partitioned the array, added it to the LVM pool and once more instructed LVM to move everything onto the array.

I then re-partitioned the now empty drive, added it to the array and voilà: hardware redundancy with a bit of a speed boost. A little bit of GRUB magic and I was able to boot from either drive.
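
Sketched out with made-up device names, the whole dance looks something like this:

    # Create a degraded RAID 1 array with only the empty second drive in it
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme1n1p1 missing

    # Put it in the volume group and migrate everything onto it
    pvcreate /dev/md0
    vgextend vg0 /dev/md0
    pvmove /dev/nvme0n1p2
    vgreduce vg0 /dev/nvme0n1p2

    # Re-partition the now empty first drive and add it as the mirror's second half
    mdadm --add /dev/md0 /dev/nvme0n1p1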

The performance improvement was a bit of a bummer though. I was expecting 350+300 MB/s reads but in the end only got about 500 MB/s. I guess some internal bus is limiting transfer speeds. It will have to do.

There was a notable size difference between the two SSDs, one being 256 GB and the other 512 GB. As all members of a RAID 1 array must be the same size, I ended up with a 256 GB array. The remainder I added as non-redundant storage at the end of my LVM pool. Perfect for swap space and other non-critical storage such as test containers and temporary VMs.
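
The leftover half of the bigger drive then simply goes in as a plain, non-mirrored physical volume (device names again being examples):

    # Add the remaining 256 GB partition to the volume group without redundancy
    pvcreate /dev/nvme1n1p2
    vgextend vg0 /dev/nvme1n1p2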

The flexibility that LVM provides is simply priceless. I highly recommend LVM on any new installation.