Backing Up Your Data With Fog

Fog, in case you haven’t heard of it, is a fantastic cloud computing library written in Ruby. It provides a unified interface to several popular cloud computing platforms (including Amazon, Rackspace, Linode, and others), making it easy to interact with them from Ruby. It currently supports four types of cloud services: storage, compute, DNS, and CDN. Fog has become very popular lately, and serves as the backbone for Chef’s cloud computing functionality, which is how I first became aware of it.

I recently used Fog to write a backup script in Ruby to automatically send encrypted database backups from a database server running at Rackspace to Amazon’s S3 storage service. Here’s how I did it.

Overview

My script runs as the second step in a process. The first step is a shell script that calls pg_dump to dump a PostgreSQL database and then encrypts the dump using GnuPG, dropping it in a backup directory on the database server.

My Fog-based script’s job is to make sure that all of the files in the backup directory get moved to S3.

Writing Files

Fogsync (my script) looks at all of the files in that directory and makes sure that they all exist in a bucket on S3. If they don’t, it copies them up there. Additionally, it deletes old backups from S3. For this customer, we keep backups for 14 days, so any backup older than that gets deleted.

Let’s look at how it works:

require 'fog'

fog = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => MY_ACCESS_KEY,
  :aws_secret_access_key => MY_SECRET
)
directory = fog.directories.get("MY_DIRECTORY")

files = Dir["/var/backup/*.gpg"]
files.each do |file|
  name = File.basename(file)
  unless directory.files.head(name)
    directory.files.create(:key => name, :body => open(file))
  end
end

Here’s what this snippet does:

  1. Creates a connection to AWS. The syntax is basically the same for connecting to any of the supported cloud platforms; only the parameter names change.

  2. Uses ‘head’ to check if the file exists and, optionally, get some metadata about it (size, last-modified date, etc.). Think of this as the cloud equivalent of the Unix stat command. You don’t want to use the ‘get’ command, as that returns the whole file, which can take a very long time when the files are large cough*voice of experience*cough.

  3. Creates the file in the given directory (“bucket” in S3 terms) if it doesn’t exist already.

If you’ve used S3, you’ll notice that Fog uses slightly different terms for things than S3 does. Because Fog works across a number of different storage providers, it uses more general terms. This might be confusing at first if you’re familiar with a specific provider’s nomenclature, but the tradeoff is that if you want to move from one provider to another, the only thing you have to change is the code that sets up the connection (the call to Fog::Storage.new() in this example).
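To make the check-then-create pattern concrete without needing AWS credentials, here’s a minimal pure-Ruby sketch that stands in a hypothetical in-memory “directory” for the S3 bucket (the FakeDirectory class and sync_backups method are illustrative names of my own, not part of Fog):

```ruby
require 'set'

# Hypothetical in-memory stand-in for a Fog directory, so the
# head-then-create pattern can be demonstrated without credentials.
class FakeDirectory
  def initialize
    @keys = Set.new
  end

  # Mirrors directory.files.head(name): truthy if the key exists.
  def head(name)
    @keys.include?(name)
  end

  # Mirrors directory.files.create(:key => name, ...).
  def create(name)
    @keys.add(name)
  end
end

# The same idempotent upload loop as the Fog snippet above:
# only files missing from the bucket get uploaded.
def sync_backups(directory, paths)
  uploaded = []
  paths.each do |path|
    name = File.basename(path)
    next if directory.head(name)
    directory.create(name)
    uploaded << name
  end
  uploaded
end
```

Running the sync twice uploads each file exactly once, which is what makes the script safe to re-run from cron.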

Deleting Files

require 'date'

oldest = Date.today - 14
directory = fog.directories.get("MY_DIRECTORY")
files = directory.files
files.each do |f|
  file_date = Date.parse(f.last_modified.to_s)
  f.destroy if file_date < oldest
end

This is fairly straightforward as well. Get all the files in the directory and check their age, deleting the ones that are older than we want to keep.
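The retention rule in that loop can be pulled out into a small, testable method; this is a sketch of the same logic (the expired_keys name and keyword arguments are my own, not part of the script above):

```ruby
require 'date'

# The retention rule from the deletion loop: a backup expires when
# its last-modified date is older than the cutoff (14 days here).
def expired_keys(files, keep_days: 14, today: Date.today)
  cutoff = today - keep_days
  files.select { |_key, modified| modified < cutoff }.map(&:first)
end
```

Passing `today` in explicitly makes the cutoff easy to exercise in a test without waiting two weeks.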

So that, in a nutshell, is how to use Fog. This is a simplified example, of course; in my production code the parameters are all pulled from configuration files, and the script emails a report of what it did, in addition to having a lot more error handling.

If you do any scripting with cloud computing, you owe it to yourself to check out Fog.

Generating Realistic Test Data With Ruby

Generating semi-realistic test data for an application can be a pain. If the data already exists, as in the case of an upgrade to an existing system, you can generally create data based on the existing database. But what if you need a large sample of data for a brand new system? If you have simple data requirements, there are some Ruby gems that can help you out. Faker is one such gem, which lets you generate realistic names, addresses and phone numbers. But what do you do for things that are a little less typical? Things like scores, ratings, ages, dates, etc. I needed to do this recently for a prototype I built of a system to generate letters. Here’s the Rake task I ended up with:

This script adds 1000 records to my database that are representative of what real production data would look like. The quantity of data is obviously easily adjusted up or down as needed.

This is just a standard Rake task that you can drop inside lib/tasks. Most of this is fairly standard Ruby code and not very interesting, but let’s look closer at what makes this work.

The first portion of the script does some setup work, deleting existing data. Then it sets up a series of arrays for the values that will be used for individual fields. For example, the volumes variable:

volumes = (8000..100000).to_a

This creates an array of integers containing every number from 8000 to 100000. Response rates and variances are set up similarly, as are the client names.

In the loop that generates the actual data, we then call the rand() function on these arrays to select a value from our range. This method isn’t a standard part of Ruby’s Array class; it’s added to the class by ActiveSupport.

Using this method makes it very easy to generate test data within predefined acceptable ranges.
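As a sketch of the pattern (with made-up ranges and client names for illustration; plain Ruby’s Array#sample does the same job as ActiveSupport’s old Array#rand extension):

```ruby
# Hypothetical field ranges modeled on the volumes example above.
# Array#sample picks one random element, like ActiveSupport's old
# Array#rand extension did.
volumes   = (8000..100000).to_a
variances = (1..25).to_a                        # assumed range
clients   = ["Acme", "Globex", "Initech"]       # hypothetical names

record = {
  :volume   => volumes.sample,
  :variance => variances.sample,
  :client   => clients.sample
}
```

Wrap that hash construction in a `1000.times` loop and hand each record to your model’s create method, and you have a pile of plausible data.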

For another take on this topic, see the EdgeCase blog

A Collection of Great Tools for the Ruby Developer

I’ve been a bit heads-down lately, working on a super-secret project in Ruby. More on that in the near future, but in the meantime I wanted to share a few things that I’ve started using.

Shoulda

When I started my new project, I wanted to try one of the new testing frameworks for Ruby. The problem is there are a number to choose from. What to do…

I settled on Shoulda. I wish I could tell you that this was a rigorous process, that I evaluated each framework carefully, learning about each one’s strengths and weaknesses. I did not; I cheated. You see, a while back, Josh Susser did just that. He called it The Great Test Framework Dance-off. He settled on Shoulda, so that’s what I went with.

Shoulda is developed by Tammer Saleh of ThoughtBot, who have a number of other really nice projects. Shoulda’s tagline is “Making Tests Easy on the Fingers and Eyes”, and it lives up to that goal. It has a very nice syntax for developing tests, including a complete set of macros for testing controllers and models. It’s a joy to use. Here’s what it looks like (both samples taken from the Shoulda README):

Nice, right?

Here’s a sample of the ActiveRecord macros in action:

Beautiful.

So what’s the big deal? Well, it’s easier to read for one. Instead of horrendous method names like test_should_do_this_but_not_that, you get to write English: should “do this but not that”. The macros in Shoulda also let you test your models and controllers easily.
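To see why the English-friendly syntax is possible at all, here’s a minimal, hypothetical sketch of the kind of metaprogramming a `should` macro relies on; this is not Shoulda’s actual implementation, just the general idea:

```ruby
# Hypothetical sketch: a `should` macro turns an English description
# into a conventionally named test_* method via define_method.
module MiniShoulda
  def should(description, &block)
    define_method("test_should_#{description.gsub(/\W+/, '_')}", &block)
  end
end

class WidgetTest
  extend MiniShoulda

  should "do this but not that" do
    :passed
  end
end
```

The horrendous method name still exists under the hood; you just never have to type it.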

Pivotal Tracker

Pivotal Tracker is an Agile project management tool, developed by the folks at Pivotal Labs. It lets you create projects and track releases, stories, and defects. The beauty of Tracker is its all-on-one-screen user interface. It lets you see everything at a glance, and even provides keyboard shortcuts for common tasks. I’m not alone in my admiration of Tracker; it seems to be extremely popular among the Rails consulting shops (Hashrocket, for one).

While Tracker is powerful enough to be used for large multi-developer projects, it also happens to be perfect for managing your side projects. Enter the features you want, organize them into releases, and just click start to begin the first one. Click finish when you’re done, and move on to the next one. Easy peasy. Did I mention it’s free?

Be sure to check out the screencast, which gives a nice overview of the application.

HTTParty

John Nunemaker is a prolific Ruby and Rails developer, as witnessed by a quick glance at his GitHub page. One of his most recent projects is HTTParty, which makes it dead simple to consume REST APIs using Ruby. Here’s what it looks like:

HTTParty automatically detects whether the response is JSON or XML and parses it appropriately. It really doesn’t get much easier than that. There’s also a nice command-line app bundled with the gem that lets you call RESTful web services easily, with a few more bells and whistles than curl.
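That auto-detection essentially boils down to dispatching on the response’s Content-Type header; here’s a simplified pure-Ruby sketch of the idea (not HTTParty’s actual code, and the parse_body name is mine):

```ruby
require 'json'
require 'rexml/document'

# Pick a parser based on the Content-Type header, the way HTTParty
# decides between JSON and XML automatically.
def parse_body(content_type, body)
  case content_type
  when %r{application/json}         then JSON.parse(body)
  when %r{(application|text)/xml}   then REXML::Document.new(body)
  else body
  end
end
```

A JSON response comes back as a plain Ruby hash, an XML one as a document object, and anything else is passed through untouched.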

Sinatra

Sinatra is a great, compact web framework similar in concept to Why the Lucky Stiff’s Camping framework. It makes it trivial to create a web application in just a few lines of code. It was originally written to allow for creating lightweight web services, but has since become quite popular as a web framework to use when Rails might be overkill.

It’s easy to create simple test applications for libraries, but also robust enough to create full-blown websites with. Check out the Sinatra website and the Sinatra book for more details.

What tools have you discovered lately?

Is SwitchPipe the Solution for Rails Shared Hosting?

Peter Cooper (whom I interviewed recently) has just announced SwitchPipe, which aims to make deploying and hosting Rails (and other frameworks, such as Django) applications easy. From the site:

Introduction / Overview
SwitchPipe is a proof of concept “Web application server” developed in Ruby. More accurately, it’s a Web application process manager and request dispatcher / proxy. Backend HTTP-speaking applications (Web applications) do not run directly within SwitchPipe, but are loaded into their own processes making SwitchPipe language and framework agnostic.
SwitchPipe takes control of, and manages, the backend application processes, including loading and proxying to multiple instances of each application in a round-robin style configuration. As an administrator, you can define the maximum number of backend processes to run for each app, along with other settings so that you do not exceed preferred resource limits. SwitchPipe quickly removes processes that “break” or otherwise outlive their welcome. For example, you can let SwitchPipe kill any backend processes that have not been accessed for, say, 20 seconds. This makes hosting many multiple Rails applications, for example, a quick and non-memory demanding process, ideal for shared hosting environments.

SwitchPipe’s goal is to be:

* super easy to configure
* the easiest way to deploy multiple HTTP-talking backend applications
* painless in terms of management; no hand-holding of different applications is needed
* a permanent daemon that can handle configuration changes in backend apps “on the fly”
* a reliable solution on Linux and OS/X (and anything POSIX compatible, ideally)
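The idle-timeout behavior the quote describes can be sketched in a few lines; this BackendPool class is purely illustrative (my own naming), not SwitchPipe’s code:

```ruby
# Hypothetical sketch of idle-process reaping: track when each
# backend was last accessed and report the ones that have outlived
# the timeout (20 seconds in the quoted example).
class BackendPool
  def initialize(timeout)
    @timeout = timeout
    @last_access = {}
  end

  # Record a proxied request hitting an app's backend at time `now`.
  def touch(app, now)
    @last_access[app] = now
  end

  # Backends idle longer than the timeout; these would be killed.
  def stale(now)
    @last_access.select { |_app, t| now - t > @timeout }.keys
  end
end
```

A reaper loop that periodically calls something like `stale` and kills the returned processes is what keeps memory usage low on a box hosting dozens of mostly idle apps.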

I haven’t spent much time with SwitchPipe yet, but if it lives up to Peter’s claims this will dramatically simplify hosting Rails/Django/Camping/whatever applications.
What’s interesting to note is that this originated with Peter’s widely read article on why such a thing was needed. Unlike a lot of other people who have complained loudly about the state of Rails on shared hosting environments, Peter put his time and talents towards creating a solution which he then released within 3 weeks. This is definitely something we need more of.
So what are your thoughts? Is this the solution we’ve been waiting for?

Rails Snippets - 11/29

Holy Shmoly, Ruby 1.9 smokes Python away

Initial performance numbers would seem to indicate that Ruby 1.9 (due by Christmas) will be lots faster.

Quoted-Printable: My .irbrc

If you spend a lot of time in IRB (most of us probably do), it’s worth taking the time to learn how to customize it. This is a good start.

Faker

A nice, clean library for generating fake data. The home page says it’s a port of Perl’s Data::Faker library, which I’d never even heard of.