Backing Up Your Data With Fog

Fog, in case you haven’t heard of it, is a fantastic cloud computing library written in Ruby. It provides a unified interface to several popular cloud computing platforms (including Amazon, Rackspace, Linode, and others), making it easy to interact with them from Ruby. It currently supports four types of cloud services: storage, compute, DNS, and CDN. Fog has become very popular lately, and serves as the backbone for Chef’s cloud computing functionality, which is how I first became aware of it.

I recently used Fog to write a backup script in Ruby to automatically send encrypted database backups from a database server running at Rackspace to Amazon’s S3 storage service. Here’s how I did it.

Overview

My script runs as the second step in a process. The first step is a shell script that calls pg_dump to dump a PostgreSQL database, encrypts the dump using GnuPG, and drops the encrypted file in a backup directory on the database server.

My Fog-based script’s job is to make sure that all of the files in the backup directory get moved to S3.

Writing Files

Fogsync (my script) looks at all of the files in that directory and makes sure that they all exist in a bucket on S3. If they don’t, it copies them up there. Additionally, it deletes old backups from S3. For this customer, we keep backups for 14 days, so all backups older than that get deleted.

Let’s look at how it works:

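require 'fog'

# Connect to S3; MY_ACCESS_KEY and MY_SECRET hold the AWS credentials.
fog = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => MY_ACCESS_KEY,
  :aws_secret_access_key => MY_SECRET
)
# Fetch the directory (the S3 bucket) that holds the backups.
directory = fog.directories.get(MY_DIRECTORY)

Dir["/var/backup/*.gpg"].each do |file|
  name = File.basename(file)
  # head fetches only metadata, so the check is cheap even for large files.
  unless directory.files.head(name)
    directory.files.create(:key => name, :body => File.open(file))
  end
end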

Here’s what this snippet does:

  1. Creates a connection to AWS. The syntax is basically the same for every cloud platform; only the parameter names change.

  2. Uses ‘head’ to check if the file exists and, optionally, get some metadata about it (size, modification date, etc.). Think of this as the cloud equivalent of the Unix stat command. You don’t want to use the ‘get’ command, as that will return the whole file, which would take a very long time if the files are large (*cough* voice of experience *cough*).

  3. Creates the file in the given directory (“bucket” in S3 terms) if it doesn’t exist already.

If you’ve used S3, you’ll notice that Fog uses slightly different terms for things than S3 does. Because Fog works across a number of different storage providers, it uses more general terms. This might be confusing at first if you’re familiar with a specific provider’s nomenclature, but the tradeoff is that if you want to move from one provider to another, the only thing you have to change is the code that sets up the connection (the call to Fog::Storage.new() in this example).
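To make that concrete, here is a minimal sketch that builds only the options hash for the connection. The helper name storage_options and the placeholder credentials are hypothetical and not part of Fogsync; the Rackspace keys shown are the ones I understand Fog's Rackspace storage provider to expect, so check them against the Fog documentation before relying on them.

```ruby
# Hypothetical helper: returns the options hash to hand to Fog::Storage.new.
# All credential values below are placeholders.
def storage_options(provider)
  case provider
  when :aws
    { :provider => 'AWS',
      :aws_access_key_id => 'MY_ACCESS_KEY',
      :aws_secret_access_key => 'MY_SECRET' }
  when :rackspace
    { :provider => 'Rackspace',
      :rackspace_username => 'MY_USERNAME',
      :rackspace_api_key => 'MY_API_KEY' }
  else
    raise ArgumentError, "unknown provider: #{provider}"
  end
end

# With the fog gem loaded, the rest of the script would stay the same:
#   fog = Fog::Storage.new(storage_options(:rackspace))
#   directory = fog.directories.get('MY_DIRECTORY')
```

Keeping the provider-specific part in one small method like this means the upload and cleanup loops never need to know which provider they are talking to.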

Deleting Files

require 'date'

# Anything last modified more than 14 days ago gets deleted.
oldest = Date.today - 14
directory = fog.directories.get(MY_DIRECTORY)
directory.files.each do |f|
  file_date = Date.parse(f.last_modified.to_s)
  f.destroy if file_date < oldest
end

This is fairly straightforward as well. Get all the files in the directory and check their age, deleting the ones that are older than we want to keep.
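If you want to check the retention rule without touching S3, the age test can be pulled into a small function and run against stand-in objects. This is only a sketch: RemoteFile, expired_files, the file names, and the dates are all made up for illustration, and the only assumption about Fog is that its file objects respond to key and last_modified, as in the snippet above.

```ruby
require 'date'

# Stand-in for a remote file object; sample data only.
RemoteFile = Struct.new(:key, :last_modified)

# Returns the files last modified more than keep_days before today.
def expired_files(files, today, keep_days)
  cutoff = today - keep_days
  files.select { |f| f.last_modified.to_date < cutoff }
end

today = Date.new(2011, 6, 30)
files = [
  RemoteFile.new('db-old.sql.gpg', Time.utc(2011, 6, 1)),
  RemoteFile.new('db-new.sql.gpg', Time.utc(2011, 6, 29))
]

expired = expired_files(files, today, 14)
puts expired.map(&:key).inspect
```

Passing today in as an argument, rather than calling Date.today inside the function, keeps the check repeatable.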

So that, in a nutshell, is how to use Fog. This is a simplified example, of course. In my production code the parameters are all pulled from configuration files, the script emails a report of what it did, and there is a lot more error handling.

If you do any scripting with cloud computing, you owe it to yourself to check out Fog.

Generating Realistic Test Data With Ruby

Generating semi-realistic test data for an application can be a pain. If the data already exists, as in the case of an upgrade to an existing system, you can generally create data based on the existing database. But what if you need a large sample of data for a brand new system? If you have simple data requirements, there are some Ruby gems that can help you out. Faker is one such gem, which lets you generate realistic names, addresses and phone numbers. But what do you do for things that are a little less typical? Things like scores, ratings, ages, dates, etc. I needed to do this recently for a prototype I built of a system to generate letters. Here’s the Rake task I ended up with:

This script adds 1000 records to my database that are representative of what real production data would look like. The quantity of data is obviously easily adjusted up or down as needed.

This is just a standard Rake task that you can drop inside lib/tasks. Most of this is fairly standard Ruby code and not very interesting, but let’s look closer at what makes this work.

The first portion of the script does some setup work, deleting existing data. Then it sets up a series of arrays for the values that will be used for individual fields. For example the volumes variable:

volumes = (8000..100000).to_a

This creates an array of integers containing every number between 8000 and 100000. Response rates and variances are set up similarly, as are the client names.

In the loop that generates the actual data, we then call the rand() function on these arrays to select a value from our range. This function isn’t a standard part of the Ruby Array class; it’s actually added to the class by ActiveSupport.

Using this method makes it very easy to generate test data within predefined acceptable ranges.
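The same idea works without ActiveSupport on Ruby 1.9 and later, where Array#sample is part of the core library. Here is a minimal sketch of that variant. Only the volumes range comes from the task described above; the other value lists, the field names, and the client names are invented for illustration, and the records are plain hashes rather than database rows.

```ruby
# Candidate values per field. Only the volumes range comes from the post;
# the remaining lists are made up for this sketch.
volumes        = (8000..100000).to_a
response_rates = (1..10).to_a
clients        = ['Acme', 'Globex', 'Initech']

# Build a handful of records as plain hashes; a real task would create
# database rows here instead.
records = Array.new(5) do
  { :client => clients.sample,
    :volume => volumes.sample,
    :response_rate => response_rates.sample }
end

records.each { |r| puts r.inspect }
```

For a large numeric range you can skip building the array entirely: rand(8000..100000) picks an integer from the range directly on Ruby 1.9.3 and later.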

For another take on this topic, see the EdgeCase blog.

A Collection of Great Tools for the Ruby Developer

I’ve been a bit heads-down lately, working on a super-secret project in Ruby. More on that in the near future, but in the meantime I wanted to share about a few things that I’ve started using.

Shoulda

When I started my new project, I wanted to try one of the new testing frameworks for Ruby. The problem is there are a number to choose from. What to do…

I settled on Shoulda. I wish I could tell you that this was a rigorous process, that I evaluated each framework carefully, learning about each one’s strengths and weaknesses. I did not. I cheated. You see, a while back, Josh Susser did just that. He called it The Great Test Framework Dance-off. He settled on Shoulda, so that’s what I went with.

Shoulda is developed by Tammer Saleh of ThoughtBot, who have a number of other really nice projects. Shoulda’s tagline is “Making Tests Easy on the Fingers and Eyes”, and it lives up to that goal. It has a very nice syntax for developing tests, including a complete set of macros for testing controllers and models. It’s a joy to use. Here’s what it looks like (both samples taken from the Shoulda README):

Nice, right?

Here’s a sample of the ActiveRecord macros in action:

Beautiful.

So what’s the big deal? Well, it’s easier to read for one. Instead of horrendous method names like test_should_do_this_but_not_that, you get to write English: should “do this but not that”. The macros in Shoulda also let you test your models and controllers easily.

Pivotal Tracker

Pivotal Tracker is an Agile project management tool, developed by the folks at Pivotal Labs. It lets you create projects and track releases, stories, and defects. The beauty of Tracker is its all-on-one-screen user interface. It lets you see everything at a glance, and even provides keyboard shortcuts for common tasks. I’m not alone in my admiration of Tracker; it seems to be extremely popular among the Rails consulting shops (Hashrocket, for one).

While Tracker is powerful enough to be used for large multi-developer projects, it also happens to be perfect for managing your side projects. Enter the features you want, organize them into releases, and just click start to begin the first one. Click finish when you’re done, and move on to the next one. Easy peasy. Did I mention it’s free?

Be sure to check out the screencast, which gives a nice overview of the application.

HTTParty

John Nunemaker is a prolific Ruby and Rails developer, as witnessed by a quick glance at his GitHub page. One of his most recent projects is HTTParty, which makes it dead-simple to consume REST APIs using Ruby. Here’s what it looks like:

HTTParty automatically detects whether the response is JSON or XML and parses it appropriately. It really doesn’t get much easier than that. There’s also a nice command-line app bundled with the gem that lets you call RESTful web services easily from the command-line, with a few more bells and whistles than curl.

Sinatra

Sinatra is a great, compact web framework similar in concept to Why the Lucky Stiff’s Camping framework. It makes it trivial to create a web application in just a few lines of code. It was originally written by Blake Mizerany to allow for creating lightweight web services, but has since become quite popular as a web framework to use when Rails might be overkill.

It makes it easy to create simple test applications for libraries, but it’s also robust enough to build full-blown websites with. Check out the Sinatra website and the Sinatra book for more details.

What tools have you discovered lately?

Is SwitchPipe the Solution for Rails Shared Hosting?

Peter Cooper (whom I interviewed recently) has just announced SwitchPipe, which aims to make deploying and hosting Rails (and other frameworks, such as Django) applications easy. From the site:

Introduction / Overview
SwitchPipe is a proof of concept “Web application server” developed in Ruby. More accurately, it’s a Web application process manager and request dispatcher / proxy. Backend HTTP-speaking applications (Web applications) do not run directly within SwitchPipe, but are loaded into their own processes making SwitchPipe language and framework agnostic.
SwitchPipe takes control of, and manages, the backend application processes, including loading and proxying to multiple instances of each application in a round-robin style configuration. As an administrator, you can define the maximum number of backend processes to run for each app, along with other settings so that you do not exceeded preferred resource limits. SwitchPipe quickly removes processes that “break” or otherwise outlive their welcome. For example, you can let SwitchPipe kill any backend processes that have not been accessed for, say, 20 seconds. This makes hosting many multiple Rails applications, for example, a quick and non-memory demanding process, ideal for shared hosting environments.

SwitchPipe’s goal is to be:

* super easy to configure
* the easiest way to deploy multiple HTTP-talking backend applications
* painless in terms of management; no hand-holding of different applications is needed
* a permanent daemon that can handle configuration changes in backend apps “on the fly”
* a reliable solution on Linux and OS/X (and anything POSIX compatible, ideally)

I haven’t spent much time with SwitchPipe yet, but if it lives up to Peter’s claims this will dramatically simplify hosting Rails/Django/Camping/whatever applications.
What’s interesting to note is that this originated with Peter’s widely read article on why such a thing was needed. Unlike a lot of other people who have complained loudly about the state of Rails on shared hosting environments, Peter put his time and talents towards creating a solution which he then released within 3 weeks. This is definitely something we need more of.
So what are your thoughts? Is this the solution we’ve been waiting for?

Rails Snippets - 11/29

Holy Shmoly, Ruby 1.9 smokes Python away

Initial performance numbers would seem to indicate that Ruby 1.9 (due by Christmas) will be lots faster.

Quoted-Printable: My .irbrc

If you spend a lot of time in IRB (most of us probably do), it’s worth taking the time to learn how to customize it. This is a good start.

Faker

Nice clean library to generate fake data. The home page says it’s a port of Perl’s Data::Faker library, which I’d never even heard of.

Rails Snippets - 11/13

RESTFul OpenID Authentication

A plugin to do OpenID authentication in Rails, in a RESTful way.

Off the Rails - An alternative Rails stack

Competition is good. Merb and the like provide that competition to Rails. This article runs through an alternative to the Rails stack. It’s always good to keep an eye on what else is out there.

Rands in Repose: The Nerd Handbook

Ok, this is a bonus link. Not at all Rails related, but relevant to you if you’re reading this. Rands nails the Nerd. I mean, really nails it.

Rails Snippets - 11/7

Troubleshooting Ruby Processes: Leveraging System Tools when the Usual Ruby Tricks Stop Working

A new book from O’Reilly on troubleshooting Ruby (and Rails) apps. From the overview:

This short cut introduces key system diagnostic tools to Ruby developers creating and deploying web applications. When programmers develop a Ruby application they commonly experience complex problems which require some understanding of the underlying operating system to be solved. Difficult to diagnose, these problems can make the difference between a project’s failure or success. This short cut demonstrates how to leverage system tools available on Mac OS X, Linux, Solaris, BSD or any other Unix flavor. You will learn how to leverage the raw power of tools such as lsof, strace or gdb to resolve problems that are difficult to diagnose with the standard Ruby development tools. You will also find concrete examples that illustrate how these tools solve real-life problems in Ruby development. This expertise will prove especially relevant during the deployment phase of your application. In this way, should your production Mongrel cluster freeze and stop serving HTTP requests, it will not take you 2 days to figure out why!

Sitepoint: Preparing for Rails 2.0

A nice, if a bit short, article on some of the changes that are coming in Rails 2.0. This is focused on what you will need to change in your application.

Creating a Ruby Weblog in 10 Minutes

This is a beginner tutorial, specific to using NetBeans 6.0. I’ve not played much with the Rails support in NetBeans, but it looks impressive so far.

Rails Snippets - 10/31

The Halloween Edition

Obvious Code: Creating a simple news publishing system in Rails 2.0

One of the first tutorials I’ve seen that focuses on Rails 2.0.

Deploy a Ruby on Rails app on EC2 in five minutes

This would seem to make deploying a Rails app on Amazon’s EC2 very simple:

EC2 on Rails is an Ubuntu Linux server image for Amazon’s EC2 hosting service that’s ready to run a standard Ruby on Rails application with little or no customization. It’s a Ruby on Rails virtual appliance. If you have an EC2 account and a public keypair you’re five minutes away from deploying your Rails app.

Rails Snippets

A collection of Rails links

Using Paypal with Rails

This is a nice step-by-step article on integrating PayPal with your Rails application, using ActiveMerchant.

Rails 2.0 Features: Multiple Views

I’ve only skimmed over the new features in the upcoming 2.0 release of Rails, but this looks like one of the nicest features. This is a good explanation of how it works and why it’s useful.

Mongrel 1.0.3 is out

A bugfix release of Mongrel is out. Looks like 1.1 is due soon, and it looks interesting:

“Mongrel 1.1 is coming real soon now with JRuby support and a few other things.”

Emacs on Rails

Being a bit of an Emacs junkie, I’m not sure how I missed this. It looks mature, very functional, and almost TextMate-like. The link has a nice Flash video of Emacs on Rails in action.

Free Rails book from Sitepoint

Sitepoint’s book “Build Your Own Ruby on Rails Web Applications” is now free, at least for the next month. I’ve only skimmed it, but it looks like a decent introduction, and the price is certainly right.