Backing Up Your Data With Fog

Fog, in case you haven’t heard of it, is a fantastic cloud computing library written in Ruby. It provides a unified interface to several popular cloud computing platforms (including Amazon, Rackspace, Linode, and others), making it easy to interact with them from Ruby. It currently supports four types of cloud services: storage, compute, DNS, and CDN. Fog has become very popular lately, and serves as the backbone for Chef’s cloud computing functionality, which is how I first became aware of it.

I recently used Fog to write a backup script in Ruby to automatically send encrypted database backups from a database server running at Rackspace to Amazon’s S3 storage service. Here’s how I did it.

Overview

My script runs as the second step in a process. The first step is a shell script that calls pg_dump to dump a PostgreSQL database and then encrypts the dump using GnuPG, dropping the resulting files in a backup directory on the database server.

My Fog-based script’s job is to make sure that all of the files in the backup directory get moved to S3.

Writing Files

Fogsync (my script) looks at all of the files in that directory and makes sure they all exist in a bucket on S3. If they don’t, it copies them up there. Additionally, it deletes old backups from S3. For this customer, we keep backups for 14 days, so anything older than that gets deleted.

Let’s look at how it works:

    require 'fog'

    # Connect to S3
    fog = Fog::Storage.new(
      :provider              => 'AWS',
      :aws_access_key_id     => MY_ACCESS_KEY,
      :aws_secret_access_key => MY_SECRET
    )

    # The S3 bucket where the backups live
    directory = fog.directories.get("MY_DIRECTORY")

    # Upload any backup file that isn't already in the bucket
    files = Dir["/var/backup/*.gpg"]
    files.each do |file|
      name = File.basename(file)
      unless directory.files.head(name)
        directory.files.create(:key => name, :body => File.open(file))
      end
    end

Here’s what this snippet does:

  1. Creates a connection to AWS. The syntax is basically the same for connecting to all of the cloud platforms; only the parameter names change.

  2. Uses ‘head’ to check whether the file exists and, optionally, to get some metadata about it (size, modification date, etc.). Think of this as the cloud equivalent of the Unix stat command. You don’t want to use the ‘get’ command here, as that returns the whole file, which can take a very long time if the files are large (cough *voice of experience* cough).

  3. Creates the file in the given directory (“bucket” in S3 terms) if it doesn’t exist already.

If you’ve used S3, you’ll notice that Fog uses slightly different terms for things than S3 does. Because Fog works across a number of different storage providers, it sticks to more general terms. This might be confusing at first if you’re familiar with a specific provider’s nomenclature, but the tradeoff is that if you want to move from one provider to another, the only thing you have to change is the code that sets up the connection (the call to Fog::Storage.new() in this example).
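
For example, pointing the same script at Rackspace Cloud Files should only require swapping out the connection. The credential parameter names below come from Fog’s Rackspace provider, so double-check them against the version of Fog you’re running:

    # Same script, different provider -- only the connection setup changes
    fog = Fog::Storage.new(
      :provider           => 'Rackspace',
      :rackspace_username => MY_USERNAME,
      :rackspace_api_key  => MY_API_KEY
    )

    directory = fog.directories.get("MY_DIRECTORY")  # a Cloud Files container this time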

Deleting Files

    oldest = Date.today - 14

    directory = fog.directories.get("MY_DIRECTORY")
    files = directory.files
    files.each do |f|
      # last_modified is part of the metadata S3 returns for each file
      file_date = Date.parse(f.last_modified.to_s)
      if file_date < oldest
        f.destroy
      end
    end

This is fairly straightforward as well. Get all the files in the directory and check their age, deleting the ones that are older than we want to keep.

So that, in a nutshell, is how to use Fog. This is a simplified example, of course; in my production code the parameters are all pulled from configuration files, the script emails a report of what it did, and there’s a lot more error handling.

If you do any scripting with cloud computing, you owe it to yourself to check out Fog.

Handling Incoming Email With Your Web Application

This morning I was looking for a way to handle incoming email in a web application (similar to the way Highrise and Evernote let you email things to a special email address and have them show up in their system). There are a number of ways to do this, such as using procmail or connecting to your mail server over POP or IMAP and reading messages, but I was looking for a way to do it without having to host my own email infrastructure. Ideally, I want something like Twilio: a service that receives the email and then does an HTTP POST to an endpoint of my choosing.
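
To make that concrete, here’s a rough sketch of the kind of endpoint I’d want to write, using Sinatra. The parameter names are made up for illustration; each service defines its own POST format, so check the documentation for whichever one you pick.

    require 'rubygems'
    require 'sinatra'

    # A hypothetical receiver for an email-to-HTTP service
    post '/incoming_email' do
      from    = params[:from]     # assumed field names -- the real ones
      subject = params[:subject]  # depend on the provider
      body    = params[:body]

      # Do something useful with the message: create a note, open a ticket, etc.
      puts "Received '#{subject}' from #{from}: #{body}"

      status 200
    end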

Here’s what I found.

CloudMailIn

Still in beta (and free while it is), this looks robust. It’s also available as a Heroku addon, if that’s how you roll.

(A tip of the hat to @peterc for pointing me to this one)

APInbox

Looks similar to CloudMailIn, though not in beta. There’s a free plan for up to 100 emails a day, and then it goes up from there. Their site was down when I first went to it this morning, which makes me a tad nervous, but that may well be an isolated thing.

SendGrid

SendGrid is a heavy hitter in the email space, mostly doing outbound delivery. They do, however, have a Parse API that seems to perform the same function as the other two services. I’m not sure about the pricing here; their basic plan is \$9.95 per month for 10,000 emails, but I wasn’t sure whether that includes incoming mail or not. UPDATE: I heard from SendGrid. Their plans cover both incoming and outgoing, so in the case of the \$9.95 plan, it could be a mix of both, up to 10,000 emails.

(thanks to Twilio’s @johnsheehan for the pointer to SendGrid)

I haven’t used any of these yet, so I can’t endorse one over the others, but I thought I’d post them here in case anyone else is looking for this kind of provider. If you have experience with any of these, please comment with your opinion.

The Week in Links - 12/4/2010

Full-Ack: an Emacs interface to Ack
Ack is a useful little app for searching source code. If you ever use grep for finding things in your code, switch to ack immediately - you won’t regret it. This is a handy front end to ack for Emacs users.

Information architecture: A How to
I’ve been learning about information architecture lately as it’s becoming increasingly important for my job. This is a good overview.

Hacker’s Guide To Tea
I need to drink more tea. This article taught me a lot I didn’t know about tea and its benefits.

Tasty Treats for PostgreSQL
A bunch of useful tools if you work with PostgreSQL, from the guys at OmniTI.

HTML5Rocks - Introducing Web Sockets: Bringing Sockets to the Web
An introduction to Web Sockets, which let you do lots of cool real time things with the web. One of many things I need to spend more time experimenting with.

The Week in Links - 11/11/2010

Things You Should Do Immediately After Launching a Website
Some of these are common sense, but there are quite a few non-obvious ones here. A good checklist.

Running Shells in Emacs: An Overview | Mastering Emacs
Working with shells in Emacs is very useful; I almost always have a small one running at the bottom of my window to run commands in. This explains the differences between the different kinds of shells in Emacs, how to use them, and how to change their settings.

Announcing Cloud Load Balancing Private Beta | Rackspace Cloud Computing & Hosting
Rackspace Cloud, where I host a ton of different servers for myself and for clients, has announced a beta of their load balancing service. Good load balancing is a pain to set up, so this is promising.

The 1140px CSS Grid System/Framework · Fluid down to mobile
Nice new CSS grid framework that handles multiple screen sizes with ease. It seems like a fundamental failing of CSS that we need all these frameworks to do really basic stuff like this though.

Dr Nic’s Making CI easier to do than not to with Hudson CI and Vagrant
I need to spend some time with Hudson. It’s an incredibly powerful “Continuous Integration” server, but it does a lot more as well. This article explains how to use it in conjunction with Vagrant to automatically set up your test environment.

How to Use Your Zoom Lens as a Compositional Aid
I’ve been learning photography over the last couple of years. This article did a better job than most of explaining the effects of using different kinds of zoom lenses. The pictures that accompany the article are worth 1000 words and then some.

Training Your Technical Staff When You Don't Have a Budget

When budgets get tight, it can be difficult to provide adequate training for your staff. Over the last couple of years, I’ve found some ways to provide some training even in the face of a shrinking (or non-existent) budget.

Regional conferences

If you still have some budget, just not as much as you’re accustomed to, look to smaller regional conferences as an alternative to the larger national ones in major cities. If you’re fortunate enough to live in a city where a conference is being held, you may not have to pay for travel at all. This past year, the excellent No Fluff Just Stuff conference made a stop in our town, and I was able to send two developers plus myself for a fraction of what it would have cost to send them away somewhere and pay airfare and hotel on top of the conference fee. I personally attended the Windy City Rails conference this year, which was a single day and cost only \$150. While the smaller conferences may not have all the speakers you’d get at a larger one, I’ve been really surprised at the quality of the speakers these conferences draw.

Books

My team has done this for the past year or so: I buy a copy of a book for each person, and we meet once a week to discuss a chapter at a time. Have people take turns leading the discussion. My experience has been that these sessions are most productive if you tackle a topic your team agrees is currently a pain point, so people can take what they learn and apply it to their current project. We’ve read through The Pragmatic Programmer and Pragmatic Unit Testing in Java, and we’re going to move on to Don’t Make Me Think next.

Hashrocket has actually taken this a step further and broadcast these live.

Videos

It’s become commonplace for conferences to record their talks and make them available online for free, and a number of larger user groups do the same. There’s a nearly endless supply of videos on a wide variety of topics available online. Pick a video (maybe two if they’re short), watch it as a group, and then discuss it afterwards.

Here are a few sources I like:

Peer to Peer

We’ve done this even before our training budget shrank. Have people take turns presenting on a relevant topic that they are passionate about. This works well on a few levels: those listening get to expand their knowledge, and those presenting will often develop a deeper understanding of their topic. If a presentation is used, post it somewhere so that people who join the company later can benefit.

So what have I missed? What do you do to keep your skills current when you can’t get money for training?

The Tools I Use

Inspired by Mike Gunderloy’s recent blog post, I decided to put together a list of the tools I use, both hardware and software.

I use a Mac at home and a Windows laptop at work; I plan to cover the Windows tools in a later post.

Hardware

  • MacBook Pro
    My primary computer is a late-2007 17” MacBook Pro with 2GB of RAM and a 160GB hard drive. I love this laptop, but I made two mistakes when buying it. First, I should have gone with the higher-resolution display. If you’re going to have a 17” laptop, you should have as many pixels as you can get. Second, I way undersized the hard drive. It was larger than the one in the laptop it replaced, but since I have the three most adorable kids ever, I take a lot more pictures and video than I did previously. This has quickly filled up the hard drive, to the point that I’ll need to replace it with a much larger one next year. To sum up: when buying a laptop, get the largest hard drive and the most pixels you can afford, unless you need ultra-portability.
  • 24-inch Dell Monitor
    Looks nice, and very affordable. Mine was refurbished. The picture quality isn’t bad, but it’s one of their very low-cost displays and is of lower quality than the rest of the line (I didn’t know this at the time). If I was a graphic designer or professional photographer I would probably care more. Since this primarily displays a code editor, a terminal window, and a web browser, I don’t really mind much.
  • Apple Extended Keyboard II
    These are the legendary Apple keyboards you’ve heard about, and the hype about them is true. I bought a couple of them gently used on eBay and then scrubbed them with a brush and some dish soap to clean them up. Paired with a Griffin ADB-to-USB converter, they work very well. I’m a sucker for the old-style keyboard action.
  • Mighty Mouse (bluetooth)
    A lot of people hate this mouse, but I don’t understand why. It’s solidly built, comfortable, and has that cool little ball on top. That said, I’m a keyboard junkie and avoid the mouse when I can.
  • Time Capsule
    This serves as both the wireless router for the house and the backup system for both laptops. I don’t have an off-site backup at the moment; I need to look into that.
  • Cambridge Soundworks Speakers
    I really wanted the Klipsch computer speakers, but they’re more than I want to spend. These sound good, and cost me only a little over \$100 refurbished.

Software

  • Emacs
    I learned programming with an IDE, but I learned to edit text with Emacs. I’ve been using it for 10 years or so now, and it would be difficult to switch. Every few years I have some brief dalliance with another editor (the last one was TextMate, when I bought my first Mac), but I always return to my first love. What Emacs lacks in style, it more than makes up for in substance. In one window I can edit code, run shell commands (or a shell, for that matter), edit files on remote servers, and much more. It’s endlessly scriptable and insanely powerful. The fact that it’s cross-platform helps as well. My Emacs configuration, which works the same (with a couple of minor exceptions) on all the platforms I use, is located on GitHub.
  • Safari
    I really want to like Firefox, but it’s just too slow. Safari is quick, stable, and includes all the features I want.
  • The Hit List
    Even though it seems to be the popular thing to do these days, I’m not on a continual quest to find The Ultimate Todo List App. I got The Hit List as part of a MacHeist software bundle, and it works well. I mean, seriously, what do you really need in a todo list application? I can make items, I can check them off. The rest is gravy.
  • Adium
    It’s not perfect, but I can talk to people on pretty much any IM network out there.
  • Tweetie
    I used Twitterrific for a couple of years, both on the iPod Touch and on OS X. Frankly, it was left to rot, with no updates for a very long time. When Tweetie for the iPhone came out, I bought it immediately, and after using it for 10 minutes I concluded that I wanted it for the desktop as well. I got my wish, and I’ve been happy ever since.
  • NetNewsWire
    I do not read as many feeds as I used to. I mean, I subscribe to a lot, I just don’t read them that often. My thoughts on why are here. When I do read feeds though, this is the app I do it in. I like that I can navigate everything with the keyboard and send things to Evernote and Instapaper easily.
  • 1Password
    I can’t remember all the passwords I create, or often even the account names (sometimes it’s a username, sometimes it’s an email address…). 1Password remembers them all for me and enters them for me automatically as well.
  • Evernote
    I use Evernote to track all the little bits of data I accumulate: code snippets, blog posts, how-tos, meeting notes, PDFs, presentations, etc. I like the fact that it syncs with other computers, and its search works very well. I hate the way it captures web pages, though; it destroys all the formatting. Yojimbo gets this right. Its PDF viewing isn’t all that great either.
  • Skype
    Skype works great if everyone is on a fast network pipe. It falls down spectacularly if anyone is on a slightly flaky network connection, like say a cell network connection. I use Skype mostly for after hours deployments (group voice call), and video chats with the Grandparents.
  • iWork
    I use Keynote for creating the occasional presentation and Pages for creating things that require more formatting than a text document. Numbers is the coolest application I almost never use.
  • Terminal
    I always have a terminal window open. Always. Usually more than one.
  • Pixelmator
    Photoshop is awesome, but it is both expensive and far more complicated than I want. I am not an image editing guru, I really just need basic capabilities. Pixelmator provides that - it’s Photoshop for mere mortals.

Online

  • Gmail
    It’s got an amazing spam filter and supports IMAP and POP out of the box (Yahoo still charges for this, for reasons I can’t comprehend). I use the online client almost exclusively.
  • Google Docs
    I love being able to create spreadsheets and easily share them with my better half. It does basically everything I need a spreadsheet app to do, and it does it well.
  • Pivotal Tracker
    Oh, how I love Pivotal Tracker. It’s a simple but powerful project management application that lets you keep track of features, bugs, and chores. I keep all my side projects in here, under a separate account from the one I use at work. Any application I’ve built is in here as a separate project (this blog, for example). Any time I discover a bug, or think of a feature I want to add, I can throw it in here under the appropriate project and it will be waiting for me when I have time to work on it. It’s nearly perfect.
  • Instapaper
    I read. A lot. Instapaper lets me capture things that I want to read later, and conveniently strips out all of the formatting for me. The iPhone application is great as well, I can carry reading material with me anywhere.

Hosting

  • VPSLink
    I’ve been hosted with these guys for a couple of years. Fast VPS servers and great uptime.

So that’s what I use to do what I do. If you’ve done a similar list, add a comment below with a link. Or if you have a recommendation for something to replace one of my tools, I’m always looking for cool new tools.

Generating Realistic Test Data With Ruby

Generating semi-realistic test data for an application can be a pain. If the data already exists, as in the case of an upgrade to an existing system, you can generally create data based on the existing database. But what if you need a large sample of data for a brand new system? If you have simple data requirements, there are some Ruby gems that can help you out. Faker is one such gem, which lets you generate realistic names, addresses and phone numbers. But what do you do for things that are a little less typical? Things like scores, ratings, ages, dates, etc. I needed to do this recently for a prototype I built of a system to generate letters. Here’s the Rake task I ended up with:
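
It looked something like this (the Letter model and its column names below are stand-ins, not the actual production schema):

    # lib/tasks/sample_data.rake
    # The Letter model and its columns are stand-ins for the real schema.
    namespace :db do
      desc "Generate semi-realistic sample data"
      task :sample_data => :environment do
        Letter.delete_all   # clear out any existing sample data

        # Pools of acceptable values to draw from
        volumes        = (8000..100000).to_a
        response_rates = (1..15).to_a
        variances      = (-5..5).to_a
        clients        = ["Acme Corp", "Initech", "Globex"]

        1000.times do
          Letter.create!(
            :client        => clients.rand,          # Array#rand comes from ActiveSupport
            :volume        => volumes.rand,
            :response_rate => response_rates.rand,
            :variance      => variances.rand,
            :mailed_on     => Date.today - rand(365)
          )
        end
      end
    end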

This script adds 1000 records to my database that are representative of what real production data would look like. The quantity of data is obviously easily adjusted up or down as needed.

This is just a standard Rake task that you can drop inside lib/tasks. Most of it is fairly standard Ruby code and not very interesting, but let’s look closer at what makes it work.

The first portion of the script does some setup work, deleting any existing data. Then it sets up a series of arrays holding the values that will be used for individual fields. For example, the volumes variable:

volumes = (8000..100000).to_a

This creates an array of integers containing every number between 8000 and 100000. Response rates and variances are set up similarly, as are the client names.

In the loop that generates the actual data, we then call the rand() function on these arrays to select a value from each range. This function isn’t a standard part of Ruby’s Array class; it’s added by ActiveSupport.
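
For example, inside a Rails Rake task (where ActiveSupport is already loaded):

    volumes = (8000..100000).to_a
    volumes.rand   # => picks one element at random, e.g. 42317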

Using this method makes it very easy to generate test data within predefined acceptable ranges.

For another take on this topic, see the EdgeCase blog.

A Collection of Great Tools for the Ruby Developer

I’ve been a bit heads-down lately, working on a super-secret project in Ruby. More on that in the near future, but in the meantime I wanted to share about a few things that I’ve started using.

Shoulda

When I started my new project, I wanted to try one of the new testing frameworks for Ruby. The problem is there are a number to choose from. What to do…

I settled on Shoulda. I wish I could tell you that this was a rigorous process, that I evaluated each framework carefully, learning about each one’s strengths and weaknesses. I did not; I cheated. You see, a while back, Josh Susser did just that. He called it The Great Test Framework Dance-off. He settled on Shoulda, so that’s what I went with.

Shoulda is developed by Tammer Saleh of Thoughtbot, who have a number of other really nice projects. Shoulda’s tagline is “Making Tests Easy on the Fingers and Eyes”, and it lives up to that goal. It has a very nice syntax for developing tests, including a complete set of macros for testing controllers and models. It’s a joy to use. Here’s roughly what it looks like (the snippets below follow the style of the examples in the Shoulda README):
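
First, the context/setup/should structure, which reads almost like English. The User model and the assertions here are placeholders, not anything from a real app:

    class UserTest < ActiveSupport::TestCase
      context "a newly built user" do
        setup do
          @user = User.new(:email => "bob@example.com")
        end

        should "not be valid without a password" do
          assert !@user.valid?
        end

        should "keep the email it was given" do
          assert_equal "bob@example.com", @user.email
        end
      end
    end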

Nice, right?

Here’s a sample of the ActiveRecord macros in action:
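
Macro names have shifted a bit between Shoulda releases, so treat these as representative rather than exact; the Post model is made up:

    class PostTest < ActiveSupport::TestCase
      # One line per test -- Shoulda generates the boilerplate for you
      should_belong_to :user
      should_have_many :comments
      should_validate_presence_of :title, :body
    end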

Beautiful.

So what’s the big deal? Well, it’s easier to read for one. Instead of horrendous method names like test_should_do_this_but_not_that, you get to write English: should “do this but not that”. The macros in Shoulda also let you test your models and controllers easily.

Pivotal Tracker

Pivotal Tracker is an Agile project management tool, developed by the folks at Pivotal Labs. It lets you create projects and track releases, stories, and defects. The beauty of Tracker is its all-on-one-screen user interface. It lets you see everything at a glance, and even provides keyboard shortcuts for common tasks. I’m not alone in my admiration of Tracker; it seems to be extremely popular among Rails consulting shops (Hashrocket, for one).

While Tracker is powerful enough to be used for large multi-developer projects, it also happens to be perfect for managing your side projects. Enter the features you want, organize them into releases, and just click start to begin the first one. Click finish when you’re done, and move on to the next one. Easy peasy. Did I mention it’s free?

Be sure to check out the screencast, which gives a nice overview of the application.

HTTParty

John Nunemaker is a prolific Ruby and Rails developer, as a quick glance at his GitHub page will attest. One of his most recent projects is HTTParty, which makes it dead simple to consume REST APIs from Ruby. Here’s what it looks like:
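
This is just an illustrative wrapper (the API host and path are made up), but the include HTTParty / base_uri pattern is the heart of it:

    require 'rubygems'
    require 'httparty'

    # Hypothetical client for a JSON web service
    class StatusApi
      include HTTParty
      base_uri 'api.example.com'

      def self.recent
        get('/statuses/recent.json')   # returns parsed Ruby hashes/arrays
      end
    end

    puts StatusApi.recent.inspect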

HTTParty automatically detects whether the response is JSON or XML and parses it appropriately. It really doesn’t get much easier than that. There’s also a nice command-line app bundled with the gem that lets you call RESTful web services easily from the command-line, with a few more bells and whistles than curl.

Sinatra

Sinatra is a great, compact web framework similar in concept to Why the Lucky Stiff’s Camping framework. It makes it trivial to create a web application in just a few lines of code. It was originally written by Blake Mizerany to make it easy to build lightweight web services, but it has since become quite popular as a web framework to use when Rails might be overkill.
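
The canonical hello-world gives you a feel for just how small a Sinatra app can be:

    require 'rubygems'
    require 'sinatra'

    # Respond to GET / with a plain-text greeting
    get '/' do
      "Hello, world!"
    end

Save that as hello.rb, run it with ruby hello.rb, and you have a web app listening on port 4567.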

It’s easy to create simple test applications for libraries, but also robust enough to create full-blown websites with. Check out the Sinatra website and the Sinatra book for more details.

What tools have you discovered lately?

The Programmable Government

We are headed toward a time where the workings of government are much more visible to the American public.

Through things like the Freedom of Information Act, this information has technically been available for some time - but not in a form that is easily consumed. This is starting to change.

The emergence of open APIs that provide access to information about how the government is operating is a massive step in the right direction. It will, I hope, bring forth a new wave of websites that mine the data that these web services provide, and expose it to the world. Voting records, government expenditures, bids, and bill details all need to be made available for anyone to consume.

Here are a few of the APIs that I have come across. Some are available from the government themselves, others are from third parties.

The New York Times Congress API

This API, part of a growing set of APIs from the Times, lets you program Congress. Well, not exactly:

The initial release exposes four types of data: a list of members for a given Congress and chamber, details of a specific roll-call vote, biographical and role information about a specific member of Congress, and a member’s most recent positions on roll-call votes.

Our database contains House votes since 1991 and Senate votes since 1989. House members are from 1983 and Senate members date back to 1947.

Follow the Money

This API provides information about campaign contributions for state-level campaigns.

Sunlight Labs

The Sunlight Labs API provides methods for obtaining basic information on Members of Congress, legislator IDs used by various websites, and lookups between places and the politicians that represent them. The primary purpose of the API is to facilitate mashups involving politicians and the various other APIs that are out there.

Capital Words

This site, another project of the Sunlight Foundation, tracks word frequency in the Congressional Record. Most frequently used word: Proposed.

USASpending.gov

One of the few official government APIs, this allows you to find out where your money goes:

Have you ever wanted to find more information on government spending? Have you ever wondered where federal contracting dollars and grant awards go? Or perhaps you would just like to know, as a citizen, what the government is really doing with your money. The Federal Funding Accountability and Transparency Act of 2006 (Transparency Act) requires a single searchable website, accessible by the public for free that includes for each Federal award:
1. The name of the entity receiving the award;
2. The amount of the award;
3. Information on the award including transaction type, funding agency, etc;
4. The location of the entity receiving the award;
5. A unique identifier of the entity receiving the award.

The GOP API

The Republicans are ahead of the Democrats on this one, but I doubt we’ll have to wait too long to see them follow suit. This site is similar in concept to the Times’ Congress API mentioned above (though obviously only for the Republican members), but provides some additional information that the Times does not.

Conclusion

This is only a sampling of the APIs that are available, and hopefully this is only the beginning. I’m optimistic that both the government and third parties will provide more of these, and that groups will make use of this information to expose the inner workings of the government to the people who elect them.

Are there any APIs that I missed? Are there any sites that are using this information in interesting ways? What do you want the government to make easily consumable that it isn’t today?

A Brief Introduction to the Arduino

Photo: an Arduino board (http://www.flickr.com/photos/remkovandokkum/2667608562/in/set-72157606159601535/)

For Christmas, I got an Arduino. Well, really I got two coffee pots. Identical ones. So I returned one of them to Amazon, and used the refund to buy an Arduino starter kit. It’s a neat device, with a ton of potential. Here’s why.

Ok, so what is it?

The Arduino is an open, hackable microcontroller, designed to be easy to program and easy to build things with. Simply put: the ultimate hacker toy.

For about \$40 (or less, if you want to buy all the parts and build it yourself), you can have a device that you can program from any computer with a USB port, and that is capable of interfacing with the outside world. It doesn’t require any special training in electronics, and is ideal for experimentation. You can add an amazing array of sensors and add-on boards to allow you to do just about anything you can imagine, from reading the temperature to getting GPS coordinates.

Did I mention you can program it in Ruby?

What can you do with it?

Pretty much anything you want. You can start by making an LED blink - this is the hardware equivalent to “Hello, world”. Beyond that, the basic board comes with an array of inputs and outputs that you can connect up to all sorts of things: temperature and light sensors, motors, GPS modules. You name it, you can build it.

Here’s a quick rundown of a few things people have used these for.

This is only a fraction of what’s out there. An impressive community has sprung up around these little guys, and there is no shortage of cool projects documented on the web.

If you want to see the Arduino in action, check out Greg Borenstein’s presentation from RubyConf on programming the Arduino with Ruby, in which he demos an Arduino-based drum machine (literally, a machine that plays a drum with chopsticks) as well as a board that uses windshield washer fluid pumps to mix screwdrivers. It’s one of the most entertaining talks I’ve seen.

Summary

In summary, if you’ve ever wanted to play with hardware, the Arduino is the place to start. It’s inexpensive, easy to use, and endlessly customizable. I’ve had mine a week and it’s been great fun so far.

  • liquidware open source electronics A provider of Arduino boards and addon boards
  • Tutorials A collection of tutorials from the official Arduino site
  • RAD - Ruby Arduino Development A tool that lets you build Arduino apps using Ruby
  • Adafruit Industries Another provider of Arduino boards as well as other electronic paraphernalia.
  • LadyAda This site is run by Limor, proprietor of Adafruit Industries, and contains a lot of tutorials on the Arduino and electronics in general.
  • SparkFun Provider of Arduino boards plus an array of other kits and projects.
  • Arduino Starter Kit This is the kit I bought. It includes everything you need to get started - even a USB cable.

Do you have an Arduino? Built anything cool with it? If so, share in the comments.

Photo by Remko Van Dokkum - Some Rights Reserved