There's been quite a lot of chat on the LRUG list recently about various job scheduling systems and process managers for offlining expensive tasks. BackgrounDRb, Beanstalk, Starling, BackgroundJob and other similar solutions have been discussed. These systems can be useful, but most of the time they just add unnecessary complexity.

One instance where I feel these solutions are unnecessary is where you need to pull data from an external service in a way that's totally disconnected from the HTTP request-response cycle.

Say you want to pull the most recent article from this blog every 15 minutes and create a file that could then be served statically to your visitors. A naive implementation of that functionality would look something like this:

require 'net/http'
require 'uri'
require 'hpricot'

barking_iguana = URI.parse('') # the blog's front page URL goes here
loop do
  articles = Hpricot(Net::HTTP.get(barking_iguana))
  title = (articles / "div.article a[@rel=bookmark] text()").first
  link = (articles / "div.article a[@rel=bookmark]").first['href']

  # Of course, this should have a real file path in it.
  File.open("/.../.../.../barking_iguana.ssi", "w+") do |f|
    f.write("#{title}: #{link}")
  end

  sleep 900 # 15 minutes
end

Doesn't that look nice? No screwing around with complex tools to handle the scheduling - just run it and it'll go forever.
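In practice, "forever" is easier to achieve if a transient network hiccup doesn't kill the process. A minimal sketch of one way to do that, using a `with_retries` helper I've invented for illustration (the name, attempt count and delay are all my assumptions, not part of the original script):

```ruby
# Wrap a block so that transient failures are logged and retried
# a few times rather than killing the whole process.
def with_retries(attempts: 3, delay: 5)
  tries = 0
  begin
    yield
  rescue StandardError => e
    tries += 1
    if tries < attempts
      warn "attempt #{tries} failed: #{e.message}; retrying in #{delay}s"
      sleep delay
      retry
    else
      warn "giving up after #{tries} attempts: #{e.message}"
    end
  end
end
```

The body of the `loop do ... end` above could then be wrapped in `with_retries`, so a single failed HTTP request just logs a warning and tries again instead of raising out of the loop.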

"Ah," I hear you say, "but what if it crashes?" Well, in the unlikely event that such a simple script does crash I'd have something like God monitoring the processes so it would be restarted. You've got something monitoring your processes anyway (right?) so it should be pretty simple to add another process to that list.
