Scaling: Using MogileFS for Storing Uploaded Images
As you might have guessed from several of my previous posts, the team I've been working in has recently been scaling an application. I've learned a bunch of things along the way, and I've got half-written articles about several of them that I'll totally finish one day.
One of the most useful technologies I've started using is [MogileFS](http://www.danga.com/mogilefs/), a distributed BLOB store. In our application we use it to store user-generated assets like uploaded images and syndication feeds. Rather than go into the pros and cons here, I'd like to share some code that's been genuinely useful: a `MogileFilesystemBackend` for [AttachmentFu](http://github.com/technoweenie/attachment_fu/tree/master).
Why do you need a shared filestore for uploads? Once your application cluster scales beyond a single box, uploaded images land on different disks depending on which server handled the request. Without a shared store, there's no guarantee a particular image will be available to a subsequent request that hits a different server.
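To make the problem concrete, here is a toy simulation. It is a sketch only: the `AppServer` class and its plain-Hash "disks" are hypothetical stand-ins, not part of the application or of MogileFS.

```ruby
# Hypothetical simulation: each app server has its own "disk" (a Hash),
# unless it is handed a shared store to use instead.
class AppServer
  def initialize(shared_store = nil)
    @local_disk = {}
    @shared_store = shared_store
  end

  # Uses the shared store when one is configured, otherwise the local disk.
  def store
    @shared_store || @local_disk
  end

  def upload(name, data)
    store[name] = data
  end

  def serve(name)
    store.fetch(name) { :not_found }
  end
end

# Without a shared store: the upload hits one server, the next request another.
first, second = AppServer.new, AppServer.new
first.upload("cat.png", "bytes")
puts "local disks: #{second.serve("cat.png").inspect}"

# With a shared store: both servers read and write the same place.
shared = {}
third, fourth = AppServer.new(shared), AppServer.new(shared)
third.upload("cat.png", "bytes")
puts "shared store: #{fourth.serve("cat.png").inspect}"
```

The second server only finds the upload in the shared case, which is the gap a distributed store like MogileFS fills.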
#### Getting stuck in
I've done some admittedly ugly preparation here and monkey-patched `Kernel` to provide a module-level accessor called `filestore` (via ActiveSupport's `mattr_accessor`). It holds an instance of `MogileFS::MogileFS` from the excellent [MogileFS client](http://seattlerb.rubyforge.org/mogilefs-client/) by the folks at [Seattle RB](http://seattlerb.rubyforge.org/). The patch, which will probably make experienced Rubyists wince, looks like this:
```ruby
module Kernel
  # Oh noes, I'm screwing with Kernel.
  #
  mattr_accessor :filestore
end
```
During Rails initialisation, the filestore is set up using configuration values pulled from a YAML file in `RAILS_ROOT/config/`:
```ruby
Kernel.filestore = MogileFS::MogileFS.new(
  :domain => "APPNAME-#{RAILS_ENV}",
  :hosts  => array_of_hosts_from_yaml_file
)
```
(What I actually do is quite a bit different from this because I've done evil things to the MogileFS client library, which I'll probably share in the future. For now, believe the magic.)
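As a rough illustration of the configuration step, the sketch below builds the `:domain` and `:hosts` options from YAML. The YAML layout, the `hosts` key, and the `filestore_options` helper are assumptions made for this example, not the real setup, and the YAML is inlined so the snippet runs on its own.

```ruby
require "yaml"

# Hypothetical config contents; in a real app this would live in a file
# under RAILS_ROOT/config/. The layout and the `hosts` key are assumptions.
EXAMPLE_CONFIG = <<~CONFIG
  development:
    hosts:
      - "127.0.0.1:6001"
  production:
    hosts:
      - "10.0.0.10:6001"
      - "10.0.0.11:6001"
CONFIG

# Builds the option hash that the post passes to MogileFS::MogileFS.new.
def filestore_options(yaml_text, app_name, env)
  config = YAML.load(yaml_text).fetch(env) do
    raise ArgumentError, "no MogileFS config for #{env}"
  end
  { :domain => "#{app_name}-#{env}", :hosts => Array(config["hosts"]) }
end

puts filestore_options(EXAMPLE_CONFIG, "APPNAME", "production").inspect
```

Failing loudly on a missing environment is a deliberate choice here: a filestore with no hosts is harder to debug later than an error at boot.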
With the setup complete, getting AttachmentFu to work with MogileFS is straightforward:
```ruby
class Image < ActiveRecord::Base
  has_attachment :content_type => :image,
                 :storage      => :mogile_filesystem,
                 :max_size     => 5.megabytes,
                 :thumbnails   => {
                   :canonical => '1024x'
                 },
                 :processor    => "MiniMagick"
  validates_as_attachment
end
```
#### The backend
Without the actual backend code, none of the above does anything. The implementation was heavily influenced by the existing Amazon S3 backend, since the concepts behind S3 and MogileFS are quite similar:
```ruby
module MogileFilesystemBackend
  # Key under which the file is stored, e.g. "image:42:original".
  def full_filename(thumbnail = nil)
    "#{class_prefix}:#{filestore_tag(thumbnail)}"
  end

  def filestore_tag(thumbnail = nil)
    "#{parent_id || id}:#{thumbnail || :original}"
  end

  def current_content
    temp_path ? File.read(temp_path) : temp_data
  end

  def public_filename(thumbnail = nil)
    [
      editorial_object_type.demodulize.tableize,
      editorial_object_id,
      "#{class_prefix}.#{file_extension}#{thumbnail && "?size=#{thumbnail}"}"
    ].join("/")
  end

  def file_extension
    Mime::Type.lookup(content_type).to_sym
  end

  def filestore_paths(thumbnail = nil)
    filestore.get_paths(full_filename(thumbnail))
  end

  def file_data(thumbnail = nil)
    filestore.get_file_data(full_filename(thumbnail))
  end

  protected

  def current_content_location
    temp_path ? :temp_path : :temp_data
  end

  def destroy_file
    filestore.delete full_filename
  end

  def rename_file
    filestore.rename @old_filename, full_filename
  end

  def save_to_storage
    logger.info "Storing #{self.class.name}\##{id} as #{full_filename(thumbnail)} (class: #{replication_policy}) from #{current_content_location == :temp_path ? temp_path : :memory}"
    filestore.store_content full_filename(thumbnail), replication_policy, current_content
  end

  def class_prefix
    self.class.name.demodulize.underscore.downcase
  end

  # The MogileFS replication class shares its name with the key prefix.
  alias_method :replication_policy, :class_prefix
end

Technoweenie::AttachmentFu::Backends::MogileFilesystemBackend = ::MogileFilesystemBackend
```
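To poke at the key scheme without a running MogileFS tracker, an in-memory stand-in can be handy. The sketch below is exactly that and nothing more: `FakeFilestore` and `filestore_key` are hypothetical names, the fake only mirrors the calls the backend above makes (`store_content`, `get_file_data`, `delete`, `rename`), and the inflection is a simplified plain-Ruby approximation of `demodulize.underscore`.

```ruby
# In-memory stand-in for the filestore. It mirrors the calls used by the
# backend in this post; it is not the MogileFS client.
class FakeFilestore
  def initialize
    @files = {}
  end

  # The replication class is accepted but ignored here.
  def store_content(key, _klass, content)
    @files[key] = content
  end

  def get_file_data(key)
    @files[key]
  end

  def delete(key)
    @files.delete(key)
  end

  def rename(from, to)
    @files[to] = @files.delete(from)
  end
end

# Hypothetical helper reproducing the "prefix:id:size" key layout.
def filestore_key(class_name, id, thumbnail = nil)
  base = class_name.split("::").last
  prefix = base.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  "#{prefix}:#{id}:#{thumbnail || :original}"
end

store = FakeFilestore.new
key = filestore_key("Image", 42)
store.store_content(key, "image", "raw bytes")
puts key
puts store.get_file_data(key)
```

Because every thumbnail gets its own key under the parent's id, fetching a particular size is a single lookup rather than a directory walk.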
#### Serving images
Getting images *into* MogileFS is only half the story. You also need to serve them to visitors. Here's a controller that reads from the `filestore` instead of the local filesystem (and if you're storing files in the database, we need to have a talk):
```ruby
class ImageController < ApplicationController
  before_filter :load_image

  def show
    respond_to do |format|
      format.html
      format.any(:png, :jpg, :gif) do
        # params[:size] selects a thumbnail; nil falls back to the original.
        send_data @image.file_data(params[:size]),
                  :type        => @image.content_type,
                  :disposition => 'inline'
      end
    end
  end

  protected

  def load_image
    @image = Image.find(params[:id])
  end
end
```
And there you have it. Images go into MogileFS on upload, get replicated across your storage nodes, and are served back to visitors through a simple controller action. No more worrying about which app server has which file.
These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.