Sitemaps with a bit of Metal
Sitemaps with a bit of Metal
Sooner or later, you will probably start taking care about your application's SEO,
especially if it provides a lot of content and you want to be discovered by users
in search engines results. There are several ways to do this in your Ruby app.
You can create sitemap.xml
file manually if there aren't very much urls,
but it will become pretty ineffective when you have more than a dozen or so.
There are some very neat tools which will do this for you out of the box,
even if you need a lot of customization.
Tools to the rescue
Tool which I would like to mention is Sitemap Generator by kjvarga. It's pretty cool, it keeps the standards so you don't have to care too much. It also have custom rake tasks, which will generate Sitemap under given criteria and ping selected search engines about availability of new one one for your site. Magic.
Installation is very easy. You only need to add one line to your Gemfile
:
sitemap_generator
Then you should run bundle
and rake sitemap:install
.
Now you should have config/sitemap.rb
in your directory structure,
which you need to tweak for your needs.
SitemapGenerator::Sitemap.default_host = 'http://example.com'
SitemapGenerator::Sitemap.create do
add '/home', :changefreq => 'daily', :priority => 0.9
add '/contact_us', :changefreq => 'weekly'
end
And that's it! All you need to do is to run rake sitemap:refresh
.
Now you have new sitemap.xml.gz
file in your /public
directory.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/home</loc>
<changefreq>daily</changefreq>
<priority>0.9</priority>
</url>
<url>
<loc>http://www.example.com/contact_us</loc>
<changefreq>weekly</changefreq>
<priority>0.9</priority>
</url>
</urlset>
If your app handle multiple domains, there's no problem, because you can render multiple sitemap files for different domains, subdomains or specific locales.
Hitting Sitemap limits
You might also heard that a single Sitemap
must have no more than 50,000 URLs and can't be larger
than 10MB. And it's true. There was a risk that our app will hit that limit in close future.
Fortunatelly Sitemap protocol provides
a possibility to handle such situation through index
files. As I mentioned earlier,
sitemap_generator
keeps the standards pretty good, so it creates index
file if such one is needed by default.
You can also force it to always create index
file.
Let's use some real life example. Mentioned application is presenting a huge amount of events and allows users to buy tickets to them.
We will fetch all events from database through find_each
method to get objects in batches.
We do this in case that large amount of objets could not fit into memory.
On each event we would use event_path
helper to add proper URL
to our sitemap.
SitemapGenerator::Sitemap.create_index = true
SitemapGenerator::Sitemap.default_host = 'http://example.com'
SitemapGenerator::Sitemap.create do
Event.find_each do |event|
add event_path(event.slug, locale: false), lastmod: event.updated_at, changefreq: 'daily'
end
end
After running rake sitemaps:refresh
we now have at least two files: sitemap.xml.gz
and sitemap1.xml.gz
.
At least because for each n multiple of 50,000, sitemap{n}.xml.gz
would get created.
Let's take a close look at sitemap.xml.gz
content:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
</sitemapindex>
It no longer contains a Sitemap
, but index
which specifies where the Sitemaps
are.
Content of sitemap1.xml.gz
:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/events/awesome_event_1</loc>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
<!-- ... -->
<url>
<loc>http://www.example.com/events/awesome_event_50000</loc>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
</urlset>
Content of sitemap2.xml.gz
:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/events/awesome_event_50001</loc>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
<!-- ... -->
<url>
<loc>http://www.example.com/events/awesome_event_100001</loc>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
</urlset>
Pretty easy, isn't it?
Great, but I don't want to keep this in my /public directory
If you use CDN for static files and don't want to keep Sitemap
in your /public
directory,
you can use specific adapter and just customize this in config file:
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
or even
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new
I really appreciate how the author of the gem solved different storage mechanisms.
If you need more customization in this area, you can just write your compatible adapter and save Sitemap
whenever you want: database, key-value storage or whatever.
If we want to use ActiveRecord
for this purpose, we can just write:
module SitemapGenerator
class ActiveRecordAdapter
def write(location, raw_data)
Sitemap.new do |sitemap|
sitemap.data = gzip(StringIO.new, raw_data).string
sitemap.filename = location[:filename]
sitemap.mime_type = 'multipart/x-gzip'
end.save!
end
def gzip(location, string)
gz = Zlib::GzipWriter.new(location)
gz.write string
gz.close
end
end
end
One more thing to do to keep this running is to create such db migration:
class CreateSitemaps < ActiveRecord::Migration
def change
create_table :sitemaps do |t|
t.binary :data, null: false
t.string :filename, null: false
t.string :mime_type, null: false
t.timestamps
end
add_index :sitemaps, [:filename, :created_at]
end
end
We must also update our config/sitemap.rb
file and tell that we want to use custom
adapter:
SitemapGenerator::Sitemap.adapter = SitemapGenerator::ActiveRecordAdapter.new
Ok, now we have up and running creation of Sitemap. But how to render it if it's no longer
available in /public
directory?
We need to find away to get the file from db and render to user, in this case search engine crawler.
Let's write custom controller
To render our file we need to create proper controller. In typical Rails application we would probably do something like this:
class SitemapsController < ApplicationController
skip_before_filter :authenticate_user! # because you use devise, don't you?
def show
sitemap = Sitemap.where(filename: params[:id]).order('created_at desc').first!
respond_to do |format|
format.xml_gz { send_data sitemap.data, filename: sitemap.filename }
end
end
end
Our controller responds to xml_gz
format which is not supported in Rails by default.
We need to register this format, so our controller could render proper response when
*.xml.gz
format is requested by client. We can to do this by putting line below
in config/initializers/mime_types.rb
file.
Mime::Type.register "application/x-gzip", :xml_gz, [], ["xml.gz"]
One more necessary thing is adding these few lines to config/routes.rb
:
constraints(format: /[a-z]+(\.[a-z]+)?/) do
resources :sitemaps, only: :show
get '/sitemap.:format' => 'sitemaps#show'
end
We use constraints on format
because we need to handle non standard, double resource extension xml.gz
.
Without this, our Rails app would lookup for resource with .gz
extension and sitemap.xml
would be treated as filename.
Let's take a look what exactly our controller has inside:
irb(main):001:0> SitemapsController.ancestors
=> [
SitemapsController,
ApplicationController,
#<Module:0x007fc2178c35a8>,
#<Module:0x007fc2179a01b0>,
ActionController::Base,
Turbolinks::XHRHeaders,
Turbolinks::Cookies,
Turbolinks::XDomainBlocker,
Turbolinks::Redirection,
Devise::Controllers::UrlHelpers,
Devise::Controllers::Helpers,
Devise::Controllers::StoreLocation,
Devise::Controllers::SignInOut,
ActiveRecord::Railties::ControllerRuntime,
ActionDispatch::Routing::RouteSet::MountedHelpers,
ActionController::ParamsWrapper,
ActionController::Instrumentation,
ActionController::Rescue,
ActionController::HttpAuthentication::Token::ControllerMethods,
ActionController::HttpAuthentication::Digest::ControllerMethods,
ActionController::HttpAuthentication::Basic::ControllerMethods,
ActionController::RecordIdentifier,
ActionController::DataStreaming,
ActionController::Streaming,
ActionController::ForceSSL,
ActionController::RequestForgeryProtection,
ActionController::Flash,
ActionController::Cookies,
ActionController::StrongParameters,
ActiveSupport::Rescuable,
ActionController::ImplicitRender,
ActionController::MimeResponds,
ActionController::Caching,
ActionController::Caching::Fragments,
ActionController::Caching::ConfigMethods,
AbstractController::Callbacks,
ActiveSupport::Callbacks,
ActionController::ConditionalGet,
ActionController::Head,
ActionController::Renderers::All,
ActionController::Renderers,
ActionController::Rendering,
ActionController::Redirecting,
ActionController::RackDelegation,
ActiveSupport::Benchmarkable,
AbstractController::Logger,
ActionController::UrlFor,
AbstractController::UrlFor,
ActionDispatch::Routing::UrlFor,
ActionDispatch::Routing::PolymorphicRoutes,
ActionController::ModelNaming,
ActionController::HideActions,
ActionController::Helpers,
AbstractController::Helpers,
AbstractController::AssetPaths,
AbstractController::Translation,
AbstractController::Layouts,
AbstractController::Rendering,
AbstractController::ViewPaths,
ActionController::Metal,
AbstractController::Base,
ActiveSupport::Configurable,
Object,
PP::ObjectMixin,
ActiveSupport::Dependencies::Loadable,
JSON::Ext::Generator::GeneratorMethods::Object,
Kernel,
BasicObject
]
irb(main):002:0> SitemapsController.ancestors.count
=> 68
But do we really need to carry whole this stuff which is usually inherited by ApplicationController? How about no. We don't really need skipping before filters, we don't need url helpers, turbolinks, devise and any other useless in this case stuff. So, let's slim this down a bit.
Here comes the Metal
class SitemapsController < ActionController::Metal
include AbstractController::Rendering
include ActionController::MimeResponds
include ActionController::DataStreaming
include ActionController::RackDelegation
include ActionController::Rescue
include ActionController::Head
def show
sitemap = Sitemap.where(filename: params[:id]).order('created_at desc').first!
respond_to do |format|
format.xml_gz { send_data sitemap.data, filename: sitemap.filename }
end
end
end
Let's take a look what we have achieved:
irb(main):004:0> SitemapsController.ancestors
=> [
SitemapsController,
ActionController::Head,
ActionController::Rescue,
ActiveSupport::Rescuable,
ActionController::RackDelegation,
ActionController::DataStreaming,
ActionController::Rendering,
ActionController::MimeResponds,
AbstractController::Rendering,
AbstractController::ViewPaths,
ActionController::Metal,
AbstractController::Base,
ActiveSupport::Configurable,
Object,
PP::ObjectMixin,
ActiveSupport::Dependencies::Loadable,
JSON::Ext::Generator::GeneratorMethods::Object,
Kernel,
BasicObject
]
irb(main):005:0> SitemapsController.ancestors.count
=> 19
Our controller is much lighter and contains mostly necessary things to serve our sitemap.xml.gz
file to Google,
Bing, Yandex or whoever wants our Sitemap
. José Valim inspired me
to use ActionController::Metal in his
Crafting Rails Applications book. Picking only those modules which are indispensable for our
controllers is a pretty cool approach, but in my humble opinion, not often seen in Rails applications.
Mounting Sinatra application with requested functionality in routes.rb
or config.ru
could be alternative,
but still lightweight solution.
Making sales and marketing happy
As you can see, such chore like rendering Sitemap
can be done in a smart way,
in most of the cases with just few lines of code. It's very useful, especially for the applications
with a lot of content. Easy customization is another advantage of presented solution. And now go and make
your sales and marketing teams happy providing better search engine results.
I will try to write another blogpost focused on usage of ActionController::Metal
in different, but maybe more
surprising use case, on condition that this topic is interesting for you and
Robert won't forestall me. :)