Using A Static Site Generator At Scale: Lessons Learned

Static site generators are pretty en vogue nowadays. It is as if developers around the world are suddenly realizing that, for most websites, a simple build process is easy enough to render the last 20 years of content management systems useless. All right, that’s a bit over the top. But for the average website without many moving parts, it’s pretty close!

However, does that hold true for websites bigger than your humble technology blog? How do static site generators behave when the number of pages exceeds the average portfolio website and runs up into the thousands? Or when development is a team effort? Or when people of different technical backgrounds are involved? This is the story of how we managed to bring roughly 2000 pages and 40 authors onto a technology stack made for hackers.

The Reason For Static Sites And The Task At Hand

We used static site generators at our company’s spin-off startup, where we had a reasonable amount of content to maintain: about 30 to 40 pages of product information, the occasional landing page and some company-related websites.

We have had good experiences with static site generators. For us front-end-heavy web developers, using a static site generator is as easy as templating, but with real data and actual content! And it enables us and our content providers to scale easily.

We started using the popular Jekyll1 static site generator. The usual page consisted of the following:

  • YAML front matter
    This is a delight for authors and editors, because they can put any meta information in it — even metadata that is not yet interpreted but might be in the future.
  • Markdown
    Markdown provides the basic structure for the content. It is easy to understand, it is easy to write, and a ton of editors out there give a good preview of the content at hand.
  • Liquid block elements
    Liquid is Shopify’s2 templating language, and it’s very powerful. It allows for advanced loops and conditionals and can be easily extended using plugins written in Ruby. Developers provide content editors with structural elements such as {% section %} and {% column %} to better organize the page.
  • And, of course, a lot of images
    Each page had about 5 to 15 images.

A typical page looked like this:

---
title: Getting started with our product
layout: blue
headerImage: getting-started.svg
permalink: /getting-started/
---

{% section %}
# How to get started with our product
…
{% endsection %}
…

In the end, writing content was as easy as scribbling down notes in an editor. Polish and beauty were added once the page ran through Jekyll.

In addition to the ease of using our preferred content editors, we loved the additional benefits that static site generators gave us as developers and “webmasters.” (You haven’t heard that term in quite some time, have you?)

  • Security-wise, static websites are a fortress. Not having any database or any dynamic interpreter running on your servers reduces the risk of hacks tremendously.
  • A static website is incredibly fast to serve. Put it on a CDN and let it be consumed worldwide in no time.
  • Web developers love the flexibility. Changing the layout or adding a microsite to the content does not require you to go deep into the internals of the content management system, nor does it require any hacks. You can maintain these resources next to your usual content and “just deploy” them with it.
  • Storing all of the technology parts as well as the content in a version-control system such as Git allows for a flexible publishing cycle. Preparing content in a branch, merging it on demand and putting it out on the servers entails just a few clicks on a Monday morning.
Using Git as a content store allows you to treat content like source code, including pull requests, code reviews and versioning. This brings content authors to the same place as developers and designers. (Image: Stefan Baumgartner and Simon Ludwig)

Our five-person team was pretty pleased with the results. We took the idea from our marketing website over to our other web entities. Suddenly, next to our 40-page main brand website was a 50-page style guide. Then, 150 pages of documentation. Then, almost every web entity from our sibling company, counting up to 2000 pages of documentation. You wouldn’t believe it, but our tech stack was at the edge of exhaustion:

  • The bigger your website, the longer your build. Static site generation is proactive compilation of source code into HTML pages. It can take hours until your website is ready to deploy.
  • Choosing a static site generator is like betting on one content management system: a tool that speeds you up at first but slows you down once it no longer meets your needs.
  • Even if most of your content is static and does not require any user input, there is the occasional case where you need dynamic data.
  • Tech-savvy content editors love working with content that is actually source code. Not-so-tech-savvy editors… well, they don’t.

Let’s see how we tackled each of those topics.

More Content, More Build Time

One key factor of static site generation is the proactive approach to rendering. With a traditional content management system (CMS), each page you access gets rendered just for that one visit (leaving obvious caching mechanisms aside). With a static site generator, you create all of your pages at once. And even though the process is fast, it scales linearly with the amount of content. This is especially true when you have to auto-generate and auto-optimize responsive images of screenshots, from 200 pixels up to full high definition in 200-pixel steps. Even with our initial setup of 40 pages and roughly 300 images, the build took about two hours from start to finish on our continuous delivery machines. That’s not the kind of time you want to wait just to see whether you’ve fixed a typo in a headline. Anticipating our workload in the not-so-distant future, we had to make some important decisions.

Divide And Conquer Build

Even if your technology stack can generate thousands of pages, it doesn’t necessarily need to. Not every item of content needs to know about every other item. A German-language website can be treated separately from an English-language version. And the documentation is a different content area than our main brand website.

Not only are the content elements discrete, but they also diverge in update frequency. Blogs are updated many times a day, the main brand website many times a week, and the documentation every two weeks to coincide with our product’s feature releases. And our Java performance book? Well, that’s once or twice a year.

This led us to split all of our websites into distinct content packages and tech repositories. The content packages were Git repositories that contained everything an author or editor could and should touch: Markdown files, images, additional data lists. For each entity, we created a content repository with the same structure.
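For illustration, a single content package in this scheme might be laid out like this (the names are hypothetical; the point is that a package contains only author-facing files):

content-brand-website/      # one content package, one Git repository
  pages/                    # Markdown files with YAML front matter
    getting-started.md
    pricing.md
  images/                   # source images, resized during the build
  data/                     # additional data lists (YAML/JSON)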

The tech repository can be regarded as a machine that converts several content repositories into full static websites. (Image: Stefan Baumgartner and Simon Ludwig)

The tech repositories, on the other hand, contained everything meant for developers: front-end code, templates, content plugins, and the build process for our static site generation.

Our build servers were set up to listen for changes to each of those Git repositories. A change in a content repository triggered a build of that content package alone. A change in a tech repository triggered a build of only the corresponding content repository. Depending on how many content repositories you have, this can cut down the build time tremendously.
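A minimal sketch of such a trigger, assuming a GitHub-style webhook payload that carries the repository name; the repository names and build commands here are made up for illustration:

const http = require('http');
const { execFile } = require('child_process');

/* Hypothetical mapping from content repository to build command. */
const builds = {
  'content-brand-website': ['gulp', ['build', '--package=brand-website']],
  'content-documentation': ['gulp', ['build', '--package=documentation']]
};

http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => body += chunk);
  req.on('end', () => {
    /* GitHub-style payloads name the repository that changed. */
    const repo = JSON.parse(body).repository.name;
    if (builds[repo]) {
      /* Build only the package whose repository changed. */
      const [cmd, args] = builds[repo];
      execFile(cmd, args, err => console.log(err || repo + ' built'));
    }
    res.end();
  });
}).listen(3000);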

Incremental Builds

Even more important than splitting the content files into separate packages was the method for dealing with images. The objective was to generate each screenshot of our product in various responsive-friendly sizes, down to 200 pixels wide. Also, each newly generated image would have to be optimized with gulp-imagemin7. Doing this on every build iteration took up a good chunk of the initial two-hour build time.

While the two hours of build time were necessary for the very first build, that was wasted time for each subsequent one. Not every image changed from iteration to iteration. Much of the work had already been done, so why do it over and over again? Incremental builds were the key to saving our build servers and ourselves from having to do a lot of work.

Our image processing was done in Gulp8. The gulp-newer plugin is exactly what we needed for our incremental builds. It compares the timestamp of each file with the timestamp of the file with the same name in the destination directory. If the timestamp is newer, the file is kept in the stream. Otherwise, it is discarded. Generating all of the responsive images, then, was a matter of chaining the right plugins in the right order:

const merge = require('merge2');
const rename = require('gulp-rename');
const newer = require('gulp-newer');
const imagemin = require('gulp-imagemin');
…

/*
The options in this case are an array of image widths. In our
case, we want responsive images from 200 pixels
up to 1600 pixels wide.
*/
const options = [
  { width: 200 }, { width: 400 },
  { width: 600 }, { width: 800 },
  { width: 1000 }, { width: 1200 },
  { width: 1400 }, { width: 1600 }
];

gulp.task('images', () => {
  /*
  We can map each element of this array to a Gulp stream.
  Each of those streams selects each of the original images
  and creates one variant.
  */
  const streams = options.map(el => {
    return gulp.src(['./src/images/**/*'])
       /*
       We follow a naming convention of adding a suffix to
       the file's base name. This suffix is the image's target
       width.
       */
      .pipe(rename(file => file.basename += '-' + el.width))
       /*
       This is where the "incremental" builds kick in.
       Before we run the heavy processing and resizing tasks,
       we filter elements that don't have any updates.
       This plugin checks whether the corresponding result in
       "dist/images" is older than the source item.
       */
      .pipe(newer('dist/images'))
      .pipe(resize(el))
      .pipe(imagemin())
      .pipe(gulp.dest('dist/images'));
  });
  return merge(streams);
});

In the absence of a good image-resizing plugin, we had to create our own. This was also the time to ensure that no unnecessary file was being processed. If an image couldn’t be resized because the target width was bigger than the original, then we discarded the image as well. The following snippet uses Node.js’ GraphicsMagick9 bindings to complete the task.

const gm = require('gm');
const through = require('through2');

module.exports = el => {
  return through.obj((originalFile, enc, cb) => {
    var file = originalFile.clone({contents: false});

    if (file.isNull()) {
      return cb(null, file);
    }

    const gmfile = gm(file.contents, file.path);
    gmfile.size((err, size) => {
      if (err) {
        /* pass errors down the stream */
        return cb(err);
      }
      if (el.width < size.width) {
        gmfile
          .resize(el.width,
            (el.width / size.width) * size.height)
          .toBuffer((err, buffer) => {
             if (err) {
               return cb(err);
             }
             file.contents = buffer;
             /* add resized image to stream */
             cb(null, file);
           });
      } else {
        /* remove from stream */
        cb(null, null);
      }
    });
  });
};

With all of this incremental adding of files, we couldn’t forget to get rid of files in the destination directory that had been deleted in the source. We didn’t want any leftovers from previous builds lying around, adding extra weight to the bundle to be deployed. Thankfully, Gulp’s task system allows for promises10, so we had a lot of Promise-based plugins we could use for this task.

const globby = require('globby');
const del = require('del');
const path = require('path');
const globArray = [
  'images/**/*'
];

const widths = [200, 400, 600, 800, 1000, 1200, 1400, 1600];

/* This helper function adds the width suffix
   to the file name. */
const addSuffix = (name, w) => {
  const p = path.parse(name);
  /* set "base" so that path.format() picks up the new name */
  p.base = `${p.name}-${w}${p.ext}`;
  return path.format(p);
};

gulp.task('diff', () => {
  return Promise.all([
    /* First, we select all files in the destination. */
    globby(globArray, { cwd: 'dist', nodir: true }),
    /* In parallel, we select all files in the source
       folder. Because they don't have the width suffix,
       we add them for every image width after selecting. */
    globby(globArray, { cwd: 'src', nodir: true })
      .then(files => files.reduce((all, el) =>
         all.concat(widths.map(w => addSuffix(el, w))), []))
  ])
  /* This is the diffing process. All file names that
     are in the destination directory but not in the source
     directory are kept in this array. Everything else is
     filtered. */
  .then(paths => paths[0]
    .filter(i => paths[1].indexOf(i) < 0))
  /* The array now consists of files that are in "dest"
     but not in "src." They are leftovers and should be
     deleted. */
  .then(diffs => del(diffs));
});

With all of these changes to the build process, the initial two hours for the images were reduced to two to five minutes per build, depending on the number of images added. The extra time spent on all of the file-status checks passed pretty quickly, even with tens of thousands of images lying around.

Avoiding Technology Lock-In

Jekyll is an amazing tool because it comes with a lot of features that go beyond merely creating HTML pages. The healthy plugin ecosystem makes Jekyll not just a static site generator, but a full-fledged build system. Out of the box, it’s possible to compile Sass and CoffeeScript11 with Jekyll. The Jekyll asset pipeline12 offers not only a ton of features for creating images, but also extra confidence because it checks every included asset for existence and integrity. This is gold if you’re dealing with a lot of assets.

However, these benefits come at a high cost, not only with performance and build time, but also with a certain level of technology lock-in. Instead of Jekyll being in your build system, it becomes your build system. Anything not included in Jekyll or one of the plugins has to be written and maintained by you in Ruby.

This bugged us in several ways. Ruby was not our favorite language to begin with. While many on our team could work with Ruby, some of us couldn’t write a single line without referring to the language’s specification. Even worse, we were trying to move away from a traditional CMS to gain more freedom and flexibility in the way we do things. By relying heavily on Jekyll’s ecosystem, we were trading one monolith for another. To avoid this form of technology lock-in, we took a few more steps.

Separation of Concerns

First, we stripped away everything from Jekyll’s duties that had nothing to do with the actual output of HTML. We still included the check for an asset’s existence. However, image generation and the compilation of JavaScript and style sheets would all be done by Gulp builds running beforehand.

This gave us a list of completely different benefits:

  • Should we have a change of heart and switch Sass for something trendier, it would affect only a single part of our build file, not the entire static site generation. The same goes for the compilation of our other assets.
  • We could decide whether even to build certain assets. The JavaScript might change, but the styles might not, so why compile the styles again? This cuts down even more on build time (see the sketch after this list).
  • We could even remove Jekyll at some point, and key parts of our build would still be intact and functioning.
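As a sketch of that second point, the same gulp-newer trick from the image pipeline can guard the style sheet task, so unchanged Sass files skip compilation. Here, gulp-sass stands in for whichever compiler you use; note that Sass partials would need extra handling, because a changed partial doesn’t touch the importing file’s timestamp:

const gulp = require('gulp');
const newer = require('gulp-newer');
const sass = require('gulp-sass');

gulp.task('styles', () => {
  return gulp.src('./src/styles/**/*.scss')
    /* Keep only Sass files newer than their compiled output. */
    .pipe(newer({ dest: 'dist/css', ext: '.css' }))
    .pipe(sass())
    .pipe(gulp.dest('dist/css'));
});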

Secondly, we removed any post-processing steps from Jekyll. The Jekyll asset pipeline allows you to create hashed URLs for JavaScript, style sheets and images. Stripping that away from the Jekyll process meant Jekyll had less to do, thus clarifying its purpose. Interestingly enough, we saw an improvement in speed by moving the revisioning process from Ruby to Node.js. The wonderful plugin gulp-rev13 took care of this process.

const gulp       = require('gulp');
const rev        = require('gulp-rev');
const revReplace = require('gulp-rev-replace');
/* the build's output directory */
const based = './dist';
…
gulp.task('revision', () => {
  return gulp.src(['./**/*.js',
     './**/*.css', './images/**/*.*'])
    .pipe(rev())
    .pipe(gulp.dest(based))
    .pipe(rev.manifest())
    .pipe(gulp.dest('.'));
});

gulp.task('rev', ['revision'], () => {
  var manifest = gulp.src('rev-manifest.json');
  return gulp.src(['./**/*.html'])
    .pipe(revReplace({
      manifest: manifest
    }))
    .pipe(gulp.dest(based));
});

From here on, we made sure to know what was part of Jekyll’s purpose and what wasn’t. You can do amazing things with Jekyll and its ecosystem, but you don’t want to rely too much on a tool that might not be the right one for the tasks to come.

Jekyll’s responsibilities were reduced to converting Markdown and Liquid to HTML pages. With everything else being done by Gulp, you can easily spot the odd bird in the stack:

  • The self-written plugins for custom sectioning elements were written in Ruby (the only Ruby dependency still there).
  • We were still using Liquid, a rather “exotic” templating language.

We also realized that Jekyll is not meant to be included in a build process; Jekyll was created to be the build process. Jekyll opens and analyzes every file during a build. Once you strip away everything from Jekyll that isn’t HTML creation, you have to take care of Jekyll’s built-in features, such as incremental builds, yourself.

Liquid Voodoo

While Jekyll is very popular, its underlying templating engine, Liquid, seems to be the odd one out. It bears similarities to the PHP templating engine Twig and the JavaScript equivalent Swig, but it has a lot of features that are nowhere else to be seen. Liquid is powerful and allows for a lot of logic to find its way into the templates. This is not always a good thing, but it also isn’t Liquid’s fault. Take, for example, the creation of breadcrumbs based on a document’s permalink, done entirely in the templating language:

{% assign coll = site.content %}
<ul class="breadcrumbs">
  <li><a href="{{site.baseurl}}/">Home</a></li>
  {% assign crumbs = page.url | split: '/' %}
  {% for crumb in crumbs offset: 1 %}
  {% capture crumb_url %}{% assign crumb_limit = forloop.index | plus: 1 %}{% for crumb in crumbs limit: crumb_limit %}{{ crumb | append: '/' }}{% endfor %}{% endcapture %}
  {% capture site_name %}{% for p in coll %}{% if p.url == crumb_url %}{{ p.title }}{% endif %}{% endfor %}{% endcapture %}
  {% unless site_name == '' %}
  <li>
  {% unless forloop.last %}
    <a href="{{ site.baseurl }}{{ crumb_url | strip_newlines }}">{{ site_name }}</a>
  {% else %}
    <span>{{ site_name }}</span>
  {% endunless %}
  </li>
  {% endunless %}
  {% endfor %}
</ul>

Let’s not go too deep into the abomination of code you see above. A mere glance should get the point across: The code above will output correctly, but it’s obviously not as readable as one would expect from a templating engine. On the contrary, the more features and logic you cram into this, the worse it’s going to be if you ever have to reconstruct what has happened. Moving away from this Liquid “voodoo” to Jekyll plugins would be a better idea:

  • Restrict Liquid to content output (loops and simple conditionals).
  • Create complex data beforehand. If it’s not available in Jekyll itself, then a plugin or a pregenerated YAML or JSON file is the way to go.

Looking at the breadcrumb generation again, a plugin that fills the relevant data set would be much more flexible and would not rely on string concatenation or splitting magic. Also, the Liquid template that accesses the prefilled data is much more readable and easier to understand (a sketch of the pregeneration step follows the template):

{% if page.breadcrumbs %}
<ul class="breadcrumbs">
  <li><a href="{{site.baseurl}}/">Home</a></li>
  {% for item in page.breadcrumbs %}
  <li>
    {% unless forloop.last %}
    <a href="{{item.url}}">{{item.label}}</a>
    {% else %}
    <span>{{item.label}}</span>
    {% endunless %}
  </li>
  {% endfor %}
</ul>
{% endif %}

This will keep your templates clean and tidy. Also, if you want to move from Liquid to another templating engine (in case you ever drop Jekyll), the templates will be a lot easier to convert.
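Such a pregeneration step is small. A Node.js sketch, assuming an array of page objects with url and title properties (the names are illustrative, not our actual plugin, which was written in Ruby):

/* Pregenerate breadcrumb data from permalinks. */
const buildBreadcrumbs = pages => {
  const byUrl = new Map(pages.map(p => [p.url, p]));
  pages.forEach(page => {
    const crumbs = [];
    let url = '/';
    page.url.split('/').filter(Boolean).forEach(part => {
      url += part + '/';
      const target = byUrl.get(url);
      if (target) {
        /* one entry per ancestor page that actually exists */
        crumbs.push({ url: url, label: target.title });
      }
    });
    page.breadcrumbs = crumbs;
  });
  return pages;
};

/* Example: the page at /docs/setup/ gets crumbs for /docs/ and itself. */
buildBreadcrumbs([
  { url: '/docs/', title: 'Documentation' },
  { url: '/docs/setup/', title: 'Setup' }
]);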

Serving More Than Static Websites

Deploying a static website sounds easy at first. You have a bundle of rendered HTML files and a lot of assets, and you just have to put them somewhere to be delivered to the World Wide Web. With free hosting services, static storage services and content delivery networks, the possibilities for getting your content out seem endless. You can even serve a page from a Dropbox folder!

If you are doing more than simply delivering content — perhaps you are in an ongoing migration process — then the requirements for the server might be a little more demanding.

The solution we have in place is based on nginx, which is great for serving static websites to begin with, but also makes for an easy setup when you’re not just serving a static website.

Ongoing Migration From Old to New

With 2000 pages of content divided into different content packages, we had two strategies to choose from to go live:

  • Convert all of the old content, wait for a big-bang release and fail miserably.
  • Or start to release smaller content packages, and migrate over time.

With option one, the converted content would grow stale or would have to be maintained twice for a certain amount of time. And we wouldn’t get the benefits of static websites until long after everything was done. So, of course, we opted for the latter. To make sure we could freely deploy new content created with the new technology stack without killing access to not-yet-migrated pages from the old CMS, we configured nginx to serve as a “fall-through” proxy.

The proxy hits files from the static content folder first. Should a file not be available, the proxy falls through to the old CMS server. (Image: Stefan Baumgartner and Simon Ludwig)

  • The idea was to hit the deployed static website first. If nginx found the requested file, it would serve that page and all of its assets.
  • Instead of showing a 404 error when a file was not available, nginx would proxy the request through to the old server architecture.

This required us to run nginx on its own server, with the domain pointed towards it, and the IP of the old server serving as an upstream server.

# An upstream server pointing to an IP address where the old
# CMS output is
upstream old-cms {
  server  192.168.77.22;
}

# This server catches all requests and serves the static
# documents. Should a document not be available, it falls
# back to the old CMS output.

server {
  listen 80;
  server_name your-domain.com;

  # the fallback route
  location @fallback {
    proxy_pass  http://old-cms;
  }

  # Either fetch the available document or go back to the
  # fallback route.
  location / {
    try_files   $uri $uri/ @fallback;
  }
}

As the migration continued, more and more requests were served from the new static site server instead of falling through to the old CMS. Once everything was done, we could freely cut the connection to the old server and have everything on the new one.

The Need for Dynamic Content

Static websites exclude dynamic content, by definition. But even with the most static of websites, a dynamic element is needed once in a while. A lot of website features can be implemented with services that operate solely client side. Comments on blogs are often powered by services such as Facebook and Disqus. However, these features shouldn’t be critical to your website. Because JavaScript is the weakest link in the browser’s technology stack and could fail unexpectedly, it wouldn’t be wise to rely on it for key features of your website.

Other features require the server. For example, some parts of our static website are confidential and must be restricted to a certain user group. To gain access, a user needs to log in using the company’s single-sign-on service and must have the necessary permissions.

For such cases, we opened up a door to a small Node.js application using nginx upstreams.

Before hitting the static websites, the proxy directs a subset of URLs to the Node.js app, hiding possible static websites underneath. (Image: Stefan Baumgartner and Simon Ludwig)

# This upstream points to a Node.js server running locally
# on port 3000.
upstream nodejs-backend {
  server   127.0.0.1:3000;
}

# Later, we pointed dedicated routes to this Node.js back end
# when defining locations.
location /search/ {
  proxy_pass http://nodejs-backend;
}

This application handles session IDs, calls to the single-sign-on service and a small website search feature implemented with Lunr18. The important thing here is that we’re not modifying existing content, but rather providing additional content on top of it. If your content needs to have mostly dynamic features, then a static site generator is not right for you.
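Building such a search index with Lunr is compact. A sketch, assuming an array of page objects with url, title and body fields exported during the static site build:

const lunr = require('lunr');

/* Hypothetical page data, e.g. dumped to JSON during the build. */
const pages = [
  { url: '/getting-started/', title: 'Getting started',
    body: 'How to get started with our product' }
];

const idx = lunr(function () {
  this.ref('url');
  this.field('title');
  this.field('body');
  pages.forEach(page => this.add(page));
});

/* Returns scored references (here, the URLs) into the page list. */
const results = idx.search('getting started');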

Editing With A Static Site Generator

The biggest challenge in our journey to static sites was getting content editors to work with the new technology stack. You have to be hard as nails if you are willing to leave the comforts of your WYSIWYG editor — you know, those same comforts that drive web developers insane.

To get everyone, even the biggest skeptics, on board, we needed to take care of the following:

  • Content editing on the new website had to be an improvement for the author, both in comfort and productivity. Providing actual content management and offering a good overview of what’s happening on the website justified the added complexity of dealing with files and source code.
  • Authors had to see the benefit of being restricted in how they could edit content. Not being able to color text red would be a relief for developers and would make the website more consistent, but content creators might view it as an unnecessary restriction. Again, this is an issue of content management versus content design. Leveraging the possibilities of structuring content, without giving much thought to the visual output, lowers the barrier to entry with the new system. If content editors are enabled to do more of what actually benefits the page, they won’t miss features that they never needed in the first place.

On top of that, we had to support content editing for two different types of users: the professional, daily content editor and the occasional contributor.

Pro Mode

We provided our main content editors with two ways to edit.

The first was to set up Node.js and Ruby on the content editor’s own machine, as well as to give them a crash course in Git and GitHub. Granted, this is not the easiest way to get started. It’s actually pretty hard, especially for someone who isn’t familiar with these technologies or isn’t even into software development at all. This elaborate setup was meant for people who needed a complete view of the whole page, or at least of their content package.

The advantage for these editors kicked in when they got to creating content. Writing in Markdown, as one might scribble notes, was incredibly fast. And copying and pasting content blocks between websites without the interference of a web-based interface made it even faster.

An editing tool such as Atom learns from everything you type and completes your code as you go — another boost in productivity. Content becomes code, and all of the advantages of code editing emerge. Once the heavy lifting was done and the content authors were set up, editing was a breeze.

The other way, meant for quick fixes and updates to existing content, was to use GitHub’s web interface.

GitHub features a source editor that can stand in for a file-based Markdown CMS, including file uploading.

Not only is GitHub good for storing and managing source code, but its in-place editing and file uploading make it a full-fledged CMS. It’s a bit rough if you want to do more than Markdown and actually want previews of your images, but everything that’s available for editing GitHub’s wiki pages was good enough for our own documentation.

One huge benefit we got from using Git and GitHub for data storage and the user interface was being able to work in content branches and open pull requests. This was a delight for managing content, because we could prepare content in packages without interrupting the public website. It allowed professional content editors to create their own version of a page long beforehand and then push it live simply by clicking the “merge” button in GitHub.

Access to GitHub was, however, still a bit limited. There was a lot of clutter meant for source code, not for content, and a connection to the actual website couldn’t be provided. So, a whole lot of people were unable to create or maintain content.

Not-So-Pro Mode

For those people, we decided to create our own content-editing interface, strongly inspired by the likes of Ghost21. The most important thing was to be able to get an immediate preview. People who don’t spend every day creating content would be confused and put off if they couldn’t see the result of what they were typing — especially when the incorrect use of a custom plugin could result in a build error.

Constantly rendering the content at hand instilled confidence and supported the work of the content author. It was even fun to watch simple commands turn into advanced HTML.

The content preview editor allows for Markdown conversion and a lot of custom content blocks, giving authors a good feel for the result.
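The core of such a preview loop is tiny. A browser-side sketch, assuming the marked Markdown parser is bundled in and leaving our custom block elements aside:

const marked = require('marked');

const editor = document.querySelector('#editor');   /* a <textarea> */
const preview = document.querySelector('#preview'); /* a <div> */

/* Re-render the preview on every keystroke. */
editor.addEventListener('input', () => {
  preview.innerHTML = marked.parse(editor.value);
});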

When working on the content-editing interface, we started dropping Jekyll completely. Jekyll is meant for websites, not for single pages in a single-page application. Requiring a manually started rendering process, which was rather slow to begin with, negated the benefit of immediacy. Jekyll was slowly replaced with Metalsmith24, a static site generation framework that had one huge benefit: various templating engines to choose from. This enabled us to run the same templating language on both the static site generator and our content-editing application, saving us a lot of development time and keeping the output consistent.
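A minimal Metalsmith pipeline along those lines might look like the following sketch; the plugin selection is illustrative, with Handlebars as the engine shared between the generator and the editing application:

const Metalsmith = require('metalsmith');
const markdown = require('metalsmith-markdown');
const layouts = require('metalsmith-layouts');

Metalsmith(__dirname)
  .source('./src')            /* Markdown files with YAML front matter */
  .destination('./dist')
  .use(markdown())            /* Markdown to HTML */
  .use(layouts({ engine: 'handlebars' })) /* same engine as the editor */
  .build(err => {
    if (err) throw err;
  });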

The content-editing application was not meant to be a full-fledged CMS. It was a way to get easy access to a technology that’s difficult, and to provide others who spend more time working on the content with prepared input.

Again, we saw an increase in productivity. The content providers usually wrote in Word and then pasted the text into various other content-editing interfaces. This had side effects: copy-and-paste errors and additional styling that was hard to get rid of. This time, the content they prepared was perfectly consumable by our static site generator and required minimal attention afterwards.

API-Driven Content Management System

With the popularity of static site generators, we’ve also seen a shift in the CMS landscape. Instead of being all-powerful, all-in-one solutions, the new generation of CMSs focuses on actual management again, dropping the rendering process in favor of structured output via an API. With our goal to make content management as convenient as possible for our authors and editors, we inspected these as well.

Commercial products such as Contentful25 and open-source alternatives such as Cockpit26 provide all of the conveniences of a traditional CMS, without requiring you to spend much effort on the actual content-rendering part. Traditional old-school CMSs such as Drupal have jumped on the bandwagon with the Headless Drupal27 initiative. Even WordPress allows you to access content directly with the JSON API28 plugin.

API-driven CMSs such as Contentful have all of the conveniences of a traditional CMS, while keeping content management and content output completely separate.
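Consuming such an API at build time stays simple. A sketch using Contentful’s JavaScript client; the space ID, access token and content type are placeholders:

const contentful = require('contentful');

const client = contentful.createClient({
  space: 'YOUR_SPACE_ID',          /* placeholder credentials */
  accessToken: 'YOUR_ACCESS_TOKEN'
});

/* Fetch structured entries and hand them to the static site build. */
client.getEntries({ content_type: 'page' })
  .then(response => response.items.map(item => item.fields))
  .then(pages => {
    /* feed the pages into the generator of your choice */
  });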

If you think of the static site generation pipeline as a combination of loosely coupled technologies, API-driven CMSs still come with a certain kind of vendor or technology lock-in. The editing interface isn’t tightly coupled to the rendering process anymore; however, how and where your data is stored is still in the hands of the CMS.

One of our requirements was to preserve the content itself in a human-readable and easily accessible form. Storing content in structured Markdown files on GitHub was key to moving forward quickly, making quick updates and choosing combinations of technologies as we saw fit. We didn’t want to go through another content migration process.
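Because the content is plain Markdown with YAML front matter, any tool can consume it. A sketch using the gray-matter parser on the example file from the beginning of this article:

const fs = require('fs');
const matter = require('gray-matter');

/* Split a content file into front matter and Markdown body. */
const file = matter(fs.readFileSync('getting-started.md', 'utf8'));

console.log(file.data.title);     /* "Getting started with our product" */
console.log(file.data.permalink); /* "/getting-started/" */
console.log(file.content);        /* the Markdown body with Liquid blocks */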

Resources

In our journey from a simple startup website to the point where we were powering all of our company’s content with static site generators, we went through a lot of resources.

Phil Hawksworth has been talking about static site generators and their implications for developers and designers for the last two years. His constantly updated talk31 is the definitive resource if you want to start with static site generation.

StackEdit32 is a wonderful open-source browser-based Markdown editor. In addition to the lovely interface, it also features a lot of integration with data-storage platforms such as GitHub and Dropbox.

A ton of static site generators are out there. StaticGen33 gives a good overview and might help you find the right tool for the job.

Conclusion

Static site generators are wonderful, even when they have to deal with work for which they weren’t initially created. To avoid trading one big monolith for another, keep your tasks for the static site generator clear and manageable. Splitting up tasks and keeping parts of the build process interchangeable will help you if one part doesn’t serve its purpose anymore.

In addition to the technical challenges, the human factor is also critical. People have to see the benefits of working in a rather unfamiliar environment. Take care and provide aid where necessary for them to get going and become more productive than ever.

(vf, il, al)

Footnotes

  1. http://jekyllrb.com
  2. https://help.shopify.com/themes/liquid
  3. http://provide.smashingmagazine.com/new_no-cms-intro.svg
  4. http://provide.smashingmagazine.com/new_no-cms-intro.svg
  5. http://provide.smashingmagazine.com/new_conveyor_belt.svg
  6. http://provide.smashingmagazine.com/new_conveyor_belt.svg
  7. https://github.com/sindresorhus/gulp-imagemin
  8. http://gulpjs.com
  9. https://www.npmjs.com/package/gm
  10. https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise
  11. https://jekyllrb.com/docs/assets/
  12. https://jekyll.github.io/jekyll-assets/
  13. https://github.com/sindresorhus/gulp-rev
  14. http://provide.smashingmagazine.com/visu_1.svg
  15. http://provide.smashingmagazine.com/visu_1.svg
  16. http://provide.smashingmagazine.com/visu_2.svg
  17. http://provide.smashingmagazine.com/visu_2.svg
  18. http://lunrjs.com/
  19. https://www.smashingmagazine.com/wp-content/uploads/2016/08/new_github-screenshot-large-opt.png
  20. https://www.smashingmagazine.com/wp-content/uploads/2016/08/new_github-screenshot-large-opt.png
  21. https://ghost.org/
  22. https://www.smashingmagazine.com/wp-content/uploads/2016/07/ook-large-opt.jpg
  23. https://www.smashingmagazine.com/wp-content/uploads/2016/07/ook-large-opt.jpg
  24. http://metalsmith.io
  25. https://contentful.com
  26. http://getcockpit.com
  27. https://groups.drupal.org/headless-drupal
  28. https://wordpress.org/plugins/json-api
  29. https://www.smashingmagazine.com/wp-content/uploads/2016/07/contentful-large-opt.jpg
  30. https://www.smashingmagazine.com/wp-content/uploads/2016/07/contentful-large-opt.jpg
  31. https://www.youtube.com/watch?v=_cuZcnJIjls
  32. https://stackedit.io/
  33. https://www.staticgen.com/


Stefan Baumgartner is a web architect and performance advocate based in Linz, Austria. He made the Kessel Run in less than twelve parsecs, organises several meetups and conferences, loves Italian food and enjoys Belgian beer.

Comments

  1.

    A great article. Thanks for sharing your knowledge and experience.

  2.

    I’m partial to Hugo : https://gohugo.io which has no dependencies, comes in precompiled binaries for all major operating systems and is blazingly fast.

    •

      I absolutely love Hugo! It’s also one of the few Static Site generators that work with almost no hassle on author machines! When we moved from Jekyll to Metalsmith, the most important criteria for us was to reuse most of the same technology for content editing interfaces, preview modes and the static site generation. This came to be with the Handlebars templating language running on all three apps.

  3.

    Hi Stefan,
    Do you intend to open-source your editor? I’d love to see it, particularly the preview function, and I’ll post it on our tools section at the New Dynamic: http://www.thenewdynamic.org/tools/content-management/

    •

      Short answer: yes. Slightly longer answer: The content editor is right now tailored for exactly the type of website that we have running. We’re currently thinking about splitting several parts to make them more reusable. Also to make e.g. WordPress and other platforms editable with the same editor. Once we know how (and when!), we’ll definitely do that in the open.

  4.

    There must have been a point at which you asked, “Why not just use a CMS?”. What was the driving force for keeping the site static?

    •

      Oh yes, we’ve been at that point more than once and, to be honest, we’re still asking ourselves from time to time ;-) There are several reasons, the most important one being that we truly want to avoid technology lock-in. Right now, CMSs take care of everything: from content editing to data storage and the render process. If you place your bet on one CMS, you have to live with each of those parts being tightly coupled and also the only choice you have. You then start not only talking about creating a website, but about creating a Typo3 website, or a Drupal website or a WordPress blog. You need the expertise, and if you want to expand your team, they had better have expertise in the selected technology as well. So before we made our choice, we took a very good look at the content itself and how we want to maintain and preserve content for future versions of the web entities. This led to having all 2000 pages now available in machine- and human-readable Markdown. Extra properties are done with YAML, and you see the occasional structure element. What we can do now with the content is completely up to us. E.g. we once had a documentation entity that served this content in an Ember application. Once Ember proved to be the wrong tool for this project, we were able to switch to a Metalsmith site in a day! Content stayed the same, editing processes stayed the same. The output was completely different. We’re doing similar stuff with other web entities as well: Our WordPress-driven blogs use WordPress only for the user interface, but the data storage and the render process are a little bit different and talk mostly to WordPress’s APIs. Like WordPress does with its Calypso UI, but on the other end of the process.

      So we have data storage and render process like we want it. Now it’s the time to rethink the actual content management. We see that lots of CMSs start reducing themselves to be “just” the editing interface and the data storage. Once we find a way to feed the CMS data storage with what we have in files (and vice versa), we might have found the editing interface for us.

      I just see that this is a topic on its own :D

      Why stay static? Security fortress. Performance boost :-)

    •

      Phil Hawksworth

      August 3, 2016 6:54 pm

      Mike,

      I hear that!

      Sometimes though, a CMS brings a lot of baggage with it and can add complexity and risk to your architecture.

      Personally I think that there are ways to have a CMS and use a static site generator. Stefan mentioned Contentful who host a headless CMS as a service. Having them provide and host a rich content authoring experience, and then consuming that data in the build of your SSG feels like having your cake and eating it!

      I experimented with a proof of concept which used that approach for both server-side and client-side rendering and it worked out nicely:

      https://comedyinthecrown.com

      I blogged about it a little here:

      http://hawksworx.com/blog/isomorphic-rendering-on-the-jam-stack/

      •

        Steve Schofield

        August 4, 2016 4:12 pm

        Great initial article by the author and very interesting proof of concept article.

        At [Beach](www.beach.io), we’ve built a number of sites using a similar workflow with our own tools, Continuous Deployments and hosting via [Forge](www.getforge.com), [Hammer](www.hammerformac.com) and the Hammer Cloud service within Forge for static site generation and we’re experimenting with the API-first CMS solutions.

        We’ve so far focussed on [Contentful](www.contentful.com) integration in Hammer and also added support for the open source [Cockpit](www.getcockpit.com) which can be self-hosted and it’s easy to use.

        So far, I’m really happy using the Slim templating language in Hammer – the markup is succinct, clean and you have access to ruby.

        I’ve written about how we’re approaching this:

        http://guild.beach.io/t/static-cms-concept/103/1

        The thing I’d say sums up this whole movement, is that I really enjoy building websites this way – it’s put the whole FUN factor back into the process for me, allowing me to concentrate on what I like doing (design, code, content) and less or none of the stuff I don’t like (sys ops, security patches, server maintenance)

        Now that Forge also supports .htaccess like capabilities – URL rewrites, redirects etc. it makes migration much more viable.

        The more we speak about this topic with tangible examples, the better.

  5.

    Great article. Thanks for sharing. Just one question: why not use something like imgix to process images? It would be doing for your images something similar to what Contentful does for your content.

    •

      Yeah, I use imgix with Jekyll, too. It’s amazing, because there’s no vendor lock-in, the CDN works well, and it’s cheap and easy to use.

    •

      Very good question. To be honest, we haven’t looked into image-processing services, because the task at hand was very straightforward: take the 1600×900 px product screen and create 1200, 1000, 800, 600, 400, 200 pixel versions of it. This led to the GraphicsMagick build process. This example was actually something to showcase how important incremental builds are for big calculations. I will however take a good, good look into imgix, because it just looks amazing! :-)

    •

      Pardon my ignorance, but I find it really astonishing that there is no solution or technology on the web that relieves us of this content duplication (and forces us to duplicate images in various resized resolutions). This is valid for pictures and videos, by the way. It probably also has something to do with the fact that essentially no new image formats have been established since the beginning of the web era.

      It should be possible to have an algorithm that allows for retrieving content in multiple streams like a basic 360p version which could be enhanced with a 720p stream (only containing the missing information) or further enhanced with a 1080p stream and so on. Same goes for images as they account for the greatest part of traffic on our websites and can slow down page loads dramatically. It shouldn’t be necessary to store content in multiple ways for the same purpose.

      Solving this would also eliminate high-praised approaches for fast content serving like Facebook’s, which serves a “small” 200-byte image representation before actually loading a full version of the picture, meaning that for each picture served, 200 bytes of traffic are totally wasted. You can do the math on what kind of traffic this induces for all pictures served through Facebook alone…

  6.

    Paulo Griiettner

    August 2, 2016 5:34 pm

    As much as I love everything about this concept, unfortunately it would not be possible to implement in our project. We have a lot of different data being consumed by the website, and it is so dynamic, changing every minute… So I cannot apply such a thing… maybe on a different, smaller project…

  7.

    What a great and thorough article! Thanks Stefan!

    Did you ever consider Middleman during the project? It comes with a decent built-in asset pipeline, but one of its latest features, the external pipeline, lets you use Gulp, Grunt, Webpack or whatever you want. So there’s no technology lock-in when it comes to handling assets. It also offers different templating languages to play with. I was thinking it would be a good fit as I was reading the article.

    •

      o/ Great that you liked it! We gave Middleman some tries and love some of the concepts, and even though we didn’t know of the external pipeline (which sounds great) we actually felt that it was much, much more open to our extensions than Jekyll ever was.
      We decided against it however to keep the codebase between the content editor and the site generation process uniform. So it was more a decision for a JavaScript-only environment where we had some expertise.

  8.

    Salem Ghoweri

    August 2, 2016 10:12 pm

    Awesome article!

    Totally nailed it on the (potentially) painfully slow processing time with having bulk image resizing and optimization as part of the front-end build process (yay Gulp!).

    Any chance any of the build tasks are up on a public repo somewhere?

    Would really love to check out how that GraphicsMagick code ties in with the rest of the build process!

  9.

    Mathias Christensen

    August 3, 2016 12:31 am

    Awesome article! Always great to get insights from some of the large real-world projects built with this new stack.

    The gradual migration by proxying requests through to a legacy dynamic origin is something we’ve seen a lot at Netlify. We handle that at the CDN level, so you don’t have to give up on CDN hosting and essentially send all traffic to a central nginx box to achieve that.

    Also really interesting to see the divide-and-conquer approach of splitting the build into different pieces. I think that will always be useful, but for something that’s around ~2000 pieces of content, you could also consider replacing Jekyll with Hugo for just building out the HTML pages. Hugo can typically crunch through 2k pages in seconds…

    •

      Awesome. Great that you liked it :-) It’s good to get feedback on a project from somebody who spends a great deal of the day working with/for static site generators ;-) Huge Netlify fan!

      I really have to give Hugo another ..go. We chose Metalsmith in the end mostly because of its flexibility, the plugin architecture and the ability to use Handlebars (which also fuels our Node apps and our client-side apps), even though Metalsmith lacks the ability to do incremental builds. Can’t beat the power of natively compiled, though.

      I also like that you can easily distribute Hugo across platforms. That was a big deal with the Node+Ruby setup we had earlier, especially for the content editors and especially on Windows. We even had the idea to a) get rid of Ruby and b) bundle everything Node.js-related in an Electron app with a magic “build me” button :D It never became more than a prototype, though.

  10.

    Radu Serbanescu

    August 3, 2016 12:06 pm

    Thank you for the article. I am really interested in the subject of static site generators :)

  11.

    Mike Neumegen

    August 4, 2016 3:00 am

    Great article Stefan. The way you separate the concerns of content/asset management makes a lot of sense.

    > Not being able to color text red would be a relief for developers and would make the website more consistent.

    Yes, exactly! If you give authors the kitchen sink of editing tools, the branding consistency will soon get out of whack. We share the same sentiment at CloudCannon. Our approach is to try to provide smart ways of structuring data so that the author doesn’t need to style it.

    I’m curious, did your authors have any difficulty picking up the branch/pull request workflow?

    •

      Hi Mike! Great that you like it \o/

      > I’m curious, did your authors have any difficulty picking up the branch/pull request workflow?

      Actually, of all the new steps we introduced, the branch/PR workflow was the one that everyone loved from day 1. I think the idea of staging content and submitting changes for review was something that really fit well with the way our content editors already worked. The names were a little hard to grasp at first, but once everybody knew that pushing green buttons brings you to a review (with a locked master/dev branch, of course), everything was easy :-)

      > Our approach is to try and provide smart ways of structuring data so the author doesn’t need to style it

      Love it! Content Management System, not Content Design Software :D

  12.

    Hey Stefan,

    Great article, thanks for sharing.

    Interesting that in your Electron app wrapper, you’ve basically written a CMS :)

    We did the same for similar reasons – I think you’ll like Lackey CMS. I’ve used Jekyll and static site generation is on the Lackey roadmap but I mean more in the way we’ve approached the project architecture so it can support any multidimensional “Variant” and that it’s favouring Node over Ruby. Would be good to get your thoughts, links are:

    https://lackey.io (EYOD site example, source is FOSS)
    https://github.com/getlackey/lackey-cms (CMS core implemented as NPM module)
    https://github.com/getlackey/lackey-cms-site (example instance)

    Cheers,

    Rob

    •

      I think a big red “build me” button hardly qualifies as a CMS, but thank you :D

      Lackey sounds *really* good… will definitely give it a try over the weekend. Judging from a peek at the source, this really seems to be going in the direction we went when extending our preview editor, right down to selecting an animal as logo :O

      Will get in touch once I tried it!

  13.

    Stefano Verna

    August 5, 2016 2:25 pm

    Contentful is great, but it’s quite a generalist service. If what you need is a hosted CMS to let your clients/editors manage static website content, I suggest giving DatoCMS (https://www.datocms.com) a try. It’s been designed from the ground up to manage this scenario, and provides great plugins to integrate it with the most popular static site generators (Middleman, Jekyll, Hugo, etc.)

  14.

    Thanks for a great article!

    Have you considered (hypothetically) using only Gulp to build your static files (without Jekyll)? It looks more than capable of providing the needed functionality, and the wide variety of plugins certainly helps. I can see that with projects of your scope Jekyll can significantly reduce development time, but were there any concerns with using Gulp as an ultimate multitool aside from that?

    •

      Yes, we have… and we have one or the other build out there that uses exclusively Gulp to handle static sites. Gulp gives you a lot of flexibility, and you can choose practically any template language, but you also have to take care of a lot of tasks that static site generators usually do for you, e.g. having global site data and correct permalink handling. This is all possible, but with extra attention. This is why we love Metalsmith so much. It has all the flexibility of Gulp, integrates with Gulp, but has a few extra tweaks specifically designed for static site generation.

  15.

    Stuart McCoy

    August 9, 2016 7:38 pm

    Why not avoid the build time and use something like Statamic, Kirby, or Grav? Sure, there’s a small hit while the templates are interpreted on the server but this is minimal and the savings from not having to do anything other than press save on your entry has to count for something.

    •

      This is a very good question. One reason not to use a flat-file CMS is that, while you get all the benefits of having your contents as files and versionable, you still need a CMS to display the actual contents. And we weren’t sure whether any of those could handle the amount of traffic that we have on our pages (which is quite considerable… we not only have roughly 2000 pages, but also the traffic to go with them). So the output was kind of a given here.

  16.

    Skylar Challand

    August 17, 2016 4:21 pm

    Great article! Having built countless websites for clients over the years, we came across many of the same problems and ultimately built a similar solution called Siteleaf. It’s a client-facing CMS built on Jekyll, with cloud-preview and JSON API for content. Might be useful for readers who don’t want to reinvent the wheel :)

  17.

    I highly recommend having a look at Lektor, which features, among other things, a slick UI for content editing and incremental builds.

