
Sponsored Post S(GH)PA: The Single-Page App Hack For GitHub Pages

For some time now, I’ve wanted the ability to route paths for a GitHub Pages website to its index.html for handling as a single-page app (SPA). This is table-stakes because such apps require all requests to be routed to one HTML file, unless you want to copy the same file across all of your routes every time you make a change to the project. Currently, GitHub Pages doesn’t offer a route-handling solution; the Pages system is intended to be a flat, simple mechanism for serving basic project content.

In case you weren’t aware, GitHub does provide one morsel of customization for your project website: the ability to add a 404.html file and have it served as your custom error page. I took a first stab at an SPA hack simply by duplicating my index.html file and renaming the copy to 404.html. It turns out that many folks have experienced the same issue [1] with GitHub Pages and liked the general idea. However, some folks on Twitter correctly pointed out that the 404.html page is still served with a status code of 404, which is not good for search engine crawlers. The gauntlet had been thrown down, and I decided to answer, and answer with vigor!

One More Time, With Feeling

After sleeping on it, I thought to myself, “Self, we’re deep in dirty hack territory, so why don’t I make this hack even dirtier?!” To that end, I developed an even better hack that provides the same functionality and simplicity, while also preserving your website’s crawler juice — and you don’t even need to waste time duplicating your index.html file and renaming it to 404.html anymore! The following solution should work in all modern desktop and mobile browsers (Edge, Chrome, Firefox, Safari) and in Internet Explorer 10+.

Template and Demo: If you want to skip the explanation and get the goods, here’s a template repo [5], and a test URL to see it in action.

That’s So Meta

The first thing I did was investigate other options for getting the browser to redirect to the index.html page. That part was pretty straightforward. You basically have three options: a server config, a JavaScript location manipulation, or a refresh meta tag. The first one is obviously a no-go for GitHub Pages. And JavaScript is basically the same as a refresh, but arguably worse for crawler indexing. That leaves us with the meta tag. A meta tag with a refresh value of 0 appears to be treated as a 301 redirect [6] by search engines, which works out well for this use case.

You’ll need to start by adding a 404.html file containing an empty HTML document to your gh-pages repository. That document must total more than 512 bytes (explained below). Next, put the following markup in your 404.html page’s head element:

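<script>
  // Cache the URL the visitor tried to reach so index.html can restore it.
  sessionStorage.redirect = location.href;
</script>
<!-- Redirect immediately to the project root; replace REPO_NAME_HERE with your repository name. -->
<meta http-equiv="refresh" content="0;URL='/REPO_NAME_HERE'">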

This code stores the attempted entrance URL in a property of the standard sessionStorage object and immediately redirects to your project’s index.html page using a meta refresh tag. If you’re building a GitHub Organization site, don’t put a repo name in place of the placeholder text in the content attribute; just do this: content="0;URL='/'"
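To make the difference between the two cases concrete, here is a minimal sketch of how the content attribute value could be built. The helper name refreshContent and the repository name my-project are made up for illustration; nothing like this ships with GitHub Pages or the template repo.

```javascript
// Hypothetical helper: builds the value of the meta refresh "content" attribute.
// repoName is the project repository name; pass an empty string for an
// organization site that is served from the domain root.
function refreshContent(repoName) {
  var target = '/' + (repoName || '');
  // A delay of 0 asks the browser to redirect immediately.
  return "0;URL='" + target + "'";
}

console.log(refreshContent('my-project')); // 0;URL='/my-project'
console.log(refreshContent(''));           // 0;URL='/'
```

In practice you would most likely just type the value by hand; the function only spells out which part changes between a project site and an organization site.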

Customizing Route Handling

If you want more elaborate route handling, just include some additional JavaScript logic in the script tag shown above. You can tweak several things: the composition of the href that you pass to the index.html page; which pages should remain on the 404 page (via dynamic removal of the meta tag); and any other logic you want to put in place to dictate what content is shown based on the inbound route.
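As one example of tweaking the composition of the href, you could normalize the URL before caching it. The sketch below assumes a hypothetical helper called composeRedirect that strips trailing slashes from the path and leaves the query string and hash alone. Treat it as an illustration of the idea, not as part of the hack itself. In 404.html, this would sit inside the script tag in place of the plain assignment.

```javascript
// Hypothetical helper: normalizes the attempted URL before it is cached.
// It strips trailing slashes from the path and keeps the origin, query
// string and hash untouched.
function composeRedirect(href) {
  // Groups: 1 = scheme and host (optional), 2 = path, 3 = query and/or hash.
  var match = /^(\w+:\/\/[^\/?#]*)?([^?#]*)([?#].*)?$/.exec(href);
  var origin = match[1] || '';
  var path = match[2];
  var rest = match[3] || '';
  if (path.length > 1) {
    // Fall back to "/" if the path consisted of slashes only.
    path = path.replace(/\/+$/, '') || '/';
  }
  return origin + path + rest;
}

// Browser-only part, guarded so the helper can also be run elsewhere.
if (typeof location !== 'undefined' && typeof sessionStorage !== 'undefined') {
  sessionStorage.redirect = composeRedirect(location.href);
}
```

Plain string handling is used here instead of the URL constructor so that the sketch stays within what older browsers such as Internet Explorer 10 support.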

512 Magical Bytes

This is, hands down, one of the strangest quirks I have ever encountered in web development. You must ensure that the total size of your 404.html page is greater than 512 bytes, because if it isn’t, Internet Explorer will disregard it and show a generic browser 404 page instead. When I finally figured this out, I had to crack open a beer to cope with the amount of time it took.
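If you’d rather not count bytes by hand, a small build-time check can do it for you. The sketch below assumes Node.js (Buffer is a Node global), and the names isLargeEnough and padWithComment are made up for this example. It pads a too-small page with an HTML comment, which is just one way to get over the threshold.

```javascript
// Build-step sketch, assuming Node.js; not part of the page itself.
var MIN_BYTES = 512;

// True when the document is strictly larger than 512 bytes once UTF-8 encoded.
function isLargeEnough(html) {
  return Buffer.byteLength(html, 'utf8') > MIN_BYTES;
}

// Appends an HTML comment as filler when the document is too small.
function padWithComment(html) {
  if (isLargeEnough(html)) return html;
  var missing = MIN_BYTES + 1 - Buffer.byteLength(html, 'utf8');
  var open = '\n<!-- padding so the page exceeds 512 bytes: ';
  var close = ' -->\n';
  var fillerLength = Math.max(missing - open.length - close.length, 0);
  var filler = new Array(fillerLength + 1).join('x');
  return html + open + filler + close;
}

// Possible usage in a build script (file name assumed):
// var fs = require('fs');
// fs.writeFileSync('404.html', padWithComment(fs.readFileSync('404.html', 'utf8')));
```

The comparison is strict on purpose, since the page has to be greater than 512 bytes, not equal to it.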

Let’s Make History

To capture and restore the URL that the user initially navigated to, you’ll need to add the following script tag to the head of your index.html page before any other JavaScript acts on the page’s current state:

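<script>
  (function(){
    // Read the URL cached by 404.html, then clear it so it is used only once.
    var redirect = sessionStorage.redirect;
    delete sessionStorage.redirect;
    // Restore the original URL without triggering another page load.
    if (redirect && redirect !== location.href) {
      history.replaceState(null, null, redirect);
    }
  })();
</script>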

This bit of JavaScript retrieves the URL that we cached in sessionStorage over on the 404.html page and replaces the current history entry with it. How you choose to handle things from here is up to you, but I’d use popstate and hashchange if I were you.
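For a rough idea of what that handling could look like, here is a minimal popstate-based sketch. The route table, the resolveRoute function and the way the view is "rendered" are all assumptions made for this example; swap in whatever your app actually uses. Note that history.replaceState() does not fire a popstate event, so the sketch also resolves the route once on load.

```javascript
// Hypothetical route table: keys are paths relative to the project root.
var ROUTES = { '/': 'home', '/about': 'about', '/docs': 'docs' };

// Maps a pathname such as "/REPO_NAME_HERE/about" to a view name.
// basePath is '/REPO_NAME_HERE' for a project site, '' for an organization site.
function resolveRoute(pathname, basePath, routes) {
  var relative = pathname.indexOf(basePath) === 0
    ? pathname.slice(basePath.length)
    : pathname;
  if (relative.length > 1) relative = relative.replace(/\/+$/, '');
  if (relative === '') relative = '/';
  return Object.prototype.hasOwnProperty.call(routes, relative)
    ? routes[relative]
    : 'not-found';
}

// Browser-only part, guarded so resolveRoute can also be run elsewhere.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  var render = function () {
    var view = resolveRoute(location.pathname, '/REPO_NAME_HERE', ROUTES);
    document.title = view; // placeholder for real rendering logic
  };
  window.addEventListener('popstate', render); // back and forward navigation
  render(); // handle the URL restored by replaceState on first load
}
```

This script would need to run after the replaceState snippet above, so that location.pathname already reflects the restored URL.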

Well, folks, that’s it. Now go celebrate by writing some single-page apps on GitHub Pages!

This article is part of a web development series from Microsoft tech evangelists and engineers on practical JavaScript learning, open-source projects and interoperability best practices, including the Microsoft Edge [7] browser.

We encourage you to test across browsers and devices (including Microsoft Edge, the default browser for Windows 10) with free tools on dev.microsoftedge.com [8], including the F12 developer tools [9]: seven distinct, fully documented tools to help you debug, test and speed up your web pages. Also, visit the Edge blog [10] to stay informed by Microsoft developers and experts.

(al)

Footnotes

  1. https://twitter.com/csuwildcat/status/730558238458937344
  2. https://www.smashingmagazine.com/2015/07/development-to-deployment-workflow/
  3. https://www.smashingmagazine.com/2015/04/creating-web-app-in-foundation-for-apps/
  4. https://www.smashingmagazine.com/2014/08/build-blog-jekyll-github-pages/
  5. https://github.com/csuwildcat/sghpa
  6. http://sebastians-pamphlets.com/google-and-yahoo-treat-undelayed-meta-refresh-as-301-redirect/
  7. https://blogs.windows.com/msedgedev/2015/05/06/a-break-from-the-past-part-2-saying-goodbye-to-activex-vbscript-attachevent/?wt.mc_id=DX_873182
  8. https://dev.windows.com/en-us/?wt.mc_id=DX_873182
  9. https://developer.microsoft.com/en-us/microsoft-edge/platform/documentation/f12-devtools-guide/?wt.mc_id=DX_873182
  10. https://blogs.windows.com/msedgedev/?wt.mc_id=DX_873182


Daniel is a Program Manager at Microsoft working on developer-facing products, tools, and evangelism. He enjoys creating new products for developers and consumers that transform their daily experience. Before Microsoft, he founded a startup out of college, then spent 5 years helping to shape the Web at Mozilla.

As for code projects, he authored X-Tag (a Web Components sugar library supported by Microsoft), and continues to work on web standards through participation in various W3C groups. You can check out his GitHub profile for more.

  1. 1
    • 2

      What is this comment supposed to be communicating?

      0
    • 3

      I guess you think this post is something like RawGit? If so: no, it’s about creating dynamic single-page apps with route handling by hijacking the 404.html affordance GitHub provides.

      5
  2. 4

    Trevor Goodchild

    August 16, 2016 3:38 pm

    Thanks for publishing this clever tip! I’ve been looking for ways to make gh-pages a little less static.

    1
  3. 5

    Can serverless SPA routing be done without hashes (#) using this trick?

    0
    • 6

      Yep, that’s what this hack enables. You leverage the 404.html feature of GitHub Pages to do an inferred 301 redirect of the inbound dynamic route to the index.html page, then handle the route via popstate once it arrives.

      3
  4. 7

    I first heard about that 512+ bytes IE quirk from reading the source code of the HTML5 Boilerplate project. I like their solution. They just added a brief comment explaining the bug at the end of the HTML document. That fills up the extra space, explains why, and is unlikely to be accidentally deleted by future code editors.

    https://github.com/h5bp/html5-boilerplate/blob/master/src/404.html#L60

    2
  5. 8

    Thumbs up for creative thinking

    0
  6. 9

    That is incredible, thanks for the trick, I’ll definitely try it out.
    One thing I don’t get: why sessionStorage.redirect = xxx and not sessionStorage.setItem('redirect', xxx)? Is that a valid use of session storage at all? I did not know about that way.

    0
    • 10

      Yes, you can set top-level keys of the localStorage and sessionStorage objects with any value; it will just be type-converted to a string. That’s fine for this use, as we’re storing a string. If you want to save JSON data directly to a top-level storage key, you must do it like this: sessionStorage.foo = JSON.stringify({...}). If you don’t, it will store [object Object].

      0
  7. 11

    Can you explain how your solution differs from https://github.com/rafrex/spa-github-pages ? Seems like the major difference is your redirect instead of a window.location JS redirect, tho according to @rafrex, both are interpreted as 301s by modern crawlers…

    A quick SEO note – while it’s never good to have a 404 response, it appears based on Search Engine Land’s testing that Google’s crawler will treat the JavaScript window.location redirect in the 404.html file the same as a 301 redirect for its indexing. From my testing I can confirm that Google will index all pages without issue, the only caveat is that the redirect query is what Google indexes as the url. For example, the url example.tld/about will get indexed as example.tld/?p=/about.

    1
    • 12

      Sorry, forgot to escape my carets, should read “your <meta> redirect”.

      0
    • 13

      For SEO, the meta redirect is useless because the search engine saves the URL that it is redirected to, i.e. just the hostname, username.github.io, and not the path that is stored in session storage. Since the search engine is always redirected to the same url, the ONLY url that it will store for your entire site is username.github.io.

      2
      • 14

        This is incorrect: the meta redirect pushes to the index.html page and that page executes a replaceState() call, which crawlers use to attribute the content of the page to the newly replaced path and parameters.

        0
        • 15

          I tested it (created a dummy site and submitted it to google to index) and google didn’t attribute the content of the page to the `replaceState()` path. It attributed it to where it was redirected to with the 301.

          The other solution also uses a `replaceState()` call in `index.html`, which is probably why the redirect query is what google indexes.

          0
          • 16

            This comment is simply false – to reiterate: crawlers attribute history.replaceState() paths to the content present. Nothing about the meta tag use on the 404.html changes this.

            We use this method in production for the X-Tag library site, and Google has no issues crawling and surfacing our routes: https://www.google.com/?ion=1&espv=2#q=x-tag+docs&pws=0

            I will kindly ask you to stop posting FUD.

            0
    • 17

      Why meta vs window.location: I found conflicting information about how window.location redirects were treated, but clear evidence that meta tag redirection was broadly treated as a 301.

      0
  8. 18

    Started with your code initially, then following on a link in the comments, here’s a tiny node/bower package that’s easy to use and with more options,

    https://github.com/websemantics/gh-pages-spa

    Thanks for sharing,

    -1
