Kyle Banks

Deploying a Jekyll Blog to Amazon S3 Without .html Extensions

Written by @kylewbanks on Oct 15, 2016.

Hosting a Jekyll site on Amazon S3 is an efficient, performant, and super cheap way to serve a mostly-static website. One thing that bothered me, though, is that Jekyll outputs posts with a .html extension by default, so requests to S3 have to include the extension as well.

For example:

$ curl -I http://example.com/blog/my-post.html
HTTP/1.1 200 OK

$ curl -I http://example.com/blog/my-post
HTTP/1.1 404 Not Found

Since I personally dislike having unnecessary extras in my URLs, I always avoid file extensions like .html on my sites. This poses a problem with S3, because the built-in request rewrite tools are cumbersome and not overly flexible.

So, the simplest way to resolve this issue is to remove the file extension, right? Well, I couldn’t find any documentation on customizing (or removing) the file extension of Jekyll posts. It seems that .html is the only supported output for posts, so something custom is required.

After browsing around for plugins that might provide this, I finally decided to do it the old-fashioned way and add a step to my deploy script that renames each post file to remove the suffix.

Here’s a look at how that works:

# Remove the .html extension from all blog posts for sexy URLs
for filename in "$DEPLOY_DIR"/blog/*.html; do
    # Skip the literal pattern when nothing matched, and leave the index alone
    if [ -e "$filename" ] && [ "$filename" != "$DEPLOY_DIR/blog/index.html" ];
    then
        original="$filename"

        # Get the filename without the path/extension
        filename=$(basename "$filename")
        filename="${filename%.*}"

        # Move it (quoted so paths with spaces survive)
        mv "$original" "$DEPLOY_DIR/blog/$filename"
    fi
done

Just a simple loop over the .html files in the /blog directory, ignoring index.html, and stripping the extension.
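If you'd rather keep this step reusable, here is a minimal sketch of the same idea wrapped in a function. The name strip_html_extensions is hypothetical (it is not part of Jekyll or the AWS CLI), and the sketch assumes the posts sit directly inside the directory you pass in, as they do in /blog above.

```shell
#!/bin/bash
# Hypothetical helper (not from Jekyll or the AWS CLI): strip the .html
# extension from every file directly inside a directory, except index.html.
strip_html_extensions() {
    local dir="$1"
    local path
    for path in "$dir"/*.html; do
        # When the glob matches nothing, the shell leaves the pattern as-is
        [ -e "$path" ] || continue
        # Keep index.html so the directory index still resolves
        [ "$(basename "$path")" = "index.html" ] && continue
        # ${path%.html} is the same path with the trailing ".html" removed
        mv -- "$path" "${path%.html}"
    done
}
```

In the deploy script this would be called as strip_html_extensions "$DEPLOY_DIR/blog". Matching on the literal .html suffix rather than "anything after the last dot" is a deliberate choice here, so a post named something like v1.2-notes.html keeps the rest of its name.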

If you’re curious, here’s what the full deploy script looks like:

#!/bin/bash
# 
# Cleans and deploys the project to S3.
#
# Usage:
#   ./deploy.sh <ACCESS_KEY> <SECRET_KEY>

# Stop on the first failed command so a broken build is never deployed
set -e

# Initialize some vars
export AWS_ACCESS_KEY_ID="$1"
export AWS_SECRET_ACCESS_KEY="$2"
export AWS_DEFAULT_REGION="us-east-1"
export BUCKET="kylewbanks.com"

export DEPLOY_DIR=".deploy"

# Build jekyll
jekyll build

# Copy the site directory to a temporary location so that modifications we make
# don't get overwritten by a Jekyll server that is potentially running
mkdir -p "$DEPLOY_DIR"
cp -a _site/. "$DEPLOY_DIR"

# Remove the .html extension from all blog posts for sexy URLs
for filename in "$DEPLOY_DIR"/blog/*.html; do
    # Skip the literal pattern when nothing matched, and leave the index alone
    if [ -e "$filename" ] && [ "$filename" != "$DEPLOY_DIR/blog/index.html" ];
    then
        original="$filename"

        # Get the filename without the path/extension
        filename=$(basename "$filename")
        filename="${filename%.*}"

        # Move it (quoted so paths with spaces survive)
        mv "$original" "$DEPLOY_DIR/blog/$filename"
    fi
done

# Now upload to s3, deleting any items that no longer exist
aws s3 sync --delete "$DEPLOY_DIR" "s3://$BUCKET"

# Finally, upload the blog directory specifically to force the content-type
aws s3 cp "$DEPLOY_DIR/blog" "s3://$BUCKET/blog" --recursive --content-type "text/html"

# Cleanup
rm -r "$DEPLOY_DIR"

You’ll notice the second-to-last command re-uploads the /blog directory to explicitly set the content type. This is required because, without the .html file extension, the content-type guessing that happens at upload time (which, as far as I can tell, goes by the file name) becomes unreliable, and you may or may not get the proper content-type.
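To see which files this affects before uploading, one option is to list everything in the deploy directory that has no extension at all. This is only a sketch: list_extensionless_files is a hypothetical helper name, and it assumes that "no dot in the file name" is a good enough stand-in for "content type can't be guessed from the name".

```shell
#!/bin/bash
# Hypothetical helper: print every regular file under a directory whose name
# contains no dot, i.e. files a name-based content-type guess cannot classify.
list_extensionless_files() {
    find "$1" -type f ! -name '*.*' | sort
}
```

Running list_extensionless_files "$DEPLOY_DIR" after the rename step should print only the renamed posts under /blog. Once deployed, a curl -I request like the ones at the top of this post also shows the Content-Type header S3 returns for a given URL.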

Running this script from the root of your Jekyll project will upload the entire site to S3, with clean URLs for all blog posts.

Let me know if this post was helpful on Twitter @kylewbanks or down below!