Building a blog with make, pandoc and Python

by Martin Fischer published on 2020-10-06

I have wanted to create a personal website for a long time, and I have finally settled on a domain name and a toolchain that I am happy with.

There are many static-site generators out there but the ones I tried were too complex for my taste. The features that I wanted were just:

convert Markdown to HTML
wrap the resulting HTML in a template with some variables from a YAML frontmatter
autogenerate a list of all posts on the start page
autogenerate an Atom feed with my latest posts so that people can subscribe

Two powerful hammers

The answer to the question “How do I convert A to B?” is pandoc. Pandoc also supports YAML frontmatter and templates, so we have got the first two points covered.

To recursively search all subdirectories for .md files and pass each file path to pandoc, you could use the find command. A better way is to use good old make. A declarative Makefile is more readable than a shell script because it makes the dependencies explicit, and it gives us incremental builds for free. (You can even build concurrently with -j.) The Makefile I came up with looks somewhat like:

MARKDOWNFILES:=$(shell find . -type f -name '*.md')
HTMLTARGETS:=$(MARKDOWNFILES:%.md=build/%/index.html)
PANDOCFLAGS = -f markdown -t html5 -s -c /static/style.css

all: $(HTMLTARGETS) build/index.html build/static

build/static: static
    cp -r static/ build/

build/%/index.html: %.md begin.html header.html
    @mkdir -p $(dir $@)
    pandoc $(PANDOCFLAGS) -B begin.html -H header.html $< -o $@

clean:
    rm -Rf build
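
To make the pattern rule and the incremental builds concrete, here is a small Python sketch of what make does for each Markdown file: it maps posts/hello.md to build/posts/hello/index.html and rebuilds only when the target is missing or older than one of its prerequisites. The names html_target and is_stale are made up for illustration; make does all of this natively.

```python
from pathlib import Path
from typing import Iterable


def html_target(md_path: Path, build_dir: Path = Path("build")) -> Path:
    """Mirror the substitution %.md -> build/%/index.html."""
    return build_dir / md_path.with_suffix("") / "index.html"


def is_stale(target: Path, prerequisites: Iterable[Path]) -> bool:
    """Return True if the target is missing or older than any prerequisite."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)


if __name__ == "__main__":
    # List every Markdown file below the current directory with its target.
    for md in sorted(Path(".").rglob("*.md")):
        target = html_target(md)
        state = "rebuild" if is_stale(target, [md]) else "up to date"
        print(f"{md} -> {target} ({state})")
```

This timestamp comparison is the whole trick behind incremental builds: touching one post only regenerates that post's HTML.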

A pinch of Python

To autogenerate the HTML index and the Atom feed, I had an idea: The HTML files generated by Pandoc should already contain all the necessary metadata, so I can just scrape my local HTML files to generate the indexes. To make the post content extractable, I had to customize the Pandoc HTML template:

Dump the default template with pandoc -D html5 > template.html.
Edit template.html to wrap $body$ in a <div id="content">. (I also removed the xmlns from the body to make parsing the HTML from Python easier.)
Add --template template.html to the PANDOCFLAGS.
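
With the <div id="content"> wrapper in place, pulling the title and the post body out of a generated page takes only the standard library. The following is a minimal sketch, not the actual mkfeed.py: it assumes html.parser as the parsing approach, and ContentExtractor and extract are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect the <title> text and the inner HTML of <div id="content">."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True: text arrives unescaped
        self.title = ""
        self.parts = []       # pieces of the content div's inner HTML
        self._in_title = False
        self._depth = 0       # div nesting depth; > 0 while inside the content div

    def _open(self, tag, attrs, self_closing):
        if self._depth:
            # Inside the content div: copy the start tag verbatim.
            self.parts.append(self.get_starttag_text())
            if tag == "div" and not self_closing:
                self._depth += 1
        elif tag == "div" and dict(attrs).get("id") == "content":
            self._depth = 1
        elif tag == "title":
            self._in_title = True

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        # Called for tags written like <br />; no separate end tag follows.
        self._open(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        if self._depth:
            if tag == "div":
                self._depth -= 1
                if self._depth == 0:
                    return  # closing tag of the content div itself
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._depth:
            # Re-escape, because the parser already decoded entities.
            self.parts.append(escape(data, quote=False))


def extract(html_text):
    """Return (title, inner HTML of the content div) for one page."""
    parser = ContentExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), "".join(parser.parts)
```

The sketch ignores HTML comments and does not treat script or style text specially, which is fine for typical Markdown output but would need care for pages with inline scripts.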

The 70-line Python script I came up with, mkfeed.py, takes a directory path, reads the contained .html files and generates an index.html and an atom.xml in the same directory.
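
To give an idea of the feed half of such a script, here is a hedged sketch of building an Atom document with xml.etree.ElementTree. It is not the real mkfeed.py: the Post class, the build_atom function and the author_name parameter are assumptions for illustration, and the posts are assumed to have been scraped from the HTML files already, with timestamps in one consistent RFC 3339 format.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


@dataclass
class Post:
    title: str
    url: str           # absolute URL of the post
    updated: str       # RFC 3339 timestamp, e.g. "2020-10-06T00:00:00Z"
    content_html: str  # inner HTML of the content div


def build_atom(feed_title, feed_url, author_name, posts):
    """Serialize the posts, newest first, as an Atom feed string."""
    if not posts:
        raise ValueError("need at least one post to date the feed")
    # Emit Atom as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")

    def add(parent, tag, text=None, **attrs):
        el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
        el.text = text
        return el

    # Same-format UTC timestamps sort correctly as plain strings.
    newest = sorted(posts, key=lambda p: p.updated, reverse=True)
    add(feed, "title", feed_title)
    add(feed, "id", feed_url)
    add(feed, "link", href=feed_url)
    add(feed, "updated", newest[0].updated)
    add(add(feed, "author"), "name", author_name)
    for post in newest:
        entry = add(feed, "entry")
        add(entry, "title", post.title)
        add(entry, "id", post.url)
        add(entry, "link", href=post.url)
        add(entry, "updated", post.updated)
        # type="html" means the text is escaped HTML; ElementTree escapes it.
        add(entry, "content", post.content_html, type="html")
    return ET.tostring(feed, encoding="unicode")
```

Writing the returned string to atom.xml next to the generated index.html would complete the picture; the HTML index can be produced from the same list of posts.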

To use the script I have the following two Makefile targets:

build/posts/index.html: build/posts
    TITLE=push-f.com URL=https://push-f.com/ mkfeed.py build/posts

build/index.html: index.md build/posts/index.html
    pandoc $(PANDOCFLAGS) -A build/posts/index.html $< -o $@

The nice thing about the script parsing HTML files instead of the source files is that it doesn’t have to bother with YAML (which requires a dependency to be parsed reliably) and doesn’t have to invoke pandoc (which is the job of make) to get the HTML for the Atom feed. Furthermore, it nicely decouples the index generation from the source format (I could theoretically ditch Pandoc for something else as long as the other tool still provides the same anchor points in the HTML).

Pushing it online

To deploy my website I just push my git repository, which then triggers the following post-receive git hook:

WEB_DIR="$HOME/public_html"
git --work-tree="$WEB_DIR" checkout --force
cd "$WEB_DIR"
export PATH="$HOME/repos/tools:$PATH"
make clean all

For drafts I simply use a Basic-Auth protected directory.

Conclusion

I am happy with this toolchain because I completely understand it and don’t have to bother with complex static-site generators.

While mkfeed.py currently does not support any kind of tags or categorization, it could be easily extended to understand <meta name="keywords" content="web, Python">.
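
As a rough idea of what that extension could look like, the sketch below reads the keywords meta tag from each page and groups pages by keyword. It is an assumption-laden illustration rather than part of mkfeed.py: KeywordParser, keywords_of and group_by_keyword are hypothetical names, and keywords are assumed to be comma-separated and compared case-insensitively.

```python
from collections import defaultdict
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collect the entries of <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <meta ... /> via the default
        # handle_startendtag implementation.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]


def keywords_of(html_text):
    """Return the list of keywords declared in one HTML page."""
    parser = KeywordParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords


def group_by_keyword(pages):
    """Map each lowercased keyword to the sorted paths of pages using it.

    `pages` maps a page path to that page's HTML text.
    """
    index = defaultdict(list)
    for path, html_text in pages.items():
        for keyword in keywords_of(html_text):
            index[keyword.lower()].append(path)
    return {keyword: sorted(paths) for keyword, paths in index.items()}
```

Each key of the resulting dictionary could then become its own tag page or its own per-tag feed.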

If you enjoyed this read, feel free to subscribe to my Atom feed :)