Sed: Normalize markdown file with Regex

Ajay Dhangar
Founder of CodeHarborHub

I have been using a web clipper to save articles and blog posts for a while now. It's a great tool for saving content from the web and organizing it in a clean, readable format. However, the markdown files it generates are not always consistent, and I often find myself editing them by hand to make them more readable.

One of the common issues I encounter is inconsistent formatting of the front matter in the markdown files. The front matter is a block of metadata at the beginning of a markdown file that contains information such as the title, author, tags, date, and description of the content. Here's an example of what the front matter looks like:

---
title: "Sed: Normalize markdown file with Regex"
author: [ajay-dhangar]
tags: [sed, regex, web clipper]
date: 2020-11-26 21:13:28
description: How to normalize markdown file with Regex
draft: false
---

As you can see, the front matter is enclosed in three dashes (---) at the beginning and end of the block, and each key is separated from its value by a colon followed by a space. Most values are written bare; a value that itself contains special characters, such as the colon in the title, is enclosed in double quotes (") so that it is parsed as a single string.
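To check what front matter a clipped file actually contains, the delimiters described above are enough to pull the block out with sed. This is a minimal sketch, not part of the original workflow: the helper name print_front_matter and the sample file are made up for illustration, and it assumes the block starts on the very first line of the file.

```shell
# print_front_matter is a hypothetical helper: it prints the leading
# front matter block (delimiters included) and nothing else.
print_front_matter() {
  # Line 1: quit silently unless it is "---"; otherwise print it and move on.
  # Later lines: print each one, and quit right after the closing "---".
  sed -n -e '1{/^---$/!q;p;d;}' -e 'p' -e '/^---$/q' "$1"
}

# Hypothetical sample file, for illustration only.
demo="$(mktemp)"
printf '%s\n' '---' 'title: "Example"' 'draft: false' '---' '' 'Body text.' > "$demo"

print_front_matter "$demo"
```

For a file that does not begin with ---, the helper prints nothing, which makes it easy to spot files that have no front matter at all.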

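To make the front matter consistent across all my markdown files, I decided to use the sed command-line utility with a simple regular expression. Because sed normally reads its input one line at a time, a pattern containing \n can never match across lines. The command therefore uses the -z option, a GNU sed extension, to load the whole file as a single record (this assumes the file contains no NUL bytes). Here's the command I came up with:

sed -i -E -z 's/^---\n([^\n]*: [^\n]*\n)+---\n//' file.md

Let's break down the command:

  • -i edits the file in place, -E enables extended regular expressions, and -z makes sed split its input on NUL bytes instead of newlines, so an ordinary text file is processed as one record.
  • ^---\n matches the opening three dashes at the beginning of the file, followed by a newline character.
  • ([^\n]*: [^\n]*\n)+ matches one or more lines containing a key-value pair, where the key is followed by a colon and a space, and the value is followed by a newline character. Using [^\n]* instead of .* keeps each repetition on a single line, because . also matches a newline once the whole file is in the pattern space.
  • ---\n matches the closing three dashes at the end of the block, followed by a newline character.
  • The empty replacement (//) deletes the matched block. No /g flag is needed, because the pattern is anchored with ^ and can match only once.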
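The breakdown above can be tried out safely on a throwaway file. The following is a minimal sketch, assuming GNU sed: its -z option loads the whole file as one record, which is what allows \n in the pattern to match at all, since sed otherwise works one line at a time. The sample content and the helper name strip_front_matter are made up for illustration, and the helper prints to stdout instead of using -i, so nothing is modified.

```shell
# Create a throwaway sample file (hypothetical content, for illustration only).
sample="$(mktemp)"
printf '%s\n' \
  '---' \
  'title: "Sed: Normalize markdown file with Regex"' \
  'tags: [sed, regex, web clipper]' \
  'draft: false' \
  '---' \
  '' \
  'Body text starts here.' \
  '' \
  '---' \
  'The horizontal rule above must survive.' > "$sample"

# strip_front_matter is a hypothetical helper name. It writes the result
# to stdout and leaves the input file untouched.
strip_front_matter() {
  # -z: treat the whole file as one record (GNU extension).
  # [^\n]* keeps each key-value match on a single line.
  sed -E -z 's/^---\n([^\n]*: [^\n]*\n)+---\n//' "$1"
}

strip_front_matter "$sample"
```

Only the leading block is removed. The later --- line stays in place because the pattern is anchored to the start of the file and every line inside the block has to look like a key, a colon, a space, and a value.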

When I run this command on a markdown file, it removes the existing front matter and leaves me with just the content of the file. This is exactly what I want, as I can then manually add a consistent front matter to the file.
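If adding the new front matter by hand becomes tedious, the two steps can be combined. This is only a sketch under stated assumptions, not the workflow described above: it assumes GNU sed (for the -z option), the helper name replace_front_matter is hypothetical, and the title argument and the fixed draft field are placeholders for whichever fields you standardize on.

```shell
# replace_front_matter is a hypothetical helper: it swaps any leading
# front matter block for a small, uniform one.
replace_front_matter() {
  file="$1"
  title="$2"
  tmp="$(mktemp)" || return 1
  {
    # Write the new, consistent front matter first (placeholder fields).
    printf '%s\n' '---' "title: \"$title\"" 'draft: false' '---'
    # Then append the file with its old front matter stripped (GNU sed, -z).
    sed -E -z 's/^---\n([^\n]*: [^\n]*\n)+---\n//' "$file"
  } > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy the result back so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage on a throwaway file (hypothetical content).
note="$(mktemp)"
printf '%s\n' '---' 'title: old' 'author: someone' '---' 'Body text.' > "$note"
replace_front_matter "$note" "New title"
cat "$note"
```

The helper writes to a temporary file first because redirecting sed's output straight back into its own input would truncate the file before it is read.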

I hope this example gives you an idea of how powerful regular expressions can be when used with command-line utilities like sed. With a little bit of practice, you can write regular expressions to perform complex text manipulations with ease. If you're interested in learning more about regular expressions, I highly recommend checking out the RegexOne interactive tutorial, which is a great resource for beginners.