I had a project idea for a small Go package called notionmd that converts markdown into Notion blocks. Here’s a short story of my thoughts throughout the build.
The problem
Why am I doing this? Do I just love abstract syntax trees? Sorta, but that’s not the reason. Here’s the issue: I receive vulnerability findings and reports from security researchers in markdown format. This is great because markdown is wonderful, simple, and ubiquitous!
This is less great because I want to organize these reports in Notion automatically, but the Notion API expects blocks, not markdown.
💡 I want something that converts all this markdown into beautifully formatted Notion pages automatically.
What do I want the solution to feel like?
The ideal experience is that I pass whatever markdown I want into a single function and magically get a bunch of Notion blocks back.
I shouldn’t have to know what an abstract syntax tree is or even what each of the block types is. I just want a tidy workspace of nicely formatted security findings in Notion.
Neat, let’s build it
So with a rough idea of the experience I’m looking for, let’s get a little more specific on how we can get from A → B.
- Input Markdown content.
- Parse markdown into a tree of markdown nodes (I’ll explain this in a second).
- Split long documents into smaller chunks.
- Convert each node in the tree to its corresponding Notion block.
- Output an array of Notion blocks for use with the Notion API.
What on earth is an abstract syntax tree?
For the purposes of our problem statement, it’s a way to derive meaning from the syntax of our markdown documents. We use the derived meaning to match the corresponding markdown element to Notion’s blocks. Let me show you just how the two sources depict the same thing.
Heading - Markdown
Here we’ve got a heading in markdown. You can tell it’s a level 1 heading because before the text content there’s a # symbol. In markdown, one instance of that symbol preceding some text indicates an H1.
# Here is a heading
Heading - Notion
Here we’ve got a heading as a Notion block. You can tell it’s an H1 because of the heading_1 object key before the rich text content array.
{
  "heading_1": {
    "rich_text": [
      {
        "type": "text",
        "plain_text": "Here is a heading",
        "text": {
          "content": "Here is a heading"
        }
      }
    ],
    "is_toggleable": false
  }
}
These clearly look nothing alike syntactically, but the user experiences them both as “the biggest of the heading text options”.
This syntax difference is the core of our problem. We have two ways of representing the same information and they look nothing alike. So we need to break the data down into a structured format that’s a little closer to our target. If you want to dive into ASTs more, I recommend checking out the wonderful Robert Nystrom and his book Crafting Interpreters. It’s a great read and will explain this far better than I could.
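To make the Notion side of that comparison concrete, here is a minimal Go sketch that produces the heading_1 JSON shown above using encoding/json. The struct names (Heading1Block, Heading, RichText, Text) and the headingJSON helper are hypothetical stand-ins defined locally for illustration; they are not the types of any particular Notion client library.

```go
package main

import "encoding/json"

// Text, RichText, Heading and Heading1Block are illustrative local types,
// not taken from a real Notion client package.
type Text struct {
	Content string `json:"content"`
}

type RichText struct {
	Type      string `json:"type"`
	PlainText string `json:"plain_text"`
	Text      *Text  `json:"text,omitempty"`
}

type Heading struct {
	RichText     []RichText `json:"rich_text"`
	IsToggleable bool       `json:"is_toggleable"`
}

type Heading1Block struct {
	Heading1 Heading `json:"heading_1"`
}

// headingJSON wraps content in a single rich text element and marshals it
// as a heading_1 block.
func headingJSON(content string) (string, error) {
	block := Heading1Block{
		Heading1: Heading{
			RichText: []RichText{{
				Type:      "text",
				PlainText: content,
				Text:      &Text{Content: content},
			}},
		},
	}
	out, err := json.Marshal(block)
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```

Calling headingJSON("Here is a heading") gives the same keys and values as the JSON above, just in compact form.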
For our purposes, we’re going to outsource this “markdown to tree” wizardry to a markdown parsing package. The only piece we need from it is its parser.
“A parser takes the flat sequence of tokens and builds a tree structure that mirrors the nested nature of the grammar. These trees have a couple of different names—parse tree or abstract syntax tree—depending on how close to the bare syntactic structure of the source language they are.”
Robert Nystrom
Creating structure
Our markdown parsing dependency comes with an exported method, Parse. This method converts Markdown into something a bit more structured. Here’s how:
func (p *Parser) Parse(input []byte) ast.Node
- Input Markdown Content: The function starts with Markdown content as a byte slice ([]byte) holding the raw text.
- Lexical Analysis: It breaks down the Markdown into tokens.
- Building the Abstract Syntax Tree (AST): Using the tokens, the function builds an Abstract Syntax Tree (AST). This tree-like structure shows the hierarchy of the Markdown document. For example, a list node has child nodes for each list item. A heading node has child nodes for the heading text.
- Node Types: AST nodes represent different Markdown elements like headings (ast.Heading, with a Level of 1, 2, and so on), paragraphs, and lists. Each node includes the text and formatting details.
- Output AST: The final result is the root node of the AST, representing the entire Markdown document with all its elements.
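The steps above can be sketched with a deliberately tiny parser. This is not how the parsing dependency works internally; it is a toy that only understands # headings and one-line paragraphs, and the Node type, NodeKind constants, and parse function are made up for illustration.

```go
package main

import "strings"

// NodeKind labels what a node represents. These kinds are illustrative only.
type NodeKind int

const (
	Document NodeKind = iota
	Heading
	Paragraph
	Text
)

// Node is a toy AST node: containers hold Children, Text leaves hold Literal.
type Node struct {
	Kind     NodeKind
	Level    int    // heading level; 0 for non-heading nodes
	Literal  string // text content; set on Text leaves only
	Children []*Node
}

// parse builds a small tree from line-oriented markdown. Every non-empty
// line becomes a Heading or Paragraph node with a single Text child.
func parse(markdown string) *Node {
	root := &Node{Kind: Document}
	for _, line := range strings.Split(markdown, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Count leading '#' characters to detect a heading and its level.
		level := 0
		for level < len(line) && line[level] == '#' {
			level++
		}
		isHeading := level >= 1 && level <= 6 && level < len(line) && line[level] == ' '

		node := &Node{Kind: Paragraph}
		text := line
		if isHeading {
			node = &Node{Kind: Heading, Level: level}
			text = strings.TrimSpace(line[level:])
		}
		node.Children = []*Node{{Kind: Text, Literal: text}}
		root.Children = append(root.Children, node)
	}
	return root
}
```

Parsing "# Here is a heading" followed by a paragraph gives a Document root with two children: a level 1 Heading and a Paragraph, each holding its text in a leaf. That is the same shape the real parser hands back, only with far fewer node types.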
Mapping nodes to blocks
At this point we’ve got our markdown parsed into a tree of nodes. The nodes have similar names to Notion blocks. So now we just need to write small, testable functions that convert different types of nodes into their corresponding blocks. I’ll give a simple example to show what I mean here. The rest of the implementation is available in the GitHub repository.
Here’s the function to convert an AST heading into a Notion heading block. We pass in a heading node and, depending on the heading level (1, 2, or 3), we return the corresponding Notion block. For any headings beyond level 3, we still just return a Notion heading 3, as that’s the smallest heading size Notion supports.
// convertHeading converts an AST heading node to a Notion block.
//
// It takes a pointer to an ast.Heading node as input and returns a Notion block,
// or nil if the heading has no children. The function extracts the text content
// from the first child of the heading node and creates a Notion block based on
// the heading level. If the heading level is not 1 or 2, it treats the node as an h3.
func convertHeading(node *ast.Heading) notion.Block {
	// Check the length rather than nil so an empty, non-nil slice cannot
	// cause an out-of-range panic on Children[0] below.
	if len(node.Children) == 0 {
		return nil
	}
	if node.Level == 1 {
		return notion.Heading1Block{
			RichText: chunk.RichText(string(node.Children[0].AsLeaf().Literal), nil),
		}
	}
	if node.Level == 2 {
		return notion.Heading2Block{
			RichText: chunk.RichText(string(node.Children[0].AsLeaf().Literal), nil),
		}
	}
	return notion.Heading3Block{
		RichText: chunk.RichText(string(node.Children[0].AsLeaf().Literal), nil),
	}
}
And here’s a related test that asserts that, given the provided AST node, we return the expected Notion block.
t.Run("can convert heading level 2", func(t *testing.T) {
	input := "Heading level 2"
	node := &ast.Heading{
		Level: 2,
		Container: ast.Container{
			Children: []ast.Node{
				&ast.Leaf{
					Content: []byte("## " + input),
					Literal: []byte(input),
				},
			},
		},
	}
	expected := notion.Heading2Block{
		RichText: []notion.RichText{
			{
				Type:      notion.RichTextTypeText,
				Text:      &notion.Text{Content: input},
				PlainText: input,
			},
		},
	}
	result := convertHeading(node)
	assertHeadingBlockEqual(t, expected, result)
})
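One thing to watch in convertHeading: it reads only the first child of the heading. If the parser emits several children for a heading with inline formatting (an assumption on my part, since that depends on the parsing package), the rest of the text would be dropped. Here is a minimal sketch of collecting text from every leaf instead. The Node type, plainText, and collect are made-up names for this example and stand in for the real ast package.

```go
package main

import "strings"

// Node is a hypothetical stand-in for an AST node: leaves carry Literal
// text, containers carry Children.
type Node struct {
	Literal  string
	Children []*Node
}

// plainText concatenates the Literal of every leaf under n, depth first.
func plainText(n *Node) string {
	var sb strings.Builder
	collect(n, &sb)
	return sb.String()
}

// collect appends leaf text to sb, recursing into containers and
// ignoring nil nodes.
func collect(n *Node, sb *strings.Builder) {
	if n == nil {
		return
	}
	if len(n.Children) == 0 {
		sb.WriteString(n.Literal)
		return
	}
	for _, child := range n.Children {
		collect(child, sb)
	}
}
```

This keeps the text but not the formatting. Carrying bold or italic over to Notion would mean building one rich text element per child rather than one string for the whole heading.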
Are we done?
Almost. We’ve converted our markdown nodes to Notion blocks but we’ve got a problem. When we write a really long document our API request to Notion fails. Why?
According to Notion’s API limits, a single element in a rich text array can hold at most 2000 characters. So if we have a document with a massive paragraph, our POST will fail. We can solve this problem by using a technique called chunking. Chunking is basically breaking apart large consecutive spans of data into smaller segments. Given 2500 characters, we could chunk that into one segment of 2000 and another of 500.
// RichText builds a new rich text block every 2000 characters of the provided string content.
func RichText(content string, annotations *notion.Annotations) []notion.RichText {
	var blocks []notion.RichText
	if len(content) <= CharacterLimit {
		richText := notion.RichText{
			Type: notion.RichTextTypeText,
			Text: &notion.Text{
				Content: content,
			},
			PlainText:   content,
			Annotations: annotations,
		}
		blocks = append(blocks, richText)
	} else {
		for i := 0; i < len(content); i += CharacterLimit {
			end := i + CharacterLimit
			if end > len(content) {
				end = len(content)
			}
			chunk := content[i:end]
			richText := notion.RichText{
				Type: notion.RichTextTypeText,
				Text: &notion.Text{
					Content: chunk,
				},
				PlainText:   chunk,
				Annotations: annotations,
			}
			blocks = append(blocks, richText)
		}
	}
	return blocks
}
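A caveat on the function above: len and string slicing in Go work on bytes, not characters. For ASCII-only reports the two are the same, but a slice boundary that lands inside a multi-byte UTF-8 character would cut it in half. Here’s a hedged sketch of a rune-based variant of the splitting step. The names chunkRunes and characterLimit are made up for this example, and it assumes the limit is counted in Unicode code points, which isn’t spelled out above.

```go
package main

// characterLimit mirrors the 2000-character limit described above.
const characterLimit = 2000

// chunkRunes splits content into pieces of at most limit runes, so a
// multi-byte character is never cut in half. A non-positive limit or an
// input that already fits yields the content back as a single chunk.
func chunkRunes(content string, limit int) []string {
	runes := []rune(content)
	if limit <= 0 || len(runes) <= limit {
		return []string{content}
	}
	chunks := make([]string, 0, (len(runes)+limit-1)/limit)
	for start := 0; start < len(runes); start += limit {
		end := start + limit
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}
```

With 2500 characters and a limit of 2000, this returns two chunks of 2000 and 500, matching the example in the text. Each chunk could then be wrapped in a rich text element the same way RichText does.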
Creating the converter
The last step is tying everything together into one public function that can be imported by consumers of the package.
Remember, we want:
- No prior knowledge of ASTs or Notion blocks required
- Simple to use from a single function
// Convert takes a markdown document as text, parses it into an AST node,
// and iterates over the tree with the convertNode function, converting each
// of the nodes to Notion blocks.
func Convert(markdown string) ([]notion.Block, error) {
	// Parse the markdown document into an AST node
	extensions := parser.CommonExtensions
	p := parser.NewWithExtensions(extensions)
	document := p.Parse([]byte(markdown))
	var blocks []notion.Block
	ast.WalkFunc(document, func(node ast.Node, entering bool) ast.WalkStatus {
		// Each node is visited twice; only act on the way in.
		if !entering {
			return ast.GoToNext
		}
		if isImage(node) {
			return ast.GoToNext
		}
		if isList(node) {
			list := convertList(node.(*ast.List))
			blocks = append(blocks, list...)
			return ast.SkipChildren
		}
		if isBlockquote(node) {
			quote := convertBlockquote(node.(*ast.BlockQuote))
			blocks = append(blocks, quote)
			return ast.SkipChildren
		}
		if isHeading(node) {
			block := convertHeading(node.(*ast.Heading))
			// convertHeading returns nil for a heading without children.
			if block != nil {
				blocks = append(blocks, block)
			}
			return ast.GoToNext
		}
		if isParagraph(node) {
			block := convertParagraph(node.(*ast.Paragraph))
			if block != nil {
				blocks = append(blocks, block)
			}
			return ast.SkipChildren
		}
		if isCodeBlock(node) {
			codeBlock := convertCodeBlock(node.(*ast.CodeBlock))
			if codeBlock != nil {
				blocks = append(blocks, codeBlock)
			}
			return ast.SkipChildren
		}
		return ast.GoToNext
	})
	return blocks, nil
}
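The return values in the walk callback decide how deep the traversal goes. SkipChildren shows up in the branches for lists, blockquotes, paragraphs, and code blocks, presumably because those converters already consume the node’s children, so walking into them would convert the same content twice. A toy pre-order walk shows the effect. The Node, WalkStatus, walk, and visitedKinds definitions here are simplified stand-ins written for this example, and unlike the real walker they don’t make a second “leaving” visit.

```go
package main

// WalkStatus tells walk whether to descend into the current node.
type WalkStatus int

const (
	GoToNext WalkStatus = iota
	SkipChildren
)

// Node is a hypothetical tree node identified only by its kind.
type Node struct {
	Kind     string
	Children []*Node
}

// walk visits n before its children (pre-order) and descends only when
// the visitor returns GoToNext.
func walk(n *Node, visit func(*Node) WalkStatus) {
	if n == nil || visit(n) == SkipChildren {
		return
	}
	for _, child := range n.Children {
		walk(child, visit)
	}
}

// visitedKinds records the order in which nodes are visited when list
// nodes are handled as a unit and their children are skipped.
func visitedKinds(root *Node) []string {
	var kinds []string
	walk(root, func(n *Node) WalkStatus {
		kinds = append(kinds, n.Kind)
		if n.Kind == "list" {
			return SkipChildren
		}
		return GoToNext
	})
	return kinds
}
```

For a document holding a heading, a list, and a paragraph, the list’s items never reach the visitor, while the heading’s and paragraph’s text nodes still do.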
All done!
We’ve got a markdown to Notion blocks converter that respects Notion’s API limits and can handle super long documents. Here’s what it looks like to use!
// Convert the Markdown content to Notion blocks
blocks, err := notionmd.Convert(string(markdown))
if err != nil {
	log.Fatalf("Error converting Markdown to Notion blocks: %v", err)
}