How I implemented Oembed Proxy for GitHub

Why Oembed Proxy for GitHub

I'm writing a programming book, Essential Go, in Notion and I need to include code snippets.

Notion has support for code blocks but it's not good enough for my use case.

I want to make sure the code compiles, so I write small programs and store them in a GitHub repository. My custom book-building script compiles and runs the programs to ensure the code is correct.

Notion supports embedding GitHub's gists but not files from git repositories hosted on GitHub.

I did some research and it turns out there's a standard called Oembed that was created to enable embedding arbitrary content from one web site in another.

Notion supports Oembed.

I didn't find an existing service that provides Oembed support for GitHub repositories, so I built one myself.

This article describes the high-level design of Oembed Proxy for GitHub.

What is Oembed?

Let's say you're implementing a rich-text editor on the web and you want to allow embedding arbitrary content from other services: a tweet, a youtube video, a flickr photo.

You can add code to support each service you know about (and pray that they provide a stable way to get the necessary information) but that doesn't scale. There are far too many web services out there and more are created every day.

Some people noticed the problem and created the Oembed protocol, which provides a standard way to expose content for embedding. Now you only need to write code to support Oembed.

Here's how it works, using Oembed Proxy for GitHub as an example.

In our example Notion is an Oembed client that wants to embed a file from a GitHub repository, https://github.com/essentialbooks/books/blob/master/README.md, in the body of a document.

Notion supports the Oembed standard. If GitHub supported Oembed on their servers, you would add an embed block and use the https://github.com/essentialbooks/books/blob/master/README.md link directly.

GitHub doesn't support Oembed, so instead you can use a link that goes through my Oembed Proxy: https://www.onlinetool.io/gitoembed/widget?url=https%3A%2F%2Fgithub.com%2Fessentialbooks%2Fbooks%2Fblob%2Fmaster%2FREADME.md

This is the https://www.onlinetool.io/gitoembed/widget page with the GitHub link passed as the url argument.

If you view that page, you'll see the file from GitHub with source code highlighting.
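As a minimal sketch of how such a proxy link can be built, the GitHub link is percent-encoded and passed as the url argument. The helper name widgetURL is hypothetical and used only for illustration; the base address is the one from the example above.

```go
package main

import (
	"fmt"
	"net/url"
)

// widgetBase is the proxy's widget page, taken from the example above.
const widgetBase = "https://www.onlinetool.io/gitoembed/widget"

// widgetURL (a hypothetical helper) wraps a GitHub file link in a proxy
// link by passing it, percent-encoded, as the url query argument.
func widgetURL(githubURL string) string {
	return widgetBase + "?url=" + url.QueryEscape(githubURL)
}

func main() {
	fmt.Println(widgetURL("https://github.com/essentialbooks/books/blob/master/README.md"))
}
```

Encoding the link matters because characters such as : and / in the GitHub address would otherwise be ambiguous inside a query string.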

Oembed supports auto-discovery. If you peek at the HTML of that page, you'll see this section:

<link rel="alternate" type="application/json+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=json&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />
<link rel="alternate" type="text/xml+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=xml&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />

Those are instructions telling an Oembed client (Notion in this example) how to get the embeddable HTML.
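To make the discovery step concrete, here is a rough sketch of how a client might pull the JSON endpoint out of a page. It is not how Notion does it (that isn't public as far as I know). The function name discoverOembed is hypothetical, and the regular expressions are a simplification that only handles double-quoted attributes; a real client would more likely use a proper HTML parser.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// linkTagRe finds <link ...> tags; attrRe finds name="value" pairs inside one tag.
	linkTagRe = regexp.MustCompile(`(?is)<link\b[^>]*>`)
	attrRe    = regexp.MustCompile(`(?is)([a-z-]+)\s*=\s*"([^"]*)"`)
)

// discoverOembed returns the href of the first <link rel="alternate"> tag
// whose type attribute equals wantType, or "" if the page has none.
func discoverOembed(page, wantType string) string {
	for _, tag := range linkTagRe.FindAllString(page, -1) {
		attrs := map[string]string{}
		for _, m := range attrRe.FindAllStringSubmatch(tag, -1) {
			// Attribute values may contain entities such as &amp;.
			attrs[strings.ToLower(m[1])] = html.UnescapeString(m[2])
		}
		if attrs["rel"] == "alternate" && attrs["type"] == wantType {
			return attrs["href"]
		}
	}
	return ""
}

func main() {
	page := `<html><head>
<link rel="alternate" type="application/json+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=json&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />
<link rel="alternate" type="text/xml+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=xml&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />
</head></html>`
	fmt.Println(discoverOembed(page, "application/json+oembed"))
}
```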

Oembed supports 2 formats for providing this information: JSON and XML. In my testing Notion worked with just application/json+oembed but I implemented both just in case other clients only understand XML.

An Oembed client parses the HTML to extract those links and, if they are present, fetches the Oembed information. In our example it's at https://www.onlinetool.io/gitoembed/oembed?format=json&url=https://github.com/essentialbooks/books/blob/master/README.md and looks like this:

{
	"version": 1,
	"type": "rich",
	"provider_name": "gitoembed",
	"provider_url": "https://www.onlinetool.io/gitoembed/",
	"height": 320,
	"width": 720,
	"title": "README.md",
	"html": "\u003ciframe width=\"100%\" height=320 src=\"https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md\" frameborder=\"0\" onload=\"resizeFrame(this);\"\u003e\u003c/iframe\u003e"
}

I hope the format is mostly self-explanatory.
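For illustration, a response like this could be produced in Go roughly as follows. This is a sketch and not the service's actual code: the struct, the encode and iframeHTML helpers, and the <oembed> XML layout are my assumptions, with field names and values copied from the sample above.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// oembedResponse mirrors the fields of the sample response above.
// The type and its name are illustrative assumptions.
type oembedResponse struct {
	XMLName      xml.Name `json:"-" xml:"oembed"`
	Version      int      `json:"version" xml:"version"`
	Type         string   `json:"type" xml:"type"`
	ProviderName string   `json:"provider_name" xml:"provider_name"`
	ProviderURL  string   `json:"provider_url" xml:"provider_url"`
	Height       int      `json:"height" xml:"height"`
	Width        int      `json:"width" xml:"width"`
	Title        string   `json:"title" xml:"title"`
	HTML         string   `json:"html" xml:"html"`
}

// iframeHTML builds the value of the html field: an iframe pointing at src.
func iframeHTML(src string, height int) string {
	return fmt.Sprintf(`<iframe width="100%%" height=%d src="%s" frameborder="0" onload="resizeFrame(this);"></iframe>`, height, src)
}

// encode serializes the response as XML when format is "xml", as JSON otherwise.
func encode(r oembedResponse, format string) ([]byte, error) {
	if format == "xml" {
		return xml.MarshalIndent(r, "", "\t")
	}
	return json.MarshalIndent(r, "", "\t")
}

func main() {
	src := "https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md"
	r := oembedResponse{
		Version: 1, Type: "rich",
		ProviderName: "gitoembed", ProviderURL: "https://www.onlinetool.io/gitoembed/",
		Height: 320, Width: 720, Title: "README.md",
		HTML: iframeHTML(src, 320),
	}
	for _, format := range []string{"json", "xml"} {
		out, err := encode(r, format)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Go's encoding/json escapes < and > as \u003c and \u003e by default, which is consistent with how the html field looks in the sample. The sketch mirrors the sample's "version": 1; the Oembed specification itself describes the version as 1.0.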

The interesting bit is the html field, which is:

<iframe 
	width="100%"
	height=320 
	src="https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md" 
	frameborder="0"
	onload="resizeFrame(this);">
</iframe>

We could send the actual HTML content to insert, but it's more customary to send an iframe that loads the HTML.

In my implementation the src of the iframe is the same page from which we extracted the Oembed JSON link, so it does double duty as both the content and an Oembed pointer to the content. Those could be different URLs.

Implementation details of Oembed Proxy for GitHub

First I needed a server. Usually I use Digital Ocean but this time I went for the biggest bang for the buck and used a C2L server from Scaleway.

For ~$30 I get an 8-core server with 32 GB of RAM, a 250 GB SSD drive and 600 Mbit/s of unmetered bandwidth. It's a bare metal server, not a VPS, so it's all mine, eliminating the risk of noisy neighbors.

On Digital Ocean the closest server with such specs would be $160. I keep a list of cheap VPS servers for comparison.

The downside is that the servers are in Europe (you can choose between Amsterdam and Paris), so the latency for users in the US will be higher than if the server were hosted in the US.

For the OS I went with Ubuntu 18.04. I know it best and it's one of the most popular distros.

The server is written in Go. It's my go-to language for writing backend code.

The service isn't very complicated:

  • it downloads the file via GitHub's https://raw.githubusercontent.com url
  • to avoid overloading GitHub servers (and exceeding their throttling limits) I cache downloaded files for a day. The cache doesn't last forever because files on GitHub can change and I don't want to keep serving an outdated version
  • for code highlighting I use the chroma library

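The first two steps can be sketched roughly as below. This is an illustration under stated assumptions, not the service's actual code: rawURL, fileCache and newDayCache are hypothetical names, the URL mapping assumes the branch name contains no slash, and the download function is injected so the example runs without network access. Highlighting with chroma is left out to keep the sketch limited to the standard library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"sync"
	"time"
)

// rawURL converts a GitHub file link into the matching
// raw.githubusercontent.com link. Assumes the branch name has no slash.
func rawURL(blobURL string) (string, error) {
	u, err := url.Parse(blobURL)
	if err != nil {
		return "", err
	}
	// Expected path: /<owner>/<repo>/blob/<branch>/<path to file>
	parts := strings.SplitN(strings.TrimPrefix(u.EscapedPath(), "/"), "/", 5)
	if u.Host != "github.com" || len(parts) < 5 || parts[2] != "blob" {
		return "", fmt.Errorf("not a GitHub file link: %s", blobURL)
	}
	return "https://raw.githubusercontent.com/" + parts[0] + "/" + parts[1] + "/" + parts[3] + "/" + parts[4], nil
}

type entry struct {
	body    []byte
	fetched time.Time
}

// fileCache keeps downloaded files for ttl, then downloads them again.
type fileCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	files map[string]entry
	fetch func(u string) ([]byte, error) // e.g. an HTTP GET in a real service
	now   func() time.Time               // replaceable clock, useful for testing
}

// newDayCache returns a cache whose entries expire after one day.
func newDayCache(fetch func(u string) ([]byte, error)) *fileCache {
	return &fileCache{ttl: 24 * time.Hour, files: map[string]entry{}, fetch: fetch, now: time.Now}
}

// get returns the cached body if it is younger than ttl, otherwise it
// downloads the file. The lock is held during the download for simplicity,
// so concurrent downloads are serialized.
func (c *fileCache) get(u string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.files[u]; ok && c.now().Sub(e.fetched) < c.ttl {
		return e.body, nil
	}
	body, err := c.fetch(u)
	if err != nil {
		return nil, err
	}
	c.files[u] = entry{body: body, fetched: c.now()}
	return body, nil
}

func main() {
	raw, err := rawURL("https://github.com/essentialbooks/books/blob/master/README.md")
	if err != nil {
		panic(err)
	}
	fmt.Println(raw)

	downloads := 0
	c := newDayCache(func(u string) ([]byte, error) {
		downloads++ // stand-in for a real download
		return []byte("contents of " + u), nil
	})
	clock := time.Now()
	c.now = func() time.Time { return clock }

	c.get(raw)                        // not cached yet: downloads
	c.get(raw)                        // cached: no download
	clock = clock.Add(25 * time.Hour) // more than a day later
	c.get(raw)                        // expired: downloads again
	fmt.Println("downloads:", downloads)
}
```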
There's even less front-end code: an explanation of the service and a way to test it, implemented with a form and a few lines of JavaScript.
