Why Oembed Proxy for GitHub
I’m writing a programming book, Essential Go, in Notion and I need to include code snippets.
Notion has support for code blocks but it’s not good enough for my use case.
I want to make sure the code compiles, so I write small programs and store them in a GitHub repository. My custom book-building script compiles and runs the programs to ensure the code is correct.
Notion supports embedding GitHub’s gists but not files from git repositories hosted on GitHub.
I did some research and it turns out there’s a standard called Oembed that was created to enable embedding arbitrary content from one website in another.
I didn’t find an existing service that provides Oembed support for GitHub repositories, so I built one myself.
What is Oembed?
Let’s say you’re implementing a rich-text editor on the web and you want to allow embedding arbitrary content from other services: a tweet, a YouTube video, a Flickr photo.
You can add code to support each service you know about (and pray that they provide a stable way to get the necessary information) but it’s not scalable. There are way too many web services out there and more are created every day.
Some people noticed the problem and created the Oembed protocol, which provides a standard way to expose content for embedding. Now you only need to write code to support Oembed.
Here’s how it works, using Oembed Proxy for GitHub as an example.
If you view the widget page for a GitHub file, it shows that file with source code highlighting.
Oembed supports auto-discovery. If you peek at the HTML of that page, you’ll see this in the <head> section:
<link rel="alternate" type="application/json+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=json&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />
<link rel="alternate" type="text/xml+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=xml&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />
Those are instructions telling an Oembed client (Notion in this example) how to get the embeddable HTML.
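To make the discovery step concrete, here is a rough Go sketch of what a client might do with that HTML. It is not Notion's actual code: findOembedJSONLink is a hypothetical helper, and the regular expressions are a shortcut that assumes double-quoted attributes, not a real HTML parser.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// Matches a whole <link ...> tag. Fine for a sketch, not a real HTML parser.
	linkTagRe = regexp.MustCompile(`(?i)<link\b[^>]*>`)
	// Matches name="value" attribute pairs inside a tag (double quotes only).
	attrRe = regexp.MustCompile(`([a-zA-Z-]+)\s*=\s*"([^"]*)"`)
)

// findOembedJSONLink (hypothetical helper) returns the href of the first
// <link rel="alternate" type="application/json+oembed"> found in page,
// or "" if the page does not advertise one.
func findOembedJSONLink(page string) string {
	for _, tag := range linkTagRe.FindAllString(page, -1) {
		attrs := map[string]string{}
		for _, m := range attrRe.FindAllStringSubmatch(tag, -1) {
			// Attribute values may contain entities such as &amp;.
			attrs[strings.ToLower(m[1])] = html.UnescapeString(m[2])
		}
		if attrs["rel"] == "alternate" && attrs["type"] == "application/json+oembed" {
			return attrs["href"]
		}
	}
	return ""
}

func main() {
	page := `<head>
<link rel="alternate" type="application/json+oembed" href="https://www.onlinetool.io/gitoembed/oembed?format=json&url=https://github.com/essentialbooks/books/blob/master/README.md" title="README.md" />
</head>`
	// Prints the URL the client would fetch next to get the embed description.
	fmt.Println(findOembedJSONLink(page))
}
```

A client would then issue a GET request to the returned URL and read the description of the embed from the response.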
Oembed supports 2 formats for providing this information: JSON and XML. In my testing Notion worked with just application/json+oembed but I implemented both just in case other clients only understand XML. The JSON variant of the response looks like this:
{
"version": 1,
"type": "rich",
"provider_name": "gitoembed",
"provider_url": "https://www.onlinetool.io/gitoembed/",
"height": 320,
"width": 720,
"title": "README.md",
"html": "\u003ciframe width=\"100%\" height=320 src=\"https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md\" frameborder=\"0\" onload=\"resizeFrame(this);\"\u003e\u003c/iframe\u003e"
}
I hope the format is mostly self-explanatory.
The interesting bit is the html field, which is:
<iframe
width="100%"
height=320
src="https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md"
frameborder="0"
onload="resizeFrame(this);">
</iframe>
We could send the actual HTML content to insert but it’s more customary to send an iframe which loads the HTML.
In my implementation the src of the iframe is the same page from which we extracted the Oembed JSON link, so it serves double duty as both the content and an Oembed pointer to the content. Those could be different URLs.
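Here is a minimal Go sketch of how a response like the sample above could be produced. The struct and function names are made up for illustration and are not taken from the actual service. The field values mirror the sample JSON, including the numeric version (the Oembed spec text asks for 1.0).

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
)

// oembedResponse mirrors the fields of the sample JSON response.
type oembedResponse struct {
	Version      int    `json:"version"`
	Type         string `json:"type"`
	ProviderName string `json:"provider_name"`
	ProviderURL  string `json:"provider_url"`
	Height       int    `json:"height"`
	Width        int    `json:"width"`
	Title        string `json:"title"`
	HTML         string `json:"html"`
}

// newOembedResponse (hypothetical) builds a "rich" response whose html field
// is an iframe pointing at widgetURL.
func newOembedResponse(widgetURL, title string) oembedResponse {
	iframe := fmt.Sprintf(
		`<iframe width="100%%" height=320 src="%s" frameborder="0" onload="resizeFrame(this);"></iframe>`,
		html.EscapeString(widgetURL)) // escape so the URL is safe inside an attribute
	return oembedResponse{
		Version:      1,
		Type:         "rich",
		ProviderName: "gitoembed",
		ProviderURL:  "https://www.onlinetool.io/gitoembed/",
		Height:       320,
		Width:        720,
		Title:        title,
		HTML:         iframe,
	}
}

// encodeOembed serializes the response as compact JSON.
func encodeOembed(r oembedResponse) (string, error) {
	d, err := json.Marshal(r)
	return string(d), err
}

func main() {
	r := newOembedResponse(
		"https://www.onlinetool.io/gitoembed/widget?url=https://github.com/essentialbooks/books/blob/master/README.md",
		"README.md")
	s, err := encodeOembed(r)
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	fmt.Println(s)
}
```

Go's encoding/json escapes < and > as \u003c and \u003e by default, which is why the html field in the sample response looks the way it does. Any JSON parser turns those back into plain angle brackets.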
Implementation details of Oembed Proxy for GitHub
First I needed a server. Usually I use Digital Ocean, but this time I went for the biggest bang for the buck and used a C2L server from Scaleway.
For ~$30 I get an 8-core server with 32 GB of RAM, a 250 GB SSD drive and 600 Mbit/s of unmetered bandwidth. It’s a bare metal server, not a VPS, so it’s all mine, eliminating the risk of noisy neighbors.
On Digital Ocean the closest server with such specs would be $160. I keep a list of cheap VPS servers for comparison.
The downside is that the servers are in Europe (you can choose between Amsterdam and Paris), so the latency for users in the US will be higher than if the server were hosted in the US.
For the OS I went with Ubuntu 18.04. I know it best and it’s one of the most popular distros.
The server is written in Go. It’s my go-to language for writing backend code.
The service isn’t very complicated:
- it downloads the file via GitHub’s https://raw.githubusercontent.com URL
- to avoid overloading GitHub’s servers (and exceeding their throttling limits) I cache downloaded files for a day. The cache isn’t kept forever because files on GitHub can change and I don’t want to serve an outdated version indefinitely
- for code highlighting I use the chroma library
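The first two steps could be sketched in Go roughly like this. It is an illustration under assumptions, not the service's actual code: rawGitHubURL, httpFetch and fileCache are hypothetical names, the URL rewrite assumes branch names without slashes, and the cache is a plain in-memory map.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"sync"
	"time"
)

// cacheTTL is how long a downloaded file is reused before it is fetched again.
const cacheTTL = 24 * time.Hour

// rawGitHubURL converts a github.com "blob" page URL into the matching
// raw.githubusercontent.com URL. Assumes the branch name has no slashes.
func rawGitHubURL(pageURL string) (string, error) {
	u, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	// Expected path shape: /{owner}/{repo}/blob/{branch}/{path...}
	parts := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 5)
	if u.Host != "github.com" || len(parts) < 5 || parts[2] != "blob" {
		return "", errors.New("not a GitHub file URL: " + pageURL)
	}
	return "https://raw.githubusercontent.com/" +
		parts[0] + "/" + parts[1] + "/" + parts[3] + "/" + parts[4], nil
}

// httpFetch downloads a file over HTTP; it is what a real cache would use.
func httpFetch(rawURL string) ([]byte, error) {
	resp, err := http.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", rawURL, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

type cacheEntry struct {
	data    []byte
	fetched time.Time
}

// fileCache keeps downloaded files in memory and refetches them after ttl.
type fileCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetch   func(rawURL string) ([]byte, error)
	entries map[string]cacheEntry
}

func newFileCache(ttl time.Duration, fetch func(string) ([]byte, error)) *fileCache {
	return &fileCache{ttl: ttl, fetch: fetch, entries: map[string]cacheEntry{}}
}

// get returns the cached copy if it is younger than ttl, otherwise downloads
// again. The lock is held during the download, which keeps the sketch simple
// at the cost of serializing fetches.
func (c *fileCache) get(rawURL string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[rawURL]; ok && time.Since(e.fetched) < c.ttl {
		return e.data, nil
	}
	data, err := c.fetch(rawURL)
	if err != nil {
		return nil, err
	}
	c.entries[rawURL] = cacheEntry{data: data, fetched: time.Now()}
	return data, nil
}

func main() {
	raw, err := rawGitHubURL("https://github.com/essentialbooks/books/blob/master/README.md")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(raw)

	// A fake fetcher so the example runs without network access;
	// a real server would pass httpFetch instead.
	calls := 0
	cache := newFileCache(cacheTTL, func(string) ([]byte, error) {
		calls++
		return []byte("file contents"), nil
	})
	cache.get(raw)
	cache.get(raw) // served from the cache, no second fetch
	fmt.Println("fetches:", calls)
}
```

Passing the fetch function in makes the expiry logic easy to exercise without touching GitHub. A long-running server would probably also want to evict old entries so the map does not grow without bound.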
There’s even less front-end code: an explanation of the service and a way to test it, implemented with a form and a few lines of JavaScript.