Simple serialization format for logging and analytics in Go
part of Go Cookbook
I was looking to save multiple records with a somewhat flexible schema to a file. Go has plenty of options but I couldn’t find anything that was just right.
My desired features were:
Here are some of the most popular options and why they don’t quite fit the bill:
I designed and implemented my own format in package siser (which stands for Simple Serialization).
You won’t be surprised by how it looks:
url: http://blog.kowalczyk.info/index.html
code: 200
large field:+13789
this is large data, 13789 bytes in size...
---
It’s a typical key/value serialization with one neat feature: support for large data (e.g. an image or long text).
The format is line oriented. Each line is ${key}: ${value}\n.
If the value is longer than 120 bytes or is not ASCII text (it contains bytes outside of the 32-127 range), I serialize it as a large value:
${key}:+${value_length}\n
${value}\n
To separate records I use ---\n.
I use this format for 2 main purposes:

Using the library

When used for analytics, each record represents an event. I save the events to a file and later on process the whole file record-by-record and calculate desired statistics.
Here’s how we would log info about HTTP requests:
func logHTTPRequest(w io.Writer, url string, ipAddr string, statusCode int) error {
    var r siser.Record
    // you can append multiple key/value pairs at once
    r.Append("url", url, "ipaddr", ipAddr)
    // or assemble with multiple calls
    r.Append("code", strconv.Itoa(statusCode))
    // ... more fields
    d := r.Marshal()
    _, err := w.Write(d)
    return err
}
Here’s a full example of logging HTTP requests.
Let’s say we wrote the data to http_access.log file. Here’s how we would process the records:
f, err := os.Open("http_access.log")
panicIfErr(err)
defer f.Close()
r := siser.NewReader(f)
for r.ReadNext() {
    _, record := r.Record()
    code, ok := record.Get("code")
    // get rest of values and do something with them
}
panicIfErr(r.Err())
Here’s a full example of calculating basic daily statistics on HTTP requests (most frequently visited pages, most frequent 404s, most frequent referrers).

Annoyances

The library doesn’t offer marshaling directly to/from structs. Could be added with a bit of reflection.
It doesn’t directly support non-string types like int or time.Time. You have to convert them to/from string yourself.

Implementation notes

Some implementation decisions were made with performance in mind.
Given the key/value nature of a record, an easy choice would be to use map[string]string as the argument to the encode/decode functions.
However, []string is more efficient than a map. A slice can be reused across multiple records: we clear it by re-slicing it to an empty slice, which keeps the underlying array. A map would require allocating a new instance for each record, which would create a lot of work for the garbage collector.
When serializing, you need to call the Reset method to get the benefit of efficient re-use of the record.
When reading and deserializing records, siser.Reader uses this optimization internally.
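The re-slicing trick looks roughly like this. This is a simplified stand-in, not siser's actual Record type: the record struct and its append/reset methods are hypothetical names used only to show the idea.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a key/value record.
// Keys and values are stored as alternating elements of one slice.
type record struct {
	kv []string
}

// append adds key/value pairs to the record.
func (r *record) append(kv ...string) {
	r.kv = append(r.kv, kv...)
}

// reset empties the record but keeps the underlying array,
// so the next record can be built without a new allocation.
func (r *record) reset() {
	r.kv = r.kv[:0]
}

func main() {
	var r record
	r.append("url", "/index.html", "code", "200")
	capBefore := cap(r.kv)

	r.reset()
	r.append("url", "/about.html", "code", "404")

	// the second record fits in the array allocated for the first one
	fmt.Println(len(r.kv), cap(r.kv) == capBefore) // prints: 4 true
}
```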
The format avoids the need for escaping keys and values, which helps in making encoding/decoding fast.
How does that play out in real life? I wrote a benchmark comparing siser vs. json.Marshal. It’s roughly similar when writing but about 8x faster when reading.
$ go test -bench=.
BenchmarkSiserMarshalWriteMany-12        1329409               871.1 ns/op
BenchmarkSiserMarshalWriteSingle-12      1000000              1088 ns/op
BenchmarkJSONMarshal-12                  1340370               767.9 ns/op

BenchmarkSiserUnmarshal-12               4720020               251.5 ns/op
BenchmarkJSONUnmarshal-12                 603591              2036 ns/op
The format is binary-safe and works for serializing large values, e.g. you can serialize a PNG image.
It’s also very easy to implement in any language.
Jul 9 2017
