I was looking to save multiple records with somewhat flexible schema to a file. Go has plenty of options but I couldn’t find anything that was just right.
My desired features were:
- human-readable so that I can use tools like grep or tail to look at the file
- records are not fixed i.e. can have variable number of fields
- no need to support nested records
- allows for simple and efficient implementation
Here are the most popular available options and why they don’t quite fit the bill:
- csv - efficient but uses fixed records; not very readable if there are many fields on a single line
- json - not very readable
- protocol buffers - binary so unreadable; needs an up-front schema
I designed and implemented my own format in package siser (which stands for Simple Serialization).
You won’t be surprised by how it looks:
url: http://blog.kowalczyk.info/index.html
code: 200
large field:+13789
this is large data, 13789 bytes in size...
---
It’s a typical key/value serialization with one neat feature: support for large data (e.g. an image or long text).
The format is line oriented. Each line is ${key}: ${value}\n.
If the value is longer than 120 bytes or is not ASCII text (it contains bytes outside the 32-127 range), I serialize it as a large value:
${key}:+${value_length}\n
${value}\n
To separate records I use ---\n.
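The rules above can be sketched as a small encoder. This is a minimal illustration written from the description in this post, not siser’s actual implementation; the names needsLargeForm and marshalRecord are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// needsLargeForm reports whether a value must use the length-prefixed
// form: longer than 120 bytes, or containing a byte outside 32-127.
func needsLargeForm(v string) bool {
	if len(v) > 120 {
		return true
	}
	for i := 0; i < len(v); i++ {
		if v[i] < 32 || v[i] > 127 {
			return true
		}
	}
	return false
}

// marshalRecord is a hypothetical helper (not part of siser) that
// encodes alternating key/value arguments as a single record.
// A trailing key without a value is ignored.
func marshalRecord(pairs ...string) []byte {
	var buf bytes.Buffer
	for i := 0; i+1 < len(pairs); i += 2 {
		k, v := pairs[i], pairs[i+1]
		buf.WriteString(k)
		if needsLargeForm(v) {
			// ${key}:+${value_length}\n${value}\n
			buf.WriteString(":+")
			buf.WriteString(strconv.Itoa(len(v)))
			buf.WriteByte('\n')
			buf.WriteString(v)
			buf.WriteByte('\n')
		} else {
			// ${key}: ${value}\n
			buf.WriteString(": ")
			buf.WriteString(v)
			buf.WriteByte('\n')
		}
	}
	buf.WriteString("---\n") // record separator
	return buf.Bytes()
}

func main() {
	d := marshalRecord(
		"url", "http://blog.kowalczyk.info/index.html",
		"code", "200",
		"body", "line one\nline two", // contains a newline, so it is written as a large value
	)
	fmt.Printf("%s", d)
}
```

Because a large value carries its length, the writer never has to escape anything inside it.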
I use this format for two main purposes:
- structured logging to help in debugging
- logging events for analytics
Using the library
When used for analytics, each record represents an event. I save the events to a file and later on process the whole file record-by-record and calculate desired statistics.
Here’s how we would log info about HTTP requests:
func logHTTPRequest(w io.Writer, url string, ipAddr string, statusCode int) error {
	var r siser.Record
	// you can append multiple key/value pairs at once
	r.Append("url", url, "ipaddr", ipAddr)
	// or assemble with multiple calls
	r.Append("code", strconv.Itoa(statusCode))
	// ... more fields
	d := r.Marshal()
	_, err := w.Write(d)
	return err
}
Let’s say we wrote the data to the http_access.log file. Here’s how we would process the records:
f, err := os.Open("http_access.log")
panicIfErr(err)
defer f.Close()
r := siser.NewReader(f)
for r.ReadNext() {
	_, record := r.Record()
	// look up a value by key on the record we just read
	code, ok := record.Get("code")
	// get rest of values and do something with them
}
panicIfErr(r.Err())
Here’s a full example of calculating basic daily statistics on HTTP requests (most frequently visited pages, most frequent 404s, most frequent referrers).
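To give a flavor of that kind of processing, here is a small sketch of the "most frequently visited pages" step. It is not the linked example: it assumes the url values have already been pulled out of the records into a slice, and the names urlCount and mostFrequent are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// urlCount pairs a URL with the number of times it was seen.
type urlCount struct {
	URL   string
	Count int
}

// mostFrequent (hypothetical helper) counts occurrences of each URL
// and returns up to n entries, most frequent first. Ties are broken
// alphabetically so the result is deterministic.
func mostFrequent(urls []string, n int) []urlCount {
	counts := make(map[string]int)
	for _, u := range urls {
		counts[u]++
	}
	res := make([]urlCount, 0, len(counts))
	for u, c := range counts {
		res = append(res, urlCount{URL: u, Count: c})
	}
	sort.Slice(res, func(i, j int) bool {
		if res[i].Count != res[j].Count {
			return res[i].Count > res[j].Count
		}
		return res[i].URL < res[j].URL
	})
	if n >= 0 && n < len(res) {
		res = res[:n]
	}
	return res
}

func main() {
	urls := []string{"/a", "/b", "/a", "/c", "/b", "/a"}
	for _, uc := range mostFrequent(urls, 2) {
		fmt.Println(uc.URL, uc.Count)
	}
}
```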
Annoyances
The library doesn’t offer marshaling directly to/from structs. It could be added with a bit of reflection.
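As a rough idea of what that reflection could look like, the sketch below flattens the exported fields of a struct into alternating key/value strings. It is only an illustration and not part of siser: structToPairs and httpEvent are hypothetical names, lower-casing the field name as the key is an arbitrary choice, and values are formatted with fmt.Sprint.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// httpEvent is a made-up struct used only for this example.
type httpEvent struct {
	URL  string
	Code int
}

// structToPairs (hypothetical helper) turns the exported fields of a
// struct value into alternating key/value strings. It returns nil if
// v is not a struct.
func structToPairs(v interface{}) []string {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Struct {
		return nil
	}
	rt := rv.Type()
	var pairs []string
	for i := 0; i < rt.NumField(); i++ {
		f := rt.Field(i)
		if f.PkgPath != "" {
			// unexported field: reflection cannot read it via Interface()
			continue
		}
		key := strings.ToLower(f.Name)
		val := fmt.Sprint(rv.Field(i).Interface())
		pairs = append(pairs, key, val)
	}
	return pairs
}

func main() {
	pairs := structToPairs(httpEvent{URL: "/index.html", Code: 200})
	fmt.Println(pairs)
}
```

The resulting slice has the same alternating key/value shape that the Append call shown earlier takes.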
It doesn’t directly support non-string types like int or time.Time. You have to convert them to/from string yourself.
Implementation notes
Some implementation decisions were made with performance in mind.
Given the key/value nature of a record, an easy choice would be to use map[string]string as the argument to the encode/decode functions.
However, []string is more efficient than a map. A slice can be reused across multiple records: we clear it by re-slicing it to zero length, which keeps the underlying array. A map would typically mean allocating a new instance for each record, which would create a lot of work for the garbage collector.
When serializing, you need to use the Reset method to get the benefit of efficient re-use of the record.
When reading and deserializing records, siser.Reader uses this optimization internally.
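The re-slicing trick is easy to show in isolation. The record type below is a simplified stand-in written for this post, not siser’s actual Record; it only demonstrates that resetting to zero length keeps the allocated capacity.

```go
package main

import "fmt"

// record is a hypothetical, stripped-down stand-in for a key/value
// record backed by a flat []string of alternating keys and values.
type record struct {
	pairs []string
}

// Append adds alternating key/value strings to the record.
func (r *record) Append(kv ...string) {
	r.pairs = append(r.pairs, kv...)
}

// Reset empties the record but keeps the underlying array, so the
// next Append calls do not allocate until capacity is exceeded.
func (r *record) Reset() {
	r.pairs = r.pairs[:0]
}

func main() {
	var r record
	for i := 0; i < 3; i++ {
		r.Reset()
		r.Append("code", "200", "n", fmt.Sprint(i))
		// after the first iteration the capacity stays the same
		fmt.Println(len(r.pairs), cap(r.pairs))
	}
}
```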
The format avoids the need for escaping keys and values, which helps in making encoding/decoding fast.
How does that play out in real life? I wrote a benchmark comparing siser vs. json.Marshal. It’s roughly similar when writing but about 8x faster when reading.
$ go test -bench=.
BenchmarkSiserMarshalWriteMany-12 1329409 871.1 ns/op
BenchmarkSiserMarshalWriteSingle-12 1000000 1088 ns/op
BenchmarkJSONMarshal-12 1340370 767.9 ns/op
BenchmarkSiserUnmarshal-12 4720020 251.5 ns/op
BenchmarkJSONUnmarshal-12 603591 2036 ns/op
The format is binary-safe and works for serializing large values, e.g. you can serialize a PNG image.
It’s also very easy to implement in any language.
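To back that up, here is a sketch of a reader for the format as described in this post. It is not siser’s Reader: readRecord and parseAll are hypothetical names, and the sketch assumes keys never contain a colon.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// readRecord (hypothetical) reads one record and returns it as
// alternating key/value strings. It returns io.EOF when the input ends.
func readRecord(r *bufio.Reader) ([]string, error) {
	var pairs []string
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return nil, err
		}
		line = strings.TrimSuffix(line, "\n")
		if line == "---" {
			return pairs, nil // record separator
		}
		idx := strings.Index(line, ":")
		if idx < 0 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		key, rest := line[:idx], line[idx+1:]
		if strings.HasPrefix(rest, "+") {
			// large value: the length follows, then the raw bytes and a newline
			n, err := strconv.Atoi(rest[1:])
			if err != nil || n < 0 {
				return nil, fmt.Errorf("bad length in line %q", line)
			}
			buf := make([]byte, n+1) // value plus trailing newline
			if _, err := io.ReadFull(r, buf); err != nil {
				return nil, err
			}
			pairs = append(pairs, key, string(buf[:n]))
			continue
		}
		pairs = append(pairs, key, strings.TrimPrefix(rest, " "))
	}
}

// parseAll (hypothetical) reads every record from a string.
func parseAll(data string) ([][]string, error) {
	r := bufio.NewReader(strings.NewReader(data))
	var records [][]string
	for {
		rec, err := readRecord(r)
		if err == io.EOF {
			return records, nil
		}
		if err != nil {
			return nil, err
		}
		records = append(records, rec)
	}
}

func main() {
	recs, err := parseAll("url: /index.html\ncode: 200\nbody:+3\na\nb\n---\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range recs {
		for i := 0; i+1 < len(rec); i += 2 {
			fmt.Printf("%s = %q\n", rec[i], rec[i+1])
		}
	}
}
```

The reader never scans inside a large value: it reads exactly the announced number of bytes, which is what lets the format carry arbitrary binary data.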