Improving speed of SMAZ compressor by 2.6x/1.5x

I was testing fast compressors in pure Go. One of them was Go implementation of SMAZ algorithm for compressing small strings. It’s simple, fast and works well for English text.

It wasn’t as fast as I expected so I looked at the code and with a few tweaks I managed to speed up decompression 2.61x times and compression 1.54x times:

kjkmacpro:smaz kjk$ benchcmp before.txt after.txt
benchmark                  old ns/op     new ns/op     delta
BenchmarkCompression       3387936       2195304       -35.20%
BenchmarkDecompression     2667583       1022908       -61.65%

benchmark                  old MB/s     new MB/s     speedup
BenchmarkCompression       40.35        62.26        1.54x
BenchmarkDecompression     28.34        73.90        2.61x

The speed increase came from 3 micro-optimizations.

1. Don’t use `bytes.Buffer` if `[]byte` will do.

The biggest decompression speed-up came from this change where I replaced the use of bytes.Buffer with []byte slice.

bytes.Buffer is a wrapper around []byte. It adds convenience by implementing popular interfaces like io.Reader, io.Writer etc. Decreased speed is the price of that.

It doesn’t matter in most programs, but in a tight decompression loop even small wins do add up.

2. Re-using buffers is another common optimization trick in Go.

The original API was:

compressed := smaz.Compress(source)

Compress function has no option but to allocate a new buffer for the compressed data. Allocations are not free and they slow down the program by making garbage collector do more work.

Other compression libraries allow the caller to provide a buffer for the result:

compressed := make([]byte, 1024)
compressed = smaz.Encode(compressed, source)

If the buffer is not big enough, it’ll be enlarged. If the caller doesn’t want manage the buffer, it can pass nil.

3. Avoid un-necessary copies

Compression and decompression involves reading data from memory, transforming it and writing the result to another memory location.

Memory access is expensive. You can execute 7 CPU instructions for one memory operation in L2 cache.

I noticed that compression was making unnecessary temporary copies of data. The code got a bit more complicated but also 1.14x faster.

Benchmarking tools in Go

Go includes tools for writing benchmarks and tests.

Writing benchmarks is straightforward. Here’s a benchmark for compression speed:

func BenchmarkCompression(b *testing.B) {
    b.StopTimer()
    inputs, n := loadTestData(b)
    b.SetBytes(n)
    b.StartTimer()
    var dst []byte
    for i := 0; i < b.N; i++ {
        for _, input := range inputs {
            dst = Encode(dst, input)
        }
    }
}

You run benchmarks with go test -bench=.. To benchmark a single function only use -bench argument (or pass . to run all of them).

Go minimizes amount of work the programmer needs to do in several ways:

benchmarking functions are automatically recognized by convention: a function that starts with Benchmark in *_test.go file is a benchmark function
the results are in a standardized, human-readable form
benchmarking tool not only measures time but you can also get MB/s metric by using b.SetBytes(). It’s a good metric for compression algorithms.

There’s also a tool to compare benchmark results (before and after the change):

> go get -u golang.org/x/tools/cmd/benchcmp
> go test -bench=. >before.txt
> ... make the changes
> go test -bench=. >after.txt
> benchcmp before.txt after.txt

Edna	speedy note taking app with super powers
SumatraPDF	small, fast, free PDF / ePub / comic book reader for Windows

1. Don’t use bytes.Buffer if []byte will do.

2. Re-using buffers is another common optimization trick in Go.

3. Avoid un-necessary copies

Benchmarking tools in Go

1. Don’t use `bytes.Buffer` if `[]byte` will do.