Lessons learned porting 50k loc from Java to Go

I was contracted to port a large Java code base to Go.

The code in question is a Java client for RavenDB, a NoSQL JSON document database. Code with tests was around 50 thousand lines.

The result of the port is a Go client.

This article describes what I’ve learn in the process.

Testing, code coverage

Large projects benefit greatly from automated testing and tracking code coverage.

I used TravisCI and AppVeyor for testing. Codecov.io for code coverage. There are many other services.

I used both AppVeyor and TravisCI because a year ago Travis didn’t have Windows support and AppVeyor didn’t have Linux support.

Today if I was settings this up from scratch, I would stick with just AppVeyor, as it can now do both Linux and Windows testing and the future of TravisCI is murky, after it was acquired by private equity firm and reportedly fired the original dev team.

Codecov is barely adequate. For Go, they count non-code lines (comments etc.) as not executed. It’s impossible to get 100% code coverage as reported by the tool. Coveralls seems to have the same problem.

It’s better than nothing but there’s an opportunity to do things better, especially for Go programs.

Go’s race detector is great

Parts of the code use concurrency and it’s really easy to get concurrency wrong.

Go provides race detector that can be enabled with -race flag during compilation.

It slows down the program but additional checks can detect if you’re concurrently modifying the same memory location.

I always run tests with -race enabled and it alerted me to numerous races, which allowed me to fix them promptly.

Building custom tools for testing

In a project that big it’s impossible to verify correctness by inspection. Too much code to hold in your head at once.

When a test fails, it can be a challenge to figure out why just from the information in the test failure.

Database client driver talks to RavenDB database server over HTTP using JSON to encode commands and results.

When porting Java tests to Go, it was very useful to be able to capture the HTTP traffic between Java client and server and compare it with HTTP traffic generated by Go port.

I built custom tools to help me do that.

For capturing HTTP traffic in Java client, I built a logging HTTP proxy in Go and directed Java client to use that HTTP proxy.

For Go client, I built a hook in the library that allows to intercept HTTP requests. I used it to log the traffic to a file.

I was then able to compare HTTP traffic generated by Java client to traffic generated by my Go port and spot the differences.

Porting process

You can’t just start porting 50 thousand lines of code in random order. Without testing and validating after every little step I’m sure I would be defeated by complexity.

I was new to RavenDB and Java code base. My first step was to get a high-level understanding how Java code works.

At the core the client talks to the server via HTTP protocol. I captured the traffic, looked at it and wrote the simplest Go code to talk the server.

When that was working it gave me confidence I’ll be able to replicate the functionality.

My first milestone was to port enough code to be able to port the simplest Java test.

I used a combination of bottom-up and top-down approach.

Bottom-up part is where I identified the code at the bottom of call chain responsible for sending commands to the server and parsing responses and ported those.

The top-down part is where I stepped through the test I was porting to identify which parts of the code need to be ported to implement that part.

After successfully porting the first step, the rest of the work was porting one test at a time, also porting all the necessary code needed to make the test work.

After the tests were ported and passing, I did improvements to make the code more Go-ish.

I believe that this step-by-step approach was crucial to completing the work.

Psychologically, when faced with a year-long project, it’s important to have smaller, intermediate milestones. Hitting those kept me motivated.

Keeping the code compiling, running and passing tests at all times is also good. Allowing bugs to accumulate can make it very hard to fix them when you finally get to it.

Challenges of porting Java to Go

The objective of the port was to keep it as close as possible to Java code base, as it needs to be kept in sync with Java changes in the future.

I’m somewhat surprised how much code I ported in a line-by-line fashion. The most time consuming part of the port was reversing the order of variable declaration, from Java’s type name to Go’s name type. I wish there was a tool that would do that part for me.

String vs. string

In Java, String is an object that really is a reference (a pointer). As a result, a string can be null.

In Go string is a value type. It can’t be nil, only empty.

It wasn’t a big deal and most of the time I could mechanically replace null with "".

Errors vs. exceptions

Java uses exceptions to communicate errors.

Go returns values of error interface.

Porting wasn’t difficult but it did require changing lots of function signatures to return error values and propagate them up the call stack.

Generics

Go doesn’t have them (yet).

Porting generic APIs was the biggest challenge.

Here’s an example of a generic method in Java:

public <T> T load(Class<T> clazz, String id) {

And the caller:

Foo foo = load(Foo.class, "id")

In Go, I used two strategies.

One is to use interface{}, which combines value and its type, similar to object in Java. This is not preferred approach. While it works, operating on interface{} is clumsy for the user of the library.

In some cases I was able to use reflection and the above code was ported as:

func Load(result interface{}, id string) error

I could use reflection to query type of result and create values of that type from JSON document.

And the caller side:

var result *Foo
err := Load(&result, "id")

Function overloading

Go doesn’t have it (and most likely will never have it).

I can’t say I found a good solution to port those.

In some cases overloading was used to create shorter helpers:

void foo(int a, String b) {}
void foo(int a) { foo(a, null); }

Sometimes I would just drop the shorter helper.

Sometimes I would write 2 functions:

func foo(a int) {}
func fooWithB(a int, b string) {}

When number of potential arguments was large I would sometimes do:

type FooArgs struct {
	A int
	B string
}
func foo(args *FooArgs) { }

Inheritance

Go is not especially object-oriented and doesn’t have inheritance.

Simple cases of inheritance can be ported with embedding.

class B : A { }

Can sometimes be ported as:

type A struct { }
type B struct {
	A
}

We’ve embedded A inside B, so B inherit all the methods and fields of A.

It doesn’t work for virtual functions.

There is no good way to directly port code that uses virtual functions.

One option to emulate virtual function is to use embedding of structs and function pointers. This essentially re-implements virtual table that Java gives you for free as part of object implementation.

Another option is to write a stand-alone function that dispatches the right function for a given type by using type switch.

Interfaces

Both Java and Go have interfaces but they are different things, like apples and salami.

A few times I did create a Go interface type that replicated Java interface.

In more cases I dropped interfaces and instead exposed concrete structs in the API.

Circular imports between packages

Java allows circular imports between packages.

Go does not.

As a result I was not able to replicate the package structure of Java code in my port.

For simplicity I went with a single package. Not ideal, because it ended up being very large package. So large, in fact, that Go 1.10 couldn’t handle so many source files in a single package on Windows. Luckily it was fixed in Go 1.11.

Private, public, protected

Go’s designers are under-appreciated. Their ability to simplify concepts is unmatched and access control is one example of that.

Other languages gravitate to fine-grained access control: public, private, protected specified with the smallest possible granularity (per class field and method).

As a result a library implementing some functionality has the same access to other classes in the same library as external code using that library.

Go simplified that by only having public vs. private and scoping access to package level.

That makes more sense.

When I write a library to, say, parse markdown, I don’t want to expose internals of the implementation to users of the library. But hiding those internals from myself is counter-productive.

Java programmers noticed that issue and sometimes use an interface as a hack to fix over-exposed classes. By returning an interface instead of a a concrete class, you can hide some of the public APIs available to direct users of the class.

Concurrency

Go’s concurrency is simply the best and a built-in race detector is of great help in repelling concurrency bugs.

That being said, in my first porting pass I went with emulating Java APIs. For example, I implemented a facsimile of Java’s CompletableFuture class.

Only after the code was working I would re-structure it to be more idiomatic Go.

Fluent function chaining

RavenDB has very sophisticated querying capabilities. Java client uses method chaining for building queries:

List<ReduceResult> results = session.query(User.class)
                        .groupBy("name")
                        .selectKey()
                        .selectCount()
                        .orderByDescending("count")
                        .ofType(ReduceResult.class)
                        .toList();

This only works in languages that communicate errors via exceptions. When a function additionally returns an error, it’s no longer possible to chain it like that.

To replicate chaining in Go I used a “stateful error” approach:

type Query struct {
	err error
}

func (q *Query) WhereEquals(field string, val interface{}) *Query {
	if q.err != nil {
		return q
	}
	// logic that might set q.err
	return q
}

func (q *Query) GroupBy(field string) *Query {
if q.err != nil {
		return q
	}
	// logic that might set q.err
	return q
}

func (q *Query) Execute(result inteface{}) error {
	if q.err != nil {
		return q.err
	}
	// do logic
}

This can be chained:

var result *Foo
err := NewQuery().WhereEquals("Name", "Frank").GroupBy("Age").Execute(&result)

JSON marshaling

Java doesn’t have a built-in marshaling and the client uses Jackson JSON library.

Go has JSON support in standard library but it doesn’t provide as many hooks for tweaking marshaling process.

I didn’t try to match all of Java’s functionality as what is provided by Go’s built-in JSON support seems to be flexible enough.

Go code is shorter

This is not so much a property of Java but the culture which dictates what is considered an idiomatic code.

In Java setter and getter methods are common. As a result, Java code:

class Foo {
	private int bar;

	public void setBar(int bar) {
		this.bar = bar;
	}

	public int getBar() {
		return this.bar;
	}
}

ends up in Go as:

type Foo struct {
	Bar int
}

3 lines vs. 11 lines. It does add up when you have a lot of classes with lots of members.

Most other code ends up being of equivalent length.

Notion for organizing the work

I’m a heavy user of Notion.so, a hierarchical note taking application.

Here’s how I used Notion to organize my work on Go port:

Here’s what’s there:

not shown above, I have a page that is a calendar view where I take short notes about what I work on on a given day and how much time I spent. This is important information since it was a hourly contract. Thanks to those notes I know that I spent 601 hours over 11 months
clients like to know the progress. I had a page for each month were I summarized the work done like this:

Those pages were shared with the client.
A short-term todo list helps when starting work each day:
I even managed invoices as Notion pages and used “Export to PDF” function to generate PDF version of the invoice

Additional resources

I’ve provided some additional commentary in response to questions:

in Hacker News discussion
in /r/golang discussion

Other material:

if you need a NoSQL, JSON document database, give RavenDB a try. It’s chock full of advanced features
if you’re interested in Notion, I’m world’s most advanced user of Notion:
- I reverse engineered Notion API
- I wrote an unofficial Go library for Notion API
- all content on this website is written in Notion and published with my custom toolchain

Edna	speedy note taking app with super powers
SumatraPDF	small, fast, free PDF / ePub / comic book reader for Windows