The Go team wrote the golang.org/x/sys/windows package to call functions in Windows DLLs.

Their approach is memory-inefficient, and this article describes a better one.

The sys/windows way

To call a function in a DLL, let’s say kernel32.dll, we must:

load the dll into memory with LoadLibrary
get the address of a function in the dll
call the function at that address

Here’s how it looks when you use the sys/windows library (with ole32.dll as the example):

var (
	libole32 *windows.LazyDLL

	coCreateInstance *windows.LazyProc
)

func init() {
	libole32 = windows.NewLazySystemDLL("ole32.dll")
	coCreateInstance = libole32.NewProc("CoCreateInstance")
}

func CoCreateInstance(rclsid *GUID, pUnkOuter *IUnknown, dwClsContext uint32, riid *GUID, ppv *unsafe.Pointer) HRESULT {
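	// SyscallN takes the function address followed by the actual arguments;
	// unlike the older Syscall6, it needs no argument count and no zero padding.
	ret, _, _ := syscall.SyscallN(coCreateInstance.Addr(),
		uintptr(unsafe.Pointer(rclsid)),
		uintptr(unsafe.Pointer(pUnkOuter)),
		uintptr(dwClsContext),
		uintptr(unsafe.Pointer(riid)),
		uintptr(unsafe.Pointer(ppv)),
	)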
	return HRESULT(ret)
}

The problem

The problem is that this is memory inefficient.

For every function all we need is:

the name of the function, to get its address in a dll. That is a string, so it’s 8 bytes (address of the string) + 8 bytes (size of the string) + the content of the string.
the address of the function, which is 8 bytes on a 64-bit CPU

Unfortunately in sys/windows each function requires this:

type LazyProc struct {
	Name string

	mu   sync.Mutex
	l    *LazyDLL
	proc *Proc
}

type Proc struct {
	Dll  *DLL
	Name string
	addr uintptr
}

// sync.Mutex
type Mutex struct {
	_ noCopy

	mu isync.Mutex
}

// isync.Mutex
type Mutex struct {
	state int32
	sema  uint32
}

Let’s eyeball the size of all those structures:

LazyProc : 16 + sizeof(Mutex) + 8 + 8 = 32 + sizeof(Mutex)
Proc : 8 + 16 + 8 = 32
Mutex : 8

Total: 32 + 32 + 8 = 72 bytes per function, and that’s not counting possible memory padding for allocations.
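The estimate can be checked with unsafe.Sizeof. The sketch below is only an illustration: lazyProc, proc, dll and lazyDLL are local copies of the field layout listed above (not the real sys/windows types), so it builds on any platform, and wordSize and perFuncBytes are names made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// Local mirrors of the sys/windows types listed above, so the example builds
// on any platform. dll and lazyDLL are empty placeholders: only pointers to
// them are stored, so their contents do not affect the result.
type (
	dll     struct{}
	lazyDLL struct{}
)

type proc struct {
	Dll  *dll
	Name string
	addr uintptr
}

type lazyProc struct {
	Name string
	mu   sync.Mutex
	l    *lazyDLL
	proc *proc
}

// wordSize is 8 on a 64-bit platform.
const wordSize = unsafe.Sizeof(uintptr(0))

// perFuncBytes is the struct storage one function costs: a lazyProc plus the
// proc it points to. The bytes of the name itself are not included.
func perFuncBytes() uintptr {
	return unsafe.Sizeof(lazyProc{}) + unsafe.Sizeof(proc{})
}

func main() {
	fmt.Println("word size:", wordSize)
	fmt.Println("lazyProc: ", unsafe.Sizeof(lazyProc{}))
	fmt.Println("proc:     ", unsafe.Sizeof(proc{}))
	fmt.Println("total:    ", perFuncBytes())
}
```

On a 64-bit platform the total is 72, which matches the count above (the 8-byte Mutex is embedded in lazyProc, so lazyProc itself is 40 bytes).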

Windows has a lot of functions so this adds up.

Additionally, at startup we call NewProc for every function, even those the program never uses. This increases startup time.

The better way

What we ultimately need is a uintptr holding the address of the function. It’ll be looked up lazily, on first use.

Let’s say we use 8 functions from ole32.dll. We can use a single array of uintptr values for storing function pointers:

var oleFuncPtrs [8]uintptr
var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... }

const kCoCreateInstance = 0
const kCoGetClassObject = 1
// etc.

const kFuncMissing = 1

func funcAddrInDLL(dll *windows.LazyDLL, funcPtrs []uintptr, funcIdx int, funcNames []string) uintptr {
	addr := funcPtrs[funcIdx]
	if addr == kFuncMissing {
		// we already tried to look it up and didn't find it
		// this can happen because an older version of Windows might not implement this function
		return 0
	}
	if addr != 0 {
		return addr
	}
	// look up the function by name in the dll
	name := funcNames[funcIdx]
	// ...
	return addr
}

In real life this would need protection against concurrent access, e.g. with a mutex.
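One possible alternative to a mutex is to read and write each slot with sync/atomic. The sketch below is an assumption-laden illustration, not the article's implementation: lookupFunc is a hypothetical callback standing in for the real by-name lookup in the DLL, and funcAddr and funcMissing are names chosen for this example. If two goroutines race on the first call, both resolve the same name and store the same value, so the race is harmless.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// funcMissing marks a slot whose lookup was already tried and failed.
// A real function address is never 1, so the sentinel cannot collide.
const funcMissing = 1

// lookupFunc is a hypothetical stand-in for the by-name lookup in a DLL.
// It returns 0 when the DLL does not export the name.
type lookupFunc func(name string) uintptr

// funcAddr returns the cached address for names[idx], resolving it on first
// use. It returns 0 if the function does not exist.
func funcAddr(ptrs []uintptr, names []string, idx int, lookup lookupFunc) uintptr {
	addr := atomic.LoadUintptr(&ptrs[idx])
	if addr == funcMissing {
		return 0 // earlier lookup failed; do not retry
	}
	if addr != 0 {
		return addr // already resolved
	}
	addr = lookup(names[idx])
	if addr == 0 {
		atomic.StoreUintptr(&ptrs[idx], funcMissing)
		return 0
	}
	atomic.StoreUintptr(&ptrs[idx], addr)
	return addr
}

func main() {
	// Fake export table standing in for a real DLL.
	exports := map[string]uintptr{"CoCreateInstance": 0x1000}
	lookup := func(name string) uintptr { return exports[name] }

	names := []string{"CoCreateInstance", "CoGetClassObject"}
	ptrs := make([]uintptr, len(names))

	fmt.Printf("%#x\n", funcAddr(ptrs, names, 0, lookup)) // found and cached
	fmt.Printf("%#x\n", funcAddr(ptrs, names, 1, lookup)) // missing, cached as such
}
```

The per-function cost stays at one uintptr, because no lock is stored next to each address.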

Saving on strings

The following is not efficient:

var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... }

In addition to the text of the string, Go needs 16 bytes for every string: 8 for a pointer to the string and 8 for the size of the string.

We can be more efficient by storing all names as a single string:

var oleFuncNames = `
CoCreateInstance
CoGetClassObject
`

Only when we look up a function by name do we need to construct a temporary string that is a slice of oleFuncNames.

We need to know the offset and size inside oleFuncNames which we can cleverly encode as a single number:

// Auto-generated shell procedure identifier: cache index | str start | str past-end.
const (
	_PROC_SHCreateItemFromIDList            _PROC_SHELL = 0 | (9 << 16) | (31 << 32)
	_PROC_SHCreateItemFromParsingName       _PROC_SHELL = 1 | (32 << 16) | (59 << 32)
	// ...
)

We pack the info into a single number:

bits 0-15 : index of function in array of function pointers
bits 16-31: start of function name in multi-name string
bits 32-47: end of function name in multi-name string
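To make the layout concrete, here is a minimal sketch of packing and unpacking such a number. The type procID, its methods, and makeProcID are hypothetical names for this example, and the offsets assume the two-name oleFuncNames string from earlier, written with explicit \n separators.

```go
package main

import "fmt"

// The multi-name string: names separated (and surrounded) by newlines.
const oleFuncNames = "\nCoCreateInstance\nCoGetClassObject\n"

// procID packs three 16-bit fields, following the layout listed above:
// bits 0-15 cache index, bits 16-31 name start, bits 32-47 name end (exclusive).
type procID uint64

// makeProcID builds the packed value; a code generator would emit the
// result as a constant instead of calling this at run time.
func makeProcID(index, start, end int) procID {
	return procID(index) | procID(start)<<16 | procID(end)<<32
}

// index returns the slot in the array of function pointers.
func (id procID) index() int {
	return int(id & 0xFFFF)
}

// name slices the function name out of the multi-name string.
// Slicing a Go string does not copy the bytes.
func (id procID) name(names string) string {
	start := int((id >> 16) & 0xFFFF)
	end := int((id >> 32) & 0xFFFF)
	return names[start:end]
}

func main() {
	coCreateInstance := makeProcID(0, 1, 17)
	coGetClassObject := makeProcID(1, 18, 34)

	fmt.Println(coCreateInstance.index(), coCreateInstance.name(oleFuncNames))
	fmt.Println(coGetClassObject.index(), coGetClassObject.name(oleFuncNames))
}
```

Because each field is 16 bits wide, this layout limits a DLL to 65536 functions and the multi-name string to 64 KB.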

This technique requires code generation. It would be too difficult to write those numbers manually.
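A generator for these constants can be quite small. The following is a sketch under stated assumptions, not the generator windigo uses: the function generate, the newline separator, and the _PROC_OLE type name passed in main are all choices made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// generate returns the multi-name string and one const line per function.
// Each line encodes: cache index | name start << 16 | name end << 32.
func generate(procType string, names []string) (packed string, consts []string) {
	var sb strings.Builder
	for i, name := range names {
		sb.WriteByte('\n') // separator; keeps the generated string readable
		start := sb.Len()
		sb.WriteString(name)
		end := sb.Len() // one past the last byte of the name
		consts = append(consts, fmt.Sprintf(
			"_PROC_%s %s = %d | (%d << 16) | (%d << 32)",
			name, procType, i, start, end))
	}
	sb.WriteByte('\n')
	return sb.String(), consts
}

func main() {
	packed, consts := generate("_PROC_OLE", []string{
		"CoCreateInstance",
		"CoGetClassObject",
	})
	fmt.Printf("names: %q\n", packed)
	fmt.Println("const (")
	for _, c := range consts {
		fmt.Println("\t" + c)
	}
	fmt.Println(")")
}
```

A real generator would also check that the index and both offsets fit in 16 bits before emitting a line.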

References

This technique is used in https://github.com/rodrigocfd/windigo, a Go library of Win32 bindings. See e.g. https://github.com/rodrigocfd/windigo/blob/master/internal/dll/dll_gdi.go
