The Go team wrote the golang.org/x/sys/windows package to call functions in Windows DLLs.
Its approach is memory-inefficient, and this article describes a better way.
The sys/windows way
To call a function in a DLL, let’s say kernel32.dll, we must:
- load the DLL into memory with LoadLibrary
- get the address of a function in the DLL
- call the function at that address
Here’s how it looks when you use the sys/windows library:
var (
	libole32         *windows.LazyDLL
	coCreateInstance *windows.LazyProc
)

func init() {
	libole32 = windows.NewLazySystemDLL("ole32.dll")
	coCreateInstance = libole32.NewProc("CoCreateInstance")
}

func CoCreateInstance(rclsid *GUID, pUnkOuter *IUnknown, dwClsContext uint32, riid *GUID, ppv *unsafe.Pointer) HRESULT {
	// SyscallN takes the arguments directly: no argument count, no zero padding.
	ret, _, _ := syscall.SyscallN(coCreateInstance.Addr(),
		uintptr(unsafe.Pointer(rclsid)),
		uintptr(unsafe.Pointer(pUnkOuter)),
		uintptr(dwClsContext),
		uintptr(unsafe.Pointer(riid)),
		uintptr(unsafe.Pointer(ppv)),
	)
	return HRESULT(ret)
}
The problem
The problem is that this wastes memory.
For every function, all we need is:
- the name of the function, used to look up its address in the DLL. That is a string, so it’s 8 bytes (address of the string) + 8 bytes (size of the string) + the content of the string.
- the address of the function, which is 8 bytes on a 64-bit CPU
Unfortunately in sys/windows each function requires this:
type LazyProc struct {
	Name string
	mu   sync.Mutex
	l    *LazyDLL
	proc *Proc
}

type Proc struct {
	Dll  *DLL
	Name string
	addr uintptr
}

// sync.Mutex
type Mutex struct {
	_  noCopy
	mu isync.Mutex
}

// isync.Mutex
type Mutex struct {
	state int32
	sema  uint32
}
Let’s eyeball the size of all those structures:
- LazyProc: 16 + sizeof(Mutex) + 8 + 8 = 32 + sizeof(Mutex)
- Proc: 8 + 16 + 8 = 32
- Mutex: 8
Total: 32 + 32 + 8 = 72 bytes, and that’s not counting possible memory padding for allocations.
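The estimate can be checked with unsafe.Sizeof. The sketch below is not the sys/windows code: lazyProc, proc, lazyDLL and dll are hypothetical mirrors that only copy the field lists quoted above, and the helper names are made up for the example. On a 64-bit platform the sizes should come out as 40 + 32 = 72 bytes.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// wordSize is the size of a pointer: 8 on 64-bit platforms.
const wordSize = unsafe.Sizeof(uintptr(0))

// Hypothetical mirrors of the sys/windows types, declared only to measure
// their layout. The field lists follow the definitions quoted above.
type dll struct{}

type lazyDLL struct{}

type proc struct {
	Dll  *dll
	Name string
	addr uintptr
}

type lazyProc struct {
	Name string
	mu   sync.Mutex
	l    *lazyDLL
	proc *proc
}

// lazyProcBytes returns the size of one lazyProc value.
func lazyProcBytes() uintptr { return unsafe.Sizeof(lazyProc{}) }

// procBytes returns the size of one proc value.
func procBytes() uintptr { return unsafe.Sizeof(proc{}) }

// perFunctionBytes is the combined cost of one lazyProc plus the proc it
// points to, not counting the name text or allocator padding.
func perFunctionBytes() uintptr { return lazyProcBytes() + procBytes() }

func main() {
	fmt.Println("pointer size:", wordSize)
	fmt.Println("lazyProc:    ", lazyProcBytes())
	fmt.Println("proc:        ", procBytes())
	fmt.Println("per function:", perFunctionBytes())
}
```

Compare that with the 8 bytes per function that a bare uintptr needs.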
Windows has a lot of functions so this adds up.
Additionally, at startup we call NewProc for every function, even if the program never uses it. This increases startup time.
The better way
What we ultimately need is a uintptr for the address of the function. It’ll be lazily looked up.
Let’s say we use 8 functions from ole32.dll. We can use a single array of uintptr values for storing function pointers:
var oleFuncPtrs [8]uintptr
var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... }

// indexes into oleFuncPtrs and oleFuncNames
const kCoCreateInstance = 0
const kCoGetClassObject = 1
// etc.

// sentinel address stored in oleFuncPtrs after a failed lookup (not an index)
const kFuncMissing = 1

func funcAddrInDLL(dll *windows.LazyDLL, funcPtrs []uintptr, funcIdx int, funcNames []string) uintptr {
	addr := funcPtrs[funcIdx]
	if addr == kFuncMissing {
		// we already tried to look it up and didn't find it
		// this can happen because older versions of Windows might not implement this function
		return 0
	}
	if addr != 0 {
		return addr
	}
	// look up the function by name in the dll
	name := funcNames[funcIdx]
	/// ...
	return addr
}
In real life this would need protection against concurrent access, e.g. with a mutex.
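One way to get that protection without adding per-function state is to read and write the slots with sync/atomic instead of a mutex. This is a sketch of that alternative, not the article's code: lazyAddr and funcMissing are names made up for the example, and resolve is a hypothetical stand-in for the real lookup (such as GetProcAddress) so that the example runs on any platform. Two goroutines may race to resolve the same function, but both store the same value, so the race is harmless.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// funcMissing marks a slot whose lookup already failed. 1 is never a valid
// function address, so it can serve as a sentinel.
const funcMissing uintptr = 1

// lazyAddr returns the cached address for names[idx], resolving it on first
// use. resolve is a hypothetical stand-in for the real lookup and must
// return 0 when the function does not exist.
func lazyAddr(ptrs []uintptr, names []string, idx int, resolve func(string) uintptr) uintptr {
	addr := atomic.LoadUintptr(&ptrs[idx])
	if addr == funcMissing {
		return 0 // an earlier lookup failed; do not try again
	}
	if addr != 0 {
		return addr // already resolved
	}
	addr = resolve(names[idx])
	if addr == 0 {
		atomic.StoreUintptr(&ptrs[idx], funcMissing)
		return 0
	}
	atomic.StoreUintptr(&ptrs[idx], addr)
	return addr
}

func main() {
	names := []string{"CoCreateInstance", "NotARealFunction"}
	ptrs := make([]uintptr, len(names))
	lookups := 0
	// Fake resolver: knows only CoCreateInstance and returns a made-up address.
	resolve := func(name string) uintptr {
		lookups++
		if name == "CoCreateInstance" {
			return 0x1000
		}
		return 0
	}
	fmt.Println(lazyAddr(ptrs, names, 0, resolve)) // resolved on first use
	fmt.Println(lazyAddr(ptrs, names, 0, resolve)) // served from the cache
	fmt.Println(lazyAddr(ptrs, names, 1, resolve)) // not found
	fmt.Println(lazyAddr(ptrs, names, 1, resolve)) // not found, no second lookup
	fmt.Println("lookups:", lookups)
}
```

A single mutex shared by all functions of a DLL would also work and still costs far less than one mutex per function.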
Saving on strings
The following is not efficient:
var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... }
In addition to the text of the string, Go needs 16 bytes for each string: 8 for a pointer to the string and 8 for the size of the string.
We can be more efficient by storing all names as a single string:
var oleFuncNames = `
CoCreateInstance
CoGetClassObject
`
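To put rough numbers on the saving, the sketch below compares the two layouts for the same names. The helper names are made up for the example, and the figures are estimates that count only headers and text, ignoring how the compiler and linker actually place the data.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// wordSize is the size of a pointer: 8 on 64-bit platforms.
const wordSize = unsafe.Sizeof(uintptr(0))

// sliceOfStringsBytes estimates the memory used by a []string holding names:
// one slice header, one string header per name, plus the text itself.
func sliceOfStringsBytes(names []string) uintptr {
	total := unsafe.Sizeof(names)
	for _, n := range names {
		total += unsafe.Sizeof(n) + uintptr(len(n))
	}
	return total
}

// singleStringBytes estimates the memory used when the same names are joined
// into one newline-separated string: one string header plus the text.
func singleStringBytes(names []string) uintptr {
	joined := strings.Join(names, "\n")
	return unsafe.Sizeof(joined) + uintptr(len(joined))
}

func main() {
	names := []string{"CoCreateInstance", "CoGetClassObject"}
	fmt.Println("pointer size: ", wordSize)
	fmt.Println("[]string:     ", sliceOfStringsBytes(names), "bytes")
	fmt.Println("single string:", singleStringBytes(names), "bytes")
}
```

The gap grows with every name added, because each extra name costs a full string header in the slice but only one separator byte in the joined string.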
Only when we look up a function by name do we need to construct a temporary string that is a slice of oleFuncNames.
We need to know where the name starts and ends inside oleFuncNames, which we can cleverly encode as a single number:
// Auto-generated shell procedure identifier: cache index | str start | str past-end.
const (
	_PROC_SHCreateItemFromIDList      _PROC_SHELL = 0 | (9 << 16) | (31 << 32)
	_PROC_SHCreateItemFromParsingName _PROC_SHELL = 1 | (32 << 16) | (59 << 32)
	// ...
)
The number has three 16-bit fields:
- bits 0-15: index of function in array of function pointers
- bits 16-31: start of function name in multi-name string
- bits 32-47: end of function name in multi-name string
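Decoding such a number takes a few shifts and masks. The following is a minimal sketch under assumed names: procID, its methods, the two constants and the layout of oleFuncNames are all made up for the example and do not come from any library. Slicing the string does not copy the text, so the temporary name costs no allocation.

```go
package main

import "fmt"

// procID packs three 16-bit fields: cache index, name start, name past-end.
type procID uint64

// oleFuncNames holds all names in one string. The layout is made up for this
// example: each name is followed by a newline.
const oleFuncNames = "CoCreateInstance\nCoGetClassObject\n"

// Hand-computed for the string above; real code would generate these.
const (
	procCoCreateInstance procID = 0 | (0 << 16) | (16 << 32)
	procCoGetClassObject procID = 1 | (17 << 16) | (33 << 32)
)

// index returns the slot in the function-pointer array (bits 0-15).
func (id procID) index() int { return int(id & 0xFFFF) }

// name slices the function name out of names using the start offset
// (bits 16-31) and the past-end offset (bits 32-47).
func (id procID) name(names string) string {
	start := int((id >> 16) & 0xFFFF)
	end := int((id >> 32) & 0xFFFF)
	return names[start:end]
}

func main() {
	for _, id := range []procID{procCoCreateInstance, procCoGetClassObject} {
		fmt.Println(id.index(), id.name(oleFuncNames))
	}
}
```

With 16 bits per field, this encoding supports up to 65536 functions per table and a names string of up to 65535 bytes.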
This technique requires code generation. It would be too difficult to write those numbers manually.
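A generator for this can be small. The sketch below is one possible shape, not the tool used to produce the constants above: generate, the proc prefix and the newline separator are assumptions made for the example. It joins the names, records where each one starts and ends, and prints Go source for the string and the packed constants.

```go
package main

import (
	"fmt"
	"strings"
)

// generate joins names into one string (each name followed by a newline) and
// returns, for each name, the packed value index | start<<16 | pastEnd<<32.
// It fails if an index or offset does not fit in 16 bits.
func generate(names []string) (string, []uint64, error) {
	var sb strings.Builder
	ids := make([]uint64, 0, len(names))
	for i, name := range names {
		start := sb.Len()
		sb.WriteString(name)
		end := sb.Len()
		sb.WriteByte('\n')
		if i > 0xFFFF || end > 0xFFFF {
			return "", nil, fmt.Errorf("%q does not fit in 16-bit fields", name)
		}
		ids = append(ids, uint64(i)|uint64(start)<<16|uint64(end)<<32)
	}
	return sb.String(), ids, nil
}

func main() {
	names := []string{"CoCreateInstance", "CoGetClassObject"}
	joined, ids, err := generate(names)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Emit Go source: the joined string, then one constant per function.
	fmt.Printf("const oleFuncNames = %q\n", joined)
	fmt.Println("const (")
	for i, name := range names {
		id := ids[i]
		fmt.Printf("\tproc%s = %d | (%d << 16) | (%d << 32)\n",
			name, id&0xFFFF, (id>>16)&0xFFFF, (id>>32)&0xFFFF)
	}
	fmt.Println(")")
}
```

In a real project this would typically run through go generate and write its output to a file rather than to standard output.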