The following is based on a true story. The names have not been changed to protect the innocent.
I’m working on Filerion, a web-based file manager for online storage like Dropbox, OneDrive, Google Drive, S3, etc.
Speed is my top priority. The app must be fast.
This is a story of how I optimized the core data structures for speed and low memory use.
How to optimize
There are 2 ways to optimize:
- measure (i.e. profile), learn what is slow, and optimize that
- optimize based on general knowledge of how computers work
In this story I used the latter method: I designed things based on knowing how computers work and how JavaScript engines work.
It’s all about memory
In modern processors the slow part is not the CPU but memory:
Modern processors typically have a clock cycle of 0.5ns while accesses to main memory are 50ns or more. Thus, an access to main memory is very expensive, over 100 clock cycles.
Additionally, memory is hierarchical with multi-level caches (L1, L2, L3). Accessing L1 cache memory can be 50x faster than accessing main memory.
Here are some numbers:
0.5 ns - CPU L1 dCACHE reference
1 ns - speed-of-light (a photon) travels a 1 ft (30.5cm) distance
5 ns - CPU L1 iCACHE Branch mispredict
7 ns - CPU L2 CACHE reference
71 ns - CPU cross-QPI/NUMA best case on XEON E5-46*
100 ns - MUTEX lock/unlock
100 ns - own DDR MEMORY reference
135 ns - CPU cross-QPI/NUMA best case on XEON E7-*
202 ns - CPU cross-QPI/NUMA worst case on XEON E7-*
325 ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
10,000 ns - Compress 1K bytes with Zippy PROCESS
20,000 ns - Send 2K bytes over 1 Gbps NETWORK
250,000 ns - Read 1 MB sequentially from MEMORY
500,000 ns - Round trip within a same DataCenter
10,000,000 ns - DISK seek
10,000,000 ns - Read 1 MB sequentially from NETWORK
30,000,000 ns - Read 1 MB sequentially from DISK
150,000,000 ns - Send a NETWORK packet CA -> Netherlands
(scale: 1,000 ns = 1 us, 1,000,000 ns = 1 ms)
All you need to know: to go fast, pack things tightly in memory.
And also garbage collection
JavaScript is a memory-managed language. You don’t have to manually free the memory.
The price for that is that periodically the runtime has to scan all allocated objects to figure out which ones can be freed.
The time it takes to do the scan is at least linear with the number of allocated objects.
All you need to know: to go fast, allocate fewer objects.
Ok, that’s two things you need to know.
Optimizing FSEntry
Filerion is a file manager. It shows information about files and directories in a list.
Instances of the class representing files and directories are the most frequently allocated objects.
For a small number of objects an efficient representation doesn’t matter. But if we have millions of objects, small savings add up.
Here’s the naïve implementation of FSEntry:
class FSEntry {
  id = "";
  name = "";
  size = 0;
  lastMod; // Date
  extra; // {} with arbitrary per-storage metadata
  isDirectory = false;
  entries = null; // [] of FSEntry for directories, null for files
}
Name, size, and last modification date are obvious.
Some storage systems allow duplicate names and have additional unique ids for the files.
We also support arbitrary additional file metadata, which depends on the storage system. For example, some storage systems store md5 and sha1 hashes or the mime type of the file. We store them in a plain JavaScript object extra.
For directories, entries is a list of FSEntry objects representing files and directories in that directory.
For files, entries will always be null. Therefore the isDirectory flag is redundant. We can replace it with a getter:
get isDirectory() {
  return this.entries != null;
}
We saved 8 bytes per object.
A digression on JavaScript value representation
JavaScript has a small set of types: number, string, array, object.
JavaScript engines have converged on a similar, optimized representation of values: undefined, null, and number each take 8 bytes.
A string is an 8-byte pointer + the UTF-8 encoded value of the string.
An array is an 8-byte pointer + an internal array representation + 8 bytes for each value in the array.
An object (plain object, class instance) is an 8-byte pointer + an internal object representation.
Using arrays instead of objects
A plain JavaScript object is a hash table that maps keys to values. If you store 2 values, you need at least 4 8-byte pointers (2 keys and 2 values) plus more data for the internal management of the hash table. Due to how hash tables work, the keys and the values are somewhat scattered in memory.
When you store 2 values in an array, you only use 2 8-byte values and they are tightly packed in memory, one after another.
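To make this concrete, here’s the same pair of values in both representations (illustrative values only):
let asObject = { name: "a.txt", size: 100 }; // hash table: keys and values, scattered in memory
let asArray = ["a.txt", 100]; // 2 slots, tightly packed; positions known by convention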
JavaScript engines are very good at optimizing classes with a known, fixed layout, so the internal representation of FSEntry could be as optimized as storing fields in an array, if it wasn’t for the extra object. It has to be allocated, will probably not be optimized, and there’s the overhead of representing a hash table.
Let’s store all known properties in an array:
const idxId = 0;
const idxName = 1;
const idxSize = 2;
const idxLastMod = 3;
const idxEntries = 4;
class FSEntry {
  meta; // []
  get id() { return this.meta[idxId]; }
  set id(v) { this.meta[idxId] = v; }
  // name, size, lastMod, entries implemented same as id
}
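A quick usage sketch (my example data; the array layout follows the idx* constants above):
let e = new FSEntry();
e.meta = ["id1", "notes.txt", 1024, Date.now(), null]; // id, name, size, lastMod, entries
console.log(e.name); // "notes.txt"
e.size = 2048; // goes through the setter into meta[idxSize]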
Storing extra key / value pairs in an array is simple: we store each pair as 2 array entries.
A naïve implementation would be: each prop is itself a 2-value array, i.e. meta = [knownValues..., [prop1Name, prop1Value], [prop2Name, prop2Value], ...]
This is not optimal. Instead of storing 2 8-byte values, we store an 8-byte pointer to an array, the internal representation of the array (at least 3 8-byte values) and 2 8-byte values. So instead of 2 8-byte values we need at least 6 8-byte values, 3x more.
Additionally, instead of the values being tightly packed together they are now allocated in a separate piece of memory, who knows how far away from the array.
So we store them as a flattened array: meta = [knownValues..., prop1Name, prop1Value, prop2Name, prop2Value, ...]
const idxFirstProp = 5;
class FSEntry {
  addProp(key, val) {
    this.meta.push(key, val);
  }
  getProp(key) {
    // a linear search over the flattened key / value pairs
    for (let i = idxFirstProp; i < this.meta.length; i += 2) {
      if (this.meta[i] === key) {
        return this.meta[i + 1];
      }
    }
    return undefined;
  }
}
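Continuing the usage sketch from above:
e.addProp("mimeType", "image/png");
console.log(e.getProp("mimeType")); // "image/png"
console.log(e.getProp("md5")); // undefined, was never set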
A class that is an Array
JavaScript is one of those languages where every non-primitive type is also an Object.
An Array is an Object, and therefore we can have a class that extends Array:
class FSEntry extends Array {
  // no more meta! We are meta, which is soo meta
  get id() { return this[idxId]; }
  set id(v) { this[idxId] = v; }
}
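A minimal usage sketch (note that the Array constructor treats a single number as the initial length):
let e = new FSEntry(idxFirstProp); // pre-sized for the 5 known fields
e.id = "id1";
e[idxName] = "notes.txt"; // or via a name setter analogous to id
console.log(Array.isArray(e)); // true: one object that is both an FSEntry and an Array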
Instead of allocating 2 objects, an FSEntry class instance and a meta array, we get it down to just one object, which is an array.
That saves 8 bytes for a pointer plus the internal representation of an object.
We save memory and create fewer objects, which also reduces time spent in garbage collection.
What if we have FSEntry data in an Array, e.g. it came as JSON from a backend server?
To create an FSEntry object we would have to create a new Array / FSEntry and copy the data. That seems wasteful.
JavaScript has a way to do this without a copy:
let a = new Array(5);
Object.setPrototypeOf(a, FSEntry.prototype); // a is now an FSEntry, no new object allocated
The reason it works is too complicated to explain here, but we changed the identity of a from Array to FSEntry without creating a new object.
I use this technique to convert data coming from the backend as JSON plain objects into instances of a class with the same properties.
This is a dangerous operation so use it sparingly.
Also, the docs warn about its performance, so I don’t know if it’s faster than creating a new object.
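Here’s a minimal sketch of that conversion, with a hypothetical FileInfo class for illustration:
class FileInfo {
  get sizeKB() { return this.size / 1024; }
}
let o = JSON.parse('{"name":"a.txt","size":2048}'); // a plain object
Object.setPrototypeOf(o, FileInfo.prototype); // now a FileInfo, without copying
console.log(o.sizeKB); // 2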
Big array vs. many small arrays
To represent a million file system entries, we would need a million small arrays.
It would be better to use just one array. We can do that, because every programming problem can be solved with a layer of indirection:
class FSEntryPool {
  entries = []; // flattened values of all entries
  entryIdx = []; // where each entry starts in entries
}
The idea is:
- the entries array conceptually stores many arrays representing FSEntry objects, in a flattened representation
- we need to know where the information about a given entry is: where it starts in the entries array and how many elements it has. Naively this requires 2 numbers. However, we only really need 1: entries are stored consecutively, so we can calculate the size of item n as entryIdx[n+1] - entryIdx[n]
An FSEntry is now represented as a number: an index into entryIdx, which in turn gives us its starting index into entries.
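A worked example of the layout, with made-up data for two entries where the second has one extra prop:
// entries:  ["id1", "a.txt", 100, mod1, null,  "id2", "b.txt", 200, mod2, null, "mimeType", "text/plain"]
// entryIdx: [0, 5]
// entry 0 starts at entries[0], size: entryIdx[1] - entryIdx[0] = 5
// entry 1 starts at entries[5], size: entries.length - entryIdx[1] = 7 (5 known fields + 1 key / value pair)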
Let’s implement creating a new entry:
class FSEntryPool {
  allocEntry(values) {
    // the new entry's id is its position in entryIdx
    let id = this.entryIdx.length;
    // its values start at the current end of entries
    let idx = this.entries.length;
    this.entryIdx.push(idx);
    this.entries.push(...values);
    return id;
  }
}
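Usage sketch:
let pool = new FSEntryPool();
let fileId = pool.allocEntry(["id1", "a.txt", 100, Date.now(), null]);
let dirId = pool.allocEntry(["id2", "docs", 0, Date.now(), []]);
// fileId === 0, dirId === 1: an entry is now just a number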
To get a property of an FSEntry we need the id representing the entry and a reference to the FSEntryPool:
class FSEntryPool {
  getName(entryId) {
    let idx = this.entryIdx[entryId];
    return this.entries[idx + idxName];
  }
}
For props we need to know the number of elements of an entry:
class FSEntryPool {
  entrySize(entryId) {
    let idx = this.entryIdx[entryId];
    let nextIdx = this.entries.length;
    if (entryId < this.entryIdx.length - 1) {
      nextIdx = this.entryIdx[entryId + 1];
    }
    return nextIdx - idx;
  }
}
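With entrySize we can also look up props, analogous to getProp earlier. A sketch, assuming the same flattened layout:
class FSEntryPool {
  getProp(entryId, key) {
    let idx = this.entryIdx[entryId];
    let end = idx + this.entrySize(entryId);
    // linear search over the flattened key / value pairs
    for (let i = idx + idxFirstProp; i < end; i += 2) {
      if (this.entries[i] === key) {
        return this.entries[i + 1];
      }
    }
    return undefined;
  }
}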
Limitations
There are two big limitations of this approach:
- entries cannot be extended with more properties after creation. It’s not a problem for our use case
- there is no way to free entries. That’s also not a big problem: it doesn’t happen often, so the savings should more than compensate for the occasional waste
Combating wasted space with array of arrays
Pretty much every implementation of expandable arrays works by re-allocating the array with the new size being 1.5x - 2x the previous size.
This has good average behavior but has 2 deficiencies. Assuming expanding by a factor of 2x:
- we touch 2x the amount of memory actually used for data. A re-allocation requires copying the data from the previous array to the new, bigger array
- in the worst case, we waste 50% of space, on average 25% (I think)
We can solve that with another layer of indirection.
Instead of one gigantic array that expands, we could store data in an array of fixed-size arrays.
Let’s say we pick 1024 elements as the size of a fixed array.
When we create each such array, we can pre-allocate it at its full size. That’s more efficient than starting small and expanding by re-allocating the array several times.
This fixes both deficiencies:
- we no longer need to re-allocate any memory, which is faster
- the worst case of wasted space is an empty array of 1024 elements
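Here’s a minimal sketch of such a chunked array, assuming a chunk size of 1024 as above:
const kChunkSize = 1024;
class ChunkedArray {
  chunks = []; // array of fixed-size chunks
  length = 0;
  push(v) {
    let chunk = Math.floor(this.length / kChunkSize);
    if (chunk === this.chunks.length) {
      // pre-allocate a full chunk; it's never re-allocated or copied
      this.chunks.push(new Array(kChunkSize));
    }
    this.chunks[chunk][this.length % kChunkSize] = v;
    this.length++;
  }
  at(i) {
    return this.chunks[Math.floor(i / kChunkSize)][i % kChunkSize];
  }
}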
Conclusion
A large number of programmers believe in “premature optimization is the root of all evil”.
This is premature optimization. Is it the root of evil?
The argument against optimization is: it will make the code harder to read and create more bugs.
That is certainly true. The most optimized version, the one that uses an array of arrays, is indeed more complicated, harder to implement, and harder to use than the naïve version.
To me, “is it harder” is not the question.
The question is: how much harder? Do performance benefits justify additional complexity?
That is a case-by-case determination.
In this particular case the implementation effort was a day or two. That’s more than manageable.
I wouldn’t use those techniques for every possible object, but using them for the single most frequently allocated object seems justified.