The following is based on a true story. The names have not been changed to protect the innocent.
I’m working on Filerion, a web-based file manager for online storage like Dropbox, OneDrive, Google Drive, S3, etc.
Speed is my top priority. The app must be fast.
This is a story of how I optimized the core data structures for speed and low memory use.
How to optimize
There are 2 ways to optimize:
- measure (i.e. profile), learn what is slow, and optimize that
- optimize based on general knowledge of how computers work
In this story I used the latter method: I designed things based on knowing how computers work and how JavaScript engines work.
It’s all about memory
In modern processors the slow part is not the CPU but memory:
Modern processors typically have a clock cycle of 0.5ns while accesses to main memory are 50ns or more. Thus, an access to main memory is very expensive, over 100 clock cycles.
Additionally, memory is hierarchical with multi-level caches (L1, L2, L3). Accessing L1 cache memory can be 50x faster than accessing main memory.
Here are some numbers:
0.5 ns - CPU L1 dCACHE reference
1 ns - speed-of-light (a photon) travels a 1 ft (30.5cm) distance
5 ns - CPU L1 iCACHE Branch mispredict
7 ns - CPU L2 CACHE reference
71 ns - CPU cross-QPI/NUMA best case on XEON E5-46*
100 ns - MUTEX lock/unlock
100 ns - own DDR MEMORY reference
135 ns - CPU cross-QPI/NUMA best case on XEON E7-*
202 ns - CPU cross-QPI/NUMA worst case on XEON E7-*
325 ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
10,000 ns - Compress 1K bytes with Zippy PROCESS
20,000 ns - Send 2K bytes over 1 Gbps NETWORK
250,000 ns - Read 1 MB sequentially from MEMORY
500,000 ns - Round trip within a same DataCenter
10,000,000 ns - DISK seek
10,000,000 ns - Read 1 MB sequentially from NETWORK
30,000,000 ns - Read 1 MB sequentially from DISK
150,000,000 ns - Send a NETWORK packet CA -> Netherlands
(scale: 1,000 ns = 1 us, 1,000,000 ns = 1 ms)
All you need to know: to go fast, pack things tightly in memory.
And also garbage collection
JavaScript is a memory-managed language. You don’t have to manually free the memory.
The price for that is that periodically the runtime has to scan all allocated objects to figure out which ones can be freed.
The time it takes to do the scan is at least linear with the number of allocated objects.
All you need to know: to go fast, allocate fewer objects.
Ok, that’s two things you need to know.
Optimizing FSEntry
Filerion is a file manager. It shows information about files and directories in a list.
Instances of the class representing files and directories are the most frequently allocated objects.
For a small number of objects an efficient representation doesn’t matter. But if we have millions of objects, small savings add up.
Here’s the naïve implementation of FSEntry:
class FSEntry {
  id = "";
  name = "";
  size = 0;
  lastMod; // Date
  extra; // {} with arbitrary per-storage metadata
  isDirectory = false;
  entries = null; // [] of FSEntry for directories, null for files
}
Name, size, and last modification date are obvious.
Some storage systems allow duplicate names and have additional unique ids for the files.
We also support arbitrary additional file metadata, which depends on the storage system. For example, some storage systems store md5 and sha1 hashes or the mime type of the file. We store them in a plain JavaScript object extra.
For directories, entries is a list of FSEntry objects representing files and directories in that directory.
For files, entries will always be null. Therefore the isDirectory flag is redundant. We can replace it with a getter:
get isDirectory() {
  return this.entries != null;
}
We saved 8 bytes per object.
A digression on JavaScript value representation
JavaScript has a small set of types: number, string, array, object.
JavaScript engines have converged on a similar, optimized representation of values: undefined, null, and number each take 8 bytes.
A string is an 8-byte pointer + the UTF-8 encoded value of the string.
An array is an 8-byte pointer + an internal array representation + 8 bytes for each value in the array.
An object (plain object, class instance) is an 8-byte pointer + an internal object representation.
Using arrays instead of objects
A plain JavaScript object is a hash table that maps keys to values. If you store 2 values, you need at least 4 8-byte pointers (2 keys and 2 values) plus more data for the internal management of the hash table. Due to how hash tables work, the keys and the values are somewhat scattered in memory.
When you store 2 values in an array, you only use 2 8-byte values and they are tightly packed in memory, one after another.
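To make this concrete, here’s the same pair of values in both representations (illustrative values only):
let asObject = { name: "a.txt", size: 100 }; // hash table: keys and values, scattered in memory
let asArray = ["a.txt", 100]; // 2 slots, tightly packed; positions known by convention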
JavaScript engines are very good at optimizing classes with a known, fixed layout, so the internal representation of FSEntry could be as optimized as storing fields in an array, if it wasn’t for the extra object. It has to be allocated, will probably not be optimized, and there’s the overhead of representing a hash table.
Let’s store all known properties in an array:
const idxId = 0;
const idxName = 1;
const idxSize = 2;
const idxLastMod = 3;
const idxEntries = 4;
class FSEntry {
  meta; // []
  get id() { return this.meta[idxId]; }
  set id(v) { this.meta[idxId] = v; }
  // name, size, lastMod, entries implemented same as id
}
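A quick usage sketch (my example data; the array layout follows the idx* constants above):
let e = new FSEntry();
e.meta = ["id1", "notes.txt", 1024, Date.now(), null]; // id, name, size, lastMod, entries
console.log(e.name); // "notes.txt"
e.size = 2048; // goes through the setter into meta[idxSize]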
Storing extra key / value pairs in an array is simple: we store each pair as 2 array entries.
A naïve implementation would be: each prop is itself a 2-value array, i.e. meta = [knownValues..., [prop1Name, prop1Value], [prop2Name, prop2Value], ...]
This is not optimal. Instead of storing 2 8-byte values, we store an 8-byte pointer to an array, the internal representation of the array (at least 3 8-byte values) and 2 8-byte values. So instead of 2 8-byte values we need at least 6 8-byte values, 3x more.
Additionally, instead of the values being tightly packed together they are now allocated in a separate piece of memory, who knows how far away from the array.
So we store them as a flattened array: meta = [knownValues..., prop1Name, prop1Value, prop2Name, prop2Value, ...]
const idxFirstProp = 5;
class FSEntry {
  addProp(key, val) {
    this.meta.push(key, val);
  }
  getProp(key) {
    // a linear search over the flattened key / value pairs
    for (let i = idxFirstProp; i < this.meta.length; i += 2) {
      if (this.meta[i] === key) {
        return this.meta[i + 1];
      }
    }
    return undefined;
  }
}
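Continuing the usage sketch from above:
e.addProp("mimeType", "image/png");
console.log(e.getProp("mimeType")); // "image/png"
console.log(e.getProp("md5")); // undefined, was never set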
A class that is an Array
JavaScript is one of those languages where every non-primitive type is also an Object.
An Array is an Object, and therefore we can have a class that extends Array:
class FSEntry extends Array {
  // no more meta! We are meta, which is soo meta
  get id() { return this[idxId]; }
  set id(v) { this[idxId] = v; }
}
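A minimal usage sketch (note that the Array constructor treats a single number as the initial length):
let e = new FSEntry(idxFirstProp); // pre-sized for the 5 known fields
e.id = "id1";
e[idxName] = "notes.txt"; // or via a name setter analogous to id
console.log(Array.isArray(e)); // true: one object that is both an FSEntry and an Array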
Instead of allocating 2 objects, an FSEntry class instance and a meta array, we get it down to just one object, which is an array.
That saves 8 bytes for a pointer plus the internal representation of an object.
We save memory and create fewer objects, which also reduces time spent in garbage collection.
What if we have FSEntry data in an Array, e.g. it came as JSON from a backend server?
To create an FSEntry object we would have to create a new Array / FSEntry and copy the data. That seems wasteful.
JavaScript has a way to do this without a copy:
let a = new Array(5);
Object.setPrototypeOf(a, FSEntry.prototype); // a is now an FSEntry, no new object allocated
The reason it works is too complicated to explain here, but we changed the identity of a from Array to FSEntry without creating a new object.
I use this technique to convert data coming from the backend as JSON plain objects into instances of a class with the same properties.
This is a dangerous operation so use it sparingly.
Also, the docs warn about its performance, so I don’t know if it’s faster than creating a new object.
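Here’s a minimal sketch of that conversion, with a hypothetical FileInfo class for illustration:
class FileInfo {
  get sizeKB() { return this.size / 1024; }
}
let o = JSON.parse('{"name":"a.txt","size":2048}'); // a plain object
Object.setPrototypeOf(o, FileInfo.prototype); // now a FileInfo, without copying
console.log(o.sizeKB); // 2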
Big array vs. many small arrays
To represent a million file system entries, we would need a million small arrays.
It would be better to use just one array. We can do that, because every programming problem can be solved with a layer of indirection:
class FSEntryPool {
  entries = []; // flattened values of all entries
  entryIdx = []; // where each entry starts in entries
}
The idea is:
- the entries array conceptually stores many arrays representing FSEntry objects, in a flattened representation
- we need to know where the information about a given entry is: where it starts in the entries array and how many elements it has. Naively this requires 2 numbers. However, we only really need 1: entries are stored consecutively, so we can calculate the size of item n as entryIdx[n+1] - entryIdx[n]
An FSEntry is now represented as a number: an index into entryIdx, which in turn gives us its starting index into entries.
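A worked example of the layout, with made-up data for two entries where the second has one extra prop:
// entries:  ["id1", "a.txt", 100, mod1, null,  "id2", "b.txt", 200, mod2, null, "mimeType", "text/plain"]
// entryIdx: [0, 5]
// entry 0 starts at entries[0], size: entryIdx[1] - entryIdx[0] = 5
// entry 1 starts at entries[5], size: entries.length - entryIdx[1] = 7 (5 known fields + 1 key / value pair)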
Let’s implement creating a new entry:
class FSEntryPool {
  allocEntry(values) {
    // the new entry's id is its position in entryIdx
    let id = this.entryIdx.length;
    // its values start at the current end of entries
    let idx = this.entries.length;
    this.entryIdx.push(idx);
    this.entries.push(...values);
    return id;
  }
}
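Usage sketch:
let pool = new FSEntryPool();
let fileId = pool.allocEntry(["id1", "a.txt", 100, Date.now(), null]);
let dirId = pool.allocEntry(["id2", "docs", 0, Date.now(), []]);
// fileId === 0, dirId === 1: an entry is now just a number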
To get a property of an FSEntry we need the id representing the entry and a reference to the FSEntryPool:
class FSEntryPool {
  getName(entryId) {
    let idx = this.entryIdx[entryId];
    return this.entries[idx + idxName];
  }
}
For props we need to know the number of elements of an entry:
class FSEntryPool {
  entrySize(entryId) {
    let idx = this.entryIdx[entryId];
    let nextIdx = this.entries.length;
    if (entryId < this.entryIdx.length - 1) {
      nextIdx = this.entryIdx[entryId + 1];
    }
    return nextIdx - idx;
  }
}
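With entrySize we can also look up props, analogous to getProp earlier. A sketch, assuming the same flattened layout:
class FSEntryPool {
  getProp(entryId, key) {
    let idx = this.entryIdx[entryId];
    let end = idx + this.entrySize(entryId);
    // linear search over the flattened key / value pairs
    for (let i = idx + idxFirstProp; i < end; i += 2) {
      if (this.entries[i] === key) {
        return this.entries[i + 1];
      }
    }
    return undefined;
  }
}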
Limitations
There are two big limitations of this approach:
- entries cannot be extended with more properties after creation. It’s not a problem for our use case
- there is no way to free entries. That’s also not a big problem: it doesn’t happen often, so the savings should more than compensate for the occasional waste
Combating wasted space with array of arrays
Pretty much every implementation of expandable arrays works by re-allocating the array with the new size being 1.5x - 2x the previous size.
This has good average behavior but has 2 deficiencies. Assuming expanding by a factor of 2x:
- we touch 2x the amount of memory actually used for data. A re-allocation requires copying the data from the previous array to the new, bigger array
- in the worst case, we waste 50% of space, on average 25% (I think)
We can solve that with another layer of indirection.
Instead of one gigantic array that expands, we could store data in an array of fixed-size arrays.
Let’s say we pick 1024 elements as the size of a fixed array.
When we create each such array, we can pre-allocate it at its full size. That’s more efficient than starting small and expanding by re-allocating the array several times.
This fixes both deficiencies:
- we no longer need to re-allocate any memory, which is faster
- the worst case of wasted space is an empty array of 1024 elements
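Here’s a minimal sketch of such a chunked array, assuming a chunk size of 1024 as above:
const kChunkSize = 1024;
class ChunkedArray {
  chunks = []; // array of fixed-size chunks
  length = 0;
  push(v) {
    let chunk = Math.floor(this.length / kChunkSize);
    if (chunk === this.chunks.length) {
      // pre-allocate a full chunk; it's never re-allocated or copied
      this.chunks.push(new Array(kChunkSize));
    }
    this.chunks[chunk][this.length % kChunkSize] = v;
    this.length++;
  }
  at(i) {
    return this.chunks[Math.floor(i / kChunkSize)][i % kChunkSize];
  }
}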
Conclusion
A large number of programmers believe in “premature optimization is the root of all evil”.
This is premature optimization. Is it the root of evil?
The argument against optimization is: it will make the code harder to read and create more bugs.
That is certainly true. The most optimized version, the one that uses an array of arrays, is indeed more complicated, harder to implement, and harder to use than the naïve version.
To me, “is it harder” is not the question.
The question is: how much harder? Do performance benefits justify additional complexity?
That is a case-by-case determination.
In this particular case the implementation effort was a day or two. That’s more than manageable.
I wouldn’t use those techniques for every possible object, but using them for the single most frequently allocated object seems justified.