Speeding up JavaScript function with AI help
A new JavaScript library, pretext, for fast text measuring / layout popped up on social media.
Potentially interesting: it focuses on speeding up text rendering in web apps, and I write web apps and like them fast.
I looked at the code and saw a function isCJK(). With my 3 decades of programming and performance optimization experience, it looked like it could be sped up.
This is a story about ideas on making JavaScript faster and the process of quickly implementing and benchmarking them.

The code

export function isCJK(s: string): boolean {
  for (const ch of s) {
    const c = ch.codePointAt(0)!
    if ((c >= 0x4E00 && c <= 0x9FFF) ||
        (c >= 0x3400 && c <= 0x4DBF) ||
        (c >= 0x20000 && c <= 0x2A6DF) ||
        (c >= 0x2A700 && c <= 0x2B73F) ||
        (c >= 0x2B740 && c <= 0x2B81F) ||
        (c >= 0x2B820 && c <= 0x2CEAF) ||
        (c >= 0x2CEB0 && c <= 0x2EBEF) ||
        (c >= 0x30000 && c <= 0x3134F) ||
        (c >= 0xF900 && c <= 0xFAFF) ||
        (c >= 0x2F800 && c <= 0x2FA1F) ||
        (c >= 0x3000 && c <= 0x303F) ||
        (c >= 0x3040 && c <= 0x309F) ||
        (c >= 0x30A0 && c <= 0x30FF) ||
        (c >= 0xAC00 && c <= 0xD7AF) ||
        (c >= 0xFF00 && c <= 0xFFEF)) {
      return true
    }
  }
  return false
}

My spider sense tingling

To make code run fast you have to have mechanical sympathy. You need a good mental model of how CPUs and programming languages work, at the low level.
Because I have mechanical sympathy, I know that to evaluate a chain of || conditions, the program has to check each one until it finds one that is true.
For a character that doesn't match any range, all 15 comparisons have to run.
My immediate thought was that most characters are ASCII (non-CJK) and therefore pay for all 15 comparisons.
An early exit should speed things up: if (c < 0x3000) continue;. It adds 1 comparison for matching characters but removes 14 (15 - 1) for the most common case: ASCII characters.
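A minimal sketch of that early-exit variant (my reconstruction, not the library's code; the ranges are copied from the original isCJK above, and 0x3000 is the start of the lowest range):

```javascript
// Early-exit variant: bail out of all 15 range checks as soon as the
// code point is below 0x3000, which covers all ASCII and most latin text.
function isCJKEarly(s) {
  for (const ch of s) {
    const c = ch.codePointAt(0);
    if (c < 0x3000) continue; // the common non-CJK case: 1 comparison
    if ((c >= 0x4E00 && c <= 0x9FFF) ||
        (c >= 0x3400 && c <= 0x4DBF) ||
        (c >= 0x20000 && c <= 0x2A6DF) ||
        (c >= 0x2A700 && c <= 0x2B73F) ||
        (c >= 0x2B740 && c <= 0x2B81F) ||
        (c >= 0x2B820 && c <= 0x2CEAF) ||
        (c >= 0x2CEB0 && c <= 0x2EBEF) ||
        (c >= 0x30000 && c <= 0x3134F) ||
        (c >= 0xF900 && c <= 0xFAFF) ||
        (c >= 0x2F800 && c <= 0x2FA1F) ||
        (c >= 0x3000 && c <= 0x303F) ||
        (c >= 0x3040 && c <= 0x309F) ||
        (c >= 0x30A0 && c <= 0x30FF) ||
        (c >= 0xAC00 && c <= 0xD7AF) ||
        (c >= 0xFF00 && c <= 0xFFEF)) {
      return true;
    }
  }
  return false;
}
```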

Hunch is good, benchmark is better

Micro-benchmarking is subtle. In the past I would google for a JavaScript benchmarking library and trust that it’s sound.
Claude wrote a sound, simple benchmark function:
function bench(name, fn, iterations = 1_000_000) {
  for (let i = 0; i < 10_000; i++) fn(); // warmup
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsed = performance.now() - start;
  const opsPerSec = ((iterations / elapsed) * 1000).toFixed(0);
  console.log(`${name}: ${elapsed.toFixed(2)} ms (${opsPerSec} ops/sec)`);
}

// to call:
bench("isCJKRegex       ", () => isCJKRegex(input));
The subtle part is the warmup phase.
The JavaScript JIT engines you're likely to use (V8 in Chrome, JavaScriptCore in Safari, SpiderMonkey in Firefox) have several optimization tiers and only apply the most aggressive optimizations to frequently called functions.
The warmup phase forces the JIT to compile fn at its most aggressive tier.
This is only fair: I assume isCJK is called frequently. If it weren't, there would be no point sweating to optimize it.

The new AI normal

In the past I would have to manually implement each optimization idea.
With AI I can delegate the coding and therefore test more ideas.
The following was done with Claude 4.6.

100% vibe coding

There are levels of delegating things to AI. On the extreme side of giving up control is 100% vibe coding.
I had 2 ideas for performance optimization.
But what if I didn't? What if I just asked AI to come up with ideas?
The following is my conversation with Claude 4.6 and what I learned.
I created is_cjk.js file and pasted the original function.
look at is_cjk.js. come up with 2 ideas for how to speed up isCJK(). Implement those as separate functions and write a javascript micro-benchmark that compares those 2 new implementations against isCJK
Two ideas:
  1. Pre-compiled regex — push the entire check into the native regex engine, avoiding JS-level for…of iteration and codePointAt() overhead.
  2. Typed array bitmap for BMP — for BMP characters (the vast majority of real text), a Uint8Array bitmap gives O(1) lookup with a single array access. Fall back to range checks only for supplementary plane characters.
I have to admit: I recognize those as valid optimization ideas but they didn’t immediately occur to me.
Regex works because regex implementations in JavaScript engines are insanely optimized.
Bitmap is a known optimization for doing true/false lookups on numbers. Let's say you have a total range of 0-255 and you want a function that checks if a number is in range 1-10, 20-30, 40-50, 60-70, or 80-90. That's 5 if statements, each doing 2 comparisons.
You can instead use 1 bit for each of the 256 numbers (i.e. 256/8 = 32 bytes), pre-compute the array of bytes and set the bits for the ranges.
Then a lookup is just math to get the byte offset and bit, load the byte, check the bit. Super fast.
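The 0-255 toy example above can be sketched like this (my illustration of the technique, not Claude's actual bitmap code):

```javascript
// Build a 256-bit bitmap (32 bytes) once, marking every number
// that falls inside one of the ranges.
const ranges = [[1, 10], [20, 30], [40, 50], [60, 70], [80, 90]];
const bitmap = new Uint8Array(32); // 256 / 8 = 32 bytes
for (const [lo, hi] of ranges) {
  for (let n = lo; n <= hi; n++) {
    bitmap[n >> 3] |= 1 << (n & 7); // byte n/8, bit n%8
  }
}

// Lookup: one array load plus shift and mask, instead of 5 range checks.
function inRanges(n) {
  return (bitmap[n >> 3] & (1 << (n & 7))) !== 0;
}
```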
Claude not only wrote the benchmark function and the 2 new implementations, but also ran the benchmarks using node and showed me the results in a readable way.
The bitmap approach was significantly faster.

Human assisted coding

Bitmap was the fastest and yet I discarded it using my human judgement.
The bitmap needed 65kB of memory and that’s just too much for this function.
That’s where human judgement comes in to improve collaboration with AI.
I didn’t specify that I also care about memory usage so AI wrote fast code that was using too much memory.

You’re benchmarking what?

Here’s another tricky part of benchmarking: knowing what you benchmark.
There are 3 popular JavaScript engines. Claude ran the benchmarks using node, which uses Chrome's V8.
That's good: Chrome is the most popular browser, so V8 is the most popular JavaScript JIT engine.
It’s also good to sanity-check with at least one other JavaScript engine. bun uses Safari’s JavaScriptCore so I asked Claude:
run the tests using bun
The results were similar, which is good. We don’t want code that is fast in one engine but slow in another. We don’t control which browser the user of our code is running.

Continuing collaboration with AI

I had some more ideas so here’s the rest of my conversation with Claude.
don’t benchmark bitmap, it uses too much memory; write a variant of isCJK that does an early false exit if char is less than smallest
Early exit is my first insight I wanted to test.
As expected, the early exit gives a good 2x speedup for latin strings, although regex beats it on CJK strings.
That is the curse of engineering trade-offs: you have to decide if you want to optimize for cjk strings or non-cjk strings.
now implement a variant that does inline bisect / binary search
That was my second idea: do a binary search instead of sequential if comparisons.
It was faster than original but slower than regex / early exit.
Also, Claude didn't do what I meant. It stored the ranges as an array:
// Flat array: [lo0, hi0, lo1, hi1, ...] sorted by lo
const cjkRanges = new Uint32Array([
  0x3000, 0x303f,
  0x3040, 0x309f,
  ...
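Filled in, that array-based bisect might look roughly like this (my sketch with only 5 of the 15 ranges shown, and inCJKBisect is a name I made up):

```javascript
// Flat array: [lo0, hi0, lo1, hi1, ...], non-overlapping, sorted by lo.
const cjkRanges = new Uint32Array([
  0x3000, 0x303F,
  0x3040, 0x309F,
  0x30A0, 0x30FF,
  0x3400, 0x4DBF,
  0x4E00, 0x9FFF,
]);

// Binary search over the ranges: O(log n) comparisons instead of
// checking every range sequentially.
function inCJKBisect(c) {
  let lo = 0;
  let hi = cjkRanges.length / 2 - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (c < cjkRanges[mid * 2]) {
      hi = mid - 1; // c is below this range
    } else if (c > cjkRanges[mid * 2 + 1]) {
      lo = mid + 1; // c is above this range
    } else {
      return true; // lo <= c <= hi for this range
    }
  }
  return false;
}
```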
So I told it what I actually meant:
implement a variant isCJBisect that doesn’t use array but unrolls binary search logic into if statements
The implementation is gnarly:
function inCJKUnrolled(c) {
  if (c < 0x3000) return false;
  if (c < 0xff00) {
    if (c < 0x3400) {
      if (c < 0x3040) {
        return c <= 0x303f;
      } else if (c > 0x309f) {
        return c >= 0x30a0 && c <= 0x30ff;
      }
      return true;
  // ... more gnarly code
But what do I care? Claude wrote it. Arguably it's hard for me to review for correctness, but it's perfectly testable. We can write a test that checks it returns the same result as the original function for every possible input.
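Such an exhaustive test is cheap to write. A sketch (verify is a name I made up; you'd pass it the original isCJK and the new variant, e.g. imported from is_cjk.js):

```javascript
// Compare two implementations over every Unicode code point.
// Throws on the first mismatch, returns true if they fully agree.
function verify(original, candidate) {
  for (let cp = 0; cp <= 0x10FFFF; cp++) {
    if (cp >= 0xD800 && cp <= 0xDFFF) continue; // skip lone surrogates
    const s = String.fromCodePoint(cp);
    if (original(s) !== candidate(s)) {
      throw new Error(`mismatch at U+${cp.toString(16).toUpperCase()}`);
    }
  }
  return true;
}
```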
As expected, unrolled version was faster than array-based but not always faster than regex or early exit.
But Claude wrote something that caught my eye:
The for…of + codePointAt iterator overhead remains the bottleneck shared by all non-regex variants on long strings.
It's well known that a for…of loop is slower than for (let i = 0; i < n; i++). It's one of those things you just need to know.
I knew this but I didn’t connect the dots in this particular case.
write variant of isCJKEarly which replaces for of with a faster loop
Result:
isCJKEarlyFast crushes everything across the board on both runtimes
The code for isCJKEarlyFast.
Looks like we have a winner:
Replacing for…of + codePointAt() with charCodeAt() + index loop eliminates the iterator overhead, which was the main bottleneck. The early c < 0x3000 bail then skips all range checks for ASCII/latin characters cheaply. Only regex still wins on Node’s long latin miss case thanks to V8’s native string scanning.
There's a subtlety that makes the rewrite of for…of into for (let i = 0; i < n; i++) tricky.
Logically, JavaScript strings are sequences of 16-bit numbers (UTF-16 code units). Some Unicode code points are bigger than 16 bits and are stored as two 16-bit units, called a surrogate pair.
So a loop over Unicode code points needs to take this into account.
It’s tricky code but again, what do I care? Claude wrote it and it’s perfectly testable.
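To show the kind of code involved, here's my sketch of a surrogate-aware charCodeAt() loop (not the actual isCJKEarlyFast from the gist; inCJK below uses a condensed approximation of the ranges for brevity):

```javascript
// Condensed, slightly widened approximation of the CJK ranges,
// just so this sketch is self-contained.
function inCJK(c) {
  return (c >= 0x3000 && c <= 0x30FF) ||
         (c >= 0x3400 && c <= 0x9FFF) ||
         (c >= 0xAC00 && c <= 0xD7AF) ||
         (c >= 0xF900 && c <= 0xFAFF) ||
         (c >= 0xFF00 && c <= 0xFFEF) ||
         (c >= 0x20000 && c <= 0x3134F);
}

// Index loop over 16-bit code units. A high surrogate (0xD800-0xDBFF)
// followed by a low surrogate (0xDC00-0xDFFF) is decoded manually into
// one code point above 0xFFFF.
function isCJKEarlyFastSketch(s) {
  const n = s.length;
  for (let i = 0; i < n; i++) {
    let c = s.charCodeAt(i);
    if (c < 0x3000) continue; // early exit for ASCII / latin
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        i++; // consumed two 16-bit units
      }
    }
    if (inCJK(c)) return true;
  }
  return false;
}
```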

Pressing on

The good thing about Claude writing most of the code and removing other tedium, like running the benchmarks, is that we can do more in the same amount of time.
So I explored even more.
I noticed:
export function isCJKUnrolled(s) {
  for (const ch of s) {
    if (inCJKUnrolled(ch.codePointAt(0))) return true;
  }
  return false;
}
What happens when inCJKUnrolled is inlined?
inline inCJKUnrolled into isCJKUnrolled
Result:
Done. Inlining didn’t help — it actually got slightly slower in most cases (the engine was already inlining the small function). No change in rankings; EarlyFast remains the winner.
Turns out it’s not faster, which was surprising. Inlining is supposed to remove overhead of function calls, which matters in loops.
Surprising isn’t bad. We run benchmarks to validate our hunches.
For completeness, I wanted to compare with the original where for…of was rewritten.
also write original with for of replaced by standard for loop and bench all and give results as markdown table
Claude was nice enough to show benchmark results in a nicely formatted table, without me even asking. It just knows things.
But as you can see above, you can ask it for results in markdown table to e.g. include in a GitHub bug report.
Here are final results. The number is operations/second in millions. Higher is better.

Bun

| Scenario | Original | Regex | Early | Bisect | Unrolled | EarlyFast | ForLoop |
|---|---|---|---|---|---|---|---|
| Single CJK | 11M | 22M | 12M | 12M | 16M | 79M | 57M |
| Single latin | 20M | 24M | 49M | 21M | 22M | 40M | 36M |
| CJK string | 12M | 18M | 17M | 12M | 14M | 42M | 42M |
| Latin string | 1.5M | 4.7M | 3.6M | 3.0M | 3.5M | 17M | 6.5M |
| Mixed string | 6.5M | 5.9M | 9.9M | 8.6M | 13M | 62M | 34M |

Node

| Scenario | Original | Regex | Early | Bisect | Unrolled | EarlyFast | ForLoop |
|---|---|---|---|---|---|---|---|
| Single CJK | 85M | 54M | 51M | 44M | 56M | 112M | 110M |
| Single latin | 59M | 72M | 83M | 55M | 74M | 115M | 103M |
| CJK string | 79M | 55M | 60M | 55M | 64M | 90M | 129M |
| Latin string | 4.2M | 57M | 4.3M | 2.8M | 4.5M | 12M | 8.3M |
| Mixed string | 14M | 15M | 16M | 10M | 15M | 41M | 38M |

Conclusions

AI is a big unlock. It took me under 30 minutes to test various hypotheses and find a significant speedup.
Without Claude it would take several hours and I would likely not do it at all. It’s just not important enough to spend a working day on it.
To get best results we still need to apply human judgement and guide the AI.

Programming expert knowledge

An expert is simply someone who knows things. We know things because we learn them.
If you were paying attention, you might have learned the following things:
- in a chain of || conditions, an early exit for the common case can skip most of the comparisons
- a bitmap trades memory for O(1) true/false lookups
- regex implementations in JavaScript engines are insanely optimized
- for…of + codePointAt() is slower than an index loop with charCodeAt()
- JavaScript strings are UTF-16, so code points above 0xFFFF are stored as surrogate pairs
- micro-benchmarks need a warmup phase and should be sanity-checked on more than one JavaScript engine

Resources

All the code is in https://gist.github.com/kjk/bdbea9d90c3bb0454fbe26353c521bfd
I like to write fast code.
If you want a fast bookmark manager / note taker, try MarkLexis.
If you want a fast PDF / ebook / comic book reader for Windows, try SumatraPDF.
#javascript #programming #optimization
Mar 29 2026

