Building wc in the browser
From time to time I like to run wc -l
on my source code to see how much code I wrote.
For those not in the know: wc -l
shows the number of lines in files.
Actually, what I have to do is more like find -name "*.go" | xargs wc -l
because wc
isn’t particularly good at handling directories.
I just want to see the number of lines in all my source files, man. I don’t want to google the syntax of find
and xargs
for the hundredth time.
After learning about
File System API I decided to write a tool that does just that as a web app. No need to install software.
Here’s how it sees itself:
The rest of this article describes how I would have done it if I did it.
Building software quickly
It only took me 3 days, which is a testament to how productive the web platform can be.
My weapons of choice are:
For a small project Svelte and Tailwind CSS are arguably overkill. I used them because I standardized on that toolset. Standardization allows me to re-use prior experience and sometimes even code.
Why those technologies?
Svelte is React without the bloat. Try it and you’ll love it.
Tailwind CSS is CSS but more productive. You have to try it to believe it.
JSDoc is a happy medium between no types at all and TypeScript. I have great internal resistance to switching to TypeScript. Maybe 5 years from now.
And none of that would be possible without browser APIs that allow access to files on your computer. Which Firefox doesn’t implement because they are happy to lose market share to browsers that implement useful features. Clearly
$3 million a year is not enough to buy yourself a CEO with understanding of the obvious.
Implementation tidbits
Getting list of files
To get a recursive listing of files in a directory use showDirectoryPicker
to get a FileSystemDirectoryHandle
. Call dirHandle.values()
to get a list of directory entries. Recurse if an entry is a directory.
Not all browsers
support that API. To detect if it works:
/**
* @returns {boolean}
*/
export function isIFrame() {
let isIFrame = false;
try {
// in iframe, those are different
isIFrame = window.self !== window.top;
} catch {
// do nothing
}
return isIFrame;
}
/**
* @returns {boolean}
*/
export function supportsFileSystem() {
return "showDirectoryPicker" in window && !isIFrame();
}
Because people on Hacker News always complain about slow, bloated software I took pains to make my code fast. One of those pains was using an array instead of an object to represent a file system entry.
Wait, now HN people will complain that I’m optimizing prematurely.
Listen buddy, Steve Wozniak wrote assembly in hex and he liked it. In comparison, optimizing memory layout of most frequently used object in JavaScript is like drinking champagne on Jeff Bezos’ yacht.
Here’s a JavaScript trick for optimizing the memory layout of objects with a fixed number of fields: derive your class from Array.
Deriving a class from an Array
A little-known thing about JavaScript is that an Array is just an object, so you can derive your class from it and add methods, getters and setters.
You get a compact layout of an array and convenience of accessors.
Here’s the sketch of how I implemented FsEntry
object:
// a directory tree. each element is a file or a directory,
// stored in an array with the same 6 slots:
// [handle, parentHandle, size, path, dirEntries, meta]
// handle is a file or dir handle, parentHandle is the parent dir handle,
// dirEntries is only meaningful for directories
// meta is an extra slot for the caller to stick additional data
// without the need to re-allocate the array
// if you need more than 1 value, use an object
const handleIdx = 0;
const parentHandleIdx = 1;
const sizeIdx = 2;
const pathIdx = 3;
const dirEntriesIdx = 4;
const metaIdx = 5;
export class FsEntry extends Array {
get size() {
return this[sizeIdx];
}
// ... rest of the accessors
}
We have 6 slots in the array and we can access them as e.g. entry[sizeIdx]. We can hide this implementation detail behind a getter like size, shown above, so callers write entry.size instead.
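To make the idea concrete, here is a minimal, self-contained sketch of the same trick. It is not the real FsEntry: the Entry class, its three slots and the create() helper are made up for illustration.

```javascript
// Hypothetical 3-slot entry; the slot names are for illustration only.
const nameIdx = 0;
const sizeIdx = 1;
const metaIdx = 2;

class Entry extends Array {
  // Factory instead of a custom constructor, so Array's own
  // constructor behavior stays untouched.
  static create(name, size) {
    const e = new Entry(3); // same as Array(3): length 3, slots filled below
    e[nameIdx] = name;
    e[sizeIdx] = size;
    e[metaIdx] = null;
    return e;
  }
  get name() {
    return this[nameIdx];
  }
  get size() {
    return this[sizeIdx];
  }
  set size(n) {
    this[sizeIdx] = n;
  }
}

const e = Entry.create("main.go", 120);
e.size = 130; // goes through the setter, writes slot 1
console.log(e.name, e.size, e.length, Array.isArray(e)); // main.go 130 3 true
```

Callers only see e.name and e.size, while the data still lives in a plain array.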
Reading a directory recursively
Once you get FileSystemDirectoryHandle
by using window.showDirectoryPicker()
you can read the content of the directory.
Here’s one way to implement recursive read of directory:
/**
* @param {FileSystemDirectoryHandle} dirHandle
* @param {Function} skipEntryFn
* @param {string} dir
* @returns {Promise<FsEntry>}
*/
export async function readDirRecur(
dirHandle,
skipEntryFn = dontSkip,
dir = dirHandle.name
) {
/** @type {FsEntry[]} */
let entries = [];
// @ts-ignore
for await (const handle of dirHandle.values()) {
if (skipEntryFn(handle, dir)) {
continue;
}
const path = dir == "" ? handle.name : `${dir}/${handle.name}`;
if (handle.kind === "file") {
let e = await FsEntry.fromHandle(handle, dirHandle, path);
entries.push(e);
} else if (handle.kind === "directory") {
let e = await readDirRecur(handle, skipEntryFn, path);
e.path = path;
entries.push(e);
}
}
let res = new FsEntry(dirHandle, null, dir);
res.dirEntries = entries;
return res;
}
Function skipEntryFn
is called for every entry and allows the caller to decide to not include a given entry. You can, for example, skip a directory like .git
.
It can also be used to show progress of reading the directory to the user, as it happens asynchronously.
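As a rough illustration, a skip function that does both jobs might look like this. makeSkipFn, skippedDirs and the onProgress callback are hypothetical names, not part of the app’s code. The only things taken from the function above are the (handle, dir) call signature and the rule that returning true skips the entry.

```javascript
// Directories we never want to descend into (hypothetical list).
const skippedDirs = new Set([".git", "node_modules"]);

// Returns a function usable as skipEntryFn. onProgress is called
// for every entry seen, skipped or not.
function makeSkipFn(onProgress) {
  let nSeen = 0;
  return function skipEntry(handle, dir) {
    nSeen++;
    onProgress(nSeen, dir);
    return handle.kind === "directory" && skippedDirs.has(handle.name);
  };
}

// Plain objects stand in for real FileSystemHandle values here.
const progress = [];
const skipFn = makeSkipFn((n, dir) => progress.push(`${n}: ${dir}`));
console.log(skipFn({ kind: "directory", name: ".git" }, "myrepo")); // true
console.log(skipFn({ kind: "file", name: "main.go" }, "myrepo")); // false
```

In the browser you would pass it as the second argument: readDirRecur(dirHandle, skipFn).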
Showing the files
I use tables and I’m not ashamed.
It’s still the best technology to display, well, a table of values where cells are sized to content and columns are aligned.
Flexbox doesn’t remember anything across rows so it can’t align columns.
Grid can lay out things properly but I haven’t found a way to easily highlight the whole row when the mouse is over it. With CSS you can only target individual cells in a grid, not rows.
With table I just style <tr class="hover:bg-gray-100">
. That’s Tailwind speak for: on mouse hover set background color to light gray.
A folder can contain other folders so we need recursive components to implement it. Svelte supports that with <svelte:self>
.
I implemented it as a tree view where you can expand folders to see their content.
It’s one big table for everything but I needed to indent each expanded folder to make it look like a tree.
It was a bit tricky. I went with indent
property in my Folder
component. Starts with 0
and goes +1
for each level of nesting.
Then I style the first file name column as <td class="ind-{indent}">...</td>
and use those CSS styles:
<style>
:global(.ind-1) {
padding-left: 0.5rem;
}
:global(.ind-2) {
padding-left: 1rem;
}
/* ... up to .ind-17 */
Except it goes to .ind-17
. Yes, if you have deeper nesting, it won’t show correctly. I’ll wait for a bug report before increasing it further.
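An alternative that avoids the fixed list of classes would be to compute the padding from indent and set it as an inline style. This is a sketch, not what the app does: indentStyle and remPerLevel are hypothetical names, and the 0.5rem step is taken from the CSS above.

```javascript
// 0.5rem per nesting level, matching .ind-1 and .ind-2 above.
const remPerLevel = 0.5;

// Returns an inline style string for a given nesting level.
function indentStyle(indent) {
  return `padding-left: ${indent * remPerLevel}rem;`;
}

console.log(indentStyle(1)); // padding-left: 0.5rem;
console.log(indentStyle(20)); // padding-left: 10rem;
```

In a Svelte template that would be used as <td style={indentStyle(indent)}>, at the cost of mixing inline styles with Tailwind classes.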
Calculating line count
You can get the size of a file by calling getFile() on its FileSystemFileHandle and reading the size property of the returned File.
For source code I want to see number of lines. It’s quite trivial to calculate:
/**
* @param {Blob} f
* @returns {Promise<number>}
*/
export async function lineCount(f) {
if (f.size === 0) {
// empty files have no lines
return 0;
}
let ab = await f.arrayBuffer();
let a = new Uint8Array(ab);
let nLines = 0;
// if last character is not newline, we must add +1 to line count
let toAdd = 0;
for (let b of a) {
// line endings are:
// CR (13) LF (10) : windows
// LF (10) : unix
// CR (13) : mac
// mac is very rare so we just count 10 as they count
// windows and unix lines
if (b === 10) {
toAdd = 0;
nLines++;
} else {
toAdd = 1;
}
}
return nLines + toAdd;
}
It doesn’t handle Mac files that use CR for newlines. It’s ok to write buggy code as long as you document it.
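If CR-only files ever turn out to matter, a variant along these lines could count all three line ending styles. This is a sketch under assumptions, not the code the app uses: countLines is a hypothetical name and it takes the Uint8Array directly so it can be tried outside the browser.

```javascript
/**
 * Counts lines in a byte array, treating LF, CR LF and a lone CR
 * as line endings. A last line without a line ending still counts.
 * @param {Uint8Array} bytes
 * @returns {number}
 */
function countLines(bytes) {
  let nLines = 0;
  // true for empty input, so empty files have 0 lines
  let endsWithNewline = true;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 10) {
      nLines++;
      endsWithNewline = true;
    } else if (b === 13) {
      // for CR LF the line is counted when we reach the LF
      if (bytes[i + 1] !== 10) {
        nLines++;
      }
      endsWithNewline = true;
    } else {
      endsWithNewline = false;
    }
  }
  return endsWithNewline ? nLines : nLines + 1;
}

const enc = new TextEncoder();
console.log(countLines(enc.encode("a\nb"))); // 2
console.log(countLines(enc.encode("a\r\nb\r\n"))); // 2
console.log(countLines(enc.encode("a\rb\r"))); // 2
```

The price is an extra comparison per byte and a peek at the next byte for every CR.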
I also skip known binary files (.png
, .exe
etc.) and known “not mine” directories like .git
and node_modules
.
Small considerations like that matter.
Remembering opened directories
I typically use it many times on the same directories and it’s a pain to pick the same directory over and over again.
Asking for permissions
When it comes to accessing files and directories on disk you can’t ask for forgiveness, you have to ask for permission.
The user grants permissions in window.showDirectoryPicker()
and browser remembers them for a while, but they expire quite quickly.
You need to re-check and re-ask for permission to FileSystemFileHandle
and FileSystemDirectoryHandle
:
export async function verifyHandlePermission(fileHandle, readWrite) {
const options = {};
if (readWrite) {
options.mode = "readwrite";
}
// Check if permission was already granted. If so, return true.
if ((await fileHandle.queryPermission(options)) === "granted") {
return true;
}
// Request permission. If the user grants permission, return true.
if ((await fileHandle.requestPermission(options)) === "granted") {
return true;
}
// The user didn't grant permission, so return false.
return false;
}
If permissions did not expire, it’s a no-op. If they did, the browser will show a dialog asking for permissions.
If you ask for write permissions, Chrome will show 2 confirmation dialogs vs. 1 for read-only access.
I start with read-only access and, if needed, ask again to get write (or delete) permission.
Important note: the requestPermission() call inside verifyHandlePermission() only works during a user-initiated operation (the spec calls this transient user activation). Separately, the whole API is limited to secure contexts, i.e. pages served over HTTPS or from localhost.
An example of a user-initiated operation is clicking a button. Code executed in an onclick handler runs with user activation.
For wc
it’s not an issue because every filesystem operation starts as initiated by user via click.
If that’s not the case you’ll have to write gnarly code to force UI interaction from the user when permissions are not granted. For example: you show a dialog with a button the user needs to click to re-acquire permissions to the file or folder and then re-do the operation.
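A minimal sketch of that flow could look like the code below. withPermission and waitForUserClick are hypothetical: waitForUserClick stands for whatever UI you build (say, a dialog with a button) and must return a promise that resolves from the click handler. The sketch assumes the click’s user activation is still valid when requestPermission() runs right after the promise resolves.

```javascript
/**
 * Runs operation(handle), re-acquiring read permission first if needed.
 * @param {FileSystemHandle} handle
 * @param {Function} operation async function taking the handle
 * @param {Function} waitForUserClick hypothetical; resolves on a user click
 */
async function withPermission(handle, operation, waitForUserClick) {
  const options = { mode: "read" };
  // queryPermission() never shows UI, so it is safe to call any time
  if ((await handle.queryPermission(options)) !== "granted") {
    // requestPermission() needs a user gesture, so get a click first
    await waitForUserClick();
    if ((await handle.requestPermission(options)) !== "granted") {
      throw new Error("permission denied");
    }
  }
  return operation(handle);
}
```

When permission is still granted, the user never sees the extra dialog.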
Deleting files and directories
Deleting files has nothing to do with showing line counts but it was easy to implement and useful, so I added it.
You need to remember FileSystemDirectoryHandle
for the parent directory.
To delete a file: parentDirHandle.removeEntry("foo.txt")
To delete a directory: parentDirHandle.removeEntry("node_modules", {recursive: true})
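Both cases fit in one small helper. This is a sketch: deleteEntry and its isDirectory parameter are hypothetical names, and it assumes the parent handle already has readwrite permission.

```javascript
/**
 * Deletes a file or directory that lives directly in parentDirHandle.
 * @param {FileSystemDirectoryHandle} parentDirHandle
 * @param {string} name
 * @param {boolean} isDirectory
 */
async function deleteEntry(parentDirHandle, name, isDirectory) {
  // recursive: true is what allows removing a non-empty directory
  await parentDirHandle.removeEntry(name, { recursive: isDirectory });
}
```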
Getting bit by a multi-threading bug
JavaScript doesn’t have multiple threads so you can’t have all those nasty multi-threading bugs? Right? Right?
Yes and no.
Async is not multi-threading but it does create non-obvious execution flows.
I had a bug: I noticed that some .txt
files were showing line count of 0 even though they clearly did have lines.
I went bug hunting.
I checked the lineCount
function. Seems ok.
I added console.log()
, I stepped through the code. Time went by and my frustration level was reaching DEFCON 1.
You see, JavaScript has async where some code can interleave with some other code. The browser can splice those async “threads” with UI code.
No threads means there are no data races, i.e. writing memory values that another thread is in the middle of reading.
But we do have non-obvious execution flows.
Here’s how my code worked:
- get a list of files (async)
- show the files in UI
- calculate line counts for all files (async)
- update UI to show line counts after we get them all
Async is great for users: calculating line counts could take a long time as we need to read all those files.
If this process wasn’t async it would block the UI.
Thanks to async there are enough checkpoints for the browser to process UI events in between processing files.
The issue was that the function that calculates line counts was using the array I got from reading a directory.
I passed the same array to Folder
component to show the files. And I sorted the array to show files in human friendly order.
In JavaScript, sort() mutates the array in place, and that array was only partially processed by the line counting function.
If the sequence of events was unfortunate enough, I would skip some files in line counting. They would be re-sorted to a position that line counting had already passed.
Result: no lines for you!
A happy ending and an easy fix: Folder
makes a copy of the array so sorting doesn’t affect the line counting process.
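The bug is easy to reproduce in a few lines. This is a stripped-down sketch, not the app’s code: fakeLineCount, countAll and demo are made-up names and a zero-delay timer stands in for reading a file.

```javascript
// Stand-in for reading a file and counting its lines.
const fakeLineCount = (name) =>
  new Promise((resolve) => setTimeout(() => resolve(1), 0));

// Walks the array by index, yielding to other code on every await.
async function countAll(files) {
  const counts = new Map();
  for (let i = 0; i < files.length; i++) {
    const name = files[i];
    counts.set(name, await fakeLineCount(name));
  }
  return counts;
}

async function demo(copyBeforeSort) {
  const files = ["c.txt", "b.txt", "a.txt"];
  // counting starts, picks up files[0], then suspends on its first await
  const pending = countAll(files);
  // the "UI" sorts while counting is suspended
  const shown = copyBeforeSort ? [...files] : files;
  shown.sort();
  const counts = await pending;
  return counts.size; // how many distinct files got a line count
}

demo(false).then((n) => console.log("shared array:", n)); // shared array: 2
demo(true).then((n) => console.log("copied array:", n)); // copied array: 3
```

With the shared array, c.txt is counted twice and a.txt never, because the sort moved a.txt to an index the loop had already passed.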
The future
No software is ever finished but I arrived at a point where it does the majority of the job I wanted so I shipped it.
There is a feature I would find useful: statistics for each extension.
How many lines in .go
files vs. .js
files etc.?
But I’m holding off implementing it until:
- I really, really want it
- I get feature requests from people who really, really want it