JSDOM is awesome, but it’s slow at pretty much everything, except repeated
querySelectorAll, which is a “not so interesting” use case to me.
It’s also something downloaded 14M times per week, according to npm stats, making it the most battle-tested, non-browser, DOM module available: hard to compete there.
There are alternatives too, such as basicHTML, but it never really had momentum and, after my latest tests, I’m glad it didn’t: it’s surely faster than JSDOM, but it fails at pretty much everything, being born to solve mostly viperHTML and heresy-ssr use cases only.
Nothing really scales though
We use JSDOM to test/cover a lot of code, and with small chunks of HTML it’s kinda OK, and performance is not something you’d think of as an issue.
Oddly enough, the w3c.org page produces errors in JSDOM:
But what caught my attention was that parsing a ~32K document took ~145ms. OK, maybe it’s because there are parsing errors, so let’s try to parse the living DOM standard specification page instead, shall we?
And there we go: ~1.9 seconds to parse a ~2.7M document … and let’s remember that even if the page is served compressed, JSDOM needs to read it as plain text, so inflating remote sites would only add time to the equation.
But let’s look further: just parsing this page increases the amount of RAM 10x, and simply crawling childNodes or children twice keeps increasing the heap, which almost doubles after cloning the
<html> node of such a document, an operation that took ~1.2 seconds. At least there are zero errors, and yet … is this kind of performance really acceptable?
I went ahead and checked against basicHTML:
OK, spending 1/4th of the time to parse makes sense, as basicHTML doesn’t nearly implement all the standards JSDOM provides, but less than half the heap, for a library that pre-defines all childNodes and children arrays to excel in performance with template literal tag based libraries, looks great:
- cheaper on parsing
- faster on crawling
- lighter on the heap
- … something is off with the childNodes count though … but that’s a secondary gotcha … keep reading
“OK then, I’ll just keep using basicHTML and call it a day”, but then I did another benchmark, this time with a ~12MB plain document that is the living HTML standard, linked here as the multipage version, because it’s insane to have such a heavy page as their main entry point!
Things are less nice here, no matter where you look:
JSDOM took over 9 seconds to parse, used 1G of heap memory for a 12M document, went up to 1.5G after crawling, and it completely crashed on an attempt to deep clone the
<html> node. On the other hand, basicHTML kept its heap at 1/3rd of the one used by JSDOM, but it failed at everything else … and how useful is a tool that crashes or fails at everything?
Is that all we have in these days of ever-growing desire for SSR in NodeJS?
A battle-tested module able to crash the program, or one so greedy in optimizing stuff that it fails through its own inner optimizations?
Key takeaways here:
- 1/9th of JSDOM time to parse the same document
- 1/3rd of JSDOM initial heap memory, 1/4th after crawling
- no crashes on deep cloning
- … and 3714 divs removed in 3.12 milliseconds 🤯
But how is the last bit even possible?
There is no Tree
The little crazy idea I had behind linkedom is something I’ve been thinking about for a long time, after developing my template literal based libraries: instead of seeing it as a tree, the DOM can be seen as a line of contiguous segments, where each segment is a node, and nodes capable of containing other nodes are two linked boundary nodes, with their content linked in between … are you still following?
This design choice makes it possible to remove one to thousands of nodes just by swapping left and right links … and gosh, you can’t imagine my face when I saw the results … let’s try again with the DOM page, ordered left to right by time spent to benchmark:
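To make the idea concrete, here’s a toy sketch of that “line of segments” model (not linkedom’s actual code): every node carries prev/next links, a container is a start/end pair with its content linked in between, and dropping an entire range is nothing more than re-linking two nodes.

```javascript
// one segment on the line
class Node {
  constructor(name) {
    this.name = name;
    this.prev = null;
    this.next = null;
  }
}

// link b right after a
function link(a, b) {
  a.next = b;
  b.prev = a;
}

// drop everything between start and end in O(1), no matter how many nodes
function removeBetween(start, end) {
  link(start, end);
}

// walk the line left to right, collecting names
function toArray(start) {
  const names = [];
  for (let node = start; node; node = node.next) names.push(node.name);
  return names;
}

// <div> a b c </div> becomes: divStart → a → b → c → divEnd
const divStart = new Node('<div>');
const a = new Node('a'), b = new Node('b'), c = new Node('c');
const divEnd = new Node('</div>');
link(divStart, a); link(a, b); link(b, c); link(c, divEnd);

removeBetween(divStart, divEnd);
console.log(toArray(divStart)); // [ '<div>', '</div>' ]
```

Whether the container holds three children or three thousand, `removeBetween` touches exactly two links, which is why removing thousands of divs can take milliseconds.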
Key takeaways for linkedom:
- faster at parsing and cheaper on the memory
- faster at DOM manipulations … like twice as fast at least
- linearly fast at crawling
The last point is, as mentioned at the beginning, where JSDOM is unbeatable: repeated
querySelectorAll operations are extremely fast, even faster than basicHTML’s pre-defined arrays, as there’s no crawling whatsoever.
linkedom/cached exports exactly the same stuff, except it caches results for all crawling operations and selectors, resulting in faster performance than JSDOM there too. See this tweet for a head-to-head comparison with the non-cached base.
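The caching idea can be sketched in a few lines: memoize selector results and drop the cache whenever the tree mutates. This is a hedged illustration of the technique, not linkedom’s real internals; the `CachedQueries` class and its method names are mine.

```javascript
// memoize an expensive, crawling-based lookup per selector
class CachedQueries {
  constructor(queryFn) {
    this.queryFn = queryFn;   // the expensive lookup to wrap
    this.cache = new Map();   // selector → cached result
  }
  querySelectorAll(selector) {
    if (!this.cache.has(selector))
      this.cache.set(selector, this.queryFn(selector));
    return this.cache.get(selector);
  }
  // any DOM mutation must invalidate cached results
  invalidate() {
    this.cache.clear();
  }
}

// usage with a stand-in query function that counts how often it really runs
let calls = 0;
const queries = new CachedQueries(selector => (calls++, `result for ${selector}`));
queries.querySelectorAll('div');
queries.querySelectorAll('div');  // served from cache, no crawling
console.log(calls); // 1
queries.invalidate();
queries.querySelectorAll('div');  // crawls again after a "mutation"
console.log(calls); // 2
```

The trade-off is exactly the one the post describes: repeated selectors become nearly free, at the cost of tracking mutations to know when cached results are stale.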
On the other hand, linkedom is linearly fast, and since it takes 1/3rd of the time, and 1/3rd of the heap, to finish the benchmark, does it really matter that its selector engine is not that optimized?
Another note here: linkedom uses CSSselect, which is another industry standard, while basicHTML doesn’t have a real selector engine, which is why this benchmark uses simple selectors.
For completeness’ sake, I’ve removed the cloning part from the HTML standard page benchmark, to see if JSDOM could at least complete the test, and here is the result:
I guess by now we’ve learned how to read these results, but please be sure you understand that JSDOM here takes seconds to parse, crawl, or manipulate.
15 seconds to remove divs in JSDOM, compared to 2.6 milliseconds with linkedom, and 2.5 seconds to crawl childNodes, vs 230ms on average with linkedom … and the same goes for children … are you sold yet?
I am already replacing basicHTML with linkedom in every repository of mine that uses it, either for testing or as a browser-less DOM environment, as over the weekend I’ve managed to reach feature parity, except everything is better:
- Custom Elements, and built-in extends, are better supported
- MutationObserver is almost fully supported, except for the characterDataOldValue property, which I don’t care much about
- Ranges are better supported, and so are fragments and specialized HTML classes
- a full CSS selector engine by default
- tests are also much better, and code coverage is fixed at 100%
The reason I’ll keep basicHTML around is to benchmark childNodes crawling performance against it: template literal libraries use cloneNode(true) and the template element, and, at least in my libraries, the path to map updates is retrieved once via childNodes, so it’s important to be fast there, and basicHTML, born to do exactly that, is simply the fastest at it.
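That “retrieve the path once” trick deserves a quick sketch: store a node’s position as a list of childNodes indices, then replay that list on any clone to find the matching node. Plain objects stand in for real DOM nodes here, and the helper names are illustrative, not from any of the libraries discussed.

```javascript
// record the childNodes index at each level, root to node
function getPath(node) {
  const path = [];
  while (node.parentNode) {
    path.unshift(node.parentNode.childNodes.indexOf(node));
    node = node.parentNode;
  }
  return path;
}

// replay the recorded indices against any tree (e.g. a deep clone)
function getNode(root, path) {
  return path.reduce((node, i) => node.childNodes[i], root);
}

// minimal tree builder for the sketch
function element(name, ...childNodes) {
  const node = { name, childNodes, parentNode: null };
  for (const child of childNodes) child.parentNode = node;
  return node;
}

const target = element('span');
const tree = element('div', element('p'), element('ul', element('li'), target));

const path = getPath(target);   // [1, 1]
console.log(getNode(tree, path) === target); // true
```

Since the path is computed once per template and then replayed on every clone, crawling childNodes quickly is exactly the hot spot basicHTML was built for.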
However, components created via template literals are nothing like a 12M document with 580K child nodes written manually, and since the linkedom crawling time for the w3c 32K page is less than 1ms over 1200 nodes, I guess linkedom will replace basicHTML everywhere pretty soon.
Besides all this personal brainstorming though, I hope you’ll try linkedom and see if it can help performance in your projects too, especially because wasting seconds with builds is also a waste of energy … and builds run daily, if not hourly, if not per each Ctrl-S … so if a lighter alternative that works just as well for your tasks exists, and you choose it, think about the fact you’re also helping the environment ❤️ 🌴 ️