What happened to XPath?

Andrea Giammarchi
Oct 13, 2020

XPath was revolutionary in 2007, when it unleashed CSS selectors that didn’t exist yet. Since then the specification has moved from v1.0 to the current recommended v3.1, but browsers stopped at v1.0 … and we’re missing out!

XPath vs CSS

The fundamental difference between these two standards is that one was born to crawl and analyze the DOM tree along all “axes”, while the other’s only job is to live-style the DOM, without any way to crawl its content.

That’s it: XPath is a searching tool while CSS is a real-time “drawing” tool, hence rich in functionalities to style, but poor in terms of DOM tree analysis.

What XPath can do that CSS cannot

Here is an example: even though the CSS :has(...) selector is part of the most recent specifications, no browser has implemented it to date, while in XPath it is simply:

// CSS
// a:has(b)
// XPath
.//a[count(.//b) > 0]

And that’s only scratching the surface of what is possible in XPath but not in CSS:

// pseudo CSS
// span:contains('some text') ! div[class="wrap"]
// XPath
.//span[contains(normalize-space(),"some text")]/ancestor-or-self::div[@class="wrap"]

In this example we find an element through a condition, which can be any expression placed within [ square brackets ], and then crawl up the DOM tree to find its wrapper, the same way element.closest('div.wrap') would do. With XPath, though, the whole condition is checked by the browser: no JS needed, no if (el.closest(...)) to verify. If it’s found, it’s returned.
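For comparison, here is a rough sketch of the hand-written JS that the XPath above replaces. The wrappersOf name is hypothetical, and the result is only approximately equivalent: closest(...) stops at the nearest wrapper, while ancestor-or-self:: returns every matching ancestor.

```javascript
// Hypothetical helper, approximating the XPath query above in plain JS.
function wrappersOf(root, text) {
  const found = new Set();
  for (const span of root.querySelectorAll('span')) {
    // rough equivalent of normalize-space(): trim and collapse whitespace
    const content = span.textContent.trim().replace(/\s+/g, ' ');
    if (!content.includes(text)) continue;
    // the explicit check XPath spares us: a wrapper might not exist
    const wrap = span.closest('div[class="wrap"]');
    if (wrap) found.add(wrap);
  }
  // a Set avoids returning the same wrapper twice
  return [...found];
}

// in a browser: wrappersOf(document, 'some text')
```

Every step here (query, filter, climb, verify) is one more place where things can go wrong, while the XPath version states the whole intent in a single expression.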

Cherry on top: XPath can query, and return, text nodes too, something CSS doesn’t know or care about at all!
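As a minimal sketch of such a text node query, assuming a DOM environment: the textNodes name is hypothetical, and the needle is assumed to contain no double quotes, since XPath 1.0 strings cannot escape them.

```javascript
// 7 is the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
const SNAPSHOT = 7;

// Hypothetical helper: returns the text nodes under `root`
// whose content includes `needle` (assumed free of double quotes).
function textNodes(needle, root, doc = root.ownerDocument || root) {
  // text() selects text nodes instead of elements
  const path = `.//text()[contains(.,"${needle}")]`;
  const query = doc.evaluate(path, root, null, SNAPSHOT, null);
  const result = [];
  for (let i = 0; i < query.snapshotLength; i++)
    result.push(query.snapshotItem(i));
  return result;
}

// in a browser: textNodes('some text', document.body)
```

The returned items are the text nodes themselves, not their parent elements, which is exactly what no CSS selector can hand back.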

How to test XPath

To start with, check this awesome CSS to XPath translator, read the extra cases it provides, and check its source code to find even more extras, such as icontains(...) to match anything in a case-insensitive fashion. Then open devtools and type $x to discover there is an XPath utility provided out of the box. Try $x('//body') to read the result, and play around with any sort of query you can think of.
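How that translator implements icontains(...) is not shown here, but a common XPath 1.0 idiom for case-insensitive matching is translate(...), which maps upper case letters to lower case ones before comparing. The sketch below is an assumption along those lines: the helper is hypothetical, covers ASCII letters only, and rejects double quotes, as XPath 1.0 strings have no escaping.

```javascript
const UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
const LOWER = 'abcdefghijklmnopqrstuvwxyz';

// Hypothetical helper (not the translator's code): builds an XPath 1.0
// expression that matches `text` case-insensitively, ASCII letters only.
function icontains(text, target = '.') {
  if (text.includes('"'))
    throw new Error('double quotes are not supported in this sketch');
  // lower-case the needle in JS, and the haystack through translate(...)
  const needle = text.toLowerCase();
  return `contains(translate(${target},"${UPPER}","${LOWER}"),"${needle}")`;
}

// usable within any [ predicate ]
const query = `.//span[${icontains('Some Text')}]`;
```

The resulting string can be passed as is to $x(...) in devtools.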

What’s missing in XPath 1.0

Unfortunately 2007, the year XPath appeared as a super-charged CSS selector in the most famous frameworks, is also the year somebody decided there were no plans to update the XPath engine any further, so we’re stuck with v1.0, which is at least consistently implemented across browsers.

Here is a cross-browser, cross-environment function that replicates what the devtools $x(...) helper brings in:

// basic XPath helper that works with JSDOM too
function X(Path, root = document) {
  // a snapshot is a static list of nodes, in document order
  const flag = XPathResult.ORDERED_NODE_SNAPSHOT_TYPE;
  const query = document.evaluate(Path, root, null, flag, null);
  const result = [];
  // copy the snapshot items into a plain array
  for (let i = 0, {snapshotLength} = query; i < snapshotLength; i++)
    result.push(query.snapshotItem(i));
  return result;
}

… and now that we have a way to perform powerful queries, all we are missing is the ability to use regular expressions, something XPath 2.0 brought in (a W3C Recommendation since 2007, with a second edition in 2010), and yet something never shipped in any browser … or has it?
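Until then, one possible workaround is to let XPath 1.0 narrow down the candidates and apply the regular expression in JS afterwards. This is only a sketch under that assumption: the matching name is hypothetical, and it is not what a native XPath 2.0 matches(...) would offer, as the filtering happens outside the engine.

```javascript
// Hypothetical helper: keeps the nodes whose text matches a RegExp.
function matching(nodes, re) {
  return nodes.filter(node => {
    // reset lastIndex so that the g or y flags don't skip matches
    re.lastIndex = 0;
    return re.test(node.textContent);
  });
}

// in a browser, combined with the X(...) helper:
// matching(X('.//span'), /^item-\d+$/)
```

It works, but it brings back the very JS step that a single XPath query is supposed to spare us.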

… but, what happened?

Since the current version is 3.1, I believe there are still plenty of use cases out there for XPath, and when it comes to Selenium tests, it looks like everyone has moved to at least v2.0 … so why on earth, if XPath is considered such a powerful tool for crawlers, shouldn’t the browser give JS a way to unleash the full potential of XPath too?

The best part of XPath is that virtually no JS is needed to perform tons of operations in one go. The alternative is to query one target, verify its surrounding conditions via JS, then find a possible parent, then … we all know this dance, and it’s boring, slower, and more error prone than any XPath query could be. With XPath, if there’s no target node, there is simply no result: nothing to check or filter, as the engine does that for us already, and this is gold!

Please update XPath or bring in RegExp

This is like “my Xmas wish” for the Web platform. There is a very powerful querying language that runs natively in the browser and could help reduce a lot of hand-written JS operations to reach desired nodes. It is a language born to satisfy any DOM tree crawling need, targeting nodes in every direction, and adding RegExp to the equation would make complex searches a “one-liner”. So I hope browser vendors will ship ASAP what could be a renaissance of XPath for the modern, and complex, world of SPAs and PWAs out there.

Thank you for considering this improvement to the platform, and thanks for reading ♥️

P.S. I’ve landed a proposal hoping vendors will be interested and will listen
