There are few things that made me ship 690K zip archive instead of 2.3MB database or, once zipped, 1.1MB, but while the current saving doesn’t look too relevant, the same technique made me create a 3MB zip database instead of 280MB file, using a different source of data (check version 0.2, if curious).
Create optimized tables
As obvious as it sounds, most distributed SQLite databases usually just ship a big table with all fields in it, either because these were imported right away from .csv files, or because anyone providing data suggested that, as it is for the ip2location case.
Instead of following their structure, I’ve created a table for all distinct cities, all distinct countries, and one for latitude and longitude, and this made me already save 300MB of unnecessary, duplicated, entries, specially important once
.zip is the target.
Unzip as postinstall operation
.zip file is part of the package, the
postinstall is in charge of extracting it, as shown in the cjs/extract.js file.
To keep the target occupied size as small as possible, the
.zip itself gets removed, but something else happens …
Modify the SQLite db in postinstall too
Instead of pre-allocating the Full Text Search table in pre-production, I let the installer create it once the database has been extracted.
With the previous set of data, there were also indexes created in postinstall, because indexes can increase quite relevantly the database size, but as these are irrelevant for the package itself, why would I push extra bytes to npm at all, if it’s a one-off operation that can be performed on install, or update, and it doesn’t need to be performed ever again?
If you are including any SQLite database within your npm package, maybe you could reduce its distribution size too, by zipping, and optimizing only once the user installs its content.
The installation time might take a bit longer, but it’s nothing different from shipping a module that needs to compile source files or download pre-distributed version of some software, and SQLite is also fast enough to rarely be a slow-down bottleneck during installation.
You can try all this via:
npm i geo2city
and verify yourself it takes nothing to download and install it, including creating the Full Text Search table.
Any gotcha with this approach?
The only “gotcha” is testing, as the distribution folder structure is likely the one needed to properly package our module, but the Full Text Search table, or indexes, might be needed for testing and code coverage.
Well, in my geo2city case I’ve kept it simple, I run
npm run import to make the package ready to be published, and I run
npm run postinstall when I want to fully test its data and results.
I then need to use
npm run import before publishing, but this could also be a
.git hook before any push, so that we won’t forget to prepare the db.
Happy packaging! 🎉