`btoa(String.fromCharCode(...data))`: this is a slippery slope for various reasons:
That being said, everything else is more than welcome, but I need to double-check that, technically, this is the best we can offer. I do admire the effort and ideas you've put into this to date, and I'd be more than happy to finalize a concrete solution around this, of course with your review (or mine) to validate that we're all good with the idea/proposal and results. So, thanks a lot for digging this path: it all makes sense to me. The devil is in the details, and we should try to provide the best/fastest details we can around it. I hope you would agree on that.
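One well-known hazard with the `btoa(String.fromCharCode(...data))` pattern mentioned above is that spreading a large typed array passes every byte as a separate function argument, which can throw a `RangeError` once the file is big enough. The following is only a sketch of one common workaround (encoding in chunks); the helper names `toBase64` and `fromBase64` and the chunk size are assumptions for illustration, not something proposed in this thread.

```javascript
// Hypothetical helpers: base64-encode bytes without spreading the whole
// array into a single String.fromCharCode call.
function toBase64(bytes, chunkSize = 0x8000) {
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Each call receives at most `chunkSize` arguments.
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Decode a base64 string back into a Uint8Array.
function fromBase64(text) {
  const binary = atob(text);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Usage: a two-byte sample ("hi") round-trips through base64.
const sample = fromBase64(toBase64(new Uint8Array([104, 105])));
```

Base64 still inflates the payload by roughly a third, so this only addresses the argument-count problem, not the size one.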
-
Thanks for the response. Yeah, I know about the … The important point I wanted to make was that it appears we could freeze the filesystem relatively easily via the Emscripten FS API. But as you say, the devil is in the details, and that's where the fun lies. 😉 "Devils to be detailed" that I want to explore 😈:
-
@ntoll I took a chance to revisit your logic so these are my variants:
function asNode({ mode, mtime }, data) {
  return { mode, mtime, data };
}

function freezeDirectory(FS, path) {
  const entries = FS.readdir(path);
  const result = {};
  for (const entry of entries) {
    if (entry === '.' || entry === '..') continue;
    const fullPath = path === '/' ? '/' + entry : path + '/' + entry;
    try {
      // lstat (not stat) so that symlinks are reported as links
      // instead of being followed to their targets.
      const stat = FS.lstat(fullPath);
      if (FS.isDir(stat.mode)) {
        // Directory: recursively freeze its contents.
        result[entry] = asNode(stat, freezeDirectory(FS, fullPath));
      }
      else if (FS.isFile(stat.mode)) {
        const data = FS.readFile(fullPath, { encoding: 'binary' });
        // TODO: the [...array] conversion *might* not be necessary,
        // although this way it works as JSON too.
        result[entry] = asNode(stat, [...data]);
      }
      else if (FS.isLink(stat.mode)) {
        // Symlink: store the link target, not the target's contents.
        result[entry] = asNode(stat, FS.readlink(fullPath));
      }
    }
    catch (e) {
      // Log stuff that can't be accessed.
      console.warn(`Skipping ${fullPath}: ${e.message}`);
    }
  }
  return result;
}

function freezeFileSystem(FS) {
  return freezeDirectory(FS, '/');
}

function restoreDirectory(FS, path, frozen) {
  for (const [name, node] of Object.entries(frozen)) {
    const fullPath = path === '/' ? '/' + name : path + '/' + name;
    if (FS.isDir(node.mode)) {
      try {
        FS.mkdir(fullPath);
      } catch (e) {
        // Directory already exists, just ignore.
      }
      restoreDirectory(FS, fullPath, node.data);
    }
    else if (FS.isFile(node.mode)) {
      FS.writeFile(fullPath, Uint8Array.from(node.data));
      FS.chmod(fullPath, node.mode);
    }
    else if (FS.isLink(node.mode)) {
      FS.symlink(node.data, fullPath);
    }
  }
}

function restoreFileSystem(FS, frozen) {
  return restoreDirectory(FS, '/', frozen);
}

now ... this version of mine:
In comparison, for some reason, your version fails at restoring, so all timings are kinda irrelevant. I still believe my version is easier to reason about: there's less magic involved, and those buffers will compress well after gzip. What do you think? Your screenshot, if interested:
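On the `[...data]` TODO in the code above: the conversion does matter as soon as the snapshot goes through JSON, because `JSON.stringify` serializes a typed array as an object with index keys rather than as an array. A minimal sketch of the difference, with variable names that are purely illustrative:

```javascript
// Two bytes standing in for a file's contents.
const bytes = new Uint8Array([80, 121]);

// A typed array is serialized as an object with index keys...
const asTyped = JSON.stringify({ data: bytes });
// ...while a plain array is serialized as a JSON array.
const asArray = JSON.stringify({ data: [...bytes] });

// The plain array restores cleanly into a Uint8Array.
const restored = Uint8Array.from(JSON.parse(asArray).data);

// The object form has no length, so Uint8Array.from yields an empty
// array, i.e. the file would silently come back empty.
const lost = Uint8Array.from(JSON.parse(asTyped).data);

console.log(asTyped, asArray, restored.length, lost.length);
```

If the snapshot is never serialized as JSON (for example when it is packed into a binary blob instead), keeping the `Uint8Array` as-is avoids the extra copy.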
-
OK, keeping the discussion relevant, now that we have an easy/fast enough way to produce a
The concern I'd like to validate is that micropip might be deeply integrated in Pyodide, for example, so that if modules aren't requested via it or registered somehow, things might break. We need to be sure that once we manage to freeze the FS, bootstrapping from it will actually work. In an ideal world, we should be able to bootstrap the FS ourselves automatically, but let's take one step at a time. I will update this with results once I have some!
-
Update
I have everything finally working as desired:
I've published a module so I could test online and on GitHub Pages with ease. The module is called emscripten-fs-blob and you can test it live on GitHub Pages (remember to open the devtools console). That live demo bootstraps pyodide with matplotlib and everything else without needing a config to do so, and it "smokes" any other variant/cached alternative we've offered to date. Keep in mind, though, that we are using IndexedDB to cache the frozen micropip env, whereas here we're using a fetch to retrieve the data. If needed, I can improve the test page to compare gzip vs plain Blob, so we can see how the browser cache could help there too, because decompressing is actually not super fast despite me using the latest/greatest Web APIs to do so. The streaming bit has been left behind because …
Here is a screenshot of various tiny benchmarks and related details (around operations):
I am confident you'll have similar results on your machine, but we all know the CPU matters, especially for decompression-related operations. Still, I am extremely happy and glad this variant of @ntoll's original idea really works well.
To be discussed
Part of the final size amount is due …
In summary
Happy to clarify or expand on any point. Right now I hope this is a welcome feature pre x-mas time, and that people will play around and see how powerful this possibility is: entirely freezing a whole environment so that it reproduces the exact same results every single time it gets bootstrapped 👋
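For readers wondering what the gzip vs plain comparison mentioned above could look like, here is a rough sketch using the standard Compression Streams API (`CompressionStream` / `DecompressionStream`). It is an assumption that this is the Web API meant in the comment; the helper names `gzip`, `gunzip` and `roundTrip` are hypothetical and are not taken from the emscripten-fs-blob module.

```javascript
// Compress bytes with gzip via the Compression Streams API.
async function gzip(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Decompress gzip bytes back to the original bytes.
async function gunzip(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Serialize a snapshot as JSON, gzip it, then restore it, reporting both sizes.
async function roundTrip(snapshot) {
  const plain = new TextEncoder().encode(JSON.stringify(snapshot));
  const packed = await gzip(plain);
  const unpacked = await gunzip(packed);
  return {
    plainSize: plain.length,
    packedSize: packed.length,
    restored: JSON.parse(new TextDecoder().decode(unpacked)),
  };
}
```

Wrapping `gunzip` in `performance.now()` calls would give a rough per-machine decompression cost to weigh against the smaller download.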
-
Wow, this feature has really developed quickly! I think what you've got here suits PyScript well. But let me just say a few words related to MicroPython and its existing ROMFS feature:
So I think ROMFS would still have its place in MicroPython webassembly, independent of PyScript's frozen filesystem feature. Or maybe we could somehow make a PyScript frozen filesystem memory-mappable as well; then bytecode could be executed directly from it.
I have a branch with ROMFS enabled for the webassembly port here: https://github.com/dpgeorge/micropython/tree/webassembly-combined-patches
To use it is very simple: you just need to load a ROMFS and pass it in via the `romfs` option:
const { loadMicroPython } = await import(`${base}/micropython.min.mjs`);
// fetch() has no responseType option (that belongs to XMLHttpRequest);
// arrayBuffer() already yields the raw bytes.
const romfs = new Uint8Array(await (await fetch("app.romfs")).arrayBuffer());
const mp = await loadMicroPython({ url: `${base}/micropython.wasm`, romfs: romfs });
-
TL;DR Could we use Emscripten's filesystem API to freeze and archive the complete Python runtime state (including installed packages) into a single downloadable asset? This would eliminate the need for pip-based package installation on every page load, significantly reducing startup time.
Currently, PyScript's startup process works like this (especially where Pyodide is concerned):
… `micropip`.
While we cache downloaded files in local storage to speed up subsequent loads, there's still a significant cost: on every page load, packages must be copied and installed into the browser's virtual filesystem. This installation can only happen after the Python interpreter is running, since we rely on Pyodide's `micropip` for the installation process.
I'd like to explore a different (perhaps more efficient) approach: instead of installing packages on every load, we could:
This mitigates the costs of pip/files related operations on page load. The archived filesystem is downloaded once (and cached), then simply unarchived on subsequent loads. That's it!
It turns out great minds think alike (or fools seldom differ 😛) because @dpgeorge (MicroPython maintainer and friend-of-PyScript) has created something similar called RomFS for MicroPython. In his approach he:
… `make` command to "freeze" a directory of files into a byte array (the RomFS).
… `romfs` feature mounts the frozen filesystem at a specified mount point.
Damien described RomFS to me as similar to an ISO image for a CD-ROM: a frozen snapshot of the filesystem at a point in time. This shows the approach is viable and that others have found it necessary for performance.
Enter the Emscripten Filesystem API, as a possible approach that could work with both Pyodide AND MicroPython.
The approach I've initially taken is to use the public Emscripten FS API to gather/write file contents and directory structure. This approach will work no matter the underlying implementation detail of the filesystem but requires many calls to the Emscripten FS API. There is an alternative approach (which I initially considered, but abandoned because it depends on implementation details rather than the public API): the default implementation of the Emscripten FS is MEMFS (basically a byte array), although other implementations are possible, and we could directly access MEMFS's internal node structure for better performance.
But I think I prefer the simplicity of the approach I've taken. In the following code, the `interpreter` object is a reference to either MicroPython or Pyodide, both of which have a `FS` reference to the underlying Emscripten FS API. It's basically a recursive creation of an object whose keys are the names of directories or files, and whose values are other similar objects (for subdirs) or base64 encoded raw data for files, done for simplicity's sake:
To restore the filesystem you could do something like this:
Which is the same thing but in reverse... called like this:
restoreDirectory(interpreter, '/', frozen);
where `frozen` is the frozen "object" created by `freezeFilesystem`.
I've tried my simple PoC code on a page like this:
I suspect I've missed many edge cases, but at least it works.
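To make the data structure described above concrete (nested objects for directories, base64 strings for files), here is a small self-contained sketch. It is not the PoC code from this post: `makeFakeFS` is a hypothetical in-memory stand-in that mimics only the handful of Emscripten FS calls used here (`readdir`, `stat`, `isDir`, `isFile`, `readFile`, `writeFile`, `mkdir`), and `freeze` / `restore` are illustrative names.

```javascript
// Hypothetical stand-in for the subset of the Emscripten FS API used below,
// so the sketch runs without a real interpreter.
function makeFakeFS() {
  const DIR = 0o040000, FILE = 0o100000;
  const nodes = new Map([['/', { mode: DIR | 0o755 }]]);
  const childrenOf = (path) => {
    const prefix = path === '/' ? '/' : path + '/';
    return [...nodes.keys()]
      .filter((p) => p !== '/' && p.startsWith(prefix) && !p.slice(prefix.length).includes('/'))
      .map((p) => p.slice(prefix.length));
  };
  return {
    isDir: (mode) => (mode & 0o170000) === DIR,
    isFile: (mode) => (mode & 0o170000) === FILE,
    stat: (path) => {
      if (!nodes.has(path)) throw new Error('No such file or directory');
      return { mode: nodes.get(path).mode };
    },
    readdir: (path) => ['.', '..', ...childrenOf(path)],
    mkdir: (path) => {
      if (nodes.has(path)) throw new Error('File exists');
      nodes.set(path, { mode: DIR | 0o755 });
    },
    writeFile: (path, data) => nodes.set(path, { mode: FILE | 0o644, data: Uint8Array.from(data) }),
    readFile: (path) => nodes.get(path).data,
  };
}

// Byte-by-byte base64 helpers (no spread, so large files are safe).
const toBase64 = (bytes) => {
  let binary = '';
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
};
const fromBase64 = (text) => Uint8Array.from(atob(text), (ch) => ch.charCodeAt(0));

// Directories become nested objects; files become base64 strings.
function freeze(FS, path = '/') {
  const result = {};
  for (const name of FS.readdir(path)) {
    if (name === '.' || name === '..') continue;
    const fullPath = path === '/' ? '/' + name : path + '/' + name;
    const { mode } = FS.stat(fullPath);
    if (FS.isDir(mode)) result[name] = freeze(FS, fullPath);
    else if (FS.isFile(mode)) result[name] = toBase64(FS.readFile(fullPath));
  }
  return result;
}

// The same walk in reverse: strings are files, objects are directories.
function restore(FS, frozen, path = '/') {
  for (const [name, value] of Object.entries(frozen)) {
    const fullPath = path === '/' ? '/' + name : path + '/' + name;
    if (typeof value === 'string') {
      FS.writeFile(fullPath, fromBase64(value));
    } else {
      try { FS.mkdir(fullPath); } catch (e) { /* directory already exists */ }
      restore(FS, value, fullPath);
    }
  }
}

// Usage: freeze one fake filesystem and restore it into another.
const source = makeFakeFS();
source.mkdir('/lib');
source.writeFile('/lib/hello.py', new TextEncoder().encode('print("hi")'));
const frozen = freeze(source);
const target = makeFakeFS();
restore(target, frozen);
```

Against a real interpreter the `FS` argument would be its Emscripten FS object; file modes, timestamps and symlinks are deliberately ignored in this sketch.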
This code is merely to illustrate an approach and should not be considered as something production ready. Of course, we'd need some way to extract and zip up the object created by `freezeFilesystem`, and this can be done if/when we choose to go ahead with this work. But the essentials are all there. 🚀
Thoughts, ideas, constructive critique, and feedback most welcome.