
to wasm and back again: the essence of portable programs

what is the essence of a portable program, a program expressed in a format such that it can then be interpreted or, ideally, compiled to run efficiently in a variety of different environments?

this doesn't just mean "different OSs" or "different CPU architectures", but can even expand to compiling programs to different forms of computing like GPU code or maybe even FPGAs.

when we tried to figure some of this out for web pages, we came up with "Native Client"; but it eventually became clear that the LLVM intermediate representation that it uses wasn't a good fit for a variety of reasons, so we eventually settled on WASM; and now everyone is moving in that direction very fast despite its various issues.

alas, there can be a variety of factors that influence how a program should be compiled to take full advantage of the machine it's running on:

more importantly, the effects of questions like those can ramify upwards: a change in pointer size (say between 32 and 64 bits), or knowledge about the cache sizes, cache-fetching or branch-prediction algorithms of a CPU, should be able to lead to an entire data structure "choosing" a very different implementation (not just for algorithms, but also for memory layout!). and those changes are even more far-reaching once parts of this implementation (such as accessor functions for data structures) get inlined in various ways in other parts of the code. WASM just assumes some relatively common invariants and that's it; there's no ability to provide entirely different algorithms based on even pointer size or alignment requirements.
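as a minimal sketch of that kind of decision, assuming a hypothetical `TargetEnv` description and a made-up `choose_node_layout` helper (neither belongs to WASM or to any real toolchain), here is how pointer size and cache-line size alone could flip a search tree between two different layouts:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetEnv:
    """Hypothetical description of the machine the end program will run on."""
    pointer_bits: int
    cache_line_bytes: int


def choose_node_layout(env, key_bytes):
    """Pick a tree layout so that one node fits in one cache line.

    A B-tree-style node with n keys stores n keys and n + 1 child pointers,
    so we need n * key_bytes + (n + 1) * pointer_bytes <= cache_line_bytes.
    """
    pointer_bytes = env.pointer_bits // 8
    keys_per_node = (env.cache_line_bytes - pointer_bytes) // (key_bytes + pointer_bytes)
    if keys_per_node < 2:
        # Not enough room for a wide node: fall back to a plain binary tree.
        return ("binary_tree", 1)
    return ("btree", keys_per_node)
```

under these assumptions, a 32-bit target with 64-byte cache lines and 4-byte keys gets seven keys per node, while the same program on a 64-bit target gets four; every accessor inlined elsewhere would change accordingly.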

so, to be able to make arbitrarily complex decisions based on those environmental conditions, a portable program should not be just a flat WASM program, but should in fact be a dynamic metaprogram which, upon initialization, examines the environment and makes all the right choices to produce code that is able to run optimally on the target environment. ideally, that metaprogram itself should be produced by yet another meta²program; but this one compiles the meta¹program for the compiling environment rather than the target environment (those can be different! think of a meta²program, producing for the CPU a meta¹program whose goal is to produce for the GPU the end program). ultimately, though, we need some form of basic language to bootstrap this whole process, at the top of the tower of metaⁿprograms: this is where WASM can come back.
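to make one level of that tower concrete, here is a toy sketch in which a meta¹program emits the source of the end program after looking at the target. the `vector_width` key, the `meta1` and `instantiate` names, and the use of python's `exec` as a stand-in for the real compile step are all assumptions made for illustration:

```python
def meta1(target):
    """meta1program: emit source for a summing function, unrolled to the
    target's (hypothetical) vector width."""
    width = target["vector_width"]
    terms = " + ".join(f"xs[i + {k}]" for k in range(width))
    return "\n".join([
        "def total(xs):",
        "    acc = 0",
        f"    main = len(xs) - len(xs) % {width}",
        f"    for i in range(0, main, {width}):",
        f"        acc += {terms}",
        "    for x in xs[main:]:",  # leftover elements that do not fill a block
        "        acc += x",
        "    return acc",
    ])


def instantiate(source):
    """Stand-in for 'compile for the target': run the generated source and
    return the resulting function."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


# The same metaprogram yields different end programs for different targets.
total_wide = instantiate(meta1({"vector_width": 4}))
total_narrow = instantiate(meta1({"vector_width": 1}))
```

a meta²program would be the same idea one level up: a program that emits `meta1` itself, specialized for whatever machine `meta1` is going to run on.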

such metaⁿprograms should expect to interact with an API that would be like that of a JIT compiler library, with functions like create_function(parameter_types, return_types) → FunctionId or add_function_call(function_id, parameter_variables) → ValueId used to generate pieces of the metaⁿ⁻¹program. ideally, different metaⁿprograms from different places could even end up getting their functions inlined with each other; a library generated by a metaprogram, and a user program generated by another metaprogram from another vendor but using that library, should be able to be inlined with each other, rather than "naively" loaded like current dynamic libraries. maybe their two metaprograms should even be able to "negotiate" optimizations with one another using specification contracts, but this seems hard to set up.
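a rough sketch of what such an API could look like follows. only the two function names and their rough signatures come from the paragraph above; the `Builder` class, the integer ids, the string type names and the arity check are assumptions for illustration, not a description of any existing JIT library:

```python
class Builder:
    """Hypothetical code-generation handle given to a metaprogram."""

    def __init__(self):
        self.functions = {}  # FunctionId -> (parameter_types, return_types)
        self.calls = []      # recorded as (ValueId, FunctionId, parameter_variables)

    def create_function(self, parameter_types, return_types):
        """Declare a function in the program being generated; return its FunctionId."""
        function_id = len(self.functions)
        self.functions[function_id] = (tuple(parameter_types), tuple(return_types))
        return function_id

    def add_function_call(self, function_id, parameter_variables):
        """Record a call to an already declared function; return the ValueId of its result."""
        parameter_types, _ = self.functions[function_id]
        if len(parameter_variables) != len(parameter_types):
            raise ValueError("wrong number of arguments")
        value_id = len(self.calls)
        self.calls.append((value_id, function_id, tuple(parameter_variables)))
        return value_id


builder = Builder()
add = builder.create_function(["i32", "i32"], ["i32"])
result = builder.add_function_call(add, ["a", "b"])
```

because the builder only records what should be generated, a host holding several such recordings from different vendors would in principle be free to inline across them before emitting anything.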

if "object programs" (meta⁰programs), but also higher-up metaⁿprograms, are expected to be safely sandboxed, the best way to do this might not be to dynamically check everything and then hope optimization can remove some checks; instead, the metaⁿ⁺¹program that produces them should be able to manipulate logical statements to "logically prove" to the compiler that the program being produced is safe, and adding a dynamic check would be just one way to guarantee this safety. the point is, demonstrating safety should, like code generation, be able to be an arbitrarily complex process, rather than a very strict one limited to whatever type system is available, and whatever hacks one can build on top of it.
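a very small sketch of "a dynamic check is just one way to discharge the proof obligation": the generator below emits an unchecked load when it already knows the index is in bounds, and only falls back to a runtime check otherwise. the `emit_load` name and the instruction strings are invented for illustration, and a known index range is about the simplest "logical statement" one could pick:

```python
def emit_load(index_range, array_length):
    """Emit (made-up) instructions for reading arr[i].

    index_range is an inclusive (lo, hi) pair that the metaprogram has
    already established about i at generation time.
    """
    lo, hi = index_range
    if 0 <= lo and hi < array_length:
        # Safety follows from what is already known: no check needed.
        return ["load arr[i]"]
    # Safety is not provable from the known facts: guarantee it dynamically.
    return ["trap_unless 0 <= i < len(arr)", "load arr[i]"]
```

a real system would need a much richer language of facts than integer ranges, but the shape would be the same: the obligation is fixed, and how it gets met is up to the metaprogram.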

ideally, metaⁿprograms should also be able to tap into a collection of optimizing code-transformation libraries, which could be updated regularly such that old programs can benefit from new optimizations; but should be proven to be correct such that this doesn't affect behavior we care about. in this way, logically proving behavior is not just a matter of sandboxing or program safety, but also a matter of optimization even in "unsafe" programs.
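as a toy illustration of a code-transformation library, here is a tiny expression simplifier together with the evaluator that defines the behavior it must preserve. the tuple-based expression format and both function names are assumptions. comparing the two on sample inputs, as in the note after the block, is only a stand-in for the actual proof the paragraph asks for:

```python
def evaluate(expr, x):
    """Reference semantics: expr is the variable "x", an int, or (op, left, right)."""
    if isinstance(expr, int):
        return expr
    if expr == "x":
        return x
    op, left, right = expr
    left, right = evaluate(left, x), evaluate(right, x)
    return left + right if op == "+" else left * right


def simplify(expr):
    """Constant-fold and drop '+ 0' and '* 1' without changing what evaluate returns."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    if op == "*" and 1 in (left, right):
        return right if left == 1 else left
    if op == "+" and 0 in (left, right):
        return right if left == 0 else left
    return (op, left, right)


program = ("+", ("*", "x", 1), ("+", 2, 3))
optimized = simplify(program)  # ("+", "x", 5)
```

checking that `evaluate(program, x) == evaluate(optimized, x)` over a range of `x` gives some confidence; a proof of that equality for all `x` is what would let the library be swapped in under old programs without anyone re-auditing them.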

this approach, despite its lesser dynamicity, is in some ways more general than "runtime metaprogrammability", in that the metaprogram is able to create a mesaprogram (the opposite of a metaprogram) for a vastly different target environment than the one it is itself running on.


Unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
This site lives at https://carado.moe and /ipns/k51qzi5uqu5di8qtoflxvwoza3hm88f5osoogsv4ulmhurge2etp9d37gb6qe9.