This question got me thinking quite a bit. I guess some storytelling will explain the rationale that led to the current implementation. The two main options on the table were:

Linker approach. Generating code with something that is essentially a C tool-chain, using the system linker, appeared to be the first option. Still, it was unclear to me how to cope with side effects that are not merely function definitions; I guess some modification of the actual code-base would have been necessary. Making something future-proof for subsequent dumps of the Lisp image, including compiled functions in the image dump, also looked quite challenging. Including single functions in the dump image had its difficulties too. Essentially I came to the conclusion that trying to break an already compiled and linked shared library into functions, store them, and then reload them would translate into: having to parse the ELF, re-implementing part of the loader, making assumptions about the implementation, etc. In short, asking for trouble.

In my mind I formed the opinion that breaking the compilation unit into pieces was a mistake, and I just had to cope with that. I then came up with the plan of storing each full eln file into the image dump.

This could work, but it turned out that loading a shared library that resides in memory rather than in a file is not completely trivial.

dlopen takes a filename as a parameter, and I guess (I could be wrong) the secret reason for that is that the dynamic linker is just a program… and yes! In Unix, programs work on files…

I then found out about memfd_create as a possible work-around. This is a syscall, present in recent Linux kernels, that lets you create a file descriptor backed by a piece of your user-space memory without any real file copy or real file-system access. The file will appear in some /proc sub-folder.

I wasn't very keen on this option because it is non-portable, but it looked like the last resort, and alternatives equivalent to memfd_create seemed to exist for other, non-Linux kernels.

I was already halfway into the implementation when I stepped back and decided to rethink the problem as a whole.

Why was the Emacs dump image mechanism created?

Startup time is the motivation for that.

The fact that the result is a single file (or two in case of the portable dumper) is, I think, just a (nice) side effect.

Having a single file to load is faster mainly because the objects in it are serialized in a binary format and don't need to go through the reader (as is the case for elc files).

But, thinking about it, functions in eln files are already serialized, being compiled into machine code, and calling dlsym just performs a lookup asking for the entry point address in memory.

So what's the motivation of dumping if we have already eln files? Couldn't we just load them?

No. This would be sub-optimal because the eln file format still contains a piece of data that goes through the reader during load: the objects used by the functions. We are talking about a vector that is the per-compilation-unit aggregation of the original constant vectors. And, last but not least, it is absolutely nice to have a system where all the environment state is dump-able.

Given all these constraints and thoughts, my final solution is:

1. store into the compilation unit Lisp object (already defined for the gc support) a reference to the eln file originally loaded;

2. at dump time just dump the compilation unit object, but do not include the eln file content in the dump file;

3. at compilation unit load time (during relocation, to use precise pdump nomenclature) revive it with a dlopen on the original eln;

4. after all compilation units have been relocated, proceed with native function relocation, resolving all function pointers by doing dlsym into the compilation unit.

The good side effect of this mechanism is that the data vector does not have to go through the reader, because it is naturally dumped and restored by the pdumper infrastructure.

Am I cheating then? Maybe, but I think this solution is quite sound. Nevertheless, in case having a single file turns out to be a hard requirement, falling back to a memfd_create-like solution would be just an incremental step on top of the implemented mechanism.