Pros and cons of dynamic linking versus static linking - here we go again

Regarding the ongoing discussion on the pros and cons of operating systems linking the libraries their applications use dynamically vs. statically - Jesse Smith, in the DistroWatch Q and A, addressed most of the concerns I had with the original article by Drew DeVault that sparked this discussion, but I think the question of disk space still needs to be addressed more fully. So:

Wouldn't statically linked executables be huge?

Not necessarily. Would they be larger? Sure. The original article does some hand-waving about how many dynamically loaded symbols are directly referenced by a linking executable, and concludes that because applications on average directly reference only 4.6% of the symbols exported by the libraries they consume, and modern compilers are good at eliminating code that isn't used, the size difference can't be that significant.

So let's look at an example: the cURL program is available on many operating systems, as it is a useful tool for interacting with the internet. It is a good example, in my opinion, because while it has a rather simple facade, it has a lot of internal complexity as it attempts to handle anything the internet might throw at you (and not only HTTP: it supports FTP, IMAP, SMB and more). It also uses a fair number of features from external libraries, from OpenSSL to SQLite.

Drew's symbol analysis code, from the article referenced above, reports that /usr/bin/curl on Ubuntu 20.04 uses 6.8% of the symbols exported by the dynamic libraries it links with, so it is still in the ballpark of what you'd expect after reading Drew's article.

The problem with that analysis is that it assumes that the number of directly referenced symbols maps well to the additional code size the application needs if we bundle the libraries into the executable: that using 6.8% of the symbols translates to an executable size increase of 6.8% of the total size of the libraries. But this is very likely not true! The reason libraries are even a good idea, in and of themselves (regardless of the discussion of dynamic vs. static linking), is that they hide a lot of complexity behind a small API. Usually libraries handle all that internal complexity by being made out of a lot of small pieces that call each other to implement complex logic. My application may only call into a library in a couple of places, but those functions will call other functions inside that library (and in other libraries) to do what I want them to do.
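To make the gap between "directly referenced" and "actually needed" concrete, here is a minimal sketch over a toy call graph. Every symbol name in it is made up for illustration and has nothing to do with the real libcurl or OpenSSL internals; the point is only that one direct reference can drag in a chain of internal functions.

```shell
# Toy call graph, one "caller callee" edge per line.
# All symbol names are hypothetical.
call_graph() {
  cat <<'EOF'
app lib_fetch
lib_fetch lib_connect
lib_fetch lib_parse
lib_connect lib_tls
lib_tls lib_crypto
lib_unused lib_legacy
EOF
}

# Print every symbol reachable from the given root (breadth-first walk),
# one per line, sorted. The root itself is not printed.
reachable() {
  call_graph | awk -v root="$1" '
    { edges[$1] = edges[$1] " " $2 }
    END {
      queue[1] = root; head = 1; tail = 1
      while (head <= tail) {
        node = queue[head++]
        n = split(edges[node], out, " ")
        for (i = 1; i <= n; i++)
          if (!(out[i] in seen)) { seen[out[i]] = 1; queue[++tail] = out[i] }
      }
      for (s in seen) print s
    }' | sort
}

reachable app
```

In this toy graph the application references a single symbol directly, yet five of the seven library symbols have to be kept; only lib_unused and lib_legacy can be dropped.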

How can we even measure that? We'd need to build a tool that traces not only which exported symbols my application uses, but also what symbols those symbols use, and what those other symbols use, and so on and so forth - recursively to the leaves of the tree. Fortunately, such a tool already exists! We call this tool "a compiler". As asserted in Drew DeVault's article, a modern compiler walks through the tree of symbols an application uses and only includes the code it needs to make sure everything needed is available, trimming out everything else.

So let's just get down to it - if we compile cURL statically, how large would it be? Because cURL uses quite a few libraries (45 on Ubuntu 20.04), each and every one of them has to be built statically to be able to build cURL statically, and I'm pretty lazy, I will answer that by looking at Stali Linux - a completely statically linked Linux operating system, whose completely built "root filesystem" can be downloaded or browsed in their GitLab repository. By the way, the Stali FAQ spends quite a bit of time discussing the space efficiency of statically linked executables, including memory consumption - which is a whole new can of worms that I will not be getting into here, but is worth exploring when considering a modern workstation operating system.

On Stali Linux, the cURL executable is 2.8MiB in size - that is quite a bit more than the same executable on Ubuntu 20.04, where it is only 236KiB in size. This is of course not the whole story, as on Ubuntu 20.04 cURL requires an additional 45 dynamically loaded libraries weighing in at 19.8MiB(1). Stali's cURL does not have the same set of features: mostly because Stali hasn't seen an update in over 3 years and ships the even older version 7.48 of curl, but also because it leaves out a lot of features you do get on Ubuntu, such as support for IDN, HTTP2, SSH, Kerberos and others. I'm going to ignore that for now (but we'll get back to it later) and, assuming the feature sets are the same, just conclude that Stali's cURL executable takes about 14% of the space of the Ubuntu installation (executable plus libraries). That is more than twice what you'd expect from the symbol analysis! The actual figure, once you exclude all the libraries that Ubuntu's cURL uses and Stali's doesn't, is 21%.
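The 14% figure is just the static executable's size divided by the dynamic executable plus its libraries. A small sketch of that arithmetic, using only the sizes quoted above converted to KiB (the function name is mine, not something from either distribution):

```shell
# Percentage of the dynamic footprint (executable + shared libraries)
# that a static executable occupies. All three arguments are in KiB.
static_share_pct() {
  awk -v s="$1" -v exe="$2" -v libs="$3" \
    'BEGIN { printf "%.1f\n", 100 * s / (exe + libs) }'
}

# 2.8MiB static cURL, 236KiB dynamic cURL, 19.8MiB of shared libraries
static_share_pct 2867.2 236 20275.2   # prints 14.0
```

The 21% figure cannot be reproduced the same way from the numbers in this article, because it depends on the sizes of the libraries excluded from the comparison.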

This shows that the analysis of directly referenced symbols does not provide a good estimate of the size efficiency of statically linked binaries: to make the application work, the compiler has to grab a lot more than just the symbols the application references directly.

So to summarize:

  • Q - Wouldn't statically linked executables be huge?
  • A - Yes, they would - often ten times larger or much more. They will still be smaller than the sum size of a dynamically linked executable and its libraries - but not by as much as you'd expect - they come to maybe a fifth of that sum.

But, you might be asking, wouldn't that still make it a net benefit to have the entire OS statically linked? I would like to save 80% on my disk space bill.

Well, the thing is - as we've just shown - most of the bulk of a dynamically linked application is in the libraries; the executable itself is just a tiny part of the code. While cURL is 236KiB, Firefox - to take another example - which does almost everything that cURL does and oh so mmmuuuccchhh more, has an executable that is 688KiB on Ubuntu 20.04 - just 450 more KiB for a massive graphical interface, virtual machines, databases, sandboxing, etc. It is all in the dynamically linked libraries - which are shareable! And re-use of dynamic libraries actually pays back very well. cURL is actually a good example of such re-use: most of the cURL logic is in its shared library - libcurl - and there are quite a few applications that use it to support cURL-like features for getting and putting stuff on the internet, with all the weird and exciting complexity of the internet. Looking at my system, I can see(2) that libcurl is used by 22 OS-provided applications other than cURL itself - which puts it in the top 20% of most used dynamic libraries on my system. Using the above 21% as a guideline, any library used by 5 or more executables (5 times 21% is more than one full copy) gives a net disk size reduction compared to static linking, and libcurl definitely passes that threshold - so building and using cURL dynamically saves a lot of disk space.
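The threshold of 5 follows directly from the 21% guideline. As a rough sketch, assuming every static build pulls in the same fixed fraction of a library (a simplification, and the function names here are made up for the illustration):

```shell
# Smallest number of executables at which per-executable static copies
# cost at least as much as one shared copy of the library.
break_even_users() {
  awk -v f="$1" 'BEGIN {
    n = int(1 / f)
    if (n * f < 1) n++
    print n
  }'
}

# How many full copies of the library the static builds add up to.
static_copies() {
  awk -v users="$1" -v f="$2" 'BEGIN { printf "%.2f\n", users * f }'
}

break_even_users 0.21   # prints 5
static_copies 23 0.21   # prints 4.83
```

The second call takes 23 users for libcurl, counting cURL itself on top of the 22 other applications: under this simplified model, static linking would store the equivalent of almost five copies of libcurl instead of one.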

But how does this apply to the entire operating system? According to Drew DeVault, "Over half of your libraries are used by fewer than 0.1% of your executables" - so do benefits like the ones we get from cURL apply system wide? Taking the 21% figure from the static cURL build as a guideline, I wanted to see what might happen if all OS-provided executables were delivered as statically linked executables and all the dynamic libraries removed - would we gain or lose space, and how much?

The results on my system(3) are that the dynamic libraries used by executables weigh in at 1.43GiB, while the estimated cost of charging 21% of each library's size to every executable that uses it would be 7.29GiB. So re-use of dynamic libraries gives about a 5 times reduction in disk space!
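The estimate is the same cost model that script (3) applies to real ldd output. Here is a minimal sketch of it on hypothetical input, so the arithmetic can be followed by hand; the library sizes and user counts are invented, and only the 0.21 factor and the two GiB totals come from this article.

```shell
# Input: one "size_bytes users" line per shared library.
# Output: "actual estimated ratio", where the estimate charges every
# using executable a fixed fraction of each library it links.
static_cost_estimate() {
  awk -v f="$1" '
    { actual += $1; cost += $1 * f * $2 }
    END {
      if (actual == 0) exit 1
      printf "%.0f %.0f %.2f\n", actual, cost, cost / actual
    }'
}

# Hypothetical sample: one widely shared library and one single-user library.
printf '%s\n' '1000000 23' '500000 1' | static_cost_estimate 0.21
# prints 1500000 4935000 3.29

# Ratio of the two totals reported above
awk 'BEGIN { printf "%.1f\n", 7.29 / 1.43 }'   # prints 5.1
```

Note how the single-user library is cheaper when linked statically (105000 bytes against 500000), while the widely shared one dominates the total.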

Since we discussed Stali Linux before as an example of a completely statically linked operating system, another way to look at the same question is to do a straight-up comparison of how much space a Stali Linux installation takes vs. an installation of a dynamically linked operating system, like Ubuntu. It is not really an apples to apples comparison, as Stali is much more minimal than most operating systems. Still, it is worth noting that:

  • Stali's rootfs ( https://gitlab.com/garbeam/rootfs-x86_64 ) bin directory has 212 files weighing 50.2MiB for an average of 242KiB per application.
  • The ubuntu:20.04 Docker image (which is supposedly similarly minimal to a Stali installation) has 385 executables in its OS application directories(4) with 68.2MiB between bin, sbin and the library directories (which contain things other than dynamic libraries, but I decided not to bother with filtering those out) for an average of 171KiB per application.

This is not as pronounced a difference, but it still clearly shows that dynamic linking is a considerable net disk space positive - a 30% saving in the above test, not taking into account the amount of extra functionality that a minimal Ubuntu provides over Stali.

P.S. Yet Another Point For Dynamic Linking

Something else worth noting about the advantage of dynamic linking: it is much better for downstream developers. If I'm developing an application that uses libcurl on Ubuntu, the dev package includes just the things that I need: some header files and a tiny static library that just causes the compiler to link with the dynamic libcurl library. After building, I know for a fact that whatever I do with libcurl I'll get identical results to running curl on the command line, making it easier to debug problems with my code (if the behavior is different compared to the curl command line, then my code is doing something incorrectly). On Stali Linux there are no dev packages, but if I imagine that there are, one would likely be a complete source dump of the curl repo (which it kind of is), where I would first need to build the library locally. And when my application behaves differently than the curl command line - who knows why? It could be because I'm doing something wrong, or because my build process is somewhat different than Stali's, or because we are not using the same version - it is impossible to tell without a lot of additional testing.

Scripts

1) ldd $(which curl) | perl -nle 'm,=> (/\S+), and push @libs,$1; END{ foreach $lib (@libs) { @st=stat $lib; $sum+=$st[7]; } print $sum/1024/1024;}'

2) find /usr/bin -type f -executable | while IFS= read -r file; do ldd "$file" 2>/dev/null | grep -q libcurl && echo "$file"; done | wc -l

3) find /usr/bin -type f -executable | xargs ldd 2>/dev/null | perl -nle 'm,=> (/\S+), and $libs{$1}++; END{ foreach my $lib(keys %libs){ @s=stat $lib; $actual+=$s[7]; $cost+=($s[7]*0.21*$libs{$lib}); } print "Actual total size: $actual"; print "Expected total cost: $cost";}'

4) for dir in /usr/bin /usr/sbin; do find $dir -type f -executable; done | wc -l; du -ks /usr/bin /usr/sbin /usr/lib* | perl -nle 'm/^(\d+)/ and $sum+=$1; END { print $sum/1024; }'

License

Licensed under CC0 license: To the extent possible under law, I hereby waive all copyright and related or neighboring rights to this article.


You'll only receive email when Oded Arbel publishes a new post