Category: "Debian"

Debian running on Rust coreutils

March 9th, 2021

tldr: Rust/coreutils ( https://github.com/uutils/coreutils/ ) is now available in Debian, good enough to boot a Debian with GNOME, install the top 1000 packages, build Firefox, the Linux Kernel and LLVM/Clang. Even if I wrote more than 100 patches to achieve that, it will probably be a bumpy ride for many other use cases.
It is also a terrific project to learn Rust. See the list of good first bugs.

Even if I see Rust code every day at Mozilla, I was looking for an actual personal project (i.e. this isn't a Mozilla project) to learn Rust during the various COVID lockdowns.

I started contributing to the alternative Coreutils developed in Rust. The project aims to provide a drop-in replacement for the C-based GNU Coreutils, and I wanted to evaluate whether it could be used to run a regular Debian system. This is similar to what I did with clang.debian.net a few years ago (rebuilding the Debian archive using clang instead of gcc).

I expect that most readers know what the Coreutils are: a set of programs performing simple operations (copy/move file, change permissions/ownership, etc). Even if some commands date from the 70s, they are at the base of Linux, Unix and macOS. While different implementations can be found, they all try to remain compatible in terms of arguments, options, etc. This implementation of Coreutils isn’t different!

If you want to learn more about the history of Unix, I recommend this great Corecursive podcast with Brian Kernighan.

While a lot of people contributed to this project, much was left to be done:

  • missing programs to be implemented. See https://github.com/uutils/coreutils#utilities
  • missing options in the various programs
  • code not following latest Rust best practices
  • lack of consistency in the code base (ex: functions with too many arguments)
  • lack of tests/low code coverage
  • lots of failures when running the GNU Coreutils testsuite (141 tests pass out of 613) - some trivial to fix, some others harder… A good way to start on a Rust project!

To start easy, I defined 4 goals for this work:

  1. Package Coreutils in Debian/Ubuntu
  2. Boot a Debian system with a Rust-based coreutils
  3. Install the top 1000 packages in Debian - including GNOME
  4. Build Firefox, the Linux Kernel and LLVM/Clang

Packaging of Coreutils in Debian

Packaging in Debian isn't a trivial or even simple task. It requires uploading all the dependencies to the archive independently. Rust, with its new ecosystem and small crates, makes this task significantly harder.

The package is called rust-coreutils - https://tracker.debian.org/pkg/rust-coreutils

For Debian/Ubuntu users, to have an idea of the complexity of packaging such applications, just run
debtree --build-dep rust-coreutils | dot -Tsvg > coreutils.svg (should be around 1M).

Since it isn't production ready, the rust-coreutils package can be installed in parallel with coreutils. This package does NOT replace the GNU/coreutils files (yet?); the new files are installed in /usr/lib/cargo/bin/.

They can be used with:

export PATH=/usr/lib/cargo/bin/:$PATH

Or, uglier, overriding the files with the new ones.
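
As a minimal sketch of the PATH approach, the helper below prepends a directory only if it exists and is not already listed. The name prepend_path is hypothetical; it is not something the package ships.

```shell
# Prepend a directory of replacement binaries to PATH, at most once.
# prepend_path is a hypothetical helper name used for illustration.
prepend_path() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    case ":$PATH:" in
        *":$dir:"*) ;;                        # already listed, nothing to do
        *) PATH="$dir:$PATH"; export PATH ;;  # put the new binaries first
    esac
}
# Example: prepend_path /usr/lib/cargo/bin && command -v ls
```

After calling it, command -v ls should point into /usr/lib/cargo/bin if the package is installed.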

Booting Debian with rust-coreutils

To achieve this, because I knew I would likely break the image a few times, I created a new project to quickly install a full Debian with PXE and preseed.

The project is available here:

https://github.com/opencollab/qemu-debian-install-pxe-preseed/

A script to create the full qemu image: build_qemu_debian_image.sh

A second script to boot on the newly created image: boot.sh

Then, building and installing coreutils on the system (yeah, it is ugly - don’t do that at home):

apt install rust-coreutils
cd /usr/lib/cargo/bin/
# Overwrite the GNU binaries with the Rust ones (destructive!)
for f in *; do
    cp -f "$f" /usr/bin/
done

First surprise: unlike the old init.d init system, systemd does not rely on a series of scripts (it is mostly written in C), so replacing the coreutils had no impact. I didn't experience any issue during the boot process.

Debian packages rely a lot on post-install scripts (stored in /var/lib/dpkg/info/*) to finalize and configure packages. They are (almost?) all using /bin/sh (or /bin/bash) to perform these actions. They intensively call coreutils applications.

For example, in /var/lib/dpkg/info/exim4-base.postinst, we can find:

install -d -oDebian-exim -gadm -m2750 /var/log/exim4

With an ugly script, we can test the installations of the 1000 most popular packages one by one.
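
The script itself is not shown in this post. A minimal sketch of the idea, assuming the package names are available one per line, could look like the following. The names install_each and INSTALL_CMD are hypothetical, and the default installer command is an assumption.

```shell
# Try to install each package named on stdin (one per line) and record
# the ones that fail. install_each and INSTALL_CMD are hypothetical names;
# the default command below is an assumption, not the author's script.
install_each() {
    failures=$1
    : > "$failures"                      # start with an empty failure log
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue        # skip empty lines
        # stdin is redirected so the installer cannot eat the package list
        if ! ${INSTALL_CMD:-apt-get install -y} "$pkg" < /dev/null > /dev/null 2>&1; then
            echo "$pkg" >> "$failures"
        fi
    done
}
```

For example, install_each failures.txt < top1000.txt (run as root, with top1000.txt being a hypothetical file holding the package list) leaves the names of the packages that failed to install in failures.txt.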

Running this, some classes of issues in Rust/coreutils could be easily identified.

Implementing missing options

A significant number of problems could be easily identified as a lack of support for some options.

Here is a list of most of the fixes I had to implement to make this plan work:

Different behavior

Most of the programs behaved as expected. Here is a list of differences:

  • install doesn't support using /dev/null as source file
    Setting up libreoffice-common (1:6.1.5-3+deb10u6) ...
    install: error: install: cannot install ‘/dev/null’ to ‘/etc/apparmor.d/local/usr.lib.libreoffice.program.oosplash’: the source path is not an existing regular file
    A limitation of Rust itself: https://github.com/rust-lang/rust/issues/79390

Compile Firefox, Clang and the Linux Kernel

Build systems can vary significantly from one another.

To verify their usage of coreutils, I built these three major projects.

Firefox

As Firefox relies mostly on Python as a build system, it went smoothly. I didn’t encounter any issue.

The only unrelated issue that I noticed while working on it was that apt-key was broken because the script relied on a buggy option of mktemp.

Linux Kernel

I identified only two issues compared to GNU Coreutils:

  • The chown command on a non-existing symlink target doesn’t fail on the GNU version, the Rust one was triggering an error.
    https://github.com/uutils/coreutils/pull/1694
  • The ln command did not support the -n option used by the Linux kernel build:
    ln -fsn ../../x86/boot/bzImage ./arch/x86_64/boot/bzImage
    ln: error: Unrecognized option: 'n'

LLVM/Clang

The LLVM toolchain relies on CMake. Just like for Firefox, I didn’t face any issue.

Comparing with GNU coreutils using its testsuite

Recently, James Robson added a new test to run the GNU testsuite on the Rust/coreutils.

# TOTAL: 611
# PASS: 144
# SKIP: 86
# XFAIL: 0
# FAIL: 342
# XPASS: 0
# ERROR: 39
This compares to 546 tests passing with the GNU version. Even if a bunch of the failures are just differences in output, it demonstrates that there is still a long road ahead.
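
To put the summary above in perspective, here is a small sketch that turns such a summary into a pass rate. The helper name pass_rate is hypothetical; it only assumes the "# TOTAL:" / "# PASS:" line format shown above.

```shell
# Print the pass rate (percent, one decimal) from a test summary read on
# stdin. pass_rate is a hypothetical helper name used for illustration.
pass_rate() {
    awk '$2 == "TOTAL:" { total = $3 }
         $2 == "PASS:"  { pass = $3 }
         END { if (total > 0) printf "%.1f\n", 100 * pass / total }'
}
```

Fed with the figures above (144 passing out of 611), it prints 23.6.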

Next steps & contribute

First, we will need more motivated contributors to work on this project. Many features remain to be implemented, optimizations to be done (e.g. decreasing the memory usage), etc.
I started to create a list of good first bugs for newcomers:

https://github.com/uutils/coreutils/issues?q=is%3Aissue+is%3Aopen+label%3A%22Good+first+bug%22
I will update this list if there is some interest in this project.

Helping improve the support of the GNU coreutils testsuite would be a huge step while being a great way to learn Rust!

Then, once it is in a better state, we will be able to make it a reliable alternative in Debian/Ubuntu to the GNU/Coreutils.

This might also be interesting for other folks who prefer a BSD license over the GPL.



Debian rebuild with clang 10 + some patches

June 2nd, 2020

Because of the lock-down in France and thanks to Lucas, I have been able to make some progress rebuilding Debian with clang instead of gcc.

TLDR

Instead of patching clang itself, I used a different approach this time: patching Debian tools or implementing some workaround to mitigate an issue.
The percentage of failing packages dropped from 4.5% to 3.6% (1400 packages to 1110, out of a total of 31014).

I focused on two classes of issues:

Qmake

As I have no intention to merge the patch upstream, I used a very dirty workaround. I overwrote the g++ qmake file by clang's:
https://salsa.debian.org/lucas/collab-qa-tools/-/blob/master/modes/clang10#L44-47

This brought the number of these failures down to 0, making some packages build flawlessly (for example: qtcreator, chessx, fwbuilder, etc).

However, some packages still fail later in the build, which increases the number of failures in other categories such as link errors. For example, qtads fails because of an ordered comparison between pointer and zero, and oscar fails on a -Werror,-Wdeprecated-copy error.

Breaking the build later also highlighted some new classes of issues which didn't occur with clang < 10.
For example, warnings related to C++ range loops or implicit int-to-float conversions (I fixed a bunch of them in Firefox).

Symbol differences

Historically, symbol management for C++ in Debian has been a pain. Russ Allbery wrote a blog post in 2012 explaining the situation. AFAIK, it hasn't changed much.
Once more, I took the dirty approach: if there are new or missing symbols, don't fail the build.
The rationale is the following: packages in the Debian archive are supposed to build without any issue. If there are new or missing symbols, it is probably clang generating a different library, but this library is very likely working as expected (and usable by a program compiled with g++ or clang). It is purely a different approach taken by the compiler developers.

In order to mitigate this issue, before the build starts, I am modifying dpkg-gensymbols to transform the error into a warning.
So, the typical Debian errors "some new symbols appeared in the symbols file" or "some symbols or patterns disappeared in the symbols file" will NOT fail the build.
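
The rebuild described here patches dpkg-gensymbols itself. As a hedged alternative sketch, which is not the approach used in this post: the dpkg-gensymbols manual page documents a check level (the -c option, where level 0 never fails) and an environment variable that overrides it. Assuming that documented behaviour applies to the dpkg version in use, exporting the variable before the build should have a similar effect.

```shell
# Assumption: dpkg-gensymbols honours DPKG_GENSYMBOLS_CHECK_LEVEL as
# described in its manual page; level 0 means symbol differences are
# reported but never fail the build.
DPKG_GENSYMBOLS_CHECK_LEVEL=0
export DPKG_GENSYMBOLS_CHECK_LEVEL
```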

Unsurprisingly, all but one package (libktorrent) build.

Even if I am pessimistic, I reported a bug on dpkg-dev to evaluate whether we could improve dpkg-gensymbols so that it does not fail in these cases.

Next steps

The next offender is "Imake.tmpl:2243:10: fatal error: ' X11 .rules' file not found", with more than a hundred occurrences, reported upstream quite some time ago.

Then, the big issues are going to be much harder to fix as they are real issues/warnings (with -Werror) in the code of the packages. Examples: -Wc++11-narrowing and -Wreserved-user-defined-literal... The list is long.
I will probably work on that when llvm/clang 11 are in RC phase.

For maintainers & upstream

Maintainer of Debian/Ubuntu packages? I am providing a list of failing packages per maintainer: https://clang.debian.net/maintainers.php
For upstream, it is also easy to test with clang. Usually, apt install clang && CC=clang CXX=clang++ <build step> is good enough.
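
As a small sketch of the command above, a wrapper can set both variables for any build step. The name with_clang is hypothetical, and it assumes that clang is installed and that the build system honours CC and CXX.

```shell
# Run any build command with clang as the C and C++ compiler.
# with_clang is a hypothetical helper name used for illustration.
with_clang() {
    CC=clang CXX=clang++ "$@"
}
# Example (assumes clang is installed and the project honours CC/CXX):
#   with_clang make
```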

Conclusion

With these two changes, I have been able to fix about 290 packages. I think I will be able to get the number of failures down a bit more, but we will soon reach a plateau as many warnings/issues will have to be fixed in the C/C++ code itself.

Some clang rebuild results (8.0.1, 9.0.1 & 10rc2)

March 22nd, 2020

As part of the LLVM release cycle, I am continuing rebuilding the Debian archive with clang instead of gcc to evaluate potential regressions.

Processed results are available on the website: https://clang.debian.net/status.php - Now includes some fancy graphs to show the evolution
Raw logs are published on github: https://github.com/opencollab/clang.debian.net/tree/master/logs

Since my last blog post on the subject (August 2017), Clang is more and more present in the tech ecosystem. It is now the compiler used to build Firefox and Chrome upstream binaries on all the supported architectures/operating systems. More architectures are supported, it has a new linker (lld), a new hybrid IR (MLIR), a lot of checkers in clang-tidy, cross-language linking with Rust, etc.


Results

Now, about Debian results, we rebuilt using 8.0.1, 9.0.1 and 10.0rc2. Results are pretty similar to what we had with previous versions: between 4 and 5% of packages fail when gcc is replaced by clang.

[Figure: Some clang rebuild results (8.0.1, 9.0.1 & 10rc2)]

Even if most software still uses gcc as its compiler, we can see that clang has a positive effect on code quality. With many different kinds of errors and warnings found by clang over the years, we noticed a steady decline in the number of errors. For example, the number of incorrect C/C++ main declarations has been decreasing year after year:

[Figure: number of incorrect C/C++ main declarations over the years]

Errors found

The biggest offender is still the qmake change, which doesn't allow the workaround we use (replacing /usr/bin/gcc by /usr/bin/clang) - about 250 errors. Most of these packages would probably compile fine with clang. More on the Qt bug tracker. The workaround proposed in the bug isn't applicable for us as we use a drop-in replacement of the compiler.

The second error is still some differences in symbol generation. Unlike gcc, it seems that clang doesn't generate some symbols (or adds some). As a bunch of Debian packages check the list of symbols in the library (for ABI management), the build fails on purpose. For example, with libcec, the symbol _ZN10P8PLATFORM14CConditionImplD1Ev@Base 3.1.0 isn't generated anymore. I am not expecting this to be a big deal: the generated libraries probably work most of the time. More on C++ symbol management in Debian.
I reported this bug upstream a while back: https://bugs.llvm.org/show_bug.cgi?id=30441

Current status

As previously said in a blog post, I don't think there is a strong incentive to move away from gcc for most of the Linux distributions. The big reason for the BSDs was the license (even if the move to the Apache 2 license wasn't received positively by some of them).
While the LLVM/clang ecosystem clearly won the tooling battle, as a C/C++ compiler, gcc is still an excellent compiler which supports more architectures and more languages.
In terms of new warnings and checks, as the clang community moved its efforts to clang-tidy (which requires more complex tooling), gcc provides a better experience out of the box (as an example, see the Firefox meta bug to build with -Werror with the default warnings using gcc 9, gcc 10 and clang trunk).

Next steps

I see some potential next steps to decrease the number of failures:

  • Work around the Qt/Qmake issue
  • Fix the objective-c header include issues (echo "#include <objc/objc.h>" > foo.m && clang -c foo.m is currently failing)
  • Identify why clang generates more/fewer symbols than gcc in the library and try to fix that
  • Rebuild the archive with clang-7 - Seems that I have some data problem

Many thanks to Lucas Nussbaum for the rebuilds.

Rebuild of Debian using Clang 3.9, 4.0 and 5.0

August 24th, 2017

tldr: The percentage of failure is decreasing, Clang support is improving but there is a long way to go.

The goal of this initiative is to rebuild Debian using Clang as a compiler instead of gcc. I have been doing this analysis for the last 6 years.

Recently, we rebuilt the Debian archive with Clang 3.9.1 (July 6th), 4.0.1 (July 6th) and 5.0 rc2 (August 20th).

For various reasons, we hadn't performed a rebuild since June 2016 with version 3.8. Therefore, we took the opportunity to do three over the last month.

Now, the 3.9 & 4.0 results are impacted by a build failure when building all haskell packages (the -no-pie option in Clang doesn't exist - I introduced it in clang 5.0). Fixing this issue with 5.0 removed more than 860 failures.

Also, for the same versions, a Qt compiler detection is considering that Clang is not a C++11 compiler because clang++, by default, defines __cplusplus as 199711L (-std=c++11 has to be added to define a correct __cplusplus). See https://bugreports.qt.io/browse/QTBUG-62535 for more information. Some discussions happened on the upstream mailing list about changing the default C++ dialect.
For example, with 4.0, this is causing 132 errors. With 5.0, probably thanks to a new Qt version, roughly the same number of packages are failing, but this time because gcc only triggers a warning when the "nodiscard" attribute is used incorrectly, whereas clang triggers an error.

In parallel, ignoring the haskell build failures, the numbers slightly increased since last year even if the overall percentage decreased (new packages being uploaded in the archive).

Version   Build failures   Ignoring haskell pkgs
3.8       1367 / 5.6%
3.9       2274 / 8.1%      1618 / 5.8%
4.0       2311 / 8.3%      1655 / 5.9%
5.0       1445 / 5.1%

In parallel, new warnings and errors showed up in Clang.
This is causing a new set of build failures (especially with the usage of -Werror).

A few examples:
* Starting with 4.0, clang triggers an error "ordered comparison between pointer and zero ('char *' and 'int')".
* Similarly, with this version, -Wmain triggers a new warning when a bool literal is returned from main.
* clang also introduced a new warning called -Waddress-of-packed-member causing 5 new errors.
* With the same version, clang can trigger a new error when auto is used in a function return type.

Now, as a conclusion, having Debian being built with clang by default is still a long shot.
First, when Clang became usable for a general audience, gcc was lagging in terms of warning and error detection. Now, gcc is in a much better position than it was, decreasing the interest in having clang replace gcc. In parallel, most of the efforts in terms of warning and mistake detection are currently done under the clang-tidy umbrella, making them less intrusive as part of this initiative (but harder to use and to deploy).
As an example, the gcc warning -Wmisleading-indentation has been implemented as a clang-tidy checker.
Second, the very permissive license of clang has been a key factor for some operating systems to switch like the PS4, Mac OS X or FreeBSD. With Debian, the community is generally happy with the GPL.
Third, the performance is similar enough that it is not worth the work, except for some projects with very special needs.

Last, even though it is much easier to contribute to llvm/clang than to gcc (no copyright assignment and an actual review system, for example), this isn't a big differentiator for most projects.

Of course, I will continue to run and analyze these rebuilds as this is a great source of information for clang upstream developers to improve the compatibility with gcc and understand some impacts. However, until there is a big game changer, I will stop pursuing the goal of having Debian switch to clang instead of gcc. I will also stop my efforts on the debile project (which aimed to rebuild packages in the background).

Rebuild of Debian using Clang 3.5.0

September 11th, 2014

Clang 3.5.0 has just been released. A new rebuild has been done to highlight the progress made in getting Debian built with clang.

tl;dr: Great progress. We decreased from 9.5% to 5.7% of failures. Full results are available on http://clang.debian.net

At the time of the rebuild with 3.4.2, we had 2040 packages failing to build with clang. With 3.5.0, this dropped to 1261 packages.

Fixes

With Arthur Marble and Alexander Ovchinnikov, both GSoC students, we worked on various ways to decrease the number of errors.

Upstream fixes

First, the most obvious way, we fixed programming bugs/mistakes in upstream sources. Basically, we took categories of failure and fixed issues one after the other. We started with simple bugs like 'Wrong main declaration', 'non-void function should return a value' or 'Void function should not return a value'.

They are trivial to fix. We continued with harder fixes like 'Undefined reference' or 'Variable length array for a non POD (plain old data) element'.

So, besides these ones, we worked on:


In total, we reported 295 bugs with patches. 85 of them have been fixed (meaning that the Debian maintainer uploaded a new version with the fix).

In parallel, I think that the switch by FreeBSD and Mac OS X to Clang also helped to fix various issues by upstreams.

Hacking in clang

As a parallel approach, we started to implement a suggestion from Linus Torvalds and a few others. Instead of trying to fix all upstream, where we can, we tried to update clang to improve the gcc compatibility.

gcc has many flags to disable or enable optimizations. Some of them are legacy, others make no sense in clang, etc. Instead of failing in clang with an error, we created a new category of warnings (showing "optimization flag '%0' is not supported") and moved all relevant flags into it. Some examples: r212805, r213365, r214906 or r214907.

We also updated clang to silence some useless arguments like -finput-charset=UTF-8 (r212110), clang being UTF-8 compliant.

Finally, we worked on the forwarding of linker flags. Clang and gcc have a very different behavior: when gcc does not know an argument, it is going to forward the argument to the linker. Clang, in this case, is going to reject the argument and fail with an error. In clang, we have to explicitly declare which arguments are going to be transferred to the linker. Of course, the correct way to pass arguments to the linker is to use -Xlinker or -Wl, but the Debian rebuild proved that these shortcuts are used. Two of these arguments are now forwarded:

  • -z keyword - r213198
  • -u Force symbol to be entered in the output file as an undefined symbol - r211756. This one fixed most of the haskell build failures. It fixed the most common issue that we had (701 occurrences, but this does not mean that all these packages build fine now: some haskell-based packages fail later in the process)

New errors

Just like in other releases, new warnings are added in clang. With (bad) usage of -Werror by upstream software, this causes new build failures:

I also took the opportunity to add some further categorizations in the list of errors. Some examples:

Next steps

The Debile project being close to ready with Clément Schreiner's GSoC, we will now have an automatic and transparent way to rebuild packages using clang.

Conclusion

As stated, we can see a huge drop in terms of the number of failures over time:

Hopefully, Clang getting better and better, more and more projects adopting it as the default compiler or as a base for plugin/extension developments, this percentage will continue to decrease.
Having some kind of release goal with clang for Jessie+1 can now be considered as potentially reachable.

Want to help?

There are several things which can be done to help:

  • Point me to common error patterns in the Not categorized list of errors to create new categories
  • Report and fix packages
  • As an upstream, integrate clang as part of your continuous integration system
  • Hack on cqa-scanlogs, the error detection tool to detect error patterns (example: Undetected error). This tool is used also for the regular rebuilds of the archive.
  • Improve clang.debian.net website

Acknowledgments

Thanks to David Suarez for the rebuilds of the archive, Arthur Marble and Alexander Ovchinnikov for their GSoC work and Nicolas Sévelin-Radiguet for the few fixes.