Well, it is an understatement to say my previous blog post interested many people. Many articles, blog posts and some podcasts talked about it! As we pushed coreutils 0.0.12 a few days ago and getting closer to the 10 000 stars on github, it is now time to give an update!
This has brought a lot of new contributors to this project. Instead of 30 to 60 patches per month, we jumped to 400 to 472 patches every month. Similarly, we saw an increase in the number of contributors (20 to 50 per month from 3 to 8). Two new maintainers (Michael Debertol & Terts Diepraam) stepped in and have been doing a much better job than myself as reviewers now! As a silly metric, according to github, we had 5 561 clones of the repository over the last 2 weeks!
The new contributors focused on:
Refactoring the code to simplify the maintenance. Examples:
Closing the gap with GNU
As far as I know, we are only missing stty (change and print terminal line settings)
as a program.
Thanks to some heroes, basenc, pr, chcon and runcon have been implemented. For example, for the two last programs, Koutheir Attouchi wrote new crates to manage SELinux properly. This crate has been used for some other utilities like cp, ls or id.
Leveraging the GNU testsuite to test this implementation
Because the GNU testsuite is excellent, we now have a proper CI using it to run the tests. It is pretty long on the Github action CI (almost two hours to run it) but it is an amazing improvement to the way we work. It was a joint work from a bunch of folks (James Robson, Roy Ivy III, etc). To achieve this, we also made it easier to run the GNU testsuite locally with the Rust implementation but also to ignore some tests or adjust some error messages (see build-gnu.sh and run-gnu-test.sh).
Following a suggestion of Brian G, a colleague at Mozilla (he did the same for some Firefox major change), we are now collecting the history of fail/pass/error into a separate repository and generating a daily graph showing the evolution of regression. At this date, we have, with GNU/Coreutils 9.0:
Total | 611 tests |
Pass | 214 |
Skip | 84 |
Fail | 298 |
Error | 15 |
We are now automatically identifying new passing tests and regressions in the CI.
For example:
Warning: Congrats! The gnu test tests/chmod/c-option is now passing!
<br />Warning: Congrats! The gnu test tests/chmod/silent is now passing!
<br />Warning: Congrats! The gnu test tests/chmod/umask-x is now passing!
<br />Error: GNU test failed: tests/du/long-from-unreadable. tests/du/long-from-unreadable is passing on 'master'. Maybe you have to rebase?
[...]
<br />Warning: Changes from master: PASS +4 / FAIL +0 / ERROR -4 / SKIP +0
This is also beneficial to GNU as, by implementing some options, Michael Debertol noticed some incorrect behaviors (with sort and cat) or an uninitialized variable (with chmod).
Documentations
Every day, we are generating the user documentation and of the internal coreutils.
User documentation: https://uutils.github.io/coreutils-docs/user/ Example: ls or cp
The internal documentation can be seen on: https://uutils.github.io/coreutils-docs/dev/uucore/
For example, the backup style is documented here: https://uutils.github.io/coreutils-docs/dev/uucore/backup_control/index.html
More?
Besides my work on Debian/Ubuntu, I have also noticed that more and more operating systems are starting to look at this:
In parallel, https://github.com/uutils/findutils/, a rust dropped-in replacement for find, is getting more attention lately! Here, the graph showing the evolution of the program using the BFS testsuite (much better than GNU's).
What is next?
How to help?
I have been maintaining a list of good first bugs for new comers in the repo!
Don't hesitate to contribute, it is much easier than it seems and a terrific way to learn Rust!
]]>
Even if I see Rust code every day at Mozilla, I was looking for an actual personal project (i.e. this isn't a Mozilla project) to learn Rust during the various COVID lockdowns.
I started contributing to the alternative Coreutils developed in Rust. The project aims at proposing a drop-in replacement of the C-based GNU Coreutils, and I wanted to evaluate if this could be used to run a regular Debian. Similar to what I have done with clang.debian.net a few years ago (rebuilding the Debian archive using clang instead of gcc).
I expect that most of the readers know what is the Coreutils. It is a set of programs performing simple operations (copy/move file, change permissions/ownership, etc). Even if some commands are from the 70s, they are at the base of Linux, Unix and macOS. While different implementations can be found, they are trying to remain compatible in terms of arguments, options, etc. This implementation of Coreutils isn’t different!
If you want to learn more about the history of Unix, I recommend this great Corecursive podcast with Brian Kernighan.
While a lot of people contributed to this project, much was left to be done:
To start easy, I defined 4 goals for this work:
Packaging in Debian isn't a trivial or even simple task. It requires uploading independently all the dependencies in the archive. Rust, with its new ecosystem and small crates, is making this task significantly harder.
The package is called rust-coreutils - https://tracker.debian.org/pkg/rust-coreutils
For Debian/Ubuntu users, to have an idea of the complexity of packaging such applications, just run debtree --build-dep rust-coreutils | dot -Tsvg > coreutils.svg
(should be around 1M).
Since it isn't production ready, the rust-coreutils is installable in parallel with coreutils. This package does NOT replace the GNU/coreutils files (yet?), the new files are installed in /usr/lib/cargo/bin/
.
They can be used with:
export PATH=/usr/lib/cargo/bin/:$PATH
Or, uglier, overriding the files with the new ones.
To achieve this, because I knew I would likely break the image a few times, I created a new project to quickly install a full Debian with PXE and preseed.
The project is available here:
https://github.com/opencollab/qemu-debian-install-pxe-preseed/
A script to create the full qemu image: build_qemu_debian_image.sh
A second script to boot on the newly created image: boot.sh
Then, building and installing coreutils on the system (yeah, it is ugly - don’t do that at home):
apt install rust-coreutils
cd /usr/lib/cargo/bin/
for f in *; do
cp -f $f /usr/bin/
done
First surprise, unlike the old init.d init system, as systemd is not relying on a series of scripts (it is mostly written in C), replacing the coreutils did not have an impact. Therefore, I didn't experience any issue during the boot process
Debian packages rely a lot on post-install scripts (stored in /var/lib/dpkg/info/*
) to finalize and configure packages. They are (almost?) all using /bin/sh
(or /bin/bash)
to perform these actions. They intensively call coreutils applications.
For example, in /var/lib/dpkg/info/exim4-base.postinst
, we can find:
install -d -oDebian-exim -gadm -m2750 /var/log/exim4
With an ugly script, we can test the installations of the 1000 most popular packages one by one.
Running this, some classes of issues in Rust/coreutils could be easily identified.
A significant number of problems could be easily identified as a lack of support for some options.
Here is a list of most of the fixes I had to implement to make this plan work:
update-initramfs: Generating /boot/initrd.img-4.19.0-12-amd64
ln: error: Unrecognized option: 'r'
cp: error: Option 'archive' not yet implemented.
cp: error: Option 'dereference' not yet implemented.
sync -fs
: https://github.com/uutils/coreutils/pull/1639install --owner & --group
: https://github.com/uutils/coreutils/pull/1641mktemp -t
https://github.com/uutils/coreutils/pull/1677Most of the programs behaved as expected. Here is a list of differences:
Setting up libreoffice-common (1:6.1.5-3+deb10u6) ...
install: error: install: cannot install ‘/dev/null’ to ‘/etc/apparmor.d/local/usr.lib.libreoffice.program.oosplash’: the source path is not an existing regular file
A limitation of rust itself https://github.com/rust-lang/rust/issues/79390 mkdir --parent
is sometime used (instead of --parents) - https://github.com/uutils/coreutils/commit/dbc716546b352c8b81af1952915137049ed12239 install: error: target ‘bar.txt’ is not a directory
install -d
won't fail if a directory already exists - https://github.com/uutils/coreutils/pull/1643ls non-existing-file
wasn't returning an error code. "if ls" was used by some scripts - https://github.com/uutils/coreutils/pull/1654install -d /tmp/foo
could not be executed twice (while the second run with the gnu version wouldn't trigger an error if already existing) - https://github.com/uutils/coreutils/pull/1641Build systems can vary significantly one from the other.
To verify their usage of coreutils, I built these three major projects
As Firefox relies mostly on Python as a build system, it went smoothly. I didn’t encounter any issue.
The only unrelated issue that I noticed working on it was apt-key was broken because the script relied on a buggy option of mktemp.
I identified only two issues compared to GNU Coreutils:
ln -fsn ../../x86/boot/bzImage ./arch/x86_64/boot/bzImage
ln: error: Unrecognized option: 'n'
The llvm toolchain relies on Cmake. Just like for Firefox, I didn’t face any issue.
Recently, James Robson added a new test to run the GNU testsuite on the Rust/coreutils.
compared to 546 test passing with the GNU version. Even if a bunch of errors are just different outputs, it demonstrates that there is still a long road ahead.
# TOTAL: 611
# PASS: 144
# SKIP: 86
# XFAIL: 0
# FAIL: 342
# XPASS: 0
# ERROR: 39
First, we will need more motivated contributors to work on this project. Many features remain to be implemented, optimizations to be done (e.g. decreasing the memory usage), etc.
I started to create a list of good first bugs for newcomers:
https://github.com/uutils/coreutils/issues?q=is%3Aissue+is%3Aopen+label%3A%22Good+first+bug%22
I will update this list of there is some interest for this project.
Helping improve the support of the GNU coreutils testsuite would be a huge step while being a great way to learn Rust!
Then, once it is in a better state, we will be able to make it a reliable alternative in Debian/Ubuntu to the GNU/Coreutils.
This might be also interesting for other folks who prefer a BSD license over a GPL.
Instead of patching clang itself, I used a different approach this time: patching Debian tools or implementing some workaround to mitigate an issue.
The percentage of packages failing drop from 4.5% to 3.6% (1400 packages to 1110 - on a total of 31014).
I focused on two classes of issues:
As I have no intention to merge the patch upstream, I used a very dirty workaround. I overwrote the g++ qmake file by clang's:
https://salsa.debian.org/lucas/collab-qa-tools/-/blob/master/modes/clang10#L44-47
I dropped the number of this failure to 0, making some packages build flawlessly (example: qtcreator, chessx, fwbuilder, etc).
However, some packages are still failing later and therefore increasing the number of failures in some other categories like link error. For example, qtads fails because of ordered comparison between pointer and zero
or oscar fails on a -Werror,-Wdeprecated-copy
error.
Breaking the build later also highlighted some new classes of issues which didn't occur with clang < 10.
For example, warnings related to C++ range loop or implicit int float conversion (I fixed a bunch of them in Firefox) .
Historically, symbol management for C++ in Debian has been a pain. Russ Allbery wrote a blog post in 2012 explaining the situation. AFAIK, it hasn't changed much.
Once more, I took the dirty approach: if there new or missing symbols, don't fail the build.
The rational is the following: Packages in the Debian archive are supposed to build without any issue. If there is new or missing symbols, it is probably clang generating a different library but this library is very likely working as expected (and usable by a program compiled with g++ or clang). It is purely a different approach taken by the compiler developer.
In order to mitigate this issue, before the build starts, I am modifying dpkg-gensymbols to transform the error into a warning.
So, the typical Debian error some new symbols appeared in the symbols file
or some symbols or patterns disappeared in the symbols file
will NOT fail the build.
Unsurprisingly, all but one package (libktorrent) build.
Even if I am pessimistic, I reported a bug on dpkg-dev to evaluate if we could improve dpkg-gensymbol not to fail on these cases.
The next offender is Imake.tmpl:2243:10: fatal error: ' X11 .rules' file not found
with more than an hundred occurrences, reported upstream quite sometime ago.
Then, the big issues are going to be much harder to fix as they are real issues/warnings (with -Werror) in the code of the packages. Example: -Wc++11-narrowing & Wreserved-user-defined-literal... The list is long.
I will probably work on that when llvm/clang 11 are in RC phase.
Maintainer of Debian/Ubuntu packages? I am providing a list of failing packages per maintainer: https://clang.debian.net/maintainers.php
For upstream, it is also easy to test with clang. Usually, apt install clang && CC=clang CXX=clang++ <build step>
is good enough.
With these two changes, I have been able to fix about 290 packages. I think I will be able to get that down a bit more but we will soon reach a plateau as many warnings/issues will have to fix in the C/C++ code itself.
]]>Processed results are available on the website: https://clang.debian.net/status.php - Now includes some fancy graphs to show the evolution
Raw logs are published on github: https://github.com/opencollab/clang.debian.net/tree/master/logs
Since my last blog post on the subject (August 2017), Clang is more and more present in the tech ecosystem. It is now the compiler used to build Firefox and Chrome upstream binaries on all the supported architectures/operating systems. More architectures are supported, it has a new linker (lld), a new hybrid IR (MLIR), a lot of checkers in clang-tidy, cross-language linking with Rust, etc.
Now, about Debian results, we rebuilt using 8.0.1, 9.0.1 and 10.0rc2. Results are pretty similar to what we had with previous versions: between 4 to 5% of packages are failing when gcc is replaced by clang.
Even if most of the software are still using gcc as compiler, we can see that clang has a positive effect on code quality. With many different kinds of errors and warnings found clang over the years, we noticed a steady decline of the number of errors. For example, the number of incorrect C/C++ main declarations has been decreasing years after years:
The biggest offender is still the qmake changes which doesn't allow the used workaround (replacing /usr/bin/gcc by /usr/bin/clang) - about 250 errors. Most of these packages would probably compile fine with clang. More on the Qt bug tracker. The workaround proposed in the bug isn't applicable for us as we use the dropped-in replacement of the compiler.
The second error is still some differences in symbol generation. Unlike gcc, it seems that clang doesn't generate some symbols (or adds some). As a bunch of Debian packages are checking the list of symbols in the library (for ABI management), the build fails on purpose. For example, with libcec, the symbol _ZN10P8PLATFORM14CConditionImplD1Ev@Base 3.1.0
isn't generated anymore. I am not expecting this to be a big deal: the generated libraries probably works most of the time. More on C++ symbol management in Debian.
I reported this bug upstream a while back: https://bugs.llvm.org/show_bug.cgi?id=30441
As previously said in a blog post, I don't think there is a strong intensive to go away from gcc for most of the Linux distributions. The big reason for BSD was the license (even if the move to the Apache 2 license wasn't received positively by some of them).
While the LLVM/clang ecosystem clearly won the tooling battle, as a C/C++ compiler, gcc is still an excellent compiler which supports more architecture and more languages.
In term of new warnings and checks, as the clang community moved the efforts in clang-tidy (which requires more complex tooling), out of the box, gcc provides a better experience (as example, see the Firefox meta bug to build with -Werror with the default warnings using gcc 9, gcc 10 and clang trunk for example).
I see some potential next steps to decrease the number of failure:
echo "#include <objc/objc.h>" > foo.m && clang -c foo.m
is currently failing)Many thanks to Lucas Nussbaum for the rebuilds.
]]>The goal of this initiative is to rebuild Debian using Clang as a compiler instead of gcc. I have been doing this analysis for the last 6 years.
Recently, we rebuilt the archive of the Debian archive with Clang 3.9.1 (July 6th), 4.0.1 (July 6th) and 5.0 rc2 (August 20th).
For various reasons, we didn't perform a rebuild since June 2016 with version 3.8. Therefor, we took the opportunity to do three over the last month.
Now, the 3.9 & 4.0 results are impacted by a build failure when building all haskell packages (the -no-pie option in Clang doesn't exist - I introduced it in clang 5.0). Fixing this issue with 5.0 removed more than 860 failures.
Also, for the same versions, a Qt compiler detection is considering that Clang is not a C++11 compiler because clang++, by default, defines __cplusplus as 199711L (-std=c++11 has to be added to define a correct __cplusplus). See https://bugreports.qt.io/browse/QTBUG-62535 for more information. Some discussions happened on the upstream mailing list about changing the default C++ dialect.
For example, with 4.0, this is causing 132 errors. With 5.0, probably thanks to a new Qt version, roughly the same number of packages are failing but because gcc just triggers a warning with the "nodiscard" attribute being incorrectly used when clang triggers an error.
In parallel, ignoring the haskell build failures, the numbers sightly increased since last year even if the overall percentage decreased (new packages being uploaded in the archive).
Version | Build failures | Ignoring haskell pkgs |
3.8 | 1367 / 5.6% | |
3.9 | 2274 / 8.1% | 1618 / 5.8% |
4.0 | 2311 / 8.3% | 1655 / 5.9% |
5.0 | 1445 / 5.1% |
In parallel, new warnings and errors showed up in Clang.
This is causing a new set of build failures (especially with the usage of -Werror).
As few examples:
* Starting with 4.0, clang triggers an error ordered comparison between pointer and zero ('char *' and 'int').
* Similarly, with this version, -Wmain introduces a new warning which will trigger a warning when a bool literal is returned from main.
* clang also introduced a new warning called -Waddress-of-packed-member causing 5 new errors.
* With the same version, clang can trigger a new error when auto is used in function return type.
Now, as a conclusion, having Debian being built with clang by default is still a long shot.
First, when Clang became usable for a general audience, gcc was lagging in term of warning and error detections. Now, gcc is in a much better position than it was, decreasing the interest to have clang replacing gcc. In parallel, most of the efforts in term of warnings
and mistake detections are currently done under the clang tidy umbrella, making them less intrusive as part of this initiative (but harder to use and to deploy).
As an example, the gcc warning -Wmisleading-indentation has been implemented under a clang-tidy checker.
Second, the very permissive license of clang has been a key factor for some operating systems to switch like the PS4, Mac OS X or FreeBSD. With Debian, the community is generally happy with the GPL.
Third, the performances are similar enough that it is not worth the work, except for some projects with very special needs.
Last, despite that it is much easier to contribute to llvm/clang than gcc (not copyright assignment or actual review system for example), this isn't a big differentiator for most of the projects.
Of course, I will continue to run and analysis these rebuilds as this is a great source of information for clang upstream developers to improve the compatibility with gcc and understand some impacts. However, until there is a big game changer, I will stop pursuing the goal of having Debian switching to clang instead of gcc. I will stop effort on the debile project (which was aiming to rebuild in the background packages).
]]>tl;dr: Great progress. We decreased from 9.5% to 5.7% of failures. Full results are available on http://clang.debian.net
At time of the rebuild with 3.4.2, we had 2040 packages failing to build with clang. With 3.5.0, this dropped to 1261 packages.
With Arthur Marble and Alexander Ovchinnikov, both GSoC students, we worked on various ways to decrease the number of errors.
First, the most obvious way, we fixed programming bugs/mistakes in upstream sources. Basically, we took categories of failure and fixed issues one after the other. We started with simple bugs like 'Wrong main declaration', 'non-void function should return a value' or 'Void function should not return a value'.
They are trivial to fix. We continued with harder fixes like ' Undefined reference' or 'Variable length array for a non POD (plain old data) element'.
So, besides these one, we worked on:
In total, we reported 295 bugs with patches. 85 of them have been fixed (meaning that the Debian maintainer uploaded a new version with the fix).
In parallel, I think that the switch by FreeBSD and Mac OS X to Clang also helped to fix various issues by upstreams.
As a parallel approach, we started to implement a suggestion from Linus Torvalds and a few others. Instead of trying to fix all upstream, where we can, we tried to update clang to improve the gcc compatibility.
gcc has many flags to disable or enable optimizations. Some of them are legacy, others have no sense in clang, etc. Instead of failing in clang with an error, we create a new category of warnings (showing optimization flag '%0' is not supported) and moved all relevant flags into it. Some examples, r212805, r213365, r214906 or r214907
We also updated clang to silent some useless arguments like -finput-charset=UTF-8 (r212110), clang being UTF-8 compliant.
Finally, we worked on the forwarding of linker flags. Clang and gcc have a very different behavior: when gcc does not know an argument, it is going to forward the argument to the linker. Clang, in this case, is going to reject the argument and fail with an error. In clang, we have to explicitly declare which arguments are going to be transfer to the linker. Of course, the correct way to pass arguments to the linker is to use -Xlinker or -Wl but the Debian rebuild proved that these shortcuts are used. Two of these arguments are now forwarded:
Just like in other releases, new warnings are added in clang. With (bad) usage of -Werror by upstream software, this causes new build failures:
I also took the opportunity to add some further categorizations in the list of errors. Some examples:
The Debile project being close to ready with Clément Schreiner's GSoC, we will now have an automatic and transparent way to rebuild packages using clang.
As stated, we can see a huge drop in term of number of failures over time:
Hopefully, Clang getting better and better, more and more projects adopting it as the default compiler or as a base for plugin/extension developments, this percentage will continue to decrease.
Having some kind of release goal with clang for Jessie+1 can now be considered as potentially reachable.
There are several things which can be done to help:
Hack on cqa-scanlogs, the error detection tool to detect error patterns (example: Undetected error). This tool is used also for the regular rebuilds of the archive.
Improve clang.debian.net website
Thanks to David Suarez for the rebuilds of the archive, Arthur Marble and Alexander Ovchinnikov for their GSoC works and Nicolas Sévelin-Radiguet for the few fixes.
]]>So, just like gcc, the different version can be called with clang-3.4, clang-3.5 or clang-3.6.
/usr/bin/clang, /usr/bin/clang++, /usr/bin/scan-build and /usr/bin/scan-view are now handled through the llvm-defaults package.
llvm-defaults is also now managing clang-check, clang-tblgen, c-index-test, clang-apply-replacements, clang-tidy, pp-trace and clang-query.
Changes are also available on llvm.org/apt/.
The next step will be to manage also llvm-defaults on llvm.org/apt to simplify the transition for people using these packages.
So, with:
# /etc/apt/sources.list deb http://llvm.org/apt/unstable/ llvm-toolchain main deb http://llvm.org/apt/unstable/ llvm-toolchain-3.4 main deb http://llvm.org/apt/unstable/ llvm-toolchain-3.5 main
$ apt-get install clang-3.4 clang-3.5 clang-3.6 $ clang-3.4 --version Debian clang version 3.4.2 (branches/release_34) (based on LLVM 3.4.2) Target: x86_64-pc-linux-gnu Thread model: posix $ clang-3.5 --version Debian clang version 3.5.0-+rc2-1~exp1 (tags/RELEASE_350/rc2) (based on LLVM 3.5.0) Target: x86_64-pc-linux-gnu Thread model: posix $ clang-3.6 --version Debian clang version 3.6.0-svn214990-1~exp1 (trunk) (based on LLVM 3.6.0) Target: x86_64-pc-linux-gnu Thread model: posix]]>
New Twitter feed ideas are welcome.
]]>