Catégorie: "Développement"

An update on rust/coreutils

Janvier 29th, 2022

TLDR: we are making progress on the Rust implementation of the GNU coreutils.

Well, it is an understatement to say my previous blog post interested many people. Many articles, blog posts and some podcasts talked about it! As we pushed coreutils 0.0.12 a few days ago and getting closer to the 10 000 stars on github, it is now time to give an update!

This has brought a lot of new contributors to this project. Instead of 30 to 60 patches per month, we jumped to 400 to 472 patches every month. Similarly, we saw an increase in the number of contributors (20 to 50 per month from 3 to 8). Two new maintainers (Michael Debertol & Terts Diepraam) stepped in and have been doing a much better job than myself as reviewers now! As a silly metric, according to github, we had 5 561 clones of the repository over the last 2 weeks!

The new contributors focused on:

  • Performances. Now, some binaries are significantly faster than GNU (ex: head, cut, etc)
  • Adding missing binaries or options (see below)
  • Improve the testsuite: we grew the overall code coverage from 55% to 75% (in general, we consider that a 80% code coverage on a project is excellent).
  • Refactoring the code to simplify the maintenance. Examples:

    • Using the same code for permissions for chgrp and chown
    • Managing error the same way in the various binaries - (Kudos to Jeffrey Finkelstein for the huge work)
    • Improving the GNU compatibility (thanks to Jan Verbeek, Jan Scheer, kimono-koans and many others)
    • Move to clap 3. Upgrade by Terts which unblocks us on various problems.
  • ...

Closing the gap with GNU

As far as I know, we are only missing stty (change and print terminal line settings) as a program.

Thanks to some heroes, basenc, pr, chcon and runcon have been implemented. For example, for the two last programs, Koutheir Attouchi wrote new crates to manage SELinux properly. This crate has been used for some other utilities like cp, ls or id.

Leveraging the GNU testsuite to test this implementation

Because the GNU testsuite is excellent, we now have a proper CI using it to run the tests. It is pretty long on the Github action CI (almost two hours to run it) but it is an amazing improvement to the way we work. It was a joint work from a bunch of folks (James Robson, Roy Ivy III, etc). To achieve this, we also made it easier to run the GNU testsuite locally with the Rust implementation but also to ignore some tests or adjust some error messages (see build-gnu.sh and run-gnu-test.sh).

Following a suggestion of Brian G, a colleague at Mozilla (he did the same for some Firefox major change), we are now collecting the history of fail/pass/error into a separate repository and generating a daily graph showing the evolution of regression. Evolution over time At this date, we have, with GNU/Coreutils 9.0:

Total 611 tests
Pass 214
Skip 84
Fail 298
Error 15

We are now automatically identifying new passing tests and regressions in the CI.

For example:

Warning: Congrats! The gnu test tests/chmod/c-option is now passing!
<br />Warning: Congrats! The gnu test tests/chmod/silent is now passing!
<br />Warning: Congrats! The gnu test tests/chmod/umask-x is now passing!
<br />Error: GNU test failed: tests/du/long-from-unreadable. tests/du/long-from-unreadable is passing on 'master'. Maybe you have to rebase?
[...]
<br />Warning: Changes from master: PASS +4 / FAIL +0 / ERROR -4 / SKIP +0

This is also beneficial to GNU as, by implementing some options, Michael Debertol noticed some incorrect behaviors (with sort and cat) or an uninitialized variable (with chmod).

Documentations

Every day, we are generating the user documentation and of the internal coreutils.

User documentation: https://uutils.github.io/coreutils-docs/user/ Example: ls or cp

The internal documentation can be seen on: https://uutils.github.io/coreutils-docs/dev/uucore/
For example, the backup style is documented here: https://uutils.github.io/coreutils-docs/dev/uucore/backup_control/index.html

More?

Besides my work on Debian/Ubuntu, I have also noticed that more and more operating systems are starting to look at this:

In parallel, https://github.com/uutils/findutils/, a rust dropped-in replacement for find, is getting more attention lately! Here, the graph showing the evolution of the program using the BFS testsuite (much better than GNU's).

Evolution over time - BFS testsuite

What is next?

  1. stty needs to be implemented
  2. Improve the GNU compatibility on key programs and reduce the gap
  3. Investigate how to reduce the size of the binaries
  4. Allow Debian and Ubuntu to switch by default without tricky manipulation

How to help?

I have been maintaining a list of good first bugs for new comers in the repo!

Don't hesitate to contribute, it is much easier than it seems and a terrific way to learn Rust!

 

Debian running on Rust coreutils

Mars 9th, 2021

tldr: Rust/coreutils ( https://github.com/uutils/coreutils/ ) is now available in Debian, good enough to boot a Debian with GNOME, install the top 1000 packages, build Firefox, the Linux Kernel and LLVM/Clang. Even if I wrote more than 100 patches to achieve that, it will probably be a bumpy ride for many other use cases.
It is also a terrific project to learn Rust. See the list of good first bugs.

Even if I see Rust code every day at Mozilla, I was looking for an actual personal project (i.e. this isn't a Mozilla project) to learn Rust during the various COVID lockdowns.

I started contributing to the alternative Coreutils developed in Rust. The project aims at proposing a drop-in replacement of the C-based GNU Coreutils, and I wanted to evaluate if this could be used to run a regular Debian. Similar to what I have done with clang.debian.net a few years ago (rebuilding the Debian archive using clang instead of gcc).

I expect that most of the readers know what is the Coreutils. It is a set of programs performing simple operations (copy/move file, change permissions/ownership, etc). Even if some commands are from the 70s, they are at the base of Linux, Unix and macOS. While different implementations can be found, they are trying to remain compatible in terms of arguments, options, etc. This implementation of Coreutils isn’t different!

If you want to learn more about the history of Unix, I recommend this great Corecursive podcast with Brian Kernighan.

While a lot of people contributed to this project, much was left to be done:

  • missing programs to be implemented. See https://github.com/uutils/coreutils#utilities
  • missing options in the various programs
  • code not following latest Rust best practices
  • lack of consistency in the code base (ex: functions with too many arguments)
  • lack of tests/low code coverage
  • Lots of failures when running the GNU Coreutils testsuite (141 tests pass on 613) - Some trivial to fix, some others harder… A good way to start on a Rust project!

To start easy, I defined 4 goals for this work:

  1. Package Coreutils in Debian/Ubuntu
  2. Boot a Debian system with a Rust-based coreutils
  3. Install the top 1000 packages in Debian - including GNOME
  4. Build Firefox, the Linux Kernel and LLV/Clang

Packaging of Coreutils in Debian

Packaging in Debian isn't a trivial or even simple task. It requires uploading independently all the dependencies in the archive. Rust, with its new ecosystem and small crates, is making this task significantly harder.

The package is called rust-coreutils - https://tracker.debian.org/pkg/rust-coreutils

For Debian/Ubuntu users, to have an idea of the complexity of packaging such applications, just run
debtree --build-dep rust-coreutils | dot -Tsvg > coreutils.svg (should be around 1M).

Since it isn't production ready, the rust-coreutils is installable in parallel with coreutils. This package does NOT replace the GNU/coreutils files (yet?), the new files are installed in /usr/lib/cargo/bin/.

They can be used with:

export PATH=/usr/lib/cargo/bin/:$PATH

Or, uglier, overriding the files with the new ones.

Booting Debian with rust-coreutils

To achieve this, because I knew I would likely break the image a few times, I created a new project to quickly install a full Debian with PXE and preseed.

The project is available here:

https://github.com/opencollab/qemu-debian-install-pxe-preseed/

A script to create the full qemu image: build_qemu_debian_image.sh

A second script to boot on the newly created image: boot.sh

Then, building and installing coreutils on the system (yeah, it is ugly - don’t do that at home):

apt install rust-coreutils
cd /usr/lib/cargo/bin/
for f in *; do
cp -f $f /usr/bin/
done

First surprise, unlike the old init.d init system, as systemd is not relying on a series of scripts (it is mostly written in C), replacing the coreutils did not have an impact. Therefore, I didn't experience any issue during the boot process

Debian packages rely a lot on post-install scripts (stored in /var/lib/dpkg/info/*) to finalize and configure packages. They are (almost?) all using /bin/sh (or /bin/bash) to perform these actions. They intensively call coreutils applications.

For example, in /var/lib/dpkg/info/exim4-base.postinst, we can find:

install -d -oDebian-exim -gadm -m2750 /var/log/exim4

With an ugly script, we can test the installations of the 1000 most popular packages one by one.

Running this, some classes of issues in Rust/coreutils could be easily identified.

Implementing missing options

A significant number of problems could be easily identified as a lack of support for some options.

Here is a list of most of the fixes I had to implement to make this plan work:

Different behavior

Most of the programs behaved as expected. Here is a list of differences:

  • install doesn't support using /dev/null as source file
    Setting up libreoffice-common (1:6.1.5-3+deb10u6) ...
    install: error: install: cannot install ‘/dev/null’ to ‘/etc/apparmor.d/local/usr.lib.libreoffice.program.oosplash’: the source path is not an existing regular file
    A limitation of rust itself https://github.com/rust-lang/rust/issues/79390

Compile Firefox, Clang and the Linux Kernel

Build systems can vary significantly one from the other.

To verify their usage of coreutils, I built these three major projects

Firefox

As Firefox relies mostly on Python as a build system, it went smoothly. I didn’t encounter any issue.

The only unrelated issue that I noticed working on it was apt-key was broken because the script relied on a buggy option of mktemp.

Linux Kernel

I identified only two issues compared to GNU Coreutils:

  • The chown command on a non-existing symlink target doesn’t fail on the GNU version, the Rust one was triggering an error.
    https://github.com/uutils/coreutils/pull/1694
  • Linux kernel
    ln -fsn ../../x86/boot/bzImage ./arch/x86_64/boot/bzImage
    ln: error: Unrecognized option: 'n'

LLVM/Clang

The llvm toolchain relies on Cmake. Just like for Firefox, I didn’t face any issue.

Comparing with GNU coreutils using its testsuite

Recently, James Robson added a new test to run the GNU testsuite on the Rust/coreutils.

# TOTAL: 611
# PASS: 144
# SKIP: 86
# XFAIL: 0
# FAIL: 342
# XPASS: 0
# ERROR: 39
compared to 546 test passing with the GNU version. Even if a bunch of errors are just different outputs, it demonstrates that there is still a long road ahead.

Next steps & contribute

First, we will need more motivated contributors to work on this project. Many features remain to be implemented, optimizations to be done (e.g. decreasing the memory usage), etc.
I started to create a list of good first bugs for newcomers:

https://github.com/uutils/coreutils/issues?q=is%3Aissue+is%3Aopen+label%3A%22Good+first+bug%22
I will update this list of there is some interest for this project.

Helping improve the support of the GNU coreutils testsuite would be a huge step while being a great way to learn Rust!

Then, once it is in a better state, we will be able to make it a reliable alternative in Debian/Ubuntu to the GNU/Coreutils.

This might be also interesting for other folks who prefer a BSD license over a GPL.



About optimization flags in software

Septembre 6th, 2013

Following a thread on the Clang mailing list, I did a quick analysis of the optimization flags used to build C or C++ code.

I used a rebuild result from last mid July of all Debian native packages (meaning that I don't have the log of full Java, Python or other packages without native code) and wrote a small script to count all their occurrences.
By default, deb packages uses the command "dpkg-buildflags --get CFLAGS" which will set the optimization level to -O2. However, many packages and upstream build systems do not respect such flags.

On 10320 binary-only packages (in total, Debian has around 19000), the results are the following:

Optimization levelNumber of occurrencesNumber of packages
-O35597269
-Os30849145
-O013993294
-O125494809
-O29958037685
-O3106048531
-O4140216
-O5
-O6693549

Since a graphic talks more than numbers (note that the Y axis is logarithmic):

Disclaimer:
Some packages are using several optimization flags during the same build.
Some packages are not showing the full command line (see jessie release goal: verbose build logs)
The results might be approximate since I am basically grepping "-OX " on the log files.

Clang 3.3 and Debian

Août 19th, 2013

The LLVM toolchain version 3.3 has been released a couple months ago.
Here are now the result of the rebuild of Debian archive using this version of the compiler.
Like the previous releases, we are at a bit less than 12% of packages failing (2188 packages on a total of 18854).
More warnings / errors detections have been added to the software (for example: like this defect of the C++ standard or the detection of unused linker option) causing more build failures, but, in the mean time, we fixed some issues in the Debian packages...

As usual, the following image shows clearly the evolution of the build failures over time.

As stated in my blog post for the release 3.2, this rebuilds prove that Clang is ready for production in term of support of the C, C++ and Objective C languages.

With the performance results showed by Chandler Carruth from Google at the last Euro LLVM summit (see this video from 5:40), I believe that it is now time to report and fix the bugs in the upstream packages.

I also presented this work (video) at the Debconf 13 last week in Vaumarcus (Switzerland) and I will be also presenting this work at the Linux Plumbers Conference, New Orleans.

With Léo Cavaille (as part of his GSOC) and Paul Tagliamonte, we are also working on providing a better automatic rebuild infrastructure for clang-built packages (and other static analyzers). More in the next few weeks.

Finally, I would like to thank folks at AWS for the Debian credit and David Suarez for helping on with the Ruby segfaults.

Rebuild of Debian using clang 3.2

Février 6th, 2013

After the studies of the rebuild of the Debian archive using clang 2.9 & 3.0 and 3.1, the results of the 3.2 rebuild have been published on http://clang.debian.net.

The percentage of failure is the same as clang 3.1: 12.1% of failure. That means that on 18264 packages, 2204 failed to be built with clang (it was 17710/2137 with the version 3.1).

The fact that the percentage is the same can be explained by at least two reasons:
First, some upstreams packages have been fixed to be built correctly with clang.
Second, the new warnings + -Werror (example: -Wsometimes-uninitialized) introduced in clang 3.2 triggered some new build failures.

To conclude, I think that the 3.2 release confirms the turning point of the 3.1. In term of support of the C and C++ standard, clang has now reached an equivalent quality to gcc (with better detection of errors and an extended set of warnings in -Wall).
Now, from the perspective of the Debian rebuilds, the decrease of number of failures will come from the upstream developers improving their codes (exemple of the error non-void function should return a value) or the Debian Developer/maintainer fixing the programming errors.