Category: "Linux"

An update on rust/coreutils

January 29th, 2022

TLDR: we are making progress on the Rust implementation of the GNU coreutils.

Well, it is an understatement to say my previous blog post interested many people. Many articles, blog posts and some podcasts talked about it! As we pushed coreutils 0.0.12 a few days ago and are getting close to 10 000 stars on GitHub, it is now time to give an update!

This has brought a lot of new contributors to the project. Instead of 30 to 60 patches per month, we jumped to 400 to 472 patches every month. Similarly, the number of contributors increased from 3 to 8 per month to 20 to 50 per month. Two new maintainers (Michael Debertol & Terts Diepraam) stepped in and are now doing a much better job than me as reviewers! As a silly metric, according to GitHub, we had 5 561 clones of the repository over the last 2 weeks!

The new contributors focused on:

  • Performance. Some binaries are now significantly faster than GNU's (e.g. head, cut, etc.)
  • Adding missing binaries or options (see below)
  • Improving the testsuite: we grew the overall code coverage from 55% to 75% (in general, we consider 80% code coverage on a project to be excellent).
  • Refactoring the code to simplify the maintenance. Examples:

    • Using the same code for permissions for chgrp and chown
    • Managing errors the same way in the various binaries (kudos to Jeffrey Finkelstein for the huge work)
    • Improving the GNU compatibility (thanks to Jan Verbeek, Jan Scheer, kimono-koans and many others)
    • Moving to clap 3, an upgrade done by Terts which unblocks us on various problems.
  • ...

Closing the gap with GNU

As far as I know, we are only missing stty (change and print terminal line settings) as a program.

Thanks to some heroes, basenc, pr, chcon and runcon have been implemented. For example, for the last two programs, Koutheir Attouchi wrote new crates to manage SELinux properly. These crates have also been used for other utilities such as cp, ls and id.

Leveraging the GNU testsuite to test this implementation

Because the GNU testsuite is excellent, we now have a proper CI using it to run the tests. It takes a long time on the GitHub Actions CI (almost two hours per run) but it is an amazing improvement to the way we work. It was a joint effort from a bunch of folks (James Robson, Roy Ivy III, etc). To achieve this, we also made it easier to run the GNU testsuite locally with the Rust implementation, and to ignore some tests or adjust some error messages (see build-gnu.sh and run-gnu-test.sh).

Following a suggestion from Brian G, a colleague at Mozilla (he did the same for a major Firefox change), we are now collecting the history of fail/pass/error results in a separate repository and generating a daily graph showing how they evolve.

Evolution over time

As of this date, we have, with GNU/Coreutils 9.0:

Total 611 tests
Pass 214
Skip 84
Fail 298
Error 15

We are now automatically identifying new passing tests and regressions in the CI.

For example:

Warning: Congrats! The gnu test tests/chmod/c-option is now passing!
Warning: Congrats! The gnu test tests/chmod/silent is now passing!
Warning: Congrats! The gnu test tests/chmod/umask-x is now passing!
Error: GNU test failed: tests/du/long-from-unreadable. tests/du/long-from-unreadable is passing on 'master'. Maybe you have to rebase?
[...]
Warning: Changes from master: PASS +4 / FAIL +0 / ERROR -4 / SKIP +0
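
To illustrate the idea behind these messages, here is a minimal sketch of how two test runs could be compared. It is not the script used in the project's CI: the input format (one "STATUS test-name" pair per line) and the function names passing and compare_runs are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical input format: one "STATUS test/name" pair per line, e.g.
#   PASS tests/chmod/silent
#   FAIL tests/du/long-from-unreadable

# Print the sorted names of the tests that passed in a result file.
passing() {
    awk '$1 == "PASS" { print $2 }' "$1" | sort
}

# compare_runs REFERENCE CURRENT
# Report tests that only pass in CURRENT (improvements) and tests that
# only pass in REFERENCE (regressions).
compare_runs() {
    ref_list=$(mktemp)
    cur_list=$(mktemp)
    passing "$1" > "$ref_list"
    passing "$2" > "$cur_list"
    # comm -13 keeps lines unique to the second file,
    # comm -23 keeps lines unique to the first file.
    comm -13 "$ref_list" "$cur_list" | sed 's/^/now passing: /'
    comm -23 "$ref_list" "$cur_list" | sed 's/^/regression: /'
    rm -f "$ref_list" "$cur_list"
}
```

Called as compare_runs master.txt branch.txt (both file names are placeholders), it lists what a branch gains or loses compared to the reference run.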

This is also beneficial to GNU as, by implementing some options, Michael Debertol noticed some incorrect behaviors (with sort and cat) or an uninitialized variable (with chmod).

Documentation

Every day, we generate the user documentation and the internal documentation of coreutils.

User documentation: https://uutils.github.io/coreutils-docs/user/ Example: ls or cp

The internal documentation can be seen on: https://uutils.github.io/coreutils-docs/dev/uucore/
For example, the backup style is documented here: https://uutils.github.io/coreutils-docs/dev/uucore/backup_control/index.html

More?

Besides my work on Debian/Ubuntu, I have also noticed that more and more operating systems are starting to look at this:

In parallel, https://github.com/uutils/findutils/, a Rust drop-in replacement for find, is getting more attention lately! Here is the graph showing the evolution of the program using the BFS testsuite (much better than GNU's).

Evolution over time - BFS testsuite

What is next?

  1. stty needs to be implemented
  2. Improve the GNU compatibility on key programs and reduce the gap
  3. Investigate how to reduce the size of the binaries
  4. Allow Debian and Ubuntu to switch by default without tricky manipulation

How to help?

I have been maintaining a list of good first bugs for newcomers in the repo!

Don't hesitate to contribute, it is much easier than it seems and a terrific way to learn Rust!

 

Debian running on Rust coreutils

March 9th, 2021

tldr: Rust/coreutils ( https://github.com/uutils/coreutils/ ) is now available in Debian, good enough to boot a Debian with GNOME, install the top 1000 packages, build Firefox, the Linux Kernel and LLVM/Clang. Even if I wrote more than 100 patches to achieve that, it will probably be a bumpy ride for many other use cases.
It is also a terrific project to learn Rust. See the list of good first bugs.

Even if I see Rust code every day at Mozilla, I was looking for an actual personal project (i.e. this isn't a Mozilla project) to learn Rust during the various COVID lockdowns.

I started contributing to the alternative Coreutils developed in Rust. The project aims at proposing a drop-in replacement of the C-based GNU Coreutils, and I wanted to evaluate if this could be used to run a regular Debian. This is similar to what I did with clang.debian.net a few years ago (rebuilding the Debian archive using clang instead of gcc).

I expect that most readers know what the Coreutils are. It is a set of programs performing simple operations (copy/move file, change permissions/ownership, etc). Even if some commands are from the 70s, they are at the base of Linux, Unix and macOS. While different implementations can be found, they are trying to remain compatible in terms of arguments, options, etc. This implementation of Coreutils isn’t different!

If you want to learn more about the history of Unix, I recommend this great Corecursive podcast with Brian Kernighan.

While a lot of people contributed to this project, much was left to be done:

  • missing programs to be implemented. See https://github.com/uutils/coreutils#utilities
  • missing options in the various programs
  • code not following latest Rust best practices
  • lack of consistency in the code base (ex: functions with too many arguments)
  • lack of tests/low code coverage
  • Lots of failures when running the GNU Coreutils testsuite (141 tests pass out of 613) - Some trivial to fix, some others harder… A good way to start on a Rust project!

To start easy, I defined 4 goals for this work:

  1. Package Coreutils in Debian/Ubuntu
  2. Boot a Debian system with a Rust-based coreutils
  3. Install the top 1000 packages in Debian - including GNOME
  4. Build Firefox, the Linux Kernel and LLVM/Clang

Packaging of Coreutils in Debian

Packaging in Debian isn't a trivial or even simple task. It requires uploading independently all the dependencies in the archive. Rust, with its new ecosystem and small crates, is making this task significantly harder.

The package is called rust-coreutils - https://tracker.debian.org/pkg/rust-coreutils

For Debian/Ubuntu users, to have an idea of the complexity of packaging such applications, just run
debtree --build-dep rust-coreutils | dot -Tsvg > coreutils.svg (should be around 1M).

Since it isn't production ready, rust-coreutils is installable in parallel with coreutils. This package does NOT replace the GNU/coreutils files (yet?): the new files are installed in /usr/lib/cargo/bin/.

They can be used with:

export PATH=/usr/lib/cargo/bin/:$PATH

Or, uglier, overriding the files with the new ones.
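
Since the PATH approach depends on the directory order, here is a small sketch to check what ends up being used. The prepend_path helper is a name made up for this example; only /usr/lib/cargo/bin comes from the package.

```shell
#!/bin/sh
# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, keep PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path /usr/lib/cargo/bin

# Show which implementation each name now resolves to.
for tool in ls cp install; do
    command -v "$tool" || echo "$tool: not found"
done
```

If rust-coreutils is installed, the printed paths should point into /usr/lib/cargo/bin; otherwise the shell falls back to the next directory in PATH.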

Booting Debian with rust-coreutils

To achieve this, because I knew I would likely break the image a few times, I created a new project to quickly install a full Debian with PXE and preseed.

The project is available here:

https://github.com/opencollab/qemu-debian-install-pxe-preseed/

A script to create the full qemu image: build_qemu_debian_image.sh

A second script to boot on the newly created image: boot.sh

Then, building and installing coreutils on the system (yeah, it is ugly - don’t do that at home):

apt install rust-coreutils
cd /usr/lib/cargo/bin/
# Overwrite the GNU binaries in /usr/bin with the Rust ones
for f in *; do
    cp -f "$f" /usr/bin/
done

First surprise: unlike the old init.d init system, systemd does not rely on a series of scripts (it is mostly written in C), so replacing the coreutils did not have an impact. Therefore, I didn't experience any issue during the boot process.

Debian packages rely a lot on post-install scripts (stored in /var/lib/dpkg/info/*) to finalize and configure packages. They are (almost?) all using /bin/sh (or /bin/bash) to perform these actions, and they call coreutils programs heavily.

For example, in /var/lib/dpkg/info/exim4-base.postinst, we can find:

install -d -oDebian-exim -gadm -m2750 /var/log/exim4

With an ugly script, we can test the installations of the 1000 most popular packages one by one.

Running this, some classes of issues in Rust/coreutils could be easily identified.
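
The script I used is not reproduced here, but the principle can be sketched as follows. This is only an illustration: the try_packages function and the top-1000.txt file (one package name per line) are hypothetical names.

```shell
#!/bin/sh
# try_packages LIST INSTALLER...
# Run INSTALLER once per package name read from LIST (one name per line)
# and report the outcome, continuing after failures.
try_packages() {
    list=$1
    shift
    while read -r pkg; do
        [ -n "$pkg" ] || continue          # skip empty lines
        # stdin is redirected so the installer cannot consume the list.
        if "$@" "$pkg" > /dev/null 2>&1 < /dev/null; then
            echo "OK $pkg"
        else
            echo "FAILED $pkg"
        fi
    done < "$list"
}

# Intended use on a throwaway system (top-1000.txt is a placeholder):
#   try_packages top-1000.txt apt-get install -y
```

Installing the packages one at a time is slow, but a failing post-install script can then be attributed to a single package.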

Implementing missing options

A significant number of problems could be easily identified as a lack of support for some options.

Here is a list of most of the fixes I had to implement to make this plan work:

Different behavior

Most of the programs behaved as expected. Here is a list of differences:

  • install doesn't support using /dev/null as source file
    Setting up libreoffice-common (1:6.1.5-3+deb10u6) ...
    install: error: install: cannot install ‘/dev/null’ to ‘/etc/apparmor.d/local/usr.lib.libreoffice.program.oosplash’: the source path is not an existing regular file
    This is a limitation of Rust itself: https://github.com/rust-lang/rust/issues/79390

Compile Firefox, Clang and the Linux Kernel

Build systems can vary significantly from one another.

To verify their usage of coreutils, I built these three major projects.

Firefox

As Firefox relies mostly on Python as a build system, it went smoothly. I didn’t encounter any issue.

The only unrelated issue that I noticed working on it was apt-key was broken because the script relied on a buggy option of mktemp.

Linux Kernel

I identified only two issues compared to GNU Coreutils:

  • The chown command on a non-existing symlink target doesn’t fail on the GNU version, the Rust one was triggering an error.
    https://github.com/uutils/coreutils/pull/1694
  • ln did not recognize the -n option:
    ln -fsn ../../x86/boot/bzImage ./arch/x86_64/boot/bzImage
    ln: error: Unrecognized option: 'n'

LLVM/Clang

The LLVM toolchain relies on CMake. Just like for Firefox, I didn’t face any issue.

Comparing with GNU coreutils using its testsuite

Recently, James Robson added a new test to run the GNU testsuite on the Rust/coreutils.

# TOTAL: 611
# PASS: 144
# SKIP: 86
# XFAIL: 0
# FAIL: 342
# XPASS: 0
# ERROR: 39
This compares to 546 tests passing with the GNU version. Even if a bunch of errors are just different outputs, it demonstrates that there is still a long road ahead.
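
To turn such a summary into a single number, a small helper can be used. This is a sketch assuming the summary lines are fed on standard input in the "# NAME: count" form shown above; pass_rate is a name chosen for the example.

```shell
#!/bin/sh
# pass_rate: read an automake-style summary ("# PASS: 144" lines) on stdin,
# check that the categories add up to TOTAL, and print the pass percentage.
pass_rate() {
    awk '
        /^# [A-Z]+: [0-9]+/ {
            name = $2
            sub(/:$/, "", name)
            count[name] = $3 + 0
            if (name != "TOTAL") sum += $3
        }
        END {
            if (count["TOTAL"] == 0 || sum != count["TOTAL"]) {
                print "inconsistent summary"
                exit 1
            }
            printf "%.1f%%\n", 100 * count["PASS"] / count["TOTAL"]
        }'
}
```

With the numbers above (144 passing tests out of 611), it prints 23.6%.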

Next steps & contribute

First, we will need more motivated contributors to work on this project. Many features remain to be implemented, optimizations to be done (e.g. decreasing the memory usage), etc.
I started to create a list of good first bugs for newcomers:

https://github.com/uutils/coreutils/issues?q=is%3Aissue+is%3Aopen+label%3A%22Good+first+bug%22
I will update this list if there is some interest for this project.

Helping improve the support of the GNU coreutils testsuite would be a huge step while being a great way to learn Rust!

Then, once it is in a better state, we will be able to make it a reliable alternative in Debian/Ubuntu to the GNU/Coreutils.

This might be also interesting for other folks who prefer a BSD license over a GPL.



Rebuild of the Debian archive with clang

February 29th, 2012

Recently, I have been working on a side project for Debian. The goal was to rebuild the Debian archive (the distribution) with clang, a new C/C++ compiler.

clang is now ready to build software for production (either for C, C++ or Objective-C). This compiler is providing many more warnings and interesting errors than the gcc suite while not carrying the same legacy as gcc.
This rebuild has several goals. The first one is to prove (or not) that clang is a viable alternative. Second, building software with different compilers improves the overall quality of code by providing different checks and alerts.

The results are detailed and explained here:
http://clang.debian.net/

Conclusions
When I had the idea to rebuild Debian with a new compiler, I was expecting many issues and bugs caused by clang, but I was surprised to notice that most of the issues are either differences in the supported C standard, differences of interpretation or corner cases.
My personal opinion is that clang is now stable and good enough to rebuild most of the packages in the Debian archive, even if many of them will need minor tweaks to compile properly.
In the next few years, coupled with better static analysis tools, clang might replace gcc/g++ as the C/C++ compiler used by default in Linux and BSD distributions.
The clang developers are progressing very fast: 14.5% of the packages were failing with version 2.9 against 8.8% with version 3.0.
Several major steps in the adoption of clang have been made, like chromium/chrome being built by default with clang, Xcode providing clang by default, FreeBSD working on the gcc -> clang switch, etc.
However, from the Debian point of view, one of the important steps would be to make sure that clang supports all the Debian architectures/kernels (11 official, 6 unofficial).

Update of the linear algebra libraries in Debian

April 6th, 2010

In the numerical computing world, the cornerstone libraries are BLAS and LAPACK. They have been used in most numerical software for decades (like Scilab, R, numpy, OpenOffice with calc, etc).

During that time, many implementations appeared to improve performance by taking advantage of clusters, multicore, SSE{1,2,3,4}, various levels of cache...
Between the reference BLAS (refblas) and an optimized one like ATLAS or MKL (Math Kernel Library by Intel - non-free), a factor of 15 is not rare.

In Debian, we use by default the reference implementation of BLAS (168 reverse dependencies) and LAPACK (178 reverse dependencies). While their performance is usually poor, they are pretty easy to use. What is hard is switching between highly optimized libraries.
For now, the main one in the archive is ATLAS. The ATLAS build process launches many computations to find out what works best on the architecture. Results are usually excellent.

1) Upload of a refactoring of the ATLAS package.
I have been working on this for a while and, after 19 uploads into Debian Experimental, I am happy (and kind of relieved) to upload release 3.8.3 of ATLAS into Debian unstable.

The new key elements in this release are:

  • Package of the release 3.8.3 ... Long overdue
  • Many more packages for recent architectures (sse3, core2sse3, etc)
  • A simplified maintenance
  • Easy to build a custom package: fakeroot debian/rules custom
  • Easy upgrade to version 3.9.X when it is stable
  • 12 bugs closed in Debian (including 4 RCs)
  • 6 bugs closed in Launchpad.
  • MMX optimized package removed

Note that, as before, all prebuilt binaries of ATLAS will always be slower than if you built them on the target architecture (but using Debian binary packages will save a few kilograms of Uranium).

One of the most important features is the capability to switch to any ATLAS implementation.

2) Switch between the different implementations
The problem in Debian (and Ubuntu) was that it was hard to switch between the ref BLAS/LAPACK and the optimized libraries. The user had to play with LD_LIBRARY_PATH to use the various optimized packages and, since there is no convention shared by the various distributions, upstream developers had to develop crappy tricks to handle such things.

This is why I implemented the following proposal: Handle different versions of BLAS and LAPACK.

The main idea is to use the update-alternatives system to allow a quick and easy switch. For example:

# update-alternatives --config libblas.so.3gf 
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

  Selection    Path                                           Priority   Status
------------------------------------------------------------
* 0            /usr/lib/atlas-core2sse3/atlas/libblas.so.3gf   55        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3gf        35        manual mode
  2            /usr/lib/atlas-core2sse3/atlas/libblas.so.3gf   55        manual mode
  3            /usr/lib/libblas/libblas.so.3gf                 10        manual mode

# update-alternatives --config liblapack.so.3gf
There are 3 choices for the alternative liblapack.so.3gf (providing /usr/lib/liblapack.so.3gf).

  Selection    Path                                             Priority   Status
------------------------------------------------------------
* 0            /usr/lib/atlas-core2sse3/atlas/liblapack.so.3gf   55        auto mode
  1            /usr/lib/atlas-base/atlas/liblapack.so.3gf        35        manual mode
  2            /usr/lib/atlas-core2sse3/atlas/liblapack.so.3gf   55        manual mode
  3            /usr/lib/lapack/liblapack.so.3gf                  10        manual mode

Thanks to this, it is just trivial to switch from one to the other...
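
In auto mode, update-alternatives points the link at the alternative with the highest priority. The following sketch reproduces that choice from a list of "path priority" pairs; the best_alternative function is only an illustration written for this post, not part of dpkg.

```shell
#!/bin/sh
# best_alternative: read "path priority" pairs on stdin and print the path
# with the highest priority, which is what auto mode selects.
best_alternative() {
    awk 'NR == 1 || $2 + 0 > best { best = $2 + 0; path = $1 }
         END { print path }'
}

# Non-interactive equivalents of the menu above (to be run as root):
#   update-alternatives --set libblas.so.3gf /usr/lib/atlas-base/atlas/libblas.so.3gf
#   update-alternatives --auto libblas.so.3gf
```

With the three BLAS alternatives listed above (priorities 35, 55 and 10), it picks the atlas-core2sse3 one, matching the entry marked as auto mode.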

Conclusion:
I just pushed the changes into Debian unstable for blas, lapack and atlas.
I have tested these deep modifications a lot and I fixed all the problems that I found. However, in case I missed something, please report a bug...

Some news around Scilab (packaging & other stuff)

December 30th, 2008

Here is a quick list of new things around Scilab (note that it is a slightly modified version of the message I sent on the dev mailing list).

  • Sagemath - it is software which combines the power of various open source software.

    An experimental "Scilab/Sage package" is planned for Sage 3.4 and an experimental package by Jaap Spies is already available

  • Debian/Ubuntu
    Packages are available on my Scilab homepage
    Debian packages are up-to-date (5.0.3-2). I will upload the new Ubuntu packages in 2009 (for now, it is 5.0.3-1 which is working too). I might backport them to Debian Lenny (future stable) & Ubuntu Hardy.

  • Mandriva
    Tomasz Pawel Gajc (a regular Mandriva contributor) created a package available on zarb.org

  • Opensuse
    A Scilab package for Opensuse has been created by Andrea Florio.
    It is available on packman
    and should be included in the next version of opensuse.
    Note that Mandriva & Opensuse packages have been created for Scilab 5.0.3 and I applied most of their patches (or update some part of the code) for Scilab 5.0.4.

  • Redhat/Fedora
    The work is still going on.
    They are also doing great work packaging the misc dependencies but they are a bit stuck on the JOGL packaging (jogl and gluegen should produce two different packages ... which I should also do in Debian/Ubuntu)

  • Arch Linux
    It is also available under Arch Linux, packaged by Simon Lipp (one of our former trainees).

  • Gentoo
    A bit stuck for now but some activity has been seen lately around jrosetta (one of the dependencies introduced by Scilab 5).

  • Slackware
    Scilab has been packaged by the Italian Slackware community. It is available on their website. I don't know if it is going to be included in Slackware by default or not.