class: center, middle, inverse # Reimplementing the Coreutils in a modern language ## (Spoiler: Rust, not C++) ## Sylvestre Ledru - sylvestre@debian.org ## FOSDEM - BXL - Sunday 5th Feb 2023 Video: https://www.youtube.com/watch?v=90Q5N1qT7BQ --- # Who am I ? Debian developer, LLVM/Clang contributor, and other things. Director at Mozilla (but this work is unrelated to Mozilla). And been managing some
developers but also worked next to Rust core developers. --- # Who am I ? (bis) I also uploaded the initial version of Rustc in Debian/Ubuntu. And maintainer of ripgrep, fd and some others Rust based packages in Debian/Ubuntu. --- # Early 2020
--- class: center, middle, inverse template: inverse # How to make the best use this time? --- # Some people made
.footnote[Credit Julien Danjou - April 2020] --- # Other did Some gardening, wood working, etc
.footnote[Credit Serge Guelton] --- # At first, we started with
--- # Then
.footnote[Total: 106 hours 42 minutes] --- # Continued with
.footnote[Total: 102 hours 30 minutes] --- # Me? I want to learn Rust for real with an impactful project. --- # Clang & Debian Back in 2012, I played with clang to rebuild the Debian archive.
https://clang.debian.net/
--- # Rust ? How can I replicate something similar with Rust? --- # Rust ? How can I replicate something similar with Rust? ## glibc ? Too hard --- # Rust ? How can I replicate something similar with Rust? ## glibc ? Too hard ## Linux kernel, clang, llvm ? Way too hard --- class: center, middle, inverse template: inverse # What about the Coreutils ? --- # What about the Coreutils ? I have always been curious to see how it works. Even if I am terrible with ASM... -- ## Good news There is almost no ASM in the GNU/coreutils (besides a few lines in src/longlong.h). --- # What the point ? Well, it is a good question --- # What the point ? Well, it is a good question -- 1. Why not? -- 2. Rust rocks. -- 3. It is NOT about security
https://security-tracker.debian.org/tracker/source-package/coreutils
Only 17 CVE in GNU/Coreutils since 2003 -- 4. It is NOT about license (for me at least).
Not interested by the MIT vs GPL debate. -- 5. It is very interesting. --- # You keep on talking about the coreutils
but what is it? Let's start with a quizz? --- # You keep on talking about the coreutils
but what is it? -- Who was born after 2000 ? -- Who was born after 1990 ? -- Who was born after 1980 ? -- Who was born after 1971 ? --
Congratulations, you are younger than the initial implementation
--- # First version UNIX: PDP-7
Author: Ken Thompson
Date: Tue Jun 30 05:00:00 1970 -0500 We can find: * chmod.s * chown.s * cp.s * ...
See sources on Software Heritage
--- # Side note
Adam Gordon Bell interviewed Brian Kernighan in his podcast:
https://corecursive.com/brian-kernighan-unix-bell-labs1/
he [Ken] built himself basically a working operating system in three weeks.
In the expression,
standing on the shoulders of giants
. They are our giants. --- # 'New' version in C Research Unix - first versions
More on https://en.wikipedia.org/wiki/Research_Unix Author: Ken Thompson
Date: Tue Nov 21 14:35:16 1972 -0500
Co-Authored-By: Dennis Ritchie
Mix of ASM (ex: ls.s, ln.s, etc) & C (cp.c, if.c, etc)
See sources on Software Heritage
--- # 'New' version in C Example of code - CLI implementation ```c // usr/source/s1/chmod.c for(m=0; *c; c++) { if(*c < '0' || *c > '7') { printf("bad mode\n"); exit(1); } m = (m<<3) | *c - '0'; } for(i=2; i
See source on Software Heritage --- # 'New' version in C Example of code - chmod function implementation ```c // usr/sys/ken/sys4.c chmod() { register *ip; if ((ip = owner()) == NULL) return; ip->i_mode =& ~07777; if (u.u_uid) u.u_arg[1] =& ~ISVTX; ip->i_mode =| u.u_arg[1]&07777; ip->i_flag =| IUPD; iput(ip); } ```
See source on Software Heritage
--- # A simple programming language We take it for granted and rarely think about it but
pipes `|` and redirection `>` are amazing Quick example: How would you implement an algorithm to find the 5 most popular words in all Shakespeare books? (with 6 chars or more) -- ``` $ tr -cs A-Za-z '\n' < shakespeare.txt | tr A-Z a-z | grep -E '^.{6,}$' | sort | uniq -c | sort -rn | head -n 5 ``` -- By the way, the results: ``` 1706 should 1125 father 1082 exeunt 952 before 876 master ``` --- class: center, middle, inverse template: inverse # Fast forward to 2023 --- # Fast forward to 2023 About 105 commands in the GNU implementation. ```bash arch cut fold mkfifo printenv split tsort base32 date groups mknod printf stat tty base64 dd hashsum mktemp ptx stdbuf uname basename df head more pwd stty unexpand basenc dir hostid mv readlink sum uniq cat dircolors hostname nice realpath sync unlink chcon dirname id nl relpath tac uptime chgrp du install nohup rm tail users chmod echo join nproc rmdir tee vdir chown env kill numfmt runcon test wc chroot expand link od seq timeout who cksum expr ln paste shred touch whoami comm factor logname pathchk shuf tr yes cp false ls pinky sleep true csplit fmt mkdir pr sort truncate ``` Note: grep, find, less, top or watch are NOT part of the coreutils. --- # And plenty of arguments Example with `cp` and other commands ```bash $ cp --help | grep " -" | wc -l 34 $ chown --help | grep " -" | wc -l 15 $ ln --help | grep " -" | wc -l 16 $ ls --help | grep " -" | wc -l 61 ``` --- # Quick quizz Who knows about ... -- * ls/cp/mv/rm/mkdir -- * chown/chmod/chgrp -- * numfmt ```bash echo 1.18GB|coreutils numfmt --from=si --suffix=B 1180000000B ``` --- # Quick quizz Who knows about ... * ls/cp/mv/rm/mkdir * chown/chmod/chgrp * numfmt * pr -- ``` convert text files for printing ``` --- # Quick quizz Who knows about ... * ls/cp/mv/rm/mkdir * chown/chmod/chgrp * numfmt * pr -- * csplit -- ``` split a file into sections determined by context lines ``` --- # Quick quizz Who knows about ... * ls/cp/mv/rm/mkdir * chown/chmod/chgrp * numfmt * pr * csplit * and others like factor, pinky, tsort, shuf, shred, seq, etc --- # A bunch of different implementations * GNU's coreutils * BSD * Busybox * Toybox (Android) * V lang implementation https://github.com/vlang/coreutils * ... ??? (if you know, please sylvestre@debian.org) --- # Now, this Rust implementation? * Started by Jordi Boggiano in 2013 (yeah, before Rust 1.0) * Found it early 2020, saw that some maintainers were still around helping * Started to contribute in April 2020 for real and quickly took over --- # Some stats now
The second contributor doing amazing work is in this room too. --- # Status * Packaged in most of the linux distros -- * Used by Apertis -- * Used by a famous social network via the Yocto project --- # But why it makes sense to use Rust? * Secure by default for memory mgmt -- * Very portable -- * Lot of crates (Rust libraries) available - Don't have to reimplement the wheel: - lscolors - provides the same colors as ls - walkdir - recursive operations on the fs (cp, rm, etc) - tempfile - generate some temporary files/dirs (mktemp, etc) - teminal_size - to know how to display information - ... -- * Terrific performances (more later) -- * Very popular language (easy to find new contributors) --- # Goal of this implementation? * Replicate the success of clang - dropped in replacement for GNU's -- * Cross platform - support Linux, Windows, MacOS, Android, FreeBSD, Fuchsia, etc -- * Easy to test:
`cargo test --features unix`
and fast:
`finished in 44.29s`
-- * I don't care (ie please don't troll me) but the license is MIT -- * It isn't about fighting the GNU project or the FSF --- # My Goals * My intial goals were: - Be able to boot Debian -- - Install the top 1000 packages -- - Build Firefox, LLVM and the Linux Kernel -- - Package it for Debian & Ubuntu -- * Published some
blog
posts
about it TLDR: it works. This presentation is done a Debian testing with Firefox running the Rust/coreutils --- # My Goals (bis) To achieve, we had to deploy: -- * Proper CI (and therefore build system) for all supported platforms -- * Add code coverage support -- * Improve the code coverage results (ie write plenty of tests)
Currently
80.62%
-- * Plug clippy and rustfmt -- * Document the various processes -- Took about a year to be in that state --- class: center, middle, inverse template: inverse # Current status --- # Current status Running the GNU testsuite with our binaries:
--- # Now, How do we work? Our goals are pretty simple: dropped in replacement for the gold standard (GNU's). Great news: the GNU test suite is amazing and we made it super easy to test with: ```bash bash util/run-gnu-test.sh tests/touch/not-owner.sh ``` Will run a GNU's test for touch using the Rust implementation --- # Remaining work We have a simple script to list the failing tests: ```bash python3 util/remaining-gnu-error.py ```
https://uutils.github.io/user/test_coverage.html
--- # Dropped in replacement? Also means output. Example: GNU mkdir: ```bash $ mkdir -p -v a/../b/../c mkdir: created directory 'a' mkdir: created directory 'a/../b' mkdir: created directory 'a/../b/../c' ``` -- and we had ```bash $ ./target/debug/mkdir -p -v a/../b/../c ./target/debug/mkdir: created directory 'a/../b/../c' ``` .footnote[https://github.com/uutils/coreutils/pull/3217] --- # Example of a corner case ```bash $ install /dev/null $(mktemp) ``` Used in
Apparmor debian helper
--- # Corner cases for Rust stdlib ```rust use std::fs; fn main() { match fs::copy("/dev/null", "bar.txt") { Ok(_n) => (), Err(e) => { println!("{}", e); () } } } ``` Fails with: ``` the source path is not an existing regular file ``` .footnote[https://github.com/rust-lang/rust/issues/79390] --- # Workaround ```rust if from.as_os_str() == "/dev/null" { /* workaround a limitation of fs::copy * https://github.com/rust-lang/rust/issues/79390 */ if let Err(err) = File::create(to) { return Err( InstallError::InstallFailed(from.to_path_buf(), to.to_path_buf(), err).into(), ); } } else if let Err(err) = fs::copy(from, to) { return Err(InstallError::InstallFailed(from.to_path_buf(), to.to_path_buf(), err).into()); } ``` .footnote[https://github.com/uutils/coreutils/blob/a75cfd96555f44198b5410f65ec501f89cc6ef30/src/uu/install/src/install.rs#L672-L683] --- # Let's try together Demo time! --- class: center, middle, inverse template: inverse # Performances ? --- # Performance Disclaimer: benchmarking performances is HARD Some binaries are significantly faster. -- Example #1: ``` $ hyperfine -r 10 \ "target/release/coreutils sort -R shakespeare.txt > shakespeare.txt.rand; \ target/release/coreutils sort shakespeare.txt.rand > /dev/null" \ "sort -R shakespeare.txt > shakespeare.txt.rand; \ sort shakespeare.txt > /dev/null" é [...] Summary 'target/release/coreutils sort -R shakespeare.txt > shakespeare.txt.rand; target/release/coreutils sort shakespeare.txt.rand > /dev/null' ran 4.63 ± 0.19 times faster than 'sort -R shakespeare.txt > shakespeare.txt.rand; sort shakespeare.txt > /dev/null' ``` 4.63 times faster than GNU's --- # Performance Example #2: ``` $ bash "Clone Firefox" $ hyperfine -i --warmup 2 "target/release/coreutils ls -al -R ~/dev/mozilla/mozilla-unified.hg > /dev/null" "ls -al -R ~/dev/mozilla/mozilla-unified.hg > /dev/null" [...] Summary 'target/release/coreutils ls -al -R ~/dev/mozilla/mozilla-unified.hg > /dev/null' ran 1.43 ± 0.15 times faster than 'ls -al -R ~/dev/mozilla/mozilla-unified.hg > /dev/null' ``` 1.43 times faster than GNU's -- Similar result for cp -R --- # Performance To be fair, we can be slower too Example #3: ``` $ hyperfine "seq 18446744073708551615 18446744073709551615 | factor" \ "./target/release/coreutils seq 18446744073708551615 18446744073709551615| ./target/release/coreutils factor" [...] 'seq 18446744073708551615 18446744073709551615 | factor' ran 5.08 ± 0.01 times faster than './target/release/coreutils seq 18446744073708551615 18446744073709551615| ./target/release/coreutils factor' ``` --- # Benchmarking Demo time! With some product placement ;) Using:
https://github.com/mstange/samply
https://profiler.firefox.com/ --- # Extending the coreutils We have a beautiful doc: https://uutils.github.io/user/ with example provided by https://tldr.sh/
And a good developer doc: https://uutils.github.io/dev Example:
Example with backup control
--- # Extending the coreutils We took the liberty to add extra features like: * progress bars for cp & mv -- * b3sum (works like b2sum) -- * cut `-w` from FreeBSD -
separate fields by whitespace (Space and Tab)
--- # What is next? * Implement missing options in the various binaries -- * Complete the full compatibility with the GNU tools -- * Improve performances on some key programs --- # How can I help? * We have good first bugs
As I probably said, it is a great way to learn Rust See https://uutils.github.io/user/test_coverage.html -- * We will apply to the Google summer of code 2023 -- * Would appreciate corporate/foundation sponsors for the CI --- class: center, middle, inverse template: inverse # Let's do some predictions --- # What about the Linux Kernel? Merged in the main branch: ``` $ ohcount git@github.com:torvalds/linux.git Language Files Code Comment Comment % Blank Total ---------------- ----- --------- --------- --------- --------- --------- c 55128 23468316 3757486 13.8% 3915921 31141723 [...] rust 37 5451 5439 49.9% 936 11826 ``` For now, in the main branch, build + glue code. Still waiting for an user-visible feature written in Rust. --- # In 2024 We will start to see cloud provider proposing images with Rust core components. We will see more and more piece of the core infrastructure of Linux improved with Rust. --- class: center, middle, inverse template: inverse # Thanks for your time --- class: center, middle, inverse template: inverse # Reminder: not a Mozilla project. --- class: center, middle, inverse template: inverse # Any questions?