I'm not confident that I understand what you're asking for, but couldn't you just sed off the timestamp from every line? Or for a more extreme example, I have occasionally used ... tr, I think? to completely remove all numbers from logs in order to aggregate error messages without worrying about the fact that they kept including irrelevant changing numbers (something like tail -5000 logfile | tr -d [0-9] | sort | uniq -c | sort -n or so).
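Spelled out, that pipeline would look something like the sketch below (the log path and line count are placeholders):

```
# strip digits so lines that differ only in counters/timestamps collapse together,
# then count the distinct messages and sort by frequency
tail -n 5000 /var/log/myapp.log | tr -d '0-9' | sort | uniq -c | sort -n
```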
how would you do it if your logs were printed on paper with a printer, each line printed with stochastic timing (due to a bug), with an ink containing a chemical tracer with half-life `h` (after being put to paper), but the ink is randomly sampled from several (`m`) inks of different half-lives `h1`, `h2`, ..., `hm`?
assume `p` different printers scattered across the 10 most populous US cities.
you may use standard unix utilities.
Have interns shovel it all into scanners, run the result through tesseract, then do the thing I said before. Nonetheless, I don't think your question is sincere; what point are you actually trying to get at?
Sorry, I was just trying to make a joke (about both insane systems and interview questions) since the question you answered was a bit unclear. Guess it didn't land, haha.
Ah, that makes much more sense. It read as somewhat aggressive in a way that I couldn't quite make sense of; my best guess was that you were insinuating that the unix tools I was reaching for were arcane and unwieldy. Thanks for clarifying.
A different but more powerful method of ensuring reproducibility is more rigorous compilation using formally verifiable proofs.
That’s what https://pi2.network/ does. It uses K-Framework, which is imo very underrated/deserves more attention as a long term way of solving this kind of problem.
I think general consensus is against you. Fedora packaging policy [1]:
> Packages including libraries should exclude static libs as far as possible (eg by configuring with --disable-static). Static libraries should only be included in exceptional circumstances. Applications linking against libraries should as far as possible link against shared libraries not static versions.

[1]: https://docs.fedoraproject.org/en-US/packaging-guidelines/
I'd far rather a static binary than a bundled vm for a single app which produces all the same bad points of a static binary plus 900 new bad points on top.
Packaging guidelines from a distro's docs like this are not any kind of counter-argument to that comment.
This is the current orthodoxy, so obviously all docs say it. We all know the standard argument for the current standard. Their comment was explicitly "I'd like to see a change from the current orthodoxy". They are saying that maybe that argument is not all it promised to be back in the 90's when we started using dynamic libs.
Given that the comment is talking about Python, he probably already has those 1,000,000 libraries.
The common thing to do for python programs that are not directly bundled with the os is to set up a separate virtual environment for each one and download/compile the exact version of each dependency from scratch.
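A minimal sketch of that per-application setup (the paths and app name here are made up):

```
# one isolated environment per application, with its dependencies pinned and built locally
python3 -m venv /opt/someapp/venv
/opt/someapp/venv/bin/pip install -r /opt/someapp/requirements.txt
/opt/someapp/venv/bin/python -m someapp
```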
Their point is that if 1000 programs use the same 1000 libraries, static linking duplicates all those libraries across each binary, taking that much more storage and memory (which can hurt performance as well), effectively putting 1,000,000 library copies in use.
Dynamic linking gives you M binaries + N libraries. Static linking gives you M binaries, each carrying its own copies of the N libraries: M * N.
But there are not 1000 programs being proposed. No one said every binary in a system. Only some binaries are a problem. That is silly hyperbole that isn't useful or any kind of valid argument.
What I said specifically is I'd rather a static binary than a flatpak/snap/appimage/docker/etc. That is a comparison between 2 specific things, and neither of them is "1000 programs using 1000 libraries"
And some binaries already ship with their own copies of all the libraries anyway, just in other forms than static linking. If there are 1000 flatpaks/snaps/docker images etc, then those million libraries are already out there in an even worse form than if they were all static binaries. But there are not, generally, on any given single system, yet, though the number is growing, not shrinking.
For all the well known and obvious benefits of dynamic linking, there are reasons why sometimes it's not a good fit for the task.
And in those cases where, for whatever reason, you want the executable to be self-contained, there are any number of ways to arrange it, from a simple tar with the libs & bin in non-conflicting locations and a launcher script that sets a custom lib path (or bin is compiled with the lib path), to appimage/snap/etc, to a full docker/other container, to unikernel, to simple static bin.
All of those give different benefits and incur different costs. Static linking has the benefit of being dead simple. It's both space- and complexity-efficient compared to any container or bundle system.
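As a concrete sketch of the simplest of those options (the tarball-plus-launcher-script one), the launcher can be as small as this; the directory layout and binary name are made up:

```
#!/bin/sh
# launcher: find where the tarball was unpacked, point the loader at the bundled libs,
# then exec the real binary sitting next to them
here="$(cd "$(dirname "$0")" && pwd)"
export LD_LIBRARY_PATH="$here/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
exec "$here/bin/myapp" "$@"
```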
A literal vm is hyperbole and sloppy language on my part. I meant all the different forms of containerising or bundling apps with all of their dependencies and os environment.
If we believe we have a reproducible build, that constitutes a big test case which gives us confidence in the determinism of the whole software stack.
To validate that test case, we actually have to repeat the build a number of times.
If we spot a difference, something is wrong.
For instance, suppose that a compiler being used has a bug whereby it is relying on the value of an uninitialized variable somewhere. That could show up as a difference in the code it generates.
Without reproducible builds, of course there are always differences in the results of a build: we cannot use repeated builds to discover that something is wrong.
(People do diffs between irreproducible builds anyway. For instance, disassemble the old and new binaries, and do a textual diff, validating that only some expected changes are present, like string literals that have embedded build dates. If you have reproducible builds, you don't have to do that kind of thing to detect a change.)
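That kind of check is typically just a disassembly piped through diff, roughly like the sketch below (binary paths are placeholders):

```
# disassemble both builds (omitting raw opcode bytes to cut noise) and diff the text
objdump -d --no-show-raw-insn old/prog > old.asm
objdump -d --no-show-raw-insn new/prog > new.asm
diff -u old.asm new.asm | less   # then eyeball that only the expected changes show up
```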
Reproducible builds will strengthen the toolchains and surrounding utilities. They will flush out instabilities in build systems, like parallel Makefiles with race conditions, or indeterminate orders of object files going into a link job, etc.
That's already been a thing in all the Red Hat variants. RPM/DNF have checksums of the installed binaries and there is GPG signing of packages and repositories. The only part of that ecosystem I've always had a gripe with is putting the GPG public keys in the mirrors. People should have to grab those from non-mirrors, or any low-skilled attacker can just replace the keys and sign everything again. It would be caught, but not right away.
Changes can also be caught using bolt-on tools like Tripwire, OSSEC and its alternatives, or even home-grown tools that build signed manifests of approved packages, usually for production approval.
Yes! The attack on SolarWinds Orion was an attack on its build process. A verified reproducible build would have detected the subversion, because the builds would not have matched (unless the attackers managed to detect and break into all the build processes).
You know what does not give me confidence? Updating software, but what's that, it's still printing the same build date? Of course, hours later and tens of files deep, I found out some reproducibility goof just hardcoded it.
So far, reproducible builds are heavy on the former, zero on these bugs you mention and zero on supply chain attacks.
Some of these supposed use cases make no sense. You update the compiler. Oh no, all the code is different? Enjoy the 16h deep dive to realize someone tweaked code generation based on the cycle times given on page 7893 of the Intel x64 architecture reference manual.
They should be setting the build dates for a package from, say, the commit date of the top commit of the branch that's being built. It can't be something that doesn't change when the next version is spun. If you see a behavior like that in anybody's reproducible package system or distro, you have a valid complaint.
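The usual mechanism for this is the SOURCE_DATE_EPOCH convention from reproducible-builds.org; deriving it from the top commit looks roughly like:

```
# pin the embedded "build date" to the top commit's commit time,
# so it only changes when a new version is actually spun
export SOURCE_DATE_EPOCH="$(git log -1 --format=%ct)"
```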
My impression is that reproducible builds improve your security by helping make it more obvious that packages haven't been tampered with in late stages of the build system.
* Edit, it's quoted in the linked article:
> Jędrzejewski-Szmek said that one of the benefits of reproducible builds was to help detect and mitigate any kind of supply-chain attack on Fedora's builders and allow others to perform independent verification that the package sources match the binaries that are delivered by Fedora.
The supply chain attacks you have to most worry about are not someone breaking into Fedora build machines.
It's the attacks on the upstream packages themselves.
Reproducible builds would absolutely not catch a situation like the XZ package being compromised a year ago, due to the project merging a contribution from a malicious actor.
A downstream package system or OS distro will just take that malicious update and spin it into a beautifully reproducing build.
Reproducible builds COULD fix the xz issues. The current level would not, but GitHub could do things to make creating the downloadable packages scriptable and thus reproducible. Fedora could check out the git hash instead of downloading the provided tarball and again get reproducible builds that bypass this.
The above are things worth looking at doing.
However, I'm not sure what you can do about code that tries to obscure the issues while looking good.
And anything designed to catch upstream problems like the XZ compromise will not detect a compromise in the Fedora package build environment. Kinda need both.
When builds are reproducible, one thing a distro can do is have multiple build farms with completely different operators, so there's no shared access and no shared secrets. Then the results of builds of each package on each farm can be compared, and if they differ, you can suspect tampering.
So it could help you detect tampering earlier, and maybe even prevent it from propagating depending on what else is done.
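The comparison itself can be as simple as hashing the artifacts produced by each farm (the paths here are hypothetical):

```
# a mismatch between independently operated builders is a signal to investigate
sha256sum farm-a/results/foo-1.2-3.x86_64.rpm farm-b/results/foo-1.2-3.x86_64.rpm
```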
Better security! A malicious actor only needs to change a few bytes in either the source or binary of OpenSSL to break it entirely (i.e. disable certificate checking).
Reproducible builds remove a single point of failure for authenticating binaries – now anyone can do it, not just the person with the private keys.
True, but every step we add makes the others harder too. It is unlikely Ken Thompson's "trusting trust" compiler would detect modern gcc, much less successfully introduce the backdoor. Even if you start with a compromised gcc of that type there is a good chance that after a few years it would be caught when the latest gcc fails to build anymore for someone with the compromised compiler. (now add clang and people using that...)
We may never reach perfection, but the more steps we make in that direction the more likely it is we reach a point where we are impossible to compromise in the real world.
Can someone provide a brief clarification about build reproducibility in general?
The stated aim is that when you compile the same source, environment, and instructions the end result is bit identical.
There are, however, hardware-specific optimizations that will naturally negate this stated aim, and I don't see how there's any way to avoid throwing out the baby with the bathwater.
I understand why having a reproducible build is needed on a lot of fronts, but the stated requirements don't seem to be in line with the realities.
At its most basic, there is the hardware, which may advertise features it doesn't have, may not perform the same instructions in the same way, and has other nuances that break determinism as a property; that naturally taints the entire stack, since computers rely heavily on emergent design.
This is often hidden in layers of abstraction and/or may be separated into pieces that are architecture dependent vs independent (freestanding), but it remains there.
Most if not all of the beneficial properties of reproducible builds rely on the environment being limited to a deterministic scope, and the reality is manufacturers ensure these things remain in a stochastic scope.
> hardware specific optimizations that will naturally negate this stated aim
Distro packages are compiled on their build server and distributed to users with all kinds of systems; therefore, by nature, it should not use optimizations specific to the builder's hardware.
On source-based distros like Gentoo, yes, users adding optimization flags would get a different output. But there is still value in having the same hardware/compilation flags result in the same output.
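A quick way to see the hardware dependence in isolation (a sketch; hello.c stands in for any trivial source file):

```
# distro-style generic target: same output regardless of the build host's CPU
gcc -O2 -march=x86-64 -o hello-generic hello.c
# host-specific tuning: output can differ from machine to machine
gcc -O2 -march=native -o hello-native hello.c
sha256sum hello-generic hello-native
```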
Well the point is that if N of M machines produce the same output, it provides the opportunity to question why it is different on the others. If the build is not reproducible then one just throws up their arms.
It’s not clear if you’re also talking about compiler optimizations—a reproducible build must have a fixed target for that.
I was thinking more along the lines of LLVM-type performance optimizations when I was speaking about optimizations, if that sufficiently clarifies.
> Committing profiles directly in the source repository is recommended as profiles are an input to the build important for reproducible (and performant!) builds. Storing alongside the source simplifies the build experience as there are no additional steps to get the profile beyond fetching the source.
I very much hope other languages/frameworks can do the same.
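Assuming that quote is from Go's PGO documentation, the committed-profile workflow looks roughly like this (the source profile path is a placeholder):

```
# commit a production pprof profile as default.pgo in the main package directory;
# go build picks it up automatically (-pgo=auto is the default in recent Go releases)
cp prod-cpu.pprof ./default.pgo
go build -pgo=auto ./...
git add default.pgo
```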
The "performant" claim there runs counter to research I have heard. Plus, PGO profile data is non-deterministic in most cases, even when collected on the same hardware as the end machine, which is why profiles committed directly in the source repository end up deleted or at least excluded from the reproducibility comparison.
A quote from the paper I remember on the subject [1], as these profiles are just about as machine-dependent as you can get:
> Unfortunately, most code improvements are not machine independent, and the few that truly are machine independent interact with those that are machine dependent causing phase-ordering problems. Hence, effectively there are no machine-independent code improvements.
There were some differences between various Xeon chips' implementations of the same or neighboring generations that I personally ran into when we tried to copy profiles to avoid the cost of the profile runs, which may make me a bit more sensitive to this, but I personally saw huge drops in performance, well into the double digits, that threw off our regression testing.
IMHO this is exactly why your link suggested the following:
> Your production environment is the best source of representative profiles for your application, as described in Collecting profiles.
That is very different from Fedora using some random or generic profile for x86_64, which may or may not match the end user's specific profile.
If those differences matter so much for your workloads, treat your different machine types as different architectures, commit profiling data for all of them and (deterministically) compile individual builds for all of them.
Fedora upstream was never going to do that for you anyway (way too many possible hardware configurations), so you were already going to be in the business of setting that up for yourself.
It's not at odds at all but it'll be "Monadic" in the sense that the output of system A will be part of the input to system A+1 which is complicated to organize in a systems setting, especially if you don't have access to a language that can verify. But it's absolutely achievable if you do have such a tool, e.g. you can do this in nix.
This is one of the "costs" of reproducible builds, just like the requirement to use pre-configured seeds for pseudo random number generators etc.
It does hit real projects and may be part of the reason that "99%" is called out, but Fedora also mentions that they can't match the official reproducible-builds.org meaning in the above just due to how RPMs work, so we will see what other constraints they have to loosen.
Here is one example of where SUSE had to re-enable it for gzip.
There are other costs, like needing to get rid of parallel builds for some projects, that make many people loosen the official constraints; the value of PGO+LTO is one of them.
gcda profiles are unreproducible, but the code they produce is typically the same. If you look into the pipeline of some projects, they just delete the gcda output and then often retry the build if the code is different, among other methods.
While there are no ideal solutions, one that seems to work fairly well, assuming the upstream is doing reproducible builds, is to vendor the code, build a reproducible build to validate that vendored code, then enable optimizations.
But I get that not everyone agrees that the value of reproducibility is primarily avoiding attacks on build infrastructure.
However, reproducible builds have nothing to do with MSO model checking etc... like some have claimed. Much of it is just deleting non-deterministic data, as you can see here with Debian, which Fedora copied.
As increasing the granularity of address-space randomization at compile and link time is easier than at the start of program execution, obviously there will be a cost (that is more than paid for by reducing supply chain risks IMHO) of reduced entropy for address randomization, which does increase the risk of ROP-style attacks.
Regaining that entropy at compile and link time, if it is practical to recompile packages or vendor, may be worth the effort in some situations, probably best to do real PGO at that time too IMHO.
> For example, Haskell packages are not currently reproducible when compiled by more than one thread
Doesn't seem like a big issue to me. The gcc compiler doesn't even support multithreaded compiling. In the C world, parallelism comes from compiling multiple translation units in parallel, not from compiling any one unit with multiple threads.
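In other words, the parallelism lives in the build system rather than in the compiler, e.g.:

```
# many independent, single-threaded compiler invocations run side by side
make -j"$(nproc)"
```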
"trust in the contents of both source and binary packages is low." - I wonder will this convince organisations to adopt proper artifact management processes? If supply chain attacks are on the rise, than surely its more imperative than ever for businesses to adopt secure artifact scanning with tools like Cloudsmith or jFrog?
There's a long tail of obscure packages that are rarely used, and almost certainly a power law in terms of which packages are common. Reproducibility often requires coordination between both the packagers and the developers, and achieving that for each and every package is optimistic.
If they just started quarantining the long tail of obscure packages, then people would get upset. And failing to be 100% reproducible will make a subset of users upset. Lose-lose proposition there, given that intelligent users could just consciously avoid packages that aren't passing reproducibility tests.
100% reproducibility is a good goal, but as long as the ubiquitous packages are reproducible then that is probably going to cover most. Would be interesting to provide an easy way to disallow non-reproducible packages.
I'm sure one day they will be able to make it a requirement for inclusion into the official repos.
There's an interesting thought - in addition to aiming for 99% of all packages, perhaps it would be a good idea to target 100% of the packages that, say, land in the official install media? (I wouldn't even be surprised if they already meet that goal TBH, but making it explicit and documenting it has value)
Linux folks continue running away with package security paradigms while NPM, PyPI, Cargo, et al. (like that VSCode extension registry that was on the front page last week) think they can still get away with just shipping what some rando pushes.
I have observed a sharp disconnect in the philosophies of 'improving developer experience' and 'running a tight ship'.
I think the last twenty years of quasi-marketing/sales/recruiting DevRel roles have pushed a narrative of frictionless development, while on the flip side security and correctness have mostly taken a back seat (special industries aside).
I think it's a result of the massive market growth, but I so welcome the pendulum swinging back a little bit. Typo squatting packages being a concern at the same time as speculative execution exploits shows mind bending immaturity.
I think that's a consequence of programmers making tools for programmers. It's something I've come to really dislike. Programmers are used to doing things like editing configurations, setting environment variables, or using custom code to solve a problem. As a result you get programs that can be configured, customized through code (scripting or extensions), and little tools to do whatever. This is not, IMHO, how good software should be designed to be used. On the positive side, we have some really good tooling - revision control in the software world is way beyond the equivalent in any other field. But then git could be used in other fields if not for it being a programmer's tool designed by programmers... A lot of developers even have trouble doing things with git that are outside their daily use cases.
Dependency management tools are tools that come about because it's easier and more natural for a programmer to write some code than solve a bigger problem. Easier to write a tool than write your own version of something or clean up a complex set of dependencies.
It's not quite a straight trade; IIRC the OpenBSD folks really push on good docs and maybe good defaults precisely because making it easier to hold the tool right makes it safer.
This is obvious, the question here is why everybody traded security for convenience and what else has to happen for people to start taking security seriously.
Regarding "what else has to happen": I would say something catastrophic. Nothing comes to mind recently.
Security is good, but occasionally I wonder if technical people don't imagine fantastical scenarios of evil masterminds doing something with the data and managing to rule the world.
While in reality, at least over the last 5 years, there are so many leaders (and people) doing and saying things so plainly stupid that I feel we should be more afraid of stupid people than of hackers.
In the last 5 years several major medical providers have had sensitive personal data of nearly everyone compromised. The political leaders are the biggest problem today, but that could change again.
And what is the actual impact? Don't get me wrong, I don't think it is not bad, but then again abusing information could be done already by the said providers (ex: hike insurance rates based on previous conditions, taking advantage of vulnerable people).
Society works by agreements and laws, not by (absolute) secrecy.
There are of course instances like electrical grid stopping for days, people being killed remotely in hospitals, nuclear plants exploding, that would have a different impact and we might get there, just that it did not happen yet.
The actual impact is that your private medical data is in the hands of thieves, which most people don’t want.
It’s similar to how most people are distressed after a break-in, because they considered their home to be a private space, even though the lock manufacturer never claimed 100% security (or the thieves simply bypassed the locks by smashing a window).
Agreements and laws don’t solve that problem, because thieves already aren’t stopped by those.
>the question here is why everybody traded security for convenience
I don't think security was traded away for convenience. Everything started with convenience, and security has been trying to gain ground ever since.
>happen for people to start taking security seriously
Laws with enforced and non-trivial consequences are the only thing that will force people to take security seriously. And even then, most probably still won't.
I think the opposite is mostly true. Linux packaging folks are carefully sculpting their toys, while everyone else is mostly using upstream packages and docker containers to work around the beautiful systems. For half the software I care about on my Debian system, I have a version installed either directly from the web (curl | bash style), from the developer's own APT repo, or most likely from a separate package manager (be it MELPA, pypi, Go cache, Maven, etc).
I use the nix package manager on 3 of the systems I work on daily (one of them an HPC cluster) and none of them run NixOS. It's possible to carefully sculpt one's tools and use the latest and greatest.
Rightly so. The idea that all software should be packaged for all distros, and that you shouldn't want to use the latest version of software is clearly ludicrous. It only seems vaguely reasonable because it's what's always been done.
If Linux had evolved a more sensible system and someone came along and suggested "no actually I think each distro should have its own package format and they should all be responsible for packaging all software in the world, and they should use old versions too for stability" they would rightly be laughed out of the room.
You can have both python 2 and python 3 installed. Apps should get the dependencies they request. Distros swapping out dependencies out from under them has caused numerous issues for developers.
We can’t have any “your python 2 code doesn’t work on python 3” nonsense
This only happens because distros insist on shipping Python and then everyone insisted on using that Python to run their software.
In an alternate world everybody would just ship their own Python with their own app and not have that problem. That's basically how Windows solves this.
nixpkgs packages pretty much everything I need. It’s a very large package set and very fresh. It’s mostly about culture and tooling. I tried to contribute to Debian once and gave up after months. I was contributing to nixpkgs days after I started using Nix.
Having every package as part of a distribution is immensely useful. You can declaratively define your whole system with all software. I can roll out a desktop, development VM or server within 5 minutes and it’s fully configured.
Yeah, because they allow anyone to contribute with little oversight. As Lance Vick wrote[1], "Nixpkgs is the NPM of Linux." And Solène Rapenne wrote[2], "It is quite easy to get nixpkgs commit access, a supply chain attack would be easy to achieve in my opinion: there are so many commits done that it is impossible for a trustable group to review everything, and there are too many contributors to be sure they are all trustable."
Pretty much every PR does get reviewed before merging (especially of non-committers) and compromises would be easy to detect in the typical version bump PR. At least it's all out in the open in a big monorepo. E.g. in Debian, maintainers could push binaries directly to the archive a few years ago (I think this is still true for non-main) and IIRC even for source packages people upload them with little oversight and they are not all in open version control.
Of course, Debian developers/maintainers are vetted more. But an intentional compromise in nixpkgs would be much more visible than in Debian, NPM, PyPI or crates.io.
The problem with NixOS (Nix?) is that it is currently embroiled in a culture war. Just to give an example, someone made an April Fools' joke about Elon Musk sponsoring NixOS and they got their post muted and (almost?) caught a suspension.
There are currently a gazillion forks, some being forks of forks because they weren't considered culturally pure enough for the culturally purged fork.
Hopefully Determinate Systems or Ekela can get some real maturity and corporate funding into the system and pull the whole thing out of the quagmire.
Yes Nix is forked. There were some attempts to fork nixpkgs, but they failed because it's far too much work to maintain something like nixpkgs. Nix being forked is less of an issue, because at the very least everyone needs to be compatible with nixpkgs. I think it's ok to have multiple implementations, they lead to exploration of new directions (e.g. one of the projects attempts to reimplement Nix in Rust). Well and of course, Guix started as a fork.
I agree that the infighting is not nice. But to be honest, when you just use NixOS and submit PRs, you do not really notice them. It's not like people are fighting them in the actual PRs to nixpkgs.
Some of the more drawn-out arguments are in or about the RFCs repo, though, so it's not just chat platforms or forum where the fights break out. To your point the RFC repo is also something that not every would-be contributor would touch or need to.
I agree, but it makes me feel supremely uneasy to use something where the main team does Soviet-style purges where they mute/exile/ban community members not because they were being offensive but because they committed wrongthink. Even worse is that they're often prolific devs.
Ironically enough the closest comparison I could make is driving a Tesla. Even if the product is great, you're supporting an organisation that is the opposite.
I think the Nix team will continue to slowly chase away competent people until the rot makes the whole thing wither, at which point everyone switches their upstream over to Determinate Systems' open core. Although I'm hoping DS will ultimately go the RHEL-Fedora route.
>Just to give an example, someone made an April's Fools joke about Elon Musk sponsoring NixOS and they got their post muted and (almost?) caught a suspension.
This can't be real. Are you sure it was something innocuous and not something bigoted?
Real as you or me, though I disagree with almost everything else the quoted poster expressed other than that forks are happening and the dividing lines include cultural norms rather than strictly technical disagreements.
I personally found it incredibly distasteful and also fairly representative of the quality of conversation you often get from some of the Nix community. I'm not offensive, you're just thin skinned, can't you take a joke, etc. is extremely common. You'll have to judge for yourself whether it's bigoted or dog whistle or neither.
I'm a former casual community member with modest open source work in that ecosystem (projects and handful of nixpkgs PRs) before I left permanently last spring. I no longer endorse its use for any purpose and I seek to replace every piece of it that I was using.
I still hear about the ways they continue to royally fuck up their governance and make negligible progress on detoxifying the culture. It took them until last fucking week to ban Anduril from making hiring posts on the same official forum.
Am I supposed to seriously believe people found that so offensive that the poster was banned for that? Were those crying, feigning offence, being satirical themselves? Truly Poe's Law at work here.
>I personally found it incredibly distasteful
How? Why? It's clearly satire, written in the style of The Onion.
>and also fairly representative of the quality of conversation you often get from some of the Nix community
Good satire? At least some members aren't brainrotted out to the point of no return.
> I'm not offensive, you're just thin skinned, can't you take a joke, etc.
It's clearly not offensive and if that upset you, you clearly have thin skin and can't take the blandest of jokes. Histrionic.
>I no longer endorse its use for any purpose and I seek to replace every piece of it that I was using.
I will also tell others not to use Nix after reading that. The community is indeed too toxic.
>I still hear about the ways they continue to royally fuck up their governance and make negligible progress on detoxifying the culture.
They won't detoxify until they remove all the weak neurotic leftist activists with weird fetishes for "underrepresented minorities."
>It took them until last fucking week to ban Anduril from making hiring posts on the same official forum.
I'm not sure who that is or why it's an issue, but I assume it's something only leftists cry about.
There are too many distributions / formats. However, distribution packages are much better than snap/flatpak/docker for most uses; the only hard part is there are so many that no program can put "and is integrated into the package manager" in their release steps. You can ship a docker container in your program release - it is often done but rarely what should be done.
For some cases maybe that makes sense, but in a very large percentage it does not. As example, what if I want to build and use/deploy a Python app that needs the latest NumPy, and the system package manager doesn’t have it. It would be hard to justify for me to figure out and build a distro specific package for this rather than just using the Python package on PyPI.
They do not. Even the vast majority of Arch users think the policy of "integrate breakage anyway and post the warning on the web changelog" is pants-on-head insane, especially compared to something like SUSE Tumbleweed (also a rolling distro) where things get tested and will stay staged if broken.
You're the one who has added "breaking." The original claim was just that integrators get hate for not keeping up with upstreams, which is true. Many changes aren't breaking; and users don't know which ones are or aren't anyway.
No, your claim was that integrators get hate for not immediately integrating upstream changes. Which is patently false, because users would rather have the packages tested for breaking changes and either patched or have a deprecation warning, rather than having the changes blindly integrated and having their legs pulled out from under them by a breaking change. No one is hating on distro packagers for taking a slight delay to verify and test things.
I brought up Arch because they get a lot of hate for exactly doing that and consequently pulling people's legs out from under them.
The existence of some users desiring stability does not in any way contradict the claim that maintainers get hate for not immediately integrating upstream changes. In particular, distros have more than 2 users -- they can do different things.
I recently re-installed and instead of installing rustup I thought I'd give the Arch rust package a shot. Supposedly it's not the "correct" way to do it but so far it's working great, none of my projects require nightly. Updates have come in maybe a day after upstream. One less thing to think about, I like it.
> Shipping what randos push works great for iOS and Android too.
App store software is excruciatingly vetted, though. Apple and Google spend far, far, FAR more on validating the software they ship to customers than Fedora or Canonical, and it's not remotely close.
It only looks like "randos" because the armies of auditors and datacenters of validation software are hidden behind the paywall.
It's really not. At least on Android they have some automated vetting but that's about it. Any human vetting is mostly for business reasons.
Also, Windows and Mac have existed for decades and there's zero vetting there. Yeah, malware exists, but it's easy to avoid and easily worth the benefit of actually being able to get up-to-date software from anywhere.
The vetting on Windows is that basically any software that isn't signed by an EV certificate will show a scary SmartScreen warning on users' computers. Even if your users aren't deterred by the warning, you also have a significant chance of your executable getting mistakenly flagged by Windows Defender, and then you have to file a request for Microsoft to whitelist you.
The vetting on Mac is that any unsigned software will show a scary warning and make your users have to dig into the security options in Settings to get the software to open.
This isn't really proactive, but it means that if you ship malware, Microsoft/Apple can revoke your certificate.
If you're interested in something similar to this distribution model on Linux, I would check out Flatpak. It's similar to how distribution works on Windows/Mac with the added benefit that updates are handled centrally (so you don't need to write auto-update functionality into each program) and that all programs are manually vetted both before they go up on Flathub and when they change any permissions. It also doesn't cost any money to list software, unlike the "no scary warnings" distribution options for both Windows and Mac.
Most users only have a handful of applications installed. They may not be the same ones, but flatpak is easy to set up if you want the latest version. And you're not tied to the system package manager if you want the latest for CLI software (nix, brew, toolbox, ...).
The nice thing about Debian is that you can have 2 full years of routine maintenance while getting ready for the next big update. The main issue is upstream developers shipping bug fixes and feature updates in the same patch.
>App store software is excruciatingly vetted, though. Apple and Google spend far, far, FAR more on validating the software they ship to customers than Fedora or Canonical, and it's not remotely close.
Hahah! ...they don't. They really don't, man. They do have procedures in place that makes them look like they do, though; I'll give you that.
I often see initiatives and articles like this but no mention of Nix. Is it just not well known enough for comparison? Because to me that’s the standard.
I use Nix extensively, but the Nix daemon doesn't do much of use that can't be achieved by building your code from a fixed OCI container with internet turned off. The latter is certainly more standard across the industry, and sadly a lot easier too. Nix is not a revolutionary containerisation technology, nor honestly a very good one.
The value in Nix comes from the package set, nixpkgs. What is revolutionary is how nixpgks builds a Linux distribution declaratively, and reproducibly, from source through purely functional expressions. However, nixpkgs is almost an entire universe unto itself, and it is generally incompatible with the way any other distribution would handle things, so it would be no use to Fedora, Debian, and others
At work we went back to a Docker build to make reproducible images. The primary reason is poor cross-compilation support in Nix on Arm when developers needed to compile for an amd64 service and derive image checksums that are put into tooling that are run locally for service version verification and reproducibility.
With Docker it turned out relatively straightforward. With Nix even when it runs in Linux Arm VM we tried but just gave up.
In the near term it makes more sense to position nix as a common interface between app developers and distro maintainers and not as a direct-to-user way to cut their distro maintainers out of the loop entirely (although it is quite useful for that).
Ideally, a distro maintainer would come across a project packaged with nix and think:
> Oh good, the app dev has taken extra steps to make life easy for me.
As-is, I don't think that's the case. You can add a flake output to your project which builds an .rpm or a .deb file, but it's not commonly done.
I'm guessing that most of the time, distro maintainers would instead hook directly into a language-specific build tool like cmake or cargo and ignore the nix stuff. They benefit from nix only indirectly in cases where it has prevented the app dev from doing crazy things in their build (or at least has made that craziness explicit, versus some kind of works-on-my-machine accident or some kind of nothing-to-see-here skulduggery).
If we want to nixify the world I think we should focus less on talking people out of using package managers which they like and more on making the underlying packages more uniform.
Oh, I assure you, it's hard to escape knowing about Nix if you write about this sort of thing. Someone will be along almost immediately to inform you about it.
Nix wasn't mentioned (I'm the author) because it really isn't relevant here -- the comparable distributions, when discussing what Fedora is doing, are Debian and other distributions that use similar packaging schemes and such.
I agree that NixOS/nixpkgs would not be a good basis for comparison. Do you have an opinion about the use of nix by app devs to specify their builds, i.e. as a make alternative, not as a Fedora alternative?
Quoting the article:
> Irreproducible bits in packages are quite often "caused by an error or sloppiness in the code". For example, dependence on hardware architecture in architecture-independent (noarch) packages is "almost always unwanted and/or a bug", and reproducibility tests can uncover those bugs.
This is the sort of thing that nix is good at guarding against, and it's convenient that it doesn't require users to engage with the underlying toolchain if they're unfamiliar with it.
For instance I can use the command below to build helix at a certain commit without even knowing that it's a rust package. Although it doesn't guarantee all aspects of repeatability, it will fail if the build depends on any bits for which a hash is not known ahead of time, which gets you half way there I think.
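For a flake-enabled project like helix, such a command looks roughly like this (the commit hash is a placeholder, and flakes must be enabled in your nix config):

```
# build a specific commit of helix straight from GitHub, pinned by revision
nix build github:helix-editor/helix/<commit-sha>
```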
Used in this way, can nix help Fedora's reproducibility efforts? Or does it appear to Fedora as a superfluous layer to be stripped away so that they can plug into cargo more directly?
> Used in this way, can nix help Fedora's reproducibility efforts? Or does it appear to Fedora as a superfluous layer to be stripped away so that they can plug into cargo more directly?
A lot of Nix-based package builds will burn Nix store paths directly into the binary. If you are lucky it's only the rpath and you can strip it, but in some cases other Nix store paths end up in the binary. Seems pretty useless to Fedora.
Besides many of the difficult issues are not solved by Nix either. (E.g. build non-determinism by ordering differences due to the use of a hashmap somewhere in the build.)
> A lot of Nix-based package builds will burn Nix store paths directly into the binary
I didn't know that, sounds like a bug. Maybe something can be done to make it easier to know that this is the case for your build.
I'd still think that by refusing to build things with unspecified inputs, nix prunes a whole category of problems away which then don't bite the distro maintainers, but maybe that's wishful thinking.
I'll continue to use it because it's nice to come to a project I haven't worked on in a few years and not have to think about whether it's going to now work on this machine or figure out what the underlying language-specific commands are--but if there were ways to tweak things so that others have this feeling also, I'd like to know them.
> I didn't know that, sounds like a bug. Maybe something can be done to make it easier to know that this is the case for your build.
It's a feature. E.g. if a binary needs to load data files, it needs to know the full path, or you are back to an FHS filesystem layout (which has a lot of issues that Nix tries to solve).
> I'd still think that by refusing to build things with unspecified inputs,
I haven't followed development of traditional Linux distributions, but I am pretty sure that they also build in minimal sandboxes that only contain specified dependencies. See e.g. Mock: https://github.com/rpm-software-management/mock
Oh I'm sure they do build in sandboxes... my point is that those builds are more likely to succeed on the first try if the app dev also built in a sandbox. If you let nix scold you for tempting fate and relying on the output of an unchecked curl, that's one fewer headache for whoever later tries to sandbox your stuff.
That's such a weird characterization of this article, which (in contrast to other writing on this subject) clearly concludes (a) that Nix achieves a very high degree of reproducibility and is continuously improving in this respect, and (b) Nix is moreover reproducible in a way that most other distros (even distros that do well in some measures of bitwise reproducibility) are not (namely, time traveling— being able to reproduce builds in different environments, even months or years later, because the build environment itself is more reproducible).
The article you linked is very clear that both qualitatively and quantitatively, NixOS has achieved high degrees of reproducibility, and even explicitly rejects the possibility of assessing absolute reproducibility.
NixOS may not be the absolute leader here (that's probably stagex, or GuixSD if you limit yourself to more practical distros with large package collections), but it is indeed very good.
> NixOS may not be the absolute leader here (that's probably stagex, or GuixSD if you limit yourself to more practical distros with large package collections), but it is indeed very good.
Could you comment on how stagex is? It looks like it might indeed be best in class, but I've hardly heard it mentioned.
The Bootstrappable Builds folks created a way to go from only an MBR of (commented) machine code (plus a ton of source) all the way up to a Linux distro. The stagex folks built on top of that towards OCI containers.
And with even a little bit of imagination, it's easy to think of other possible measures of degrees of reproducibility, e.g.:
• % of deployed systems which consist only of reproducibly built packages
• % of commonly downloaded disk images (install media, live media, VM images, etc.) which consist only of reproducibly built packages
• total # of reproducibly built packages available
• comparative measures of what NixOS is doing right like: of packages that are reproducibly built in some distros but not others, how many are built reproducibly in NixOS
• binary bootstrap size (smaller is better, obviously)
It's really not difficult to think of meaningful ways that reproducibility of different distros might be compared, even quantitatively.
Sure, but in terms of absolute number of packages that are truly reproducible, they outnumber Debian because Debian only targets reproducibility for a smaller fraction of total packages & even there they're not 100%. I haven't been able to find reliable numbers for Fedora on how many packages they have & in particular how many this 99% is targeting.
By any conceivable metric Nix really is ahead of the pack.
Disclaimer: I have no affiliation with Nix, Fedora, Debian etc. I just recognize that Nix has done a lot of hard work in this space & Fedora + Debian jumping onto this is in no small part thanks to the path shown by Nix.
> Disclaimer: I have no affiliation with Nix, Fedora, Debian etc. I just recognize that Nix has done a lot of hard work in this space & Fedora + Debian jumping onto this is in no small part thanks to the path shown by Nix
This is completely the wrong way around.
Debian spearheaded the Reproducible Builds efforts in 2016 with contributions from SUSE, Fedora and Arch. NixOS got onto this as well but has seen less progress until the past 4-5 years.
The NixOS efforts owes the Debian project all their thanks.
> Arch Linux is 87.7% reproducible with 1794 bad 0 unknown and 12762 good packages.
That's < 15k packages. Nix by comparison has ~100k total packages they are trying to make reproducible and has about 85% of them reproducible. Same goes for Debian - ~37k packages tracked for reproducible builds. One way to lie with percentages is when the absolute numbers are so disparate.
> This is completely the wrong way around. Debian spearheaded the Reproducible Builds efforts in 2016 with contributions from SUSE, Fedora and Arch. NixOS got onto this as well but has seen less progress until the past 4-5 years. The NixOS efforts owes the Debian project all their thanks.
Debian organized the broader effort across Linux distros. However the Nix project was designed from the ground up around reproducibility. It also pioneered architectural approaches that other systems have tried to emulate since. I think you're grossly misunderstanding the role Nix played in this effort.
> That's < 15k packages. Nix by comparison has ~100k total packages they are trying to make reproducible and has about 85% of them reproducible. Same goes for Debian - ~37k packages tracked for reproducible builds. One way to lie with percentages is when the absolute numbers are so disparate.
That's not a lie. That is the package target. The `nixpkgs` repository in the same vein packages a huge number of source archives and repackages entire ecosystems into its own repository. This greatly inflates the number of packages. You can't look at the flat numbers.
> However the Nix project was designed from the ground up around reproducibility.
It wasn't.
> It also pioneered architectural approaches that other systems have tried to emulate since.
This has had no bearing, and you are greatly overestimating the technical details of nix here. It's fundamentally invented in 2002, and things has progressed since then. `rpath` hacking really is not magic.
> I think you're grossly misunderstanding the role Nix played in this effort.
I've been contributing to the Reproducible Builds effort since 2018.
I think people are generally confused by the different meanings of reproducibility in this case. The reproducibility that Nix initially aimed at is: multiple evaluations of the same derivations will lead to the same normalized store .drv. For a long time they were not completely reproducible, because evaluation could depend on environment variables, etc. But flakes have (completely?) closed this hole. So, reproducibility in Nix means that evaluating the same package set will lead to the same set of build recipes (.drvs).
However, this doesn't say much about build artifact reproducibility. A package set could always evaluate to the same drvs, but if all the source packages choose what to build based on random() > 0.5, then there is no reproducibility of build artifacts at all. This type of reproducibility is spearheaded by Debian and Arch more than Nix.
Different notions of reproducible. This project cares specifically about bit-for-bit identical builds (e.g. no time stamps, parallel compile artifacts etc). Nix is more about being declarative and "repeatable" or whatever a good name for that would be.
Both notions are useful for different purposes and nix is not particularly good at the first one.
It's very, very complicated. It's so far past the maximum effort line of most linux users as to be in its own class of tools. Reproducibility in the imperative package space is worth a lot. Lots of other tools are built on RPM/DEB packages that offer similar advantages of Nix -- Ansible, for one. This is more of a "rising tide raises all boats" situation.
Yes, I know that. But when talking about reproducible packages, we can and should learn from existing techniques. Is Fedora's system capable of this goal? Is it built the right way? Should it adopt an alternate package manager to achieve this with less headache?
That's not even really true. Hacker News readers seem to fixate heavily on the most popular desktop versions of Linux distros, but that is far from all they make. SUSE with SLE Micro and Edge Image Builder is a declarative build system for creating your own installer images. If you don't want SLE Micro, they even give you the Elemental Toolkit, which can be used to build your own custom distro in a declarative manner. I don't know much about Debian and Red Hat's approaches to this, but I know they have similar tooling. The "problem," if you want to call it that, is developers, if I may be uncharitable for a brief moment here, are amazingly solipsistic. The major Linux vendors are targeting enterprise edge computing use cases, whereas developers only ever think about the endpoint PCs they themselves use for development. Plenty of distro tooling will give you declaratively built, transactionally updated, immutable server images, but if they don't give you the same for your personal laptop, those efforts are invisible to the modal Hacker News commenter.
For what it's worth, there's also Guix, which is literally a clone of Nix but part of the GNU project, so it only uses free software and opts for Guile instead of a custom DSL for configuration. It wasn't a pre-existing distro that changed, of course.
This goal feels like a marketing OKR to me. A proper technical goal would be "all packages, except the ones that have a valid reason, such as signatures, not to be reproducible".
> This definition excludes signatures and some metadata and focuses solely on the payload of packaged files in a given RPM:
> A build is reproducible if given the same source code, build environment and build instructions, and metadata from the build artifacts, any party can recreate copies of the artifacts that are identical except for the signatures and parts of metadata.
> The contents, however, should still be "bit-by-bit" identical, even though that phrase does not turn up in Fedora's definition.
So, according to the literal interpretation of the article, signatures inside the payload (e.g., files that are signed using an ephemeral key during the build, NOT the overall RPM signature) are still a self-contradictory area and IMHO constitute a possibly-valid reason for not reaching 100% payload reproducibility.
At Google SRE we often had very technical OKRs that were formulated with some 'number of 9s'. Like 99.9999% uptime or something like that. So getting two 9s of reproducibility seems like a reasonable first goal. I hope they will be adding more nines later.
I learned this from an ansible molecule test env setup script for use in containers and VMs years ago, because `which` isn't necessarily installed in containers, for example:
For Debian, Ubuntu, Raspberry Pi OS and other dpkg .deb and apt distros:
man sources.list
man sources.list | grep -i keyring -C 10
# trusted:
# signed-by:
# /etc/apt/trusted.gpg.d/
man apt-secure
man apt-key
apt-key help
less "$(type -p apt-key)"
> In Ubuntu 24.04, APT will require repositories to be signed using one of the following public key algorithms: [ RSA with at least 2048-bit keys, Ed25519, Ed448 ]
> This has been made possible thanks to recent work in GnuPG 2.4 by Werner Koch to allow us to specify a “public key algorithm assertion” in APT when calling the gpgv tool for verifying repositories.
This is a waste of time compared to investing in sandboxing which will actually protect users as opposed to stopping theoretical attacks. Fedora's sandbox capabilities for apps is so far behind other operating systems like Android that it is much more important of an area to address.
Defaulting to Android-style nanny sandboxing ("you can't grant access to your Downloads folder because we say so" etc.) is unlikely to go over well with the average Linux distro userbase.
Also, maximally opt-in sandboxes for graphical applications have been possible for a while. Just use Podman and only mount your Wayland socket + any working files.
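A rough sketch of that kind of invocation (the image and app names are made up, and exact socket paths vary by setup):

```
# expose only the Wayland socket and one working directory to the app
podman run --rm \
  --userns=keep-id \
  -e WAYLAND_DISPLAY="$WAYLAND_DISPLAY" \
  -e XDG_RUNTIME_DIR=/tmp \
  -v "$XDG_RUNTIME_DIR/$WAYLAND_DISPLAY:/tmp/$WAYLAND_DISPLAY" \
  -v "$HOME/Documents/project:/work" \
  example.org/someapp:latest someapp
```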
>Defaulting to Android-style nanny sandboxing ("you can't grant access to your Downloads folder because we say so" etc.) is unlikely to go over well with the average Linux distro userbase.
If you market it that way. Plenty of Linux users say they care about security, don't want malware, etc. This is a step towards those desires. Users have been conditioned to use tools badly designed for security for decades, so there will be some growing pains, but it will get worse the longer people wait.
>Just use Podman and only mount your Wayland socket + any working files.
This won't work for the average user. Security needs to be accessible.
Access to all files pokes a big hole; really you want portals. You probably also want audio. And perhaps graphics acceleration depending on the software, and other limited hardware access. And dbus (but sandboxed proxy access). Probably some global config from the host as well (fonts, themes, cursor). And… well, maybe there's more, but the point is it depends on the software. Maybe it is that simple sometimes. Often it's not for non-trivial applications.
That assumes that there is a fixed amount of effort and resources to be split between the two things—that there is always opportunity cost.
When it comes to community efforts, it’s rarely the case that all things have opportunity cost—people who contribute effort for X would not have necessarily done so for Y.
The Fedora Project is of course not a purely community effort, so I don’t know exactly how that applies here. But just wanted to point out that prioritization and opportunity cost don’t always work like you suggested.
If the apps work together, they typically belong to the same security domain / trust level. Do you have examples when you still have to isolate them from each other?
Sure, that is one solution, but it still needs a lot of work both to patch up holes in it and to fix apps to be better designed in regards to security.
Don't let perfect be the enemy of good? Flatpak is not perfect, but it has steadily improved security over the years. When I first used Flatpak, most applications had their sandboxes wide-open (because they were not compatible with sandboxing). Nowadays a lot of Flatpaks have much more limited sandboxes.
The real treasure was the friend I found along the way
https://github.com/keszybz/add-determinism
Which is I guess the NIH version of https://salsa.debian.org/reproducible-builds/strip-nondeterm... ...
Rather, it's the version without a dependency on PERL.
That's mentioned at the bottom of the README.
…and in the article:
> The Fedora project chose to write its own tool because it was undesirable to pull Perl into the build root for every package.
I kind of wonder if this or something similar could somehow nullify timestamps so you could compare two logfiles...
Going further, it would be nice to be able to compare logfiles that contain pointer addresses or something.
I'm not confident that I understand what you're asking for, but couldn't you just sed off the timestamp from every line? Or for a more extreme example, I have occasionally used ... tr, I think? to completely remove all numbers from logs in order to aggregate error messages without worrying about the fact that they kept including irrelevant changing numbers (something like tail -5000 logfile | tr -d [0-9] | sort | uniq -c | sort -n or so).
how would you do it if your logs were printed on paper with a printer, each line printed with stochastic timing (due to a bug), with an ink containing a chemical tracer with halflife `h` (after being put to paper), but the ink is randomly sampled from several (`m`) inks of different halflives `h1`, h2`,... `hn`? assume `p` different printers scattered across the 10 most populous US cities. you may use standard unix utilities.
Have interns shovel it all into scanners, run the result through tesseract, then do the thing I said before. Nonetheless, I don't think your question is sincere; what point are you actually trying to get at?
Sorry, I was just trying to make a joke (about both insane systems and interview questions) since the question you answered was a bit unclear. Guess it didn't land, haha.
Ah, that makes much more sense. It read as somewhat aggressive in a way that I couldn't quite make sense of; my best guess was that you were insinuating that the unix tools I was reaching for were arcane and unwieldy. Thanks for clarifying.
You could just trap time system calls to always return zero via LD_PRELOAD.
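A rough sketch of that approach (this only interposes time(); programs using clock_gettime() or gettimeofday() would need those wrapped too):
cat > zerotime.c <<'EOF'
/* LD_PRELOAD shim: make time() always report the epoch */
#include <time.h>
time_t time(time_t *t) { if (t) *t = 0; return (time_t)0; }
EOF
cc -shared -fPIC -o zerotime.so zerotime.c
LD_PRELOAD="$PWD/zerotime.so" ./your-program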
A different but more powerful method of ensuring reproducibility is more rigorous compilation using formally verifiable proofs.
That’s what https://pi2.network/ does. It uses K-Framework, which is imo very underrated/deserves more attention as a long term way of solving this kind of problem.
nice to see they're in this too.
https://news.opensuse.org/2025/02/18/rbos-project-hits-miles...
Another thing I'd love to see is more statically linked binaries. Something like Python, for instance, is a nightmare to install and work with
I think general consensus is against you. Fedora packaging policy [1]:
> Packages including libraries should exclude static libs as far as possible (eg by configuring with --disable-static). Static libraries should only be included in exceptional circumstances. Applications linking against libraries should as far as possible link against shared libraries not static versions.
[1]: https://docs.fedoraproject.org/en-US/packaging-guidelines/
I'd far rather a static binary than a bundled vm for a single app which produces all the same bad points of a static binary plus 900 new bad points on top.
Packaging guidelines from a distros docs like this are not any kind of counter argument to that comment.
This is the current orthodoxy, so obviously all docs say it. We all know the standard argument for the current standard. Their comment was explicitly "I'd like to see a change from the current orthodoxy". They are saying that maybe that argument is not all it promised to be back in the 90's when we started using dynamic libs.
So instead of 1000 programs and 1000 libraries, you’d rather 1000 programs and 1,000,000 libraries?
Given that the comment is talking about python he probably already has those 1.000.000 libraries.
The common thing to do for python programs that are not directly bundled with the os is to set up a separate virtual environment for each one and download/compile the exact version of each dependency from scratch.
That's already the result from most of these container formats, just messier
Baffled how you got there, but not interested.
Their point is that if 1000 programs use the same 1000 libraries, static linking duplicates all those libraries across each binary, taking that much more storage and memory (which can hurt performance as well), effectively making 1000000 libraries in use.
Dynamic linking gives you M binaries + N libraries. Static linking is M * N.
But there are not 1000 programs being proposed. No one said every binary in a system. Only some binaries are a problem. That is silly hyperbole that isn't useful or any kind of valid argument.
What I said specifically is I'd rather a static binary than a flatpak/snap/appimage/docker/etc. That is a comparison between 2 specific things, and neither of them is "1000 programs using 1000 libraries"
And some binaries already ship with their own copies of all the libraries anyway, just in other forms than static linking. If there are 1000 flatpaks/snaps/docker images etc., then those million libraries are already out there in an even worse form than if they were all static binaries. But there are not, generally, on any given single system, yet, though the number is growing, not shrinking.
For all the well known and obvious benefits of dynamic linking, there are reasons why sometimes it's not a good fit for the task.
And in those cases where, for whatever reason, you want the executable to be self-contained, there are any number of ways to arrange it, from a simple tar with the libs & bin in non-conflicting locations and a launcher script that sets a custom lib path (or bin is compiled with the lib path), to appimage/snap/etc, to a full docker/other container, to unikernel, to simple static bin.
All of those give different benefits and incur different costs. Static linking simply has the benefit of being dead simple. It's both space and complexity-efficient compared to any container or bundle system.
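The tar-plus-launcher variant really is about that simple; a sketch with placeholder names:
# bundle layout: myapp/bin/myapp, myapp/lib/*.so, myapp/run.sh
# run.sh:
#!/bin/sh
HERE="$(dirname "$(readlink -f "$0")")"
exec env LD_LIBRARY_PATH="$HERE/lib" "$HERE/bin/myapp" "$@"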
I don't know why you'd go straight to a VM as the alternative when containers are the obvious choice.
A literal vm is hyperbole and sloppy language on my part. I meant all the different forms of containerising or bundling apps with all of their dependencies and os environment.
not much difference from a dependency management point of view
For Python, take a look at the musl builds in python-build-standalone[1], which are statically linked.
I also have a tiny collection of statically linked utilities available here[2].
[1] https://github.com/astral-sh/python-build-standalone
[2] https://github.com/supriyo-biswas/static-builds
What do you mean with “a nightmare to install and work with” exactly?
They use Windows. \s
Python has official installers for Windows, is distributed in the Microsoft Store, and can also be pulled in by uv, which works like a breeze in PowerShell.
What is UV?
https://github.com/astral-sh/uv
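Roughly what using it looks like, if that helps (package and version are just examples):
uv python install 3.12     # uv can fetch a Python build itself
uv venv                    # create a virtual environment in .venv
uv pip install requests    # pip-compatible installs into that environment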
Due to the python reference I think you mean "compiles into a single binary", not necessarily "static linking".
This binary may be statically linked, or link to system libraries. Quite a few times the only system library being linked is libc though.
But yes, I also hope this gets more prevalent instead of the python approach.
We're stuck with a computing paradigm from 50 years ago.
Ideally everything would be statically linked but the sections would be marked and deduped by the filesystem.
Even though the idea is much older, shared libraries were only introduced on Unix systems on SunOS 4.x and System V release 3 (?). Sun paper from 1988: https://www.cs.cornell.edu/courses/cs414/2001FA/sharedlib.pd...
They'd still be duplicated in RAM in the page cache… which is the real problem with static linking.
As a user of fedora what does this actually get me? I mean I understand it for hermetic builds but why?
Reproducible builds can improve software quality.
If we believe we have a reproducible build, that constitutes a big test case which gives us confidence in the determinism of the whole software stack.
To validate that test case, we actually have to repeat the build a number of times.
If we spot a difference, something is wrong.
For instance, suppose that a compiler being used has a bug whereby it relies on the value of an uninitialized variable somewhere. That could show up as a difference in the code it generates.
Without reproducible builds, of course there are always differences in the results of a build: we cannot use repeated builds to discover that something is wrong.
(People do diffs between irreproducible builds anyway. For instance, disassemble the old and new binaries and do a textual diff, validating that only some expected changes are present, like string literals that have embedded build dates. If you have reproducible builds, you don't have to do that kind of thing to detect a change.)
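i.e. something along these lines (diffoscope automates a fancier version of this):
objdump -d old/foo > old.asm
objdump -d new/foo > new.asm
diff -u old.asm new.asm   # only expected changes (e.g. embedded build dates) should show up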
Reproducible builds will strengthen the toolchains and surrounding utilities. They will flush out instabilities in build systems, like parallel Makefiles with race conditions, or indeterminate orders of object files going into a link job, etc.
I don't know this area, but it seems to me it might be a boon to security? So that you can tell if components have been tampered with?
That's already been a thing in all the Redhat variants. RPM/DNF have checksums of the installed binaries and there is GPG signing of packages and repositories. The only part of that ecosystem I've always had a gripe with is putting the GPG public keys in the mirrors. People should have to grab those from non mirrors or any low skilled attacker can just replace the keys and sign everything again. It would be caught but not right away.
Changes can also be caught using bolt-on tools like Tripwire, OSSEC and its alternatives, or even home-grown tools that build signed manifests of approved packages, usually for production approval.
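For example, with standard RPM tooling (the package file name is a placeholder):
rpm -Va                           # verify installed files against the RPM database
rpm -K foo-1.2-3.fc42.x86_64.rpm  # check a package file's digests and GPG signature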
Yes! The attack on SolarWinds Orion was an attack on its build process. A verified reproducible build would have detected the subversion, because the builds would not have matched (unless the attackers managed to detect and break into all the build processes).
The exact same binary can be hacked in the exact same way on all platforms.
If I understand correctly that would require releasing the publisher's private key, though, correct?
What?
Bingo. We caught a virus tampering with one of our code gens this way.
for related, see Ken Thompson:
http://genius.cat-v.org/ken-thompson/texts/trusting-trust/
You know what does not give me confidence? Updating software, but what's that, it's still printing the same build date? Of course, hours later and tens of files deep, I found out some reproducibility goof just hardcoded it.
So far, reproducible builds are heavy on the former, zero on these bugs you mention and zero on supply chain attacks.
Some of these supposed use cases make no sense. You update the compiler. Oh no, all the code is different? Enjoy the 16h deep dive to realize someone tweaked code generation based on the cycle times given on page 7893 of the Intel x64 architecture reference manual.
They should be setting the build dates for a package from, say, the commit date of the top commit of the branch that's being built. It can't be something that doesn't change when the next version is spun. If you see behavior like that in anybody's reproducible package system or distro, you have a valid complaint.
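That is essentially what the SOURCE_DATE_EPOCH convention from reproducible-builds.org does; a sketch:
# use the committer date of the branch tip as the embedded build date
export SOURCE_DATE_EPOCH="$(git log -1 --format=%ct)"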
I don't think it is that unlikely that build hosts or some related part of the infrastructure gets compromised.
My impression is that reproducible builds improve your security by helping make it more obvious that packages haven't been tampered with in late stages of the build system.
* Edit, it's quoted in the linked article:
> Jędrzejewski-Szmek said that one of the benefits of reproducible builds was to help detect and mitigate any kind of supply-chain attack on Fedora's builders and allow others to perform independent verification that the package sources match the binaries that are delivered by Fedora.
The supply chain attacks you have to most worry about are not someone breaking into Fedora build machines.
It's the attacks on the upstream packages themselves.
Reproducible builds would absolutely not catch a situation like the XZ package being compromised a year ago, due to the project merging a contribution from a malicious actor.
A downstream package system or OS distro will just take that malicious update and spin it into a beautifully reproducing build.
Don't let the perfect be the enemy of the good; this doesn't prevent upstream problems but it removes one place for compromises to happen.
I'm not saying don't have reproducible builds; it's just that this is an unimportant justification for them, almost unnecessary.
Reproducible builds are such an overhelmingly good and obvious thing, that build farm security is just a footnote.
Your mere footnote is my soft, soft underbelly.
Any hardening is still hardening.
Reproducible builds COULD fix the xz issues. The current level would not, but GitHub could do things to make creating the downloadable packages scriptable and thus reproducible. Fedora could check out the git hash instead of downloading the provided tarball and again get reproducible builds that bypass this.
The above are things worth looking at doing.
However, I'm not sure what you can do about code that tries to obscure the issues while looking good.
And anything designed to catch upstream problems like the XZ compromise will not detect a compromise in the Fedora package build environment. Kinda need both.
When builds are reproducible, one thing a distro can do is have multiple build farms with completely different operators, so there's no shared access and no shared secrets. Then the results of builds of each package on each farm can be compared, and if they differ, you can suspect tampering.
So it could help you detect tampering earlier, and maybe even prevent it from propagating depending on what else is done.
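At its simplest, the comparison is just hashing (file names are placeholders):
sha256sum farm-a/foo-1.2-3.x86_64.rpm farm-b/foo-1.2-3.x86_64.rpm
# for a supposedly reproducible package, differing hashes mean something is wrong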
Bingo.
Better security! A malicious actor only needs to change a few bytes in either the source or binary of OpenSSL to break it entirely (i.e. disable certificate checking).
Reproducible builds remove a single point of failure for authenticating binaries – now anyone can do it, not just the person with the private keys.
It's one tool of many that can be used to prevent malicious software from sneaking in to the supply chain.
Keep in mind that compilers can be backdoored to install malicious code. Bitwise/signature equivalency does not imply malware-free software.
True, but every step we add makes the others harder too. It is unlikely Ken Thompson's "trusting trust" compiler would detect modern gcc, much less successfully introduce the backdoor. Even if you start with a compromised gcc of that type there is a good chance that after a few years it would be caught when the latest gcc fails to build anymore for someone with the compromised compiler. (now add clang and people using that...)
We may never reach perfection, but the more steps we make in that direction the more likely it is we reach a point where we are impossible to compromise in the real world.
In this attack, the compiler is not a reproducible artifact? Or does backdooring use another technique?
Their point is that you'd need something like https://bootstrappable.org/ (which does exist).
Can someone provide a brief clarification about build reproducibility in general?
The stated aim is that when you compile the same source, environment, and instructions the end result is bit identical.
There are, however, hardware-specific optimizations that will naturally negate this stated aim, and I don't see how there's any way to avoid throwing out the baby with the bathwater.
I understand why having a reproducible build is needed on a lot of fronts, but the stated requirements don't seem to be in line with the realities.
At its most basic, there is hardware, where the hardware may advertise features it doesn't have, or doesn't perform the same instructions in the same way, and other nuances that break determinism as a property, and that naturally taints the entire stack since computers rely heavily on emergent design.
This is often hidden in layers of abstraction and/or may be separated into pieces that are architecture dependent vs independent (freestanding), but it remains there.
Most if not all of the beneficial properties of reproducible builds rely on the environment being limited to a deterministic scope, and the reality is manufacturers ensure these things remain in a stochastic scope.
> hardware specific optimizations that will naturally negate this stated aim
Distro packages are compiled on their build server and distributed to users with all kinds of systems; therefore, by nature, it should not use optimizations specific to the builder's hardware.
On source-based distros like Gentoo, yes, users adding optimization flags would get a different output. But there is still value in having the same hardware/compilation flags result in the same output.
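For example, the difference between targeting a fixed baseline and the builder's own CPU (flags are only illustrative):
gcc -O2 -march=x86-64-v2 -c foo.c   # fixed baseline: same output on any builder
gcc -O2 -march=native    -c foo.c   # output depends on whichever CPU the builder has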
Well the point is that if N of M machines produce the same output, it provides the opportunity to question why it is different on the others. If the build is not reproducible then one just throws up their arms.
It’s not clear if you’re also talking about compiler optimizations—a reproducible build must have a fixed target for that.
I was thinking more along the lines of LLVM-type performance optimizations when I was speaking about optimizations, if that sufficiently clarifies.
> There is, however; hardware specific optimizations that will naturally negate this stated aim
These are considered to be different build artifacts, which are also reproducible.
Reproducibility is at odds with Profile-Guided-Optimization. Especially on anything that involves networking and other IO that isn't consistent.
from Go documentation[0]:
> Committing profiles directly in the source repository is recommended as profiles are an input to the build important for reproducible (and performant!) builds. Storing alongside the source simplifies the build experience as there are no additional steps to get the profile beyond fetching the source.
I very much hope other languages/frameworks can do the same.
[0]: https://go.dev/doc/pgo#building
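Per those docs, the workflow is roughly (the package path is a placeholder):
# commit a merged profile as default.pgo next to the main package;
# -pgo=auto (the default since Go 1.21) picks it up automatically
go build -pgo=auto ./cmd/myapp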
The "performant" claim there runs counter to research I have heard. Plus, as the PGO profile data is non-deterministic in most cases, even when compiled on the same hardware as the end machine, "committing profiles directly in the source repository" is the reason why they are deleted or at least excluded from the comparison.
A quote from the paper that I remember on the subject[1] as these profiles are just about as machine dependent as you can get.
> Unfortunately, most code improvements are not machine independent, and the few that truly are machine independent interact with those that are machine dependent causing phase-ordering problems. Hence, effectively there are no machine-independent code improvements.
There were some differences between various Xeon chips' implementations of the same or neighboring generations that I personally ran into when we tried to copy profiles to avoid the cost of the profile runs, which may make me a bit more sensitive to this, but I personally saw huge drops in performance, well into the double digits, that threw off our regression testing.
IMHO this is exactly why your link suggested the following:
> Your production environment is the best source of representative profiles for your application, as described in Collecting profiles.
That is very different from Fedora using some random or generic profile for x86_64, which may or may not match the end users specific profile.
[1] https://dl.acm.org/doi/10.5555/184716.184723
If those differences matter so much for your workloads, treat your different machine types as different architectures, commit profiling data for all of them, and (deterministically) compile individual builds for all of them.
Fedora upstream was never going to do that for you anyway (way too many possible hardware configurations), so you were already going to be in the business of setting that up for yourself.
That's only the case if you did PGO with "live" data instead of replays from captured runs, which is best practice afaik.
It's not at odds at all but it'll be "Monadic" in the sense that the output of system A will be part of the input to system A+1 which is complicated to organize in a systems setting, especially if you don't have access to a language that can verify. But it's absolutely achievable if you do have such a tool, e.g. you can do this in nix.
This is one of the "costs" of reproducible builds, just like the requirement to use pre-configured seeds for pseudo random number generators etc.
It does hit real projects and may be part of the reason that "99%" is called out but Fedora also mentions that they can't match the official reproducible-builds.org meaning in the above just due to how RPMs work, so we will see what other constraints they have to loosen.
Here is one example of where suse had to re-enable it for gzip.
https://build.opensuse.org/request/show/499887
Here is a thread on PGO from the reproducible-builds mail list.
https://lists.reproducible-builds.org/pipermail/rb-general/2...
There are other costs like needing to get rid of parallel builds for some projects that make many people loosen the official constraints. The value of PGO+LTO being one.
gcda profiles are unreproducible, but the code they produce is typically the same. If you look into the pipeline of some projects, they just delete the gcda output and then, if the code is different, often try a rebuild or other methods.
While there are no ideal solutions, one that seems to work fairly well, assuming the upstream is doing reproducible builds, is to vendor the code, build a reproducible build to validate that vendored code, then enable optimizations.
But I get that not everyone agrees that the value of reproducibility is primarily avoiding attacks on build infrastructure.
However, reproducible builds have nothing to do with MSO model checking etc. like some have claimed. Much of it is just deleting non-deterministic data, as you can see here with Debian, which Fedora copied.
https://salsa.debian.org/reproducible-builds/strip-nondeterm...
As increasing the granularity of address-space randomization at compile and link time is easier than at the start of program execution, there will obviously be a cost (one that is more than paid for by reducing supply-chain risks, IMHO) of reduced entropy for address randomization, which does increase the risk of ROP-style attacks.
Regaining that entropy at compile and link time, if it is practical to recompile packages or vendor, may be worth the effort in some situations, probably best to do real PGO at that time too IMHO.
Yo, the attacker has access to the same binaries, so only runtime address randomization is useful.
The problem is that a common method is to seed the linker's symbol generation with a relative file name.
This reduces entropy across binaries and may enable reliable detection of base addresses or help differentiate gadgets in text regions.
It is all tradeoffs.
But think about how a known phrase was how Enigma was cracked.
Why should it be?
Does the profiler not output a hprof file or whatever, which is the input to the compiler making the release binary? Why not just store that?
> For example, Haskell packages are not currently reproducible when compiled by more than one thread
Doesn't seem like a big issue to me. The gcc compiler doesn't even support multithreaded compiling. In the C world, parallelism comes from compiling multiple translation units in parallel, not any one with multiple threads.
"trust in the contents of both source and binary packages is low." - I wonder will this convince organisations to adopt proper artifact management processes? If supply chain attacks are on the rise, than surely its more imperative than ever for businesses to adopt secure artifact scanning with tools like Cloudsmith or jFrog?
Amazing to see this progress! Kudos to everyone who put in the effort.
Related news from March https://news.ycombinator.com/item?id=43484520 (Debian bookworm live images now fully reproducible)
YES! I want more tools to be deterministic. My wish-list has Proxmox config at the very top.
Want to give this a try and see if it works? https://github.com/SaumonNet/proxmox-nixos?tab=readme-ov-fil...
Interesting take. I'm building something related to zk systems — will share once it's up.
99%? Debbie Downer says it only takes 1 package to screw the pooch
There's a long tail of obscure packages that are rarely used, and almost certainly a power law in terms of which packages are common. Reproducibility often requires coordination between both the packagers and the developers, and achieving that for each and every package is optimistic.
If they just started quarantining the long tail of obscure packages, then people would get upset. And failing to be 100% reproducible will make a subset of users upset. Lose-lose proposition there, given that intelligent users could just consciously avoid packages that aren't passing reproducibility tests.
100% reproducibility is a good goal, but as long as the ubiquitous packages are reproducible then that is probably going to cover most. Would be interesting to provide an easy way to disallow non-reproducible packages.
I'm sure one day they will be able to make it a requirement for inclusion into the official repos.
There's an interesting thought - in addition to aiming for 99% of all packages, perhaps it would be a good idea to target 100% of the packages that, say, land in the official install media? (I wouldn't even be surprised if they already meet that goal TBH, but making it explicit and documenting it has value)
I would still much prefer playing 100:1 Russian roulette than 1:1, if those are my options.
"All I see is 1% of complete failure" --Bad Dads everywhere
Linux folks continue running away with package security paradigms while NPM, PyPI, Cargo, et al. (like that VSCode extension registry that was on the front page last week) think they can still get away with just shipping what some rando pushes.
I have observed a sharp disconnect in the philosophies of 'improving developer experience' and 'running a tight ship'.
I think the last twenty years of quasi-marketing/sales/recruiting DevRel roles have pushed a narrative of frictionless development, while on the flip side security and correctness have mostly taken a back seat (special industries aside).
I think it's a result of the massive market growth, but I so welcome the pendulum swinging back a little bit. Typo squatting packages being a concern at the same time as speculative execution exploits shows mind bending immaturity.
I think that's a consequence of programmers making tools for programmers. It's something I've come to really dislike. Programmers are used to doing things like editing configurations, setting environment variables, or using custom code to solve a problem. As a result you get programs that can be configured, customized through code (scripting or extensions), and little tools to do whatever. This is not IMHO how good software should be designed to be used. On the positive side, we have some really good tooling - revision control in the software world is way beyond the equivalent in any other field. But then git could be used in other fields if not for it being a programmers tool designed by programmers... A lot of developers even have trouble doing things with git that are outside their daily use cases.
Dependency management tools are tools that come about because it's easier and more natural for a programmer to write some code than solve a bigger problem. Easier to write a tool than write your own version of something or clean up a complex set of dependencies.
"Security" and "Convenience" is always a tradeoff, you can never have both.
I've seen this more formalized as a triangle, with "functionality" being the third point: https://blog.c3l-security.com/2019/06/balancing-functionalit...
You can get secure and easy-to-use tools, but they typically have to be really simple things.
True. Then the Convenience folks don't understand why the rest of us don't want the things they think are so great.
There are good middle grounds, but most package managers don't even acknowledge other concerns as valid.
It's not quite a straight trade; IIRC the OpenBSD folks really push on good docs and maybe good defaults precisely because making it easier to hold the tool right makes it safer.
This is obvious, the question here is why everybody traded security for convenience and what else has to happen for people to start taking security seriously.
Regarding "what else has to happen": I would say something catastrophic. Nothing comes to mind recently.
Security is good, but occasionally I wonder if technical people don't imagine fantastic scenarios of evil masterminds doing something with the data and managing to rule the world.
While in reality, at least in the last 5 years, there are so many leaders (and people) doing and saying such plainly stupid things that I feel we should be more afraid of stupid people than of hackers.
In the last 5 years several major medical providers have had sensitive personal data of nearly everyone compromised. The political leaders are the biggest problem today, but that could change again.
And what is the actual impact? Don't get me wrong, I don't think it is not bad, but then again abusing information could be done already by the said providers (ex: hike insurance rates based on previous conditions, taking advantage of vulnerable people).
Society works by agreements and laws, not by (absolute) secrecy.
There are of course instances like electrical grid stopping for days, people being killed remotely in hospitals, nuclear plants exploding, that would have a different impact and we might get there, just that it did not happen yet.
The actual impact is that your private medical data is in the hands of thieves, which most people don’t want.
It’s similar to how most people are distressed after a break-in, because they considered their home to be a private space, even though the lock manufacturer never claimed 100% security (or the thieves simply bypassed the locks by smashing a window).
Agreements and laws don’t solve that problem, because thieves already aren’t stopped by those.
>the question here is why everybody traded security for convenience
I don't think security was traded away for convenience. Everything started with convenience, and security has been trying to gain ground ever since.
>happen for people to start taking security seriously
Laws with enforced and non-trivial consequences are the only thing that will force people to take security seriously. And even then, most probably still won't.
I think the opposite is mostly true. Linux packaging folks are carefully sculpting their toys, while everyone else is mostly using upstream packages and docker containers to work around the beautiful systems. For half the software I care about on my Debian system, I have a version installed either directly from the web (curl | bash style), from the developer's own APT repo, or most likely from a separate package manager (be it MELPA, pypi, Go cache, Maven, etc).
I use nix package manager on 3 of the systems I'm working daily (one of them HPC cluster) and none of them run NixOS. It's possible to carefully sculpt one's tools and use latest and greatest.
That sounds like an incredibly annoying way to manage software. I don't have a single thing installed that way.
distros get unbelievable amounts of hate for not immediately integrating upstream changes, there's really no winning
Rightly so. The idea that all software should be packaged for all distros, and that you shouldn't want to use the latest version of software is clearly ludicrous. It only seems vaguely reasonable because it's what's always been done.
If Linux had evolved a more sensible system and someone came along and suggested "no actually I think each distro should have its own package format and they should all be responsible for packaging all software in the world, and they should use old versions too for stability" they would rightly be laughed out of the room.
> The idea that […] you shouldn't want to use the latest version of software is clearly ludicrous.
To get to that world, we developers would have to give up making breaking changes.
We can’t have any “your python 2 code doesn’t work on python 3” nonsense.
Should we stop making breaking changes? Maybe. Will we? No.
You can have both python 2 and python 3 installed. Apps should get the dependencies they request. Distros swapping out dependencies out from under them has caused numerous issues for developers.
We can’t have any “your python 2 code doesn’t work on python 3” nonsense
This only happens because distros insit on shipping python and then everyone insisted on using that python to run their software.
In an alternate world everybody would just ship their own python with their own app and not have that problem. That's how windows basically solves this
Which sounds good until you run out of disk space.
Of course I grew up when hard drives were not affordable by normal people - my parents had to save for months to get my a floppy drive.
What is a distribution but a collection of software packaged in a particular way?
a miserable pile of software packages </sotn>
nixpkgs packages pretty much everything I need. It’s a very large package set and very fresh. It’s mostly about culture and tooling. I tried to contribute to Debian once and gave up after months. I was contributing to nixpkgs days after I started using Nix.
Having every package as part of a distribution is immensely useful. You can declaratively define your whole system with all software. I can roll out a desktop, development VM or server within 5 minutes and it’s fully configured.
> nixpkgs packages pretty much everything I need.
Yeah, because they allow anyone to contribute with little oversight. As Lance Vick wrote[1], "Nixpkgs is the NPM of Linux." And Solène Rapenne wrote[2], "It is quite easy to get nixpkgs commit access, a supply chain attack would be easy to achieve in my opinion: there are so many commits done that it is impossible for a trustable group to review everything, and there are too many contributors to be sure they are all trustable."
[1] https://news.ycombinator.com/item?id=34105784
[2] https://web.archive.org/web/20240429013622/https://dataswamp...
Pretty much every PR does get reviewed before merging (especially of non-committers) and compromises would be easy to detect in the typical version bump PR. At least it's all out in the open in a big monorepo. E.g. in Debian, maintainers could push binaries directly to the archive a few years ago (I think this is still true for non-main) and IIRC even for source packages people upload them with little oversight and they are not all in open version control.
Of course, Debian developers/maintainers are vetted more. But an intentional compromise in nixpkgs would be much more visible than in Debian, NPM, PyPI or crates.io.
The problem with NixOS (Nix?) is that it is currently embroiled in a culture war. Just to give an example, someone made an April's Fools joke about Elon Musk sponsoring NixOS and they got their post muted and (almost?) caught a suspension.
There is currently a gazillion forks, some being forks of forks because they weren't considered culturally pure enough for the culturally purged fork.
Hopefully Determinate Systems or Ekela can get some real maturity and corporate funding into the system and pull the whole thing out of the quagmire.
Yes Nix is forked. There were some attempts to fork nixpkgs, but they failed because it's far too much work to maintain something like nixpkgs. Nix being forked is less of an issue, because at the very least everyone needs to be compatible with nixpkgs. I think it's ok to have multiple implementations, they lead to exploration of new directions (e.g. one of the projects attempts to reimplement Nix in Rust). Well and of course, Guix started as a fork.
I agree that the infighting is not nice. But to be honest, when you just use NixOS and submit PRs, you do not really notice them. It's not like people are fighting them in the actual PRs to nixpkgs.
Some of the more drawn-out arguments are in or about the RFCs repo, though, so it's not just chat platforms or forum where the fights break out. To your point the RFC repo is also something that not every would-be contributor would touch or need to.
I agree, but it makes me feel supremely uneasy to use something where the main team does Soviet-style purges where they mute/exile/ban community members not because they were being offensive but because they committed wrongthink. Even worse is that they're often prolific devs.
Ironically enough the closest comparison I could make is driving a Tesla. Even if the product is great, you're supporting an organisation that is the opposite.
I think the Nix team will continue to slowly chase away competent people until the rot makes the whole thing wither, at which point everyone switches their upstream over to Determinate Systems' open core. Although I'm hoping DS will ultimately go the RHEL-Fedora route.
>Just to give an example, someone made an April's Fools joke about Elon Musk sponsoring NixOS and they got their post muted and (almost?) caught a suspension.
This can't be real. Are you sure it was something innocuous and not something bigoted?
Real as you or me, though I disagree with almost everything else the quoted poster expressed other than that forks are happening and the dividing lines include cultural norms rather than strictly technical disagreements.
https://discourse.nixos.org/t/breaking-doge-to-recommend-nix...
Same author quoted the original text on their Reddit thread and was mostly uncriticized there:
https://old.reddit.com/r/NixOS/comments/1joshae/breaking_dog...
I personally found it incredibly distasteful and also fairly representative of the quality of conversation you often get from some of the Nix community. I'm not offensive, you're just thin skinned, can't you take a joke, etc. is extremely common. You'll have to judge for yourself whether it's bigoted or dog whistle or neither.
I'm a former casual community member with modest open source work in that ecosystem (projects and handful of nixpkgs PRs) before I left permanently last spring. I no longer endorse its use for any purpose and I seek to replace every piece of it that I was using.
I still hear about the ways they continue to royally fuck up their governance and make negligible progress on detoxifying the culture. It took them until last fucking week to ban Anduril from making hiring posts on the same official forum.
Am I supposed to seriously believe people found that so offensive that the poster was banned for that? Were those crying, feigning offence, being satirical themselves? Truly Poe's Law at work here.
>I personally found it incredibly distasteful
How? Why? It's clearly satire, written in the style of The Onion.
>and also fairly representative of the quality of conversation you often get from some of the Nix community
Good satire? At least some members aren't brainrotted out to the point of no return.
> I'm not offensive, you're just thin skinned, can't you take a joke, etc.
It's clearly not offensive and if that upset you, you clearly have thin skin and can't take the blandest of jokes. Histrionic.
>I no longer endorse its use for any purpose and I seek to replace every piece of it that I was using.
I will also tell others not to use Nix after reading that. The community is indeed too toxic.
>I still hear about the ways they continue to royally fuck up their governance and make negligible progress on detoxifying the culture.
They won't detoxify until they remove all the weak neurotic leftist activists with weird fetishes for "underrepresented minorities."
>It took them until last fucking week to ban Anduril from making hiring posts on the same official forum.
I'm not sure who that is or why it's an issue, but I assume it's something only leftists cry about.
There are too many distributions / formats. However, distribution packages are much better than snap/flatpak/docker for most uses; the only hard part is that there are so many that no program can put "and is integrated into the package manager" in their release steps. You can ship a docker container in your program release - it is often done but rarely what should be done.
For some cases maybe that makes sense, but in a very large percentage it does not. As example, what if I want to build and use/deploy a Python app that needs the latest NumPy, and the system package manager doesn’t have it. It would be hard to justify for me to figure out and build a distro specific package for this rather than just using the Python package on PyPI.
The point is the distro should provide the numpy you need.
And what happens when they can't do that because you need the latest major version with specific features?
They do not. Even the vast majority of Arch users thinks the policy of "integrate breakage anyway and post the warning on the web changelog" is pants-on-head insane, especially compared to something like SUSE Tumbleweed (also a rolling distro) where things get tested and will stay staged if broken.
> They do not.
I present to you sibling comment posted slightly before yours: https://news.ycombinator.com/item?id=43655093
They do.
No, they do not. No sane person would say "I would rather integrate a package with a breaking change without explicit build warning / deprecation".
You're the one who has added "breaking." The original claim was just that integrators get hate for not keeping up with upstreams, which is true. Many changes aren't breaking; and users don't know which ones are or aren't anyway.
No, your claim was that integrators get hate for not immediately integrating upstream changes. Which is patently false, because users would rather have the packages tested for breaking changes and either patched or have a deprecation warning, rather than having the changes blindly integrated and having their legs pulled out from under them by a breaking change. No one is hating on distro packagers for taking a slight delay to verify and test things.
I brought up Arch because they get a lot of hate for exactly doing that and consequently pulling people's legs out from under them.
The existence of some users desiring stability does not in any way contradict the claim that maintainers get hate for not immediately integrating upstream changes. In particular, distros have more than 2 users -- they can do different things.
Distros get real hate for being so out of date that upstream gets a stream of bug reports on old and solved issues.
A prime example of this is what the Bottles dev team has done.
It isn't an easy problem to solve.
That's, in my experience, mostly Debian Stable.
The future is not evenly distributed.
Shipping what randos push works great for iOS and Android too.
System perl is actually good. It's too bad the Linux vendors don't bother with system versions of newer languages.
System rustc is also good on Arch Linux. I think system rustc-web is also fine on Debian.
I recently re-installed and instead of installing rustup I thought I'd give the Arch rust package a shot. Supposedly it's not the "correct" way to do it but so far it's working great, none of my projects require nightly. Updates have come in maybe a day after upstream. One less thing to think about, I like it.
Sure, there's never malware on the stores...
> Shipping what randos push works great for iOS and Android too.
App store software is excruciatingly vetted, though. Apple and Google spend far, far, FAR more on validating the software they ship to customers than Fedora or Canonical, and it's not remotely close.
It only looks like "randos" because the armies of auditors and datacenters of validation software are hidden behind the paywall.
It's really not. At least on Android they have some automated vetting but that's about it. Any human vetting is mostly for business reasons.
Also Windows and Mac have existed for decades and there's zero vetting there. Yeah malware exists but its easy to avoid and easily worth the benefit of actually being able to get up-to-date software from anywhere.
The vetting on Windows is that basically any software that isn't signed by an EV certificate will show a scary SmartScreen warning on users' computers. Even if your users aren't deterred by the warning, you also have a significant chance of your executable getting mistakenly flagged by Windows Defender, and then you have to file a request for Microsoft to whitelist you.
The vetting on Mac is that any unsigned software will show a scary warning and make your users have to dig into the security options in Settings to get the software to open.
This isn't really proactive, but it means that if you ship malware, Microsoft/Apple can revoke your certificate.
If you're interested in something similar to this distribution model on Linux, I would check out Flatpak. It's similar to how distribution works on Windows/Mac with the added benefit that updates are handled centrally (so you don't need to write auto-update functionality into each program) and that all programs are manually vetted both before they go up on Flathub and when they change any permissions. It also doesn't cost any money to list software, unlike the "no scary warnings" distribution options for both Windows and Mac.
> Also Windows and Mac have existed for decades and there's zero vetting there.
Isn't that only for applications? All the system software are provided and vetted by the OS developer.
Yeah but distro package repositories mostly consist of applications and non-system libraries.
Most users only have a handful of applications installed. They may not be the same ones, but flatpak is easy to set up if you want the latest version. And you're not tied to the system package manager if you want the latest for CLI software (nix, brew, toolbox, ...).
The nice thing about Debian is that you can have 2 full years of routine maintenance while getting ready for the next big update. The main issue is upstream developers having bug fixes and feature updates in the same patch.
>App store software is excruciatingly vetted, though. Apple and Google spend far, far, FAR more on validating the software they ship to customers than Fedora or Canonical, and it's not remotely close.
Hahah! ...they don't. They really don't, man. They do have procedures in place that makes them look like they do, though; I'll give you that.
I often see initiatives and articles like this but no mention of Nix. Is it just not well known enough for comparison? Because to me that’s the standard.
I use Nix extensively, but the Nix daemon doesn't do much of use that can't be achieved by building your code from a fixed OCI container with internet turned off. The latter is certainly more standard across the industry, and sadly a lot easier too. Nix is not a revolutionary containerisation technology, nor honestly a very good one.
The value in Nix comes from the package set, nixpkgs. What is revolutionary is how nixpgks builds a Linux distribution declaratively, and reproducibly, from source through purely functional expressions. However, nixpkgs is almost an entire universe unto itself, and it is generally incompatible with the way any other distribution would handle things, so it would be no use to Fedora, Debian, and others
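i.e. roughly this, which most CI systems can already do (image digest and build script are placeholders):
podman run --rm --network=none \
  -v "$PWD:/src" -w /src \
  docker.io/library/debian@sha256:<digest> \
  ./build.sh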
At work we went back to a Docker build to make reproducible images. The primary reason is poor cross-compilation support in Nix on Arm when developers needed to compile for an amd64 service and derive image checksums that are put into tooling that is run locally for service version verification and reproducibility.
With Docker it turned out relatively straightforward. With Nix even when it runs in Linux Arm VM we tried but just gave up.
Funny, I had that experience with Docker - mostly due to c++ dependencies - that were fine in Nix
In the near term it makes more sense to position nix as a common interface between app developers and distro maintainers and not as a direct-to-user way to cut their distro maintainers out of the loop entirely (although it is quite useful for that).
Ideally, a distro maintainer would come across a project packaged with nix and think:
> Oh good, the app dev has taken extra steps to make life easy for me.
As-is, I don't think that's the case. You can add a flake output to your project which builds an .rpm or a .deb file, but it's not commonly done.
I'm guessing that most of the time, distro maintainers would instead hook directly into a language specific build-tool like cmake or cargo and ignore the nix stuff. They benefit from nix only indirectly in cases where it has prevented the app dev from doing crazy things in their build (or at least has made that crazyness explicit, versus some kind of works-on-my-machine accident or some kind of nothing-to-see here skulduggery).
If we want to nixify the world I think we should focus less on talking people out of using package managers which they like and more on making the underlying packages more uniform.
Oh, I assure you, it's hard to escape knowing about Nix if you write about this sort of thing. Someone will be along almost immediately to inform you about it.
Nix wasn't mentioned (I'm the author) because it really isn't relevant here -- the comparable distributions, when discussing what Fedora is doing, are Debian and other distributions that use similar packaging schemes and such.
I agree that NixOS/nixpkgs would not be a good a basis for comparison. Do you have an opinion about the use of nix by app devs to specify their builds, i.e. as a make alternative, not as a Fedora alternative?
Quoting the article:
> Irreproducible bits in packages are quite often "caused by an error or sloppiness in the code". For example, dependence on hardware architecture in architecture-independent (noarch) packages is "almost always unwanted and/or a bug", and reproducibility tests can uncover those bugs.
This is the sort of thing that nix is good at guarding against, and it's convenient that it doesn't require users to engage with the underlying toolchain if they're unfamiliar with it.
For instance I can use the command below to build helix at a certain commit without even knowing that it's a rust package. Although it doesn't guarantee all aspects of repeatability, it will fail if the build depends on any bits for which a hash is not known ahead of time, which gets you half way there I think.
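(Presumably something along these lines, with the commit hash left as a placeholder and flakes enabled:)
nix build github:helix-editor/helix/<commit>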
Used in this way, can nix help Fedora's reproducibility efforts? Or does it appear to Fedora as a superfluous layer to be stripped away so that they can plug into cargo more directly?
A lot of Nix-based package builds will burn Nix store paths directly into the binary. If you are lucky it's only the rpath and you can strip it, but in some cases other Nix store paths end up in the binary. Seems pretty useless to Fedora.
Besides many of the difficult issues are not solved by Nix either. (E.g. build non-determinism by ordering differences due to the use of a hashmap somewhere in the build.)
> A lot of Nix-based package builds will burn Nix store paths directly into the binary
I didn't know that, sounds like a bug. Maybe something can be done to make it easier to know that this is the case for your build.
I'd still think that by refusing to build things with unspecified inputs, nix prunes a whole category of problems away which then don't bite the distro maintainers, but maybe that's wishful thinking.
I'll continue to use it because it's nice to come to a project I haven't worked on in a few years and not have to think about whether it's going to now work on this machine or figure out what the underlying language-specific commands are--but if there were ways to tweak things so that others have this feeling also, I'd like to know them.
> I didn't know that, sounds like a bug. Maybe something can be done to make it easier to know that this is the case for your build.
It's a feature. E.g. if a binary needs to load data files, it needs to know the full path, or you are back to an FHS filesystem layout (which has a lot of issues that Nix tries to solve).
> I'd still think that by refusing to build things with unspecified inputs,
I haven't followed development of traditional Linux distributions, but I am pretty sure that they also build in minimal sandboxes that only contain specified dependencies. See e.g. Mock: https://github.com/rpm-software-management/mock
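For example, a Fedora-style rebuild in a clean chroot (the chroot config name and srpm are placeholders):
mock -r fedora-rawhide-x86_64 --rebuild mypackage-1.0-1.src.rpm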
Oh I'm sure they do build in sandboxes... my point is that those builds are more likely to succeed on the first try if the app dev also built in a sandbox. If you let nix scold you for tempting fate and relying on the output of an unchecked curl, that's one fewer headache for whoever later tries to sandbox your stuff.
Contrary to popular opinion, Nix builds aren't reproducible: https://luj.fr/blog/is-nixos-truly-reproducible.html
That's such a weird characterization of this article, which (in contrast to other writing on this subject) clearly concludes (a) that Nix achieves a very high degree of reproducibility and is continuously improving in this respect, and (b) Nix is moreover reproducible in a way that most other distros (even distros that do well in some measures of bitwise reproducibility) are not (namely, time traveling— being able to reproduce builds in different environments, even months or years later, because the build environment itself is more reproducible).
The article you linked is very clear that, both qualitatively and quantitatively, NixOS has achieved high degrees of reproducibility, and even explicitly rejects the possibility of assessing absolute reproducibility.
NixOS may not be the absolute leader here (that's probably stagex, or GuixSD if you limit yourself to more practical distros with large package collections), but it is indeed very good.
Did you mean to link to a different article?
> NixOS may not be the absolute leader here (that's probably stagex, or GuixSD if you limit yourself to more practical distros with large package collections), but it is indeed very good.
Could you comment on how stagex is? It looks like it might indeed be best in class, but I've hardly heard it mentioned.
The Bootstrappable Builds folks created a way to go from only an MBR of (commented) machine code (plus a ton of source) all the way up to a Linux distro. The stagex folks built on top of that towards OCI containers.
https://stagex.tools/ https://bootstrappable.org/ https://lwn.net/Articles/983340/
What does high degree mean? Either you achieve bit for bit reproducibility of your builds or not.
If 99% of the packages in your repos are bit-for-bit reproducible, you can consider your distro to have a high degree of reproducibility.
And with even a little bit of imagination, it's easy to think of other possible measures of degrees of reproducibility.
It's really not difficult to think of meaningful ways that reproducibility of different distros might be compared, even quantitatively.
Sure, but in terms of absolute number of packages that are truly reproducible, they outnumber Debian because Debian only targets reproducibility for a smaller fraction of total packages, and even there they're not 100%. I haven't been able to find reliable numbers for Fedora on how many packages they have, and in particular how many this 99% is targeting.
By any conceivable metric Nix really is ahead of the pack.
Disclaimer: I have no affiliation with Nix, Fedora, Debian etc. I just recognize that Nix has done a lot of hard work in this space & Fedora + Debian jumping onto this is in no small part thanks to the path shown by Nix.
They are not.
Arch hovers around 87%-90% depending on regressions. https://reproducible.archlinux.org/
Debian reproduces 91%-95% of their packages (architecture dependent) https://reproduce.debian.net/
> Disclaimer: I have no affiliation with Nix, Fedora, Debian etc. I just recognize that Nix has done a lot of hard work in this space & Fedora + Debian jumping onto this is in no small part thanks to the path shown by Nix
This is completely the wrong way around.
Debian spearheaded the Reproducible Builds effort in 2016, with contributions from SUSE, Fedora and Arch. NixOS got onto this as well but had seen less progress until the past 4-5 years.
The NixOS effort owes the Debian project all its thanks.
From your own link:
> Arch Linux is 87.7% reproducible with 1794 bad 0 unknown and 12762 good packages.
That's < 15k packages. Nix by comparison has ~100k total packages they are trying to make reproducible and has about 85% of them reproducible. Same goes for Debian - ~37k packages tracked for reproducible builds. One way to lie with percentages is when the absolute numbers are so disparate.
> This is completely the wrong way around. Debian spearheaded the Reproducible Builds efforts in 2016 with contributions from SUSE, Fedora and Arch. NixOS got onto this as well but has seen less progress until the past 4-5 years. The NixOS efforts owes the Debian project all their thanks.
Debian organized the broader effort across Linux distros. However the Nix project was designed from the ground up around reproducibility. It also pioneered architectural approaches that other systems have tried to emulate since. I think you're grossly misunderstanding the role Nix played in this effort.
> That's < 15k packages. Nix by comparison has ~100k total packages they are trying to make reproducible and has about 85% of them reproducible. Same goes for Debian - ~37k packages tracked for reproducible builds. One way to lie with percentages is when the absolute numbers are so disparate.
That's not a lie. That is the package target. The `nixpkgs` repository, in the same vein, packages a huge number of source archives and repackages entire ecosystems into its own repository. This greatly inflates the number of packages. You can't look at the flat numbers.
> However the Nix project was designed from the ground up around reproducibility.
It wasn't.
> It also pioneered architectural approaches that other systems have tried to emulate since.
This has had no bearing, and you are greatly overestimating the technical details of Nix here. Its fundamentals were invented in 2002, and things have progressed since then. `rpath` hacking really is not magic.
> I think you're grossly misunderstanding the role Nix played in this effort.
I've been contributing to the Reproducible Builds effort since 2018.
I think people are generally confused by the different meanings of reproducibility in this case. The reproducibility that Nix initially aimed at is: multiple evaluations of the same derivations will lead to the same normalized store .drv. For a long time they were not completely reproducible, because evaluation could depend on environment variables, etc. But flakes have (completely?) closed this hole. So, reproducibility in Nix means that evaluating the same package set will lead to the same set of build recipes (.drvs).
However, this doesn't say much about build artifact reproducibility. A package set could always evaluate to the same drvs, but if all the source packages choose what to build based on random() > 0.5, then there is no reproducibility of build artifacts at all. This type of reproducibility is spearheaded by Debian and Arch more than Nix.
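A quick way to see the two notions side by side, as a sketch (assumes a channel-based Nix install; `hello` is just a convenient nixpkgs attribute):

```sh
# Evaluation-level reproducibility: instantiating the same expression twice
# should print the same /nix/store/...-hello-<version>.drv path.
nix-instantiate '<nixpkgs>' -A hello
nix-instantiate '<nixpkgs>' -A hello

# Artifact-level reproducibility: rebuild an already-built derivation and
# check that the freshly built output is bit-identical to the existing one.
nix-build '<nixpkgs>' -A hello --check
```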
Different notions of reproducible. This project cares specifically about bit-for-bit identical builds (e.g. no time stamps, parallel compile artifacts etc). Nix is more about being declarative and "repeatable" or whatever a good name for that would be.
Both notions are useful for different purposes and nix is not particularly good at the first one.
https://reproducible-builds.org/citests/
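In miniature, the bit-for-bit test those CIs perform looks something like this (a sketch; `./build.sh` and the artifact names are placeholders for whatever actually produces your package):

```sh
# Build the same source twice, ideally with varied environment (time, path,
# locale), then compare the resulting artifacts byte for byte.
./build.sh --output out1
./build.sh --output out2
sha256sum out1/package.rpm out2/package.rpm   # equal hashes = reproducible
diffoscope out1/package.rpm out2/package.rpm  # otherwise, show exactly what differs
```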
It's very, very complicated. It's so far past the maximum-effort line of most Linux users as to be in its own class of tools. Reproducibility in the imperative package space is worth a lot. Lots of other tools built on RPM/DEB packages offer advantages similar to Nix's -- Ansible, for one. This is more of a "rising tide raises all boats" situation.
It's an article about Fedora, specifically.
Yes, I know that. But when talking about reproducible packages, we can and should learn from existing techniques. Is Fedora's system capable of this goal? Is it built the right way? Should it adopt an alternate package manager to achieve this with less headache?
"What about Nix tho?" is the "Rewrite it in Rust" of reproducible builds.
No other current distro, whether it be Debian, Fedora, Arch, or openSUSE, seems to want to switch up its design to match Nix's approach.
They have too many people familiar with the current approaches.
That's not even really true. Hacker News readers seem to fixate heavily on the most popular desktop versions of Linux distros, but that is far from all they make. SUSE's SLE Micro with Edge Image Builder is a declarative build system for creating your own installer images. If you don't want SLE Micro, they even give you the Elemental Toolkit, which can be used to build your own custom distro in a declarative manner. I don't know much about Debian's and Red Hat's approaches to this, but I know they have similar tooling. The "problem," if you want to call it that, is that developers, if I may be uncharitable for a brief moment, are amazingly solipsistic. The major Linux vendors are targeting enterprise edge computing use cases, whereas developers only ever think about the endpoint PCs they themselves use for development. Plenty of distro tooling will give you declaratively built, transactionally updated, immutable server images, but if they don't give you the same for your personal laptop, those efforts are invisible to the modal Hacker News commenter.
For what it's worth, there's also Guix, which is literally a clone of Nix but part of the GNU project, so it only uses free software and opts for Guile instead of a custom DSL for configuration. It wasn't a pre-existing distro that changed, of course.
Because Nix is a huge pain to ramp up on and to use for anyone who is not an enthusiast about the state of their computer.
What will happen is that concepts from Nix will slowly get absorbed into other, more user-friendly tooling while Nix circles the complexity drain.
Nix is to Linux users what Linux is to normies.
This goal feels like a marketing OKR to me. A proper technical goal would be "all packages, except the ones that have a valid reason not to be reproducible, such as signatures".
If you'd bothered to read:
> This definition excludes signatures and some metadata and focuses solely on the payload of packaged files in a given RPM:
The same LWN article says:
> The contents, however, should still be "bit-by-bit" identical, even though that phrase does not turn up in Fedora's definition.
So, according to the literal interpretation of the article, signatures inside the payload (e.g., files that are signed using an ephemeral key during the build, NOT the overall RPM signature) are still a self-contradictory area and IMHO constitute a possibly-valid reason for not reaching 100% payload reproducibility.
At Google SRE we often had very technical OKRs that were formulated with some 'number of 9s'. Like 99.9999% uptime or something like that. So getting two 9s of reproducibility seems like a reasonable first goal. I hope they will be adding more nines later.
As someone who dabbles a bit in the RHEL world, IIRC all packages in Fedora are signed. In additional the DNF/Yum meta-data is also signed.
IIRC Debian packages themselves are not signed, but the apt metadata is signed.
I learned this years ago from an Ansible Molecule test-env setup script for use in containers and VMs (`which` isn't necessarily installed in containers, for example).
dnf reads .repo files from /etc/yum.repos.d/ [1], which carry various gpg options; /etc/yum.repos.d/fedora-updates.repo is one example. From the dnf conf docs [1], there are actually even more per-repo gpg options, and [2] lists a gpgcakey parameter for the ansible.builtin.yum_repository module.
1. https://dnf.readthedocs.io/en/latest/conf_ref.html#repo-opti...
2. https://docs.ansible.com/ansible/latest/collections/ansible/...
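As a quick illustration (a sketch, not tied to any particular Fedora release): on a stock system, fedora-updates.repo typically sets gpgcheck=1 and a gpgkey pointing at a key under /etc/pki/rpm-gpg/, and you can list what each configured repo actually sets:

```sh
# Illustrative: show the gpg-related options each repo file sets
# (option names per the dnf conf reference; output depends on your system).
grep -H -E '^(gpgcheck|repo_gpgcheck|gpgkey)=' /etc/yum.repos.d/*.repo
```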
For Debian, Ubuntu, Raspberry Pi OS and other dpkg .deb and apt distros:
signing-apt-repo-faq: https://github.com/crystall1nedev/signing-apt-repo-faq
From "New requirements for APT repository signing in 24.04" (2024) https://discourse.ubuntu.com/t/new-requirements-for-apt-repo... :
> In Ubuntu 24.04, APT will require repositories to be signed using one of the following public key algorithms: [ RSA with at least 2048-bit keys, Ed25519, Ed448 ]
> This has been made possible thanks to recent work in GnuPG 2.4 82 by Werner Koch to allow us to specify a “public key algorithm assertion” in APT when calling the gpgv tool for verifying repositories.
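For what it's worth, here's a sketch of what per-repo key pinning looks like with apt's deb822 sources format; the URL, suite, and keyring filename below are placeholders rather than a real repository:

```sh
# Install the repo's public key where apt can find it, then reference it
# explicitly so only this key is trusted for this source.
sudo install -m 0644 example-archive-keyring.gpg /etc/apt/keyrings/
cat <<'EOF' | sudo tee /etc/apt/sources.list.d/example.sources
Types: deb
URIs: https://apt.example.org/debian
Suites: stable
Components: main
Signed-By: /etc/apt/keyrings/example-archive-keyring.gpg
EOF
sudo apt update   # apt now verifies this repo's metadata against that key
```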
This is a waste of time compared to investing in sandboxing, which would actually protect users, as opposed to stopping theoretical attacks. Fedora's sandbox capabilities for apps are so far behind other operating systems like Android that sandboxing is a much more important area to address.
Defaulting to Android-style nanny sandboxing ("you can't grant access to your Downloads folder because we say so" etc.) is unlikely to go over well with the average Linux distro userbase.
Also, maximally opt-in sandboxes for graphical applications have been possible for a while. Just use Podman and only mount your Wayland socket + any working files.
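Something like the following, as a very rough sketch (the image name and paths are placeholders, and as noted further down, real applications often also need portals, audio, GPU access, and a filtered dbus):

```sh
# Share only the Wayland socket and one working directory with the container.
podman run --rm -it \
  --security-opt label=disable \
  -e XDG_RUNTIME_DIR=/tmp \
  -e WAYLAND_DISPLAY=wayland-0 \
  -v "$XDG_RUNTIME_DIR/$WAYLAND_DISPLAY:/tmp/wayland-0" \
  -v "$HOME/projects/demo:/work" \
  example-gui-app-image
```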
>Defaulting to Android-style nanny sandboxing ("you can't grant access to your Downloads folder because we say so" etc.) is unlikely to go over well with the average Linux distro userbase.
If you market it that way. Plenty of Linux users say they care about security, don't want malware, etc. This is a step towards those desires. Users have been conditioned to use tools badly designed for security for decades, so there will be some growing pains, but it will get worse the longer people wait.
>Just use Podman and only mount your Wayland socket + any working files.
This won't work for the average user. Security needs to be accessible.
> Just use Podman and only mount your Wayland socket + any working files.
If only it was that simple…
Why isn't it?
"Any working files" pokes a big hole; really you want portals. You probably also want audio. And perhaps graphics acceleration, depending on the software, and other limited hardware access. And dbus (but sandboxed proxy access). Probably some global config from the host as well (fonts, themes, cursor). And… well, maybe there's more, but the point is it depends on the software. Maybe it is that simple sometimes. Often it's not, for non-trivial applications.
I have yet to see a form of sandboxing for the desktop that is not:
a) effectively useless,
or b) something that makes me want to throw my computer through the window and replace it with a 1990s device (still more useful than your average Android).
I think you have to do both sandboxing and this.
Both are good for security, but prioritization is important. The areas that are weakest in terms of security should get the most attention.
That assumes that there is a fixed amount of effort and resources to be split between the two things—that there is always opportunity cost.
When it comes to community efforts, it’s rarely the case that all things have opportunity cost—people who contribute effort for X would not have necessarily done so for Y.
The Fedora Project is of course not a purely community effort, so I don’t know exactly how that applies here. But just wanted to point out that prioritization and opportunity cost don’t always work like you suggested.
Flatpak, which Fedora Workstation uses by default, is already very similar in capabilities to Android's sandboxing system.
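To make that concrete (a sketch; the app ID is just an example), Flatpak permissions can be inspected and tightened per application:

```sh
# Show the permissions an app ships with, drop its home-directory access for
# this user only, then review the resulting overrides.
flatpak info --show-permissions org.mozilla.firefox
flatpak override --user --nofilesystem=home org.mozilla.firefox
flatpak override --user --show org.mozilla.firefox
```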
If you want security through compartmentalization, you should consider Qubes OS, my daily driver, https://qubes-os.org.
This only secures things between VMs. It sidesteps the problem, and people can still easily run multiple applications in the same qube.
It's impossible to isolate applications inside one VM as securely as with Qubes virtualization. You should not rely on intra-VM hardening if you really care about security. Having said that, Qubes does provide ways to harden the VMs: https://forum.qubes-os.org/t/hardening-qubes-os/4935/3, https://forum.qubes-os.org/t/replacing-passwordless-root-wit....
People may want to have multiple apps work together. It makes more sense to have security within a qube itself than to just declare it a free-for-all.
If the apps work together, they typically belong to the same security domain / trust level. Do you have examples when you still have to isolate them from each other?
Even if things are on the same trust level, that doesn't mean I don't care when a compromise of one affects the second.
So just run them in different VMs?
Apart from that, any hardening in Fedora can be utilized inside a Fedora VM on Qubes. Qubes doesn't force you to use VMs with no isolation inside.
But then the files can't be shared.
Qubes has such functionality.
And you just deduced why sandboxing as it is implemented today is really pointless for the desktop.
I'm using Qubes as my daily driver on my desktop, so no.
What do you get from it? Especially considering, from above, that "programs that work together go in the same context".
What I get from it: https://forum.qubes-os.org/t/whats-your-system-layout-like/8...
Two official examples of how one could benefit from Qubes:
https://www.qubes-os.org/news/2022/10/28/how-to-organize-you...
and
https://blog.invisiblethings.org/2011/03/13/partitioning-my-...
See also: https://forum.qubes-os.org/t/how-to-pitch-qubes-os/4499/15
> Fedora's sandbox capabilities for apps
Do you mean Flatpaks or something else?
Sure, that is one solution, but it still needs a lot of work, both to patch up holes in it and to fix apps to be better designed with regard to security.
Don't let perfect be the enemy of good? Flatpak is not perfect, but it has steadily improved security over the years. When I first used Flatpak, most applications had their sandboxes wide-open (because they were not compatible with sandboxing). Nowadays a lot of Flatpaks have much more limited sandboxes.