Zigler: Zig NIFs in Elixir

229 points by ksec 8 months ago

ihumanable 8 months ago

For anyone mystified about what a NIF is that doesn't want to go read the docs.

The BEAM VM (which is the thing that runs erlang / elixir / gleam / etc) has 3 flavors of functions.

- BIFs - Built-in functions, these are written in C and ship with the VM

- NIFs - Native Implemented Functions, these are written in any language that can speak the NIF ABI that BEAM exposes, and they allow you to provide a function that looks like a built-in function but that you build yourself.

- User - User functions are written in the language that's running on BEAM, so if you write a function in erlang or elixir, that's a user function.

NIFs allow you to drop down into a lower level language and extend the VM. Originally most NIFs were written in C, but now a lot more languages have built out nice facilities for writing NIFs. Rust has Rustler and Zig now has Zigler, although people have been writing zig nifs for a while without zigler and I'm sure people wrote rust nifs without rustler.

hinkley 8 months ago

It’s important to note that while Erlang can recover when user code crashes an Erlang process, a faulty NIF can take down the entire virtual machine.
- kristoff_it 8 months ago
  
  There's a series of things that a NIF must do to be a good citizen. Not crashing is a big one, but also not starving the VM by never yielding (in case the NIF is long-running) is important, plus a few secondary things like using the BEAM allocator so that tooling that monitors memory consumption can see resources consumed by the NIF.
  The creator of Zigler has a talk from ElixirConf 2021 on how he made Zig NIFs behave nicely:
  https://www.youtube.com/watch?v=lDfjdGva3NE
- alberth 8 months ago
  
  Hence why Rustler is of so much interest since it provides more protections against this happening.
  Discord is a big Erlang + Rustler user.
  - johnisgood 8 months ago
    
    What kind of protections as opposed to Zigler?
    
    alberth 8 months ago
    
    Rust comes with memory safety.
    It's one less potential cause that might bring down the entire Erlang VM.
    
    hinkley 8 months ago
    
    SIGSEGV is a pretty common failure mode alright.
    
    el_oni 8 months ago
    
    Rustler catches panics before they crash the VM and raises them on the Elixir side as an exception. So your process might crash but the VM won't.
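
    To make the mechanism concrete, here is a minimal Rust sketch of the general technique: stop a panic at a boundary and turn it into an ordinary error value. This is not Rustler's actual code. `risky_divide` and `call_guarded` are hypothetical names made up for illustration; the only real API used is `std::panic::catch_unwind`.

    ```rust
    use std::panic::{self, AssertUnwindSafe};

    /// Hypothetical stand-in for a NIF body that may panic.
    fn risky_divide(a: i64, b: i64) -> i64 {
        if b == 0 {
            panic!("division by zero");
        }
        a / b
    }

    /// Runs `f` and converts a panic into `Err(message)`, so that nothing
    /// unwinds past this function.
    fn call_guarded<T>(f: impl FnOnce() -> T) -> Result<T, String> {
        panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
            // Panic payloads are usually a `&str` or a `String`.
            if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic".to_string()
            }
        })
    }
    ```

    Calling `call_guarded(|| risky_divide(1, 0))` yields an `Err` with the panic message instead of unwinding into the caller. In a NIF wrapper, that `Err` branch is presumably where an error term for the BEAM would be built.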
    
    kristoff_it 8 months ago
    
    That's a neat way to get corrupted state in your application, especially when users of said language don't realize that their language has exceptions.
    I wrote this recently about Go, but it equally applies to any Rust application that tries to recover from a panic.
    https://kristoff.it/blog/go-exceptions-unconvinced/
    
    ellroy 8 months ago
    
    I don't think this is right. The process will crash, and the Supervision strategy you are using will determine what happens from there. This is what the BEAM is all about. The thing with NIFs is that they can crash the entire VM if they error.
    
    MarcusE1W 8 months ago
    
    Erlang's (and Elixir's) error management approach is actually "Let it crash".
    This is based on the acknowledgment that if you have a large number of longer running processes, at some point something will crash anyway, so you might as well be good at managing crashes ;-)
    https://dev.to/adolfont/the-let-it-crash-error-handling-stra...
    
    foldr 8 months ago
    
    Yes, but that's not Rust's error management strategy. Most Rust code isn't written with recovery from panics in mind, so it can have unintended consequences if you catch panics and then retry.
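
    A small sketch of how that can go wrong, under made-up assumptions: `Ledger` is a hypothetical type whose invariant is that `total` equals the sum of `entries`, and `add` happens to push before it validates. Nothing here comes from a real library; it only shows the kind of half-updated state a caught panic can leave behind.

    ```rust
    use std::panic::{self, AssertUnwindSafe};

    /// Hypothetical ledger; invariant: `total == entries.iter().sum()`.
    struct Ledger {
        entries: Vec<i64>,
        total: i64,
    }

    impl Ledger {
        fn new() -> Self {
            Ledger { entries: Vec::new(), total: 0 }
        }

        /// Pushes first and validates second, so a panic here leaves
        /// `entries` updated while `total` is stale.
        fn add(&mut self, amount: i64) {
            self.entries.push(amount);
            if amount < 0 {
                panic!("negative amounts are not supported");
            }
            self.total += amount;
        }

        fn is_consistent(&self) -> bool {
            self.entries.iter().sum::<i64>() == self.total
        }
    }

    /// Applies `amount`, swallows any panic, and reports whether it succeeded.
    fn add_ignoring_panics(ledger: &mut Ledger, amount: i64) -> bool {
        panic::catch_unwind(AssertUnwindSafe(|| ledger.add(amount))).is_ok()
    }
    ```

    After `add_ignoring_panics(&mut ledger, -1)` returns `false`, the ledger is still usable but `is_consistent()` no longer holds. `AssertUnwindSafe` is exactly the place where the programmer promises the compiler that this cannot matter.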
    
    throwawaymaths 8 months ago
    
    This is terrible, actually. And I've run into it, causing a memory leak.
    
    filmor 8 months ago
    
    How so? The whole point of unwinding is to gracefully clean up on panics, how did it leak for you?
    It's also not like there is much of a choice here. Unwinding across FFI boundaries (e.g. out of the NIF call) is undefined behaviour, so the only other option is aborting on panics.
    
    throwawaymaths 8 months ago
    
    Yes. Abort early in unit tests, core dump so it never makes it to prod
    
    filmor 8 months ago
    
    The panic is converted to an Erlang error exception. You have to explicitly ignore it to make unit tests pass in spite of it.
    I am still interested in the situation you observed.
  - depr 8 months ago
    
    Are they really? Their projects don't look so active
    
    sodapopcan 8 months ago
    
    It’s pretty common in the Elixir ecosystem for these types of libraries to not change very much. Elixir itself doesn’t change too much so these libraries stay solid without needing frequent updates. It doesn’t mean people aren’t using them. Some libraries even put disclaimers that they are actively maintained even if they haven’t seen an update in a long time. It’s something that takes some getting used to for some people (including myself at one point).
    
    ellroy 8 months ago
    
    I will second this. I've been using multiple libraries in our production Elixir app that haven't been updated in the last five years. Elixir itself was declared as "stable" feature-wise years ago. It may be argued that the type system being introduced is not in-keeping with that, but not sure. Jose is a very cautious and diligent "benevolent dictator" and you get a lot of backward compatibility guarantees. Erlang is the same. Compared to what some people might be used to with churn in Node/React etc it is apples and oranges.
    
    sodapopcan 8 months ago
    
    The semantics can certainly be argued, but a type system is sort of on its own tier as far as language features go. Most importantly, there is only going to be one backward incompatible change, which is the spec syntax; otherwise it is just leveraging how we already write Elixir.
    
    ellroy 8 months ago
    
    Yes, I'm not worried about it. I've not been following it as closely as I'd like, but from what I've read the core team seems to be taking a very measured incremental approach with the type system.
    
    sodapopcan 8 months ago
    
    Yep! It's leveraging pattern-matching and guards as well as looking at code itself for inference.
    
    lamuswawir 8 months ago
    
    Erlang is not very backward compatible. At least not at OTP 22.
    
    photonthug 8 months ago
    
    > It’s pretty common in the Elixir ecosystem for these types of libraries to not change very much.
    This is kind of fascinating and seems worthy of more detailed study. I'm sure almost anything looks stable compared to javascript/python ecosystems, but would be interesting to see how other ecosystems with venerable old web-frameworks or solid old compression libraries compare. But on further reflection.. language metrics like "popularity" are also in danger of just quantifying the churn that it takes to keep working stuff working. You can't even measure strictly new projects and hope that helps, because new projects may be a reaction to perceived need to replace other stuff that's annoyingly unstable over periods of 5-10 years, etc.
    Some churn is introduced by trying to keep up with a changing language, standard lib, or other dependencies, but some is just adding features forever or endlessly refactoring aesthetics under different management. Makes me wish for a project badge to indicate a commitment like finished-except-for-bugfixes.
    
    toast0 8 months ago
    
    Erlang (and friends) are built with a goal of stability. Operational stability is part of that, but it also comes into play with code and architectural stability.
    Maybe it's the functionalness, maybe it's the problem domains, but a lot of libraries have clear boundaries and end up as pretty small modules with a clear scope and a small code base that moves towards being obviously correct and good for most, and then doesn't see many changes after that. It might not work for everyone, but most modules don't end up with lots of options to support all the possible use cases.
    The underlying bits of OTP don't tend to churn too much either, so old code usually continues to work, unless you managed to have a dependency on something that had a big change. I recall dealing with some changes in timekeeping and random sources, but otherwise I don't remember having to change my Erlang code for OTP updates.
    It helps that the OTP team is supporting several major versions (annual releases) simultaneously, so if there's a lot of unnecessary change, that makes their job harder as well as everyone else's.
    
    alberth 8 months ago
    
    Elixir itself is "feature complete" as of 2019 (5 years now).
    https://elixir-lang.org/blog/2019/06/24/elixir-v1-9-0-releas...
    It does get the occasional update, but those are mostly about developer tooling rather than language enhancements.
    
    hmmokidk 8 months ago
    
    with the maybe exception of the type system
    
    the_duke 8 months ago
    
    Same in Java.
    You can find libraries that haven't been updated in 10 years and yet are still the best solution.
    
    sbuttgereit 8 months ago
    
    Yep. This is one reason I choose Elixir for a project. For a variety of use cases, long term stability is a big plus.
    
    depr 8 months ago
    
    That is not what I meant. I looked at sorted_set_nif, which doesn't seem to compile on OTP 26 (we're at 27 now), and fastglobal, which has a very old PR with 3 approvals that has not been merged. Elixir libraries may not change _much_, but core libraries like telemetry, Ecto, ExDoc, and Jason still get either minor or patch releases all the time.
    If libraries get regular updates even if they are minor, it indicates they are in use. If they have inactive repositories and low hex.pm download numbers, they may have been abandoned which can mean you have to maintain it yourself in the future, or the people behind the library found it's not such a good idea after all. This doesn't have to be the case, which is why I asked.
    
    sodapopcan 8 months ago
    
    Ah ya, I do see how the optics of this could give off that impression. I don't use this library myself, but the issue is with Elixir 1.15.7 & OTP 26.1.26, which is VERY different from "It doesn't work on OTP 26." Certain patch versions of Elixir and OTP have caused problems before (sorry, I don't have a citation) and this particular issue looks like it's related to dependencies not syncing up on the config change?
    I do think more libraries should give that little "We're still maintained" notice, as people not totally ingrained in this might not realize. To some, the fact that there have been no issues reported now that we're on OTP 27 and Elixir 1.17 would be an indicator that all is well.
    
    filmor 8 months ago
    
    Rustler wasn't properly forward compatible (only with regard to the build process, a compiled library will work just fine on any newer OTP) until 0.29. They are using 0.22, upgrading Rustler will be enough to get rid of this issue for all future OTP versions.
    
    sodapopcan 8 months ago
    
    Thank you for the full story here as I just gave the issue a cursory glance. As someone quite ingrained in Elixir, I see an issue referencing specific patch versions of Elixir and OTP and immediately understand it's very specifically targeting that specific Elixir/OTP combo. But depr brings up a good point that not everyone is immediately going to understand this, especially newcomers to the language, and it’s generally hard not to just read the headline.
    
    andy_ppp 8 months ago
    
    Yeah I was trying to explain this to another developer that packages end up being “finished” eventually and seem to continue to work exceptionally well without updates for a really long time.
    Something about immutability and the structure of Elixir leads to surprisingly few bugs.
    
    ihumanable 8 months ago
    
    I wrote sorted_set_nif, the lack of activity isn’t a lack of care about the library but more just a reflection that the library is done.
    With data structures that have some definite behavior unless someone finds a defect there isn’t going to be much activity.
    
    depr 8 months ago
    
    I looked at that library and I found https://github.com/discord/sorted_set_nif/issues/30 which made me think it doesn't work on recent Erlang versions and is therefore not used anymore.
    
    johnisgood 8 months ago
    
    I wish more people knew this, but I feel like junior developers are just chasing stars on GitHub and wherever else.
  - drawnwren 8 months ago
    
    Is any of this code open source? As an outsider, I'm kind of at a loss for why anyone wants this or what you kids are doing over there and how offended I should be by it.
    
    jhgg 8 months ago
    
    https://github.com/discord/sorted_set_nif
    
    drawnwren 8 months ago
    
    Awesome! Thank you! My sarcasm got downvoted heavily (poe's law), but I was genuinely interested.
    
    alberth 8 months ago
    
    Do you mean Rustler?
    Yes, it's Apache 2.0
    https://github.com/rusterlium/rustler
    
    sodapopcan 8 months ago
    
    TL;DR: Erlang/Elixir/etc are high level languages and the virtual machine they run on, the BEAM, is optimized for speedy IO but is not so great when it comes to intensive CPU tasks. You'll want to write the latter in a good systems language which is what libraries like this provide (you get C bindings out of the box, I believe).
- jlkjfuwnjalfw 8 months ago
  
  Don't be like me and do a 20ms page fault in a NIF
bmitc 8 months ago

It's also important to point out ports, because as you mention, NIFs are a way to integrate external code. But as someone else points out, NIFs can crash the entire BEAM VM. Ports are a safer way to integrate external code because they are just another BEAM process that talks to an external program. If that program crashes, then the port process crashes just like any other BEAM process but it won't crash the entire BEAM VM.
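
For a feel of what the external program's side of a port looks like, here is a hedged Rust sketch. It assumes the port was opened with Erlang's `{packet, 2}` option, where every message in either direction is prefixed with a 2-byte big-endian length. The function names and the "reverse the bytes" reply are made up for illustration.

```rust
use std::io::{self, Read, Write};

/// Reads one frame: a 2-byte big-endian length followed by that many bytes.
/// Returns `Ok(None)` when the input ends before a full header arrives,
/// which is what the program sees once the port is closed.
fn read_packet<R: Read>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 2];
    match input.read_exact(&mut header) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u16::from_be_bytes(header) as usize;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(Some(payload))
}

/// Writes one frame using the same 2-byte length prefix.
fn write_packet<W: Write>(output: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u16::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload does not fit in a 2-byte length prefix",
        ));
    }
    output.write_all(&(payload.len() as u16).to_be_bytes())?;
    output.write_all(payload)?;
    output.flush()
}

/// Hypothetical request loop: replies to each frame with its bytes reversed.
fn serve<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    while let Some(request) = read_packet(&mut input)? {
        let reply: Vec<u8> = request.iter().rev().copied().collect();
        write_packet(&mut output, &reply)?;
    }
    Ok(())
}
```

A real port program would call something like `serve(io::stdin().lock(), io::stdout().lock())` from `main`. If this program crashes, the BEAM side only sees the port close.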
- gioazzi 8 months ago
  
  And then there are port drivers, which are the worst of both worlds! They can crash the BEAM and need much more ceremony than a NIF to set up, but they’re pretty nice to do in Zig[1] as well
  [1]: https://github.com/borgoat/er_zig_driver
  - bmitc 8 months ago
    
    That's true. Haha!
    There's another option and that's setting up an Erlang node in the other language. The Erlang term format is relatively straightforward. But I'm honestly not sure of the benefit of a node versus just using a port.
    
    throwawaymaths 8 months ago
    
    Node:
    - can "easily" send beam terms back and forth
    - if you want it to be os-supervised separately (systemd, kubernetes, e.g.)
    - pain in the ass
    Port:
    - easy
    - usually the only choice if you're not the software author
    - really only communicates via stdio bytestreams
    - risk of zombies if... Iirc the stdout is not closed properly?
    - kind of crazy how it works, Erlang VM spawns a separate process as a middleman
    
    jerf 8 months ago
    
    The Erlang term format is straightforward, but if you want to set up another node in another language you need to correctly implement/emulate process linking, binaries, and some other stuff too, it's not just a matter of writing a socket to accept and emit Erlang terms.
    It's not impossibly large but it's not something one does on a lark either; if there isn't support in your language already it's hard to justify this over any of the many, many message busses supported by both Erlang and other languages that don't have so many requirements.
- abrookewood 8 months ago
  
  Why would anyone use a NIF instead of a Port then?
  - toast0 8 months ago
    
    NIFs are great for things that really feel like a relatively quick function call.
    If you've got some mathematical/crypto function, chances are you don't want that to go through a command queue to an external port, because that's too much overhead. If it's a many round crypto function like bcrypt or something, you do need to be a bit careful doing it as a NIF because of runtime. But you wouldn't want to put a sha256 through an external program and have to pass all that data to it, etc.
    Something that you might actually want queueing for and is likely to have potential for memory unsafety like say transcoding with ffmpeg, would be a good fit as an external Port rather than a NIF or a linked in Port driver.
  - ellroy 8 months ago
    
    Ports are generally great, but you are running multiple apps and communicating between them using STDIN/STDOUT etc. There are certain corner cases where they might not be suitable. I had been using an OPCUA library where the logging had to be turned off because otherwise it was sending the logs back to our Elixir app and we were expecting Elixir terms. Also the shutdown of the remote end of a port can stop the data getting back to Elixir. There are ways around all of this but it's slightly annoying. In general though, ports work 80% of the time and are really convenient.
  - Cyph0n 8 months ago
    
    IPC/shared memory overhead?
    
    polvalente 8 months ago
    
    Yeap, this is a big one. In Nx we have some facilities for doing zero-copy stuff that only really work if you have, say, Evision and EXLA running on the same OS process.
    We do have IPC handles that could enable this over, say, ports, but then there's a whole other discussion on pointers vs ipc handles
cooljacob204 8 months ago

Do nifs have the equal process time stuff that regular elixir processes have? Where the BEAM will move the scheduler into another process if it's taking too long?
Forgive me if I'm mixing up my terminology it's been a bit since I have poked at Elixir.
- throwawaymaths 8 months ago
  
  You can write nifs that way but it seems like a pain in the ass
  https://www.erlang.org/doc/apps/erts/erl_nif#enif_schedule_n...
  After all, many of the BIFs have been replaced internally by NIFs
  And there's this, which would scare me:
  https://erlang.org/documentation/doc-15.0-rc3/erts-15.0/doc/...
- b3orn 8 months ago
  
  BEAM can't preempt native code, that's why NIFs should either be fast/low-latency to not excessively block the scheduler or be put in what's called a dirty scheduler which just means to run it in a separate thread.
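
  For the "fast/low-latency" route, the usual shape is to do a bounded slice of work per call and carry the state forward. Below is a plain Rust sketch of that idea with no NIF API involved; `Step`, `sum_slice`, `sum_all` and the `budget` parameter are all hypothetical names for illustration.

  ```rust
  /// Outcome of one bounded slice of work.
  #[derive(Debug, PartialEq)]
  enum Step {
      /// More input remains; resume at index `next` with the running `acc`.
      Continue { next: usize, acc: u64 },
      /// All input has been consumed.
      Done(u64),
  }

  /// Sums at most `budget` elements (at least one) starting at `start`.
  fn sum_slice(data: &[u32], start: usize, acc: u64, budget: usize) -> Step {
      let start = start.min(data.len());
      let end = data.len().min(start.saturating_add(budget.max(1)));
      let acc = data[start..end].iter().fold(acc, |a, &x| a + u64::from(x));
      if end < data.len() {
          Step::Continue { next: end, acc }
      } else {
          Step::Done(acc)
      }
  }

  /// Drives `sum_slice` to completion. In a NIF, each `Continue` is the
  /// point where control would be handed back to the scheduler.
  fn sum_all(data: &[u32], budget: usize) -> u64 {
      let (mut next, mut acc) = (0usize, 0u64);
      loop {
          match sum_slice(data, next, acc, budget) {
              Step::Continue { next: n, acc: a } => {
                  next = n;
                  acc = a;
              }
              Step::Done(total) => return total,
          }
      }
  }
  ```

  The loop in `sum_all` only stands in for the scheduler here. The point is that each call is short and everything needed to resume lives in the returned value.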
- rubyn00bie 8 months ago
  
  Nope, at least not by default or like one would expect from pure Erlang (when it comes to preempting). Been a while since I dug into this admittedly but I write Elixir daily for work (and have for about ten years now). They don’t do the record keeping necessary for the BEAM to interrupt. You need to make sure the “dirty scheduler” is enabled or you can end up blocking other processes on the same scheduler.
  Here’s a link I found talking about using the dirty scheduler with Rust(ler): https://bgmarx.com/2018/08/15/using-dirty-schedulers-with-ru...
tommica 8 months ago

Unfortunately Haskell will never be able to have their version of "Zigler"...

derefr 8 months ago

Does anyone actually enjoy using these systems that encourage you to embed programming-language X code in programming-language Y heredocs?

I always find actually doing that — and then maintaining the results over time — to be quite painful: you don't get syntax highlighting inside the string; you can no longer search your worktree reliably using extension-based filtering; etc.

I personally find the workflow much more sane if/when you just have a separate file (e.g. `foo.zig`) for the guest-language code, and then your host-language code references it.

toast0 8 months ago

I've done some assembly in C, and for big functions, yeah, I want it in its own file, but smaller things often make sense to embed. I'm not sure if I'd like my nif code embedded into my erl files (assuming this works for Erlang as well), but it could conceivably make the nasty bit of boilerplate around ERL_NIF_INIT in the NIF (which I have to do in C anyway) and exit(nif_library_not_loaded) in the erl go away, which would be nice.
It's certainly possible to get syntax highlighting on the embedded code, but you'll need to work with your syntax highlighter; it certainly helps if you're not the only person using it.
But then again, I worked without syntax highlighting for years, so I'm happy when it works, but when it doesn't, I'm ok with that too.
devjab 8 months ago

I’m not too familiar with Elixir, but I definitely prefer building libraries in Zig and then consuming them in Python, TS, whatever over embedding them inside another language directly.
That being said, you can get IDE language support for embedded code if you use Emacs or vim (and probably other editors as well). As I mentioned I still vastly prefer separating it personally, especially if you don’t necessarily expect your Python or TypeScript programmers to be knowledgeable about Zig (or C).
harrisi 8 months ago

Syntax highlighting here can work correctly, actually.
Also, I'm not sure why it's not better documented in Zigler, but you can also write the code in a separate file just fine.
- h0l0cube 8 months ago
  
  Links for anyone curious.
  > Syntax highlighting here can work correctly, actually.
  Highlighting shown here in the 2021 ElixirConf talk posted elsewhere in the comments:
  https://youtu.be/lDfjdGva3NE?t=2064
  > I'm not sure why it's not better documented in Zigler
  Here's the docs for it (though buried in the 'advanced' section)
  https://hexdocs.pm/zigler/Zig.html#module-importing-external...
systems 8 months ago

I initially agree
But if all you do is write Elixir wrappers around the Zig functions to completely hide the foreign-language functions, keeping both the wrapper and the implementation in the same file, even in two different languages, doesn't seem horrible. Then again, keeping them in two files doesn't seem like a huge difference either.
I think it's really a matter of taste; both options are viable.
zelphirkalt 8 months ago

Actually, literate programming might be a tool to get you syntax highlighting back. You could write one block of code in one language and the other one in another language and make one include the other in some place. Both blocks are annotated with their specific language, inside the prose. Emacs, for example, syntax highlights each block according to its corresponding programming language. It also allows you to edit blocks in separate buffers. Another way could be to switch the syntax highlighting of one's editor temporarily, but then the syntax of the surrounding prose and the other block might interfere.
travisgriggs 8 months ago

> Does anyone actually enjoy using these systems that encourage you to embed programming-language X code in programming-language Y heredocs?
Isn’t that essentially any web application?

lionkor 8 months ago

Completely lacking a description that made it clear, but basically, from what I can tell, this lets you embed Zig code inside Elixir code

harrisi 8 months ago

Zig is also used in an excellent way by burrito[0]. I've also used zig for compiling NIFs written in C/C++/Objective-C, since `zig cc` makes cross-compiling much nicer.

I wish zig got more use and attention in the Erlang ecosystem, but rustler seems more popular.

OkayPhysicist 8 months ago

Rustler is more popular because Rust solves one of the scarier bits about NIFs, the fact that irresponsible memory management in a NIF can kill the entire Erlang VM.
I can appreciate Zig for entire projects that would otherwise be written in C, but for the lengths of code that make sense for a NIF as opposed to a port, Zig seems like a strange point of failure to add to my system. If it's simple enough that I can be confident in my flawless manual memory management, I'd just use C, and for anything else, Rust is the far safer choice.
kansi 8 months ago

[0] https://github.com/burrito-elixir/burrito

kuon 8 months ago

I use zig a lot in elixir nif, for things like audio and video processing, it works great. But I do not use zigler as I prefer the code to live in its own codebase. But zigler is really nice and it provides an easy way to do computationally heavy tasks in elixir.

kansi 8 months ago

> I use zig a lot in elixir nif, for things like audio and video processing

Sounds interesting, is it open source? I am interested in seeing what the code layout looks like when mixing Zig and Elixir

kuon 8 months ago

I don't have an open source code base to share, but here is what it looks like:

        // the_nif.zig

        fn init_imp(
            env: ?*erl.ErlNifEnv,
            argc: c_int,
            argv: [*c]const erl.ERL_NIF_TERM,
        ) !erl.ERL_NIF_TERM {
            if (argc != 0) {
                return error.BadArg;
            }

            return try helpers.make("Hello world");
        }

        export fn media_tools_init(
            env: ?*erl.ErlNifEnv,
            argc: c_int,
            argv: [*c]const erl.ERL_NIF_TERM,
        ) erl.ERL_NIF_TERM {
            return init_imp(env, argc, argv) catch |err|
                return helpers.make_error(env, err);
        }


        var funcs = [_]erl.ErlNifFunc{ erl.ErlNifFunc{
            .name = "init",
            .arity = 0, // init/0: matches the argc != 0 check above and the zero-argument call in the Elixir file
            .fptr = media_tools_init,
            .flags = erl.ERL_NIF_DIRTY_JOB_CPU_BOUND, 
        } };

        var entry = erl.ErlNifEntry{
            .major = erl.ERL_NIF_MAJOR_VERSION,
            .minor = erl.ERL_NIF_MINOR_VERSION,
            .name = "Elixir.MediaTools.Stream",
            .num_of_funcs = funcs.len,
            .funcs = &funcs,
            .load = load,
            .reload = null,
            .upgrade = null,
            .unload = null,
            .vm_variant = "beam.vanilla",
            .options = 0,
            .sizeof_ErlNifResourceTypeInit = @sizeOf(erl.ErlNifResourceTypeInit),
            .min_erts = "erts-10.4",
        };

        export fn nif_init() *erl.ErlNifEntry {
            return &entry;
        }

        # the_elixir_file.ex

        assert "Hello world" == MediaTools.Stream.init()

The "helpers" library is used to convert types to and from Erlang; I plan on open sourcing it but it is not ready now. In the above example the code is explicit, but "entry" can be created with a comptime helper function. erl is simply the erl_nif.h header converted by zig translate-c.

I wrote a piece back in 2022, but things evolved a lot since then: https://www.kuon.ch/post/2022-11-26-zig-nif/

filmor 8 months ago

This won't work on Windows as the BEAM uses a slightly different NIF initialisation method there.
kansi 8 months ago

Thanks for sharing the post, it was intriguing. The detailed comments mentioned in `main.zig` and `build.zig` towards the end helped a lot.

Dowwie 8 months ago

Understand NIF risks: they can crash your entire Elixir Application, beyond their immediate supervision tree, because they operate in the same memory space as the BEAM itself.

NIF responsibly. :)

tayloramurphy 8 months ago

This is incredibly off-topic, but hopefully a fun fact for folks here.

There was a popular motivational speaker in the 80's and 90's named Zig Ziglar[0]. He was influential on Tony Robbins and his career.

Just shows how even a randomly generated name [1] may not be so unique!

[0] https://en.wikipedia.org/wiki/Zig_Ziglar
[1] https://en.wikipedia.org/wiki/Zig_(programming_language)#Ori...

psychoslave 8 months ago

Great! But, what is a nifs, please? :'D

jameskilton 8 months ago

Native Implemented Functions
https://www.erlang.org/doc/system/nif.html
sangnoir 8 months ago

It's Elixir's[1] equivalent of a Foreign Function Interface.
1. More accurately, NIFs are BEAM's take on FFI functions, and Elixir is a BEAM language.

rkangel 8 months ago

Isn't the "cimport" stuff in Zig going away at some point? Making this somewhat less useful.

G4BB3R 8 months ago

Are sigils (~) restricted to one char? To me seems ~Zig would be more clear and short enough.

Miner49er 8 months ago

Erlang sigils are not, they can be any length, limited to characters allowed in atoms.
Elixir sigils also allow multiple characters in the name, but chars after the first must be upper case, according to the docs.
So for Elixir, it would have to be something like ~zIG
- throwawaymaths 8 months ago
  
  According to the docs, must be all upper case:
  > Custom sigils may be either a single lowercase character, or an uppercase character followed by more uppercase characters and digits.
  https://hexdocs.pm/elixir/sigils.html
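
  Spelled out as a check, the quoted rule looks roughly like the Rust sketch below. It is based only on the sentence quoted above and assumes ASCII letters; `is_valid_custom_sigil` is a made-up name, not anything from Elixir itself.

  ```rust
  /// Returns true if `name` fits the quoted rule: either a single lowercase
  /// character, or an uppercase character followed by zero or more uppercase
  /// characters and digits. Only ASCII is considered (an assumption).
  fn is_valid_custom_sigil(name: &str) -> bool {
      let mut chars = name.chars();
      match chars.next() {
          // Lowercase sigils must be exactly one character long.
          Some(first) if first.is_ascii_lowercase() => chars.next().is_none(),
          // Uppercase sigils may continue with uppercase letters and digits.
          Some(first) if first.is_ascii_uppercase() => {
              chars.all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
          }
          _ => false,
      }
  }
  ```

  So `ZIG` and `z` pass, while `zIG` and `Zig` do not.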
  - Miner49er 8 months ago
    
    Ah yeah, you're right.
- Muromec 8 months ago
  
  Wait, erlang has sigils?
  - com 8 months ago
    
    Yeah, there’s a bit of a developer experience push going on in erlang world, which is great!

nine_k 8 months ago

(Yo dawg, we put a niche language into a niche language so that...)

I wonder if the Zig code can be not written inline, as an option. With anything larger than a few lines, I'd want syntax highlighting, LSP support, navigation, etc. It's easier to achieve with one language per file.

harrisi 8 months ago

Yes, you can put Zig code in a separate file.