What does HackerNews think of CompCert?

The CompCert formally-verified C compiler

Language: Coq

#13 in C
#12 in Compiler
This is a common property for proof-oriented languages. Coq shares this property for instance, and you can write an optimizing C compiler in Coq: https://github.com/AbsInt/CompCert .
Depends what you mean by "popular". seL4 [1] is formally verified, open source and is used as baseband firmware in millions of phones, among other uses. Google just recently started an embedded OS based on it as well.

CompCert [2] is a formally verified optimizing C compiler.

Most formal verification happens in compilers or low-level mission critical systems due to the cost.

If you want to write some formally verified C, you can check out Frama-C [3].

[1] https://sel4.systems/

[2] https://github.com/AbsInt/CompCert

[3] https://frama-c.com/

I'm a bit surprised by what you write.

Until recently I was heading a compiler team. We found it impossible to hire experienced compiler engineers despite paying very well, so we essentially switched to hiring promising university graduates and trained them on the job. My peers in other companies are all in the same situation.

Note that there is a mismatch between academic and industrial compiler work. A university compilers course is necessary, but not sufficient to be considered experienced, you should have good knowledge of things like ABI design, ELF, DWARF, LLVM's tablegen, C++, compilation of varargs etc. This stuff is not typically taught in compiler courses (for good reason, since it is a bit arbitrary) but it is absolutely vital to producing a working, industrial strength compiler. You should also have some idea about computer architecture, so you can read ISA manuals and understand what individual machine instructions do, and what performance characteristics they have. Otherwise you won't be able to write good instruction selection and scheduling. Given the increasing prevalence of JIT compilers (eg eBPF in the Linux kernel), it's also useful to know about that. Another dimension is security: modern compilers to need to worry about stack canaries, shadow stacks etc. This too is not taught at universities.

Regarding what tool/library/framework, skills are rather transferrable, so I would not worry too much about this. LLVM is by far the most widely used open framework and is easier to get into than other open stuff. If you are knowledgable about security and verification, doing hacking on e.g. CakeML [1] or CompCert [2] might be interesting.

A good way of being hired is to go to e.g. LLVM or GCC meetup, and make it clear that you are looking for a job. Put it on your Github that you are looking for a job. Put a link to an LLVM or GCC (or similar) compiler phase that you have written. The next LLVM dev meeting is November 8-9 in San Jose, CA, hang out there, rest assured there will be many companies seeking to hire. Bonus points if you give a presentation, even if it's whimsy like compiling Brainfuck to Intel 4004 ... Another avenue is RISCV startups which are currently 10-a-penny, they all need and hire compiler engineers. Cold-contact them ...

PS, since you are a PhD student. I suggest finishing your PhD in some form or other. A lot of companies are loath to hire PhD students before they finish, or make the job offer contingent upon successful graduation. That certainly was our policy. If you are not yet near graduation, I suggest to do a compiler related internship.

[1] https://cakeml.org/

[2] https://github.com/AbsInt/CompCert

I've used SPARK with GNAT Studio, I've used Frama-C, and I'm also experimenting with CBMC. Frama-C is way more powerful than SPARK, but also consequently has a much deeper learning curve.

SPARK does not allow doubly linked lists:

https://blog.adacore.com/pointer-based-data-structures-in-sp...

  For example, singly-linked lists and trees are supported whereas
  doubly-linked lists are not.
With Frama-C you can prove doubly linked lists and all manner of complicated pointer manipulating graph algorithms. It does not impose a Rust-like pointer ownership policy as does SPARK.

However, for embedded development, SPARK's restrictions are a good trade-off, as the more restrictive rules allow more proofs to be fully automated than with Frama-C and simplify diagnostic messages. A fly-by-wire avionics computer doesn't need to dynamically allocate a billion graph nodes. But SPARK is not "general purpose" like C with Frama-C is.

AdaCore's SPARK tool stack is not actually written in SPARK as far as I can see, much of it is actually OCaml and Coq/Gallina for the Why3 component also used by Frama-C. See all the .ml OCaml and .v Gallina source code for yourself:

https://github.com/AdaCore/why3

And of course the compiler backend for Ada/SPARK is GNU GCC, written in unverified C:

https://github.com/gcc-mirror/gcc/tree/master/gcc/config

Compare with CompCert, the formally verified C compiler:

https://github.com/AbsInt/CompCert

Frama-C unfortunately requires a user to be mathematician-logician logic programming expert to fully utilize. One can begin training in Coq/Gallina with the large free online Software Foundations course:

https://softwarefoundations.cis.upenn.edu/

OCaml is becoming a must learn language for those who want to take their C programming to the highest levels, because:

1. Frama-C, which enables writing bug-free C programs, is implemented using a combination of OCaml and the Coq proof assistant (which is itself implemented in OCaml):

https://frama-c.com/

2. CompCert, a formally verified C compiler, is implemented using OCaml and Coq:

https://github.com/AbsInt/CompCert

Nothing at this level yet exists in the Rust and Zig ecosystems, for example. Rust is a very complicated language with many features which makes formalization much harder than with C and OCaml, which both had mathematical ideas of simplicity, analyzability, and minimalism inspire their designs, even if they continue to grow into more complicated monsters, which, incidentally this OCaml 5.0 release will contribute towards. :-)

Disclaimer: I am not a domain expert in type theory and my background is algebraic combinatorics, not computer science. I also hate online courses, don’t like collaborative learning, and tend to read books and papers by myself. So I’m not a great source of advice for most people.

Unfortunately I don’t have many more suggestions that would be good pedagogically. Benjamin Pierce (who wrote Software Foundations) has another very famous intro book, Types and Programming Languages: https://www.cis.upenn.edu/~bcpierce/tapl/ which is regarded as a “Bible” of type theory for computer scientists. Note: I haven’t personally read this, but I have read the sequel, Advanced Types and Programming Languages. Both books have exercises and most exercises have solutions.

Simon Peyton-Jones wrote a very good book, the Implementation of Functional Programming Languages. It is old (1987), but free, a good read, and covers most of the “important” topics without assuming too much background: https://www.microsoft.com/en-us/research/publication/the-imp...

For formal verification - there are definitely better-qualified people than me. I found a lot of the source code to CompCert C (a verified C compiler) illuminating: https://github.com/AbsInt/CompCert However, it will be difficult to understand without doing Software Foundations first (I still find personally tactics-style proofs in Coq confusing and don’t like Coq as much as Idris or Agda).

I am very shy and can’t help you with the online communities :) The type theory subreddit is pretty active and they seem nice.

> Such as, in their draft goals [0], under S30200, they outline a memory allocator goal by referring to a proof.

By that argument every implementation of quicksort (say) would be using formal methods. But they aren't. There are some definitions of formal methods allowing this kind of "weaker" form, but for the sake of this discussion such methods do not prevent a particular class of bugs. Explicit goal statements are, while being a very good direction, informal.

> Though, if the parents goal was to imply that the whole of SQLite should be verified... I'm not convinced its entirely possible. Quite a lot of SQLite revolves around things around where the halting problem may come into play.

This is probably correct and I don't recommend to do so. Even CompCert [1], probably the single most verified C compiler, is not absolutely fully verified and of course there had been bugs in the unverified modules (codegen AFAIK). But formal methods can be partially applied to produce provably more correct code (in some metrics).

[1] https://github.com/AbsInt/CompCert

It was more like they are from a strange haskell realm.

Take a look at the ideal mathematical (verified) C compiler written in Coq. I find it neat:

https://github.com/AbsInt/CompCert

Oh, another C compiler! I like collecting these - thanks! :)

Question for anyone handing out unofficial legal opinions:

This one is dual-licensed. There are a small kerfuffle of custom licenses that say "no commercial use", but everything is also variously covered by the GPL (v2 or later), LGPL v3, and BSD 3-clause.

Ref: https://github.com/AbsInt/CompCert/blob/master/LICENSE

I'm curious if I could use this in a commercial setting if I perfectly comply with the various GPL and BSD licenses.

If I could, that could be a problem for this group, because the repo readme (at https://github.com/AbsInt/CompCert) explicitly states "this is not free software."

>or know of any large cleany written codebases from which to study?

* unison (a great file sync tool for win/posix from Benjamin Pierce, the author of Software foundations and TAPL)

https://github.com/bcpierce00/unison

* mirage (which also has a bunch of interesting subprojects, like irmin)

https://github.com/mirage

* coq

https://github.com/coq/coq

* flow

https://github.com/facebook/flow

* CompCert

https://github.com/AbsInt/CompCert

* frama-c

https://github.com/Frama-C

* 0install

https://github.com/0install/0install

The bug is in the compiler, not OpenSSH. The OpenSSH developers explicitly erased memory with memset. GCC decided to remove those lines without any warning. There's the bug. (Here come the language lawyers...)

What's the difference between GCC deleting parts of your code, and an attacker hacking into the source code repository and deleting those parts of the code?

The C standards committee addressed the problem in 2009 with memset_s. [0] The GNU developers reject patches and state they hope this feature is never implemented. [1]

The bug fix is to stop using GCC for sensitive code. Use CompCert instead. https://github.com/AbsInt/CompCert

[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1381.pdf

[1] https://sourceware.org/ml/libc-alpha/2014-12/threads.html#00...

It exists. It's called CompCert.

https://github.com/AbsInt/CompCert

It's free for non-commercial use. I've used it for several months now to build things like Tor. I haven't noticed any disadvantage compared to GCC or Clang.

Curious if the folks at OpenBSD have ever thought about giving CompCert-C a shake. I realize it's no gcc or llvm, but OpenBSD is no run of the mill *nix, either.

http://compcert.inria.fr/compcert-C.html

EDIT: See also https://github.com/AbsInt/CompCert