so what's the difference from QA + pentesting + TDD?

what do you verify?

I'm not OP, and my comment is 7 days late, but I'll answer.

With formal verification, you write a specification and then prove that your program adheres to that specification. This is usually done with a proof assistant, which machine-checks the proof to ensure it is sound. The seL4 microkernel is a good example: the kernel is written in C, while the specification and proofs are written in Isabelle, a proof assistant. (A toy-scale sketch of the same idea follows the links below.)

Specification and proofs: https://github.com/seL4/l4v

The microkernel itself: https://github.com/seL4/seL4
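
To make the "specification plus machine-checked proof" idea concrete at toy scale (nothing like the seL4 development, which verifies a C implementation against an Isabelle specification), here is a minimal sketch in Lean 4. The names strEq and strEq_correct are mine, and strings are modelled as lists of characters; the point is only the shape of the exercise: a program, a specification, and a proof that the proof assistant checks.

  -- The "program": a character-by-character comparison.
  def strEq : List Char → List Char → Bool
    | [],      []      => true
    | [],      _ :: _  => false
    | _ :: _,  []      => false
    | c :: cs, d :: ds => c == d && strEq cs ds

  -- The "specification": strEq answers true exactly when the two
  -- strings are equal. Lean checks the proof; a bug like the one in
  -- the Pascal example below would make this theorem unprovable.
  theorem strEq_correct (s t : List Char) : strEq s t = true ↔ s = t := by
    induction s generalizing t with
    | nil => cases t <;> simp [strEq]
    | cons c cs ih => cases t <;> simp [strEq, ih]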

The main practical difference between testing and formal verification is that testing alone cannot guarantee correctness, i.e., the absence of errors in a program; it can only reveal errors, with no guarantee of revealing all of them. The only way to prove correctness is to write a proof of adherence to a specification.

I found a good example of this in an excerpt from the book Program Construction by Roland Backhouse. The code is in Pascal.

The problem the students had been set was to write a program that would compare two strings for equality. One student's solution was to assign the value true or false to a boolean equal as follows:

  equal := (string1.length = string2.length)
  if equal
  then for i := 1 to string1.length
      do equal := (string1.character[i] = string2.character[i])

The problem with this code is that it returns the value true whenever the two strings have equal length and their last characters are identical. For example, the two strings 'cat' and 'mat' would be declared equal because they both have length three and end in 't'.
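
For contrast, one conventional fix (my sketch, assuming the same string record layout as the excerpt) is to accumulate the result with and, so that a mismatch found early can never be overwritten by a later comparison:

  equal := (string1.length = string2.length);
  if equal
  then for i := 1 to string1.length
      do equal := equal and (string1.character[i] = string2.character[i])

Exiting the loop on the first mismatch would be better style, but this is the smallest change that makes every character count.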

The student was quite taken aback when I demonstrated the error. Indeed, upon further questioning, it emerged that the program had been tested quite systematically. First, it had been tested on several pairs of identical strings, and then on several pairs of strings of unequal length. Both these tests had produced satisfactory results. The final test had been to input several pairs of equal length but unequal strings, such as 'cat' and 'dog', or 'house' and 'river'.

This final test is interesting because it is possible to use simple probability theory to make a rough estimate of the chances of discovering the programming error. The details are not relevant here. Suffice it to say that, assuming pairs of words of equal length are generated with equal letter frequencies (that is, each letter in the alphabet is chosen with a probability of 1/26), there is only a one in three chance of discovering the error after ten tests; increasing the number of tests to 20 would still only yield a one in two chance, and one would need to perform over 50 tests to achieve a 90% chance of discovering the error. Finally, and most importantly, there is no certainty that one would ever discover the error, no matter how many tests have been performed.
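
The arithmetic is easy to reproduce: under those assumptions, a randomly generated unequal pair of equal length exposes the bug only when the two last characters happen to match, which has probability 1/26, so n independent tests detect it with probability 1 - (25/26)^n. A throwaway check (the program and its names are mine, not from the book):

  program DetectionOdds;
  var
    n: integer;
    missAll: real;  { probability that all n tests miss the bug }
  begin
    missAll := 1.0;
    for n := 1 to 60 do
    begin
      missAll := missAll * (25.0 / 26.0);
      if (n = 10) or (n = 20) or (n = 59) then
        writeln(n, ' tests: chance of detection = ', (1.0 - missAll):4:2)
    end
  end.

That prints roughly 0.32, 0.54, and 0.90, matching the one-in-three, one-in-two, and over-50-tests figures in the excerpt.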

This is a great illustration of how testing can't guarantee correctness. It's easy for anyone past the beginner stage to see why this particular code is wrong, but it wasn't obvious to the student, and the same trap exists at every level: if you write code that is near the limit of your own ability and it is subtly incorrect, you may never notice without extensive testing, and even extensive testing may never expose the bug.