What advantage does a dataflow language have over a functional language? And how is laziness handled (i.e., does it happen that in a dataflow language, tokens are being sent unnecessarily)? Is it easy to perform memoization in a dataflow language (to avoid performing the same computation twice)?

I don't want to speak for all of dataflow-dom, but the main difference I see (and exploit) is that data-parallel dataflow languages isolate control flow into small, independent regions, which makes the larger computation data-driven. This does mean evaluation is eager rather than lazy, but it also makes the computation much easier to parallelize (because of that independence) and much easier to incrementalize.[0]
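
To make "data-driven" and "incrementalize" a bit more concrete, here is a toy sketch in Python. It is not the differential-dataflow API; the `Map` and `CountByKey` classes and the `(record, diff)` update format are names I'm making up for illustration. Each operator contains only a small local loop, records are pushed through eagerly, and a later batch of changes updates the counts without recomputing anything from scratch.

```python
from collections import defaultdict


class Map:
    """Hypothetical stateless operator: applies fn to each record."""

    def __init__(self, fn, downstream):
        self.fn = fn
        self.downstream = downstream

    def push(self, updates):
        # Each record is transformed independently of the others,
        # so this step could be split across workers.
        self.downstream.push([(self.fn(rec), diff) for rec, diff in updates])


class CountByKey:
    """Hypothetical stateful operator: maintains a count per key and
    records only the changes to (key, count) pairs."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.emitted = []  # one list of output changes per input batch

    def push(self, updates):
        before = {}
        for key, diff in updates:
            before.setdefault(key, self.counts[key])  # count at batch start
            self.counts[key] += diff
        out = []
        for key, old in before.items():
            new = self.counts[key]
            if new == old:
                continue
            if old != 0:
                out.append(((key, old), -1))  # retract the stale count
            if new != 0:
                out.append(((key, new), +1))  # assert the new count
            else:
                del self.counts[key]
        self.emitted.append(out)


# Wire up the graph: lowercase each word, then count occurrences.
counter = CountByKey()
words = Map(str.lower, counter)

# Initial input, pushed eagerly as soon as it arrives.
words.push([("Apple", +1), ("apple", +1), ("Pear", +1)])

# A later change: remove one record, add another.
# Only the affected keys are touched.
words.push([("Pear", -1), ("APPLE", +1)])
```

After the second batch, `counter.emitted[1]` holds just the retractions and assertions for the keys that changed, which is the sense in which the computation is incremental. The real system does considerably more than this (timestamps, iteration, distribution across workers), so treat the above as an intuition pump only.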

[0]: https://github.com/frankmcsherry/differential-dataflow