> aes(x='np.log(B - A)')

I never understood the pattern of putting source code in a string. Why not just use np.log(B - A) directly and configure the function to accept columns? With strings you lose highlighting, semantic analysis from editors, as well as the ability to know what computations are happening when and where. There seems to be no point and a significant drawback to this. What's the rationale?

It's because R is lazily evaluated and Python is eagerly evaluated [1].

Eager evaluation is when the interpreter evaluates the argument np.log(B-A) BEFORE passing it into aes(). aes() can only see the resulting VALUE, not the EXPRESSION itself.

In contrast, lazy evaluation means that aes() gets the raw expression as an argument. It can evaluate it, pass it on to another function, or otherwise manipulate it (e.g. serialize it back to a string).

In the case of ggplot, this is used to actually evaluate the expression at DIFFERENT values, so you can plot it. Suppose you want to plot f(x) = square(x). It doesn't make any sense to write plot(square(x)), because if x = 5.0, you will get plot(25.0), and you can't make the plot. plot(lambda x: square(x)) makes more sense, because then the plot() function can evaluate it with 100 different values of x to get 100 pixel values.
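To make the difference concrete, here is a minimal sketch in plain Python. The names sample() and square() are made up for illustration and are not part of ggplot or any plotting library; the point is only that a function receiving a callable can evaluate it at as many x values as it likes.

```python
def square(x):
    return x * x


def sample(f, lo, hi, n=100):
    """Evaluate f at n evenly spaced points in [lo, hi] and return (xs, ys).

    Hypothetical helper: a plot() function would do something like this
    internally before drawing anything.
    """
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    ys = [f(x) for x in xs]
    return xs, ys


# Eager: the callee would only ever see one number, so there is nothing to plot.
eager_value = square(5.0)

# Deferred: the callee gets the function and picks the x values itself.
xs, ys = sample(lambda x: square(x), 0.0, 5.0)
```

With the eager call the callee gets a single float. With the lambda it gets something it can call 100 times.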

And it's also used to print the expression on the axes. That is, you actually want to print the expression "square(x)" on the graph. You don't want to print "25.0" on the graph -- that makes no sense.
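A rough sketch of why a string covers both needs at once: the same text can be evaluated once per row and also reused verbatim as the axis label. aes_sketch() below is a hypothetical stand-in, not the real aes() from any ggplot port, and it uses math.log on plain lists instead of numpy so it runs with the standard library alone.

```python
import math


def aes_sketch(expr, columns):
    """Evaluate expr once per row of columns and keep the text as the label.

    Hypothetical illustration only. eval() on untrusted strings is unsafe;
    it is used here just to keep the sketch short.
    """
    code = compile(expr, "<aes>", "eval")
    names = list(columns)
    n_rows = len(columns[names[0]])
    allowed = {"log": math.log, "__builtins__": {}}
    values = []
    for i in range(n_rows):
        # Each column name becomes a variable holding that row's value.
        row = {name: columns[name][i] for name in names}
        values.append(eval(code, allowed, row))
    return {"label": expr, "values": values}


data = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 11.0]}
result = aes_sketch("log(B - A)", data)
```

Here result["values"] holds the per-row numbers to plot, and result["label"] is still the original text "log(B - A)", ready to go on the axis.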

This is related to the concept of quotations in Lisp. Quotations are UNEVALUATED program fragments. In Lisp it's an AST, but in Python or any other dynamic language, it has to be a string. In C there is no way to do this (short of shelling out to the C compiler at runtime, which some people have actually done ...)

[1] R has crazy caveats, but that's beyond the scope of this post...

quantumtremor

So why not accept a lambda? I saw that http://stackoverflow.com/questions/334851/print-the-code-whi... gives you the source, though I'm not sure if it'll always work. This does work though:

  ~ λ echo "a = lambda x: x * 2" > test.py
  ~ λ py -i test.py               
  IPython 4.2.0 -- An enhanced Interactive Python.
  In [1]: a(2)
  Out[1]: 4
  In [2]: import inspect
  In [3]: inspect.getsource(a)
  Out[3]: 'a = lambda x: x * 2\n'

I'm definitely a big fan of using Lisp's quoting for code-as-data, but I feel stringifying it makes the problem even worse.

chubot

Yeah that's a good point. I haven't used this ggplot library, but it seems like it could use lambdas. And then you don't break syntax highlighting.

One other place I've seen this done is in the numexpr library for Python.

https://github.com/pydata/numexpr

It does seem like this

    ne.evaluate('a*b-4.1*a > 2.5*b')

could be

    ne.evaluate(lambda a, b: a*b - 4.1*a > 2.5*b)

The lambda would never be executed, because numexpr compiles the expression for its own virtual machine rather than running it as Python byte code, but that shouldn't make a difference. You should still be able to use the AST of the body as input to the compiler.
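For what it's worth, the standard ast module already separates a lambda's parameters from its body, which is roughly the input a compiler would want. This sketch parses the lambda's source text rather than a live function object. For a lambda defined in a file, inspect.getsource (as in the session earlier in the thread) could supply that text, though it may not always work. Nothing here is numexpr's actual API.

```python
import ast

source = "lambda a, b: a*b - 4.1*a > 2.5*b"

# mode="eval" parses a single expression; its body is the ast.Lambda node.
tree = ast.parse(source, mode="eval")
lam = tree.body

# Parameter names declared by the lambda.
arg_names = [arg.arg for arg in lam.args.args]

# The unevaluated body: a comparison whose operands are arithmetic subtrees.
body = lam.body

# Every variable name the body refers to, collected without running it.
used_names = sorted(
    {node.id for node in ast.walk(body) if isinstance(node, ast.Name)}
)
```

At this point the body has been inspected but never executed, which is the property the string version was buying in the first place.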

And then a and b have to be pulled out of locals() automatically or something.
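The "pulled out of locals()" part could look something like this. evaluate_sketch() is a hypothetical helper, not numexpr's evaluate(); it reads the lambda's parameter names and looks them up in the caller's frame. Unlike the compiled approach described above, it simply calls the lambda, because the sketch is only about the name lookup. It also assumes CPython, where inspect.currentframe() is available.

```python
import inspect


def evaluate_sketch(fn):
    """Call fn with arguments looked up by parameter name in the caller's scope.

    Hypothetical helper for illustration; relies on CPython frame access.
    """
    caller = inspect.currentframe().f_back
    # Locals shadow globals, matching normal name resolution.
    scope = {**caller.f_globals, **caller.f_locals}
    args = [scope[name] for name in inspect.signature(fn).parameters]
    return fn(*args)


def demo():
    a, b = -1.0, 1.0
    # a and b are not passed explicitly; they are found in demo()'s locals.
    return evaluate_sketch(lambda a, b: a*b - 4.1*a > 2.5*b)


answer = demo()
```

Reaching into the caller's frame is a bit magical, which may be one reason to prefer passing the variables explicitly.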