https://github.com/src-d/guide https://github.com/Datomic/codeq
But work here goes back at least to the 1970s.
1. Clojure, top level forms, trees, and evaluation
Agree in part about top level evaluation rules in Clojure. It does seem, for instance, like the ns form should enclose the rest of the forms that comprise that namespace, rather than essentially doing something that is unusual for Clojure- silently changing what seems to be a global context.
When one digs a little deeper, however, there is a logic to those semantics. The main reason comes from the problem that the compiler faces in having to reconcile the use of a program thing- a symbol or name or variable or whatever-one-calls-the-named-elements that are used in a program- with the defining of that thing.
There are basically two approaches to this problem. The compiler can read all the source code, find all the definitions, then reread all the source code, and match the uses to those definitions- and only then inform the programmer if there is some problem where a use doesn't match or doesn't have a definition. Even with modern computers, for large programs, this is too expensive and time consuming.
What many compilers do instead is to require that any used names are defined "first". Clojure does this- it reads files from top to bottom, and it requires that any used names are defined earlier in the file.
This notion of earlier- this notion that things defined in a program have an ordering to them, not just in their execution but also in their composition- this is deep and pervasive, and puts the lie in the idea that a program is just a big tree.
One branch off this tree, so to speak, where the ordering in a file doesn't correspond to the compositional or execution ordering, is in the functional programming concept of monads.
2. Alternatives to file storage for program modules
It is a pretty old idea that files are a poor way of storing source code (sorry). There is a long train of work that Wikipedia summarizes poorly with almost no references under Source Code In Database: https://en.wikipedia.org/wiki/Source_Code_in_Database. The idea here is that persisting code in a data structure and providing more "intuitive" tools for editing is better than requiring humans to work in files.
(Microsoft even tried in the 1990s to roll out a version of Windows that used a database for pervasive structured storage, rather than a file system. This was a failure, and a lot has been written about it- google WinFS).
Plain text files have a lot of underappreciated ergonomic properties. Their use doesn't keep tools from utilizing clever data structures to assist in the management and authoring of code in files. The SCID work has ultimately found its way into the cool incremental helpers and structural editors that most IDEs use now (Cursive/Paredit for Clojure: https://cursive-ide.com/userguide/paredit.html)
Forgoing the text editing paradigm altogether takes you into the world of visual programming editors, which also have a long history. An influential player in the space from the early rise of personal computers was a product called ProGraph. This technique is also now pervasive in tools like Scratch, but also in big data where flow graphs for processing immense streams of data are often constructed using visual tools, for instance, Nifi.
3. Literate programming
Another thread is Literate Programming, originally invented by Don Knuth. The idea is that the most important consumers of a program are other humans, not the computer, so one should author using tools that create both an artifact that a human can read, with both prose and code interspersed- as well as the code itself for a compiler to consume. But the combined prose/code artifact is a better way for communicating to other humans about the semantics of a program, than just the code.
This is a particular endearing thread, and the tool called Marginalia in the Clojure world provides something of the experience that Knuth intended.
4. Version control and program semantics
Yet another relevant thread is in version control, where the inability of files to keep the history of a programming authoring process is addressed. Early in Clojure's life Rich Hickey created a tool called Codeq: https://github.com/Datomic/codeq that loaded a git repo- essentially a graph of changes to program files- into a Datomic database- where Datomic can be seen as a graph db.
There has been some more recent work to be able to run semantic queries on those graphs. This is immensely useful for looking for patterns of code that may have security problems. A company doing a lot of work in this space is called Source(d).
Another set of tools for mining version control comes from a company called Empear, started by a programmer named Adam Tornhill. His work- originally in Clojure- looks at things like patterns of paired changes across files. Cases where the same sections of code in the same files are changed in the same commits demonstrate high "coupling" and are a "code smell."
==
All of this is really about building and maintaining a semantic model from the syntactic artifacts, which is what I read you as being ultimately interested in. There's a lot more, but that's all I have time for now. Hope that's helpful.
Backstory
Programmer Sally: "So, what are you going to do today Bob?"
Programmer Bob: "I'm not happy with the file baz.clj residing in my/ns. So I'm going to go to line 96 and change 2 to 42. I've been thinking about deleting line 124. If I have time, I'm also going to insert some text I've been working on at line 64."
Programmer Sally: (what's wrong with Bob?)
Short Story
codeq ( 'co-deck') is a little application that imports your Git repositories into a Datomic database, then performs language-aware analysis on them, extending the Git model down from the file to the code quantum (codeq) level, and up across repos. By doing so, codeq allows you to:
- Track change at the program unit level (e.g. function and method definitions)
- Query your programs and libraries declaratively, with the same cognitive units and names you use while programming
- Query across repos
I never got to use it, though, and it seems that there have been no more updates in the repo [1] since 6 years ago.[0] https://blog.datomic.com/2012/10/codeq.html [1] https://github.com/Datomic/codeq
Thanks for the links, this was the first time I've heard of intentional programming. Neat idea!
That said, I do see where you're coming from. Part of the issue is that we (well, most of us) don't have good ways of searching old code. There is Codeq ( https://github.com/Datomic/codeq ), which is prettydamncool™ ...hopefully we'll start to see more systems like it.
RE: GitHub projects tagged "Clojure": I think exploring Codeq (https://github.com/Datomic/codeq) would be a cool way to go about doing that kind of thing.