What does HackerNews think of codeq?

Creates Datomic dbs from git repos

Language: Clojure

Codebase as Database: Turning the IDE Inside Out with Datalog | Oct 2022

This is a well composed idea. This reminds me slightly of (Rich's?) Codeq https://github.com/Datomic/codeq although codeq is only outlining code/scm relationships and not syntax trees etc. I think I was always hoping codeq would add something like this (for doing what you are doing to validate forms) but the input mechanism probably needed more hammock time

The Database Inside Your Codebase | Feb 2021

Good piece, though the ideas aren't new. There are a ton of references; two recent ones that immediately come to mind:

https://github.com/src-d/guide
https://github.com/Datomic/codeq

But work here goes back at least to the 1970s.

What comes after Git? | Dec 2020

He is referring to codeq: https://github.com/Datomic/codeq, whose last commit was in 2014. But there have been recent rumours (https://medium.com/@sfyire/can-codeq-2-solve-clojures-weakne...) that Hickey is working on a new version of it. Fingers crossed!

A Little Clojure | Apr 2020

Yes, much of this space is fairly well trod, though completely agree the threads are hard to find. I'll try to point some out:

1. Clojure, top level forms, trees, and evaluation

Agree in part about top level evaluation rules in Clojure. It does seem, for instance, like the ns form should enclose the rest of the forms that comprise that namespace, rather than essentially doing something that is unusual for Clojure- silently changing what seems to be a global context.

When one digs a little deeper, however, there is a logic to those semantics. The main reason comes from the problem that the compiler faces in having to reconcile the use of a program thing- a symbol or name or variable or whatever-one-calls-the-named-elements that are used in a program- with the defining of that thing.

There are basically two approaches to this problem. The compiler can read all the source code, find all the definitions, then reread all the source code, and match the uses to those definitions- and only then inform the programmer if there is some problem where a use doesn't match or doesn't have a definition. Even with modern computers, for large programs, this is too expensive and time consuming.

What many compilers do instead is to require that any used names are defined "first". Clojure does this- it reads files from top to bottom, and it requires that any used names are defined earlier in the file.

This notion of earlier- this notion that things defined in a program have an ordering to them, not just in their execution but also in their composition- this is deep and pervasive, and gives the lie to the idea that a program is just a big tree.
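To make the two approaches above concrete, here is a minimal sketch in Python. It is not how the Clojure compiler is implemented; the toy "program" format (a list of top-level forms, each a defined name plus the names it uses) and the function names are assumptions made up for illustration.

```python
def check_single_pass(forms):
    """Read forms top to bottom; flag any use of a name not defined earlier."""
    defined = set()
    errors = []
    for index, (name, uses) in enumerate(forms):
        # Assumption: a name is visible inside its own definition (recursion).
        defined.add(name)
        for used in uses:
            if used not in defined:
                errors.append((index, name, used))
    return errors


def check_two_pass(forms):
    """Collect every definition first, then match uses against all of them."""
    defined = {name for name, _ in forms}
    return [
        (index, name, used)
        for index, (name, uses) in enumerate(forms)
        for used in uses
        if used not in defined
    ]


# Hypothetical toy program: "main" uses "later" before it is defined.
program = [
    ("helper", []),
    ("main", ["helper", "later"]),
    ("later", []),
]

print(check_single_pass(program))  # [(1, 'main', 'later')]
print(check_two_pass(program))     # []
```

The single-pass check rejects a program that the two-pass check accepts, which is the sense in which the order of forms in a file carries meaning.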

One branch off this tree, so to speak, where the ordering in a file doesn't correspond to the compositional or execution ordering, is in the functional programming concept of monads.

2. Alternatives to file storage for program modules

It is a pretty old idea that files are a poor way of storing source code (sorry). There is a long train of work that Wikipedia summarizes poorly with almost no references under Source Code In Database: https://en.wikipedia.org/wiki/Source_Code_in_Database. The idea here is that persisting code in a data structure and providing more "intuitive" tools for editing is better than requiring humans to work in files.

(Microsoft even tried in the 1990s to roll out a version of Windows that used a database for pervasive structured storage, rather than a file system. This was a failure, and a lot has been written about it- google WinFS).

Plain text files have a lot of underappreciated ergonomic properties. Their use doesn't keep tools from utilizing clever data structures to assist in the management and authoring of code in files. The SCID work has ultimately found its way into the cool incremental helpers and structural editors that most IDEs use now (Cursive/Paredit for Clojure: https://cursive-ide.com/userguide/paredit.html)

Forgoing the text editing paradigm altogether takes you into the world of visual programming editors, which also have a long history. An influential player in the space from the early rise of personal computers was a product called Prograph. This technique is also now pervasive in tools like Scratch, but also in big data, where flow graphs for processing immense streams of data are often constructed using visual tools, for instance, NiFi.

3. Literate programming

Another thread is Literate Programming, originally invented by Don Knuth. The idea is that the most important consumers of a program are other humans, not the computer, so one should author using tools that produce two things: an artifact that a human can read, with prose and code interspersed, and the code itself for a compiler to consume. The combined prose/code artifact is a better way of communicating the semantics of a program to other humans than the code alone.

This is a particularly endearing thread, and the tool called Marginalia in the Clojure world provides something of the experience that Knuth intended.

4. Version control and program semantics

Yet another relevant thread is in version control, where the inability of files to keep the history of a program's authoring process is addressed. Early in Clojure's life Rich Hickey created a tool called Codeq: https://github.com/Datomic/codeq that loaded a git repo- essentially a graph of changes to program files- into a Datomic database- where Datomic can be seen as a graph db.
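As a rough illustration of "a git history as a graph in a database", here is a small Python sketch. It does not use Datomic or codeq's actual schema; the (entity, attribute, value) triples, the attribute names, and the helper functions are all hypothetical.

```python
# Hypothetical history stored as (entity, attribute, value) facts.
FACTS = [
    ("c1", "commit/message", "add parser"),
    ("c1", "commit/changed", "src/parser.clj"),
    ("c2", "commit/parent", "c1"),
    ("c2", "commit/message", "fix parser bug"),
    ("c2", "commit/changed", "src/parser.clj"),
    ("c2", "commit/changed", "test/parser_test.clj"),
]


def values(facts, entity, attribute):
    """All values recorded for one attribute of one entity."""
    return [v for e, a, v in facts if e == entity and a == attribute]


def commits_changing(facts, path):
    """Commits that touched the given file path."""
    return sorted(e for e, a, v in facts if a == "commit/changed" and v == path)


def ancestors(facts, commit):
    """Follow commit/parent edges back toward the root of the graph."""
    seen = []
    stack = values(facts, commit, "commit/parent")
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(values(facts, parent, "commit/parent"))
    return seen


print(commits_changing(FACTS, "src/parser.clj"))  # ['c1', 'c2']
print(ancestors(FACTS, "c2"))                     # ['c1']
```

Once history is stored as facts like these, questions about it become queries over data rather than text searches through files.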

There has been some more recent work to be able to run semantic queries on those graphs. This is immensely useful for looking for patterns of code that may have security problems. A company doing a lot of work in this space is called source{d}.

Another set of tools for mining version control comes from a company called Empear, started by a programmer named Adam Tornhill. His work- originally in Clojure- looks at things like patterns of paired changes across files. Cases where the same sections of code in the same files are changed in the same commits demonstrate high "coupling" and are a "code smell."
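The paired-change idea can be sketched in a few lines of Python. This is not Empear's implementation; it is a simplified, file-level count (not section-level), and the commit data and the `change_coupling` name are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits):
    """Count how often each pair of files is changed in the same commit."""
    pair_counts = Counter()
    for files in commits:
        # Sort so each pair has one canonical order; set() drops duplicates.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical history: each entry is the list of files changed in one commit.
commits = [
    ["core.clj", "util.clj"],
    ["core.clj", "util.clj", "README.md"],
    ["core.clj"],
]

coupling = change_coupling(commits)
print(coupling.most_common(1))  # [(('core.clj', 'util.clj'), 2)]
```

Pairs with a high count relative to how often either file changes at all are the candidates for the "coupling" smell described above.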

All of this is really about building and maintaining a semantic model from the syntactic artifacts, which is what I read you as being ultimately interested in. There's a lot more, but that's all I have time for now. Hope that's helpful.

How different are different diff algorithms in Git? | Mar 2020

This reminds me of codeq, a Clojure+Datomic project that intended to move source control from lines of text to functions and expressions. From their introduction [0]:

  Backstory

  Programmer Sally: "So, what are you going to do today Bob?"
  Programmer Bob: "I'm not happy with the file baz.clj residing in my/ns. So I'm going to go to line 96 and change 2 to 42. I've been thinking about deleting line 124. If I have time, I'm also going to insert some text I've been working on at line 64."
  Programmer Sally: (what's wrong with Bob?)

  Short Story

  codeq ('co-deck') is a little application that imports your Git repositories into a Datomic database, then performs language-aware analysis on them, extending the Git model down from the file to the code quantum (codeq) level, and up across repos. By doing so, codeq allows you to:
  - Track change at the program unit level (e.g. function and method definitions)
  - Query your programs and libraries declaratively, with the same cognitive units and names you use while programming
  - Query across repos

I never got to use it, though, and it seems the repo [1] has had no updates in 6 years.

[0] https://blog.datomic.com/2012/10/codeq.html
[1] https://github.com/Datomic/codeq

Ask HN: What will coding be like in 25 years? | May 2017

If you were going to take a new try at building a system like this, do you think something like Datomic's codeq[1] would be useful? It allows for navigating code semantics across multiple projects using a database query language as the interface. I've never actually used it but maybe it could help for data storage of an intentional programming system?

Thanks for the links, this was the first time I've heard of intentional programming. Neat idea!

[1] https://github.com/Datomic/codeq

-2000 lines of code | Apr 2014

I can almost guarantee that, if your code base's version history is too messy to find old code, then it will be even worse if you were to favor commenting out over deleting.

That said, I do see where you're coming from. Part of the issue is that we (well, most of us) don't have good ways of searching old code. There is Codeq ( https://github.com/Datomic/codeq ), which is prettydamncool™ ...hopefully we'll start to see more systems like it.

GetClojure: Tons of Searchable Clojure Examples | Jun 2013

IRC logfiles, mostly. You're seeing examples of code snippets typed into IRC from the last 4-5 years run in a sandbox under Clojure 1.5.1. I also ran it over the ClojureDocs s-expressions. There are a lot of examples missing (defn, def, etc. for instance) due to the fact that I didn't want to inadvertently run something evil, but FWIW there are over 30k examples of sequence functions, JVM interop, etc. In a previous comment I mentioned that my plan is to add the ability to submit, edit, bookmark, and rate examples.

RE: GitHub projects tagged "Clojure": I think exploring Codeq (https://github.com/Datomic/codeq) would be a cool way to go about doing that kind of thing.

Codeq. Code quality as a service | Feb 2013

'codeq' is also the name of the static analysis tool released late last year

https://github.com/Datomic/codeq