I tried the demo back when this was still called Coati. Very impressive, but not quite suitable for my current job. I didn't have time to fully evaluate it at the time, but these are my impressions from trying to use it on our codebase (I can't say exactly who we are due to NDAs).

My first criticism would be that the UI wastes a lot of space with all the curves and thick borders. Might be due to me using Vim most of the time, but I'm used to a higher text density. It also felt sluggish, but this might have been improved in the meantime, and also I used it on Linux which is notoriously bad for graphical responsiveness.

Second, the indexing was kind of slow. Our codebase is very far from the size of Chrome's, but it is a big commercial C++ project, something like 2M SLOC (counting comments). This was compounded by the fact that switching branches often forced re-indexing a large share of the files, because widely included headers had changed.

Third, it can only index one build configuration. Our single source tree is used to build several products. Indexing one build configuration is helpful, but I often need to know whether changes necessary for product A will impact product B. It would be really nice if the indexing was split off into its own daemon and I could e.g. have three daemons looking over three build configurations on Linux, an additional remote daemon indexing the macOS source, and two more running on a Windows host (possibly a VM). This might sound extreme and convoluted, but the kind of large C++ project where Sourcetrail would shine is the kind that has its own very opinionated idiosyncrasies.

Finally, it doesn't actually solve my most frequent use case. Sourcetrail is great for browsing and understanding the OOP structure of code. However, I am most often interested in exploring dataflow. I really want to know where a particular member of a struct is set, where the values in that expression come from, where those values are set, where the values in those expressions come from, etc. This can be accomplished with pretty much the same hierarchical interface that Sourcetrail currently has, but instead of classes and methods, the basic units should be expressions and values. Another useful feature would be "where is this value used": say you have a member of a struct, or a method returning a constant value, and you want to know where the value of the member is used. Not where the member itself is used, because its value is often copied around without being modified. I would really like something that can track through assignments to tell me where this value ends up. Right now Sourcetrail doesn't really cover this use case better than Vim+ctags+rtags+ripgrep. Yes, this sounds a bit like "Dropbox is just sftp+rsync", but I couldn't make Coati work better for my use case than my current setup.
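To make the "track through assignments" idea concrete, here is a minimal sketch. It is in Python rather than C++ only because the standard `ast` module keeps it short. The function name `names_reached_by` and the sample snippet are made up for illustration, and the analysis is deliberately naive: it only follows plain `a = b` copies and ignores scopes, statement order, function calls and aliasing.

```python
import ast


def names_reached_by(source: str, origin: str) -> set[str]:
    """Return names/attributes that may receive the value of `origin` via plain copies.

    Hypothetical helper: flow-insensitive, single scope, copies only.
    Requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)

    # Collect (target, source) pairs for assignments of the form `a = b`
    # where both sides are a simple name or an attribute access.
    copies = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, (ast.Name, ast.Attribute)):
            src = ast.unparse(node.value)
            for target in node.targets:
                if isinstance(target, (ast.Name, ast.Attribute)):
                    copies.append((ast.unparse(target), src))

    # Propagate reachability until nothing new is added (a fixed point).
    reached = {origin}
    changed = True
    while changed:
        changed = False
        for target, src in copies:
            if src in reached and target not in reached:
                reached.add(target)
                changed = True

    reached.discard(origin)
    return reached


SAMPLE = """
timeout = config.timeout
retry_after = timeout
conn.deadline = retry_after
unrelated = other.value
"""

print(sorted(names_reached_by(SAMPLE, "config.timeout")))
# prints ['conn.deadline', 'retry_after', 'timeout']
```

A real tool would need the compiler's view of the code (types, overloads, pointer aliasing), which is exactly the hard part; the sketch only shows the shape of the query.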

I can't demand that dataflow analysis be implemented, because I can't promise that I'll use Sourcetrail even if it is, but a data-oriented view of code might be a worthwhile development to consider.

egraether

Sourcetrail dev here. Thanks for your extensive feedback!

A lot of things have changed since Sourcetrail was called Coati (about 2 years ago). We put a lot of work into improving indexing speed, handling multiple configurations/different languages within one project and reducing "sluggishness" in the UI. Sourcetrail now runs smoothly on code bases with several MLoC.

But I agree with your suggestion regarding data-flow analysis. That is what understanding unfamiliar source code often really comes down to. We have also had some user requests to go in that direction. We haven't really looked into this area so far, because it is a lot harder to collect the data (dynamic analysis) and it needs a whole new user interface.

While the data collection is solvable (Visual Studio debugger can do it, I think), I'm not sure whether it is really possible to come up with an effective user interface that shows which paths the different values take.

To explain why this is hard, let me use a metaphor: With data-flow you need to deal with a new dimension: time. Sourcetrail can handle dependencies between definitions really well, before the code is executed: space. What you want is a tool that combines the two into a space-time exploration tool of source code. Not sure if possible at all, but very interesting to think about. :)

j88439h84

I just watched your talk, and found it very interesting, thanks for sharing it!

I hope you can do the data flow analysis. That would be so cool and SO useful.

Python certainly has the ability to collect the data. A couple of existing tools make use of this.

For example, MonkeyType and Birdseye observe the values that are passed around by tracing execution during test runs (or even during a production run, but the performance impact can be substantial). https://github.com/Instagram/MonkeyType https://github.com/alexmojaki/birdseye
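As a rough illustration of the general idea (not how MonkeyType or birdseye are actually implemented), the standard library's `sys.settrace` hook is enough to observe which argument types flow into each function during a run. The helper name `record_call_types` and the demo functions are hypothetical.

```python
import sys
from collections import defaultdict


def record_call_types(func, *args, **kwargs):
    """Run func and return {(function_name, arg_name): set of observed type names}."""
    observed = defaultdict(set)

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            # Positional and keyword-only parameters come first in co_varnames.
            nargs = code.co_argcount + code.co_kwonlyargcount
            for name in code.co_varnames[:nargs]:
                if name in frame.f_locals:
                    observed[(code.co_name, name)].add(type(frame.f_locals[name]).__name__)
        return None  # no per-line tracing inside the frame, which keeps overhead lower

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return dict(observed)


def scale(value, factor=2):
    return value * factor


def demo():
    scale(3)
    scale("ab", factor=3)


types = record_call_types(demo)
print(sorted(types[("scale", "value")]))
# prints ['int', 'str']
```

Even this small hook slows every call, which matches the performance caveat above about tracing production runs.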

Even more information can be gleaned from the gc module (see https://mg.pov.lt/objgraph/ for a tool using it).
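A small sketch of what the gc module exposes: `gc.get_referrers` returns the objects that directly hold a reference to a given object, which is the raw material for a "who is holding this value" graph. The class `Token` and the helper `containers_holding` are made up for the example, and the filter to lists and dicts is an arbitrary simplification.

```python
import gc


class Token:
    """Stand-in for some value we want to find the holders of."""


def containers_holding(obj):
    """Return the lists and dicts that directly reference obj."""
    return [ref for ref in gc.get_referrers(obj) if isinstance(ref, (list, dict))]


value = Token()
registry = {"current": value}  # one holder
queue = [value]                # another holder

found = containers_holding(value)
print(any(c is registry for c in found), any(c is queue for c in found))
# prints True True
```

The result can also include containers you did not expect, such as the module's globals dict when the object is bound to a global name, so a real tool has to filter and label what it finds.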

These tools make good progress, but I'd be very interested to see what a software-visualization expert would come up with.

I'd also love to see how a concurrent execution tree can be visualized. For example, the wonderful Trio concurrency library is built on a tree of concurrent tasks. It would be so cool to see which events are happening at the same time. I've never seen a visualization of how it'd work. (The Trio team is also extremely friendly on their Gitter chat.) https://github.com/python-trio/trio

Structured logging is yet another exciting area. Can we generate visualizations from logs in OpenTracing/OpenCensus format? Some existing work is https://github.com/jonathanj/eliottree
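To sketch what "generate a visualization from logs" could mean at its simplest: if every record carries its own id and its parent's id, a tree view falls out of a few lines of code. The field names below (`id`, `parent`, `name`, `start_ms`, `duration_ms`) are assumptions for the example and are not the actual OpenTracing, OpenCensus or Eliot schema.

```python
from collections import defaultdict


def render_span_tree(spans):
    """Render span records as an indented tree, children ordered by start time."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in sorted(children[parent_id], key=lambda s: s["start_ms"]):
            lines.append(f"{'  ' * depth}{span['name']} ({span['duration_ms']} ms)")
            walk(span["id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return "\n".join(lines)


spans = [
    {"id": 1, "parent": None, "name": "GET /report", "start_ms": 0, "duration_ms": 120},
    {"id": 3, "parent": 1, "name": "render", "start_ms": 90, "duration_ms": 25},
    {"id": 2, "parent": 1, "name": "db.query", "start_ms": 5, "duration_ms": 80},
]

print(render_span_tree(spans))
# prints:
# GET /report (120 ms)
#   db.query (80 ms)
#   render (25 ms)
```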

Gary Bernhardt's "A Whole New World" talk https://www.destroyallsoftware.com/talks/a-whole-new-world proposes extracting data from logs and highlighting, in the editor, important lines from tracebacks and slow lines from trace timings.

I haven't used Structurizr, but it seems interesting. Do you have thoughts on it? https://structurizr.com/ There is a Python port at https://github.com/sixty-north/structurizr-python