I'm confused about what is being open sourced here. From what I've managed to read in the announcements and README, I keep getting funnelled toward Discord and Sourcegraph's cloud services. I'm trying to understand what is what here.

I think there are three components needed for the best (admittedly enticing) experience: Cody itself, Sourcegraph in the background (optionally), and an LLM called Claude from Anthropic. Claude is very much proprietary. Sourcegraph is open core, but to use it as Cody's "helper", do I need those proprietary features? Without Claude and Sourcegraph Enterprise/Cloud, what could Cody do with, say, a LLaMA-based LLM, should that integration happen?

Again, what I've read, taken at face value, seems really promising. I've used Sourcegraph a few times in the past and sometimes wondered how it would benefit my commercial work. Adding an LLM could make this a next-level tool, possibly something that regular chat-style LLM services don't currently do.

Cody is being open sourced under Apache 2. The source code is here: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-.... The analogy would be if GitHub open-sourced Copilot but didn't open source GitHub itself (Sourcegraph is open core, similar to GitLab, with all the code publicly available and the enterprise-licensed code under "enterprise" directories).

The network dependencies are Cody --> Sourcegraph --> Anthropic. Cody does need to talk to a chat-based LLM to generate responses. (It also hits other Sourcegraph-specific APIs, which are optional.)

We are working on making the chat-based LLM swappable. Anthropic has been a great partner so far and they are stellar to work with. But our customers have asked for the ability to use GPT-4, as well as the ability to self-host, which means we are also exploring open source models. We're actively working on that at the moment.
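To illustrate what "swappable" could mean in practice, here is a minimal sketch of a provider abstraction: a common chat interface with interchangeable backends. All class and method names here are hypothetical, not Cody's actual code (Cody itself is written in TypeScript), and the bodies just stub out where real API calls would go.

```python
from abc import ABC, abstractmethod

class ChatProvider(ABC):
    """Common interface so Anthropic, GPT-4, or a self-hosted model
    can be swapped without touching the rest of the plugin."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...

class AnthropicProvider(ChatProvider):
    def complete(self, prompt: str) -> str:
        # A real implementation would call the Anthropic API here.
        return f"[claude] {prompt}"

class LocalLlamaProvider(ChatProvider):
    def complete(self, prompt: str) -> str:
        # A real implementation would call a locally hosted
        # LLaMA-family inference server here.
        return f"[llama] {prompt}"

def make_provider(name: str) -> ChatProvider:
    """Pick a backend by config value."""
    providers = {"anthropic": AnthropicProvider, "llama": LocalLlamaProvider}
    return providers[name]()
```

The point of the indirection is that "talk to the base LLM provider directly, sans Sourcegraph" becomes a configuration choice rather than a code change.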

Sorry for any lack of clarity here. We would like Cody (the 100% open source editor plugin) to talk to a whole bunch of dev tools (OSS and proprietary). We think it's totally fine to have proprietary tools in your stack, but we'd prefer to live in a world where the thing that integrates all that info in your editor, using the magic of AI and LLMs, is open source. This fits our broader principle of selling to companies/teams while keeping tools free and open for individual devs.

I'll add that if folks want to submit a PR to turn on other LLMs (or have Cody talk to the base LLM provider directly, sans Sourcegraph), we're happy to accept those. Literally the only thing preventing us from doing that right now is prioritization (our team is 4 people and we're scrambling to improve context fetching and implement autocomplete rn :sweat-laugh-emoji:)

That'll be super interesting!

Local context is definitely the key factor in small models achieving better quality than Copilot (related: [1], [2]).

One thing I'd really want to have in Sourcegraph: a search API that supports custom retrieval/ranking. Research (e.g. [2]) shows that even simple BoW-fetched context is effective for code completion tasks.

Disclaimer: I'm building https://github.com/TabbyML/tabby, an open source alternative to Copilot.

[1]: https://arxiv.org/abs/2206.12839

[2]: https://arxiv.org/abs/2303.12570