What does HackerNews think of hyperdx?
Resolve production issues, fast. An open source observability platform unifying session replays, logs, metrics, traces and errors.
We're building an open source, dev friendly observability tool (think Datadog, but something developers actually love to use and companies can actually afford).
We're at the intersection of needing to build rock-solid infrastructure ingesting TBs of data, searching it incredibly quickly and scalably, and layering on top a buttery smooth DX from our language-specific SDKs, APIs and web app.
We're super early and hiring our first founding engineer. We already have a cloud product customers pay for and love, loads of runway regardless of the wider economy, 5k+ GitHub stars weeks after our OSS launch, and tons of hard technical problems.
The vast majority of our work is open source, so you can get a sense of what you'd be working with here: https://github.com/hyperdxio/hyperdx
Our job listing is here as well: https://www.ycombinator.com/companies/hyperdx/jobs/zFXTbzl-f...
Come by our Discord as well if you just want to talk shop: https://discord.gg/FErRRKU78j
I'm Mike, one of the cofounders. If you love shipping quickly and want to help us build from the ground up an open source developer tool (that devs won't hate when they're on-call), let me know! mike [at] hyperdx.io
If this was us [1] (an OSS Datadog alternative posted last week :D ), we do have quite a bit (11, 2 are optional) but we're working on bringing it down.
We currently split our ingestion pipeline into 3 independently scalable bits, but we can probably bring it down to 1 for any small-scale deployment. Otherwise we do have the standard need for a cache (Redis), db (MongoDB), main storage (ClickHouse), and then the standard API server + frontend, and a separate task to run alerts.
There are likely a few more things we can merge together, but that comes at the expense of no longer scaling to workloads typical of a company, and of divergence in the code base, which overall doesn't seem like the right tradeoff.
IMO the real concern, as someone who personally owns too many tiny VM instances, is resource footprint - which can be tuned down to ~1GB of memory for us (depending on server load).
Feel free to open an issue too if you think there's something we can adjust there as well.
ClickHouse gives users much greater flexibility in tradeoffs than either a time-series or inverted-index based store could offer (along with S3 support). There's nothing like a system that can balance high performance AND (usable) high cardinality.
[1] https://github.com/hyperdxio/hyperdx
disclaimer (in case anyone just skimmed): I'm one of the authors of HyperDX
"HyperDX helps engineers figure out why production is broken faster by centralizing and correlating logs, metrics, traces, exceptions and session replays in one place. An open source and developer-friendly alternative to Datadog and New Relic."
Just perfect. Bravo.
--
As a merc, I never understood the why of Datadog (or equiv). The teams and projects I rotated thru each embraced the "LOG ALL THE THINGS!" strategy. No guiding purpose, no esthetics. General agreement about need to improve signal to noise ratio. But little courage or gumption to act. And any such efforts would be easily rebuffed by citing the parable of Chesterfordstorm's Fences of Doom and something something about velocity.
Late last century, IT projects, like CRMs and ERPs, were plagued by over-collection of data. Opaque provenance, dubious (data) quality, unclear ownership, subtractive value propositions (where the whole is worth less than the parts). No, no, don't remove that field. We might need it some day.
Today's "analytics" projects are the same, right? Every drive-by stakeholder tosses in a few tags, some misc fields, a little extra meta. And before anyone can say "kanban", the stone soup has accreted enough mass to become a gravity well threatening implosion, dragging the entire org-chart into the gaping maw of our universe's newest black hole.
Am I wrong?
But logging is useful, right? Or at least has that potential.
The last time I designed a system end-to-end, that's kinda what we did. Listed all the kinds of things we wanted to log. Sorta settled on formats and content (never really ever done). Did regular log bashes to explain and clear anomalies. Scripts for grooming and archiving. (For one team I rotated thru, most of their spend was on just CloudWatch. Hysterical.)
But my stuff wasn't B2C, so wasn't tainted by the attention economy, manufactured outrage, or recommenders. No tags, referrers, campaigns, etc. It was just about keeping the system up and true. And resolving customer support incidents asap.
Does anyone talk or write about this? (Those SRE themed novels are now buried deep in my to-read pile.)
I'd like some cookbooks or blue prints which show some idealized logging strategies, with depictions of common enough troubleshooting scenarios.
Having something authoritative to cite could reduce my resemblance to an Eeyore. "Hey, teammates, you know what'd be really great?! Correlation IDs! So we can see how an action percolates thru our system!"
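For anyone unfamiliar with the idea, here is a minimal sketch of correlation IDs using only the Python standard library. It is an illustration under assumptions, not anything from HyperDX or Datadog: the names correlation_id, CorrelationIdFilter, handle_request and charge_card are all hypothetical. One ID is set at the edge of a request and stamped on every log line written while that request is being handled, so the lines can later be grouped by ID.

```python
import contextvars
import io
import logging
import uuid

# Holds the correlation ID for the current request; "-" means none is set.
# (Hypothetical name, chosen for this sketch.)
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True  # never drop the record, only annotate it


def build_logger(stream):
    """Return a logger whose lines are prefixed with the correlation ID."""
    logger = logging.getLogger("demo.correlation")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(correlation_id)s %(name)s %(message)s")
    )
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    return logger


def charge_card(logger):
    # A downstream step: it never receives the ID explicitly,
    # yet its log line still carries it.
    logger.info("charging card")


def handle_request(logger, incoming_id=None):
    """Reuse the caller's ID if one was passed in, otherwise mint a new one."""
    cid = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        logger.info("request received")
        charge_card(logger)
        logger.info("request done")
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
    return cid


def run_demo():
    stream = io.StringIO()
    logger = build_logger(stream)
    cid = handle_request(logger, incoming_id="req-123")
    return cid, stream.getvalue().splitlines()
```

Calling run_demo() returns the ID plus the captured log lines, each of which starts with that ID. In a real system the same ID would also be forwarded to other services (for example in a request header) so the trail continues across process boundaries; that part is left out of this sketch.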
Just curious.
PS- Datadog's server hexagon map/chart thingie is something else. The kind of innovation that wins prizes.