What does HackerNews think of goreplay?

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence in code deployments, configuration changes and infrastructure changes.

Language: Go

#42 in Go
#6 in Testing
I love to see more activity in this area!

I'm the maintainer of GoReplay https://github.com/buger/goreplay and have worked in this area for the last 10 years.

It is quite a hard problem to solve, because you have to deal with state differences between test and production environments. I love your approach to mocking dependencies and leveraging OpenTelemetry. It can potentially solve some of the state issues, but it still requires modifying user code. I wonder if it could be done purely with OpenTelemetry (e.g. you depend on a typical OTel setup), and then read the data directly from the OTel DB.

Cheers!

Hey as a (way smaller) competitor of yours, I honestly feel you.

I understand you may not want my advice. But anyways, here it goes :)

I have dealt with similar issues too at my full-time job. We ingest data in the billions range too, and our costs are a lot lower than what you describe.

* On the public endpoint, can you move the first point of contact closer to the edge? You want to block requests before they even reach your ingestion pipeline. If not, add a few geographical Load Balancers with the sole job of accepting requests before forwarding them to your Lambda. Divert traffic at the DNS level using a geo routing policy on Route53.

* Are you also hitting DynamoDB for every request as part of your spam system? I found it quite expensive on-demand, unless you provision capacity. We used to pay tens of thousands per month for DynamoDB in the past, when a single Redis instance would have done it. Especially if it's temporary, non-critical data such as spam detection.

* Do you batch the incoming events before triggering SQS?

* How are your networking costs going? They tend to creep up in the AWS bills too.

It just sounds to me like you clearly need something in-between the public endpoint and those Lambdas. But maybe I'm missing some info. Otherwise you're going to keep paying for capacity that a "traditional" server could handle, without costing you more for each request.

If you can't fight back, and don't want real requests to be lost, put a server at the battle-front, and buffer all raw HTTP requests into a queue made for this (like Kinesis). You can then process events and recover the data at your own pace. I did this in the past, and could handle 60k+ req/s with 2 instances on AWS. I used a simple Go tool to capture the traffic https://github.com/buger/goreplay
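For reference, the capture-and-buffer pattern above maps onto GoReplay's documented CLI flags roughly like this (hostnames are placeholders):

```shell
# Capture live HTTP traffic on port 80 and persist the raw requests to a
# file, so nothing is lost even if downstream processing stalls.
gor --input-raw :80 --output-file requests.gor

# Later, replay the recorded requests into another endpoint at your own pace.
gor --input-file requests.gor --output-http "http://staging.example.com"
```

The same `--output-http` flag can point at a live environment instead of a file, which is how the traffic-mirroring setup mentioned above works.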

Also, at my full-time job we use Kinesis to buffer all analytics events, it's cheaper than SQS and handles billions of data points per month. This also keeps the ingest rate constant, but you seem to already do that with your workers.

Full disclosure: I'm Anthony from http://panelbear.com/ , and I just want to offer honest help.

There are lots of variations depending upon your use case.

For unit testing and CI you may want mock objects that are implemented in the same language as your code. Google search for "mock object". That's where you'll find Mockito (Java), Mocha spies (NodeJS), or Testify (Go). The list never ends.

Specifically for unit testing of a UI, you may want your browser driver to handle this, ex: Cypress has built-in support for mock AJAX endpoints. https://docs.cypress.io/guides/guides/stubs-spies-and-clocks...

If you want an endpoint you can call, Postman has a feature for this, there are several others like this in the comments (JSON Server, mmock, mountebank, etc.). https://learning.postman.com/docs/postman/mock-servers/setti...

If you need to capture traffic, check out goreplay or mitmproxy: https://github.com/buger/goreplay https://docs.mitmproxy.org/stable/

There is a whole class of "VCR" projects for recording traffic, these tend to be language specific (VCR is in Ruby), but there are ports to other languages: https://github.com/vcr/vcr https://github.com/bblimke/webmock

The vendor products tend to be labelled Service Virtualization. I used to work for one of those companies, ITKO, we were acquired by CA Technologies (now Broadcom) in 2011. There are vendor products from Micro Focus, Tricentis, Broadcom, Parasoft, etc.

It's important to think about your use case: local development, unit testing, CI, integration testing, performance testing, recording vs. programming, protocol support, payload support, etc. Many of the tools focus on just a subset of these areas.

InfluxDB isn't as easy to operate as it sounds.

Anything built on top of Postgres has years of accumulated knowledge on how to tune the db, but there's not much of that for InfluxDB. You are on your own.

You cannot even easily upgrade InfluxDB, especially when you want to use some new feature such as enabling TSI.

When something is wrong, again, you're on your own.

HA for InfluxDB isn't available either.

Yes, with all of those pain points, I'm still using InfluxDB. I even had to add in https://github.com/buger/goreplay to replay traffic to another instance during upgrades.

I had to write a tool to re-read old data and import it into the new instance, instead of using their own import/restore.

There are many gotchas with InfluxDB. It's hard to explain to devs why they shouldn't use high-cardinality values for tags, or use too many tags. For example, people are used to the `tag` concept of `fluentd` and put stuff like user IDs and device IDs into tags...

I want to log slow query times, but I cannot use the whole query as a tag because of its very high cardinality.

However, I kept using InfluxDB. I want to support it so we have something better than Postgres. I'm sick of SQL queries (personally), and I would like Flux to be successful.

It's similar to MongoDB: it was bad years ago but is very good nowadays. I guess the same thing will happen with InfluxDB. And indeed, they do improve over the years.

It's similar to how we use Ruby vs C: it's about productivity. And despite the above pain points, I always find a way to solve them eventually. The tooling around InfluxDB is nice too, especially Grafana.