This is the hard part. I'd say my current company does a poor job of it, and so has every other company I've worked for. They've treated stage like an internal playground for devs to validate that their features work, not a representative copy of prod with all the scaling problems and user-data funkiness that come with it. Here are some options I see:

- load a replica of your prod DB to stage daily/weekly and have all the same ETL jobs running

- set up load testing or user-behavior regression tests that automatically go through critical pathways like user authentication and registration (just the "bare essentials" functionality, since writing these is tedious). This might be a good chance to use traffic capture to at least get started and make setting up these behavior tests easier

- if it's a consumer-facing product, have employees dogfood the product on stage

- if it's a product for businesses, run your own business off stage, or off a third, slightly more stable "internal" environment, to create some consequences for not keeping it running smoothly
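To make the behavior-test option a bit more concrete, here's a minimal sketch of a register-then-login check you could run against stage on a schedule. The base URL, the `/api/register` and `/api/login` paths, the 201/200 status codes, and the `token` field are all made up for illustration, so swap in whatever your API actually does.

```python
import json
import urllib.request
import uuid

# Hypothetical stage URL; replace with your own.
STAGE_BASE_URL = "https://stage.example.com"


def http_post_json(url, payload):
    """POST a JSON body and return (status_code, parsed_json_body).

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a
    broken endpoint surfaces as an exception rather than a status code.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def check_register_then_login(post=http_post_json, base_url=STAGE_BASE_URL):
    """Walk the register -> login path once and return a list of failures."""
    failures = []
    # Unique email per run so repeated runs don't collide on stage.
    email = f"smoke-{uuid.uuid4().hex}@example.com"
    credentials = {"email": email, "password": "not-a-real-password"}

    # Endpoint paths and expected status codes below are assumptions.
    status, _ = post(f"{base_url}/api/register", credentials)
    if status != 201:
        failures.append(f"register returned {status}")
        return failures  # no point trying to log in

    status, body = post(f"{base_url}/api/login", credentials)
    if status != 200:
        failures.append(f"login returned {status}")
    elif "token" not in body:
        failures.append("login response had no token")
    return failures
```

Calling `check_register_then_login()` from cron or CI and alerting on a non-empty result gets you a very basic regression check. The `post` argument is injectable so the check itself can be unit tested without touching the network.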

In my experience stage has never carried representative load, so the extra bill is proportionally smaller (in most cases you pay for what you use). I don't know the billing specifics, but you can also consider dropping the log/metric retention window significantly on stage (say 1 month instead of 6 months) to save costs.
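As a back-of-the-envelope check on the retention idea, here's a rough sketch that assumes your bill is dominated by stored volume, priced per GB per month. That's an assumption: plenty of vendors charge mostly on ingest instead, in which case shortening retention saves much less. The ingest rate and price below are invented numbers.

```python
def stored_gb(daily_ingest_gb, retention_days):
    """Steady-state volume held once the retention window has filled."""
    return daily_ingest_gb * retention_days


def monthly_storage_cost(daily_ingest_gb, retention_days, price_per_gb_month):
    """Storage-only cost per month; ignores ingest and query charges."""
    return stored_gb(daily_ingest_gb, retention_days) * price_per_gb_month


# Made-up numbers for illustration only.
DAILY_INGEST_GB = 5
PRICE_PER_GB_MONTH = 0.03

six_months = monthly_storage_cost(DAILY_INGEST_GB, 180, PRICE_PER_GB_MONTH)
one_month = monthly_storage_cost(DAILY_INGEST_GB, 30, PRICE_PER_GB_MONTH)
print(f"6 months: ${six_months:.2f}/mo, 1 month: ${one_month:.2f}/mo")
```

Under that model the storage part of the bill scales linearly with the window, so going from 6 months to 1 month cuts it to roughly a sixth.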

Ultimately I don't think you're going to get the same scaling problems to manifest on stage. It's more of a functionality testing ground IME.

I have 3 YOE as a dev so don't base your whole business plan on my ideas