I’d recommend that the event producer send a UUID that then serves as the primary key on the events table. The producer should also send the timestamp at which the event occurred.
I could be missing something, but that seems to solve both the duplicate-event problem (an upsert keyed on the UUID makes duplicate writes a non-issue) and the timing issues (events can be ordered by when they occurred rather than when they arrived).
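A minimal sketch of that suggestion, assuming a SQL store with upsert support. SQLite (via Python's built-in `sqlite3`) is used purely for illustration, and `ON CONFLICT ... DO NOTHING` needs SQLite 3.24 or later. The table layout, column names, and the helpers `make_event`, `create_table` and `write_event` are hypothetical. A redelivered event with the same UUID becomes a no-op, and sorting on the producer's `occurred_at` instead of arrival order handles events that show up late:

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def make_event(name, payload, occurred_at=None):
    """Producer side (hypothetical helper): attach a UUID and the event time."""
    if occurred_at is None:
        occurred_at = datetime.now(timezone.utc)
    return {
        "event_id": str(uuid.uuid4()),
        # Fixed-width UTC ISO strings sort correctly as text.
        "occurred_at": occurred_at.isoformat(timespec="microseconds"),
        "name": name,
        "payload": payload,
    }


def create_table(conn):
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            event_id    TEXT PRIMARY KEY,  -- producer-generated UUID
            occurred_at TEXT NOT NULL,     -- producer-side event time
            received_at TEXT NOT NULL,     -- consumer-side ingest time
            name        TEXT NOT NULL,
            payload     TEXT NOT NULL
        )
        """
    )


def write_event(conn, event):
    """Idempotent write: a second copy with the same UUID is ignored."""
    conn.execute(
        """
        INSERT INTO events (event_id, occurred_at, received_at, name, payload)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(event_id) DO NOTHING
        """,
        (
            event["event_id"],
            event["occurred_at"],
            datetime.now(timezone.utc).isoformat(timespec="microseconds"),
            event["name"],
            event["payload"],
        ),
    )


conn = sqlite3.connect(":memory:")
create_table(conn)
late = make_event("checkout", "{}", datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc))
early = make_event("add_to_cart", "{}", datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
write_event(conn, late)   # arrives first
write_event(conn, early)  # arrives second, but happened earlier
write_event(conn, late)   # retry of an already-stored event: no-op
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
order = [r[0] for r in conn.execute("SELECT name FROM events ORDER BY occurred_at")]
```

Here `count` ends up as 2 and `order` as `["add_to_cart", "checkout"]`. `DO NOTHING` keeps the first copy; `DO UPDATE` would let a later copy overwrite it, which only makes a difference if two copies with the same ID can carry different content.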
Though I’m still incredibly skeptical of “real-time analytics.” The number of business cases that require actual real-time analysis is pretty limited. High-frequency trading and...?
However, this event ID is not enough to identify and then dedupe all types of duplicate events. This blog post provides more information:
https://snowplowanalytics.com/blog/2015/08/19/dealing-with-d...
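One way an event ID alone can fall short, sketched here as an assumption rather than a summary of the linked post: two events may share an ID while carrying different content (for example, a client whose ID generator is broken). An upsert on the ID would silently drop the second, genuinely distinct event. A minimal sketch that also hashes the event body (a "fingerprint") to tell true redeliveries apart from ID collisions; the helper names and the `ignore` field list are hypothetical:

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(event, ignore=("received_at",)):
    """Hash the event body, skipping fields that legitimately differ per delivery.

    The `ignore` tuple is an assumption; choose it for your own schema.
    """
    body = {k: v for k, v in event.items() if k not in ignore}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(events):
    """Split events into kept events, exact duplicates, and ID collisions."""
    seen = defaultdict(set)  # event_id -> fingerprints already accepted
    kept, exact_dupes, collisions = [], [], []
    for ev in events:
        fp = fingerprint(ev)
        fps = seen[ev["event_id"]]
        if fp in fps:
            exact_dupes.append(ev)  # same ID, same content: a retry, safe to drop
        elif fps:
            collisions.append(ev)   # same ID, different content: not a retry
            fps.add(fp)
        else:
            kept.append(ev)         # first time this ID has been seen
            fps.add(fp)
    return kept, exact_dupes, collisions


events = [
    {"event_id": "a", "name": "click", "payload": {"x": 1}, "received_at": "t1"},
    {"event_id": "a", "name": "click", "payload": {"x": 1}, "received_at": "t2"},
    {"event_id": "a", "name": "click", "payload": {"x": 2}, "received_at": "t3"},
    {"event_id": "b", "name": "view", "payload": {}, "received_at": "t4"},
]
kept, exact_dupes, collisions = classify(events)
```

With this input, the second `"a"` event is an exact duplicate and the third is a collision. What to do with collisions (re-key them, keep both, or flag them for review) is a separate policy decision.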
Big thanks to pragmacoders for putting this tutorial together! It's awesome seeing what you are doing with the Snowplow platform :-)