Creating a large-scale event processing system can be a daunting task, especially if you want it “stupid simple” and wrapped around each client’s needs. We built a straightforward solution for this using Python 3 and other open-source tools.
Main issues to solve for a system that needs to be both performant and scalable:
handling a throughput of 1 million events per minute on a 4-core AWS instance;
following the principle of least astonishment;
data aggregation, and how Python’s standard library and data structures can help (see the aggregation sketch after this list);
failsafe and profiling mechanisms that can be applied to any Linux service in production;
addressing unexpected behaviors of Python’s Standard Library, such as reading from a file while it is being written (see the file-reading sketch after this list);
tackling a sudden, spectacular cloud instance failure.
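To make the aggregation point concrete, here is a minimal sketch of what the standard library gives you for free: collections.Counter and collections.defaultdict over a stream of events. The event shape (a dict with “type” and “value” keys) is purely illustrative, not our production schema.

```python
from collections import Counter, defaultdict

# Hypothetical event shape for illustration: {"type": str, "value": float}.
events = [
    {"type": "click", "value": 1.0},
    {"type": "view", "value": 0.5},
    {"type": "click", "value": 2.0},
]

counts = Counter(e["type"] for e in events)  # number of events per type
totals = defaultdict(float)                  # summed value per type
for e in events:
    totals[e["type"]] += e["value"]

print(counts)        # Counter({'click': 2, 'view': 1})
print(dict(totals))  # {'click': 3.0, 'view': 0.5}
```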
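And a taste of the file gotcha: when another process is still appending to a file, readline() can hand you a half-written line (no trailing newline yet). A minimal follow-the-file sketch that buffers until the newline arrives could look like this; the path and polling interval are illustrative, not taken from our actual service.

```python
import time

def follow(path, poll_interval=0.5):
    """Yield complete lines from a file that another process keeps appending to.

    readline() may return a partial line (without a trailing newline) if the
    writer has not finished it yet, so we buffer until the newline arrives.
    """
    buffer = ""
    with open(path, "r") as f:
        while True:
            chunk = f.readline()
            if not chunk:                 # nothing new yet; wait for the writer
                time.sleep(poll_interval)
                continue
            buffer += chunk
            if buffer.endswith("\n"):     # only emit fully written lines
                yield buffer.rstrip("\n")
                buffer = ""

# Example usage (path is hypothetical):
# for line in follow("/var/log/events.log"):
#     process(line)
```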
The alternative to this system would be to adopt existing technology stacks that might be too general, add complexity, bloat, and cost, and still need extensive work to solve your specific problem. Moreover, our approach resulted in a drop of over 85% in hardware utilisation.
Context: Production Software – II (where good coding reduces the client’s bill)