Simulating hours of robots' work in minutes
Our product uses a fleet of real (not virtual) robots to perform different tasks in a fulfillment warehouse. Simulation is an essential tool in this kind of product: it allows to perform regression tests and test new features without the need for real and expensive hardware, to compare the impact of different algorithms and optimizations, to inject failures, and more.
Tasks performed by physical robots take time (movement over the warehouse, box lifting, etc.), but in simulation, where virtual robots are used, there is no need to wait all that time. I will describe our implementation of the Discrete-Event Simulation approach which allows us to simulate hours of real-life in minutes.
Shortening simulation time improves the development process by providing faster feedback to developers and quicker CI and testing cycles. Another powerful advantage is a more deterministic simulation - using this approach, each component in the system gets equal opportunity (CPU time) in each time tick, which is not affected by the underlying machine that the simulation is running on. Also, it is possible to simulate any date and hour easily, and by that we wouldn't panic before the "Y2K bug".
I will elaborate on some challenges we encountered: time leak of event-driven components, differences between dev and production environments and running a distributed simulation due to the transition to microservices.