Simulating crowds without crowds

2018 edition

Beatriz Cabrero Daniel

Simulating groups of agents navigating in a human-like fashion has many relevant applications: the film and game industries, 3D web environments, robotics, architectural design, etc. Besides animating the characters and synthesizing the virtual environment, a key aspect of Crowd Simulation (CS) is generating believable trajectories for each agent. Motion models, particularly those based on local decisions, try to capture how people move and coordinate without any explicit communication [1]. Current approaches to CS, however, require extensive manual tuning (which is inefficient and subjective), rely on reference data that is costly to obtain (annotated video tracking), or become too time-consuming in high-dimensional parameter spaces [2].
We present a machine learning framework based on synthetic crowd simulations that partly avoids those issues. By defining a simulation quality function (which penalizes non-human-like behaviors regardless of efficiency) and running many simulations, the framework learns the best parameter values for steering each agent. In this way it finds an equilibrium between factors such as “acceptable closeness to others”, “desire to move towards the goal”, or “avoidance of sharp turns”. The learning algorithm, a simplified version of Iterative Importance Sampling (IIS), consists of running randomly initialized scenarios and making informed changes to the parameter values until convergence (i.e., acceptable performance is reached).
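As an illustration, the core sampling loop could look like the following sketch. The Gaussian proposal, the elite-refit update, the three-dimensional parameter vector, and all names (run_simulation, quality, tol, etc.) are assumptions made for the example, not our exact implementation:

```python
import numpy as np

def learn_parameters(run_simulation, quality, n_samples=100, n_elite=10,
                     n_iterations=20, tol=1e-3):
    """Sampling-based learning of motion-model parameters (illustrative sketch).

    run_simulation(theta) -> trajectories of all agents in one random scenario
    quality(trajectories) -> scalar simulation quality (higher is better)
    """
    # Start from a broad distribution over the parameter space (illustrative bounds).
    mean = np.array([0.5, 0.5, 0.5])   # e.g. closeness, goal attraction, turn-smoothness weights
    cov = np.eye(3) * 0.25

    for _ in range(n_iterations):
        # Sample candidate parameter combinations from the current distribution.
        thetas = np.random.multivariate_normal(mean, cov, size=n_samples)
        scores = np.array([quality(run_simulation(t)) for t in thetas])

        # Keep the best-performing candidates and refit the sampling distribution
        # (the "informed change" to the parameter values).
        elite = thetas[np.argsort(scores)[-n_elite:]]
        new_mean = elite.mean(axis=0)
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(3)

        # Stop once the distribution stops moving, i.e. acceptable performance reached.
        if np.linalg.norm(new_mean - mean) < tol:
            mean = new_mean
            break
        mean = new_mean

    return mean
```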
The Learning Crowds framework, using our version of IIS, a velocity-based motion model, and an arbitrary simulation quality function, can pick the parameter combination that maximizes the performance of all agents. Depending on the simulation quality function (e.g. one that does not penalize collisions between agents), we learn different parameter values that lead to different behaviors (e.g. reaching the exit quickly becomes more important than avoiding bumps into other agents). Therefore, parameters do not need to be manually tuned to generate believable crowd simulations in specific scenarios. Instead, a quality function must be defined that takes into account what is known about human navigation.
The simulation quality function could vary according to the environment, changing the agents' priorities depending on the situation they are in (e.g. the acceptable closeness to other agents at a concert venue versus while strolling through a deserted mall). In a sense, picking which “sensory input” to use means learning the appropriate motion model that agents would use to navigate a particular scenario. So far, we work with distance to others, sharpness of turns, and goal attraction.
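As an illustration, a quality function combining these three terms could be sketched as follows. The penalty forms, weights, trajectory representation, and all names are assumptions made for the example, not the exact function we use:

```python
import numpy as np

def simulation_quality(trajectories, goals, w_close=1.0, w_turn=0.5, w_goal=1.0,
                       min_comfort_dist=0.5):
    """Illustrative quality: penalize non-human-like behavior, regardless of efficiency.

    trajectories: array (agents, timesteps, 2) of positions
    goals: array (agents, 2) of goal positions
    """
    n_agents, n_steps, _ = trajectories.shape
    penalty = 0.0

    # Closeness to others: penalize agent pairs closer than a comfort distance.
    for t in range(n_steps):
        pos = trajectories[:, t]
        diff = pos[:, None, :] - pos[None, :, :]
        dist = np.linalg.norm(diff, axis=-1) + np.eye(n_agents) * 1e9  # ignore self-distances
        penalty += w_close * np.sum(np.maximum(0.0, min_comfort_dist - dist)) / 2.0

    # Sharpness of turns: penalize large heading changes between consecutive steps.
    vel = np.diff(trajectories, axis=1)
    headings = np.arctan2(vel[..., 1], vel[..., 0])
    turn = np.abs(np.diff(headings, axis=1))
    turn = np.minimum(turn, 2 * np.pi - turn)   # wrap angles to [0, pi]
    penalty += w_turn * turn.sum()

    # Goal attraction: penalize each agent's final distance to its goal.
    penalty += w_goal * np.linalg.norm(trajectories[:, -1] - goals, axis=-1).sum()

    return -penalty   # higher quality = fewer penalties
```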
With this approach, we avoid manual parameter tuning entirely, rely only on synthetic data (which is less costly to obtain and flexible to generate), can study situations that would otherwise be impossible to observe (e.g. evacuations), and the sampling-based learning reduces the time needed to find a good parameter combination.
[1] J. Ondrej et al., ACM Transactions on Graphics, 2010.
[2] D. Wolinski et al., Computer Graphics Forum, 2014.