As if it weren’t enough to have AI tanning humanity’s hide (figuratively, for now) at every board game in existence, Google AI has one in the works to destroy us all at ping pong as well. For now they emphasize it is “cooperative,” but at the rate these things improve, it will be taking on pros in no time.
The project, called i-Sim2Real, isn’t just about ping pong but rather about building a robotic system that can work with and around fast-paced and relatively unpredictable human behavior. Ping pong, AKA table tennis, has the advantage of being pretty tightly constrained (as opposed to, say, basketball or cricket) while striking a balance between complexity and simplicity.
“Sim2Real” is a way of describing an AI creation process in which a machine learning model is taught what to do in a virtual environment or simulation, then applies that knowledge in the real world. It’s necessary when it could take years of trial and error to arrive at a working model — doing it in a sim allows years of real-time training to happen in a few minutes or hours.
But it’s not always possible to do something in a sim; for instance, what if a robot needs to interact with a human? That’s not so easy to simulate, so you need real-world data to start with. That creates a chicken-and-egg problem: you can’t collect the human data without a robot for the human to interact with, and you can’t build that robot without the human data.
The Google researchers escaped this pitfall by starting simple and making a feedback loop:
[i-Sim2Real] uses a simple model of human behavior as an approximate starting point and alternates between training in simulation and deploying in the real world. In each iteration, both the human behavior model and the policy are refined.
It’s OK to start with a bad approximation of human behavior, because the robot is also only just beginning to learn. More real human data gets collected with every game, improving the accuracy and letting the AI learn more.
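The alternating loop described above can be sketched as a toy Python program. To be clear, everything here is an illustrative simplification, not the paper’s actual code: the “human” is boiled down to a single number (the average speed of their returns), and the “policy” is just the robot’s matching estimate of it.

```python
import random

random.seed(0)

TRUE_HUMAN_SPEED = 5.0  # unknown to the robot; used only to simulate reality


def train_in_sim(human_model):
    """Fit the policy to the current (approximate) human model.
    In the real system, this is where the fast simulated training happens."""
    return human_model  # toy optimal policy: mirror the modeled speed


def deploy_in_real_world(policy, n_rallies=50):
    """Play against the real human and record noisy observations."""
    return [TRUE_HUMAN_SPEED + random.gauss(0, 0.5) for _ in range(n_rallies)]


def refine_human_model(old_model, observations, lr=0.5):
    """Nudge the human model toward what was actually observed."""
    mean_obs = sum(observations) / len(observations)
    return old_model + lr * (mean_obs - old_model)


human_model = 1.0  # deliberately bad initial approximation of the human
for iteration in range(6):
    policy = train_in_sim(human_model)    # simulation phase
    data = deploy_in_real_world(policy)   # real-world phase
    human_model = refine_human_model(human_model, data)
```

Even starting from a badly wrong guess, each iteration of sim training plus real-world play pulls the human model (and with it the policy) closer to the real behavior, which is the essence of the feedback loop.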
The approach was successful enough that the team’s table tennis robot was able to sustain a rally of 340 hits. Check it out:
It’s also able to return the ball to different regions, granted not with mathematical precision, but accurately enough that it could begin to execute a strategy.
The team also tried a different approach for more goal-oriented behavior, like returning the ball to a very specific spot from a variety of positions. Again, this isn’t about creating the ultimate ping pong machine (though that may well happen anyway) but about finding ways to efficiently train with and for human interactions without making people repeat the same action thousands of times.
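One way to get that kind of goal-oriented behavior is to feed the target spot to the policy as an input, so a single policy covers many targets instead of needing a separate drill for each. The sketch below is an assumed, heavily simplified illustration of that idea (the linear “policy,” one-dimensional toy physics, and training rule are mine, not Google’s):

```python
import random

random.seed(1)


def returned_landing_spot(action):
    """Toy physics: the ball lands where the paddle 'aims', plus noise."""
    return action + random.gauss(0, 0.1)


def goal_conditioned_policy(ball_position, goal, weight):
    """A linear policy conditioned on both the incoming ball and the goal."""
    return weight * goal  # toy: ignore ball_position, aim at the goal


# Train the single policy weight by regressing landing error over random
# goals, so the same policy learns to hit any requested target region.
weight = 0.0
for step in range(2000):
    goal = random.uniform(-1.0, 1.0)          # sample a target region
    action = goal_conditioned_policy(0.0, goal, weight)
    landing = returned_landing_spot(action)
    error = landing - goal
    weight -= 0.05 * error * goal             # gradient step on squared error
```

Because the goal is sampled fresh each step, no single target has to be repeated thousands of times, which mirrors the article’s point about not making people drill one identical motion over and over.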
You can learn more about the techniques the Google team employed in the summary video below: