Fri, 12 Jul 2019 13:16:27 GMT
It’s December in London. Grzegorz 'MaNa' Komincz, one of the world’s best StarCraft II players, is about to take on a machine learning system called ‘AlphaStar’ in the headquarters of Alphabet’s AI research lab, DeepMind.
MaNa is confident he’ll defeat this new piece of technology: “I’m hoping for a 5-0 win... but I think the realistic goal would be 4-1 in my favour.” He cracks his knuckles, adjusts his mouse and keyboard and it’s on...
A few hours later, AlphaStar has annihilated its human opponent 5-0. What happened?
Enter Reinforcement Learning
Deep reinforcement learning is one of the core ideas behind AlphaStar. It describes a class of machine learning systems that learn on their own from first principles: they interact with their environment on a trial and error basis, trying to maximise a given reward signal.
Reinforcement learning systems can, for example, learn to play Super Mario at a superhuman level from scratch just by trying to maximise the game score.
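To make the trial and error idea concrete, here is a minimal sketch in Python. It is not how AlphaStar works, and every name in it (run_bandit, reward_probs, epsilon) is made up for illustration. It assumes a toy "game" with three possible actions, each paying out a reward with a different hidden probability. The agent starts out knowing nothing, mostly repeats whatever has worked best so far, and occasionally tries something at random.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off most often.

    reward_probs is hidden from the agent; it only ever sees the rewards.
    All names and numbers here are illustrative assumptions.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # the agent's current guess per action
    pulls = [0] * len(reward_probs)        # how often each action was tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, even one that looks bad.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])

        # The environment answers with a reward signal of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Nudge the running average for this action towards the new reward.
        pulls[action] += 1
        estimates[action] += (reward - estimates[action]) / pulls[action]
        total_reward += reward

    return estimates, total_reward / steps


estimates, average_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, round(average_reward, 2))
```

Nobody tells the agent which action is best. It works that out purely from the rewards it collects, and the small share of random tries is what stops it from settling on the first thing that happens to work.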
Now, simple Nintendo games are one thing; StarCraft II is quite another. StarCraft is a real-time strategy game that requires long-term planning in a vastly complex environment, with no single best strategy and a lot of hidden information.
After his defeat, MaNa said something truly humbling for a pro at the top of his game, who has spent his entire life perfecting his skills: ‘I thought I was learning something’. MaNa wasn’t trying to explain why he lost; he was trying to understand how AlphaStar won, and to learn from it.
Recent advances in machine learning show that claiming “only humans can do this” in any field is probably not the smartest idea. Still, that’s what a lot of people in the creative industries do.
Instead, everyone who considers themselves creative should really take what MaNa said to heart and try to understand these systems. There is a lot we can learn from them. To understand what I mean by that, let’s look at how AlphaStar won against MaNa.
AlphaStar won not just by controlling the computer better or faster - although in some situations this was definitely the case - it also won by pursuing creative strategies.
There’s a funny moment in game four where AlphaStar is attacking MaNa with a huge army of relatively weak units, called ‘Stalkers’. Their number keeps rising to the point that the commentators are just confused. No human player would hold on to this strategy in this situation. But AlphaStar does, ultimately defeating MaNa’s ‘stronger’ army.
The system also seemingly overproduces worker units, so-called ‘Probes’, way beyond the limit humans find useful, and it keeps paying off in bizarre ways later in the game.
So, what does all of this have to do with creativity? AlphaStar won using ideas that a human player would think of as wrong or wouldn’t consider at all. It challenged the accepted wisdom that the StarCraft community built up in the 20 years the game has been around.
Another DeepMind system, 'AlphaGo', which beat one of the best human players at the ancient Chinese board game Go in 2016, also exhibited creative behaviour. Its most famous move, which became known as 'Move 37', similarly went against knowledge that human Go players had built up over thousands of years. Commentators thought it was a mistake at first - their reactions say it all - but this exact move proved crucial to winning later.
In a recent interview with MIT Technology Review, David Silver, who co-led the creation of both AlphaZero (the successor of AlphaGo) and AlphaStar, said: “AlphaZero has to figure out everything for itself. Every single step is a creative leap. Those insights are creative because they weren’t given to it by humans. And those leaps continue until it is something that is beyond our abilities and has the potential to amaze us.”
The important part is not whether or not a system can be truly creative - that depends completely on how we define creativity and we don’t fully understand what it is or how it works - it’s that systems like these have the potential to amaze and ultimately inspire us.
Demis Hassabis, the co-founder of DeepMind, pretty much summed it up during a talk at the Royal Academy of Arts, describing AlphaZero’s impact on the Go community: “Many of the top Go players that I’ve spoken to said that it’s freed their mind from the shackles of tradition. [...] They’ve come up with their own brilliant new ideas, that [...] they’ve been told as junior Go players not to do. And now they’re able to explore fully their own creativity.”
To Infinity And Beyond
AlphaStar doesn’t make pro gamers obsolete, but instead helps them achieve a deeper understanding of the game they love - ultimately making them better players. We need to stop thinking of AI as a threat or just using it as a buzzword and start seeing it as an opportunity. Ultimately, we could design machine learning systems that give us a deeper understanding of our own creativity and make us better creatives.
Currently machine learning is limited to narrow scenarios - asking AlphaStar to drive a car is not a good idea - and needs massive amounts of data to train on. AlphaStar gained hundreds of years of gameplay experience in the weeks leading up to the matchup with MaNa.
One of the biggest hurdles for reinforcement learning is that it needs a score to evaluate its actions, the so-called ‘reward signal’. Now try to define a clear reward signal for the creative problem that you are trying to crack… not easy, right? But if anyone can help find a solution, it’s us.
Machine learning can take our creativity to places we haven’t been yet. But only if we are part of the development of these systems can we make sure human creativity is at the heart of them. And only if we stop ignoring machine learning and start trying to really understand it can we be part of this process and help shape the future of creativity.
Gaming communities are leading by example, embracing the disruption that AI has brought to their field and using it to get better. Creatives from all disciplines should embrace developments in AI the same way.
We can already apply a lot of the lessons from AlphaStar and AlphaZero to creativity way beyond gameplay: Let’s expose ourselves to randomness, to new things and new ways of thinking. Let’s put ourselves in situations that we can’t predict and learn from them. Let’s experiment more, challenge what everybody takes as a given and explore new paths. If simple principles like these can give birth to systems like AlphaStar, imagine what they can do for our own brains.