Global media has called inventor and author Ray Kurzweil everything from ‘restless genius’ and ‘the ultimate thinking machine’ to ‘rightful heir to Thomas Edison’ and ‘one of the 16 revolutionaries who made America.’ His list of inventions is long, from flatbed scanners to synthesisers and reading machines for the visually impaired. He has over 20 honorary doctorates and has founded a string of successful companies.
There is an interesting story about how this genius, who had never been known to work for any company other than his own, joined Google as director of engineering in 2012. He wanted to start a company to build a truly intelligent computer and knew that no company other than Google had the kind of resources he needed. When he went to meet Larry Page about getting those resources, Page convinced Kurzweil to join Google instead. Considering that Google was already well into deep learning research, Kurzweil agreed. It is said that the aura of deep learning closed the deal.
What is this deep learning all about, anyway? It is machine learning at its best: machine learning that tries to mimic the way the brain works, in order to get closer to the real meaning of artificial intelligence (AI).
Taking machine learning a bit deeper
Machine learning is all about teaching a machine to do something. Most current methods use a combination of feature extraction and modality-specific machine-learning algorithms, along with thousands of examples, to teach a machine to identify things like handwriting and speech. The process is not as easy as it sounds. It requires a large set of data, heavy computing power and a lot of background work. And, despite these tedious efforts, such systems are not foolproof. They tend to fail in the face of discrepancies. For example, it is easy for a machine-learning system to confuse a hurriedly written 0 with a 6. It can understand brother but not a casually scribbled bro. How can such machine learning survive in the big, bad, unstructured world?
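To make the traditional pipeline concrete, here is a minimal sketch of hand-crafted feature extraction followed by a simple classifier. Everything in it is an assumption made for illustration: the tiny 5x3 bitmaps, the choice of feature (the amount of ink in each row) and the nearest-centroid classifier are not taken from any real handwriting system, and a real system would train on thousands of examples rather than two.

```python
# Hand-crafted features plus a nearest-centroid classifier.
# The bitmaps, the feature and the labels are illustrative assumptions.

def parse(bitmap):
    """Turn space-separated rows of '#' and '.' into rows of 0/1 pixels."""
    return [[1 if ch == "#" else 0 for ch in row] for row in bitmap.split()]

def extract_features(pixels):
    """Hand-picked feature: the amount of ink in each row."""
    return [sum(row) for row in pixels]

def train(examples):
    """Average the feature vectors of each label into one centroid."""
    grouped = {}
    for label, bitmap in examples:
        grouped.setdefault(label, []).append(extract_features(parse(bitmap)))
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def classify(centroids, bitmap):
    """Return the label whose centroid lies closest to the bitmap's features."""
    features = extract_features(parse(bitmap))

    def squared_distance(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))

    return min(centroids, key=squared_distance)

NEAT_ZERO = "### #.# #.# #.# ###"
NEAT_SIX = "### #.. ### #.# ###"
# A hurried 6: the top stroke curls over and the middle bar is cut short.
HURRIED_SIX = ".## #.# ##. #.# ###"

centroids = train([("0", NEAT_ZERO), ("6", NEAT_SIX)])
print(classify(centroids, NEAT_ZERO))    # 0
print(classify(centroids, NEAT_SIX))     # 6
print(classify(centroids, HURRIED_SIX))  # 0, although a 6 was intended
```

The neat digits are classified correctly, but the hurried 6 lands closer to the centroid for 0 because the hand-picked feature cannot tell the two apart once the strokes get sloppy. That is the kind of discrepancy on which such systems stumble.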
Deep learning tries to solve these problems and take machine learning one step further. A deep learning system learns by itself, much as a child learns to crawl, walk and talk. It is made of multi-layered deep neural networks (DNNs) that mimic the activity of the layers of neurons in the neocortex. Each layer tries to go a little deeper and understand a little more detail.
The first layer learns basic features, like an edge in an image or a particular note of sound. Once it masters this, the next layer attempts to recognise more complex features, like corners or combinations of sounds. Likewise, each layer tries to learn a little more, till the system can reliably recognise objects, faces, words or whatever it is meant to learn.
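The layering idea can be sketched with a toy two-layer network. This is only an illustration under stated assumptions: the weights below are set by hand so that the behaviour is easy to follow, whereas a real deep learning system would learn them from data, and the three-pixel "image" and the names `dense` and `forward` are invented for this example. The first layer responds to simple edges; the second layer combines those edges into a more complex feature, a bright peak.

```python
# A toy two-layer network with hand-set weights (a real system learns them).

def relu(x):
    """Rectified linear unit: pass positive values, clamp the rest to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases):
    """One fully connected layer followed by ReLU."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Layer 1: two edge detectors over a three-pixel strip [a, b, c].
EDGE_WEIGHTS = [
    [-1.0, 1.0, 0.0],  # rising edge: b brighter than a
    [0.0, 1.0, -1.0],  # falling edge: b brighter than c
]
EDGE_BIASES = [0.0, 0.0]

# Layer 2: a "peak" unit that fires only when both edges are present.
PEAK_WEIGHTS = [[1.0, 1.0]]
PEAK_BIASES = [-1.0]

def forward(pixels):
    """Run the strip through both layers and return each layer's output."""
    edges = dense(pixels, EDGE_WEIGHTS, EDGE_BIASES)
    peak = dense(edges, PEAK_WEIGHTS, PEAK_BIASES)
    return edges, peak

for pixels in ([0, 1, 0], [0, 1, 1], [1, 1, 1]):
    edges, peak = forward(pixels)
    print(pixels, "edges:", edges, "peak:", peak)
```

Only the strip `[0, 1, 0]` activates the peak unit, because it is the only input in which both first-layer edge detectors fire. Deeper networks repeat this pattern over many more layers and units.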
With the kind of computing power and software prowess available today, it is possible to model many such layers. Systems that learn by themselves are not restricted by what they have been taught to do, so they can identify a lot more objects and sounds, and even make decisions by themselves. A deep learning system, for example, can watch video footage and notify the guard if it spots someone suspicious.
Google has been dabbling in deep learning for many years now. One of its earliest successes was a deep learning system that taught itself to identify cats by watching thousands of unlabelled, untagged images and videos. Today, we find companies ranging from Google and Facebook to IBM and Microsoft experimenting with deep learning solutions for voice recognition, real-time translation, image recognition, security solutions and so on. Most of these work over a Cloud infrastructure, which taps into the computing power tucked away in a large data centre.
As a next step, companies like Apple are trying to figure out whether deep learning can be achieved with less computing power. Is it possible to implement, say, a personal assistant that works off your phone rather than relying on the Cloud? We take you through some such interesting deep learning efforts.
From photos to maps, Google uses its Brain
Brain is Google’s deep learning project, and its tech is used in many of Google’s products, ranging from its search engine and voice recognition to email, maps and photos. It helps your Android phone recognise voice commands, translate foreign-language street signs or notice boards into your chosen language and do much more, apart from running the search engine so efficiently.
Google also enables deep learning development through its open source deep learning software stack, TensorFlow, and Google Cloud Machine Learning (Cloud ML). The Cloud offering is equipped with state-of-the-art machine learning services, a customised neural-network-based platform and pre-trained models. The platform has powerful application programming interfaces (APIs) for speech recognition, image analysis, text analysis and dynamic translation.