Some time ago, Google and Microsoft made some very interesting product announcements. Google announced a new and quite friendly assistant in Google Allo, while Microsoft announced a puck-like device for artists working on Surface Pro devices. Google’s chat assistant on my phone read my personal messages and suggested replies.
Is this just a matter of word association, or an invasion of privacy? Either way, what really intrigued me was what happened when a friend sent me a picture of himself at a club. Allo showed some automatic reply options including, “Have a good time!” To me, it seemed like a fun and convenient way to avoid the hassle of typing out a full message.
Analysing images is a serious task
Photography is one area that employs image analytics heavily. Dr Joseph Reger, chief technology officer, Fujitsu EMEIA, explains, “Modern cameras identify faces to make focusing easier, or these can identify smiling faces or faces of pre-selected people.” Image analysis starts by identifying edges and then moves on to full shapes. However, such systems have been successfully fooled by changes made to certain key areas of an image. More on that later; first, let us look at how a computer recognises and processes images.
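To make the idea of identifying edges a little more concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows and uses the well-known Sobel kernels; the function name `sobel_edges` and the tiny sample image are made up for illustration and are not taken from any camera or product mentioned in this article.

```python
import math

# Standard Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edges(image):
    """Return the gradient magnitude at each interior pixel (borders stay 0)."""
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            edges[r][c] = math.hypot(gx, gy)
    return edges

# Illustrative 5x6 image: dark on the left half, bright on the right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edges = sobel_edges(image)
```

In this toy image the response is large only in the two columns that straddle the dark-to-bright boundary and zero in the flat regions, which is the behaviour an edge detector is expected to show.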
Image analysis ranges from finding basic shapes, detecting edges, removing noise and counting objects, to finding anomalies in a routine scene. Security and surveillance are incomplete without an intelligent system that can detect pedestrians and vehicles, and differentiate between them. So how does image analytics work? How was Google’s new assistant able to analyse images and suggest possible replies as soon as I received the image?
Uncanny Vision, a startup focused on image analysis, helps clarify how it happens. Their system identifies objects in images using an analytical system powered by deep learning and smart algorithms. The deep learning framework, Caffe, powers this system. The framework, managed by Berkeley Vision and Learning Center (BVLC), is capable of processing more than 60 million images in a single day using just one NVIDIA K40 GPU. On average, this amounts to about 694 images per second, whereas we only need 17 to make a series of images into a video, with HD video going up to 60. Basically, at 60 frames per second, this system can analyse about 11 HD videos at once.
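The arithmetic behind these figures can be checked in a few lines. This is only a back-of-the-envelope sketch using the numbers quoted above; the variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 seconds

images_per_day = 60_000_000           # throughput figure quoted in the text
hd_frames_per_second = 60             # HD frame rate quoted in the text

# Average number of images processed every second.
images_per_second = images_per_day / SECONDS_PER_DAY

# Number of complete 60 fps streams that fit into that budget.
hd_streams = int(images_per_second // hd_frames_per_second)

print(f"{images_per_second:.1f} images per second")   # 694.4 images per second
print(f"{hd_streams} HD streams at {hd_frames_per_second} fps")  # 11 HD streams at 60 fps
```

The per-second figure lands just under 695, and dividing it by 60 frames per second gives a little over 11 simultaneous HD streams.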
Deep learning is affecting image analytics
Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, made up of linear and non-linear transformations. It is a representation learning approach well suited to image analysis challenges ranging from security and everyday applications to medical problems.
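A deep graph of linear and non-linear layers can be sketched in a few lines of plain Python. The example below is a toy under stated assumptions: the weights are hand-picked rather than learned, ReLU is chosen as the non-linearity purely for illustration, and the helper names `linear`, `relu` and `forward` are hypothetical.

```python
def linear(inputs, weights, biases):
    """Linear layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Non-linear layer: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]

def forward(inputs, layers):
    """Pass inputs through each (weights, biases) pair, with ReLU between layers."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = linear(activations, weights, biases)
        if index < len(layers) - 1:       # no non-linearity after the last layer
            activations = relu(activations)
    return activations

# Hand-picked weights for illustration only; a real network learns these.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),   # 2 inputs -> 2 hidden units
    ([[2.0, 1.0]], [0.5]),                      # 2 hidden units -> 1 output
]

output = forward([3.0, 1.0], layers)            # [5.5]
```

Stacking more `(weights, biases)` pairs in `layers` makes the graph deeper; training, which is what actually sets the weights, is left out of this sketch.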
Object detection is considered to be the most basic application of computer vision. Enhancements on top of object detection result in other developments including anomaly mapping and error detection in real-time imaging. However, deep learning is still an expanding topic. Dr Reger adds, “It is still unclear what the optimal depth of a deep learning net should be, or what the ideal component functions in the targeted approximation might be.”
Upgrading neural networks
There is no doubt that neural networks can identify and recognise patterns and do a lot of other interesting things. However, when it comes to real-time image analysis from multiple angles, or with little content in the frame, we need to go beyond the capabilities of basic neural networks. One of the most cited advancements in this regard is the convolutional neural network (CNN). Self-driving cars, auto-tagging of friends in pictures, facial-security features, gesture recognition and automatic number plate recognition are some of the areas that benefit from CNNs.
Broadly, a CNN consists of three steps, each of which can be divided further. Starting with convolution, we go back to our engineering days, when we convolved a filter with an input function to get an output. Padding is added around the input so that the result stays similar in size to the original image. You know the basics.
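For readers who would like a refresher anyway, the convolution step with padding can be sketched as follows. This is a minimal plain-Python illustration, assuming a single-channel image, an odd-sized kernel and zero padding; the function name `convolve2d_same` is hypothetical. Like most CNN frameworks, it slides the kernel without flipping it, which is strictly speaking cross-correlation.

```python
def convolve2d_same(image, kernel):
    """Slide `kernel` over `image`, treating out-of-range pixels as zero,
    so that the output has the same height and width as the input."""
    height, width = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    pad_h, pad_w = k_h // 2, k_w // 2     # assumes odd kernel dimensions
    output = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    rr, cc = r + i - pad_h, c + j - pad_w
                    # Positions outside the image act as zero padding.
                    if 0 <= rr < height and 0 <= cc < width:
                        total += image[rr][cc] * kernel[i][j]
            output[r][c] = total
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box_kernel = [[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]]

result = convolve2d_same(image, box_kernel)
# result[1][1] is 45 (sum of all nine pixels); result[0][0] is 12 (1 + 2 + 4 + 5)
```

Without the padding, a 3x3 kernel on a 3x3 image would produce a single value; with it, the output keeps the 3x3 shape of the input.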
Pooling is done to reduce the size of the data, which makes it easier to sift through. The network then uses fully-connected layers, where each pixel is treated as a separate neuron, as in a regular neural network.
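Pooling and the fully-connected step can be illustrated in the same spirit. The sketch below assumes 2x2 non-overlapping max pooling, which is one common choice among several, and uses hand-picked weights; the helper names `max_pool`, `flatten` and `fully_connected` are hypothetical.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

def flatten(feature_map):
    """Turn a 2D map into a flat list, one entry per input neuron."""
    return [value for row in feature_map for value in row]

def fully_connected(inputs, weights, biases):
    """Every output unit sees every input value."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]

pooled = max_pool(feature_map)        # [[4, 5], [3, 7]]
features = flatten(pooled)            # [4, 5, 3, 7]

# Illustrative weights only; a trained network would learn these.
weights = [[1, 0, 0, 1],
           [0, 1, 1, 0]]
biases = [0, 0]
scores = fully_connected(features, weights, biases)   # [11, 8]
```

Here the 4x4 map shrinks to 2x2 after pooling, so the fully-connected layer only has to handle four values instead of sixteen.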
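Normalisation is another step in the process. Some older schemes, such as local response normalisation, have largely fallen out of use, whereas batch normalisation remains common in CNN designs. Networks that improve on basic CNNs tend to follow a similar implementation as well.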
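For readers curious about what batch normalisation actually does to a set of activations, here is a minimal sketch. It assumes a single feature observed across a small batch; in a real network the same operation is applied per feature or channel, and the scale `gamma` and shift `beta` are learned rather than fixed. The function name `batch_normalise` is made up for illustration.

```python
import math

def batch_normalise(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Shift and scale a batch of values to roughly zero mean and unit variance,
    then apply the scale (gamma) and shift (beta) parameters."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta
            for x in batch]

activations = [2.0, 4.0, 6.0, 8.0]
normalised = batch_normalise(activations)

# After normalisation the values are centred on zero with variance close to one.
new_mean = sum(normalised) / len(normalised)
new_variance = sum((x - new_mean) ** 2 for x in normalised) / len(normalised)
```

The small `eps` term only guards against division by zero when every value in the batch is identical.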
Where we have seen CNNs before
Video analysis, image recognition and drug discovery are some of the common uses of CNNs around us. However, the one that brought CNNs into the limelight would be the AlphaGo programme by Google DeepMind. The Chinese game of Go is a complex board game requiring intuition, creativity and strategic thinking. It has been a major subject in artificial intelligence (AI) research. In March 2016, Google’s AlphaGo beat Lee Sedol, a South Korean professional Go player of ‘9 dan’ rank, by 4-1. And here we thought recognising and following a person in a crowd was amazing for a neural network.