Uncanny Vision's computer vision software is targeted at and optimised for ARM processors. The new set of cameras that Uncanny works on is similar to those used in smartphones: they are equipped with graphics processors and other processing cores, fulfilling all the hardware requirements for evaluating the visual sequences they capture.
Evaluating visual sequences
Processing of data starts after the images reach the CMOS sensor on the camera. From there, they are taken to an image signal processor that cleans up the image from the sensor and then hands it to the main application running on the CPU. All of this happens inside a system-on-chip such as Qualcomm's Snapdragon chips, used in most smartphones these days.
This is where Uncanny Vision's software takes over and interprets the image using deep learning algorithms and convolutional neural network models. These models can recognise useful information such as human actions and faces in video input, and provide the analytics needed for a surveillance system.
The convolutional neural network models are first trained on GPU-based Cloud computers using a tool called Caffe, with labelled sample data. Uncanny Vision's UncannyDL software then runs these trained models on the camera, where the endpoint device, equipped with a CPU and GPU, detects any anomalies and pushes a photo, video, metadata or a decision out to the central Cloud platform.
Caffe is a deep learning framework made with expression, speed and modularity in mind. Uncanny Vision uses Caffe to train convolutional neural network models for maximum accuracy and specificity in whatever they see and identify. Once frames from the video footage come in, the deep learning Caffe models kick in and match them against the 1,000 or more objects they have been taught to recognise. The system can also be taught to recognise animals, human poses and defects.
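The recognition step described above amounts to ranking the model's class scores and keeping the most probable labels. A minimal sketch of that top-k lookup, with invented labels and scores standing in for the real 1,000-class output of a trained Caffe model:

```python
import math

def top_k(scores, labels, k=3):
    """Return the k most probable labels from raw model scores.

    Applies softmax to the raw scores and sorts by probability.
    The labels and scores are invented for illustration; a real
    deployment would read them from the trained model's output layer.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]   # shift for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Toy scores over a handful of the ~1,000 classes the article mentions.
labels = ["person", "dog", "car", "shelf", "trolley"]
scores = [4.1, 0.3, 1.2, 2.0, 0.1]
print(top_k(scores, labels, k=2))
```

A real system would run this over every frame's output vector; the principle, softmax followed by a sort, is the same at any class count.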
For machines, deep learning has only just begun
It may sound like science fiction, but in its simplest form, deep learning is a sub-field of machine learning that learns multiple levels of representation. A representation can be built from any form of digital data, such as images, audio, text and the like. To help machines understand this, an artificial neural network can be employed to identify physical objects through a multi-level hierarchy of features, factors and concepts.
A convolutional neural network, the kind of deep neural network used in this innovation, has the capability to study and identify several objects concurrently.
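The building block that gives a convolutional neural network its name can be shown in a few lines. This is a plain-Python sketch of a single 2D convolution, far from a full network, with a hand-picked kernel that responds to vertical edges; the image values are made up for illustration:

```python
def convolve2d(image, kernel):
    """Valid 2D convolution (no padding): slide the kernel over the
    image and sum element-wise products at each position. As in deep
    learning frameworks, the kernel is not flipped (strictly,
    cross-correlation). Plain Python for clarity, not speed."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * kernel[j][i]
            row.append(acc)
        out.append(row)
    return out

# The kernel responds strongly where intensity jumps from dark (0)
# to bright (9) across a column, i.e. at a vertical edge.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
kernel = [[-1, 1],
          [-1, 1]]
print(convolve2d(image, kernel))
```

A trained network learns thousands of such kernels from labelled data rather than having them hand-written, and stacks many layers of them, which is what lets it pick out faces, poses and objects.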
Through its IoT-Cloud platform, Uncanny Vision mimics the way human vision and thought work to identify anomalies, rather than performing a direct image-to-image match for decision-making. This technology shows that such systems can make our lives easier and more secure.
Use cases in retail surveillance and analytics
The retail space has some interesting use cases for visual-analytics-based IoT systems. Visual data can be used to track the movement of people across the display units in a store and create heat-maps, paths or position tracking. Path tracking could help a store manager learn which places people frequent most, presented as a heat-map.
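At its core, building such a heat-map is just binning tracked positions into a grid and counting. A minimal sketch, with invented coordinates standing in for per-frame tracker output:

```python
from collections import Counter

def heat_map(positions, cell=100):
    """Accumulate tracked (x, y) positions (here in centimetres) into
    a grid of cell-sized bins. Higher counts mean more foot traffic
    in that part of the store. The coordinates and cell size are
    illustrative assumptions, not values from any real deployment."""
    grid = Counter()
    for x, y in positions:
        grid[(x // cell, y // cell)] += 1
    return grid

# Three position samples near one display unit, one elsewhere.
track = [(120, 80), (130, 90), (110, 70), (450, 300)]
hot = heat_map(track)
hottest_cell, visits = hot.most_common(1)[0]
print(hottest_cell, visits)
```

A store manager's dashboard would render the counter as a coloured overlay on the floor plan; the hot cells mark the display units worth premium product placement.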
Apart from those, reactions to and interest in specific products can also be studied. This is advantageous for the retailer, who can position products for better visibility and thereby maximise the probability of the customer buying the right product. The system can also be taught to track multiple people and objects in case there is a huge rush in the store.
Uncanny Vision’s IoT-Cloud infrastructure is wired to send out decisions and opinions as notifications to a human through available interfaces and media such as emails or push notifications on a handheld mobile device. A phenomenal amount of time is saved because the endpoint device is intelligent enough to send only processed decisions to the IoT-Cloud infrastructure; 90 per cent of the time, nothing interesting happens in front of a surveillance camera.
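That filtering step, uploading only what the on-camera model flags, can be sketched in a few lines. The scores and threshold below are invented for illustration; the real UncannyDL pipeline is not public:

```python
def filter_events(frame_results, threshold=0.8):
    """Keep only frames the on-camera model scored as anomalous.

    Each entry pairs a frame id with the model's anomaly confidence.
    Only detections at or above the threshold would be pushed to the
    Cloud, so uneventful footage (most of it) never leaves the device.
    """
    return [(fid, score) for fid, score in frame_results if score >= threshold]

# 10 frames, only one of which shows anything out of the ordinary.
results = [(i, 0.05) for i in range(9)] + [(9, 0.93)]
pushed = filter_events(results)
print(pushed)  # only frame 9 reaches the Cloud platform
```

This is why the approach scales: bandwidth and Cloud compute are spent on the rare anomalous frame, not on hours of empty corridor.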
More than just sensing
In surveillance, most of the 300 million cameras currently deployed worldwide are blind: they can only record video for post-mortem analysis after unexpected incidents happen. In some cases, basic analytics are performed on servers using traditional image-processing algorithms, such as motion detection and people detection in high-security locations.
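The traditional motion detection mentioned here is typically little more than frame differencing. A minimal sketch on tiny made-up greyscale frames, with arbitrary illustrative thresholds:

```python
def motion_detected(prev, curr, diff_thresh=30, count_thresh=4):
    """Classic frame-differencing motion detection: count pixels whose
    intensity changed by more than diff_thresh between consecutive
    frames, and report motion if at least count_thresh pixels changed.
    Both thresholds are arbitrary values chosen for this toy example."""
    changed = sum(
        1
        for row_p, row_c in zip(prev, curr)
        for p, c in zip(row_p, row_c)
        if abs(p - c) > diff_thresh
    )
    return changed >= count_thresh

# A static 4x4 scene, then a bright object entering one corner.
still = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in still]
for y in range(2, 4):
    for x in range(2, 4):
        moved[y][x] = 200
print(motion_detected(still, moved))
```

The contrast with the deep learning approach is plain: this detector only knows that pixels changed, not whether the change was a person, an animal or a flickering light.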
Uncanny Vision’s artificial intelligence/deep learning based algorithms can analyse videos much more deeply, understanding what actions humans perform, spotting unexpected visual anomalies and the like.
In addition, Uncanny Vision’s optimised surveillance software can run on high-performance cameras such as Qualcomm’s Snapdragon based wireless IP cameras and carry out analytics on the device itself, making the system more scalable.