Convolutional Neural Networks for Autonomous Cars (Part 2 of 2)

V.P. Sampath is presently a technical architect at Adept Chips, Bengaluru. He has published articles in national newspapers and the IEEE-MAS Section, and international papers on VLSI and networks.


Developing an autonomous car requires a lot of training data. To drive using vision, developers need to segment 2D images from cameras into independent objects to create a 3D map of the scene. Besides using parallax, relative motion and two cameras to do that, autonomous cars also need to recognise patterns in shapes so that they can classify objects seen from every distance and angle, and in every lighting condition. That has been one of the big challenges for vision: you must understand things in sunlight, in diffuse light, at night (when illuminated by headlights, for example), when the sun is shining directly into the camera, when complex objects like trees are casting moving shadows on the objects of interest, and so on. You must be able to identify pavement markings, lane markings, shoulders, debris, other road users, etc., in all these situations, and you must do it so well that you make a mistake perhaps only once every million miles. A million miles is around 2.7 billion frames of video.
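The 2.7 billion figure can be reproduced with simple arithmetic. The sketch below assumes an average speed of 40 miles per hour and a camera running at 30 frames per second. Neither number is stated in the article; they are chosen only because together they give the quoted total.

```python
# Rough check of the "2.7 billion frames per million miles" figure.
# ASSUMPTIONS (not from the article): 40 mph average speed, 30 fps camera.

def frames_for_distance(miles: float, avg_speed_mph: float, fps: float) -> float:
    """Return the number of video frames captured while driving `miles`."""
    hours = miles / avg_speed_mph   # time on the road, in hours
    seconds = hours * 3600          # the same time, in seconds
    return seconds * fps            # one frame every 1/fps seconds


frames = frames_for_distance(miles=1_000_000, avg_speed_mph=40, fps=30)
print(f"{frames:.2e} frames")  # prints "2.70e+09 frames"
```

A slower average speed or a higher frame rate raises the count, so the figure is best read as an order of magnitude.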

Autonomous driving startups

One AI software startup, with expertise in robotics and deep learning, applies machine learning to both driving and human interaction. Its goal is to build a hardware and software kit powered by artificial intelligence for carmakers.

Startup FiveAI develops a host of technologies including sensor fusion, computer vision combined with a deep neural network, policy-based behavioural modelling and motion planning. The goal is to design solutions for a Level 5 autonomous vehicle that is safe in complex urban environments.

Nauto offers an AI-powered dual-camera unit that learns from drivers and the road. Using real-time sensors and visual data, it focuses on insight, guidance and real-time feedback for its clients, helping fleet managers detect and understand causes of accidents and reduce false liability claims. The system also helps cities with better traffic control and street design to eliminate fatal accidents.

Oxbotica specialises in mobile autonomy, navigation, perception and research into autonomous robotics. It has been developing self-driving features for some time and claims to train its algorithms using AI for complex urban environments by directly mapping sensor data against actual driving behaviour. So far, it has developed an autonomous control system called ‘Selenium’, which is essentially a vehicle-agnostic operating system applicable to anything from forklifts and cargo pods to vehicles. It also offers a cloud-based fleet management system to schedule and coordinate a fleet of autonomous vehicles, enabling smartphone booking, route optimisation and data exchange between vehicles without human intervention.

SoCs optimised for neural networks

Chip vendors are using everything from CPUs and GPUs to FPGAs and DSPs to enable CNNs on vision SoCs. Currently, CNN-based architectures are mainly mapped onto CPU/GPU architectures, which are not suitable for low-power and low-cost embedded products. But this is changing with new embedded vision processors introduced into the market. These vision processors implement CNNs on a dedicated programmable multicore engine that is optimised for efficient execution of convolutions and the associated data movement. Such an engine is organised to optimise data flow and performance, using a systolic array of cores called processing elements (PEs). Each PE passes its results directly to the next through FIFO buffers, without first storing the data in main memory and loading it back, which effectively eliminates shared-memory bottlenecks.
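The idea of PEs handing results to one another through FIFO buffers can be illustrated in software. The following is a minimal sketch, not a model of any vendor's hardware: the ProcessingElement class, the run_chain function and the three example operations are hypothetical names introduced here, and the array is reduced to a one-dimensional chain for simplicity.

```python
from collections import deque
from typing import Callable, Deque, List


class ProcessingElement:
    """Hypothetical PE: reads its input FIFO, applies one operation,
    and writes the result to its output FIFO."""

    def __init__(self, op: Callable[[float], float],
                 fifo_in: Deque[float], fifo_out: Deque[float]) -> None:
        self.op = op
        self.fifo_in = fifo_in
        self.fifo_out = fifo_out

    def step(self) -> bool:
        """Process one value if available; return True if the PE fired."""
        if not self.fifo_in:
            return False
        self.fifo_out.append(self.op(self.fifo_in.popleft()))
        return True


def run_chain(ops: List[Callable[[float], float]],
              inputs: List[float]) -> List[float]:
    """Push `inputs` through a chain of PEs linked only by FIFOs."""
    # fifos[i] feeds PE i; fifos[i + 1] collects its output.
    fifos: List[Deque[float]] = [deque(inputs)] + [deque() for _ in ops]
    pes = [ProcessingElement(op, fifos[i], fifos[i + 1])
           for i, op in enumerate(ops)]
    # One loop iteration is one "cycle". Stepping the last PE first means
    # a value advances exactly one stage per cycle, as in a pipeline.
    while any(fifos[:-1]):
        for pe in reversed(pes):
            pe.step()
    return list(fifos[-1])


# Three illustrative stages: scale, rectify (ReLU), add a bias.
stages = [lambda x: 2 * x, lambda x: max(x, 0), lambda x: x + 1]
print(run_chain(stages, [-1, 2, 3]))  # prints [1, 5, 7]
```

No stage ever touches a shared array: each value lives only in the FIFO between two neighbouring PEs, which is the property the article credits with removing the shared-memory bottleneck.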

Vision processors offer flexibility in how PEs are interconnected, allowing designers to easily create different application-specific processing chains by altering the number and interconnection of cores used to process each CNN stage or layer. This new class of embedded vision processor is not restricted to just object detection with CNNs. Other algorithms, such as histogram of oriented gradients (HOG), Viola-Jones and SURF, can be implemented as well, giving designers the flexibility to tailor the processor to their application.

In parallel processing, applying a filter (a set of parameters) to an image produces another image, and applying a further filter to that result produces yet another image. The number of connections grows quickly as a result, but the good news is that there are tricks to simplify the process and remove unwanted connections.
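The way each filter turns one image into another can be shown with a small example. This is a minimal sketch in plain Python, assuming 'valid' sliding-window filtering (no padding) and two hand-picked filters; the function name convolve2d, the 4x4 test image and the filter values are illustrative choices, not taken from the article.

```python
from typing import List

Image = List[List[float]]


def convolve2d(image: Image, kernel: Image) -> Image:
    """Slide `kernel` over `image` with no padding and return the result.

    This is the cross-correlation that CNN layers usually call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_filter = [[-1, 1]]         # responds to dark-to-bright steps
smooth_filter = [[0.5], [0.5]]  # averages vertically adjacent pixels

edges = convolve2d(image, edge_filter)       # first filter -> a new 4x3 image
smoothed = convolve2d(edges, smooth_filter)  # second filter -> a 3x3 image
print(edges[0])     # prints [0, 1, 0]
print(smoothed[0])  # prints [0.0, 1.0, 0.0]
```

Every output pixel depends only on a small window of the input, so the pixels can be computed independently of one another, which is what makes the workload suitable for parallel hardware.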

The challenge, however, remains in handling a number of different nodes in CNNs, as developers can’t predetermine which node needs to be connected to another node. That’s why they need a programmable architecture and can’t hardwire the connections. Rather than designing a different SoC architecture or optimising it every time new algorithms pop up, a CNN processor only needs a fairly simple algorithm that comes with fewer variables.
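One way to picture a programmable rather than hardwired architecture is to describe the connections as data that a fixed engine interprets. The sketch below is an illustration under that assumption only: the OPS table, the graph format and run_graph are hypothetical and do not correspond to any real CNN processor's programming model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical set of operations the fixed engine knows how to execute.
OPS: Dict[str, Callable[..., float]] = {
    "scale2": lambda x: 2 * x,
    "relu": lambda x: max(x, 0.0),
    "add": lambda a, b: a + b,
}

# A network is plain data: node name -> (operation name, input node names).
Graph = Dict[str, Tuple[str, List[str]]]


def run_graph(graph: Graph, inputs: Dict[str, float], output: str) -> float:
    """Evaluate `output` by following the connections listed in `graph`."""
    values = dict(inputs)  # results computed so far, keyed by node name

    def evaluate(node: str) -> float:
        if node not in values:
            op_name, sources = graph[node]
            values[node] = OPS[op_name](*(evaluate(s) for s in sources))
        return values[node]

    return evaluate(output)


# Two different topologies run on the same engine without changing its code.
chain = {"a": ("scale2", ["x"]), "b": ("relu", ["a"])}
skip = {"a": ("scale2", ["x"]), "b": ("relu", ["a"]), "c": ("add", ["b", "x"])}

print(run_graph(chain, {"x": -3.0}, "b"))  # prints 0.0
print(run_graph(skip, {"x": 3.0}, "c"))    # prints 9.0
```

Supporting a new network then means supplying a new connection table, not redesigning the engine, which mirrors the argument above against hardwiring the connections.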


