Convolutional Neural Networks for Autonomous Cars (Part 2 of 2)

V.P. Sampath is presently a technical architect at Adept Chips, Bengaluru. He has published articles in national newspapers and the IEEE-MAS Section, as well as international papers on VLSI and networks.


OpenVX is a low-level programming framework for the computer-vision domain that enables software developers to access computer-vision hardware acceleration efficiently, with both functional and performance portability. It has been designed to support modern hardware architectures, such as mobile and embedded SoCs as well as desktop systems. Many of these systems are parallel and heterogeneous, containing multiple processor types including multi-core CPUs, DSP subsystems, GPUs, dedicated vision computing fabrics as well as hardwired functionality. Additionally, vision system memory hierarchies can often be complex, distributed and not fully coherent.

Developers need a set of high-level tools and standard libraries like OpenCV and OpenVX that work in conjunction with and complement the underlying C/C++ toolchain. OpenCV is an open source computer-vision software library that contains 2500 functions which, when used by a high-level application, can facilitate tasks like object detection and tracking, image stitching, 3D reconstruction and machine learning.


Fig. 8: OpenCV and OpenVX to accelerate vision system development

OpenVX contains a library of predefined and customisable vision functions, a graph-based execution model with task- and data-independent execution, and a set of memory objects that abstract the physical memory. It defines a ‘C’ application programming interface (API) for building, verifying and coordinating graph execution, as well as accessing memory objects.

Graph abstraction enables OpenVX implementers to optimise graph execution for the underlying acceleration architecture. OpenVX also defines the vxu utility library, which exposes each OpenVX predefined function as a directly callable ‘C’ function, without the need to first create a graph. Applications built using the vxu library do not benefit from the optimisations enabled by graphs; however, the vxu library can be useful as the simplest way to use OpenVX and as the first step in porting existing vision applications. As the computer-vision domain is still rapidly evolving, OpenVX provides an extensibility mechanism for adding developer-defined functions to the application graph.
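The difference between graph execution and immediate, vxu-style calls can be illustrated with a small sketch. The Python below is not the OpenVX API (which is a ‘C’ interface); the Graph class, its method names and the two image functions are hypothetical stand-ins, chosen only to show the build, verify and execute sequence next to a direct function call.

```python
# Toy illustration of a graph-based execution model. All names here are
# hypothetical; this is NOT the OpenVX 'C' API.

def box_blur_rows(img):
    """Average each pixel with its horizontal neighbours (edges clamp)."""
    out = []
    for row in img:
        n = len(row)
        out.append([
            (row[max(x - 1, 0)] + row[x] + row[min(x + 1, n - 1)]) // 3
            for x in range(n)
        ])
    return out


def threshold(img, level=128):
    """Map each pixel to 255 if it is above `level`, otherwise to 0."""
    return [[255 if p > level else 0 for p in row] for row in img]


class Graph:
    """Nodes are declared first, verified once, then executed."""

    def __init__(self):
        self.nodes = []        # list of (function, input name, output name)
        self.verified = False

    def add_node(self, fn, src, dst):
        self.nodes.append((fn, src, dst))
        self.verified = False  # any change invalidates earlier verification

    def verify(self, input_names):
        """Check that every node's input exists before the node runs."""
        available = set(input_names)
        for fn, src, dst in self.nodes:
            if src not in available:
                raise ValueError(f"{fn.__name__}: missing input '{src}'")
            available.add(dst)
        self.verified = True

    def process(self, inputs):
        if not self.verified:
            raise RuntimeError("verify the graph before processing it")
        buffers = dict(inputs)
        for fn, src, dst in self.nodes:
            buffers[dst] = fn(buffers[src])
        return buffers


frame = [[10, 200, 250], [30, 40, 220]]

# Graph mode: describe the whole pipeline, verify it, then run it.
graph = Graph()
graph.add_node(box_blur_rows, "input", "blurred")
graph.add_node(threshold, "blurred", "mask")
graph.verify(["input"])
graph_result = graph.process({"input": frame})["mask"]

# Immediate mode: call each function directly, with no graph.
immediate_result = threshold(box_blur_rows(frame))
```

Both paths produce the same mask here. The point of the graph form is that the whole pipeline is known before it runs, which is what would give an implementation room to reorder, fuse or offload nodes; this toy version simply runs them in order.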

AutonomouStuff touts its “completely customisable R&D vehicle platforms used for ADAS, advanced algorithm development and automated driving initiatives.” Conventional ADAS technology can detect some objects, do basic classification, alert the driver of hazardous road conditions and, in some cases, slow or stop the vehicle. This level of ADAS is great for applications like blind spot monitoring, lane-change assistance and forward collision warnings.

NVIDIA DRIVE PX 2 AI car computers take driver assistance to the next level. These take advantage of deep learning and include a software development kit (SDK) for autonomous driving called DriveWorks. This SDK gives developers a powerful foundation for building applications that leverage computation-intensive algorithms for object detection, map localisation and path planning. With NVIDIA self-driving car solutions, a vehicle’s ADAS can discern a police car from a taxi, an ambulance from a delivery truck, or a parked car from one that is about to pull out into traffic. It can even extend this capability to identify everything from cyclists on the sidewalk to absent-minded pedestrians.


The OpenPilot project consists of two components: onboard firmware and a ground control station (GCS). The firmware is written in ‘C,’ whilst the ground control station is written in ‘C++’ utilising Qt. This platform is meant for development only.

OpenPilot is basically a behaviour model based on Comma.ai’s trained network. Comma.ai, a startup, used quick hacking of the car’s network bus to simplify having the computer control the car. They did it almost entirely with CNNs. The car feeds images from a camera into the network, and out of the network come commands to adjust the steering and speed to keep the car in its lane. As such, there is very little traditional code in the system, just the neural network and a bit of control logic.
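The camera-in, command-out flow described above can be sketched in a few lines. This is a toy under stated assumptions, not the actual OpenPilot network: it has a single hand-written convolution kernel with made-up weights instead of many trained layers, and the function names, the sign convention for steering and the speed rule are all hypothetical.

```python
# Toy camera -> convolution -> steering/speed pipeline.
# Weights and conventions are invented for illustration only.

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a greyscale image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(fmap):
    """Keep positive activations and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]


def drive_command(image, max_speed=1.0):
    """Return (steering, speed) for one camera frame.

    Assumed convention: a positive steering value means the detected
    edge lies in the right half of the view.
    """
    # Hand-picked vertical-edge detector (a real network learns its kernels).
    kernel = [[-1.0, 1.0], [-1.0, 1.0]]
    fmap = relu(conv2d(image, kernel))

    # Pool the feature map into a left-half and a right-half activation.
    mid = len(fmap[0]) // 2
    left = [v for row in fmap for v in row[:mid]]
    right = [v for row in fmap for v in row[mid:]]
    raw = 0.5 * (sum(right) / len(right) - sum(left) / len(left))

    # The "bit of control logic": clamp steering, slow down when turning.
    steering = max(-1.0, min(1.0, raw))
    speed = max_speed * (1.0 - abs(steering))
    return steering, speed


# Three synthetic 3x6 frames: a bright line on the right, on the left, none.
line_right = [[0, 0, 0, 0, 1, 0]] * 3
line_left = [[0, 1, 0, 0, 0, 0]] * 3
blank = [[0, 0, 0, 0, 0, 0]] * 3

steer_right, speed_right = drive_command(line_right)
steer_left, speed_left = drive_command(line_left)
steer_blank, speed_blank = drive_command(blank)
```

Almost all of the behaviour sits in the kernel weights, with only the clamp and the speed rule written as conventional code, which mirrors the article's point that such a system contains very little traditional logic.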

The network is built by training it. As a car is driven around, the network learns from the human driver what to do when it sees things in the field of view. During training, Light Detection and Ranging (LIDAR) gives the car an accurate 3D scan of the environment, so that the presence of cars and other road users can be detected more reliably. By learning during training that there really is something at given coordinates, the network can learn to recognise the same thing from the camera images alone.
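The idea of using LIDAR as a training signal for a camera-only model can be shown with a very small example. In the sketch below a two-parameter linear model stands in for the CNN, the camera feature (100 divided by the apparent height of a car in pixels) is an assumption, and every number is synthetic, invented for illustration rather than measured.

```python
# LIDAR distances act as labels; the model only ever sees a camera feature.
# A linear model replaces the CNN, and all data below is synthetic.

def train_linear(features, targets, lr=0.01, epochs=5000):
    """Fit distance = w * feature + b by gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, d in zip(features, targets):
            err = (w * x + b) - d          # prediction minus LIDAR label
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def camera_feature(pixel_height):
    """Hypothetical feature: nearer cars look taller, so use 100 / height."""
    return 100.0 / pixel_height


# Training drive: the camera sees cars of these heights (pixels) while
# LIDAR reports how far away each one really is (metres). Synthetic values.
pixel_heights = [10, 20, 40, 80]
lidar_distances = [15.0, 7.5, 3.75, 1.875]

features = [camera_feature(h) for h in pixel_heights]
w, b = train_linear(features, lidar_distances)

# Driving time: no LIDAR input, distance is estimated from the camera alone.
estimated_distance = w * camera_feature(25) + b
```

After training, the estimate for a car 25 pixels tall comes from the camera feature only. A real system would learn from full images and many more examples, but the division of roles is the same: LIDAR supplies the target during training and is absent at inference.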

When it is time to drive, the network does not get the LIDAR data. However, it does produce outputs of where it thinks the other cars are, allowing developers to test how well it is seeing things. This allows development of a credible autopilot but, at the same time, developers have minimal information about how it works, and can never truly understand why it makes the decisions it does. If it makes an error, they will generally not know why, though they can give it more training data until it no longer makes the error.
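One way developers might check how well the network is seeing things is to compare its predicted car positions with LIDAR measurements recorded on a test drive, and to set aside the frames with large errors as candidates for further training. The sketch below assumes one car per frame, positions given as (x, y) in metres and a 1 m tolerance; the function names and all values are hypothetical.

```python
import math

# Compare predicted positions with LIDAR ground truth, frame by frame.
# Assumptions: one car per frame, (x, y) in metres, synthetic values.

def position_errors(predicted, lidar):
    """Euclidean distance between each prediction and its LIDAR reading."""
    return [math.dist(p, q) for p, q in zip(predicted, lidar)]


def frames_to_retrain(predicted, lidar, tolerance_m=1.0):
    """Indices of frames whose error exceeds the tolerance."""
    errors = position_errors(predicted, lidar)
    return [i for i, err in enumerate(errors) if err > tolerance_m]


predicted_positions = [(10.0, 0.0), (20.0, 1.0), (5.0, -2.0)]
lidar_positions = [(10.3, 0.4), (23.0, 1.0), (5.0, -2.0)]

errors = position_errors(predicted_positions, lidar_positions)
bad_frames = frames_to_retrain(predicted_positions, lidar_positions)
```

A check like this reports where the network is wrong, not why. The flagged frames can be added to the training set, which matches the remedy described above: more data rather than a targeted code fix.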

