Metric knowledge is the identification of the geometry of static and dynamic objects, which is required to keep the vehicle in its lane and at a safe distance from other vehicles. Symbolic knowledge allows the vehicle to classify lanes and conform to basic rules of the road. Conceptual knowledge allows the vehicle to understand relationships between traffic participants and anticipate the evolution of the driving scene. Conceptual knowledge is the most important aspect for being able to detect specific objects and avoid collisions.

One current method of obstacle detection in autonomous vehicles is the use of detectors and sets of appearance-based parameters. The first step in this method is the selection of areas of interest. This process narrows down areas of the field of vision that contain potential obstacles.

Appearance cues are used by the detectors to find areas of interest. These appearance cues analyse two-dimensional data and may be sensitive to symmetry, shadows, or local texture and colour gradients. Three-dimensional analysis of scene geometry provides additional cues that improve the classification of areas of interest. These additional cues include disparity, optical flow and clustering techniques.

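Disparity is the pixel difference in an object's position between two views of the same scene, such as the left and right images of a stereo camera. If you look at an object while alternately closing one eye and then the other, the ‘jumping’ you see in the object is the disparity. Because nearer objects shift more than distant ones, disparity can be used to detect and reconstruct arbitrarily shaped objects in the field of view.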
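As a minimal sketch of how disparity relates to distance, the snippet below uses the standard relation for a rectified stereo pair, Z = f * B / d. The function names, the focal length and the baseline are illustrative assumptions and are not taken from any system described here. The second function shows the first-order depth uncertainty, which grows with the square of the distance.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Assumed camera: 1000 px focal length, 0.5 m baseline (illustrative values).
near = depth_from_disparity(50, focal_px=1000, baseline_m=0.5)  # 10 m
far = depth_from_disparity(25, focal_px=1000, baseline_m=0.5)   # 20 m
print(near, depth_error(near, 1000, 0.5))
print(far, depth_error(far, 1000, 0.5))
```

With these assumed values, doubling the distance from 10 m to 20 m quadruples the depth uncertainty caused by a one-pixel disparity error.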

Optical flow combines scene geometry and motion. It samples the environment and analyses images to determine the motion of objects. Finally, clustering techniques group image regions with similar motion vectors as these areas are likely to contain the same object. A combination of these cues is used to locate all areas of interest.
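The grouping step can be illustrated with a small sketch. It assumes the optical flow has already been computed on a coarse grid of image cells, and it uses a simple flood fill over neighbouring cells. This is one possible clustering scheme chosen for illustration, not necessarily the one used in the systems discussed here. The function name and the `max_diff` threshold are hypothetical.

```python
import math


def cluster_by_motion(flow, max_diff=1.0):
    """Label 4-connected grid cells whose motion vectors are similar.

    flow: 2D list of (dx, dy) motion vectors, one per image cell.
    max_diff: assumed threshold on the vector difference between neighbours.
    Returns a 2D list of integer cluster labels with the same shape.
    """
    rows, cols = len(flow), len(flow[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # already part of a cluster
            labels[r0][c0] = next_label
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] != -1:
                        continue
                    du = flow[nr][nc][0] - flow[r][c][0]
                    dv = flow[nr][nc][1] - flow[r][c][1]
                    if math.hypot(du, dv) < max_diff:
                        labels[nr][nc] = next_label
                        stack.append((nr, nc))
            next_label += 1
    return labels


# Left half is static, right half moves 5 px to the right.
flow = [[(0, 0), (0, 0), (5, 0), (5, 0)],
        [(0, 0), (0, 0), (5, 0), (5, 0)]]
print(cluster_by_motion(flow))  # [[0, 0, 1, 1], [0, 0, 1, 1]]
```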

Fig. 6: Image identification in autonomous vehicles

While any combination of cues is possible, it is necessary to include both appearance cues and three-dimensional cues, as the accuracy of three-dimensional cues decreases quadratically with increasing distance. In addition, only persistent detections are flagged as obstacles so as to lower the rate of false alarms.
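The idea of flagging only persistent detections can be sketched as a simple streak counter. The sketch assumes each detection carries an identifier that stays stable across frames (for example a grid cell or a track ID), which is an assumption made for illustration. The class name and the `min_frames` threshold are hypothetical as well.

```python
from collections import defaultdict


class PersistenceFilter:
    """Report a detection only after it has been seen in consecutive frames."""

    def __init__(self, min_frames=3):
        self.min_frames = min_frames    # assumed persistence threshold
        self.streak = defaultdict(int)  # detection id -> consecutive frames seen

    def update(self, detected_ids):
        """Feed one frame of detection ids; return the ids that are persistent."""
        detected = set(detected_ids)
        # A detection that is missing from this frame loses its streak.
        for key in list(self.streak):
            if key not in detected:
                del self.streak[key]
        for key in detected:
            self.streak[key] += 1
        return {k for k, n in self.streak.items() if n >= self.min_frames}


f = PersistenceFilter(min_frames=3)
print(f.update({"a", "b"}))  # set()
print(f.update({"a"}))       # set()
print(f.update({"a", "b"}))  # {'a'}
```

Here "b" flickers in and out and is never reported, while "a" is flagged once it has appeared in three consecutive frames.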

After areas of interest have been identified, they must be classified by passing them through many filters that search for characteristic features of on-road objects. This method requires a large amount of computation and time. The use of CNNs can increase the efficiency of this detection process. A CNN-based detection system can classify areas that contain any type of obstacle. Motion-based methods such as optical flow, by contrast, rely heavily on the identification of feature points, which are often misclassified or not present in the image.

All the knowledge-based methods are designed for special obstacles (pedestrians, cars, etc.) or for special environments (flat road, obstacles differing in appearance from the ground). Convolutional neural networks are the most promising for classifying complex scenes because they closely mimic the structure and classification abilities of the human brain. Obstacle detection is only one important part of avoiding a collision. It is also vital for the vehicle to recognise how far away the obstacles are located in relation to its own physical boundaries.

Fig. 7: Obstacle detection test results: Input images (top), ground truths with black as positive (middle) and detected obstacles with orange as positive (bottom)

Depth estimation

Depth estimation is an important consideration in autonomous driving: knowing the distance between the vehicle and an obstacle is essential to the safety of passengers as well as other vehicles.

CNNs are a viable method for estimating depth in an image. In one study, researchers trained their network on a large dataset of object scans (a public database of over ten thousand scans of everyday 3D objects), focusing on images of chairs, and used two different loss functions for training. They found that the network trained with the bi-weight loss estimated depth more accurately than the one trained with the L2 norm. With images of varying size and resolution, it had an accuracy between 0.8283 and 0.9720, with 1.0 being perfect accuracy.
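To make the difference between the two loss functions concrete, the sketch below compares a squared (L2) loss with Tukey's biweight loss on a single residual. It assumes that the bi-weight loss in the study is Tukey's biweight, which the text does not confirm. The tuning constant `c = 4.685` is the conventional default for this function, not a value reported by the researchers.

```python
def l2_loss(residual):
    """Squared-error loss: grows without bound as the residual grows."""
    return 0.5 * residual ** 2


def tukey_biweight_loss(residual, c=4.685):
    """Tukey's biweight loss: near-quadratic for small residuals, constant beyond c."""
    if abs(residual) > c:
        return c ** 2 / 6.0
    u = 1.0 - (residual / c) ** 2
    return (c ** 2 / 6.0) * (1.0 - u ** 3)


for r in (0.1, 1.0, 10.0, 100.0):
    print(r, l2_loss(r), tukey_biweight_loss(r))
```

For small residuals the two losses behave almost identically, but the biweight loss saturates for large ones, so a few badly wrong depth labels pull on the training less than they would under the L2 norm. That robustness to outliers is one plausible reason for the reported difference.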

While estimating depth for stationary objects in a single frame is simpler than for the moving objects seen by vehicles, researchers found that CNNs can also be used for depth estimation in driving scenes. They fed detected obstacle blocks to a second CNN trained to estimate depth. The image was split into strips parallel to its lower boundary. These strips were assigned depth codes from bottom to top, on the assumption that closer objects normally appear nearer the lower boundary of the image. The depth codes ran from ‘1’ to ‘6’, with ‘1’ representing the shallowest areas and ‘6’ the deepest. Each obstacle block was assigned the depth code of the strip it appeared in.

The CNN then used feature extraction in each block area to determine whether vertically adjacent blocks belonged to the same obstacle. If the blocks were determined to be the same obstacle, they were assigned the lower depth code to alert the vehicle to the closest part of the obstacle. The CNN was trained on image block pairs to develop a base for detecting depth and then tested on street images, as in the obstacle detection method. The CNN had an accuracy of 91.46 per cent in two-block identification.
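The strip-based depth coding and the merging of vertically adjacent blocks can be sketched as follows. The sketch assumes equal-height strips and image coordinates in which y increases downwards. The CNN's pairwise decision is replaced by a hypothetical `same_obstacle` callback. None of the names or layout choices come from the study itself.

```python
def strip_depth_code(block_y, image_height, n_strips=6):
    """Depth code for a block at pixel row block_y (y grows downwards).

    Code 1 is the bottom strip (nearest), n_strips is the top strip (farthest).
    Equal-height strips are an assumption made for illustration.
    """
    if not 0 <= block_y < image_height:
        raise ValueError("block_y must lie inside the image")
    strip_height = image_height / n_strips
    distance_from_bottom = (image_height - 1) - block_y
    return min(n_strips, int(distance_from_bottom // strip_height) + 1)


def merge_vertical(codes, same_obstacle):
    """Propagate the lower depth code along one column of blocks.

    codes: depth codes of the blocks in a column, ordered bottom to top.
    same_obstacle(i, j): hypothetical stand-in for the CNN's decision on
    whether blocks i and j belong to the same obstacle.
    """
    merged = list(codes)
    for i in range(len(merged) - 1):
        if same_obstacle(i, i + 1):
            low = min(merged[i], merged[i + 1])
            merged[i] = merged[i + 1] = low
    return merged


# Blocks at rows 550, 450, 350 and 150 of a 600-pixel-high image.
codes = [strip_depth_code(y, 600) for y in (550, 450, 350, 150)]
print(codes)  # [1, 2, 3, 5]

# Pretend the CNN links the three lowest blocks into one obstacle.
linked = {(0, 1), (1, 2)}
print(merge_vertical(codes, lambda i, j: (i, j) in linked))  # [1, 1, 1, 5]
```

Processing the column from bottom to top lets the nearest code spread upwards through a chain of linked blocks, so the whole obstacle is reported at the distance of its closest part.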





