
What is the core technology of autonomous driving?

By CoolMotor
  • 2023-12-28
  • 18 Views

It is hard to set a clear standard for defining Level 4 or Level 5 autonomous driving, yet autonomous driving itself need not be complicated to describe. It really comes down to three questions: first, where am I? Second, where am I going? Third, how do I get there? A system that fully solves these three problems is truly autonomous. By that measure, Tesla's $8,000 Autopilot 2.0 upgrade only provides partial drive-by-wire functions and cannot be considered true autonomous driving, whereas what companies such as Ford, Baidu and Google are doing is true autonomous driving, far beyond Tesla. The gap between the two is huge.


The first problem is positioning. Autonomous driving requires centimeter-level positioning.


The second issue is path planning. The first layer of autonomous driving path planning is point-to-point, non-time-correlated topological path planning; the second layer is real-time, millisecond-level obstacle-avoidance planning; and the third layer decomposes the plan into a longitudinal (acceleration) plan and a lateral (angular velocity) plan.
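To make the third layer concrete, here is a minimal, illustrative Python sketch of how a local plan can be split into a longitudinal (acceleration) command and a lateral (angular-velocity) command. The pure-pursuit-style steering rule and the gains are assumptions chosen for the example, not any particular vendor's planner.

```python
import math

def decompose_plan(x, y, yaw, v, target_x, target_y, target_v):
    """Split a local plan into a longitudinal (acceleration) command and a
    lateral (angular-velocity) command. Illustrative gains only."""
    # Longitudinal: proportional control toward the target speed.
    k_v = 0.5
    accel = k_v * (target_v - v)

    # Lateral: heading error toward the target point, wrapped to [-pi, pi],
    # converted into an angular-velocity command.
    heading_error = math.atan2(target_y - y, target_x - x) - yaw
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
    k_w = 1.5
    omega = k_w * heading_error

    return accel, omega

print(decompose_plan(0.0, 0.0, 0.0, 8.0, 10.0, 2.0, 10.0))
```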


The third problem is execution: the vehicle's actuators carry out the longitudinal and lateral plans, which is the job of the drive-by-wire (control-by-wire) system.


At present, autonomous driving technology is largely derived from robotics; an autonomous car can be regarded as a wheeled robot plus a comfortable sofa. In a robotic system, positioning and path planning go together: without positioning, a path cannot be planned. Centimeter-level real-time positioning is currently one of the biggest challenges for autonomous driving. For robotic systems, positioning mainly relies on cross-comparing SLAM with a prior map. SLAM is the abbreviation of Simultaneous Localization and Mapping: the process by which a moving platform computes its own position and builds a map of the environment from sensor information. The main application fields of SLAM today are robotics, virtual reality and augmented reality, where it is used for localizing the sensor itself as well as for subsequent path planning and scene understanding.
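As a rough illustration of "simultaneous localization and mapping", the toy Python sketch below propagates a pose from wheel odometry and drops lidar returns into an occupancy grid. Real SLAM also corrects the pose against the map; the grid size, resolution and sensor format here are invented for the example.

```python
import numpy as np

GRID = np.zeros((200, 200), dtype=np.uint8)   # 20 m x 20 m map at 10 cm cells
RES, ORIGIN = 0.1, np.array([10.0, 10.0])     # cell size and world-to-grid offset

def step(pose, odom, scan):
    """pose = (x, y, yaw); odom = (dx, dy, dyaw) in the body frame;
    scan = list of (range, bearing) lidar returns."""
    x, y, yaw = pose
    dx, dy, dyaw = odom
    # Localization half (uncorrected): dead-reckon the pose from odometry.
    x += dx * np.cos(yaw) - dy * np.sin(yaw)
    y += dx * np.sin(yaw) + dy * np.cos(yaw)
    yaw += dyaw
    # Mapping half: mark the cell hit by each lidar return as occupied.
    for r, b in scan:
        hx = x + r * np.cos(yaw + b)
        hy = y + r * np.sin(yaw + b)
        i, j = ((np.array([hx, hy]) + ORIGIN) / RES).astype(int)
        if 0 <= i < GRID.shape[1] and 0 <= j < GRID.shape[0]:
            GRID[j, i] = 1
    return (x, y, yaw)

pose = (0.0, 0.0, 0.0)
pose = step(pose, (0.5, 0.0, 0.02), [(4.0, -0.3), (4.1, 0.0), (4.0, 0.3)])
print(pose, GRID.sum())
```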


The implementation and difficulty of SLAM vary greatly with the type of sensor and how it is mounted. By sensor, SLAM divides into two main categories: laser and vision. Laser SLAM has been studied longer, and both its theory and engineering are relatively mature. The vision approach is currently (2016) still at the laboratory research stage; commercial products are not yet available even for indoor, low-speed applications, let alone for high-speed outdoor environments that are far more complex than indoor ones. From this point alone, lidar is an essential sensor for autonomous driving.


Nearly thirty years have passed since SLAM research was first proposed in 1988. Early SLAM research focused on filter theory; after the turn of the century, scholars began borrowing SfM (Structure from Motion) methods to solve SLAM as an optimization problem, an approach that has achieved solid results and now dominates visual SLAM. People sometimes confuse SLAM with visual odometry. Visual odometry is one module of visual SLAM whose purpose is to incrementally estimate camera motion; complete SLAM additionally includes loop-closure detection and global optimization to obtain an accurate, globally consistent map. Current open-source visual SLAM algorithms fall into three main classes: sparse (feature-point) methods; dense methods, mainly based on RGB-D sensors; and semi-dense methods, common for monocular and stereo cameras and currently the most active area. The main laser SLAM methods include Hector, Gmapping, and Tiny.
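The distinction between visual odometry and full SLAM can be seen in a few lines: odometry composes frame-to-frame motion estimates into a global pose, and the small per-frame errors accumulate into the drift that loop closure and global optimization are meant to remove. The 2D transforms and the constant per-frame motion below are placeholders, not output of a real visual front end.

```python
import numpy as np

def se2(dx, dy, dtheta):
    """Homogeneous 2D rigid transform for a small frame-to-frame motion."""
    c, s = np.cos(dtheta), np.sin(dtheta)
    return np.array([[c, -s, dx], [s, c, dy], [0.0, 0.0, 1.0]])

pose = np.eye(3)                                # global pose, starts at the origin
relative_motions = [se2(1.0, 0.0, 0.01)] * 100  # per-frame odometry estimates

for T in relative_motions:
    pose = pose @ T                             # incremental composition

# A 0.01 rad bias per frame has already bent the trajectory noticeably;
# this is the drift a loop-closure constraint would pull back into place.
print("position after 100 frames:", pose[:2, 2])
```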


There are three common categories of robot positioning: relative positioning, absolute positioning, and combined positioning. Autonomous driving generally uses combined positioning. First, given an initial pose, proprioceptive sensors such as wheel odometry and gyroscopes measure the distance and heading travelled relative to that pose to estimate the robot's current position and attitude; this is also called dead reckoning or trajectory estimation. LiDAR or vision is then used to perceive the environment, and active or passive markers, map matching, GPS, or navigation beacons are used for absolute positioning. Position calculation methods include triangulation, trilateration, and model matching. From this perspective, an IMU is likewise an essential component for autonomous driving.
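As an example of the trilateration mentioned above, the following sketch recovers a 2D position from ranges to three known beacons by linearizing the range equations. The beacon layout and measured ranges are made up for illustration.

```python
import numpy as np

# Trilateration: |p - b_i|^2 = r_i^2. Subtracting the first equation from
# the others removes the quadratic term and leaves a linear system in p.
beacons = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 50.0]])
ranges = np.array([28.28, 36.06, 36.06])        # consistent with p ~ (20, 20)

A = 2 * (beacons[1:] - beacons[0])
b = (ranges[0]**2 - ranges[1:]**2
     + np.sum(beacons[1:]**2, axis=1) - np.sum(beacons[0]**2))

position, *_ = np.linalg.lstsq(A, b, rcond=None)
print(position)                                 # approximately [20, 20]
```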


At the same time, autonomous robot positioning is essentially a probabilistic problem, so there are two major schools of robot positioning algorithms: one based on the Kalman filter and the other on Bayesian inference. The Kalman-filter family includes the Kalman Filter (KF), Extended Kalman Filter (EKF), and Unscented Kalman Filter (UKF) positioning methods. The other school, based on Bayesian inference, describes the robot's position space with grids or particles and recursively computes the probability distribution over the state space; examples are Markov Localization (MKV) and Monte Carlo Localization (MCL).
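A minimal one-dimensional Kalman filter makes the "probability problem" concrete: predict the position with the motion model, then correct it with a noisy measurement. The noise variances and measurements below are illustrative only, not tuned values from any real vehicle.

```python
def kf_step(x, p, u, z, q=0.1, r=0.5):
    """One predict/update cycle of a 1D Kalman filter.
    x, p: state mean and variance; u: commanded motion; z: measurement;
    q, r: process and measurement noise variances (assumed)."""
    # Predict with the motion model.
    x_pred = x + u
    p_pred = p + q
    # Correct with the measurement.
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0
for u, z in [(1.0, 1.2), (1.0, 1.9), (1.0, 3.1)]:
    x, p = kf_step(x, p, u, z)
    print(round(x, 3), round(p, 3))    # variance shrinks as evidence accumulates
```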


In map matching, there must be a prior map to compare against, and this map is not necessarily a centimeter-level high-precision map. That calls for a word on maps. Maps fall into four major categories: Metric, Topological, Sensor, and Semantic. Our most familiar maps are semantic-level maps. A driverless car is not a missile: the destination is generally given at the semantic level, since human traffic conventions are semantic rather than geographic coordinates. This is also one of the differences between robots and driverless cars; robots generally do not consider semantics and only need to know their position in a coordinate system. GPS provides metric coordinates in a global frame. Future V2X will additionally provide a map of specific objects (moving pedestrians and vehicles) beyond radar and visual detection range (NLOS), which could be called a V2X map. At present, most driverless vehicles at the domestic research stage use GPS RTK for positioning. GPS RTK must be paired with centimeter-level high-precision maps to obtain semantic information, so on its own it cannot deliver true driverless operation.


There are currently five main positioning methods: the first uses lidar SLAM, the second uses lidar intensity-scan images, the third uses synthetic images, the fourth uses Gaussian mixture maps, and the fifth is REM, proposed by Mobileye.


The first method, lidar SLAM, uses the vehicle's own GPS and IMU to make a rough position estimate, and then registers the live lidar point cloud against a pre-built high-precision map (prior map) in a common coordinate frame. Once the registration succeeds, the vehicle's position is confirmed. This is currently the most mature and accurate method.
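The registration step can be sketched as a bare-bones 2D ICP: starting from the rough GPS/IMU pose, repeatedly match each scan point to its nearest prior-map point and solve for the rigid correction. Production systems use far more robust variants; the random map and fixed offset below are only a demonstration of the idea.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(scan, prior_map, iters=20):
    """Toy 2D ICP: returns the rotation/translation that aligns scan to map."""
    tree = cKDTree(prior_map)
    src = scan.copy()
    R_total, t_total = np.eye(2), np.zeros(2)
    for _ in range(iters):
        _, idx = tree.query(src)              # nearest map point for each scan point
        tgt = prior_map[idx]
        mu_s, mu_t = src.mean(0), tgt.mean(0)
        H = (src - mu_s).T @ (tgt - mu_t)     # cross-covariance of centred points
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total                   # correction to the initial pose

map_pts = np.random.rand(500, 2) * 50
scan_pts = map_pts[:200] + np.array([0.8, -0.4])   # scan offset by a small pose error
print(icp(scan_pts, map_pts)[1])                   # roughly [-0.8, 0.4]
```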



The second method uses lidar intensity-scan images. Lidar has two basic imaging modes: one is 3D range imaging, which can roughly be understood as a point cloud; the other is intensity-scan imaging, in which the laser light reflected by objects yields an image based on the differing reflection intensity values. The intensity value is carried in the point cloud, and separating it out is one of the core techniques involved. This positioning method requires a special SLAM system to be built in advance, pose-graph SLAM (Pose-Graph SLAM), whose output can loosely be regarded as a lidar-produced high-precision map. There are three kinds of constraints: scan-matching constraints (Z), odometry constraints (U), and GPS prior constraints. From the lidar's 3D point cloud, the intensity values and the true ground plane are extracted and converted into a 2D ground intensity image, which can then be matched against the pose graph for positioning.
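The three kinds of constraints can be written down as a tiny least-squares pose graph. The one-dimensional toy below anchors the first pose with a GPS prior, chains the poses with odometry constraints, and adds one scan-matching constraint; all measurement values and weights are invented for the sketch.

```python
import numpy as np

n = 4                              # four poses along a line
A, b = [], []

def add(row, value, weight=1.0):
    """Append one weighted linear constraint row * x = value."""
    A.append(np.array(row, dtype=float) * weight)
    b.append(value * weight)

# GPS prior constraint: anchors pose 0 near its measured coordinate.
add([1, 0, 0, 0], 0.0, weight=10.0)
# Odometry constraints U: x_{i+1} - x_i = measured displacement.
for i, d in enumerate([1.0, 1.1, 0.9]):
    row = [0] * n
    row[i], row[i + 1] = -1, 1
    add(row, d)
# Scan-matching constraint Z: pose 3 re-observed relative to pose 0.
add([-1, 0, 0, 1], 2.8, weight=5.0)

x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
print(x)    # optimized poses, pulled toward the scan-matching constraint
```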


The third method, also called image-enhanced positioning, usually combines lidar and a vision system, typically a monocular camera. It requires a 3D lidar map prepared in advance; the ground plane is extracted from it to obtain a 2D ground-only model map, OpenGL is used to bring the monocular camera image and this 2D ground model into a common frame, and the two are registered by maximizing normalized mutual information. An extended Kalman filter (EKF) is then used to achieve positioning.
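Normalized mutual information, the similarity score used in this registration, can be computed from a joint histogram as below. The two images are random placeholders; in practice one would be the camera frame and the other the rendering of the 2D ground model at a candidate pose.

```python
import numpy as np

def normalized_mutual_information(img_a, img_b, bins=32):
    """NMI = (H(A) + H(B)) / H(A, B); higher means better alignment."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return (entropy(px) + entropy(py)) / entropy(pxy.ravel())

a = np.random.rand(120, 160)
b = a + 0.05 * np.random.rand(120, 160)          # nearly identical image
print(normalized_mutual_information(a, b),        # scores high
      normalized_mutual_information(a, np.random.rand(120, 160)))  # scores low
```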


The fourth method is the Gaussian mixture model, which is in fact a supplement to the second. In harsh environments, such as heavy snowfall, muddy roads with leftover snow, or old, damaged road surfaces that lack texture, Gaussian mixture models are used to improve the robustness of lidar positioning.
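The rough idea behind a Gaussian-mixture map is to model the distribution of values observed in each map cell with a small mixture rather than a single mean, which is what gets corrupted by snow or worn, textureless asphalt. The sketch below fits one cell's invented values with scikit-learn's GaussianMixture; the data and the two-component choice are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Invented returns for a single map cell: most from the road surface,
# some from snow or debris sitting on top of it.
cell_values = np.concatenate([
    np.random.normal(0.05, 0.01, 200),   # road-surface returns
    np.random.normal(0.40, 0.05, 50),    # returns from material on top
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(cell_values)
print(gmm.means_.ravel(), gmm.weights_)  # two modes instead of one corrupted mean
```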



The first four methods are inseparable from lidar, which is quite expensive, yet indoor visual SLAM has not reached practicality, let alone outdoor positioning. Mobileye therefore proposes a positioning method that does not require SLAM: REM. Although REM does not use visual SLAM, it is essentially a variant of it. Mobileye collects simple 3D coordinate data for "landmarks" such as traffic signals, direction signs, rectangular signs, street lights and reflective markers, and then obtains rich 1D data through recognition, such as lane lines, curbs and medians. Together, the simple 3D data and the rich 1D data amount to only about 10 KB/km. The camera image can then be positioned by matching it against this REM map. Mobileye's design is undoubtedly the lowest-cost, but it presumes that at least tens of millions of vehicles are equipped with REM systems that automatically collect data and upload it to the cloud. On road sections or off-road areas that no REM-equipped vehicle has covered, positioning is impossible, and it is unrealistic for REM-equipped vehicles to cover every inch of land around the world. There are also privacy issues, as well as data copyright issues: who owns the copyright to these data, the car owner, the car maker, the cloud service provider, or Mobileye? This is difficult to answer. REM data must also be updated almost in real time, and because lighting significantly affects the data, REM must filter out unsuitable samples; maintaining the validity of this map therefore requires an enormous amount of data and computation. Who will maintain such a huge computing system? Most fatally, REM is vision-based and usable only in fine weather with small lighting changes, which greatly limits its practical range, whereas lidar can handle 95% of road conditions.


Centimeter-level positioning is one of the difficulties of driverless driving; it involves not only semantic-level positioning of the vehicle but also absolute coordinate positioning. At present, GPS positioning accuracy is at best about 10 meters in urban areas and about 5 meters in the suburbs. GPS RTK can only be applied over a small area, with limited coverage and even more limited system bandwidth: it works for a handful of vehicles, but the system may collapse with hundreds. The BeiDou ground-based augmentation system is mainly for military use; its bandwidth and refresh frequency are limited, and it cannot support large-scale commercial vehicle use. Japan's quasi-zenith satellites cover only a few areas of eastern China, so they are not a long-term solution either.


Of course, driverless vehicles will find it hard to do away with centimeter-level maps for positioning in the future, but that is only the initial fix before the vehicle starts moving. Once under way, SLAM and obstacle recognition with the on-board lidar can fully replace high-precision maps for autonomous navigation. In the future, therefore, the main function of high-precision maps will be positioning rather than navigation; they need not be carried in the car and can live in the cloud.

