Optimized Multi-Vehicle Perception with Accuracy, Speed, and Scalability

Location

Golisano Hall (GOL/070) - Atrium 1940

Autonomous vehicles (AVs) rely on onboard sensors such as LiDAR, RADAR, and cameras, but these sensors have inherent limitations, including a restricted sensing range (≈80-90 m) and line-of-sight constraints. These limitations are especially problematic at intersections with high vehicle and pedestrian density; nearly half of all traffic-related injuries in the U.S. occur at intersections (FHWA). Cooperative perception, in which AVs share sensor data with other AVs or roadside units (RSUs), can mitigate these challenges. However, achieving high accuracy, low latency, and scalability simultaneously remains difficult. Existing methods rely on indirect alignment, where AVs use GPS and 3D maps to transform sensor data into a common frame. While efficient, indirect alignment suffers from GPS inaccuracies and map imperfections, reducing sensor-fusion accuracy. Direct alignment, where sensor data (e.g., LiDAR point clouds) are matched directly to compute transformations, improves accuracy but is computationally expensive and does not scale to many AVs. Another major challenge is excessive network bandwidth consumption due to large-scale data exchange. Current methods attempt to optimize data sharing but suffer from redundancy, inaccurate predictions, increased latency, and limited handling of partially visible regions. To address these issues, our approach introduces a fast, point-density-based occlusion estimation strategy that is more bandwidth-efficient: vehicles request objects in low-density areas of their point clouds, overcoming the limitations of prior region-based approaches that classify point clouds only into occupied and occluded regions.

Challenges

1. Scalability issues in direct alignment: Having each AV align with multiple others is computationally impractical (≈20-25 ms per alignment), especially in dense traffic.
2. High network bandwidth consumption: Transmitting entire point clouds quickly saturates the network, increasing delays.
3. Latency vs. accuracy trade-off: Indirect alignment is faster but less accurate, while direct alignment is precise but slower, making real-time deployment difficult.
4. Computational overhead in occlusion estimation: Traditional methods classify only occupied/occluded areas, whereas point-density-based estimation is more precise but computationally expensive (80K-120K points per frame).

Contributions

1. Anchor-based direct alignment: Vehicles align to a common anchor (a moving AV or an RSU), improving scalability.
2. CUDA-based overlap-aware direct alignment: Alignment focuses on overlapping point-cloud regions, reducing computational load.
3. GPU-accelerated occlusion estimation: Computation is offloaded to the GPU, estimating occlusions in milliseconds.
4. Optimized network communication: Reduced bandwidth usage and latency make cooperative perception viable in real-world deployments.

Through extensive experiments, the proposed Optimized Multi-Vehicle Perception with Accuracy, Speed, and Scalability (ASS) system achieves 5 cm fusion accuracy in under 15 ms, significantly outperforming state-of-the-art methods in both accuracy and latency.
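To illustrate why a common anchor improves scalability: once each vehicle has estimated a single rigid transform to the anchor's frame (e.g., via one ICP run against the anchor's point cloud), clouds from any number of vehicles can be fused by applying those transforms, with no pairwise alignment. The following is a minimal sketch of that fusion step, assuming the per-vehicle 4x4 transforms have already been estimated; function names and parameters are illustrative, not the system's actual API.

```python
import numpy as np

def to_homogeneous(points):
    """Append a 1 to each 3-D point so 4x4 rigid transforms apply directly."""
    return np.hstack([points, np.ones((points.shape[0], 1))])

def fuse_in_anchor_frame(clouds, transforms):
    """Fuse per-vehicle point clouds into the shared anchor frame.

    clouds:     list of (N_i, 3) arrays, one per vehicle.
    transforms: list of 4x4 arrays, where transforms[i] maps vehicle i's
                frame to the anchor frame (e.g., from one ICP alignment
                between vehicle i and the anchor).
    """
    fused = []
    for pts, T in zip(clouds, transforms):
        # Row-vector convention: (T @ p_h)^T == p_h^T @ T^T
        fused.append((to_homogeneous(pts) @ T.T)[:, :3])
    return np.vstack(fused)
```

With a shared anchor, N vehicles require only N alignments rather than O(N^2) pairwise alignments, which is the scalability gain the anchor-based design targets.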
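The point-density idea can be sketched as follows: voxelize a LiDAR frame, count returns per voxel, and treat sparse voxels as candidate occluded regions for which the vehicle requests objects from peers instead of exchanging full point clouds. This is a minimal CPU sketch under assumed voxel-size and threshold parameters; the actual system performs this estimation on the GPU and must also account for fully empty voxels within the sensing range, which this sketch does not enumerate.

```python
import numpy as np

def low_density_voxels(points, voxel_size=2.0, threshold=5):
    """Flag sparse voxels in a LiDAR cloud as candidate occluded regions.

    points:     (N, 3) array of LiDAR returns in the vehicle frame.
    voxel_size: edge length (meters) of the cubic voxel grid (assumed value).
    threshold:  minimum return count for a voxel to be considered
                adequately observed (assumed value).

    Returns integer voxel coordinates whose return count falls below
    the threshold; only voxels containing at least one return appear.
    """
    voxels = np.floor(points / voxel_size).astype(np.int64)
    unique, counts = np.unique(voxels, axis=0, return_counts=True)
    return unique[counts < threshold]
```

Because the per-voxel counting is embarrassingly parallel, it maps naturally onto a GPU kernel, which is what makes millisecond-scale occlusion estimation plausible even at 80K-120K points per frame.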


Exhibitor
Kaleem Nawaz Khan
Fawad Ahmad

Advisor(s)
Fawad Ahmad

Organization
Department of Computer Science, Golisano College of Computing and Information Sciences, Rochester Institute of Technology

