In this work, we unleash the potential of powerful monodepth models for camera-LiDAR calibration and propose CLAIM, a novel method for aligning data from a camera and a LiDAR. Given an initial guess and pairs of images and LiDAR point clouds, CLAIM uses a coarse-to-fine search to find the optimal transformation minimizing a patched Pearson correlation-based structure loss and a mutual information-based texture loss. These two losses serve as good metrics for camera-LiDAR alignment and, unlike most methods, require no complicated data processing, feature extraction, or feature matching, making our method simple and adaptable to most scenes.
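To give an intuition for the structure loss, below is a minimal NumPy sketch of a patched Pearson correlation between a monodepth map and the projected LiDAR depth. The function and variable names are hypothetical and this is not CLAIM's actual implementation.

```python
import numpy as np

def patched_pearson_loss(mono_depth, lidar_depth, patch_size=32):
    """Hypothetical sketch: mean (1 - Pearson correlation) over image patches.

    mono_depth:  (H, W) monocular depth estimate (relative scale is fine;
                 if the model predicts disparity rather than depth, invert it first).
    lidar_depth: (H, W) projected LiDAR depth, 0 where no point projects.
    """
    H, W = mono_depth.shape
    losses = []
    for y in range(0, H - patch_size + 1, patch_size):
        for x in range(0, W - patch_size + 1, patch_size):
            m = mono_depth[y:y + patch_size, x:x + patch_size].ravel()
            l = lidar_depth[y:y + patch_size, x:x + patch_size].ravel()
            valid = l > 0                     # only pixels hit by a LiDAR point
            if valid.sum() < 10:              # skip nearly empty patches
                continue
            m, l = m[valid] - m[valid].mean(), l[valid] - l[valid].mean()
            denom = np.sqrt((m * m).sum() * (l * l).sum()) + 1e-8
            losses.append(1.0 - (m * l).sum() / denom)  # 1 - Pearson correlation
    return float(np.mean(losses)) if losses else 0.0
```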
Given the extrinsics, i.e. the rotation $R$ and translation $t$ from the LiDAR frame to the camera frame, we can use $P_{cam} = R\,P_{lidar} + t$ to transform a point in the LiDAR coordinate system into the camera coordinate system.
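As a concrete example of this convention, the sketch below (using SciPy, with purely illustrative numeric values) transforms a LiDAR point into the camera frame and projects it with the [fx, fy, cx, cy] intrinsics described in the config section below:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Extrinsics, using the conventions from the config section:
# translation in meters, rotation as a [qw, qx, qy, qz] quaternion.
t = np.array([0.05, -0.30, -0.70])           # illustrative values
q_wxyz = [0.5, -0.5, 0.5, -0.5]              # illustrative values
# SciPy expects [qx, qy, qz, qw], so reorder the quaternion.
R = Rotation.from_quat([q_wxyz[1], q_wxyz[2], q_wxyz[3], q_wxyz[0]]).as_matrix()

# Pinhole intrinsics K = [fx, fy, cx, cy]; distortion is ignored in this sketch.
fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9  # illustrative values

p_lidar = np.array([10.0, 1.0, 0.5])         # a point in the LiDAR frame
p_cam = R @ p_lidar + t                      # LiDAR frame -> camera frame
u = fx * p_cam[0] / p_cam[2] + cx            # pinhole projection
v = fy * p_cam[1] / p_cam[2] + cy
print(u, v, p_cam[2])                        # pixel coordinates and depth
```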
- Ubuntu >= 18.04
- CUDA >= 11.2
- GPU with compute capability >= 7.5
```bash
git clone https://github.com/Tompson11/claim.git
cd claim
conda create -n claim python=3.10 -y
conda activate claim
```

```bash
cd claim
# install torch & torchvision (the torch and CUDA versions must match; here we take CUDA 11.8 as an example)
pip install torch==2.0.0 torchvision==0.15.1 --index-url https://download.pytorch.org/whl/cu118
# install requirements
pip install -r backend/requirements.txt
# install lidar_image_align
pip install -e backend --no-build-isolation
```

Download the DepthAnything-V2 checkpoint:

```bash
wget -O backend/model/Depth_Anything_V2/ckpt/depth_anything_v2_vitl.pth https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth\?download\=true
```

Run the offline calibration:

```bash
python backend/api/calibrate_offline.py --config <CONFIG_FILE> --result_name results --vis_proj
```

We implement four calibration modes. Please replace `<CONFIG_FILE>` with the corresponding config file from the following table:
| Mode | Comment | Waymo Config | KITTI Config |
|---|---|---|---|
| Default | Full calibration from a rough initial guess: initial rotation grid search, coarse translation search, then fine search | backend/config/default_waymo.json<br>backend/config/default_waymo_4frames.json (uses 4 frames for a single calibration, i.e. CLAIM-4F* in our paper) | backend/config/default_kitti.json<br>backend/config/default_kitti_4frames.json (uses 4 frames for a single calibration, i.e. CLAIM-4F* in our paper) |
| Finetune Both | Fine search over both rotation and translation around a good initial guess | backend/config/finetune_both_waymo.json | backend/config/finetune_both_kitti.json |
| Finetune Rotation | Fine search over rotation only | backend/config/finetune_rotation_waymo.json | backend/config/finetune_rotation_kitti.json |
| Finetune Translation | Fine search over translation only | backend/config/finetune_translation_waymo.json | backend/config/finetune_translation_kitti.json |
The final results will be saved at example_dataset/Waymo/results or example_dataset/KITTI/results with the following directory structure:
```
results
├── analyzed_results.json
├── analyzed_results.png
├── results.json
└── proj
    ├── 00000.jpg
    ├── 00010.jpg
    ├── 00021.jpg
    └── ...
```

where each file represents:

- `results.json`: results of each calibration. It records the frame_id, initial guess and final result of each calibration.
- `analyzed_results.json`: statistics of `results.json`, including the mean, median, mode and quantile values. Usually the mean or median value is recommended as the final calibration result.
- `analyzed_results.png`: visualization of `results.json` and `analyzed_results.json`. It shows the histogram of each component and the corresponding statistics.
- `proj/XXXXX.jpg`: projection image of each calibration result. It shows the LiDAR depth projection (top) and LiDAR intensity projection (bottom) for both the initial guess and the final calibration result.
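If you want to post-process the results yourself, they are plain JSON and can be loaded as in the sketch below. The inner field names are not documented here, so inspect the loaded objects for the exact schema.

```python
import json
import pprint

# Paths follow the example dataset layout above.
with open("example_dataset/KITTI/results/results.json") as f:
    results = json.load(f)      # per-calibration frame_id, initial guess, final result
pprint.pprint(results)

with open("example_dataset/KITTI/results/analyzed_results.json") as f:
    stats = json.load(f)        # mean / median / mode / quantile statistics
pprint.pprint(stats)
```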
Organize your dataset like the provided example dataset in example_dataset/KITTI and example_dataset/Waymo. Your dataset should follow the directory structure:
```
YOUR_DATASET/
├── img/
│   ├── 00000.jpg
│   ├── 00001.jpg
│   └── ...
└── pcd/
    ├── 00000.pcd
    ├── 00001.pcd
    └── ...
```

- Place your images under `img/`. The image format should be either ".jpg" or ".png".
- Place your point clouds under `pcd/`. The point cloud format should be either ".pcd" or ".ply".
- The image and the point cloud with the same name are considered a time-synchronized pair and will be used for calibration.
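Before running the calibration, a quick sanity check along these lines (a hypothetical helper, not part of CLAIM) can confirm that every image has a point cloud with the same name:

```python
from pathlib import Path

dataset = Path("YOUR_DATASET")
img_stems = {p.stem for p in dataset.glob("img/*") if p.suffix in {".jpg", ".png"}}
pcd_stems = {p.stem for p in dataset.glob("pcd/*") if p.suffix in {".pcd", ".ply"}}

# Every image should have a time-synchronized point cloud with the same name.
print("images without point clouds:", sorted(img_stems - pcd_stems))
print("point clouds without images:", sorted(pcd_stems - img_stems))
```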
Create your own config file. You can refer to the example config that matches your calibration situation.
Here, we introduce each parameter in detail:
| Group | Parameter | Meaning |
|---|---|---|
| | base_dir | root path of the dataset |
| | frame_nums_per_batch | number of frames used for a single calibration |
| | overlap_nums_between_batch | number of frames shared between two consecutive calibrations, e.g. if the dataset has 5 frames, frame_nums_per_batch=3 and overlap_nums_between_batch=1, the first calibration uses frames 0, 1, 2 and the second uses frames 2, 3, 4 |
| data_params | mono_depth_model | model used to estimate monocular depth; currently only "depth_anything_v2" is available |
| | half_resolution | whether to resize the original image to half of its resolution. "true" is recommended if your image is larger than 1080p, to save GPU memory and accelerate the calibration |
| | points_down_sample_step | step used to downsample the point cloud. A large step is necessary if the point cloud is very large, e.g. set points_down_sample_step to 2 if there are 2e6 points |
| | intensity_equalization | whether to perform equalization on the point cloud intensity. "true" is recommended |
| | gray_image_equalization | whether to perform histogram equalization on the grayscale image. "true" is recommended |
| | shuffle | whether to shuffle the dataset. This matters if you use multiple frames per calibration |
| pipeline_params | mode | calibration mode. 0: default, 1: finetune both, 2: finetune rotation, 3: finetune translation |
| | patch_size | patch size used to divide the image for the structure loss. A patch_size such that the image width divided by patch_size falls within [20, 40] is usually fine |
| | init_rot_range | rotation search range for the initial grid search (unit: degree). Set it to the estimated rotation error of the initial guess. (Only used when mode=0.) |
| | init_rot_resolution | rotation search resolution for the initial grid search (unit: degree). Set it to a feasible value according to init_rot_range, e.g. init_rot_range=10, init_rot_resolution=1, or init_rot_range=5, init_rot_resolution=0.5. (Only used when mode=0.) |
| | coarse_trans_range | translation search range for the coarse random search (unit: meter). Set it to the estimated translation error of the initial guess. (Only used when mode=0.) |
| | coarse_iters | iterations of the coarse random search. (Only used when mode=0.) |
| | search_mode | search mode for finetuning. 0: random search, 1: grid search. (Only used when mode=2 or 3.) |
| | fine_trans_range | translation search range for the fine random search (unit: meter). (Only used when mode=0, 1 or 3.) |
| | fine_rot_range | rotation search range for the fine random search (unit: degree). (Only used when mode=0 or 1.) |
| | fine_iters | iterations of the fine random search. (Only used when search_mode=0.) |
| | fine_trans_resolution | translation search resolution for the finetuning grid search (unit: meter). (Only used when mode=3 and search_mode=1.) |
| | fine_rot_resolution | rotation search resolution for the finetuning grid search (unit: degree). (Only used when mode=2 and search_mode=1.) |
| intrinsics | K | camera intrinsics. The format must be [fx, fy, cx, cy] |
| | D | distortion coefficients of the camera. The format must be [k1, k2, p1, p2, k3] for pinhole or [k1, k2, k3, k4] for fisheye |
| extrinsics | translation | translation of the initial guess (unit: meter) |
| | rotation | rotation of the initial guess in quaternion format: [qw, qx, qy, qz] |
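Putting the parameters together, a config could be assembled as in the sketch below. The exact JSON nesting and key names are an assumption based on the table above and the values are purely illustrative; treat the example files in backend/config/ as the authoritative reference.

```python
import json

# Illustrative values only; check the examples in backend/config/ for the real layout.
config = {
    "base_dir": "YOUR_DATASET",
    "frame_nums_per_batch": 1,
    "overlap_nums_between_batch": 0,
    "data_params": {
        "mono_depth_model": "depth_anything_v2",
        "half_resolution": False,
        "points_down_sample_step": 1,
        "intensity_equalization": True,
        "gray_image_equalization": True,
        "shuffle": False,
    },
    "pipeline_params": {
        "mode": 0,                  # 0: default, 1: finetune both, 2: finetune rotation, 3: finetune translation
        "patch_size": 40,
        "init_rot_range": 10,       # degrees
        "init_rot_resolution": 1,   # degrees
        "coarse_trans_range": 0.5,  # meters
        "coarse_iters": 100,
        "fine_trans_range": 0.1,    # meters
        "fine_rot_range": 1.0,      # degrees
        "fine_iters": 100,
    },
    "intrinsics": {
        "K": [721.5, 721.5, 609.6, 172.9],  # [fx, fy, cx, cy]
        "D": [0.0, 0.0, 0.0, 0.0, 0.0],     # [k1, k2, p1, p2, k3] for pinhole
    },
    "extrinsics": {
        "translation": [0.05, -0.30, -0.70],  # meters
        "rotation": [0.5, -0.5, 0.5, -0.5],   # [qw, qx, qy, qz]
    },
}

with open("backend/config/my_dataset.json", "w") as f:
    json.dump(config, f, indent=2)
```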
```bash
python backend/api/calibrate_offline.py \
    --config <CONFIG_FILE> \
    --seed <SEED> \
    --result_name <RESULT_NAME> \
    --vis_proj
```

The meanings of the parameters are:

- `config`: Config file.
- `seed`: Random seed. You can set a non-negative seed ($0$ ~ $2^{32}-1$) for reproducible results, or a negative seed for diverse results. The default seed is used if the parameter is omitted.
- `result_name`: Name of the result directory. The final result will be saved at `<YOUR_DATASET>/<RESULT_NAME>`. The default value is "result".
- `vis_proj`: Whether to save the visualization results.

You can find the results at `<YOUR_DATASET>/<RESULT_NAME>`, where the content of each file is described in contents.
```bash
# install requirements
pip install -r frontend/claim_frontend/requirements.txt
```

```bash
cd frontend/claim_frontend
python3 manage.py runserver 0.0.0.0:8080
```

If you deploy CLAIM on a server without a display, you need a local computer to show the user interface. Set up port forwarding by executing the following command on your local computer, where `<HOSTNAME_OF_SERVER>` is the hostname of your server.

```bash
ssh -N -L 8080:127.0.0.1:8080 <HOSTNAME_OF_SERVER>
```

Open the browser on your local computer and go to http://127.0.0.1:8080/index/; you will see the following page.

The common usage is:
- step1: Upload images and point clouds. Note that their numbers must be equal and corresponding pairs must have identical indices. The supported formats are "jpg" and "png" for images, and "ply" and "pcd" for point clouds.
- step2: Set configuration parameters. Fill in the extrinsics (i.e. initial guess), intrinsics and pipeline parameters according to your situation. The meaning of these parameters is described in parameters.
- step3: Click the `Submit and Calibrate!` button and wait for the calibration. When the calibration completes, a notice will pop up at the top right of the window and the calibrated extrinsics will replace the initial guess. If you want to retrieve your initial guess or see previous calibration results, click the `History Results` button.
- step4: View the projection results. Click the `Project` button to project LiDAR points onto the image with the current extrinsics. Switch the `depth/intensity` button to change the color attribute. You can also see the colored point cloud in the black window.
- step5: Export the results. Click the `Export` button to download the current extrinsics as a json file.
Some tips:
- Try Examples: Click the `Try Examples!` button and select the provided KITTI/Waymo frames. The extrinsics and intrinsics will be filled with the ground truth, and you can add some perturbation to the extrinsics for testing.
- Tune Extrinsics: You can also use our user interface as a platform for manual calibration by repeatedly tuning the extrinsics and viewing the projection results. You can zoom in/out with the mouse wheel when the mouse hovers over the projection picture. You can also drag the point cloud for better visualization.
If you find CLAIM useful in your research or projects, please cite our work:
```bibtex
@INPROCEEDINGS{11247484,
  author={Zhang, Zhuo and Liu, Yonghui and Zhang, Meijie and Tan, Feiyang and Ding, Yikang},
  booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  title={CLAIM: Camera-LiDAR Alignment with Intensity and Monodepth},
  year={2025},
  volume={},
  number={},
  pages={17921-17926},
  doi={10.1109/IROS60139.2025.11247484}}
```

- CLAIM draws inspiration from SparseGS and direct_visual_lidar_calibration.
- CLAIM uses DepthAnything-V2 for its excellent performance.
- CLAIM uses these great public datasets: KITTI, Waymo and MIAS-LECE.