Wei Zhang · Qing Cheng · David Skuddis · Niclas Zeller · Daniel Cremers · Norbert Haala
HI-SLAM2 constructs a 3DGS map (a) from monocular input, achieving accurate mesh reconstructions (b) and high-quality renderings (c). It surpasses existing monocular SLAM methods in both geometric accuracy and rendering quality while achieving faster runtime.
- Clone the repo with submodules
  ```shell
  git clone --recursive https://github.com/Willyzw/HI-SLAM2
  ```
- Create a new Conda environment and then activate it. Please note that we use a PyTorch version compiled with CUDA 11.8 in the `environment.yaml` file.
  ```shell
  conda env create -f environment.yaml
  conda activate hislam2
  ```
- Compile the CUDA kernel extensions (takes about 10 minutes). Please note that this step assumes CUDA 11 is installed, not CUDA 12. To check the installed CUDA version, you can run `nvcc --version` in the terminal.
  ```shell
  python setup.py install
  ```
- Download the pretrained weights of the Omnidata models for generating depth and normal priors
  ```shell
  wget https://zenodo.org/records/10447888/files/omnidata_dpt_normal_v2.ckpt -P pretrained_models
  wget https://zenodo.org/records/10447888/files/omnidata_dpt_depth_v2.ckpt -P pretrained_models
  ```
Download and prepare the Replica dataset by running

```shell
bash scripts/download_replica.sh
python scripts/preprocess_replica.py
```

which converts the data to the expected format and places it in the `data/Replica` folder.
Please follow the instructions in ScanNet to download the data, then put the color/pose/intrinsic data extracted from the `.sens` files into the `data/ScanNet` folder as follows:
```
scene0000_00
├── color
│   ├── 000000.jpg
│   └── ...
├── intrinsic
│   └── intrinsic_color.txt
└── pose
    ├── 000000.txt
    └── ...
```
Then run the following script to convert the data to the expected input format:

```shell
python scripts/preprocess_scannet.py
```
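Before running the conversion, it can help to sanity-check the extracted folder layout against the structure shown above. The snippet below is only an illustrative helper and not part of the repo; the function name and the exact checks are assumptions:

```python
from pathlib import Path

def check_scannet_scene(scene_dir):
    """Verify a ScanNet scene matches the expected layout.

    Checks that color/, intrinsic/, and pose/ exist and that every
    color frame has a matching pose file.
    """
    scene = Path(scene_dir)
    for sub in ("color", "intrinsic", "pose"):
        if not (scene / sub).is_dir():
            return False, f"missing folder: {sub}"
    colors = {p.stem for p in (scene / "color").glob("*.jpg")}
    poses = {p.stem for p in (scene / "pose").glob("*.txt")}
    missing = sorted(colors - poses)
    if missing:
        return False, f"frames without pose: {missing[:5]}"
    return True, "ok"
```

Running this on each `scene*` folder before the preprocessing script can catch incomplete `.sens` extractions early.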
We take the following sequences for evaluation: `scene0000_00`, `scene0054_00`, `scene0059_00`, `scene0106_00`, `scene0169_00`, `scene0181_00`, `scene0207_00`, `scene0233_00`.
After preparing the Replica dataset, you can run HI-SLAM2 for a demo. The demo takes about 2 minutes on an Nvidia RTX 4090 GPU. The result is saved in the `outputs/room0` folder, including the estimated camera poses, the Gaussian map, and the renderings. To visualize the construction process of the Gaussian map, use the `--gsvis` flag. To visualize intermediate results, e.g. the estimated depth and point cloud, use the `--droidvis` flag.
```shell
python demo.py \
    --imagedir data/Replica/room0/colors \
    --calib calib/replica.txt \
    --config config/replica_config.yaml \
    --output outputs/room0 \
    [--gsvis]    # Optional: Enable Gaussian map display
    [--droidvis] # Optional: Enable point cloud display
```
To generate the TSDF mesh from the reconstructed Gaussian map, you can run

```shell
python tsdf_integrate.py --result outputs/room0 --voxel_size 0.01 --weight 2
```
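For intuition, TSDF integration fuses per-frame depth into a voxel grid by a truncated, weighted running average, which is why the command above takes a `--voxel_size` and a `--weight`. The minimal sketch below shows only that per-voxel update rule; the function name and the truncation value are illustrative, not the script's actual API:

```python
def fuse_voxel(D, W, d, w, trunc=0.04):
    """Fuse one signed-distance observation into a voxel.

    D, W: the voxel's current truncated signed distance and weight.
    d, w: the new observation's signed distance and weight.
    """
    d = max(-trunc, min(trunc, d))       # truncate the signed distance
    D_new = (W * D + w * d) / (W + w)    # weighted running average
    return D_new, W + w
```

A larger observation weight (cf. `--weight 2`) makes new measurements pull the stored distance toward them more strongly.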
Run the following script to automate the evaluation process on all sequences of the Replica dataset. It evaluates the tracking error, rendering quality, and reconstruction accuracy.

```shell
python scripts/run_replica.py
```
Run the following script to automate the evaluation process on the selected 8 sequences of the ScanNet dataset. It evaluates the tracking error and rendering quality.

```shell
python scripts/run_scannet.py
```
HI-SLAM2 supports casual video recordings from a smartphone or camera (demo above recorded with an iPhone 15). To use your own video data, we provide a preprocessing script that extracts individual frames from your video and runs COLMAP to automatically estimate the camera intrinsics. Run the preprocessing with:

```shell
python scripts/preprocess_owndata.py PATH_TO_YOUR_VIDEO PATH_TO_OUTPUT_DIR
```
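To see roughly what such preprocessing involves, the sketch below builds (but does not run) the kind of ffmpeg and COLMAP invocations a script like this might use. The paths, frame rate, and choice of COLMAP subcommand are assumptions for illustration, not the actual behavior of `preprocess_owndata.py`:

```python
def build_preprocess_cmds(video_path, out_dir, fps=2):
    """Sketch the two preprocessing stages: frame extraction and COLMAP.

    Returns command lists suitable for subprocess.run; nothing is executed.
    """
    image_dir = f"{out_dir}/images"
    extract = ["ffmpeg", "-i", video_path,
               "-vf", f"fps={fps}",          # sample frames at a fixed rate
               f"{image_dir}/%06d.jpg"]
    colmap = ["colmap", "automatic_reconstructor",
              "--image_path", image_dir,     # COLMAP estimates intrinsics here
              "--workspace_path", out_dir]
    return extract, colmap
```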
Once the intrinsics are obtained, you can run HI-SLAM2 using the following command:

```shell
python demo.py \
    --imagedir PATH_TO_OUTPUT_DIR/images \
    --calib PATH_TO_OUTPUT_DIR/calib.txt \
    --config config/owndata_config.yaml \
    --output outputs/owndata \
    --undistort --droidvis --gsvis
```
There are some other command line arguments you can use:

- `--undistort` undistort the images if distortion parameters are provided in the calib file
- `--droidvis` visualize the point cloud map and the intermediate results
- `--gsvis` visualize the Gaussian map
- `--buffer` max number of keyframes to pre-allocate memory for (default: 10% of total frames). Increase this if you encounter the error `IndexError: index X is out of bounds for dimension 0 with size X`.
- `--start` start frame index (default: from the first frame)
- `--length` number of frames to process (default: all frames)
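When choosing a `--buffer` value, the default can be sketched as below. This is an assumption based on the "10% of total frames" description above, not the code's exact formula:

```python
def default_buffer_size(num_frames, ratio=0.1):
    """Keyframe slots pre-allocated by default (10% of the input frames).

    If tracking selects more keyframes than this, indexing past the
    pre-allocated buffer raises the IndexError mentioned above; pass a
    larger --buffer explicitly in that case.
    """
    return max(1, int(num_frames * ratio))
```

For a 2000-frame video this pre-allocates 200 keyframe slots; slow camera motion with many keyframes may need more.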
We built this project based on DROID-SLAM, MonoGS, RaDe-GS, and 3DGS. The reconstruction evaluation is based on evaluate_3d_reconstruction_lib. We thank the authors for their great work and hope this open-source code is useful for your research.
Our paper is available on arXiv. If you find this code useful in your research, please cite our paper.
```bibtex
@article{zhang2024hi2,
  title={HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction},
  author={Zhang, Wei and Cheng, Qing and Skuddis, David and Zeller, Niclas and Cremers, Daniel and Haala, Norbert},
  journal={arXiv preprint arXiv:2411.17982},
  year={2024}
}
```