SparseDrive Run Log
**Contents**
1. Environment Setup
2. Running

## 1. Environment Setup

Check the CUDA version first:

```bash
nvcc -V
```

CUDA 11.6 or later is required; I used CUDA 11.8:

```bash
conda create -n sparsedrive python=3.8 -y
conda activate sparsedrive
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install mmcv-full==1.7.2 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html
pip install -r requirement.txt
```

Or, if your server runs CUDA 12.x:

```bash
conda create -n sparsedrive python=3.10 -y
conda activate sparsedrive
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install mmcv-full==1.7.2 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html --only-binary mmcv-full
pip install -r requirement.txt
```

A few package versions were upgraded: numpy==1.23.5, mmcv-full==1.7.2, mmdet==2.28.2, urllib3==1.26.16, pyquaternion==0.9.9, nuscenes-devkit==1.1.10, yapf==0.33.0, tensorboard==2.14.0, motmetrics==1.1.3, pandas==1.5.3, flash-attn==2.3.3, opencv-python==4.8.1.78, prettytable==3.7.0, scikit-learn==1.3.0

Note: for flash-attn 2.3.3 I recommend downloading the wheel and installing it locally:

```bash
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.3.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation
python -c "import flash_attn; print('installed successfully')"
```

## 2. Running

### 1. Compile the deformable aggregation CUDA op

```bash
cd projects/mmdet3d_plugin/ops
export CUDA_HOME=/usr/local/cuda
python3 setup.py develop
```

If the output ends with

```
Installed /home/1/SparseDrive-main/projects/mmdet3d_plugin/ops
Processing dependencies for deformable-aggregation-ext==0.0.0
Finished processing dependencies for deformable-aggregation-ext==0.0.0
```

the build succeeded. Then return to the repo root:

```bash
cd ../../../
```

### 2. Set up the nuscenes dataset

Create a `data` folder in the repo root and a `nuscenes` folder inside it, then place the data downloaded from https://www.nuscenes.org/nuscenes there. For this run only the v1.0-mini split was downloaded. Note that `can_bus` and `maps` must be the full downloads, not just the mini subsets. Copy `v1.0-mini` and rename the copy to `v1.0-trainval`.

Note: if you are also interested in the Occupancy prediction task, you additionally need the OCC ground truth, which can be downloaded from https://github.com/CVPR2023-3D-Occupancy-Prediction/CVPR2023-3D-Occupancy-Prediction . It is best to extract it to `occ3d/gts` inside the nuscenes dataset folder. This post does not cover OCC.
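The dataset layout described above can be sanity-checked with a short script before moving on. A minimal sketch, stdlib only, assuming the directory names used in this guide (adjust `EXPECTED_DIRS` if your tree differs):

```python
from pathlib import Path

# Expected layout from the steps above; adjust if your tree differs.
EXPECTED_DIRS = [
    "data/nuscenes/v1.0-mini",
    "data/nuscenes/v1.0-trainval",  # copy of v1.0-mini, renamed
    "data/nuscenes/can_bus",        # must be the full can_bus download
    "data/nuscenes/maps",           # must be the full maps download
]

def missing_dirs(root="."):
    """Return the expected dataset directories that are absent under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("missing:", *missing, sep="\n  ")
    else:
        print("nuscenes layout looks complete")
```

Run it from the repo root; an empty result means the folders are in place.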
### 3. Create the data

```bash
sh scripts/create_data.sh
```

This generates a `mini` folder under `data/infos/`, containing the two info `.pkl` files.

### 4. Generate anchors with K-means

Since the mini split is used, fix the path in the scripts under `tools/kmeans/`:

```python
# fp = "data/infos/nuscenes_infos_train.pkl"
fp = "data/infos/mini/nuscenes_infos_train.pkl"
```

Then run:

```bash
sh scripts/kmeans.sh
```

With mini, the det, map and motion `.npy` files are generated fine, but `kmeans_plan_6.npy` fails. Modify `tools/kmeans/kmeans_plan.py` as follows:

```python
# Original code for the full dataset:
# clusters = []
# for trajs in navi_trajs:
#     trajs = np.concatenate(trajs, axis=0).reshape(-1, 12)
#     cluster = KMeans(n_clusters=K).fit(trajs).cluster_centers_
#     cluster = cluster.reshape(-1, 6, 2)
#     clusters.append(cluster)
#     for j in range(K):
#         plt.scatter(cluster[j, :, 0], cluster[j, :, 1])
#     plt.savefig(f"vis/kmeans/plan_{K}", bbox_inches="tight")
#     plt.close()
# clusters = np.stack(clusters, axis=0)
# np.save(f"data/kmeans/kmeans_plan_{K}.npy", clusters)

# Modified code, adapted to the mini dataset:
clusters = []
clusters.append(np.zeros((6, 6, 2)))
for trajs in navi_trajs[1:]:  # was: for trajs in navi_trajs:
    trajs = np.concatenate(trajs, axis=0).reshape(-1, 12)
    cluster = KMeans(n_clusters=K).fit(trajs).cluster_centers_
    cluster = cluster.reshape(-1, 6, 2)
    clusters.append(cluster)
    for j in range(K):
        plt.scatter(cluster[j, :, 0], cluster[j, :, 1])
    plt.savefig(f"vis/kmeans/plan_{K}", bbox_inches="tight")
    plt.close()
clusters = np.stack(clusters, axis=0)
np.save(f"data/kmeans/kmeans_plan_{K}.npy", clusters)
```
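The idea behind the `kmeans_plan.py` change above is simply to substitute an all-zero cluster block for the first driving-command group, which has too few trajectories in v1.0-mini to cluster, and to fit K-means only on the remaining groups. A pure-Python sketch of that logic (the helper names here are hypothetical, not from the repo):

```python
def zero_block(k=6, steps=6, dims=2):
    # Placeholder anchors: k clusters x 6 future waypoints x (x, y),
    # standing in for np.zeros((6, 6, 2)) in the modified script.
    return [[[0.0] * dims for _ in range(steps)] for _ in range(k)]

def plan_clusters_mini(navi_trajs, fit_kmeans, k=6):
    """navi_trajs: one list of trajectories per driving command.
    The first group is skipped (too sparse in mini) and replaced by a
    zero block; the rest are clustered as in the original script.
    fit_kmeans stands in for KMeans(n_clusters=k).fit(...).cluster_centers_."""
    clusters = [zero_block(k)]        # placeholder for the sparse group 0
    for trajs in navi_trajs[1:]:      # fit the remaining command groups
        clusters.append(fit_kmeans(trajs, k))
    return clusters
```

With the placeholder prepended, the stacked result keeps the `(num_commands, K, 6, 2)` shape that downstream code expects.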
### 5. Download pretrained weights

```bash
mkdir ckpt
wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O ckpt/resnet50-19c8e357.pth
```

### 6. Training and testing

My hardware is too old to train, but for reference:

```bash
# train
sh scripts/train.sh
```

This failed with the following error (traceback abridged):

```
usage: train.py [-h] [--work-dir WORK_DIR] [--resume-from RESUME_FROM]
                [--no-validate] [--gpus GPUS | --gpu-ids GPU_IDS [GPU_IDS ...]]
                [--seed SEED] [--deterministic] [--options OPTIONS [OPTIONS ...]]
                [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--dist-url DIST_URL]
                [--gpus-per-machine GPUS_PER_MACHINE]
                [--launcher {none,pytorch,slurm,mpi,mpi_nccl}]
                [--local_rank LOCAL_RANK] [--autoscale-lr]
                config
train.py: error: unrecognized arguments: --local-rank=0
ERROR: torch.distributed.elastic.multiprocessing.api: failed (exitcode: 2) local_rank: 0 (pid: 15735) of binary: /home/1/anaconda3/envs/sparsedrive/bin/python3
Traceback (most recent call last):
  ... (frames in runpy.py and torch/distributed/launch.py elided) ...
torch.distributed.elastic.multiprocessing.errors.ChildFailedError
```

The cause: starting with torch 2.0 (1.8 and below are unaffected), `torch.distributed.launch` is no longer used for launching, and the launcher automatically passes a `--local-rank` argument that train.py cannot parse. The fix is to change `scripts/train.sh` to launch with plain python:

```bash
#!/bin/bash
# stage 1 training script adapted for this setup
# key change: disable distributed training and launch with plain python
# so that no --local-rank argument is injected
export PYTHONPATH=$(dirname $0)/..:$PYTHONPATH

# stage 1 config and arguments
CONFIG_FILE=projects/configs/sparsedrive_small_stage1.py
GPU_NUM=1                                # single-GPU training
DETERMINISTIC=--deterministic
WORK_DIR=./work_dirs/sparsedrive_stage1  # optional: log/checkpoint directory

# key change: plain python instead of dist_train.sh, avoiding
# the distributed launcher arguments
python tools/train.py \
    ${CONFIG_FILE} \
    --gpus ${GPU_NUM} \
    ${DETERMINISTIC} \
    --launcher none \
    --work-dir ${WORK_DIR}

# stage 2 config and arguments
CONFIG_FILE2=projects/configs/sparsedrive_small_stage2.py
GPU_NUM2=1                                # single-GPU training
DETERMINISTIC2=--deterministic
WORK_DIR2=./work_dirs/sparsedrive_stage2  # optional: log/checkpoint directory

python tools/train.py \
    ${CONFIG_FILE2} \
    --gpus ${GPU_NUM2} \
    ${DETERMINISTIC2} \
    --launcher none \
    --work-dir ${WORK_DIR2}
```
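An alternative to disabling distributed launch is to make train.py tolerate the hyphenated flag: torch >= 2.0 launchers pass `--local-rank` instead of `--local_rank`, and argparse can register both spellings on one argument. A minimal sketch (this is not the repo's actual parser, just the relevant pattern):

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="train")
    parser.add_argument("config", help="path to the config file")
    # Accept both the old underscore spelling (torch <= 1.x launchers)
    # and the hyphen spelling passed by torch >= 2.0; both map to
    # args.local_rank because the first long option sets the dest.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    return parser
```

`parse_known_args()` is another option if you would rather ignore any unknown launcher flags wholesale.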
Then run the test:

```bash
# test
sh scripts/test.sh
```
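Before running the test script, it is worth confirming that training actually produced a checkpoint. A small stdlib helper to pick the newest `.pth` file under the work directory (the directory name follows the train.sh above; adjust as needed):

```python
from pathlib import Path

def latest_checkpoint(work_dir="work_dirs/sparsedrive_stage1"):
    """Return the most recently modified .pth file under work_dir, or None."""
    ckpts = sorted(Path(work_dir).glob("*.pth"),
                   key=lambda p: p.stat().st_mtime)
    return ckpts[-1] if ckpts else None
```

The returned path can then be passed to the test configuration as the checkpoint to evaluate.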