From Labelme to DOTA: A Step-by-Step Guide to Rotated-Box Annotation for Remote Sensing Images and mmdetection Training

张建站

2026/7/23 9:24:49

10 min read

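Object detection in remote sensing imagery is a long-standing research topic in computer vision. Unlike conventional horizontal-box detection, oriented bounding box (OBB) detection localizes arbitrarily oriented targets such as buildings, vehicles, and aircraft more precisely. This article walks through converting Labelme annotations of remote sensing images to DOTA format and training on them with the mmdetection framework.

1. Rotated-box annotation basics

Rotated-box detection describes each target's position and orientation with a rotated bounding box. Compared with a horizontal box, a rotated box encloses less background, which improves detection accuracy and is especially useful for densely packed or arbitrarily oriented targets in remote sensing images.

Common rotated-box representations:

- Five-parameter form (x, y, w, h, θ): center coordinates, width, height, and rotation angle
- Eight-parameter form (x1, y1, x2, y2, x3, y3, x4, y4): the four vertex coordinates

DOTA uses the eight-parameter form with the vertices listed in clockwise order. Each object line holds the eight coordinates, then the category name, then a difficulty flag. DOTA format example:

```text
imagesource:GoogleEarth
gsd:0.146343
0 0 50 0 50 50 0 50 plane 1
```

Note: DOTA expects the first vertex (x1, y1) to sit at the head of the rotated box, so the target's main direction is usually chosen as the starting point.

2. Converting Labelme to DOTA in practice

Labelme is a widely used image annotation tool, but its JSON output is not compatible with DOTA. A conversion script is needed to turn polygon annotations into rotated boxes.

Key steps in the conversion:

- Convex hull: use OpenCV's convexHull to handle complex polygons
- Minimum-area rotated rectangle: cv2.minAreaRect returns the rotated rectangle parameters
- Vertex ordering and formatting: make sure the vertices follow DOTA's clockwise order

```python
import json

import cv2
import numpy as np


def sort_points_clockwise(points):
    """Order vertices clockwise as seen in image coordinates (y axis pointing down).

    One possible implementation: sort by angle around the centroid. It does not
    pick the object's head as the first vertex; that needs task-specific logic.
    """
    center = points.mean(axis=0)
    angles = np.arctan2(points[:, 1] - center[1], points[:, 0] - center[0])
    return points[np.argsort(angles)]


def labelme_to_dota(labelme_json, output_txt):
    with open(labelme_json, encoding="utf-8") as f:
        data = json.load(f)

    with open(output_txt, "w", encoding="utf-8") as f:
        f.write("imagesource:Labelme\n")
        f.write("gsd:None\n")
        for shape in data["shapes"]:
            # OpenCV expects float32 (or int32) points
            points = np.array(shape["points"], dtype=np.float32)
            # Convex hull of the annotated polygon
            hull = cv2.convexHull(points)
            # Minimum-area rotated rectangle and its four corners
            rect = cv2.minAreaRect(hull)
            box = cv2.boxPoints(rect)
            # Sort the corners clockwise
            box = sort_points_clockwise(box)
            # DOTA line: 8 coordinates, category, difficulty flag (0 = not difficult)
            coords = " ".join(f"{v:.1f}" for v in box.flatten())
            f.write(f"{coords} {shape['label']} 0\n")
```

Tip: real data also involves concave polygons, occluded targets, and similar complications, so the conversion may need adjusting to the requirements of the project.

3. mmdetection rotated-box detection configuration

The mmdetection ecosystem supports several rotated-box detectors. Stock mmdetection does not include the rotated components used below (DOTADataset, RResize, S2ANet); they come from rotated-detection extensions built on it, such as MMRotate or the S2ANet codebase, so check the names against the version you have installed. The key settings are shown here with S2ANet as the example.

Dataset configuration example:

```python
dataset_type = 'DOTADataset'
data_root = 'data/dota/'
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(type='RRandomFlip', flip_ratio=0.5),
    dict(type='Normalize',
         mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375]),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
# test_pipeline is referenced below but not shown in this excerpt;
# it has to be defined alongside train_pipeline
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'train/labelTxt/',
        img_prefix=data_root + 'train/images/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val/labelTxt/',
        img_prefix=data_root + 'val/images/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'test/labelTxt/',
        img_prefix=data_root + 'test/images/',
        pipeline=test_pipeline))
```

Key model parameters:

```python
model = dict(
    type='S2ANet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    bbox_head=dict(
        type='S2ANetHead',
        num_classes=15,  # adjust to the actual number of classes
        in_channels=256,
        feat_channels=256,
        stacked_convs=2,
        with_orconv=True,
        anchor_ratios=[1.0],
        anchor_strides=[8, 16, 32, 64, 128],
        anchor_scales=[4],
        target_means=[.0, .0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0, 1.0],
        loss_fam_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_fam_bbox=dict(
            type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
        loss_odm_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_odm_bbox=dict(
            type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)))
```

4. Strategies for large remote sensing images

Feeding high-resolution images such as 4096×4096 directly into the network runs into GPU memory and compute limits. Common approaches:

| Method | Advantage | Drawback | Suitable for |
| --- | --- | --- | --- |
| Direct downsampling | Simple to implement | Small targets lose information | Large, uniformly sized targets |
| Sliding-window cropping | Preserves detail | Targets on window borders are harder to handle | High-accuracy requirements |
| Image pyramid | Multi-scale detection | Computationally expensive | Widely varying target sizes |
| Adaptive cropping | Balances accuracy and efficiency | Complex to implement | Recommended for general use |

Sliding-window implementation:

```python
def _window_starts(length, window, stride):
    """Start offsets along one axis, plus a final window flush with the edge."""
    last = max(length - window, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)
    return starts


def sliding_window(image, window_size, stride):
    """Split an image array into windows; window_size is (width, height)."""
    height, width = image.shape[:2]
    win_w, win_h = window_size
    windows = []
    for y in _window_starts(height, win_h, stride):
        for x in _window_starts(width, win_w, stride):
            windows.append({
                'window': image[y:y + win_h, x:x + win_w],
                'x': x,  # left offset of the window in the full image
                'y': y,  # top offset of the window in the full image
            })
    return windows
```

Tip: in practice, a 50% overlap between windows is a good starting point, with NMS as a post-processing step to merge duplicate detections.

5. Model deployment and Docker optimization

When a trained rotated-box detector goes to production, Docker takes care of most environment dependency problems. The key points are listed below.

Dockerfile optimization:

- Use a lightweight base image such as python:3.8-slim
- Use multi-stage builds to shrink the final image
- Order layers so the build cache is reused

```dockerfile
# Build stage
FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04 AS builder
WORKDIR /install
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        python3-dev \
        python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install --prefix=/install -r requirements.txt

# Runtime stage
FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04
# The CUDA base image ships without Python, so install the interpreter here
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Check that the copied packages end up on python3's sys.path
# (Debian/Ubuntu Python looks in dist-packages)
COPY --from=builder /install /usr/local
COPY . .
# Build-time check that the checkpoint loads on CPU; nothing from this step is
# cached, so it does not by itself speed up inference
RUN python3 -c "import torch; torch.load('model.pth', map_location='cpu')"
CMD ["python3", "inference.py"]
```

Performance tips:

- Accelerate inference with TensorRT
- Enable CUDA Graphs to cut kernel launch overhead
- Batch prediction requests to raise GPU utilization
- Compute in half precision (FP16)

```python
import torch
from mmdet.apis import init_detector
from torch2trt import torch2trt

# Placeholder paths; point these at your own config and checkpoint
config = 'path/to/config.py'
checkpoint = 'path/to/checkpoint.pth'

# Convert the model to TensorRT
model = init_detector(config, checkpoint, device='cuda:0')
x = torch.ones((1, 3, 1024, 1024)).cuda()
# torch2trt traces a tensor-only forward pass; a full detector usually needs a
# thin wrapper module around its forward before this call works
model_trt = torch2trt(model, [x], fp16_mode=True)

# Save the optimized model
torch.save(model_trt.state_dict(), 'model_trt.pth')
```

In real projects we found that, when using the evaluation tools in DOTA_devkit, the name list generated for the test set has to match the format the evaluation script expects exactly. A common mistake is leaving the file extension on, which makes the evaluation fail. The fix is to strip extensions consistently with os.path.splitext:

```python
import os


def generate_namelist(img_dir, output_file):
    with open(output_file, 'w') as f:
        for filename in sorted(os.listdir(img_dir)):
            if filename.endswith(('.jpg', '.png', '.bmp')):
                # Keep only the base name, without the extension
                name = os.path.splitext(filename)[0]
                f.write(name + '\n')
```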
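One step connects the sliding-window strategy in section 4 to evaluation: detections produced on each window are in window-local coordinates and have to be shifted back into full-image coordinates before NMS and before results are written for DOTA_devkit. The following is a minimal sketch of that shift, assuming each detection is a flat eight-value polygon and that the offsets are the `x` and `y` values stored with each window. The function name `to_global_polygons` is hypothetical and is not part of mmdetection or DOTA_devkit.

```python
def to_global_polygons(polygons, x_offset, y_offset):
    """Shift window-local 8-value polygons into full-image coordinates.

    polygons: list of [x1, y1, x2, y2, x3, y3, x4, y4] (assumed layout).
    x_offset, y_offset: top-left corner of the window in the full image.
    """
    shifted = []
    for poly in polygons:
        if len(poly) != 8:
            raise ValueError("expected x1 y1 x2 y2 x3 y3 x4 y4")
        # Even positions are x coordinates, odd positions are y coordinates
        shifted.append([
            v + (x_offset if i % 2 == 0 else y_offset)
            for i, v in enumerate(poly)
        ])
    return shifted


# Example: one detection from the window whose top-left corner is (512, 1024)
local = [[0, 0, 50, 0, 50, 50, 0, 50]]
print(to_global_polygons(local, 512, 1024))
# → [[512, 1024, 562, 1024, 562, 1074, 512, 1074]]
```

Keeping the shift separate from NMS means the same merged, full-image polygons can feed both the duplicate-removal step and the result files.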
