Stop Mixing Them Up! How Do the Pascal VOC, COCO, and YOLO Bounding Box Formats Actually Differ? (With Python Conversion Code)

张建站

2026/6/25 0:00:14

10-minute read

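The Three Main Bounding Box Formats in Object Detection: From Principles to Hands-On Conversion

When you first start working on object detection, the annotation formats used by different datasets are a constant headache: Pascal VOC uses [x_min, y_min, x_max, y_max], COCO uses [x_min, y_min, width, height], and YOLO uses [x_center, y_center, width, height]. The differences look trivial, but mixing formats in a real project easily leads to misplaced annotations and abnormal training behavior. This article explains the design rationale and typical use cases of the three formats, and provides reusable Python conversion code so you can move between frameworks without surprises.

1. How the formats fundamentally differ

1.1 Pascal VOC: absolute corner coordinates

Pascal VOC describes a box by the absolute pixel coordinates of its top-left and bottom-right corners, for example [98, 345, 420, 462]. Its main characteristics:

- Intuitive: the values map directly to positions in the image.
- Easy to compute with: the box area is (x_max - x_min) × (y_max - y_min).
- Widely compatible: supported by most traditional computer vision libraries.

Its drawbacks are just as clear:

- Sensitive to image size: after a resize, every coordinate has to be recomputed.
- The center point needs an extra step: center_x = (x_min + x_max) / 2.

```python
# Example: VOC corners to center point plus size (still in pixels)
def voc_to_center(voc_box):
    x_min, y_min, x_max, y_max = voc_box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    width = x_max - x_min
    height = y_max - y_min
    return [center_x, center_y, width, height]
```

1.2 COCO: top-left corner plus width and height

The COCO dataset uses [x_min, y_min, width, height], which adjusts the VOC layout as follows:

| Feature | Advantage | Disadvantage |
| --- | --- | --- |
| Keeps the top-left corner | Convenient anchor point for visualization | The center point still has to be computed |
| Stores width and height | If only the box size changes, only width/height need updating | Bounds checks need x_min + width |

This format is well suited to:

- Multi-scale training: the box size is stored explicitly. Note that when the image itself is rescaled, all four values scale by the same factor, not only width and height.
- Data augmentation: transformations such as cropping are straightforward because the size is explicit. Rotation still requires recomputing the enclosing box from its corners.

```python
# COCO to VOC
def coco_to_voc(coco_box):
    x_min, y_min, width, height = coco_box
    x_max = x_min + width
    y_max = y_min + height
    return [x_min, y_min, x_max, y_max]
```

1.3 YOLO: normalized center coordinates

All four values in YOLO's [x_center, y_center, width, height] are normalized to the range 0 to 1. The design considerations include:

- Resolution independence: the same label stays valid for inputs of different resolutions.
- Training stability: targets on a common 0 to 1 scale can help keep gradients well behaved.
- Computational efficiency: often cited as a benefit, although normalizing coordinates does not by itself reduce GPU memory usage.

Note: YOLO normalizes x values by the image width and y values by the image height. Suppose the original image is 640×480 and the box, in pixel center format [x_center, y_center, width, height], is [259, 403.5, 322, 117] (the VOC box [98, 345, 420, 462] from section 1.1). The conversion is:

- x_center = 259 / 640 ≈ 0.4047
- y_center = 403.5 / 480 ≈ 0.8406
- width = 322 / 640 ≈ 0.5031
- height = 117 / 480 = 0.24375

2. The math behind format conversion

2.1 VOC ↔ COCO

Converting between these two formats is the most direct case.

VOC to COCO:

- width = x_max - x_min
- height = y_max - y_min

COCO to VOC:

- x_max = x_min + width
- y_max = y_min + height

2.2 VOC ↔ YOLO

This conversion has to handle both normalization and the corner-to-center change:

```python
def voc_to_yolo(voc_box, img_w, img_h):
    x_min, y_min, x_max, y_max = voc_box
    # Compute the center point and normalize it
    x_center = ((x_min + x_max) / 2) / img_w
    y_center = ((y_min + y_max) / 2) / img_h
    # Compute width and height and normalize them
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return [x_center, y_center, width, height]


def yolo_to_voc(yolo_box, img_w, img_h):
    x_center, y_center, width, height = yolo_box
    # Undo the normalization (back to pixels)
    x_center *= img_w
    y_center *= img_h
    width *= img_w
    height *= img_h
    # Compute the corners
    x_min = x_center - width / 2
    y_min = y_center - height / 2
    x_max = x_center + width / 2
    y_max = y_center + height / 2
    return [x_min, y_min, x_max, y_max]
```

2.3 COCO ↔ YOLO

Either go through VOC as an intermediate format, or compute the result directly:

```python
def coco_to_yolo(coco_box, img_w, img_h):
    x_min, y_min, width, height = coco_box
    # Compute the center point and normalize it
    x_center = (x_min + width / 2) / img_w
    y_center = (y_min + height / 2) / img_h
    # Normalize width and height
    width /= img_w
    height /= img_h
    return [x_center, y_center, width, height]
```

3. In practice: a complete conversion workflow

3.1 A batch conversion tool

The following code converts every annotation file in a folder. It assumes each label is a .txt file with one box per line, written as four space-separated numbers and sharing its base name with the image. Native Pascal VOC (XML) and COCO (JSON) annotation files would need to be parsed into that form first.

```python
import os

import cv2
import numpy as np


def convert_folder(format_from, format_to, img_dir, label_dir, output_dir):
    """Batch-convert annotation files.

    Args:
        format_from: source format ('voc', 'coco', 'yolo')
        format_to: target format
        img_dir: path to the image folder
        label_dir: path to the source label folder
        output_dir: path to the output folder
    """
    os.makedirs(output_dir, exist_ok=True)

    for img_name in os.listdir(img_dir):
        # Locate the matching label file
        base_name = os.path.splitext(img_name)[0]
        label_path = os.path.join(label_dir, f"{base_name}.txt")
        img_path = os.path.join(img_dir, img_name)
        if not os.path.isfile(label_path):
            continue  # image without a label file

        # Read the image to get its size
        img = cv2.imread(img_path)
        if img is None:
            continue  # cv2.imread returns None for unreadable files
        img_h, img_w = img.shape[:2]

        # Read and convert the annotations (blank lines are skipped)
        with open(label_path) as f:
            boxes = [list(map(float, line.split())) for line in f if line.strip()]

        converted_boxes = []
        for box in boxes:
            if format_from == "voc" and format_to == "yolo":
                converted = voc_to_yolo(box, img_w, img_h)
            elif format_from == "yolo" and format_to == "voc":
                converted = yolo_to_voc(box, img_w, img_h)
            else:
                # Other conversion combinations...
                raise NotImplementedError(f"{format_from} -> {format_to}")
            converted_boxes.append(converted)

        # Save the converted result
        output_path = os.path.join(output_dir, f"{base_name}.txt")
        with open(output_path, "w") as f:
            for box in converted_boxes:
                f.write(" ".join(map(str, box)) + "\n")
```

3.2 Verifying the result

Always verify a conversion visually:

```python
def visualize_boxes(img_path, label_path, format_type):
    img = cv2.imread(img_path)
    h, w = img.shape[:2]

    with open(label_path) as f:
        boxes = [list(map(float, line.split())) for line in f if line.strip()]

    for box in boxes:
        if format_type == "yolo":
            x1, y1, x2, y2 = yolo_to_voc(box, w, h)
        elif format_type == "coco":
            x1, y1, x2, y2 = coco_to_voc(box)
        elif format_type == "voc":
            x1, y1, x2, y2 = box
        else:
            raise ValueError(f"unknown format: {format_type}")

        # Draw the rectangle
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)

    cv2.imshow("Validation", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```

4. Common problems in engineering practice

4.1 Boxes that cross the image boundary

A conversion can produce coordinates outside the image, which need special handling. Clamping each value independently is only correct for corner coordinates: clamping a COCO width to img_w does not guarantee that x_min + width stays inside the image, and YOLO values are normalized rather than in pixels. The function below therefore expects a VOC box; convert other formats to VOC first.

```python
def safe_convert(box, img_w, img_h):
    # Expects a VOC box in pixels: [x_min, y_min, x_max, y_max]
    x_min, y_min, x_max, y_max = box
    # Clamp x to [0, img_w] and y to [0, img_h]
    x_min = max(0, min(x_min, img_w))
    y_min = max(0, min(y_min, img_h))
    x_max = max(0, min(x_max, img_w))
    y_max = max(0, min(y_max, img_h))
    return [x_min, y_min, x_max, y_max]
```

4.2 Labels with class information

When the annotation file carries a class, a YOLO line is usually `class x_center y_center width height`, and the class has to be preserved during conversion:

```python
def convert_with_class(box, img_w, img_h, from_fmt, to_fmt):
    class_id = int(box[0])
    coords = box[1:]

    if from_fmt == "yolo" and to_fmt == "voc":
        converted = yolo_to_voc(coords, img_w, img_h)
    else:
        # Other conversions...
        raise NotImplementedError(f"{from_fmt} -> {to_fmt}")

    return [class_id] + converted
```

4.3 Performance tips

For large-scale conversions, the following optimizations help:

- Vectorized computation: process boxes in batches with NumPy.

```python
def batch_yolo_to_voc(yolo_boxes, img_w, img_h):
    # reshape(-1, 4) also keeps an empty input two-dimensional
    boxes = np.asarray(yolo_boxes, dtype=float).reshape(-1, 4)
    centers = boxes[:, :2] * [img_w, img_h]
    sizes = boxes[:, 2:] * [img_w, img_h]

    x_min = centers[:, 0] - sizes[:, 0] / 2
    y_min = centers[:, 1] - sizes[:, 1] / 2
    x_max = centers[:, 0] + sizes[:, 0] / 2
    y_max = centers[:, 1] + sizes[:, 1] / 2

    return np.column_stack([x_min, y_min, x_max, y_max])
```

- Parallel processing: use multiprocessing to speed things up.
- Caching: avoid reading the size of the same image more than once.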
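The caching tip could be sketched roughly as follows. This is a minimal illustration and not part of the article's own code: `make_cached_size_reader` and `fake_read_size` are hypothetical names introduced here, and the sketch assumes you supply your own function that returns `(width, height)` for a path. It only uses `functools.lru_cache` from the standard library to make sure each path is read at most once.

```python
from functools import lru_cache
from typing import Callable, List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def make_cached_size_reader(
    read_size: Callable[[str], Size], maxsize: int = 4096
) -> Callable[[str], Size]:
    """Wrap a size reader (hypothetical helper) so each path is read once."""

    @lru_cache(maxsize=maxsize)
    def cached(path: str) -> Size:
        return read_size(path)

    return cached


# Stand-in reader for demonstration only; it records every real "read".
calls: List[str] = []


def fake_read_size(path: str) -> Size:
    calls.append(path)
    return (640, 480)


get_size = make_cached_size_reader(fake_read_size)
first = get_size("a.jpg")   # triggers one call to fake_read_size
second = get_size("a.jpg")  # served from the cache, no second read
```

In a real pipeline the reader passed in would open the image, for example with `cv2.imread(path)` followed by `img.shape[:2]` (which yields height first, so the two values would need to be swapped to match `(width, height)`). Whether caching pays off depends on how often the same image is visited; for a single pass over a folder it makes little difference.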
