A Step-by-Step Tutorial: Converting the ICDAR2015 Text Localization Dataset to COCO Format with a Python Script (Full Code Included)

张建站

2026/5/2 2:16:24

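From ICDAR2015 to COCO: A Practical Guide to Converting Text Localization Dataset Formats

Text detection has long been an active research topic in computer vision. ICDAR2015 is a classic benchmark for scene text detection, but its annotation format differs substantially from the COCO format commonly used by mainstream detection frameworks such as MMDetection and Detectron2. This article walks through the conversion step by step with a Python script and provides complete code that you can adapt to your own pipeline.

1. Understanding the Format Differences

1.1 ICDAR2015 Raw Annotations

ICDAR2015 stores its annotations in plain text files (.txt). Each line describes one text instance in the following format:

```
x1,y1,x2,y2,x3,y3,x4,y4,transcription
```

Typical example:

```
377,117,463,117,465,130,378,130,Genaxis Theatre
493,115,519,115,519,131,493,131,[06]
374,155,409,155,409,170,374,170,###
```

Key characteristics:

- The first 8 numbers are the coordinates of the four vertices of the quadrilateral text box, listed in clockwise or counterclockwise order.
- The last field is the text content. A value of ### marks blurred or unreadable text that should be ignored.
- Each image has its own .txt annotation file. The code below assumes it is named gt_<image name>.txt.

1.2 The COCO Format in Detail

COCO organizes its data as JSON with three core parts:

```json
{
  "images": [
    {
      "file_name": "img_1.jpg",
      "height": 720,
      "width": 1280,
      "id": 0
    }
  ],
  "categories": [
    {"id": 1, "name": "text"}
  ],
  "annotations": [
    {
      "iscrowd": 0,
      "category_id": 1,
      "bbox": [216, 222, 93, 37],
      "area": 2054.0,
      "segmentation": [[217, 235, 304, 222, 309, 243, 216, 259]],
      "image_id": 0,
      "id": 0
    }
  ]
}
```

Key field comparison:

| Attribute | ICDAR2015 | COCO format |
| --- | --- | --- |
| Geometry | Quadrilateral vertex coordinates | Polygon segmentation + bbox |
| Ignored annotations | Text content is ### | iscrowd = 1 |
| Image information | Stored as separate files | Collected in the images array of the JSON |
| Category information | Implicit (everything is text) | Explicit categories definition |

2. Conversion Flow and Core Code

2.1 Overall Conversion Flow

1. Traverse the dataset directory: collect the paths of all images and their annotation files.
2. Parse the raw annotations: read each .txt file and extract the quadrilateral coordinates and the text content.
3. Convert the geometry: turn each quadrilateral into the polygon representation COCO expects.
4. Map the fields: translate ICDAR fields into their COCO counterparts.
5. Serialize to JSON: assemble the data in COCO layout and write it out.

2.2 Key Code

2.2.1 Data Structure Definition

First define the data structures used during conversion:

```python
from shapely.geometry import Polygon
import numpy as np
import mmcv
import os.path as osp


class ICDAR15ToCOCOConverter:
    def __init__(self, dataset_root, output_dir):
        self.dataset_root = dataset_root
        self.output_dir = output_dir
        # ICDAR2015 has a single implicit category: text
        self.categories = [{'id': 1, 'name': 'text'}]
```

2.2.2 Core Annotation Parsing Methods

```python
# Methods of ICDAR15ToCOCOConverter

def parse_icdar_annotation(self, gt_file):
    """Parse one ICDAR2015 annotation file."""
    # utf-8-sig strips the BOM that ICDAR2015 ground-truth files start with
    with open(gt_file, 'r', encoding='utf-8-sig') as f:
        lines = f.readlines()

    annotations = []
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) < 8:
            continue  # skip blank or malformed lines

        # Extract the coordinates and the text content.
        # The transcription may itself contain commas, so re-join the tail.
        coords = list(map(int, parts[:8]))
        text = ','.join(parts[8:]) if len(parts) > 8 else ''

        # Build a polygon geometry object
        polygon = Polygon(np.array(coords).reshape(-1, 2))

        # Build the COCO-style annotation
        annotation = {
            'iscrowd': 1 if text == '###' else 0,
            'category_id': 1,
            'bbox': self._get_bbox_from_polygon(polygon),
            'area': polygon.area,
            'segmentation': [coords],
            'image_id': None,  # filled in later
            'id': None         # filled in later
        }
        annotations.append(annotation)
    return annotations

def _get_bbox_from_polygon(self, polygon):
    """Return the COCO bbox [x, y, width, height] of a polygon."""
    min_x, min_y, max_x, max_y = polygon.bounds
    return [min_x, min_y, max_x - min_x, max_y - min_y]
```

2.2.3 Image Information Processing

```python
def process_image(self, img_path, gt_path):
    """Process a single image and its annotations."""
    img = mmcv.imread(img_path)
    img_info = {
        # Path relative to dataset_root, e.g. imgs/training/img_1.jpg
        'file_name': osp.relpath(img_path, self.dataset_root),
        'height': img.shape[0],
        'width': img.shape[1],
        'id': None  # filled in later
    }
    annotations = self.parse_icdar_annotation(gt_path)
    return img_info, annotations
```

3. Complete Conversion Script

3.1 Script Layout

A suggested directory structure for the code:

```
icdar15_to_coco/
├── converter.py      # main conversion script
├── utils.py          # utility functions
├── requirements.txt  # dependencies
└── configs/          # configuration files
```

3.2 Main Conversion Script

```python
import os
import os.path as osp
from tqdm import tqdm
import json
from concurrent.futures import ThreadPoolExecutor


class ICDAR15ToCOCOConverter:
    # ... (methods defined earlier)

    def convert(self, splits=('training', 'test')):
        """Run the full conversion."""
        coco_data = {
            'images': [],
            'annotations': [],
            'categories': self.categories
        }
        annotation_id = 0

        for split in splits:
            img_dir = osp.join(self.dataset_root, 'imgs', split)
            gt_dir = osp.join(self.dataset_root, 'annotations', split)
            img_files = sorted(f for f in os.listdir(img_dir) if f.endswith('.jpg'))

            with ThreadPoolExecutor(max_workers=8) as executor:
                futures = []
                for img_file in img_files:
                    img_path = osp.join(img_dir, img_file)
                    gt_file = osp.join(gt_dir, f'gt_{osp.splitext(img_file)[0]}.txt')
                    futures.append(executor.submit(self.process_image, img_path, gt_file))

                # Collect results in submission order so IDs are deterministic
                for future in tqdm(futures, desc=f'Processing {split} set'):
                    img_info, annotations = future.result()
                    img_id = len(coco_data['images'])
                    img_info['id'] = img_id
                    coco_data['images'].append(img_info)

                    for ann in annotations:
                        ann['id'] = annotation_id
                        ann['image_id'] = img_id
                        coco_data['annotations'].append(ann)
                        annotation_id += 1

        os.makedirs(self.output_dir, exist_ok=True)
        output_path = osp.join(self.output_dir, 'icdar2015_coco_format.json')
        with open(output_path, 'w') as f:
            json.dump(coco_data, f, indent=2)
        return output_path
```

Note that this writes every requested split into a single JSON file. To keep training and test data apart, call convert() once per split with a different output_dir.

3.3 Usage Example

```python
if __name__ == '__main__':
    converter = ICDAR15ToCOCOConverter(
        dataset_root='/path/to/icdar2015',
        output_dir='/path/to/output'
    )
    output_json = converter.convert()
    print(f'Conversion complete. Results saved to: {output_json}')
```

4. Advanced Tips and Optimization

4.1 Performance Optimization

Parallel processing: use ThreadPoolExecutor to speed up image processing. Here process_func and file_pairs are placeholders for your own per-item function and list of (image, annotation) pairs.

```python
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(tqdm(executor.map(process_func, file_pairs), total=len(file_pairs)))
```

Memory optimization: for large datasets, process in batches and write to the output file incrementally.

Caching: record which images have already been processed and check that record before processing, so that nothing is handled twice.

4.2 Quality Checks

After conversion, validate the result as follows:

```python
import json


def validate_coco_json(json_path):
    """Validate the generated COCO-format file."""
    with open(json_path) as f:
        data = json.load(f)

    # Basic structure check
    assert all(k in data for k in ['images', 'annotations', 'categories'])

    # Every annotation must reference an existing image
    image_ids = {img['id'] for img in data['images']}
    for ann in data['annotations']:
        assert ann['image_id'] in image_ids

    # Geometry validity check
    for ann in data['annotations']:
        assert len(ann['segmentation'][0]) >= 8  # at least 4 points
        assert ann['area'] > 0

    print('Validation passed: the COCO-format file is valid')
```

4.3 Visual Comparison

Use the following code to compare the original annotations with the converted ones:

```python
import cv2
import numpy as np
import matplotlib.pyplot as plt


def visualize_comparison(img_path, icdar_gt_path, coco_annotations):
    """Draw the original and the converted annotations on the same image."""
    img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)

    # Draw the original ICDAR annotations (utf-8-sig strips the leading BOM)
    with open(icdar_gt_path, encoding='utf-8-sig') as f:
        for line in f:
            if not line.strip():
                continue
            coords = list(map(int, line.strip().split(',')[:8]))
            # cv2.polylines expects int32 points
            pts = np.array(coords, dtype=np.int32).reshape(-1, 2)
            cv2.polylines(img, [pts], isClosed=True, color=(255, 0, 0), thickness=2)

    # Draw the COCO annotations
    for ann in coco_annotations:
        seg = ann['segmentation'][0]
        pts = np.array(seg).reshape(-1, 2).astype(np.int32)
        cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
        bbox = list(map(int, ann['bbox']))
        cv2.rectangle(img, (bbox[0], bbox[1]),
                      (bbox[0] + bbox[2], bbox[1] + bbox[3]), (0, 0, 255), 1)

    plt.figure(figsize=(12, 8))
    plt.imshow(img)
    plt.title('Red: ICDAR | Green: COCO | Blue: COCO bbox')
    plt.axis('off')
    plt.show()
```

5. Practical Integration

5.1 Integrating with MMDetection

The converted dataset can be used for MMDetection training. The snippet below uses the MMDetection 2.x config style and assumes train_pipeline is defined elsewhere in the config.

```python
# configs/textdet/custom_config.py
dataset_type = 'CocoDataset'
data_root = 'data/icdar2015/'
train = dict(
    type=dataset_type,
    classes=('text',),  # override the default COCO class list
    ann_file=data_root + 'icdar2015_coco_format.json',
    # file_name in the JSON already contains imgs/<split>/, so the prefix is data_root
    img_prefix=data_root,
    pipeline=train_pipeline
)
```

5.2 Common Problems

Problem 1: coordinates out of bounds

Solution: add a bounds check before conversion.

```python
coords = [max(0, min(coord, img_width if i % 2 == 0 else img_height))
          for i, coord in enumerate(coords)]
```

Problem 2: invalid polygons

Solution: validate the geometry with Shapely.

```python
from shapely.validation import make_valid

polygon = Polygon(np.array(coords).reshape(-1, 2))
if not polygon.is_valid:
    # make_valid may return a MultiPolygon or GeometryCollection
    polygon = make_valid(polygon)
```

Problem 3: inconsistent text direction

Solution: unify the vertex order.

```python
from scipy.spatial import ConvexHull


def sort_vertices(pts):
    # pts is an (N, 2) NumPy array; in 2D, hull.vertices is in counterclockwise order.
    # A concave quadrilateral loses its inner vertex here.
    hull = ConvexHull(pts)
    return pts[hull.vertices]
```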
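The fixes for Problems 1 and 3 can also be sketched without SciPy. The following is a minimal, dependency-free illustration rather than part of the original converter: the function names clip_to_image, signed_area, and to_clockwise are hypothetical, and the sketch assumes each annotation is a flat [x1, y1, ..., x4, y4] list as in ICDAR2015. Unlike a convex hull, it keeps all four vertices and only reverses their direction when needed.

```python
from typing import List


def clip_to_image(coords: List[float], img_width: int, img_height: int) -> List[float]:
    """Clamp a flat [x1, y1, ..., x4, y4] list to the image bounds."""
    return [
        max(0, min(c, img_width if i % 2 == 0 else img_height))
        for i, c in enumerate(coords)
    ]


def signed_area(coords: List[float]) -> float:
    """Shoelace formula over a flat coordinate list.

    With the image origin at the top-left (y grows downward), a positive
    result means the vertices run clockwise as seen on screen.
    """
    n = len(coords) // 2
    total = 0.0
    for i in range(n):
        j = (i + 1) % n
        x0, y0 = coords[2 * i], coords[2 * i + 1]
        x1, y1 = coords[2 * j], coords[2 * j + 1]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def to_clockwise(coords: List[float]) -> List[float]:
    """Return the same vertices in on-screen clockwise order.

    The first vertex stays first; only the traversal direction changes.
    """
    if signed_area(coords) >= 0:
        return list(coords)
    pairs = [coords[i:i + 2] for i in range(0, len(coords), 2)]
    reordered = [pairs[0]] + pairs[:0:-1]
    return [v for p in reordered for v in p]


if __name__ == '__main__':
    # First sample line from section 1.1, listed counterclockwise on purpose
    ccw = [377, 117, 378, 130, 465, 130, 463, 117]
    print(to_clockwise(ccw))
    print(clip_to_image([-5, 10, 1300, 730, 0, 0, 1280, 720], 1280, 720))
```

Applying clip_to_image before to_clockwise keeps the segmentation inside the image while preserving the starting vertex. The absolute value of signed_area can also serve as a cross-check against the area Shapely reports.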
