The hot-rolled steel strip defect dataset is known as the Xsteel surface defect dataset (X-SDD). It contains 1,360 defect images covering seven typical hot-rolled strip defect types, one more than the six types in the widely used NEU surface defect database (NEU-CLS). The images break down as follows: 238 slag inclusions ("inclusion" for short), 397 red iron sheet, 122 iron sheet ash, 134 surface scratches ("scratches" for short), 63 oxide scale of plate system, 203 finishing roll printing, and 203 oxide scale of temperature system. This article introduces X-SDD and discusses how to use it to train deep learning models, particularly for object detection.

1. X-SDD dataset introduction

Dataset overview

Dataset name: Xsteel surface defect dataset (X-SDD)
Dataset source: created jointly by industry and academia to provide more comprehensive data on hot-rolled strip surface defects.
Dataset contents: 1,360 defect images covering 7 typical hot-rolled strip defect types.
Defect classes:
- Slag inclusion (inclusion)
- Red iron sheet
- Iron sheet ash
- Surface scratches (scratches)
- Oxide scale of plate system
- Finishing roll printing
- Oxide scale of temperature system

Dataset directory structure

```
X-SDD/
├── images/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── labels/
    ├── image1.txt
    ├── image2.txt
    └── ...
```

2. Dataset preprocessing

Dataset splitting

To train and validate the model, the dataset has to be split into a training set and a validation set. A common choice is 80% for training and 20% for validation.

```python
import os
import random
import shutil


def split_dataset(dataset_dir, output_dir, train_ratio=0.8):
    """Copy images and labels into train/val subfolders under output_dir."""
    for kind in ("images", "labels"):
        for split in ("train", "val"):
            os.makedirs(os.path.join(output_dir, kind, split), exist_ok=True)

    image_files = [f for f in os.listdir(os.path.join(dataset_dir, "images")) if f.endswith(".jpg")]
    random.shuffle(image_files)

    train_size = int(len(image_files) * train_ratio)
    splits = {"train": image_files[:train_size], "val": image_files[train_size:]}

    for split, files in splits.items():
        for image_file in files:
            # Each image has a label file with the same stem and a .txt extension.
            label_file = os.path.splitext(image_file)[0] + ".txt"
            shutil.copy(os.path.join(dataset_dir, "images", image_file),
                        os.path.join(output_dir, "images", split, image_file))
            shutil.copy(os.path.join(dataset_dir, "labels", label_file),
                        os.path.join(output_dir, "labels", split, label_file))


if __name__ == "__main__":
    dataset_dir = "X-SDD"
    output_dir = "X-SDD_split"
    split_dataset(dataset_dir, output_dir)
```
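Because the classes are imbalanced (63 images for the rarest type against 397 for the most common), it can be worth checking how a random split distributes them. The following is a minimal sketch, assuming YOLO-format label files in which every non-empty line starts with an integer class id; the helper name `count_classes` is hypothetical and not part of any library.

```python
from collections import Counter
from pathlib import Path


def count_classes(label_dir):
    """Count object instances per class id across YOLO .txt label files."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts
```

Calling `count_classes("X-SDD_split/labels/train")` and `count_classes("X-SDD_split/labels/val")` and comparing the two results shows whether a rare class ended up almost entirely in one split, in which case reshuffling is a reasonable response.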
3. Data augmentation

Data augmentation can help improve the model's ability to generalize. We use the albumentations library for this step.

Install the dependency

```
pip install albumentations
```

Data augmentation code

```python
import os
import shutil

import albumentations as A
import cv2


def augment_image(image_path, label_path, image_out_dir, label_out_dir, num_augmentations=10):
    """Write num_augmentations augmented copies of one image and its YOLO labels."""
    image = cv2.imread(image_path)
    with open(label_path, "r") as f:
        labels = [line.split() for line in f if line.strip()]
    bboxes = [[float(x) for x in label[1:]] for label in labels]
    class_labels = [int(label[0]) for label in labels]

    transform = A.Compose(
        [
            A.HorizontalFlip(p=0.5),
            A.VerticalFlip(p=0.5),
            A.RandomRotate90(p=0.5),
            A.RandomBrightnessContrast(p=0.2),
            A.RandomGamma(p=0.2),
            A.Blur(blur_limit=3, p=0.2),
            A.CLAHE(p=0.2),
            A.HueSaturationValue(p=0.2),
            # Older albumentations releases take height/width here; newer ones
            # expect size=(height, width) instead.
            A.RandomResizedCrop(height=image.shape[0], width=image.shape[1], scale=(0.8, 1.0), p=0.5),
            # No ToTensorV2 step: cv2.imwrite needs the HWC NumPy array as-is.
        ],
        bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    )

    # Use the file stem so outputs are named image1_aug_0.jpg / image1_aug_0.txt
    # and each augmented image stays paired with its label file.
    stem = os.path.splitext(os.path.basename(image_path))[0]
    for i in range(num_augmentations):
        augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
        cv2.imwrite(os.path.join(image_out_dir, f"{stem}_aug_{i}.jpg"), augmented["image"])
        with open(os.path.join(label_out_dir, f"{stem}_aug_{i}.txt"), "w") as f:
            for cls, bbox in zip(augmented["class_labels"], augmented["bboxes"]):
                f.write(f"{int(cls)} {' '.join(map(str, bbox))}\n")


def augment_dataset(input_dir, output_dir, num_augmentations=10):
    """Augment every image in the train and val splits produced by split_dataset."""
    if os.path.exists(output_dir):
        shutil.rmtree(output_dir)

    for split in ["train", "val"]:
        # Mirror the images/<split> and labels/<split> layout that data.yaml points to.
        image_in_dir = os.path.join(input_dir, "images", split)
        label_in_dir = os.path.join(input_dir, "labels", split)
        image_out_dir = os.path.join(output_dir, "images", split)
        label_out_dir = os.path.join(output_dir, "labels", split)
        os.makedirs(image_out_dir, exist_ok=True)
        os.makedirs(label_out_dir, exist_ok=True)

        for image_file in os.listdir(image_in_dir):
            if not image_file.endswith(".jpg"):
                continue
            image_path = os.path.join(image_in_dir, image_file)
            label_path = os.path.join(label_in_dir, os.path.splitext(image_file)[0] + ".txt")
            augment_image(image_path, label_path, image_out_dir, label_out_dir, num_augmentations)


if __name__ == "__main__":
    input_dir = "X-SDD_split"
    output_dir = "X-SDD_augmented"
    augment_dataset(input_dir, output_dir, num_augmentations=10)
```

4. Training a YOLOv5 model

Install YOLOv5

```
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
```

Dataset configuration file

Create a data.yaml file that configures the dataset paths and the class information.

```yaml
# data.yaml
train: ../X-SDD_augmented/images/train
val: ../X-SDD_augmented/images/val
nc: 7
names: ['inclusion', 'red iron sheet', 'iron sheet ash', 'scratches', 'oxide scale of plate system', 'finishing roll printing', 'oxide scale of temperature system']
```
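Since `nc` and `names` in data.yaml have to agree with the class ids stored in the label files, a quick consistency check before training can catch mismatches early. The sketch below assumes the usual YOLO label layout (`class x_center y_center width height`, with coordinates normalized to [0, 1]); the function name `find_invalid_labels` is illustrative and is not part of YOLOv5.

```python
from pathlib import Path


def find_invalid_labels(label_dir, num_classes):
    """Return (file name, line number, reason) for each malformed label line."""
    problems = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        lines = label_file.read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            parts = line.split()
            if not parts:
                continue  # blank lines are harmless
            if len(parts) != 5:
                problems.append((label_file.name, line_no, "expected 5 fields"))
                continue
            try:
                class_id = int(parts[0])
                coords = [float(v) for v in parts[1:]]
            except ValueError:
                problems.append((label_file.name, line_no, "non-numeric value"))
                continue
            if not 0 <= class_id < num_classes:
                problems.append((label_file.name, line_no, "class id out of range"))
            elif any(not 0.0 <= v <= 1.0 for v in coords):
                problems.append((label_file.name, line_no, "coordinate outside [0, 1]"))
    return problems
```

For this dataset the call would be `find_invalid_labels("X-SDD_augmented/labels/train", 7)`; an empty list means every label line fits the seven-class configuration.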
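Training script

```python
# src/train_yolov5.py
# Simplified training loop. Assumes the cloned yolov5 repository root is on the
# Python path and that the release provides utils/dataloaders.py (older
# releases name this module utils/datasets.py).
import os
import time

import torch
import yaml
from models.yolo import Model
from utils.dataloaders import create_dataloader
from utils.loss import ComputeLoss
from utils.torch_utils import select_device


def train_model(data_yaml_path, model_config, epochs, batch_size, img_size, device,
                hyp_path="data/hyps/hyp.scratch-low.yaml"):
    with open(data_yaml_path, "r") as f:
        data = yaml.safe_load(f)

    # Paths from data.yaml are used as-is, so they are resolved relative to the
    # current working directory. 32 is the maximum model stride.
    train_loader = create_dataloader(data["train"], img_size, batch_size, 32, shuffle=True)[0]
    # Created for completeness; this simplified loop does not run validation.
    val_loader = create_dataloader(data["val"], img_size, batch_size, 32)[0]

    model = Model(model_config, ch=3, nc=data["nc"]).to(device)
    # ComputeLoss reads its loss gains from model.hyp (and, in some releases,
    # model.gr). hyp_path assumes the file shipped with recent yolov5 releases.
    with open(hyp_path, "r") as f:
        model.hyp = yaml.safe_load(f)
    model.gr = 1.0
    model.train()

    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    compute_loss = ComputeLoss(model)
    os.makedirs("models", exist_ok=True)

    start_time = time.time()
    for epoch in range(epochs):
        epoch_start_time = time.time()
        for i, (imgs, targets, paths, _) in enumerate(train_loader):
            # The loader yields uint8 images; scale them to [0, 1] floats.
            imgs = imgs.to(device).float() / 255
            targets = targets.to(device)

            pred = model(imgs)
            loss, loss_items = compute_loss(pred, targets)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if i % 10 == 0:
                print(f"Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {loss.item()}")

        epoch_duration = time.time() - epoch_start_time
        print(f"Epoch [{epoch + 1}/{epochs}] completed in {epoch_duration:.2f} seconds")
        torch.save(model.state_dict(), f"models/yolov5_custom_epoch_{epoch + 1}.pth")

    total_duration = time.time() - start_time
    print(f"Total training time: {total_duration:.2f} seconds")


if __name__ == "__main__":
    data_yaml_path = "data.yaml"
    model_config = "models/yolov5s.yaml"
    epochs = 100
    batch_size = 16
    img_size = 640
    device = select_device("0")  # GPU 0; change to "cpu" to train on the CPU
    train_model(data_yaml_path, model_config, epochs, batch_size, img_size, device)
```

5. Running the scripts

Data augmentation

```
python src/utils/data_augmentation.py
```

Train the model

```
python src/train_yolov5.py
```

6. Detailed explanation

Data augmentation
augment_image: augments a single image and saves the augmented images together with their labels.
augment_dataset: augments the whole dataset and produces a new dataset.

Training script
Data loading: create_dataloader builds the data loaders for the training and validation sets.
Model loading: the YOLOv5 model is built from its configuration file and put into training mode.
Optimizer and loss function: an Adam optimizer and the YOLOv5 loss function are defined.
Training loop: each iteration runs the forward pass, computes the loss, backpropagates, and updates the weights; the time per epoch and the total training time are recorded.
Saving the model: the model weights are saved at the end of every epoch.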
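Because a weights file is written after every epoch, a small helper can pick the most recent one when evaluating or resuming. This is only a sketch: it assumes the file naming used by the training script (`yolov5_custom_epoch_<N>.pth`), and `latest_checkpoint` is a hypothetical helper rather than a YOLOv5 function.

```python
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint path with the highest epoch number, or None."""
    pattern = re.compile(r"yolov5_custom_epoch_(\d+)\.pth$")
    best_epoch, best_path = -1, None
    for path in Path(checkpoint_dir).iterdir():
        match = pattern.search(path.name)
        # Compare epochs numerically so that epoch 10 ranks above epoch 2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

The returned path can then be passed to `torch.load` and `model.load_state_dict`, since the training script stores a plain state dict.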
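7. Notes

Dataset paths: make sure the dataset paths are correct, especially the paths in data.yaml.
Model configuration: make sure the path to the model configuration file is correct.
Image size: img_size can be adjusted to your needs; 640 and 1280 are common choices.
Device: make sure the selected device (CPU or GPU) is available.

8. Summary

With the steps above, you can build a complete X-SDD hot-rolled steel strip surface defect detection dataset and train a YOLOv5 model on it.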