Before we start, a quick recommendation for my column. It covers classification, detection, segmentation, tracking, and keypoint detection, is currently discounted for a limited time, and publishes 5-7 new mechanisms per week, together with all of my modified files and access to a discussion group.

I. Introduction

This post improves the Conv module with the ADown module. In my experiments, using ADown as the downsampling module first reduces the parameter count substantially, and second also brings a clear accuracy gain; I encourage you to try it on your own dataset. The module was proposed in YOLOv9 and the structure works well, but the original version always downsamples the feature map. This post reworks the structure so that you can choose whether to downsample, which also makes it easier to reuse for secondary innovation of other modules; here I use it for a secondary innovation of the C3k2 module.

You are welcome to subscribe to my column and study YOLO together. Column link: the YOLOv26 effective-improvement column, covering Conv, attention mechanisms, backbones, loss functions, optimizers, post-processing, and other improvement mechanisms.

Contents

I. Introduction
II. Framework diagrams
  2.1 Programmable Gradient Information (PGI)
    2.1.1 Auxiliary Reversible Branch
    2.1.2 Multi-level Auxiliary Information
  2.2 Generalized ELAN
III. Core code
IV. Step-by-step: adding the ADown mechanism
  4.1 Modification 1 / 4.2 Modification 2 / 4.3 Modification 3
  4.4 Modification 4 / 4.5 Modification 5 / 4.6 Modification 6
V. Training
  5.1 yaml files (5.1.1-5.1.4)
  5.2 Training code
  5.3 Training screenshots
VI. Summary

II. Framework diagrams

Figure 3 illustrates Programmable Gradient Information (PGI) together with related network architectures and methods. Four designs are shown:

a) PAN (Path Aggregation Network): mainly improves feature fusion to boost detection performance. However, because of the information bottleneck, some information may still be lost inside the network.

b) RevCol (Reversible Columns): a design aimed at reducing information loss. Its reversible column structure tries to keep the information flow lossless, but as the "Heavy Cost" label in the figure indicates, this structure increases computational cost.

c) Deep supervision: inserts extra supervision signals at multiple levels of the network to improve learning efficiency and final model performance; the figure shows the layers connected through deep supervision.

d) Programmable Gradient Information (PGI): the authors' new method. As I understand it, there are no skip-level connections in the forward pass. It consists of three parts: (1) the main branch, the architecture used for inference; (2) an auxiliary reversible branch, which generates reliable gradients to feed the main branch's backpropagation; (3) multi-level auxiliary information, which controls the main branch to learn plannable multi-level semantic information.

PGI aims to solve the information-bottleneck problem through the auxiliary reversible branch (the dashed box in the figure), providing deep networks with more reliable gradients without increasing inference cost. With this design, even lightweight and shallow neural networks can retain information effectively and update gradients accurately. As the dark boxes in the figure show, the main branch obtains more effective task features through the reliable gradient information supplied by the auxiliary reversible branch, without losing important information to the bottleneck. Symbols in the figure: gray circles are pooling operations, white circles are upsampling, gray squares are prediction heads, blue squares are auxiliary branches, and dark squares are the main branch. This design lets the network handle complex detection tasks while remaining computationally efficient.

2.1 Programmable Gradient Information (PGI)

To address the problems above, the authors propose a new auxiliary supervision framework called Programmable Gradient Information (PGI), shown in Figure 3(d). PGI mainly comprises three parts: (1) the main branch, (2) the auxiliary reversible branch, and (3) multi-level auxiliary information. As Figure 3(d) shows, inference uses only the main branch, so PGI adds no extra inference cost. The other two parts address or mitigate several important problems in deep learning. The auxiliary reversible branch tackles problems caused by deepening the network: a deeper network creates an information bottleneck that prevents the loss function from generating reliable gradients. The multi-level auxiliary information tackles the error accumulation caused by deep supervision, especially for architectures with multiple prediction branches and for lightweight models. The next two subsections introduce these parts in turn.

2.1.1 Auxiliary Reversible Branch

In PGI, the auxiliary reversible branch is proposed to generate reliable gradients and update the network parameters. By providing a mapping from data to targets, the loss function can give guidance and avoid the possibility of finding spurious correlations in incomplete feed-forward features that are only weakly related to the target. Maintaining complete information by introducing a reversible architecture is possible, but adding the main branch inside a reversible architecture consumes a great deal of inference cost. Analyzing the architecture of Figure 3(b), the authors found that adding extra connections from deep to shallow layers increases inference time by 20%, and repeatedly feeding the input into the network's high-resolution computation layers (yellow boxes) more than doubles it. Since the goal is to use a reversible architecture to obtain reliable gradients, "reversible" is not actually a requirement at inference time. The authors therefore treat the reversible branch as an extension of a deep-supervision branch and design the auxiliary reversible branch shown in Figure 3(d). The deep features of the main branch, which would otherwise lose important information to the bottleneck, can receive reliable gradient information from the auxiliary reversible branch; these gradients drive parameter learning toward extracting correct and important information, so the main branch obtains more effective task features. Furthermore, since complex tasks require transformation in deeper networks and reversible architectures perform worse on shallow networks than general ones, the proposed method does not force the main branch to retain complete raw information but instead updates it with useful gradients generated by auxiliary supervision; this means the approach also applies to shallower networks. Finally, because the auxiliary reversible branch can be removed at inference, the original network's inference capability is preserved, and any reversible architecture can be chosen to play this role in PGI.

2.1.2 Multi-level Auxiliary Information

This section discusses how the multi-level auxiliary information works. A deep-supervision architecture with multiple prediction branches is shown in Figure 3(c). For object detection, different feature pyramids perform different tasks, for example jointly detecting objects of different sizes. After connecting to the deep-supervision branches, shallow features are therefore guided to learn the features needed for small-object detection, treating object locations of other sizes as background. This behavior causes the deep feature pyramids to lose much of the information needed to predict target objects. The authors argue that every feature pyramid should receive information about all target objects, so that the main branch can retain complete information for learning predictions over targets of all kinds. The idea of multi-level auxiliary information is to insert an integration network between the auxiliary supervision's feature-pyramid layers and the main branch, and use it to combine the gradients returned by the different prediction heads, as shown in Figure 3(d). The multi-level auxiliary information then aggregates gradient information covering all target objects and passes it to the main branch to update parameters. In this way, the features of the main branch's pyramid levels are not dominated by information about some specific objects, which mitigates the broken-information problem of deep supervision. Moreover, any integration network can be used here, so the desired semantic levels can be planned to guide learning for network architectures of different sizes.

2.2 Generalized ELAN

This section describes the proposed new architecture, GELAN. By combining two neural network architectures, CSPNet and ELAN, both of which are designed with gradient path planning, the authors designed the Generalized Efficient Layer Aggregation Network (GELAN), which accounts for light weight, inference speed, and accuracy; its overall architecture is shown in Figure 4. ELAN, which originally only stacked convolution layers, is generalized into a new architecture that can use any computational block.

Figure 4 shows the architecture of GELAN and how it evolved from CSPNet and ELAN, both designed with gradient path planning.
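Before walking through the panels of Figure 4, note that this split-transform-merge aggregation is exactly the pattern the C2f/C3k2-style modules in the core code of Section III implement: a transition conv expands the channels, the result is split in two, one part passes through a chain of blocks whose outputs are all kept, and everything is concatenated for a final transition conv. A rough pure-Python illustration of the channel bookkeeping (the function name and example numbers are mine, not from the repo):

```python
def c2f_channel_flow(c1, c2, n, e=0.5):
    # Mirrors the channel bookkeeping of a C2f/C3k2-style forward pass:
    # cv1 expands c1 -> 2*c hidden channels, chunk(2, 1) splits them in half,
    # each of the n blocks consumes the last chunk and appends c more channels,
    # and cv2 maps the (2 + n) * c concatenated channels back to c2.
    c = int(c2 * e)          # hidden channels per chunk
    y = [c, c]               # cv1 output, chunked into two halves
    for _ in range(n):       # each block appends c channels to the list
        y.append(c)
    concat = sum(y)          # torch.cat(y, 1) along the channel dim
    return concat, c2        # channels into cv2, channels out

print(c2f_channel_flow(64, 128, 2))  # (256, 128): cv2 maps (2+n)*c -> c2
```

With n=2 and e=0.5, the hidden width is c=64, so four chunks of 64 channels (256 total) feed the final transition conv, which produces the 128 output channels.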
a) CSPNet: in the CSPNet architecture, the input is split into two parts by a transition layer, and each part passes through arbitrary computational blocks. The branches are then merged again (by concatenation) and pass through another transition layer.

b) ELAN: compared with CSPNet, ELAN uses stacked convolution layers, where each layer's output is combined with the next layer's input and convolved again.

c) GELAN: combines the designs of CSPNet and ELAN. It adopts CSPNet's split-and-merge concept and introduces ELAN's layered convolution processing in each part. The difference is that GELAN is not limited to convolution layers: it can use any computational block, making the network more flexible and customizable for different application needs. The design accounts for light weight, inference speed, and accuracy to improve overall model performance, and the optional modules and partitions shown in the figure further increase the network's adaptability and customizability. This structure lets GELAN support many types of computational blocks, so it can adapt to varied computational requirements and hardware constraints. Overall, GELAN aims to be a more general and efficient network that scales from lightweight to complex deep-learning tasks while maintaining or improving computational efficiency and performance, providing an extensible solution for future developments in deep learning.

One look at the figure shows what it fuses: CSPNet's anyBlock stacking combined with ELAN. There is no official description of the internals of this structure yet; the structure diagram below is my own reproduction from the code, and structurally it is quite simple.

III. Core code

See Section IV for how to use this code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

__all__ = ["Snu77ADown", "C3k2_ADown"]


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))


class Snu77ADown(nn.Module):
    def __init__(self, c1, c2, downsample=False):
        super().__init__()
        self.downsample = downsample
        self.c = c2 // 2
        self.cv1 = Conv(c1 // 2, self.c, 3, s=2 if downsample else 1, p=1)
        self.cv2 = Conv(c1 // 2, self.c, 1, 1, 0)

    def forward(self, x):
        if self.downsample:
            x = F.avg_pool2d(x, 2, 1, 0, False, True)
        x1, x2 = x.chunk(2, 1)
        x1 = self.cv1(x1)
        if self.downsample:
            x2 = F.max_pool2d(x2, 3, 2, 1)
        else:
            x2 = F.max_pool2d(x2, 3, 1, 1)
        x2 = self.cv2(x2)
        return torch.cat((x1, x2), 1)


class Bottleneck_ADown(nn.Module):
    # Standard bottleneck with ADown
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Snu77ADown(c_, c2, False)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(
        self, c1: int, c2: int, shortcut: bool = True, g: int = 1, k: tuple = (3, 3), e: float = 0.5
    ):
        """Initialize a standard bottleneck module.

        Args:
            c1 (int): Input channels.
            c2 (int): Output channels.
            shortcut (bool): Whether to use shortcut connection.
            g (int): Groups for convolutions.
            k (tuple): Kernel sizes for convolutions.
            e (float): Expansion ratio.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply bottleneck with optional shortcut connection."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 2 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class Attention_YOLOv26(nn.Module):
    """Attention module that performs self-attention on the input tensor.

    Args:
        dim (int): The input tensor dimension.
        num_heads (int): The number of attention heads.
        attn_ratio (float): The ratio of the attention key dimension to the head dimension.

    Attributes:
        num_heads (int): The number of attention heads.
        head_dim (int): The dimension of each attention head.
        key_dim (int): The dimension of the attention key.
        scale (float): The scaling factor for the attention scores.
        qkv (Conv): Convolutional layer for computing the query, key, and value.
        proj (Conv): Convolutional layer for projecting the attended values.
        pe (Conv): Convolutional layer for positional encoding.
    """

    def __init__(self, dim: int, num_heads: int = 8, attn_ratio: float = 0.5):
        """Initialize multi-head attention module.

        Args:
            dim (int): Input dimension.
            num_heads (int): Number of attention heads.
            attn_ratio (float): Attention ratio for key dimension.
        """
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.key_dim = int(self.head_dim * attn_ratio)
        self.scale = self.key_dim**-0.5
        nh_kd = self.key_dim * num_heads
        h = dim + nh_kd * 2
        self.qkv = Conv(dim, h, 1, act=False)
        self.proj = Conv(dim, dim, 1, act=False)
        self.pe = Conv(dim, dim, 3, 1, g=dim, act=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass of the Attention module.

        Args:
            x (torch.Tensor): The input tensor.

        Returns:
            (torch.Tensor): The output tensor after self-attention.
        """
        B, C, H, W = x.shape
        N = H * W
        qkv = self.qkv(x)
        q, k, v = qkv.view(B, self.num_heads, self.key_dim * 2 + self.head_dim, N).split(
            [self.key_dim, self.key_dim, self.head_dim], dim=2
        )
        attn = (q.transpose(-2, -1) @ k) * self.scale
        attn = attn.softmax(dim=-1)
        x = (v @ attn.transpose(-2, -1)).view(B, C, H, W) + self.pe(v.reshape(B, C, H, W))
        x = self.proj(x)
        return x


class PSABlock(nn.Module):
    """PSABlock class implementing a Position-Sensitive Attention block for neural networks.

    This class encapsulates the functionality for applying multi-head attention and feed-forward neural network
    layers with optional shortcut connections.

    Attributes:
        attn (Attention): Multi-head attention module.
        ffn (nn.Sequential): Feed-forward neural network module.
        add (bool): Flag indicating whether to add shortcut connections.

    Methods:
        forward: Performs a forward pass through the PSABlock, applying attention and feed-forward layers.
    """

    def __init__(self, c: int, attn_ratio: float = 0.5, num_heads: int = 4, shortcut: bool = True) -> None:
        """Initialize the PSABlock.

        Args:
            c (int): Input and output channels.
            attn_ratio (float): Attention ratio for key dimension.
            num_heads (int): Number of attention heads.
            shortcut (bool): Whether to use shortcut connections.
        """
        super().__init__()
        self.attn = Attention_YOLOv26(c, attn_ratio=attn_ratio, num_heads=num_heads)
        self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False))
        self.add = shortcut

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Execute a forward pass through PSABlock.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after attention and feed-forward processing.
        """
        x = x + self.attn(x) if self.add else self.attn(x)
        x = x + self.ffn(x) if self.add else self.ffn(x)
        return x


class C3k_ADown(C3):
    """C3k variant: a CSP bottleneck module with customizable kernel sizes, built from ADown bottlenecks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck_ADown(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3k2_ADown(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(
        self,
        c1: int,
        c2: int,
        n: int = 1,
        c3k: bool = False,
        e: float = 0.5,
        attn: bool = False,
        g: int = 1,
        shortcut: bool = True,
    ):
        """Initialize C3k2 module.

        Args:
            c1 (int): Input channels.
            c2 (int): Output channels.
            n (int): Number of blocks.
            c3k (bool): Whether to use C3k blocks.
            e (float): Expansion ratio.
            attn (bool): Whether to use attention blocks.
            g (int): Groups for convolutions.
            shortcut (bool): Whether to use shortcut connections.
        """
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            nn.Sequential(
                Bottleneck_ADown(self.c, self.c, shortcut, g),
                PSABlock(self.c, attn_ratio=0.5, num_heads=max(self.c // 64, 1)),
            )
            if attn
            else C3k_ADown(self.c, self.c, 2, shortcut, g)
            if c3k
            else Bottleneck_ADown(self.c, self.c, shortcut, g)
            for _ in range(n)
        )


if __name__ == "__main__":
    x = torch.randn(1, 32, 16, 16)
    model = C3k2_ADown(32, 32, 2, True, 0.25, True)
    print(model(x).shape)
```

IV. Step-by-step: adding the ADown mechanism

If you cannot or do not want to perform the steps below, you can contact the author to obtain the column's source code with all project files already added, ready to train directly.

4.1 Modification 1

First, create the file location: under the ultralytics/nn folder, create a directory named Addmodules.

4.2 Modification 2

Inside the Addmodules folder, create a new .py file and paste in the core code from Section III of this post.

4.3 Modification 3

In the same directory, create a new .py file named __init__.py, and import our file inside it, as shown in the figure.

4.4 Modification 4

Open the file ultralytics/nn/tasks.py and import and register our module. (This only needs to be added once; if you use my other improvement mechanisms, this step is still only done once.)

4.5 Modification 5

In the parse_model function of ultralytics/nn/tasks.py (around line 1500), add the code at the position shown in the figure. This step requires some judgment; if you get stuck, contact the author for a video tutorial.

4.6 Modification 6

In the parse_model function of ultralytics/nn/tasks.py (around line 1600), replace the code as shown in the figure. If you skip this change and every C3k2 in your yaml file has been renamed, the model falls back to the legacy v8-style detection head, which substantially increases the parameter count (training still runs, though). Many people overlook this step.

```python
if "C3k2" in getattr(m, "__name__", str(m)):
    legacy = False
    if scale in "mlx":
        args[3] = True
```

That completes the modifications. You can copy the yaml files below and run them. For more usage options, contact the author for the usage video; this post only lists the common configurations.

V. Training

5.1 yaml files

5.1.1 yaml file 1

Training info: YOLO26-ADown summary: 273 layers, 2,089,720 parameters, 2,089,720 gradients, 4.9 GFLOPs

```yaml
# Ultralytics AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Snu77ADown, [128, True]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Snu77ADown, [256, True]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Snu77ADown, [512, True]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Snu77ADown, [1024, True]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)
  # - [-1, 1, Conv, [256, 3, 2]]
  - [-1, 1, Snu77ADown, [256, True]] # use either this line or the commented Conv line above
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)
  # - [-1, 1, Conv, [512, 3, 2]]
  - [-1, 1, Snu77ADown, [512, True]] # use either this line or the commented Conv line above
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
```

5.1.2 yaml file 2

Training info: YOLO26-C3k2-ADown-1 summary: 273 layers, 2,501,560 parameters, 2,501,560 gradients, 5.9 GFLOPs

```yaml
# Ultralytics AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_ADown, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_ADown, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_ADown, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_ADown, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
```

5.1.3 yaml file 3

Training info: YOLO26-C3k2-ADown-2 summary: 275 layers, 2,489,080 parameters, 2,489,080 gradients, 5.9 GFLOPs

```yaml
# Ultralytics AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_ADown, [512, True]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_ADown, [256, True]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_ADown, [512, True]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2_ADown, [1024, True, 0.5, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
```

5.1.4 yaml file 4

Training info: YOLO26-C3k2-ADown-3 summary: 263 layers, 3,283,864 parameters, 3,283,864 gradients, 9.3 GFLOPs

```yaml
# Ultralytics AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_ADown, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_ADown, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_ADown, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_ADown, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_ADown, [512, True]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_ADown, [256, True]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_ADown, [512, True]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2_ADown, [1024, True, 0.5, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
```

5.2 Training code

Create a .py file, paste in the code below, set your own file paths, and run it.

```python
import warnings

warnings.filterwarnings("ignore")
from ultralytics import YOLO

if __name__ == "__main__":
    # Path to the model yaml you saved locally in Section 5.1.
    # To switch model scales, change the yaml name: e.g. yolo26s.yaml uses the s scale.
    # Likewise, for an improved yaml named yolo26-XXX.yaml, use yolo26l-XXX.yaml for
    # the l scale. Change the name passed to YOLO, not the config file itself.
    model = YOLO("path/to/your/model-config.yaml")
    # model.load("yolo26n.pt")  # whether to load pretrained weights; for research I
    # recommend not loading them, otherwise it is hard to improve accuracy
    model.train(
        data=r"path/to/your/dataset.yaml",
        # For other tasks, find task in ultralytics/cfg/default.yaml and change it
        # to detect, segment, classify, or pose
        cache=False,
        imgsz=640,
        epochs=20,
        single_cls=False,  # single-class detection or not
        batch=16,
        close_mosaic=0,
        workers=0,
        device="0",
        optimizer="MuSGD",  # using SGD/MuSGD
        # resume="path/to/last.pt",
        amp=True,  # turn amp off if the training loss becomes NaN
        project="runs/train",
        name="exp",
    )
```

5.3 Training screenshots

VI. Summary

That concludes the formal content of this post. Here I again recommend my YOLOv26 effective-improvement column. The column is newly opened with an average quality score of 98; going forward I will reproduce papers from the latest top conferences and add more material for some of the older improvement mechanisms. If this post helped you, subscribe to the column and follow for more updates. Column link: the YOLOv26 effective-improvement column, covering Conv, attention mechanisms, backbones, loss functions, optimizers, post-processing, and other improvement mechanisms.
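As a closing sanity check on the downsampling arithmetic that the yaml files above rely on, here is a small stdlib-only sketch. The helper names (`out_size`, `adown_out_size`) are mine, not from the repo; the code simply traces the standard floor((H + 2p - k) / s) + 1 output-size formula through the two branches of Snu77ADown and confirms that in downsample mode a 640-pixel input halves to 320, while in pass-through mode the spatial size is preserved, so it is safe to drop into positions that do not downsample.

```python
def out_size(size, k, s, p):
    # Standard conv/pool output-size formula (floor mode):
    # floor((H + 2p - k) / s) + 1
    return (size + 2 * p - k) // s + 1


def adown_out_size(h, downsample=True):
    # Trace the spatial size through Snu77ADown (helper names are illustrative).
    if downsample:
        # F.avg_pool2d(x, 2, 1, 0): kernel 2, stride 1, pad 0 -> h - 1
        h = out_size(h, k=2, s=1, p=0)
        b1 = out_size(h, k=3, s=2, p=1)  # branch 1: 3x3 Conv, stride 2, pad 1
        b2 = out_size(h, k=3, s=2, p=1)  # branch 2: max_pool2d(3, 2, 1), then 1x1 conv
    else:
        b1 = out_size(h, k=3, s=1, p=1)  # stride-1 3x3 conv keeps the size
        b2 = out_size(h, k=3, s=1, p=1)  # stride-1 max-pool keeps the size
    assert b1 == b2  # both branches must agree before torch.cat
    return b1


print(adown_out_size(640, True))   # 640 -> avg-pool 639 -> stride-2 ops -> 320
print(adown_out_size(640, False))  # no downsampling: stays 640
```

This matches the behavior the post adds over the original YOLOv9 ADown: the `downsample` flag switches both branches between stride-2 and stride-1, and both always produce the same spatial size, as the concatenation requires.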