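1. Simplified linear regression model
2. Neural networks
3. Solving linear regression for the optimum
4. Linear regression summary
5. Optimization methods
5.1 Gradient descent
5.2 Minibatch stochastic gradient descent
5.3 Choosing the batch size
5.4 Summary

6. Linear regression from scratch

① We implement the whole method from scratch, including the data pipeline, the model, the loss function, and the minibatch stochastic gradient descent optimizer.

6.1 Generating the dataset

① We construct a synthetic dataset from a linear model with noise. Using the linear model parameters w = [2, -3.4]ᵀ and b = 4.2 together with a noise term ϵ, we generate the dataset and its labels: y = Xw + b + ϵ.

| Code | Type | Purpose |
|---|---|---|
| %matplotlib inline | Jupyter magic command | Shows figures directly in the notebook |
| import random | Python standard library | Generates random numbers |
| import torch | Third-party library | Deep learning framework |
| from d2l import torch as d2l | Third-party library | Companion utilities for "Dive into Deep Learning" |

This is the standard opening of a deep learning Jupyter notebook.

```python
%matplotlib inline
import random
import torch
from d2l import torch as d2l

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    # torch.normal(mean, std, size) samples a tensor from a normal distribution
    # mean 0, std 1; num_examples rows, len(w) features per row
    X = torch.normal(0, 1, (num_examples, len(w)))
    print('X.shape:', X.shape)
    y = torch.matmul(X, w) + b  # matrix-vector product: the noise-free y
    print('y.shape:', y.shape)
    y += torch.normal(0, 0.01, y.shape)  # add noise with mean 0 and std 0.01
    print('y.shape:', y.shape)
    # reshape y from (num_examples,) to a (num_examples, 1) column vector
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
print('features.shape:', features.shape)
print('labels.shape:', labels.shape)
```

The y computed first is a 1-D vector of shape [1000]: its length matches the number of rows of X, but it has only one dimension. y.reshape((-1, 1)) turns it into a 2-D column vector of shape [1000, 1]; the -1 means "infer this dimension automatically", which gives 1000 here. X (that is, features) is already a 2-D matrix of shape [1000, 2], so the two results have shapes [1000, 2] and [1000, 1]. features receives the feature matrix X returned by the function (the inputs of the 1000 samples), and labels receives the label vector y (the noisy true outputs of the 1000 samples).

Output:

```
X.shape: torch.Size([1000, 2])
y.shape: torch.Size([1000])
y.shape: torch.Size([1000])
features.shape: torch.Size([1000, 2])
labels.shape: torch.Size([1000, 1])
```

6.2 Plotting the dataset

① Each row of features holds one two-dimensional sample, and each row of labels holds one one-dimensional label value (a single label).

- d2l.set_figsize(): sets the figure size.
- features[:, 1]: takes the second feature of every sample (indices start at 0, so [:, 1] is the second column).
- .detach(): detaches the tensor from the computational graph, since plotting needs no gradients.
- .numpy(): converts the PyTorch tensor to a NumPy array for plt.scatter.
- labels.detach().numpy(): converts the labels to NumPy as well.
- 1: the marker size.
- d2l.plt.scatter(...): draws a scatter plot with the second feature on the horizontal axis and the label on the vertical axis.

```python
%matplotlib inline
import random
import torch
from d2l import torch as d2l

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b  # the ideal, noise-free y
    # add noise (mean 0, std 0.01) to make the data more realistic
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))  # return features and labels, with y as a column vector

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
# print the first sample's features and label to check the data
print('features:', features[0], '\nlabel:', labels[0])

d2l.set_figsize()
# a tensor that requires grad must be detached before it can be converted to NumPy;
# these tensors do not require grad, so detach() is harmless here
d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1)
```

```
features: tensor([-1.2519, -1.9356]) 
label: tensor([8.2824])
<matplotlib.collections.PathCollection at 0x2a2e6b93048>
```

```python
from d2l import torch as d2l
help(d2l.set_figsize)
```

```
Help on function set_figsize in module d2l.torch:

set_figsize(figsize=(3.5, 2.5))
    Set the figure size for matplotlib.
```

PS: help(d2l.set_figsize) shows the documentation of d2l.set_figsize. In general, help() output can cover what a function does, its parameters, its return value, and usage examples.

```python
from d2l import torch as d2l
help(d2l.plt.scatter)
```

    Help on function scatter in module matplotlib.pyplot:

    scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=<deprecated parameter>, edgecolors=None, *, plotnonfinite=False, data=None, **kwargs)
        A scatter plot of *y* vs. *x* with varying marker size and/or color.

        Parameters
        ----------
        x, y : float or array-like, shape (n, )
            The data positions.

        s : float or array-like, shape (n, ), optional
            The marker size in points**2.
            Default is ``rcParams['lines.markersize'] ** 2``.

        c : array-like or list of colors or color, optional
            The marker colors. Possible values:

            - A scalar or sequence of n numbers to be mapped to colors using
              *cmap* and *norm*.
            - A 2-D array in which the rows are RGB or RGBA.
            - A sequence of colors of length n.
            - A single color format string.

            Note that *c* should not be a single numeric RGB or RGBA sequence
            because that is indistinguishable from an array of values to be
            colormapped. If you want to specify the same RGB or RGBA value for
            all points, use a 2-D array with a single row. Otherwise, value-
            matching will have precedence in case of a size matching with *x*
            and *y*.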
            If you wish to specify a single color for all points
            prefer the *color* keyword argument.

            Defaults to `None`. In that case the marker color is determined
            by the value of *color*, *facecolor* or *facecolors*. In case
            those are not specified or `None`, the marker color is determined
            by the next color of the ``Axes``' current "shape and fill" color
            cycle. This cycle defaults to :rc:`axes.prop_cycle`.

        marker : `~.markers.MarkerStyle`, default: :rc:`scatter.marker`
            The marker style. *marker* can be either an instance of the class
            or the text shorthand for a particular marker.
            See :mod:`matplotlib.markers` for more information about marker
            styles.

        cmap : str or `~matplotlib.colors.Colormap`, default: :rc:`image.cmap`
            A `.Colormap` instance or registered colormap name. *cmap* is only
            used if *c* is an array of floats.

        norm : `~matplotlib.colors.Normalize`, default: None
            If *c* is an array of floats, *norm* is used to scale the color
            data, *c*, in the range 0 to 1, in order to map into the colormap
            *cmap*.
            If *None*, use the default `.colors.Normalize`.

        vmin, vmax : float, default: None
            *vmin* and *vmax* are used in conjunction with the default norm to
            map the color array *c* to the colormap *cmap*. If None, the
            respective min and max of the color array is used.
            It is deprecated to use *vmin*/*vmax* when *norm* is given.

        alpha : float, default: None
            The alpha blending value, between 0 (transparent) and 1 (opaque).

        linewidths : float or array-like, default: :rc:`lines.linewidth`
            The linewidth of the marker edges. Note: The default *edgecolors*
            is 'face'. You may want to change this as well.

        edgecolors : {'face', 'none', *None*} or color or sequence of color, default: :rc:`scatter.edgecolors`
            The edge color of the marker. Possible values:

            - 'face': The edge color will always be the same as the face color.
            - 'none': No patch boundary will be drawn.
            - A color or sequence of colors.

            For non-filled markers, the *edgecolors* kwarg is ignored and
            forced to 'face' internally.
        plotnonfinite : bool, default: False
            Set to plot points with nonfinite *c*, in conjunction with
            `~matplotlib.colors.Colormap.set_bad`.

        Returns
        -------
        `~matplotlib.collections.PathCollection`

        Other Parameters
        ----------------
        **kwargs : `~matplotlib.collections.Collection` properties

        See Also
        --------
        plot : To plot scatter plots when markers are identical in size and
            color.

        Notes
        -----
        * The `.plot` function will be faster for scatterplots where markers
          don't vary in size or color.

        * Any or all of *x*, *y*, *s*, and *c* may be masked arrays, in which
          case all masks will be combined and only unmasked points will be
          plotted.

        * Fundamentally, scatter works with 1-D arrays; *x*, *y*, *s*, and *c*
          may be input as N-D arrays, but within scatter they will be
          flattened. The exception is *c*, which will be flattened only if its
          size matches the size of *x* and *y*.

        .. note::
            In addition to the above described arguments, this function can take
            a *data* keyword argument. If such a *data* argument is given,
            the following arguments can also be string ``s``, which is
            interpreted as ``data[s]`` (unless this raises an exception):
            *x*, *y*, *s*, *linewidths*, *edgecolors*, *c*, *facecolor*, *facecolors*, *color*.
            Objects passed as **data** must support item access (``data[s]``) and
            membership test (``s in data``).

6.3 Reading minibatches

```python
%matplotlib inline
import random
import torch
from d2l import torch as d2l

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
print('features:', features[0], '\nlabel:', labels[0])

d2l.set_figsize()
d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1)

# data_iter takes the batch size, the feature matrix and the label vector,
# and yields minibatches of size batch_size
def data_iter(batch_size, features, labels):
    num_examples = len(features)         # number of samples
    indices = list(range(num_examples))  # sample indices
    # samples are read in random order
    random.shuffle(indices)              # shuffle the indices
    for i in range(0, num_examples, batch_size):
        # take one batch of shuffled indices (the first 10 when i = 0) as a tensor;
        # min() clamps the end to num_examples when i + batch_size overshoots
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        # yield makes this a generator: each iteration returns one batch (X, y),
        # and execution resumes here on the next request
        yield features[batch_indices], labels[batch_indices]

batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break  # stop after the first batch
```

Execution:

1. On first entering the for loop, the data_iter generator is called and, starting from i = 0, takes the first batch of 10 samples.
2. print(X, '\n', y) prints that batch.
3. break leaves the loop, so only one batch is taken.

```
features: tensor([-0.3649,  0.1915]) 
label: tensor([2.8186])
tensor([[ 0.0121, -0.8915],
        [ 0.0873, -0.2825],
        [-1.2352,  0.0334],
        [-1.6237,  0.2243],
        [-1.0160,  0.4319],
        [ 0.0915,  0.1113],
        [-0.6346,  1.3844],
        [-1.4504, -0.0873],
        [-0.5357,  1.5457],
        [-0.5088,  0.0778]]) 
 tensor([[ 7.2667],
        [ 5.3237],
        [ 1.6068],
        [ 0.1980],
        [ 0.7058],
        [ 3.9992],
        [-1.7850],
        [ 1.6098],
        [-2.1443],
        [ 2.9316]])
```

Recap of the full flow: synthetic_data generates 1000 samples; data_iter shuffles the sample indices and returns batch_size random samples at a time; for testing we take a single batch to check it. This is the standard data-loading pattern in deep learning: random shuffling plus batched reading.

6.4 Complete model

```python
%matplotlib inline
import random
import torch
from d2l import torch as d2l

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
print('features:', features[0], '\nlabel:', labels[0])

d2l.set_figsize()
d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1)

def data_iter(batch_size, features, labels):
    num_examples = len(features)         # number of samples
    indices = list(range(num_examples))  # sample indices
    random.shuffle(indices)              # read the samples in random order
    for i in range(0, num_examples, batch_size):
        # min() clamps the end to num_examples when i + batch_size overshoots
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]  # features and matching labels

batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break  # stop after the first batch

# initialize the model parameters
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# define the model
def linreg(X, w, b):
    """Linear regression model."""
    return torch.matmul(X, w) + b

# define the loss function
def squared_loss(y_hat, y):
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2  # reshape y to the shape of y_hat

# define the optimization algorithm
def sgd(params, lr, batch_size):
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():  # the update itself must not be tracked by autograd
        for param in params:
            # The loss was summed, not averaged, so dividing by batch_size here
            # takes the mean. By linearity this is equivalent to dividing the loss.
            param -= lr * param.grad / batch_size
            param.grad.zero_()  # reset this parameter's gradient

# training setup
lr = 0.03            # hyperparameter: learning rate
num_epochs = 3       # sweep the whole dataset 3 times
net = linreg         # aliasing the model means only this line changes for another model
loss = squared_loss  # loss function

# training loop
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)  # minibatch loss on X and y
        # l has shape (batch_size, 1), not a scalar, so its elements are summed
        # and the gradient with respect to [w, b] is computed from that sum
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch{epoch+1},loss{float(train_l.mean()):f}')

# compare the true parameters with the learned ones to judge how well training worked
print(f'estimation error of w: {true_w - w.reshape(true_w.shape)}')
print(f'estimation error of b: {true_b - b}')
```

```
features: tensor([ 0.3564, -0.2589]) 
label: tensor([5.7863])
tensor([[-0.5143, -0.4114],
        [ 0.3137, -0.2456],
        [ 0.3821,  0.3905],
        [ 1.6864,  0.2536],
        [ 0.9743, -0.5378],
        [-0.2416, -2.0560],
        [-0.3996, -1.3219],
        [-1.6266, -2.0647],
        [-1.8871,  1.1921],
        [-0.4374, -0.0374]]) 
 tensor([[ 4.5433],
        [ 5.6544],
        [ 3.6427],
        [ 6.7095],
        [ 7.9819],
        [10.7031],
        [ 7.8992],
        [ 7.9747],
        [-3.6286],
        [ 3.4295]])
epoch1,loss0.047041
epoch2,loss0.000188
epoch3,loss0.000048
estimation error of w: tensor([ 4.6253e-05, -1.0376e-03], grad_fn=<SubBackward0>)
estimation error of b: tensor([0.0005], grad_fn=<RsubBackward1>)
```

7. Linear regression using the framework

```python
import numpy as np
import torch
from torch.utils import data
from d2l import torch as d2l
from torch import nn

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = d2l.synthetic_data(true_w, true_b, 1000)  # library function that builds the synthetic dataset

# use the framework's existing API to read the data
def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    # TensorDataset is a PyTorch Dataset; the * unpacks the tuple into separate arguments
    dataset = data.TensorDataset(*data_arrays)
    # the DataLoader yields batches of batch_size samples, shuffled when is_train is True
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)  # an iterable over minibatches
# iter(data_iter) creates an iterator object; next() takes its first element
print(next(iter(data_iter)))

# use the framework's predefined layers; nn is short for neural network
net = nn.Sequential(nn.Linear(2, 1))

# initialize the model parameters
net[0].weight.data.normal_(0, 0.01)  # overwrite the weights with normally distributed values
net[0].bias.data.fill_(0)            # set the bias to 0
print(net[0])

# mean squared error via the MSELoss class, also called the squared L2 norm
loss = nn.MSELoss()  # L1 loss uses absolute differences, L2 loss uses squared differences

# instantiate the SGD optimizer
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

# the training loop closely resembles the from-scratch version
num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:   # iterate over every minibatch in the DataLoader
        # print('X:', X)
        # print('y:', y)
        l = loss(net(X), y)  # net(X) is the linear regression prediction
        trainer.zero_grad()  # reset the gradients
        l.backward()
        trainer.step()       # the SGD optimizer updates the model
    l = loss(net(features), labels)
    print(f'epoch{epoch+1},loss{l:f}')
```

```
[tensor([[ 4.0302e-01, -3.0438e-02],
        [-2.6363e-02, -1.3848e+00],
        [-9.3803e-01,  2.6146e+00],
        [ 1.0863e-01,  6.7422e-01],
        [-1.4073e+00, -7.3437e-01],
        [ 4.1662e-01,  9.1266e-02],
        [ 8.1243e-01, -4.4630e-01],
        [-1.4354e+00, -2.9520e-01],
        [ 1.1644e-03, -1.5696e+00],
        [-1.1745e+00,  1.7154e-02]]), tensor([[ 5.0900],
        [ 8.8735],
        [-6.5605],
        [ 2.1188],
        [ 3.8872],
        [ 4.7255],
        [ 7.3227],
        [ 2.3349],
        [ 9.5274],
        [ 1.7901]])]
Linear(in_features=2, out_features=1, bias=True)
epoch1,loss0.000225
epoch2,loss0.000105
epoch3,loss0.000105
```

Reference: based on a GitHub author's work; I use it only for study.
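The slicing inside data_iter can be checked without PyTorch. The sketch below is a plain-Python illustration, not part of the original notebook: the helper name batch_index_iter and its seed parameter are hypothetical, and it yields index lists only rather than tensors. It shows that when the batch size does not divide the number of samples, the last batch is simply smaller and every sample is still visited exactly once per pass.

```python
import random

def batch_index_iter(num_examples, batch_size, seed=0):
    """Yield lists of shuffled indices, mirroring the slicing in data_iter.

    Hypothetical helper for illustration; seed is an added assumption
    so that the shuffle is reproducible.
    """
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # local RNG, leaves the global one untouched
    for i in range(0, num_examples, batch_size):
        # min() clamps the slice end when i + batch_size overshoots
        yield indices[i:min(i + batch_size, num_examples)]

batches = list(batch_index_iter(num_examples=10, batch_size=4))
batch_sizes = [len(batch) for batch in batches]
seen = sorted(index for batch in batches for index in batch)

print(batch_sizes)              # [4, 4, 2]
print(seen == list(range(10)))  # True
```

Python slices already stop at the end of a list, so the min() call is not strictly required; it mainly documents the intent.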
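The comment in sgd says that summing the loss and then dividing the gradient by batch_size is equivalent to averaging the loss. The following is a minimal plain-Python sketch of that claim, under stated assumptions: the tiny dataset, the helper names (predict, half_squared_losses, grad_of_sum, numeric_grad_of_mean) and the finite-difference check are all illustrative and not from the original post. It also compares the halved squared loss with a mean squared error that has no factor 1/2, which is how nn.MSELoss with its default mean reduction is defined.

```python
def predict(w, b, x):
    """Linear model w . x + b for a single sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def half_squared_losses(w, b, X, y):
    """Per-sample loss (y_hat - y)^2 / 2, matching squared_loss."""
    return [(predict(w, b, x) - t) ** 2 / 2 for x, t in zip(X, y)]

def grad_of_sum(w, b, X, y):
    """Analytic gradient with respect to w of the summed loss."""
    grad = [0.0] * len(w)
    for x, t in zip(X, y):
        err = predict(w, b, x) - t
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad

def numeric_grad_of_mean(w, b, X, y, eps=1e-6):
    """Central-difference gradient with respect to w of the mean loss."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        loss_plus = sum(half_squared_losses(w_plus, b, X, y)) / len(X)
        loss_minus = sum(half_squared_losses(w_minus, b, X, y)) / len(X)
        grad.append((loss_plus - loss_minus) / (2 * eps))
    return grad

# illustrative data, not taken from the notebook
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]]
y = [3.0, 7.1, 0.2, 4.4]
w, b = [0.1, -0.2], 0.0
batch_size = len(X)

# what sgd does: gradient of the summed loss, divided by batch_size
sum_then_divide = [g / batch_size for g in grad_of_sum(w, b, X, y)]
# gradient of the averaged loss, estimated independently
mean_grad = numeric_grad_of_mean(w, b, X, y)
max_diff = max(abs(a - c) for a, c in zip(sum_then_divide, mean_grad))
print(max_diff < 1e-6)  # True

# mean squared error without the 1/2 factor is twice the mean halved loss
mean_half = sum(half_squared_losses(w, b, X, y)) / batch_size
mse = sum((predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / batch_size
print(abs(mse - 2 * mean_half) < 1e-9)  # True
```

If this reading is right, the factor of two is one plausible reason the framework run reports a final loss (0.000105) roughly twice the from-scratch one (0.000048), although the two runs also use different random data and initial weights.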