[Deep Learning Week 1] Deep Learning Basics

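[scode type="lblue"]Hi there! These are my study notes and reflections from the 2023 summer course "Deep Learning: Algorithms and Practice" at Ocean University of China. Related resources are available online, so if you are also interested in deep learning, feel free to join in![/scode]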

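Code exercises

This part uses Google Colab, an online Jupyter notebook environment. Python and PyTorch are preinstalled, so it can be used directly with no setup on your local machine. Free users get about 12 GB of RAM, 100 GB of disk, and access to GPU and TPU resources. https://colab.research.google.com/

[scode type="yellow"]If you can't use Google Colab, you can also try Kaggle, which likewise provides an online Jupyter environment. Free users get about 30 GB of RAM, 100 GB of disk, and 30 hours of GPU plus 20 hours of TPU time per week. Because it also runs on Google servers, you will need to work around any network restrictions yourself. https://www.kaggle.com/[/scode]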
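Whichever hosted notebook you use, it helps to confirm that PyTorch is importable and whether a GPU is attached to the runtime. A minimal sketch of such a check follows; the variable names are only illustrative, and it falls back to the CPU when no GPU is present.

```python
import torch

# Pick the GPU when the runtime exposes one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate a small tensor directly on the chosen device.
x = torch.ones(2, 3, device=device)
print(torch.__version__, device, x.device)
```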

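PyTorch basics

Practice basic PyTorch operations. Lab notebook: Github

PyTorch is a Python library that provides two main high-level features:

GPU-accelerated tensor computation
Deep neural networks built on a reverse-mode automatic differentiation system

Defining data

Tensors support many data types, including:

torch.float32, torch.float64, torch.float16, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64. They are not described in detail here.

There are many ways to create a Tensor, including ones, zeros, eye, arange, linspace, rand, randn, normal, uniform, and randperm; look them up online as needed.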

import torch

print(torch.tensor([1,2,3,4,5]))
print(torch.ones(2,3,4))
print(torch.empty(2,3))
print(torch.rand(2,3))
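print(torch.zeros(2,3,dtype=torch.long)) # set the dtype to long
print(torch.zeros(2,3,dtype=torch.long).new_ones(3,4)) # new tensor that inherits the existing tensor's properties (dtype, device)
print(torch.randn_like(torch.zeros(2,3,dtype=torch.long), dtype=torch.float)) # new tensor with the existing tensor's size, overriding the dtype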

tensor([1, 2, 3, 4, 5])
tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
tensor([[1.0513e-34, 0.0000e+00, 8.9350e-35],
        [0.0000e+00, 1.1210e-43, 0.0000e+00]])
tensor([[0.2040, 0.1415, 0.3545],
        [0.2630, 0.1898, 0.6299]])
tensor([[0, 0, 0],
        [0, 0, 0]])
tensor([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]])
tensor([[ 1.1768,  0.0888, -1.1211],
        [ 0.1485,  0.4914,  1.6770]])
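The creation functions mentioned in this section go beyond the handful exercised in the example. As a rough sketch, here are a few more of them (eye, arange, randperm, normal) together with an explicit dtype; the sizes and the seed are arbitrary choices made for illustration.

```python
import torch

torch.manual_seed(0)  # make the random calls repeatable

eye = torch.eye(3)                                     # 3x3 identity matrix
steps = torch.arange(0, 10, 2)                         # start 0, step 2, end (10) is exclusive
perm = torch.randperm(5)                               # random permutation of 0..4
noise = torch.normal(mean=0.0, std=1.0, size=(2, 3))   # Gaussian samples
half = torch.ones(2, 2, dtype=torch.float16)           # explicit dtype

print(eye, steps, perm, noise, half.dtype, sep="\n")
```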

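Defining operations

Any operation performed on Tensors is a Function.

In the end, computation is done on Tensors, and it comes down to:

Basic operations: add, subtract, multiply, divide, power, remainder
Boolean operations: greater than, less than, max, min
Linear operations: matrix multiplication, norm, determinant

Basic operations include abs/sqrt/div/exp/fmod/pow, trigonometric and hyperbolic functions such as cos/sin/asin/atan2/cosh, and rounding functions such as ceil/round/floor/trunc. Look up the details when you need them.

Boolean operations include gt/lt/ge/le/eq/ne, topk, sort, and max/min.

Linear operations include trace, diag, mm/bmm, t, dot/cross, inverse, svd, and so on.

That is all for now; search for the specifics when you need them.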
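To make the three groups concrete, here is a small sketch that calls one or two functions from each group on a 2x2 matrix. The matrix values are arbitrary and chosen only so the results are easy to check by hand.

```python
import torch

a = torch.tensor([[1.0, -2.0], [3.0, 4.0]])

# Basic (element-wise) operations
print(a.abs())             # absolute value
print(a.pow(2))            # square each element
print(torch.fmod(a, 2))    # remainder, keeps the sign of the dividend

# Boolean / comparison operations
print(a.gt(0))             # element-wise a > 0
values, indices = a.flatten().topk(2)   # two largest entries and their flat positions
print(values, indices)

# Linear algebra
product = a.mm(a)          # matrix product, same as a @ a
print(product)
print(a.trace())           # sum of the diagonal
print(a.inverse() @ a)     # approximately the identity matrix
```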

m = torch.Tensor([[2, 5, 3, 7],[4, 2, 1, 9]])
print(m.size(0), m.size(1), m.size(), sep=' -- ') # shape of m
print(m.numel()) # number of elements in m
print(m[0][2]) # element (0, 2) of m
print(m[:, 1]) # column 1 of m
print(m[0, :]) # row 0 of m

2 -- 4 -- torch.Size([2, 4])
8
tensor(3.)
tensor([5., 2.])
tensor([2., 5., 3., 7.])

v = torch.arange(1, 5, dtype=torch.float) # tensor with the values 1 to 4
print(v)
print(m @ v) # matrix-vector product
print(m + torch.rand(2, 4)) # element-wise addition
print(m.t()) # transpose; transpose() also works
print(torch.linspace(3, 8, 20)) # 20 evenly spaced values from 3 to 8

tensor([1., 2., 3., 4.])
tensor([49., 47.])
tensor([[2.8234, 5.8941, 3.0906, 7.7028],
        [4.7184, 2.6416, 1.3373, 9.0231]])
tensor([[2., 4.],
        [5., 2.],
        [3., 1.],
        [7., 9.]])
tensor([3.0000, 3.2632, 3.5263, 3.7895, 4.0526, 4.3158, 4.5789, 4.8421, 5.1053,
        5.3684, 5.6316, 5.8947, 6.1579, 6.4211, 6.6842, 6.9474, 7.2105, 7.4737,
        7.7368, 8.0000])

from matplotlib import pyplot as plt

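plt.hist(torch.randn(1000).numpy(), 100);
# draw 1000 samples from a normal distribution with mean 0 and variance 1, convert to a NumPy array, and plot a 100-bin histogram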

[Figure: 2023-07-12T14:07:29.png]

a = torch.Tensor([[1, 2, 3, 4]])
b = torch.Tensor([[5, 6, 7, 8]])
print( torch.cat((a,b), 0)) # concatenate along dim 0, giving a 2x4 matrix
print( torch.cat((a,b), 1)) # concatenate along dim 1, giving a 1x8 matrix

tensor([[1., 2., 3., 4.],
        [5., 6., 7., 8.]])
tensor([[1., 2., 3., 4., 5., 6., 7., 8.]])

Spiral data classification

Lab notebook: Github

Initialize the samples as required.

[Figure: sample initialization]
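These notes do not reproduce the initialization code, so the following is only a sketch of one common way to generate three-class spiral data with 2 features. The sample count per class (N), the noise level, and the angular ranges are assumptions, not values taken from the lab notebook.

```python
import math
import torch

torch.manual_seed(0)

N, D, C = 1000, 2, 3   # samples per class (assumed), feature dimension, number of classes

X = torch.zeros(N * C, D)
y = torch.zeros(N * C, dtype=torch.long)

for c in range(C):
    r = torch.linspace(0, 1, N)   # radius grows from 0 to 1 along each arm
    # Each class sweeps its own angular range, with Gaussian jitter (0.2 is an assumed noise level)
    theta = torch.linspace(c * 2 * math.pi / C, (c + 2) * 2 * math.pi / C, N) + 0.2 * torch.randn(N)
    idx = slice(c * N, (c + 1) * N)
    X[idx, 0] = r * torch.sin(theta)
    X[idx, 1] = r * torch.cos(theta)
    y[idx] = c

print(X.shape, y.shape)
```

Because each point sits at radius r from the origin and the angle advances with r, the three classes form interleaved arms that no straight line can separate.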

Linear classification

The loss function used is cross-entropy loss.

[Figure: linear classification]
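As a quick illustration of what cross-entropy loss computes, the sketch below compares PyTorch's built-in function with a manual calculation: the negative log-softmax probability of the true class, averaged over the samples. The logits and targets are made-up values.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores for 2 samples, 3 classes
targets = torch.tensor([0, 2])             # true class index for each sample

# Manual version: negative log-probability of the true class, averaged over samples
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

# Built-in version: takes raw logits, applies log-softmax internally
builtin = F.cross_entropy(logits, targets)
print(manual.item(), builtin.item())
```

Note that the built-in loss expects raw scores, so the model should not apply Softmax itself before the loss.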

Sequential(
  (0): Linear(in_features=2, out_features=100, bias=True)
  (1): Linear(in_features=100, out_features=3, bias=True)
)

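Printing the model with print(model) above shows two layers:

The first layer takes 2 inputs (the feature dimension is 2) and produces 100 outputs;
The second layer takes 100 inputs (the previous layer's output) and produces 3 outputs (the number of classes).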
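For reference, a model with this printed structure can be built and trained roughly as follows. This is a sketch under assumptions: the random stand-in data, the SGD optimizer, the learning rate, and the epoch count are all illustrative and not taken from the lab notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in data with the same shapes as the exercise: 2 features, 3 classes
X = torch.randn(300, 2)
y = torch.randint(0, 3, (300,))

model = nn.Sequential(
    nn.Linear(2, 100),
    nn.Linear(100, 3),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # assumed hyperparameters

losses = []
for epoch in range(100):
    logits = model(X)              # forward pass: (300, 3) class scores
    loss = criterion(logits, y)
    optimizer.zero_grad()          # clear gradients from the previous step
    loss.backward()                # backpropagate
    optimizer.step()               # update the weights
    losses.append(loss.item())

print(model)
print(losses[0], losses[-1])
```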

Two-layer neural network classification

Add a ReLU activation function between the two layers.

[Figure: with the ReLU activation added]

Sequential(
  (0): Linear(in_features=2, out_features=100, bias=True)
  (1): ReLU()
  (2): Linear(in_features=100, out_features=3, bias=True)
)

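Classification accuracy improves substantially. Without a nonlinearity, the two Linear layers still compose into a single linear map, which cannot separate spiral-shaped classes.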
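One way to see why the ReLU matters: two stacked Linear layers with nothing in between are algebraically equivalent to a single affine map, so the purely linear model gains nothing from its 100 hidden units. The sketch below checks this numerically with randomly initialized layers; the layer sizes match the printed model, everything else is illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)

lin1, lin2 = nn.Linear(2, 100), nn.Linear(100, 3)
x = torch.randn(5, 2)

# Two stacked Linear layers: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W = lin2.weight @ lin1.weight              # shape (3, 2)
b = lin2.weight @ lin1.bias + lin2.bias    # shape (3,)
stacked = lin2(lin1(x))
collapsed = x @ W.T + b
print(torch.allclose(stacked, collapsed, atol=1e-5))

# With ReLU in between, the collapsed affine map no longer reproduces the output
with_relu = lin2(torch.relu(lin1(x)))
print(torch.allclose(with_relu, collapsed, atol=1e-5))
```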

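Q&A summary

[scode type="green"]1. What are the characteristics of AlexNet? Why does it outperform LeNet?[/scode]

Compared with LeNet, AlexNet has a deeper architecture and far more parameters, and it introduced the ReLU activation function and Dropout regularization. ReLU makes deep networks faster to train, and Dropout reduces overfitting.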
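As a small illustration of the Dropout technique mentioned above, the sketch below shows how PyTorch's nn.Dropout behaves in training mode versus evaluation mode. The drop probability of 0.5 and the all-ones input are chosen only to make the effect easy to see.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # entries are zeroed at random; survivors are scaled by 1 / (1 - p)

drop.eval()
y_eval = drop(x)    # dropout is a no-op at inference time

print(y_train.unique(), torch.equal(y_eval, x))
```

The rescaling in training mode keeps the expected activation the same as at inference time, which is why no extra correction is needed when switching to eval().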

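[scode type="green"]2. What do activation functions do?[/scode]

They introduce nonlinearity, which increases the network's expressive power and lets it learn nonlinear relationships between inputs and outputs. Some choices bring further benefits: ReLU, for example, mitigates vanishing gradients because it does not saturate for positive inputs, and it produces sparse activations in which some neurons fire while others are suppressed. Activation functions may also make the network somewhat more tolerant of small input perturbations.

[scode type="green"]3. What is the vanishing gradient problem?[/scode]

In a deep neural network, the gradients computed by backpropagation shrink layer by layer as they travel back from the output. The layers closest to the input therefore receive very small updates, or effectively none, so they struggle to learn, which hurts training and the final performance.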
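The effect can be observed directly by comparing gradient magnitudes in a deep stack of sigmoid layers. The following is a toy sketch; the depth, width, batch size, and the use of a simple summed output as the "loss" are all arbitrary assumptions made for the demonstration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A deliberately deep stack of small sigmoid layers (depth and width are arbitrary)
depth, width = 10, 32
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Sigmoid()]
net = nn.Sequential(*layers)

x = torch.randn(64, width)
net(x).sum().backward()   # backpropagate from a scalar built from the outputs

first = net[0].weight.grad.norm().item()                # Linear layer closest to the input
last = net[2 * (depth - 1)].weight.grad.norm().item()   # last Linear layer
print(f"first layer grad norm: {first:.3e}")
print(f"last layer grad norm:  {last:.3e}")
```

The sigmoid's derivative is at most 0.25, so each additional layer multiplies the backpropagated signal by a small factor, and the first layer ends up with a far smaller gradient than the last.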

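[scode type="green"]4. Is a wider or a deeper neural network better?[/scode]

It depends on the task. When the dataset is small or simple, a shallower, wider network may be the better fit, since it is simpler to train. When the dataset is large or complex, deeper networks are usually better at capturing complex features and relationships in the data.

[scode type="green"]5. Why use Softmax?[/scode]

Softmax maps the inputs to a probability distribution: every output lies between 0 and 1, the relative ordering of the inputs is preserved, and the outputs sum to 1. In multi-class classification this makes it easy to compute a probability for each class and pick the most probable class as the prediction.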
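These three properties are easy to check numerically. A minimal sketch with made-up logits:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])
probs = torch.softmax(logits, dim=0)   # exp(z_i) / sum_j exp(z_j)

print(probs)                                # every entry lies strictly between 0 and 1
print(probs.sum())                          # sums to 1 up to floating-point rounding
print(probs.argmax() == logits.argmax())    # the largest logit keeps the largest probability
```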

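[scode type="green"]6. Which is more effective, SGD or Adam?[/scode]

Adam often converges faster than plain stochastic gradient descent (SGD). It combines per-parameter adaptive learning rates with momentum, works well on most deep learning tasks, and is relatively insensitive to hyperparameter choices. SGD, usually with momentum, remains a strong choice when carefully tuned, so neither optimizer is better in every case.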
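In PyTorch the two optimizers are interchangeable in a training loop, which makes it easy to try both on your own problem. The sketch below fits a toy linear regression with each; the helper name fit, the target function, the learning rates, and the step count are all hypothetical choices, and the result on this toy task says nothing about which optimizer is better in general.

```python
import torch
from torch import nn

def fit(make_optimizer, steps=200):
    """Fit y = 3x + 1 with one Linear layer and return (first_loss, last_loss)."""
    torch.manual_seed(0)                 # same data and initial weights for every optimizer
    x = torch.randn(256, 1)
    y = 3 * x + 1
    model = nn.Linear(1, 1)
    optimizer = make_optimizer(model.parameters())
    losses = []
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]

# Learning rates below are assumed values, not tuned recommendations
sgd = fit(lambda params: torch.optim.SGD(params, lr=0.05, momentum=0.9))
adam = fit(lambda params: torch.optim.Adam(params, lr=0.05))
print("SGD  first/last loss:", sgd)
print("Adam first/last loss:", adam)
```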

[post cid="97"/]