
facenet-pytorch Source Code Walkthrough

1. Repository Layout

facenet-pytorch
—data
—dependencies    dependencies
—examples    example usage code
—models    pretrained models
—tests    test code, performance benchmarks, etc.
—mydemo    my own code

2. Issues Encountered While Running

  1. datasets.ImageFolder() must be given an absolute path here; a relative path raises an error! See the sketch below.
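
    A minimal sketch of the call (the absolute path is a placeholder, not from the original notes):

    from torchvision import datasets

    # an absolute path works:
    dataset = datasets.ImageFolder('/home/user/facenet-pytorch/data/test_images')
    # whereas a relative path such as 'data/test_images' triggered the error above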

  2. MTCNN() handles face detection and alignment. A breakdown of its constructor parameters:

    image_size: the size (in px) of the output (aligned) face image.

    margin: the margin (in px) to add around the bounding box. Default: 0.

    min_face_size: the minimum detectable face size (in px). Default: 20; faces smaller than this are ignored.

    thresholds: the face-detection thresholds for the three cascade stages. Default: [0.6, 0.7, 0.7]. Higher thresholds lower both the pass rate and the false-accept rate; lower thresholds raise both. So the thresholds are neither "the higher the better" nor "the lower the better".

    factor: the factor used to build the scaling pyramid of face sizes. Default: 0.709 (see the pyramid sketch after the usage example below).

    post_process: whether to post-process (normalize) the image tensor before returning it.

    select_largest: if True and multiple faces are detected, the largest face is returned. If False, the face with the highest detection probability is returned.

    keep_all: if True, all detected faces are returned, in the order determined by the select_largest parameter. If a save_path is specified, the first face is saved to that path and the remaining faces are saved to <save_path>1, <save_path>2, etc.

    device: the device on which to run the neural network.

    How to use MTCNN:
    >>> from facenet_pytorch import MTCNN
    >>> mtcnn = MTCNN() # instantiate the object
    >>> face_tensor, prob = mtcnn(img, save_path='face.png', return_prob=True) # calling the object invokes its forward method
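
    For the factor parameter, here is a sketch of how an MTCNN-style image pyramid could derive its scales from min_face_size and factor (an illustration of the idea, not the library's exact code; min_side is an assumed input size):

    min_face_size, factor = 20, 0.709
    min_side = 160                  # assumed shorter side of the input image
    m = 12.0 / min_face_size        # P-Net in MTCNN works on 12x12 patches
    scales, scale = [], m
    while min_side * scale >= 12:   # stop once the image would shrink below 12 px
        scales.append(scale)
        scale *= factor
    print(scales)                   # each scale is `factor` times the previous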
  3. RuntimeError: Calculated padded input size per channel: (2 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

    Stepping into the source that raised the error:

    it occurs in the conv2d_forward method of the Conv2d class.

    Looking at that class's docstring:

    Applies a 2D convolution over an input signal composed of several input
    planes.

    In the simplest case, the output value of the layer with input size
    :math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
    can be precisely described as:

    .. math::
        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
        \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)


where :math:`\star` is the valid 2D `cross-correlation`_ operator,
:math:`N` is a batch size, :math:`C` denotes a number of channels,
:math:`H` is a height of input planes in pixels, and :math:`W` is
width in pixels.

* :attr:`stride` controls the stride for the cross-correlation, a single
number or a tuple.

* :attr:`padding` controls the amount of implicit zero-paddings on both
sides for :attr:`padding` number of points for each dimension.

* :attr:`dilation` controls the spacing between the kernel points; also
known as the à trous algorithm. It is harder to describe, but this `link`_
has a nice visualization of what :attr:`dilation` does.

* :attr:`groups` controls the connections between inputs and outputs.
:attr:`in_channels` and :attr:`out_channels` must both be divisible by
:attr:`groups`. For example,

* At groups=1, all inputs are convolved to all outputs.
* At groups=2, the operation becomes equivalent to having two conv
layers side by side, each seeing half the input channels,
and producing half the output channels, and both subsequently
concatenated.
* At groups= :attr:`in_channels`, each input channel is convolved with
its own set of filters, of size:
:math:`\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor`.

The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

- a single ``int`` -- in which case the same value is used for the height and width dimension
- a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
and the second `int` for the width dimension

.. note::

Depending of the size of your kernel, several (of the last)
columns of the input might be lost, because it is a valid `cross-correlation`_,
and not a full `cross-correlation`_.
It is up to the user to add proper padding.

.. note::

When `groups == in_channels` and `out_channels == K * in_channels`,
where `K` is a positive integer, this operation is also termed in
literature as depthwise convolution.

In other words, for an input of size :math:`(N, C_{in}, H_{in}, W_{in})`,
a depthwise convolution with a depthwise multiplier `K`, can be constructed by arguments
:math:`(in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})`.

.. include:: cudnn_deterministic.rst

Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of the input. Default: 0
padding_mode (string, optional). Accepted values `zeros` and `circular` Default: `zeros`
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``True``

Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` where

.. math::
H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0]
\times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

.. math::
W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1]
\times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

Attributes:
weight (Tensor): the learnable weights of the module of shape
:math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
:math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{1}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
bias (Tensor): the learnable bias of the module of shape (out_channels). If :attr:`bias` is ``True``,
then the values of these weights are
sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{1}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

Examples::

>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)

.. _cross-correlation:
https://en.wikipedia.org/wiki/Cross-correlation

.. _link:
https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
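
With no padding, a 3x3 kernel cannot slide over a 2x2 input, which is exactly the situation in the error above. A minimal repro (my own sketch, not from the original notes):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)  # 3x3 kernel, padding=0
x = torch.randn(1, 3, 2, 2)            # one 3-channel 2x2 image
out = conv(x)  # RuntimeError: Calculated padded input size per channel: (2 x 2). ...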

I then took the call

mtcnn = MTCNN(
image_size=64, margin=0, min_face_size=20,
thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
device=device
)

and changed its image_size=64 to image_size=96, after which everything ran smoothly with no problems. After repeated testing, image_size values as small as 75 still work; anything below that raises the error. This suggests the kernel's unit size is 25. The sketch below shows the shrinking mechanism.
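
The shrinkage can be checked against the H_out formula quoted above: chaining valid (padding=0) convolutions keeps shrinking the feature map until it is smaller than the kernel. The layer list here is an arbitrary illustration, not the actual InceptionResnetV1 stack:

import math

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # the H_out / W_out formula from the nn.Conv2d docstring
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

size = 64
for kernel, stride in [(3, 2), (3, 1), (3, 2), (3, 2), (3, 1), (3, 2)]:
    size = conv_out(size, kernel, stride)
    print(size)  # prints 31, 29, 14, 6, 4, 1 -- one more 3x3 conv would now fail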

  4. A detailed look at the InceptionResnetV1 class
"""Inception Resnet V1 model with optional loading of pretrained weights.

Model parameters can be loaded based on pretraining on the VGGFace2 or CASIA-Webface
datasets. Pretrained state_dicts are automatically downloaded on model instantiation if
requested and cached in the torch cache. Subsequent instantiations use the cache rather than
redownloading.

Keyword Arguments:
pretrained {str} -- Optional pretraining dataset. Either 'vggface2' or 'casia-webface'.
(default: {None})
classify {bool} -- Whether the model should output classification probabilities or feature
embeddings. (default: {False})
num_classes {int} -- Number of output classes. If 'pretrained' is set and num_classes not
equal to that used for the pretrained model, the final linear layer will be randomly
initialized. (default: {None})
dropout_prob {float} -- Dropout probability. (default: {0.6})
"""

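A short usage sketch based on this docstring (combining it with MTCNN is my own illustration; img is assumed to be a PIL image):

from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160)
resnet = InceptionResnetV1(pretrained='vggface2').eval()  # weights download on first use, then come from the cache

face = mtcnn(img)                      # aligned face tensor
embedding = resnet(face.unsqueeze(0))  # 512-dimensional feature embedding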

Confusion Matrix

datasets.ImageFolder()
"""A generic data loader where the images are arranged in this way: ::

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

Args:
root (string): Root directory path.
transform (callable, optional): A function/transform that takes in an PIL image
and returns a transformed version. E.g, ``transforms.RandomCrop``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
loader (callable, optional): A function to load an image given its path.
is_valid_file (callable, optional): A function that takes path of an Image file
and check if the file is a valid_file (used to check of corrupt files)

Attributes:
classes (list): List of the class names.
class_to_idx (dict): Dict with items (class_name, class_index).
imgs (list): List of (image path, class_index) tuples
"""