Machine Learning Notes - PyTorch

PyTorch Introduction

PyTorch is an open source machine learning framework. You can find more information about PyTorch by following one of the official tutorials or by reading the documentation.

Import PyTorch

torch.cuda.is_available()

In [141]:
1.8.1
Is cuda available? False
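
The code cell is not included in this export; a minimal sketch that would print the version and CUDA check above (values depend on your installation):

import torch

print(torch.__version__)
print('Is cuda available?', torch.cuda.is_available())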

PyTorch Tensor

Tensor Initialization

torch.tensor(), torch.from_numpy(), torch.zeros_like(), torch.ones_like(), torch.rand(), torch.ones(), torch.zeros(), torch.eye(), torch.full()

In [142]:
Direct Tensor: 
 tensor([[1, 2],
        [3, 4]]) 

Numpy Tensor: 
 tensor([[1, 2],
        [3, 4]]) 

Zeros Tensor: 
 tensor([[0, 0],
        [0, 0]]) 

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.3355, 0.0668],
        [0.2169, 0.7994]]) 

Random Tensor: 
 tensor([[0.0993, 0.1139, 0.0951],
        [0.6038, 0.0597, 0.6543]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

Full twos Tensor: 
 tensor([[2, 2, 2],
        [2, 2, 2]])
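
A sketch using the constructors listed above; this is not the original cell, but calls like these produce outputs of the shapes shown (random values will differ):

import numpy as np
import torch

data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)                          # directly from a Python list
x_np = torch.from_numpy(np.array(data))              # from a NumPy array
x_zeros = torch.zeros_like(x_data)                   # zeros with x_data's shape and dtype
x_ones = torch.ones_like(x_data)                     # ones with x_data's shape and dtype
x_rand = torch.rand_like(x_data, dtype=torch.float)  # uniform random values in [0, 1)

shape = (2, 3)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
full_tensor = torch.full(shape, 2)                   # every element equals 2
eye_tensor = torch.eye(3)                            # 3x3 identity matrix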

Tensor Attributes

tensor.dim(), tensor.shape, tensor.dtype, tensor.device

In [143]:
Dimension of tensor: 2

Shape of tensor: torch.Size([3, 4])

Datatype of tensor: torch.float32

Device tensor is stored on: cpu
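
A sketch of the attribute calls above, applied to a hypothetical 3x4 float tensor:

tensor = torch.rand(3, 4)
print('Dimension of tensor:', tensor.dim())           # number of axes
print('Shape of tensor:', tensor.shape)               # torch.Size([3, 4])
print('Datatype of tensor:', tensor.dtype)            # torch.float32
print('Device tensor is stored on:', tensor.device)   # cpu (or cuda:0)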

Tensor Indexing

tensor[start:stop:step]

In [144]:
Original tensor:
tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
shape:  torch.Size([3, 4])

Single row:
torch.Size([4]) tensor([5, 6, 7, 8])
shape:  torch.Size([4]) torch.Size([4])

Single column:
tensor([ 2,  6, 10])
shape:  torch.Size([3])

First two rows, last three columns:
tensor([[2, 3, 4],
        [6, 7, 8]])
shape:  torch.Size([2, 3])

Every other row, middle columns:
tensor([[ 2,  3],
        [10, 11]])
shape:  torch.Size([2, 2])
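
Slicing calls like these (a sketch, not the original cell) reproduce the selections above:

a = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
row = a[1]             # single row -> tensor([5, 6, 7, 8])
col = a[:, 1]          # single column -> tensor([ 2,  6, 10])
block = a[:2, 1:]      # first two rows, last three columns
strided = a[::2, 1:3]  # every other row, middle columns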

More generally, given index arrays idx0 and idx1 with N elements each, a[idx0, idx1] is equivalent to:

torch.tensor([
  a[idx0[0], idx1[0]],
  a[idx0[1], idx1[1]],
  ...,
  a[idx0[N - 1], idx1[N - 1]]
])

(A similar pattern extends to tensors with more than two dimensions.)

In [145]:
Original tensor:
tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
shape:  torch.Size([3, 4])

Reordered columns:
tensor([[ 4,  3,  2,  1],
        [ 8,  7,  6,  5],
        [12, 11, 10,  9]])

Reordered rows/columns:
tensor([4, 7, 2])
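
A sketch of integer-array indexing that matches the reordering shown above:

a = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
idx = torch.tensor([3, 2, 1, 0])
print(a[:, idx])                 # reorder the columns of every row

idx0 = torch.tensor([0, 1, 0])   # row index for each selected element
idx1 = torch.tensor([3, 2, 1])   # column index for each selected element
print(a[idx0, idx1])             # tensor([4, 7, 2])
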
In [146]:
Original tensor:
tensor([[1, 2],
        [3, 4],
        [5, 6]])

Mask tensor:
tensor([[False, False],
        [False,  True],
        [ True,  True]])

Selecting elements with the mask:
tensor([4, 5, 6])

After modifying with a mask:
tensor([[0, 0],
        [0, 4],
        [5, 6]])
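
A sketch of boolean-mask indexing that reproduces the selection and in-place modification above:

a = torch.tensor([[1, 2], [3, 4], [5, 6]])
mask = a > 3        # boolean tensor with the same shape as a
print(a[mask])      # tensor([4, 5, 6]), elements where the mask is True
a[a <= 3] = 0       # assign through a mask to modify the selected elements
print(a)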

Tensor Operations

torch.cat(), torch.sum(), torch.mean(), torch.max(), torch.min(), torch.dot(), torch.mm(), torch.mv(), torch.addmm(), torch.addmv(), torch.bmm(), torch.baddbmm(), torch.matmul(), torch.broadcast_tensors()

Tensor Stacks

In [147]:
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 2., 1.],
        [1., 0., 1., 3.]])

horizontal cat:
 tensor([[1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 2., 1., 1., 0., 2., 1.],
        [1., 0., 1., 3., 1., 0., 1., 3.]])

vertical cat:
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 2., 1.],
        [1., 0., 1., 3.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 2., 1.],
        [1., 0., 1., 3.]])
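
The concatenations above can be reproduced with torch.cat (a sketch):

t = torch.tensor([[1., 0., 1., 1.],
                  [1., 0., 1., 1.],
                  [1., 0., 2., 1.],
                  [1., 0., 1., 3.]])
h_cat = torch.cat([t, t], dim=1)   # horizontal: join along columns -> shape (4, 8)
v_cat = torch.cat([t, t], dim=0)   # vertical: join along rows -> shape (8, 4)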

Elementwise Operations

In [148]:
Elementwise sum:
tensor([[ 6.,  8., 10., 12.]])
tensor([[ 6.,  8., 10., 12.]])
tensor([[ 6.,  8., 10., 12.]])

Elementwise difference:
tensor([[-4., -4., -4., -4.]])
tensor([[-4., -4., -4., -4.]])
tensor([[-4., -4., -4., -4.]])

Elementwise product:
tensor([[ 5., 12., 21., 32.]])
tensor([[ 5., 12., 21., 32.]])
tensor([[ 5., 12., 21., 32.]])

Elementwise division
tensor([[0.2000, 0.3333, 0.4286, 0.5000]])
tensor([[0.2000, 0.3333, 0.4286, 0.5000]])
tensor([[0.2000, 0.3333, 0.4286, 0.5000]])

Elementwise power
tensor([[1.0000e+00, 6.4000e+01, 2.1870e+03, 6.5536e+04]])
tensor([[1.0000e+00, 6.4000e+01, 2.1870e+03, 6.5536e+04]])
tensor([[1.0000e+00, 6.4000e+01, 2.1870e+03, 6.5536e+04]])
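
Each elementwise result above can be computed with an operator, a torch function, or a tensor method; a sketch with two hypothetical row vectors:

x = torch.tensor([[1., 2., 3., 4.]])
y = torch.tensor([[5., 6., 7., 8.]])
print(x + y,  torch.add(x, y), x.add(y))   # sum
print(x - y,  torch.sub(x, y), x.sub(y))   # difference
print(x * y,  torch.mul(x, y), x.mul(y))   # product
print(x / y,  torch.div(x, y), x.div(y))   # division
print(x ** y, torch.pow(x, y), x.pow(y))   # power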

Reduction Operations

Reduction operations reduce the rank of tensors: the dimension over which you perform the reduction will be removed from the shape of the output. If you pass keepdim=True to a reduction operation, the specified dimension will not be removed; the output tensor will instead have a shape of 1 in that dimension.

In [149]:
Original tensor:
tensor([[1., 2., 3.],
        [4., 5., 6.]])

Sum over entire tensor:
tensor(21.)

Sum over dim 0 (one sum per column):
torch.Size([3])

Sum over dim 1 (one sum per row):
tensor([ 6., 15.])
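
A sketch of the reductions above, including keepdim:

x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
print(torch.sum(x))                              # tensor(21.), sum over every element
print(torch.sum(x, dim=0).shape)                 # torch.Size([3]), one sum per column
print(torch.sum(x, dim=1))                       # tensor([ 6., 15.]), one sum per row
print(torch.sum(x, dim=1, keepdim=True).shape)   # torch.Size([2, 1]), dim 1 kept with size 1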

Matrix Operations

  • @ is used for matrix multiplication, similar to torch.mm(), torch.mv(), and torch.bmm()
  • torch.matmul()
    • If both tensors are 1-dimensional, the dot product (scalar) is returned.
    • If both arguments are 2-dimensional, the matrix-matrix product is returned.
    • If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.
    • If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.
    • Supports broadcasting of batch dimensions.
In [150]:
 x0 shape: torch.Size([])

 x1 shape: torch.Size([2, 3])

 x2 shape: torch.Size([2])

 x3 shape: torch.Size([2, 3])

 x4 shape: torch.Size([10, 3, 5])

 vector x vector torch.Size([])
 matrix x vector torch.Size([3])
 batched matrix x broadcasted vector torch.Size([10, 3])
 batched matrix x batched matrix torch.Size([10, 3, 5])
 batched matrix x broadcasted matrix torch.Size([10, 3, 5])
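
The shapes above follow the torch.matmul rules listed before the output; a sketch with random tensors:

v = torch.randn(3)
m = torch.randn(3, 4)
batch = torch.randn(10, 3, 4)

print(torch.matmul(v, torch.randn(3)).shape)             # vector x vector -> torch.Size([])
print(torch.matmul(m, torch.randn(4)).shape)             # matrix x vector -> torch.Size([3])
print(torch.matmul(batch, torch.randn(4)).shape)         # batched matrix x broadcasted vector -> torch.Size([10, 3])
print(torch.matmul(batch, torch.randn(10, 4, 5)).shape)  # batched matrix x batched matrix -> torch.Size([10, 3, 5])
print(torch.matmul(batch, torch.randn(4, 5)).shape)      # batched matrix x broadcasted matrix -> torch.Size([10, 3, 5])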

Tensor Broadcasting

  • Each tensor has at least one dimension.
  • When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them must be 1, or one of them must not exist.
In [151]:
Here is x (before broadcasting):
tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])
x.shape:  torch.Size([4, 3])

Here is v (before broadcasting):
tensor([1, 0, 1])
v.shape:  torch.Size([3])

Here is xx (after broadcasting):
tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])
xx.shape:  torch.Size([4, 3])

Here is vv (after broadcasting):
tensor([[1, 0, 1],
        [1, 0, 1],
        [1, 0, 1],
        [1, 0, 1]])
vv.shape:  torch.Size([4, 3])
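
A sketch that makes the broadcast explicit with torch.broadcast_tensors; in practice, elementwise operations broadcast automatically:

x = torch.tensor([[ 1,  2,  3],
                  [ 4,  5,  6],
                  [ 7,  8,  9],
                  [10, 11, 12]])
v = torch.tensor([1, 0, 1])
xx, vv = torch.broadcast_tensors(x, v)   # both views have shape torch.Size([4, 3])
print(x + v)                             # v is broadcast across the rows of x implicitly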

Tensor Data Type

tensor.to(), tensor.new_zeros(), tensor.float(), tensor.double()

In [152]:
dtype when torch chooses for us:
List of integers: torch.int64
List of floats: torch.float32
Mixed list: torch.float32

dtype when we force a datatype:
32-bit float:  torch.float32
32-bit integer:  torch.int32
64-bit integer:  torch.int64

torch.ones with different dtypes
default dtype: torch.float32
16-bit integer: torch.int16
8-bit unsigned integer: torch.uint8

x0: torch.int64
x1: torch.float32
x2: torch.float64
x3: torch.float32
x4: torch.float64

x0 shape is torch.Size([3, 3]), dtype is torch.float64
x1 shape is torch.Size([3, 3]), dtype is torch.float64
x2 shape is torch.Size([4, 5]), dtype is torch.float64
x3 shape is torch.Size([6, 7]), dtype is torch.float64
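
A sketch of the dtype behaviors above: inference from Python lists, explicit dtype arguments, casting methods, and constructors that inherit a dtype from an existing tensor:

print(torch.tensor([1, 2]).dtype)                       # torch.int64 (inferred)
print(torch.tensor([1., 2.]).dtype)                     # torch.float32 (inferred)
print(torch.tensor([1, 2], dtype=torch.float32).dtype)  # forced dtype
print(torch.ones(2, 2, dtype=torch.uint8).dtype)        # constructor with explicit dtype

x0 = torch.tensor([1, 2])        # torch.int64
x1 = x0.float()                  # torch.float32
x2 = x0.double()                 # torch.float64
x3 = x0.to(torch.float32)        # torch.float32
x4 = x0.to(torch.float64)        # torch.float64

y0 = torch.eye(3, dtype=torch.float64)
y1 = torch.zeros_like(y0)        # same shape and dtype as y0
y2 = y0.new_zeros(4, 5)          # new shape, same dtype as y0
y3 = torch.ones(6, 7).to(y0)     # cast to y0's dtype (and device)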

Tensor Reshape

tensor.view(), tensor.reshape(), tensor.transpose(), tensor.permute(), tensor.contiguous()

The view() function takes elements in row-major order, so you cannot transpose matrices with view() .

In [153]:
Original tensor:
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])
shape: torch.Size([2, 4])

Flattened tensor:
tensor([1, 2, 3, 4, 5, 6, 7, 8])
shape: torch.Size([8])

Rank 2 tensor:
shape: torch.Size([8, 1])

Rank 3 tensor:
shape: torch.Size([2, 2, 2])
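
A sketch of the reshapes above; view() requires a contiguous tensor, while reshape() copies when necessary:

x = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8]])
flat = x.view(8)          # or x.view(-1); elements are taken in row-major order
rank2 = x.view(8, 1)      # shape (8, 1)
rank3 = x.view(2, 2, 2)   # shape (2, 2, 2)
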
In [154]:
Original tensor:
tensor([[[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12]],

        [[13, 14, 15, 16],
         [17, 18, 19, 20],
         [21, 22, 23, 24]]])
shape: torch.Size([2, 3, 4])

Swap axes 1 and 2:
tensor([[[ 1,  5,  9],
         [ 2,  6, 10],
         [ 3,  7, 11],
         [ 4,  8, 12]],

        [[13, 17, 21],
         [14, 18, 22],
         [15, 19, 23],
         [16, 20, 24]]])
torch.Size([2, 4, 3])

Permute axes
tensor([[[ 1, 13],
         [ 2, 14],
         [ 3, 15],
         [ 4, 16]],

        [[ 5, 17],
         [ 6, 18],
         [ 7, 19],
         [ 8, 20]],

        [[ 9, 21],
         [10, 22],
         [11, 23],
         [12, 24]]])
shape: torch.Size([3, 4, 2])
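
The axis swaps above come from transpose() and permute() (a sketch); note that a transposed tensor usually needs contiguous() before view():

x = torch.arange(1, 25).view(2, 3, 4)
swapped = x.transpose(1, 2)            # swap axes 1 and 2 -> shape (2, 4, 3)
permuted = x.permute(1, 2, 0)          # reorder all axes -> shape (3, 4, 2)
flat = swapped.contiguous().view(-1)   # flatten after making the memory contiguous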

Tensor Test

torch.all(), torch.isclose(), torch.allclose()
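
A small sketch of these comparison helpers:

a = torch.tensor([1.0, 2.0, 3.0])
b = a + 1e-7
print(torch.all(a == b))      # exact elementwise equality -> tensor(False)
print(torch.isclose(a, b))    # elementwise comparison within a tolerance
print(torch.allclose(a, b))   # True if every element is close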

Tensor Range

torch.arange(), torch.linspace()

In [ ]:
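
The cell is empty in this export; a minimal sketch of the two range constructors:

print(torch.arange(0, 10, 2))         # step-based: tensor([0, 2, 4, 6, 8])
print(torch.linspace(0, 1, steps=5))  # count-based: tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])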

Tensor GPU

In [155]:
x0 device: cpu
Cuda is not available.
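
A sketch of moving a tensor to the GPU when one is available:

x0 = torch.ones(2, 3)
print('x0 device:', x0.device)
if torch.cuda.is_available():
    x1 = x0.to('cuda')                # or x0.cuda()
    print('x1 device:', x1.device)    # cuda:0
else:
    print('Cuda is not available.')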

PyTorch Autograd

tensor.requires_grad_(), tensor.detach()

Every tensor has a flag, requires_grad, that allows fine-grained exclusion of subgraphs from gradient computation and can increase efficiency.

In [156]:
a requires_grad: False
b requires_grad: True
z grad: 
 tensor([[2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.]])
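
The original cell is not shown; a sketch of one computation that yields the printed gradient of 2 in every position:

a = torch.ones(5, 5)                       # requires_grad defaults to False
b = torch.ones(5, 5, requires_grad=True)
z = torch.ones(5, 5, requires_grad=True)
out = (2 * z).sum()                        # d(out)/dz = 2 for every element
out.backward()                             # populate z.grad
print('a requires_grad:', a.requires_grad)
print('b requires_grad:', b.requires_grad)
print('z grad: \n', z.grad)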

PyTorch DataLoader

PyTorch Transforms

PyTorch Loss Functions

torch.nn.MSELoss(), torch.nn.CrossEntropyLoss(), torch.nn.MultiLabelMarginLoss()

MSELoss()

  • Input: (N, )
  • Target: (N, )
  • Output: Scalar. If reduction is 'none', then (N,)
In [157]:
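
The cell is empty in this export; a minimal MSELoss sketch with hypothetical inputs:

loss_fn = torch.nn.MSELoss()
pred = torch.randn(4, requires_grad=True)
target = torch.randn(4)
loss = loss_fn(pred, target)   # mean of squared differences, a scalar
loss.backward()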

CrossEntropyLoss()

  • Input: (N, C), N is batch size, C is number of classes
  • Target: (N,), $0 \leq \text{target}[i] \leq C-1$
  • Output: (N,)
In [158]:
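
The cell is empty in this export; a minimal CrossEntropyLoss sketch with hypothetical logits:

loss_fn = torch.nn.CrossEntropyLoss()
logits = torch.randn(3, 5, requires_grad=True)   # N=3 samples, C=5 classes (raw scores)
target = torch.tensor([1, 0, 4])                 # one class index per sample, in [0, C-1]
loss = loss_fn(logits, target)
loss.backward()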

MultiLabelMarginLoss()

Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss). This means that a sample x can have multiple correct labels.

  • Input: (C,) or (N, C), N is batch size, C is number of classes
  • Target: (C) or (N, C), label targets after first -1 are ignored
  • Output: Scalar. If reduction is 'none', then (N,)
In [159]:
single class loss: 0.3250
multi-class loss: 0.8500
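
The original cell is not shown, but inputs like these reproduce the printed values (a sketch):

loss_fn = torch.nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y_single = torch.tensor([[3, -1, -1, -1]])   # only class 3 is correct; labels after the first -1 are ignored
y_multi = torch.tensor([[3, 0, -1, 1]])      # classes 3 and 0 are correct
print('single class loss: %.4f' % loss_fn(x, y_single).item())   # 0.3250
print('multi-class loss: %.4f' % loss_fn(x, y_multi).item())     # 0.8500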

Torch Activation Functions

Torch Optimizer

torch.optim.SGD(), torch.optim.RMSprop(), torch.optim.Adam()

torch.optim is a package implementing various optimization algorithms. Most commonly used methods are already supported, and the interface is general enough that more sophisticated ones can also be easily integrated in the future. To use torch.optim you have to construct an optimizer object that will hold the current state and update the parameters based on the computed gradients.

Note: If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects from those before the call.

Construct Optimizers

In [ ]:
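
The cell is empty in this export; a minimal construction sketch with a hypothetical model:

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# or: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)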

Set Parameter Options

This means that model.base's parameters will use the default learning rate of 1e-2, model.classifier's parameters will use a learning rate of 1e-3, and a momentum of 0.9 will be used for all parameters.

In [35]:
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.001
    momentum: 0.8
    nesterov: False
    weight_decay: 0

Parameter Group 1
    dampening: 0
    lr: 0.01
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
group_id: 0, lr: 0.001, momentum: 0.8
group_id: 1, lr: 0.01, momentum: 0.9
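
A sketch that produces two parameter groups like the ones printed above, assuming hypothetical base and classifier submodules:

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.base = torch.nn.Linear(8, 4)
        self.classifier = torch.nn.Linear(4, 2)

model = Net()
optimizer = torch.optim.SGD([
    {'params': model.base.parameters(), 'momentum': 0.8},                   # uses the default lr below
    {'params': model.classifier.parameters(), 'lr': 1e-2, 'momentum': 0.9},
], lr=1e-3)
print(optimizer)
for i, group in enumerate(optimizer.param_groups):
    print('group_id: %d, lr: %s, momentum: %s' % (i, group['lr'], group['momentum']))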

Model Parameter

In [31]:
Named parameters...
layer1.0.weight
torch.Size([16, 1, 5, 5]) 

layer1.0.bias
torch.Size([16]) 

layer1.1.weight
torch.Size([16]) 

layer1.1.bias
torch.Size([16]) 

layer2.0.weight
torch.Size([32, 16, 5, 5]) 

layer2.0.bias
torch.Size([32]) 

layer2.1.weight
torch.Size([32]) 

layer2.1.bias
torch.Size([32]) 

fc.weight
torch.Size([10, 1568]) 

fc.bias
torch.Size([10]) 
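
The shapes above suggest a small CNN with two conv/batch-norm blocks and a fully connected layer; a hypothetical reconstruction, just to show named_parameters():

class ConvNet(torch.nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, kernel_size=5, padding=2),
            torch.nn.BatchNorm2d(16),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2))
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(16, 32, kernel_size=5, padding=2),
            torch.nn.BatchNorm2d(32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2))
        self.fc = torch.nn.Linear(7 * 7 * 32, num_classes)   # 1568 inputs for 28x28 images

model = ConvNet()
print('Named parameters...')
for name, param in model.named_parameters():
    print(name)
    print(param.shape, '\n')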

Taking an Optimization Step

In [ ]:
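
The cell is empty in this export; a minimal training-step sketch, assuming model, loss_fn, optimizer, and a data loader already exist:

for inputs, targets in dataloader:
    optimizer.zero_grad()             # clear gradients from the previous step
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    loss.backward()                   # compute gradients
    optimizer.step()                  # update parameters
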
This blog is converted from machine-learning-pytorch.ipynb
Written on May 4, 2021