Vectors, Matrices, and Tensors

Scalars, vectors, matrices, and tensors are the fundamental data structures of deep learning. In this section, we will briefly review these concepts.

Scalar

rank-0 tensor
\(x \in \mathbb{R}\)
x = 1.23
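
In PyTorch, a scalar corresponds to a rank-0 tensor. A brief illustrative snippet (not part of the original example):

import torch

x = torch.tensor(1.23)  # rank-0 tensor (scalar)

print(x.ndim)   # 0
print(x.shape)  # torch.Size([])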

Vector

rank-1 tensor
\( \mathbf{x} \in \mathbb{R}^{n \times 1} \)

\[\begin{split} \mathbf{x}=\left[\begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array}\right] \end{split}\]
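A vector can likewise be represented as a rank-1 tensor (illustrative snippet):

import torch

x = torch.tensor([1., 2., 3.])  # rank-1 tensor (vector)

print(x.ndim)   # 1
print(x.shape)  # torch.Size([3])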

Matrix

rank-2 tensor
\( \mathbf{X} \in \mathbb{R}^{m \times n} \)

\[\begin{split} \mathbf{X}=\left[\begin{array}{cccc} x_{1,1} & x_{1,2} & \ldots & x_{1, n} \\ x_{2,1} & x_{2,2} & \ldots & x_{2, n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m, 1} & x_{m, 2} & \ldots & x_{m, n} \end{array}\right] \end{split}\]
Tensors

import torch

t = torch.tensor([[1, 2, 3, 4], [6, 7, 8, 9]])

print(t)
print(t.shape)
print(t.ndim)
print(t.dtype)
tensor([[1, 2, 3, 4],
        [6, 7, 8, 9]])
torch.Size([2, 4])
2
torch.int64
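
Higher-rank tensors follow the same pattern; for example, a rank-3 tensor can be viewed as a stack of matrices. A small illustrative snippet (not part of the original example above):

import torch

t3 = torch.zeros(2, 3, 4)  # e.g., a stack of two 3x4 matrices

print(t3.ndim)   # 3
print(t3.shape)  # torch.Size([2, 3, 4])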

Moving Data onto the GPU

print(torch.cuda.is_available())
print(torch.backends.mps.is_available())

if torch.cuda.is_available():
    t = t.to(torch.device('cuda:0'))
    print(t)
False
False
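
The outputs above show that neither a CUDA nor an MPS device was available in this particular environment. On Apple Silicon hardware, the analogous transfer would use the "mps" device; a hedged sketch, assuming the MPS backend is actually available:

import torch

t = torch.tensor([[1, 2, 3, 4], [6, 7, 8, 9]])

if torch.backends.mps.is_available():
    t = t.to(torch.device('mps'))
    print(t.device)  # e.g., mps:0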

Broadcasting

Making Vector and Matrix computations more convenient

Computing the Output From Multiple Training Examples at Once

  • The perceptron algorithm is typically considered an “online” algorithm (i.e., it updates the weights after each training example)

  • However, during prediction (e.g., test set evaluation), we could pass all data points at once, so that we can get rid of the for loop (see the sketch after this list)

  • Two opportunities for parallelism:

    1. computing the dot product in parallel

    2. computing multiple dot products at once
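
To make the contrast concrete, here is a small sketch of the per-example loop that the batched matrix multiplication below replaces (illustrative code and variable names, not from the original):

import torch

X = torch.arange(6, dtype=torch.float).view(2, 3)  # 2 examples, 3 features each
w = torch.tensor([1., 2., 3.])

# "online"-style: one dot product per training example
outputs = torch.stack([torch.dot(x_i, w) for x_i in X])

print(outputs)  # tensor([ 8., 26.])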

import torch

X = torch.arange(6).view(2, 3)

print(X)

w = torch.tensor([1, 2, 3])

print(w)

print(X.matmul(w))

w = w.view(-1, 1)

print(X.matmul(w))
tensor([[0, 1, 2],
        [3, 4, 5]])
tensor([1, 2, 3])
tensor([ 8, 26])
tensor([[ 8],
        [26]])

This more general feature, where PyTorch implicitly expands tensors to compatible shapes during elementwise operations, is called "broadcasting":

print(torch.tensor([1, 2, 3]) + 1)

t = torch.tensor([[4, 5, 6], [7, 8, 9]])

print(t)

print(t + torch.tensor([1, 2, 3]))
tensor([2, 3, 4])
tensor([[4, 5, 6],
        [7, 8, 9]])
tensor([[ 5,  7,  9],
        [ 8, 10, 12]])
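
Broadcasting also works across multiple dimensions; for instance, adding a column vector to a row vector implicitly expands both to a matrix (illustrative example):

import torch

col = torch.tensor([[0], [10], [20]])  # shape (3, 1)
row = torch.tensor([1, 2, 3])          # shape (3,)

print(col + row)
# tensor([[ 1,  2,  3],
#         [11, 12, 13],
#         [21, 22, 23]])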

Notational Linear Algebra

X = torch.arange(50, dtype=torch.float).view(10, 5)

print(X)

fc = torch.nn.Linear(in_features=5, out_features=3)

print(fc.weight)

print(fc.bias)

print(f"X dim: {X.size()}")
print(f"Weights dim: {fc.weight.size()}")
print(f"bias dim: {fc.bias.size()}")

A = fc(X)

print(f"A dim: {A.size()}")
tensor([[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19.],
        [20., 21., 22., 23., 24.],
        [25., 26., 27., 28., 29.],
        [30., 31., 32., 33., 34.],
        [35., 36., 37., 38., 39.],
        [40., 41., 42., 43., 44.],
        [45., 46., 47., 48., 49.]])
Parameter containing:
tensor([[ 0.0283, -0.2716, -0.1834, -0.4373,  0.2282],
        [ 0.2731, -0.2637, -0.3609, -0.0048,  0.0700],
        [ 0.2773, -0.2305, -0.1061, -0.4259, -0.3791]], requires_grad=True)
Parameter containing:
tensor([-0.3188, -0.0365,  0.1801], requires_grad=True)
X dim: torch.Size([10, 5])
Weights dim: torch.Size([3, 5])
bias dim: torch.Size([3])
A dim: torch.Size([10, 3])
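
For reference, the fully connected (linear) layer above computes \( \mathbf{A} = \mathbf{X}\mathbf{W}^\top + \mathbf{b} \), which explains the resulting 10x3 shape. A quick check, continuing from the code above, confirms this:

A_manual = X.matmul(fc.weight.t()) + fc.bias

print(torch.allclose(A, A_manual))  # True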