Einops, Einsum & Tensor Manipulation
Learning objectives

- Perform basic tensor rearrangement with einops, and learn how to use einsum to carry out standard linear-algebra operations on tensors.

Reading material

- Read about the advantages of the einops library here.
- If you haven't already, work through the Einops basics tutorial (up to the "fancy examples" section).
- Read einsum is all you need (or watch his video) for a brief overview of the einsum function and how it works (you don't need to read section 2.10).
Setup

```python
import os
import sys
import math
import numpy as np
import einops
import torch as t
from pathlib import Path

# Make sure exercises are in the path
chapter = r"chapter0_fundamentals"
exercises_dir = Path(f"{os.getcwd().split(chapter)[0]}/{chapter}/exercises").resolve()
section_dir = exercises_dir / "part0_prereqs"
if str(exercises_dir) not in sys.path: sys.path.append(str(exercises_dir))

from plotly_utils import imshow, line, bar
from part0_prereqs.utils import display_array_as_img
import part0_prereqs.tests as tests

MAIN = __name__ == "__main__"
```
Einops
```python
arr = np.load(section_dir / "numbers.npy")
```
`arr` is a 4D numpy array. The first axis indexes the digits; the next three axes are the channels (i.e. RGB), height and width respectively. We've provided the function `utils.display_array_as_img`, which takes a numpy array and displays it as an image. It can be used in two ways:

- If given a 3D array, the dimensions are interpreted as `(channels, height, width)` -- in other words, as an RGB image.
- If given a 2D array, the dimensions are interpreted as `(height, width)` -- i.e. a monochrome image.

For example:
```python
display_array_as_img(arr[0])
```
Below is a series of images, each produced by applying einops functions to `arr`. You should try to reproduce each image yourself. This page also includes solutions, but you should only look at them after trying for at least five minutes. This material isn't essential, so feel free to skip ahead if you don't think you need it.
Einops exercises - images
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪
You shouldn't spend more than ~45 minutes on these exercises in total.
If you feel you've already got the general idea, you can skip to the next section.
Exercise 1
```python
# Your code here - define arr1
display_array_as_img(arr1)
```

Solution

```python
arr1 = einops.rearrange(arr, "b c h w -> c h (b w)")
```
Exercise 2

```python
# Your code here - define arr2
display_array_as_img(arr2)
```

Solution

```python
arr2 = einops.repeat(arr[0], "c h w -> c (2 h) w")
```
Exercise 3

```python
# Your code here - define arr3
display_array_as_img(arr3)
```

Solution

```python
arr3 = einops.repeat(arr[0:2], "b c h w -> c (b h) (2 w)")
```
Exercise 4

```python
# Your code here - define arr4
display_array_as_img(arr4)
```

Solution

```python
arr4 = einops.repeat(arr[0], "c h w -> c (h 2) w")
```
Exercise 5
```python
# Your code here - define arr5
display_array_as_img(arr5)
```

Solution

```python
arr5 = einops.rearrange(arr[0], "c h w -> h (c w)")
```
Exercise 6
```python
# Your code here - define arr6
display_array_as_img(arr6)
```

Solution

```python
arr6 = einops.rearrange(arr, "(b1 b2) c h w -> c (b1 h) (b2 w)", b1=2)
```
Exercise 7
```python
# Your code here - define arr7
display_array_as_img(arr7)
```

Solution

```python
arr7 = einops.reduce(arr.astype(float), "b c h w -> h (b w)", "max").astype(int)
```
Exercise 8
```python
# Your code here - define arr8
display_array_as_img(arr8)
```

Solution

```python
arr8 = einops.reduce(arr.astype(float), "b c h w -> h w", "min").astype(int)
```
Exercise 9
```python
# Your code here - define arr9
display_array_as_img(arr9)
```

Solution

```python
arr9 = einops.rearrange(arr[1], "c h w -> c w h")
```
Exercise 10
```python
# Your code here - define arr10
display_array_as_img(arr10)
```

Solution

```python
arr10 = einops.reduce(arr, "(b1 b2) c (h h2) (w w2) -> c (b1 h) (b2 w)", "max", h2=2, w2=2, b1=2)
```
Einops exercises - operations
Next comes a series of functions whose behavior you should implement using the einops library. Solutions can be found below each exercise. First, we'll define some helper functions for checking whether your solutions are correct.
```python
def assert_all_equal(actual: t.Tensor, expected: t.Tensor) -> None:
    assert actual.shape == expected.shape, f"Shape mismatch, got: {actual.shape}"
    assert (actual == expected).all(), f"Value mismatch, got: {actual}"
    print("Passed!")

def assert_all_close(actual: t.Tensor, expected: t.Tensor, rtol=1e-05, atol=0.0001) -> None:
    assert actual.shape == expected.shape, f"Shape mismatch, got: {actual.shape}"
    assert t.allclose(actual, expected, rtol=rtol, atol=atol)
    print("Passed!")
```
Exercise A.1 - rearrange(1)
```python
def rearrange_1() -> t.Tensor:
    '''Return the following tensor using only torch.arange and einops.rearrange:
    [[3, 4],
     [5, 6],
     [7, 8]]
    '''
    pass

expected = t.tensor([[3, 4], [5, 6], [7, 8]])
assert_all_equal(rearrange_1(), expected)
```
Solution
```python
def rearrange_1() -> t.Tensor:
    '''Return the following tensor using only torch.arange and einops.rearrange:
    [[3, 4],
     [5, 6],
     [7, 8]]
    '''
    return einops.rearrange(t.arange(3, 9), "(h w) -> h w", h=3, w=2)
```
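If the `(h w)` grouping pattern is new to you, here's a quick illustration (not part of the exercise) of how it splits a flat axis, filling row by row:

```python
x = t.arange(6)                                  # tensor([0, 1, 2, 3, 4, 5])
print(einops.rearrange(x, "(h w) -> h w", h=2))  # tensor([[0, 1, 2], [3, 4, 5]])
print(einops.rearrange(x, "(h w) -> h w", h=3))  # tensor([[0, 1], [2, 3], [4, 5]])
```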
Exercise A.2 - rearrange(2)
```python
def rearrange_2() -> t.Tensor:
    '''Return the following tensor using only torch.arange and einops.rearrange:
    [[1, 2, 3],
     [4, 5, 6]]
    '''
    pass

assert_all_equal(rearrange_2(), t.tensor([[1, 2, 3], [4, 5, 6]]))
```
Solution
```python
def rearrange_2() -> t.Tensor:
    '''Return the following tensor using only torch.arange and einops.rearrange:
    [[1, 2, 3],
     [4, 5, 6]]
    '''
    return einops.rearrange(t.arange(1, 7), "(h w) -> h w", h=2, w=3)
```
Exercise A.3 - rearrange(3)
```python
def rearrange_3() -> t.Tensor:
    '''Return the following tensor using only torch.arange and einops.rearrange:
    [[[1], [2], [3], [4], [5], [6]]]
    '''
    pass

assert_all_equal(rearrange_3(), t.tensor([[[1], [2], [3], [4], [5], [6]]]))
```
Solution
```python
def rearrange_3() -> t.Tensor:
    '''Return the following tensor using only torch.arange and einops.rearrange:
    [[[1], [2], [3], [4], [5], [6]]]
    '''
    return einops.rearrange(t.arange(1, 7), "a -> 1 a 1")
```
Exercise B.1 - temperature average
```python
def temperatures_average(temps: t.Tensor) -> t.Tensor:
    '''Return the average temperature for each week.
    temps: a 1D tensor containing temperatures for each day.
    Length will be a multiple of 7 and the first 7 days are for the first week, second 7 days for the second week, etc.
    You can do this with a single call to reduce.
    '''
    assert len(temps) % 7 == 0
    pass

temps = t.Tensor([71, 72, 70, 75, 71, 72, 70, 68, 65, 60, 68, 60, 55, 59, 75, 80, 85, 80, 78, 72, 83])
expected = t.tensor([71.5714, 62.1429, 79.0])
assert_all_close(temperatures_average(temps), expected)
```
Solution
```python
def temperatures_average(temps: t.Tensor) -> t.Tensor:
    '''Return the average temperature for each week.
    temps: a 1D tensor containing temperatures for each day.
    Length will be a multiple of 7 and the first 7 days are for the first week, second 7 days for the second week, etc.
    You can do this with a single call to reduce.
    '''
    assert len(temps) % 7 == 0
    return einops.reduce(temps, "(h 7) -> h", "mean")
```
Exercise B.2 - temperature difference
```python
def temperatures_differences(temps: t.Tensor) -> t.Tensor:
    '''For each day, subtract the average for the week the day belongs to.
    temps: as above
    '''
    assert len(temps) % 7 == 0
    pass

expected = t.tensor([-0.5714, 0.4286, -1.5714, 3.4286, -0.5714, 0.4286, -1.5714, 5.8571, 2.8571, -2.1429, 5.8571, -2.1429, -7.1429, -3.1429, -4.0, 1.0, 6.0, 1.0, -1.0, -7.0, 4.0])
actual = temperatures_differences(temps)
assert_all_close(actual, expected)
```
Solution
```python
def temperatures_differences(temps: t.Tensor) -> t.Tensor:
    '''For each day, subtract the average for the week the day belongs to.
    temps: as above
    '''
    assert len(temps) % 7 == 0
    avg = einops.repeat(temperatures_average(temps), "w -> (w 7)")
    return temps - avg
```
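An equivalent approach (shown here as a sketch, not the canonical solution) is to reshape the days into weeks, subtract each week's mean via broadcasting, then flatten back:

```python
def temperatures_differences_alt(temps: t.Tensor) -> t.Tensor:
    # Reshape (w*7,) -> (w, 7), subtract each week's mean, flatten back
    weekly = einops.rearrange(temps, "(w d) -> w d", d=7)
    return einops.rearrange(weekly - weekly.mean(dim=1, keepdim=True), "w d -> (w d)")
```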
Exercise B.3 - temperature normalized
```python
def temperatures_normalized(temps: t.Tensor) -> t.Tensor:
    '''For each day, subtract the weekly average and divide by the weekly standard deviation.
    temps: as above
    Pass torch.std to reduce.
    '''
    pass

expected = t.tensor([-0.3326, 0.2494, -0.9146, 1.9954, -0.3326, 0.2494, -0.9146, 1.1839, 0.5775, -0.4331, 1.1839, -0.4331, -1.4438, -0.6353, -0.8944, 0.2236, 1.3416, 0.2236, -0.2236, -1.5652, 0.8944])
actual = temperatures_normalized(temps)
assert_all_close(actual, expected)
```
Solution
```python
def temperatures_normalized(temps: t.Tensor) -> t.Tensor:
    '''For each day, subtract the weekly average and divide by the weekly standard deviation.
    temps: as above
    Pass torch.std to reduce.
    '''
    avg = einops.repeat(temperatures_average(temps), "w -> (w 7)")
    std = einops.repeat(einops.reduce(temps, "(h 7) -> h", t.std), "w -> (w 7)")
    return (temps - avg) / std
```
Exercise C - identity matrix
```python
def identity_matrix(n: int) -> t.Tensor:
    '''Return the identity matrix of size nxn.
    Don't use torch.eye or similar.
    Hint: you can do it with arange, rearrange, and ==.
    Bonus: find a different way to do it.
    '''
    assert n >= 0
    pass

assert_all_equal(identity_matrix(3), t.Tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
assert_all_equal(identity_matrix(0), t.zeros((0, 0)))
```
Solution
```python
def identity_matrix(n: int) -> t.Tensor:
    '''Return the identity matrix of size nxn.
    Don't use torch.eye or similar.
    Hint: you can do it with arange, rearrange, and ==.
    Bonus: find a different way to do it.
    '''
    assert n >= 0
    return (einops.rearrange(t.arange(n), "i -> i 1") == t.arange(n)).float()
```
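For the bonus, here's one possible alternative (our suggestion, not the only answer): in a flattened n*n array, the positions on the main diagonal are exactly the multiples of n+1.

```python
def identity_matrix_alt(n: int) -> t.Tensor:
    # Diagonal entries of the flattened matrix sit at indices 0, n+1, 2(n+1), ...
    return (t.arange(n * n) % (n + 1) == 0).float().reshape(n, n)
```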
Exercise D - sample distribution
```python
def sample_distribution(probs: t.Tensor, n: int) -> t.Tensor:
    '''Return n random samples from probs, where probs is a normalized probability distribution.
    probs: shape (k,) where probs[i] is the probability of event i occurring.
    n: number of random samples
    Return: shape (n,) where out[i] is an integer indicating which event was sampled.
    Use torch.rand and torch.cumsum to do this without any explicit loops.
    Note: if you think your solution is correct but the test is failing, try increasing the value of n.
    '''
    assert abs(probs.sum() - 1.0) < 0.001
    assert (probs >= 0).all()
    pass

n = 10000000
probs = t.tensor([0.05, 0.1, 0.1, 0.2, 0.15, 0.4])
freqs = t.bincount(sample_distribution(probs, n)) / n
assert_all_close(freqs, probs, rtol=0.001, atol=0.001)
```
Solution
```python
def sample_distribution(probs: t.Tensor, n: int) -> t.Tensor:
    '''Return n random samples from probs, where probs is a normalized probability distribution.
    probs: shape (k,) where probs[i] is the probability of event i occurring.
    n: number of random samples
    Return: shape (n,) where out[i] is an integer indicating which event was sampled.
    Use torch.rand and torch.cumsum to do this without any explicit loops.
    Note: if you think your solution is correct but the test is failing, try increasing the value of n.
    '''
    assert abs(probs.sum() - 1.0) < 0.001
    assert (probs >= 0).all()
    return (t.rand(n, 1) > t.cumsum(probs, dim=0)).sum(dim=-1)
```
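To see why this works (an illustration, with made-up numbers): for a uniform random number r in [0, 1), counting how many cumulative-probability thresholds lie strictly below r gives exactly the index of the interval that r falls into.

```python
probs_demo = t.tensor([0.05, 0.1, 0.1, 0.2, 0.15, 0.4])
cum = t.cumsum(probs_demo, dim=0)  # tensor([0.05, 0.15, 0.25, 0.45, 0.60, 1.00])
r = 0.30                           # falls in [0.25, 0.45), i.e. event 3
print((r > cum).sum())             # tensor(3)
```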
Exercise E - classifier accuracy
```python
def classifier_accuracy(scores: t.Tensor, true_classes: t.Tensor) -> t.Tensor:
    '''Return the fraction of inputs for which the maximum score corresponds to the true class for that input.
    scores: shape (batch, n_classes). A higher score[b, i] means that the classifier thinks class i is more likely.
    true_classes: shape (batch, ). true_classes[b] is an integer from [0...n_classes).
    Use torch.argmax.
    '''
    assert true_classes.max() < scores.shape[1]
    pass

scores = t.tensor([[0.75, 0.5, 0.25], [0.1, 0.5, 0.4], [0.1, 0.7, 0.2]])
true_classes = t.tensor([0, 1, 0])
expected = 2.0 / 3.0
assert classifier_accuracy(scores, true_classes) == expected
```
Solution
```python
def classifier_accuracy(scores: t.Tensor, true_classes: t.Tensor) -> t.Tensor:
    '''Return the fraction of inputs for which the maximum score corresponds to the true class for that input.
    scores: shape (batch, n_classes). A higher score[b, i] means that the classifier thinks class i is more likely.
    true_classes: shape (batch, ). true_classes[b] is an integer from [0...n_classes).
    Use torch.argmax.
    '''
    assert true_classes.max() < scores.shape[1]
    return (scores.argmax(dim=1) == true_classes).float().mean()
```
Exercise F.1 - total price indexing
```python
def total_price_indexing(prices: t.Tensor, items: t.Tensor) -> float:
    '''Given prices for each kind of item and a tensor of items purchased, return the total price.
    prices: shape (k, ). prices[i] is the price of the ith item.
    items: shape (n, ). A 1D tensor where each value is an item index from [0..k).
    Use integer array indexing. The below document describes this for NumPy but it's the same in PyTorch:
    https://numpy.org/doc/stable/user/basics.indexing.html#integer-array-indexing
    '''
    assert items.max() < prices.shape[0]
    pass

prices = t.tensor([0.5, 1, 1.5, 2, 2.5])
items = t.tensor([0, 0, 1, 1, 4, 3, 2])
assert total_price_indexing(prices, items) == 9.0
```
Solution
```python
def total_price_indexing(prices: t.Tensor, items: t.Tensor) -> float:
    '''Given prices for each kind of item and a tensor of items purchased, return the total price.
    prices: shape (k, ). prices[i] is the price of the ith item.
    items: shape (n, ). A 1D tensor where each value is an item index from [0..k).
    Use integer array indexing. The below document describes this for NumPy but it's the same in PyTorch:
    https://numpy.org/doc/stable/user/basics.indexing.html#integer-array-indexing
    '''
    assert items.max() < prices.shape[0]
    return prices[items].sum().item()
```
Exercise F.2 - gather 2D
```python
def gather_2d(matrix: t.Tensor, indexes: t.Tensor) -> t.Tensor:
    '''Perform a gather operation along the second dimension.
    matrix: shape (m, n)
    indexes: shape (m, k)
    Return: shape (m, k). out[i][j] = matrix[i][indexes[i][j]]
    For this problem, the test already passes and it's your job to write at least three asserts relating the arguments and the output. This is a tricky function and worth spending some time to wrap your head around its behavior.
    See: https://pytorch.org/docs/stable/generated/torch.gather.html?highlight=gather#torch.gather
    '''
    "TODO: YOUR CODE HERE"
    out = matrix.gather(1, indexes)
    "TODO: YOUR CODE HERE"
    return out

matrix = t.arange(15).view(3, 5)
indexes = t.tensor([[4], [3], [2]])
expected = t.tensor([[4], [8], [12]])
assert_all_equal(gather_2d(matrix, indexes), expected)
indexes2 = t.tensor([[2, 4], [1, 3], [0, 2]])
expected2 = t.tensor([[2, 4], [6, 8], [10, 12]])
assert_all_equal(gather_2d(matrix, indexes2), expected2)
```
Solution
```python
def gather_2d(matrix: t.Tensor, indexes: t.Tensor) -> t.Tensor:
    '''Perform a gather operation along the second dimension.
    matrix: shape (m, n)
    indexes: shape (m, k)
    Return: shape (m, k). out[i][j] = matrix[i][indexes[i][j]]
    For this problem, the test already passes and it's your job to write at least three asserts relating the arguments and the output. This is a tricky function and worth spending some time to wrap your head around its behavior.
    See: https://pytorch.org/docs/stable/generated/torch.gather.html?highlight=gather#torch.gather
    '''
    assert matrix.ndim == indexes.ndim
    assert indexes.shape[0] <= matrix.shape[0]
    out = matrix.gather(1, indexes)
    assert out.shape == indexes.shape
    return out
```
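If gather's behavior still feels slippery, here's a tiny standalone illustration of the rule out[i][j] = matrix[i][indexes[i][j]]:

```python
m = t.tensor([[10, 11, 12], [20, 21, 22]])
idx = t.tensor([[2, 0], [1, 1]])
# Row 0 picks columns 2 and 0; row 1 picks column 1 twice
print(m.gather(1, idx))  # tensor([[12, 10], [21, 21]])
```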
Exercise F.3 - total price gather
```python
def total_price_gather(prices: t.Tensor, items: t.Tensor) -> float:
    '''Compute the same as total_price_indexing, but use torch.gather.'''
    assert items.max() < prices.shape[0]
    pass

prices = t.tensor([0.5, 1, 1.5, 2, 2.5])
items = t.tensor([0, 0, 1, 1, 4, 3, 2])
assert total_price_gather(prices, items) == 9.0
```
Solution
```python
def total_price_gather(prices: t.Tensor, items: t.Tensor) -> float:
    '''Compute the same as total_price_indexing, but use torch.gather.'''
    assert items.max() < prices.shape[0]
    return prices.gather(0, items).sum().item()
```
Exercise G - indexing
```python
def integer_array_indexing(matrix: t.Tensor, coords: t.Tensor) -> t.Tensor:
    '''Return the values at each coordinate using integer array indexing.
    For details on integer array indexing, see:
    https://numpy.org/doc/stable/user/basics.indexing.html#integer-array-indexing
    matrix: shape (d_0, d_1, ..., d_n)
    coords: shape (batch, n)
    Return: (batch, )
    '''
    pass

mat_2d = t.arange(15).view(3, 5)
coords_2d = t.tensor([[0, 1], [0, 4], [1, 4]])
actual = integer_array_indexing(mat_2d, coords_2d)
assert_all_equal(actual, t.tensor([1, 4, 9]))
mat_3d = t.arange(2 * 3 * 4).view((2, 3, 4))
coords_3d = t.tensor([[0, 0, 0], [0, 1, 1], [0, 2, 2], [1, 0, 3], [1, 2, 0]])
actual = integer_array_indexing(mat_3d, coords_3d)
assert_all_equal(actual, t.tensor([0, 5, 10, 15, 20]))
```
Solution
```python
def integer_array_indexing(matrix: t.Tensor, coords: t.Tensor) -> t.Tensor:
    '''Return the values at each coordinate using integer array indexing.
    For details on integer array indexing, see:
    https://numpy.org/doc/stable/user/basics.indexing.html#integer-array-indexing
    matrix: shape (d_0, d_1, ..., d_n)
    coords: shape (batch, n)
    Return: (batch, )
    '''
    return matrix[tuple(coords.T)]
```
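Why tuple(coords.T) works (an illustration): transposing coords yields one index tensor per dimension, so the indexing is equivalent to matrix[coords[:, 0], coords[:, 1], ..., coords[:, n-1]].

```python
m = t.arange(15).view(3, 5)
coords = t.tensor([[0, 1], [2, 4]])
print(m[tuple(coords.T)])             # tensor([ 1, 14]), i.e. m[0, 1] and m[2, 4]
print(m[coords[:, 0], coords[:, 1]])  # same result
```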
Exercise H.1 - batched logsumexp
```python
def batched_logsumexp(matrix: t.Tensor) -> t.Tensor:
    '''For each row of the matrix, compute log(sum(exp(row))) in a numerically stable way.
    matrix: shape (batch, n)
    Return: (batch, ). For each i, out[i] = log(sum(exp(matrix[i]))).
    Do this without using PyTorch's logsumexp function.
    A couple useful blogs about this function:
    - https://leimao.github.io/blog/LogSumExp/
    - https://gregorygundersen.com/blog/2020/02/09/log-sum-exp/
    '''
    pass

matrix = t.tensor([[-1000, -1000, -1000, -1000], [1000, 1000, 1000, 1000]])
expected = t.tensor([-1000 + math.log(4), 1000 + math.log(4)])
actual = batched_logsumexp(matrix)
assert_all_close(actual, expected)
matrix2 = t.randn((10, 20))
expected2 = t.logsumexp(matrix2, dim=-1)
actual2 = batched_logsumexp(matrix2)
assert_all_close(actual2, expected2)
```
Solution
```python
def batched_logsumexp(matrix: t.Tensor) -> t.Tensor:
    '''For each row of the matrix, compute log(sum(exp(row))) in a numerically stable way.
    matrix: shape (batch, n)
    Return: (batch, ). For each i, out[i] = log(sum(exp(matrix[i]))).
    Do this without using PyTorch's logsumexp function.
    A couple useful blogs about this function:
    - https://leimao.github.io/blog/LogSumExp/
    - https://gregorygundersen.com/blog/2020/02/09/log-sum-exp/
    '''
    C = matrix.max(dim=-1).values
    exps = t.exp(matrix - einops.rearrange(C, "n -> n 1"))
    return C + t.log(t.sum(exps, dim=-1))
```
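To see why the max-subtraction matters (an illustration): the naive formula overflows to inf for large inputs, while the stable version doesn't.

```python
row = t.tensor([[1000.0, 1000.0]])
print(t.log(t.exp(row).sum(dim=-1)))  # tensor([inf]) -- exp(1000) overflows in float32
print(batched_logsumexp(row))         # tensor([1000.6931]), i.e. 1000 + log(2)
```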
Exercise H.2 - batched softmax
```python
def batched_softmax(matrix: t.Tensor) -> t.Tensor:
    '''For each row of the matrix, compute softmax(row).
    Do this without using PyTorch's softmax function.
    Instead, use the definition of softmax: https://en.wikipedia.org/wiki/Softmax_function
    matrix: shape (batch, n)
    Return: (batch, n). For each i, out[i] should sum to 1.
    '''
    pass

matrix = t.arange(1, 6).view((1, 5)).float().log()
expected = t.arange(1, 6).view((1, 5)) / 15.0
actual = batched_softmax(matrix)
assert_all_close(actual, expected)
for i in [0.12, 3.4, -5, 6.7]:
    assert_all_close(actual, batched_softmax(matrix + i))
matrix2 = t.rand((10, 20))
actual2 = batched_softmax(matrix2)
assert actual2.min() >= 0.0
assert actual2.max() <= 1.0
assert_all_equal(actual2.argsort(), matrix2.argsort())
assert_all_close(actual2.sum(dim=-1), t.ones(matrix2.shape[:-1]))
```
Solution
```python
def batched_softmax(matrix: t.Tensor) -> t.Tensor:
    '''For each row of the matrix, compute softmax(row).
    Do this without using PyTorch's softmax function.
    Instead, use the definition of softmax: https://en.wikipedia.org/wiki/Softmax_function
    matrix: shape (batch, n)
    Return: (batch, n). For each i, out[i] should sum to 1.
    '''
    exp = matrix.exp()
    return exp / exp.sum(dim=-1, keepdim=True)
```
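Note that this solution passes the tests, but calling exp() directly can overflow for rows with large values. A more robust variant (a sketch, not required here) subtracts the row max first; the result is unchanged because softmax is shift-invariant:

```python
def batched_softmax_stable(matrix: t.Tensor) -> t.Tensor:
    # softmax(x) == softmax(x - c) for any per-row constant c
    exp = (matrix - matrix.max(dim=-1, keepdim=True).values).exp()
    return exp / exp.sum(dim=-1, keepdim=True)
```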
Exercise H.3 - batched logsoftmax
```python
def batched_logsoftmax(matrix: t.Tensor) -> t.Tensor:
    '''Compute log(softmax(row)) for each row of the matrix.
    matrix: shape (batch, n)
    Return: (batch, n). For each i, out[i].exp() should sum to 1.
    Do this without using PyTorch's logsoftmax function.
    For each row, subtract the maximum first to avoid overflow if the row contains large values.
    '''
    pass

matrix = t.arange(1, 6).view((1, 5)).float()
start = 1000
matrix2 = t.arange(start + 1, start + 6).view((1, 5)).float()
actual = batched_logsoftmax(matrix2)
expected = t.tensor([[-4.4519, -3.4519, -2.4519, -1.4519, -0.4519]])
assert_all_close(actual, expected)
```
Solution
```python
def batched_logsoftmax(matrix: t.Tensor) -> t.Tensor:
    '''Compute log(softmax(row)) for each row of the matrix.
    matrix: shape (batch, n)
    Return: (batch, n). For each i, out[i].exp() should sum to 1.
    Do this without using PyTorch's logsoftmax function.
    For each row, subtract the maximum first to avoid overflow if the row contains large values.
    '''
    C = matrix.max(dim=1, keepdim=True).values
    return matrix - C - (matrix - C).exp().sum(dim=1, keepdim=True).log()
```
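Equivalently, since log(softmax(x)) = x - logsumexp(x), we could reuse batched_logsumexp from exercise H.1 (a sketch of the same computation):

```python
def batched_logsoftmax_alt(matrix: t.Tensor) -> t.Tensor:
    return matrix - einops.rearrange(batched_logsumexp(matrix), "b -> b 1")
```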
Exercise H.4 - batched cross entropy loss
```python
def batched_cross_entropy_loss(logits: t.Tensor, true_labels: t.Tensor) -> t.Tensor:
    '''Compute the cross entropy loss for each example in the batch.
    logits: shape (batch, classes). logits[i][j] is the unnormalized prediction for example i and class j.
    true_labels: shape (batch, ). true_labels[i] is an integer index representing the true class for example i.
    Return: shape (batch, ). out[i] is the loss for example i.
    Hint: convert the logits to log-probabilities using your batched_logsoftmax from above.
    Then the loss for an example is just the negative of the log-probability that the model assigned to the true class. Use torch.gather to perform the indexing.
    '''
    pass

logits = t.tensor([[float("-inf"), float("-inf"), 0], [1 / 3, 1 / 3, 1 / 3], [float("-inf"), 0, 0]])
true_labels = t.tensor([2, 0, 0])
expected = t.tensor([0.0, math.log(3), float("inf")])
actual = batched_cross_entropy_loss(logits, true_labels)
assert_all_close(actual, expected)
```
Solution
```python
def batched_cross_entropy_loss(logits: t.Tensor, true_labels: t.Tensor) -> t.Tensor:
    '''Compute the cross entropy loss for each example in the batch.
    logits: shape (batch, classes). logits[i][j] is the unnormalized prediction for example i and class j.
    true_labels: shape (batch, ). true_labels[i] is an integer index representing the true class for example i.
    Return: shape (batch, ). out[i] is the loss for example i.
    Hint: convert the logits to log-probabilities using your batched_logsoftmax from above.
    Then the loss for an example is just the negative of the log-probability that the model assigned to the true class. Use torch.gather to perform the indexing.
    '''
    assert logits.shape[0] == true_labels.shape[0]
    assert true_labels.max() < logits.shape[1]
    logprobs = batched_logsoftmax(logits)
    indices = einops.rearrange(true_labels, "n -> n 1")
    pred_at_index = logprobs.gather(1, indices)
    return -einops.rearrange(pred_at_index, "n 1 -> n")
```
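As an optional sanity check, this should agree with PyTorch's built-in cross entropy (which also takes raw logits) when called with reduction="none":

```python
import torch.nn.functional as F

logits_demo = t.randn(4, 3)
labels_demo = t.tensor([0, 2, 1, 1])
assert t.allclose(
    batched_cross_entropy_loss(logits_demo, labels_demo),
    F.cross_entropy(logits_demo, labels_demo, reduction="none"),
)
```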
Exercise I.1 - collect rows
```python
def collect_rows(matrix: t.Tensor, row_indexes: t.Tensor) -> t.Tensor:
    '''Return a 2D matrix whose rows are taken from the input matrix in order according to row_indexes.
    matrix: shape (m, n)
    row_indexes: shape (k,). Each value is an integer in [0..m).
    Return: shape (k, n). out[i] is matrix[row_indexes[i]].
    '''
    assert row_indexes.max() < matrix.shape[0]
    pass

matrix = t.arange(15).view((5, 3))
row_indexes = t.tensor([0, 2, 1, 0])
actual = collect_rows(matrix, row_indexes)
expected = t.tensor([[0, 1, 2], [6, 7, 8], [3, 4, 5], [0, 1, 2]])
assert_all_equal(actual, expected)
```
Solution
```python
def collect_rows(matrix: t.Tensor, row_indexes: t.Tensor) -> t.Tensor:
    '''Return a 2D matrix whose rows are taken from the input matrix in order according to row_indexes.
    matrix: shape (m, n)
    row_indexes: shape (k,). Each value is an integer in [0..m).
    Return: shape (k, n). out[i] is matrix[row_indexes[i]].
    '''
    assert row_indexes.max() < matrix.shape[0]
    return matrix[row_indexes]
```
Exercise I.2 - collect columns
```python
def collect_columns(matrix: t.Tensor, column_indexes: t.Tensor) -> t.Tensor:
    '''Return a 2D matrix whose columns are taken from the input matrix in order according to column_indexes.
    matrix: shape (m, n)
    column_indexes: shape (k,). Each value is an integer in [0..n).
    Return: shape (m, k). out[:, i] is matrix[:, column_indexes[i]].
    '''
    assert column_indexes.max() < matrix.shape[1]
    pass

matrix = t.arange(15).view((5, 3))
column_indexes = t.tensor([0, 2, 1, 0])
actual = collect_columns(matrix, column_indexes)
expected = t.tensor([[0, 2, 1, 0], [3, 5, 4, 3], [6, 8, 7, 6], [9, 11, 10, 9], [12, 14, 13, 12]])
assert_all_equal(actual, expected)
```
Solution
```python
def collect_columns(matrix: t.Tensor, column_indexes: t.Tensor) -> t.Tensor:
    '''Return a 2D matrix whose columns are taken from the input matrix in order according to column_indexes.
    matrix: shape (m, n)
    column_indexes: shape (k,). Each value is an integer in [0..n).
    Return: shape (m, k). out[:, i] is matrix[:, column_indexes[i]].
    '''
    assert column_indexes.max() < matrix.shape[1]
    return matrix[:, column_indexes]
Einsum
Einsum is a very useful function for performing linear algebra operations, and one you're likely to use a lot.

Note that we're using the einops.einsum version of this function, which works a little differently from the traditional torch.einsum:

- einops.einsum takes the arrays as its first arguments, and uses spaces to separate dimensions in the pattern string.
- torch.einsum takes the string as its first argument, and doesn't use spaces to separate dimensions (each dimension is a single letter).

For example, torch.einsum("ij,i->j", A, b) and einops.einsum(A, b, "i j, i -> j") are equivalent. (Note that einops doesn't care whether there are spaces around , and ->, so you don't need to follow this format exactly.)

Although there are many different operations, they are all based on three key rules:

- Repeating a letter across different inputs means those values will be multiplied, and the products will appear in the output.
  - For example, M = einops.einsum(A, B, "i j, i j -> i j") is the elementwise product of the matrices A and B, i.e. M = A * B.
- Omitting a letter means that axis will be summed over.
  - For example, if x is a 2D array with shape (I, J), then einops.einsum(x, "i j -> i") is a 1D array of length I, whose values are the row sums of x (i.e. summing along the j axis).
- We can return the unsummed axes in any order.
  - For example, einops.einsum(x, "i j k -> k j i") does the same thing as einops.rearrange(x, "i j k -> k j i").

Note that the einops authors plan to support shape rearrangement inside einsum, e.g. operations like einops.einsum(x, y, "i j, j k l -> i (k l)") (combining the features of rearrange and einsum), so we can look forward to that. A short demonstration of the three rules follows.
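Here's a compact, illustrative demonstration of the three rules (using the torch import from the setup above):

```python
A = t.arange(6).reshape(2, 3).float()
B = t.ones(2, 3)

# Rule 1: repeated letters across inputs are multiplied elementwise
print(einops.einsum(A, B, "i j, i j -> i j"))  # same as A * B

# Rule 2: omitted letters are summed over
print(einops.einsum(A, "i j -> i"))            # row sums: tensor([ 3., 12.])

# Rule 3: surviving letters can be returned in any order
print(einops.einsum(A, "i j -> j i").shape)    # torch.Size([3, 2])
```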
Einsum exercises
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵🔵⚪
You shouldn't spend more than ~15-20 minutes on these exercises.
If you feel you've already got the general idea, you can skip to the next section.

In the following exercises, you'll use `einsum` to write simple functions reproducing some standard NumPy operations: trace, matrix multiplication, inner product and outer product. We've also provided test functions to check your results.

Note that the current version of einsum requires an `->` in your string, even when the array is summed down to a single scalar (i.e. the right-hand side of `->` is empty).
```python
def einsum_trace(mat: np.ndarray):
    '''
    Returns the same as `np.trace`.
    '''
    pass

def einsum_mv(mat: np.ndarray, vec: np.ndarray):
    '''
    Returns the same as `np.matmul`, when `mat` is a 2D array and `vec` is 1D.
    '''
    pass

def einsum_mm(mat1: np.ndarray, mat2: np.ndarray):
    '''
    Returns the same as `np.matmul`, when `mat1` and `mat2` are both 2D arrays.
    '''
    pass

def einsum_inner(vec1: np.ndarray, vec2: np.ndarray):
    '''
    Returns the same as `np.inner`.
    '''
    pass

def einsum_outer(vec1: np.ndarray, vec2: np.ndarray):
    '''
    Returns the same as `np.outer`.
    '''
    pass

tests.test_einsum_trace(einsum_trace)
tests.test_einsum_mv(einsum_mv)
tests.test_einsum_mm(einsum_mm)
tests.test_einsum_inner(einsum_inner)
tests.test_einsum_outer(einsum_outer)
```
Solution
```python
def einsum_trace(mat: np.ndarray):
    '''
    Returns the same as `np.trace`.
    '''
    # SOLUTION
    return einops.einsum(mat, "i i ->")

def einsum_mv(mat: np.ndarray, vec: np.ndarray):
    '''
    Returns the same as `np.matmul`, when `mat` is a 2D array and `vec` is 1D.
    '''
    # SOLUTION
    return einops.einsum(mat, vec, "i j, j -> i")

def einsum_mm(mat1: np.ndarray, mat2: np.ndarray):
    '''
    Returns the same as `np.matmul`, when `mat1` and `mat2` are both 2D arrays.
    '''
    # SOLUTION
    return einops.einsum(mat1, mat2, "i j, j k -> i k")

def einsum_inner(vec1: np.ndarray, vec2: np.ndarray):
    '''
    Returns the same as `np.inner`.
    '''
    # SOLUTION
    return einops.einsum(vec1, vec2, "i, i ->")

def einsum_outer(vec1: np.ndarray, vec2: np.ndarray):
    '''
    Returns the same as `np.outer`.
    '''
    # SOLUTION
    return einops.einsum(vec1, vec2, "i, j -> i j")
```
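As an optional extra check beyond the provided tests, the solutions can be compared directly against the NumPy functions they mimic:

```python
mat = np.arange(9).reshape(3, 3)
vec = np.arange(3)

assert np.allclose(einsum_trace(mat), np.trace(mat))
assert np.allclose(einsum_mv(mat, vec), mat @ vec)
assert np.allclose(einsum_mm(mat, mat), mat @ mat)
assert np.allclose(einsum_inner(vec, vec), np.inner(vec, vec))
assert np.allclose(einsum_outer(vec, vec), np.outer(vec, vec))
```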