A Guide To Basic Linear Algebra Notation For Machine Learning

Often, you’ll be looking around on the web for an answer to a question about an algorithm, only to be presented with a formula-heavy answer on a forum. If you don’t know the notation, this is going to give you a headache. So this article aims to cover much of the common notation that we, as data scientists, will come across.

Let’s start simple: a scalar is simply a single numerical value (e.g. 1).

Sets & Set Operations

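A set is an unordered collection of distinct values (each value appears at most once). Square and round brackets are used to write intervals. [0,1], with square brackets, is the closed interval: it contains every real number from 0 to 1, including the endpoints 0 and 1 themselves. (0,1), with normal parentheses, is the open interval: it contains every value strictly between 0 and 1 (e.g. 0.001, 0.5, 0.999) but excludes 0 and 1. Finite sets are usually written by listing their elements inside curly braces, e.g. {1, 7, 5, 4}.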
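Because an interval such as [0,1] contains infinitely many real numbers, it can't be stored by listing its elements; in code it is more naturally expressed as a membership test. A minimal sketch; the two function names are hypothetical:

```python
def in_closed_interval(x: float, a: float, b: float) -> bool:
    """True if x is in [a, b]: the endpoints a and b are included."""
    return a <= x <= b


def in_open_interval(x: float, a: float, b: float) -> bool:
    """True if x is in (a, b): the endpoints a and b are excluded."""
    return a < x < b


for x in (0, 0.5, 1):
    print(x, in_closed_interval(x, 0, 1), in_open_interval(x, 0, 1))
# 0 True False
# 0.5 True True
# 1 True False
```

Only the endpoints behave differently: both tests agree on every value strictly inside the interval.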

∈ denotes set membership. So x ∈ S means that x is an element of the set S.

Suppose we have two sets: S1 = {1, 7, 5, 4} and S2 = {1, 5, 3}.

We can intersect the two sets by writing S1 ∩ S2. This gives us only the values that appear in both sets; in this case, {1, 5}.

We can also take the union of two sets, which contains every value that appears in either set: S1 ∪ S2 = {1, 3, 4, 5, 7}.
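Python's built-in `set` type behaves like a mathematical set (unordered, no duplicates), so membership, intersection and union map directly onto operators. A minimal sketch using the two example sets; `sorted` is only there to print the results in a fixed order:

```python
S1 = {1, 7, 5, 4}
S2 = {1, 5, 3}

print(5 in S1)          # True:  5 ∈ S1
print(3 in S1)          # False: 3 is not in S1
print(sorted(S1 & S2))  # [1, 5]:          intersection S1 ∩ S2
print(sorted(S1 | S2))  # [1, 3, 4, 5, 7]: union S1 ∪ S2
```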

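We can sum all the items in a collection. This is denoted as $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$, where n is the number of items. The same sum can also be written with the index as a superscript in parentheses: $\sum_{i=1}^{n} x^{(i)} = x^{(1)} + x^{(2)} + \dots + x^{(n)}$.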
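In code, the summation symbol corresponds to a loop that accumulates a running total. A minimal sketch with made-up values; note that the notation counts from 1 while Python lists are indexed from 0:

```python
x = [3, 1, 4, 1, 5]  # made-up values x_1 ... x_5

# Explicit loop, adding one term of the sum at a time
total = 0
for i in range(len(x)):  # i runs 0..4 in Python, i.e. x_1..x_5 in the notation
    total += x[i]

print(total)   # 14
print(sum(x))  # built-in equivalent, also 14
```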

We can also calculate the product of the elements in a collection (multiply them all together). This is denoted as $\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdot \ldots \cdot x_n$, where the dots between the values mean multiplication.
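The product symbol works like the summation loop, except the running value starts at 1 (the multiplicative identity) instead of 0. A minimal sketch with made-up values; `math.prod` requires Python 3.8 or later:

```python
import math

x = [2, 3, 4]  # made-up values x_1, x_2, x_3

product = 1  # start from 1, not 0, or every product would be 0
for value in x:
    product *= value

print(product)       # 24
print(math.prod(x))  # standard-library equivalent, also 24
```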

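We can also derive new sets from existing ones. A derived set is often named with a prime, such as S'. So $S' \leftarrow \{x^3 \mid x \in S, x > 10\}$ means: create a new set, called S', containing x cubed for every x that is a member of S and is greater than 10. The ← arrow means that the set on the right is assigned to the name on the left.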
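Set-builder notation like this maps almost directly onto a Python set comprehension: the expression before `for` is the "x cubed" part, and the `if` clause is the condition. A minimal sketch; the contents of the input set (named `S` here) are made up for illustration:

```python
S = {4, 11, 12, 9, 20}  # made-up input set

# {x^3 | x ∈ S, x > 10}
S_prime = {x ** 3 for x in S if x > 10}

print(sorted(S_prime))  # [1331, 1728, 8000]
```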

Vectors & Vector Operations

A vector is an ordered list of scalar values (e.g. [1, 5, 6, 9]).

A vector is denoted as a bold lower-case letter, for example $\mathbf{b} = [1, 3]$. Vectors can be visualised as arrows: the magnitude (length) of the arrow and its direction can give you a good deal of visual intuition.

As above, vector $\mathbf{b}$ has the elements [1, 3], so $b^{(1)} = 1$ and $b^{(2)} = 3$. The superscript in parentheses is an index, not a power.

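Adding two vectors together gives another vector, computed element by element: $\mathbf{x} + \mathbf{y} = [x^{(1)} + y^{(1)}, x^{(2)} + y^{(2)}, \ldots]$. Subtraction works the same way: $\mathbf{x} - \mathbf{y} = [x^{(1)} - y^{(1)}, x^{(2)} - y^{(2)}, \ldots]$. Both vectors must have the same number of elements.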
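Element-wise addition and subtraction can be sketched with plain Python lists and `zip`, which pairs up the elements at the same position. A minimal sketch; `vec_add` and `vec_sub` are hypothetical helper names, and a library such as NumPy would normally do this for you:

```python
def vec_add(x: list[float], y: list[float]) -> list[float]:
    """Element-wise sum x + y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of elements")
    return [xi + yi for xi, yi in zip(x, y)]


def vec_sub(x: list[float], y: list[float]) -> list[float]:
    """Element-wise difference x - y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of elements")
    return [xi - yi for xi, yi in zip(x, y)]


x = [1, 3]
y = [4, 2]
print(vec_add(x, y))  # [5, 5]
print(vec_sub(x, y))  # [-3, 1]
```

The explicit length check matters because `zip` silently stops at the shorter list, which would hide a dimension mismatch.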

We can multiply a vector by a scalar, and the output is also a vector: $c\mathbf{x} = [c x^{(1)}, c x^{(2)}, \ldots]$. So, if c = 12 and $\mathbf{x} = [1, 3]$, we have $c\mathbf{x} = [12 \times 1, 12 \times 3] = [12, 36]$.
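Scalar multiplication scales every element by the same number. A minimal sketch reproducing the c = 12 example; `scalar_mul` is a hypothetical helper name:

```python
def scalar_mul(c: float, x: list[float]) -> list[float]:
    """Multiply every element of vector x by the scalar c."""
    return [c * xi for xi in x]


print(scalar_mul(12, [1, 3]))  # [12, 36]
```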

Taking the dot product of two vectors of the same length gives a scalar output: $\mathbf{a} \cdot \mathbf{b} = \sum_{i} a^{(i)} b^{(i)} = a^{(1)} b^{(1)} + a^{(2)} b^{(2)} + \dots$
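The dot product combines the two previous ideas: multiply matching elements, then sum the results into a single number. A minimal sketch; `dot` is a hypothetical helper name and the vectors are made up:

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product: sum of the element-wise products of a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return sum(ai * bi for ai, bi in zip(a, b))


print(dot([1, 3], [4, 2]))  # 1*4 + 3*2 = 10, a scalar rather than a vector
```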

Matrices and Matrix Operations

A matrix is a rectangular array of numbers arranged in rows and columns, a bit like a table. A matrix is denoted with a bold upper-case letter.

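$$\mathbf{H} = \begin{bmatrix} 10 & 16 & 12 \\ 2 & 4 & 7 \end{bmatrix}$$

Each element of H is identified by its row index followed by its column index:

$$\mathbf{H} = \begin{bmatrix} H^{(1,1)} & H^{(1,2)} & H^{(1,3)} \\ H^{(2,1)} & H^{(2,2)} & H^{(2,3)} \end{bmatrix}$$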
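In code, a matrix is commonly stored as a list of rows. A minimal sketch, assuming a plain nested Python list rather than a library such as NumPy; the helper `element` is hypothetical and converts the notation's 1-based row and column numbers into Python's 0-based indices:

```python
# H stored row by row: H[0] is the first row, H[1] the second
H = [
    [10, 16, 12],
    [2, 4, 7],
]


def element(M: list[list[float]], i: int, j: int) -> float:
    """Return the element in row i, column j, both counted from 1 as in the notation."""
    return M[i - 1][j - 1]  # shift to Python's 0-based indexing


print(element(H, 1, 1))   # 10
print(element(H, 2, 3))   # 7
print(len(H), len(H[0]))  # 2 3  (rows, columns)
```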

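We can multiply this matrix by a vector, but only if the vector has as many elements as the matrix has columns. H has 3 columns, so we can multiply it by the 3-element vector $\mathbf{j} = [2, 3, 4]$ (treated as a column vector). Each element of the result is the dot product of one row of H with j:

$$\mathbf{H}\mathbf{j} = \begin{bmatrix} H^{(1,1)} j^{(1)} + H^{(1,2)} j^{(2)} + H^{(1,3)} j^{(3)} \\ H^{(2,1)} j^{(1)} + H^{(2,2)} j^{(2)} + H^{(2,3)} j^{(3)} \end{bmatrix} = \begin{bmatrix} 10 \cdot 2 + 16 \cdot 3 + 12 \cdot 4 \\ 2 \cdot 2 + 4 \cdot 3 + 7 \cdot 4 \end{bmatrix} = \begin{bmatrix} 116 \\ 44 \end{bmatrix}$$

The output is therefore a 2D vector (a vector with 2 elements), because H has 2 rows.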
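A matrix-vector product can be sketched in plain Python: each output element is the dot product of one row of the matrix with the vector, so the result has one element per row. A minimal sketch with H and j from the text; `mat_vec` is a hypothetical helper name and assumes every row has the same length:

```python
def mat_vec(M: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix M (stored as a list of rows) by vector v."""
    n_cols = len(M[0])
    if len(v) != n_cols:
        raise ValueError("vector length must equal the number of matrix columns")
    # One dot product per row, so the result has len(M) elements
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


H = [
    [10, 16, 12],
    [2, 4, 7],
]
j = [2, 3, 4]

print(mat_vec(H, j))  # [116, 44]
```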


Functions

A function is a rule that associates each element x of a set X with exactly one value y in a set Y. In y = f(x), f is the function's name, x is the input variable and y is the output.

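Given a set B = {1, 7, 9, 12} and a function f, the max is denoted as $\max_{b \in B} f(b)$. It returns the highest value that f(b) takes over all elements b of the set; if f is the identity, f(b) = b, that value is 12. The argmax, denoted $\arg\max_{b \in B} f(b)$, instead returns the element b that produces this highest value. With f(b) = b that element is also 12, but the two differ in general: with f(b) = −b, the max is −1 and the argmax is 1. In code, argmax functions applied to an ordered array (such as NumPy’s argmax) return the position of the maximum instead; for [1, 7, 9, 12] that is the 4th element (index 3, counting from zero).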
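Python's built-in `max` covers both ideas: applied to the function values it gives the max, and applied to the set with `key=f` it gives the argmax. A minimal sketch; the function `f(b) = -b` is hypothetical, chosen only so that the two answers differ:

```python
B = {1, 7, 9, 12}


def f(b: int) -> int:
    """Hypothetical function, chosen so that max and argmax give different answers."""
    return -b


max_value = max(f(b) for b in B)  # max over b in B of f(b): the highest output
arg_max = max(B, key=f)           # argmax over b in B of f(b): the input producing it

print(max_value)  # -1
print(arg_max)    # 1

# With the identity function f(b) = b, max and argmax are both the element 12.
print(max(B))     # 12

# On an ordered list, "argmax" in code usually means the position of the maximum.
values = [1, 7, 9, 12]
print(values.index(max(values)))  # 3 (0-based), i.e. the 4th element
```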

Features & Feature Vectors

$\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ is how we denote a collection of N labelled examples, where $y_i$ is the label of the i-th example. Unlabelled examples are denoted simply as $\{\mathbf{x}_i\}_{i=1}^{N}$. Each $\mathbf{x}_i$ is a feature vector, which consists of several features. Feature j within the vector is denoted as $x_i^{(j)}$.

For example, if each example has height, weight and gender features, one feature vector might be $\mathbf{x}_i = [176, 85, \text{F}]$. In this case, $x_i^{(1)}$ is height, $x_i^{(2)}$ is weight and $x_i^{(3)}$ is gender.
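A labelled dataset is often held in code as a list of feature vectors plus a parallel list of labels. A minimal sketch under stated assumptions: only the first feature vector comes from the text, while the other rows and all labels are made up for illustration, and the helper `feature` is hypothetical:

```python
# Hypothetical labelled dataset {(x_i, y_i)} with N = 3 examples.
X = [
    [176, 85, "F"],  # x_1 = [height, weight, gender] (from the text)
    [162, 58, "M"],  # x_2 (made up)
    [181, 90, "M"],  # x_3 (made up)
]
y = [1, 0, 1]  # y_1, y_2, y_3: made-up labels

labelled = list(zip(X, y))  # the pairs (x_i, y_i)
unlabelled = X              # just the feature vectors {x_i}
N = len(labelled)


def feature(i: int, j: int):
    """Return feature j of example i, both counted from 1 as in the notation."""
    return X[i - 1][j - 1]


print(N)              # 3
print(feature(1, 1))  # 176 (height of the first example)
print(feature(1, 3))  # F   (gender of the first example)
```

In practice a non-numeric feature such as gender would usually be encoded as a number before being fed to a model.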