A Guide To Basic Linear Algebra Notation For Machine Learning

Often, you’ll search the web for an answer to a question about an algorithm and be presented with a formula-heavy answer on a forum. If you don’t know the notation, this is going to give you a headache. This article covers much of the common notation we see as data scientists.

Let’s start simple: a scalar is a single numerical value (e.g. 1).

Sets & Set Operations

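A set is an unordered collection of unique values, usually written in curly braces, like {1, 7, 5}. Square brackets and parentheses are used for intervals instead. [0,1] is the closed interval: it contains every value between 0 and 1 (e.g. 0.001, 0.002, etc.), including the endpoints 0 and 1. (0,1), with normal parentheses, is the open interval: it contains the same values but excludes the endpoints 0 and 1.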
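
In the usual interval convention, a square bracket includes the endpoint and a round bracket excludes it. A minimal Python sketch of that difference follows; the helper names in_closed and in_open are made up for illustration and do not come from any library.

```python
def in_closed(x, lo, hi):
    """True if x lies in the closed interval [lo, hi] (endpoints included)."""
    return lo <= x <= hi


def in_open(x, lo, hi):
    """True if x lies in the open interval (lo, hi) (endpoints excluded)."""
    return lo < x < hi


print(in_closed(0, 0, 1))    # True: 0 is an endpoint of [0, 1]
print(in_open(0, 0, 1))      # False: (0, 1) excludes its endpoints
print(in_open(0.001, 0, 1))  # True: strictly between 0 and 1
```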

∈ denotes set membership. So, x ∈ S means that x is in the set S.

So if we have 2 sets: S1 = {1, 7, 5, 4} and S2 = {1, 5, 3}.

We can intersect the two sets by writing S1 ∩ S2. This gives us only the values that appear in both sets. In this case, this would be {1, 5}.

We can also take the union of two sets, which gives us every value that appears in either set. S1 ∪ S2 = {1, 3, 4, 5, 7}.
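
Python's built-in set type mirrors this notation fairly closely. The sketch below reuses the two example sets; sorted() is only used so that the printed order is predictable, since sets themselves have no order.

```python
# The two example sets from the text.
S1 = {1, 7, 5, 4}
S2 = {1, 5, 3}

print(7 in S1)          # membership, 7 ∈ S1 -> True
print(sorted(S1 & S2))  # intersection, S1 ∩ S2 -> [1, 5]
print(sorted(S1 | S2))  # union, S1 ∪ S2 -> [1, 3, 4, 5, 7]
```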

We can sum all the items in a collection. This is denoted as: Σ xᵢ = x₁ + x₂ + … + xₙ, where the index i runs from 1 to n. This can also be denoted as Σ x(i) = x(1) + x(2) + … + x(n).

We can also calculate the product of the elements in a collection (multiply them all together). This is denoted by: Π xᵢ = x₁ · x₂ · … · xₙ, where the dots between the values mean multiply.
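
As a rough illustration, the sum and product notation map onto Python's sum and math.prod (available from Python 3.8). The values in x are hypothetical and chosen only to keep the arithmetic easy to check.

```python
import math

x = [2, 3, 4]  # hypothetical values x1, x2, x3

total = sum(x)          # Σ: 2 + 3 + 4
product = math.prod(x)  # Π: 2 * 3 * 4

print(total)    # 9
print(product)  # 24
```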

We can carry out operations on sets. We denote a derived set as s’. So, s’ ← {x³ | x ∈ s, x > 10} means: create a derived set, called s’, containing x cubed for every x that is a member of s and is greater than 10.
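
This kind of derived set reads almost word for word as a Python set comprehension. The sketch below assumes a hypothetical starting set s; only the members greater than 10 contribute to the result.

```python
s = {2, 10, 11, 12}  # hypothetical starting set

# Cube every x that is a member of s and is greater than 10.
s_prime = {x ** 3 for x in s if x > 10}

print(sorted(s_prime))  # [1331, 1728]
```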

Vectors & Vector Operations

A vector is an ordered list of scalar values (e.g. [1, 5, 6, 9]).

A vector is denoted by a bold lower-case letter, for example b = [1, 3]. Vectors can be visualised as arrows: the magnitude (size) of the arrow and its direction can give you a good deal of visual intuition.

As above, vector b has the elements [1,3], so b(1) = 1 and b(2) = 3.

The result of adding two vectors together is another vector: x + y = [x(1) + y(1), x(2) + y(2), …]. Subtraction is very similar: x − y = [x(1) − y(1), x(2) − y(2), …]. Both vectors must have the same number of elements.

We can multiply a vector by a scalar, and the output is also a vector: xc = [cx(1), cx(2)]. So, if c = 12 and x = [1, 3], we have: xc = [(12*1), (12*3)] = [12, 36].
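
The element-wise vector operations can be sketched in plain Python using lists. The function names add, subtract and scale are illustrative, and the vector y is a hypothetical example; x and c reuse the values from the text.

```python
def _check_same_length(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of elements")


def add(x, y):
    """Element-wise sum of two vectors."""
    _check_same_length(x, y)
    return [xi + yi for xi, yi in zip(x, y)]


def subtract(x, y):
    """Element-wise difference of two vectors."""
    _check_same_length(x, y)
    return [xi - yi for xi, yi in zip(x, y)]


def scale(c, x):
    """Multiply every element of vector x by the scalar c."""
    return [c * xi for xi in x]


x = [1, 3]
y = [2, 5]  # hypothetical second vector

print(add(x, y))       # [3, 8]
print(subtract(x, y))  # [-1, -2]
print(scale(12, x))    # [12, 36]
```

In practice a numerical library would usually handle this, but the list version makes the element-by-element definition explicit.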

The dot product of two vectors gives a scalar output. For two-element vectors, a · b = a(1) * b(1) + a(2) * b(2).
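
A short sketch of the dot product for vectors of any length, written in plain Python. The function name dot and the example vectors are assumptions made for illustration.

```python
def dot(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return sum(ai * bi for ai, bi in zip(a, b))


a = [1, 3]
b = [4, 2]  # hypothetical vectors

print(dot(a, b))  # 1*4 + 3*2 = 10
```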

Matrices and Matrix Operations

A matrix is a data structure a bit like a table: a rectangular grid of values arranged in rows and columns. A matrix is denoted by a bold upper-case letter.

H = [10, 16, 12
      2,  4,  7]

Using this H matrix, the elements are positioned as below, where H(i, j) is the value in row i and column j:

H = [H(1,1), H(1,2), H(1,3)
     H(2,1), H(2,2), H(2,3)]
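
One way to sketch this positioning in Python is as a list of rows. Note that Python counts from 0 while the H(i, j) notation counts from 1, so the hypothetical helper entry below subtracts 1 from each index.

```python
# The example matrix, stored as a list of rows.
H = [
    [10, 16, 12],
    [2, 4, 7],
]


def entry(M, i, j):
    """Return M(i, j) using the 1-based (row, column) convention."""
    return M[i - 1][j - 1]


print(entry(H, 1, 2))     # 16: row 1, column 2
print(entry(H, 2, 3))     # 7: row 2, column 3
print(len(H), len(H[0]))  # 2 3: two rows, three columns
```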

We can multiply this matrix by a vector, but only if the vector has as many elements as the matrix has columns. The matrix above has 2 rows and 3 columns (in the H(i, j) notation, the row index comes first and the column index second), hence we can multiply it with the vector j = [2, 3, 4].

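Hj = [H(1,1)*j(1) + H(1,2)*j(2) + H(1,3)*j(3),
      H(2,1)*j(1) + H(2,2)*j(2) + H(2,3)*j(3)]

   = [(10*2) + (16*3) + (12*4), (2*2) + (4*3) + (7*4)] = [116, 44]

The output is a 2D vector, because the matrix has 2 rows: each row of H is combined with j to give one element of the result.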
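
The matrix-vector product can be sketched in plain Python as one dot product per row, so the result has one element for each row of the matrix. The function name matvec is an assumption for illustration; H and j are the example values from the text.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    if any(len(row) != len(v) for row in M):
        raise ValueError("each row of M must have as many elements as v")
    # One dot product per row of M.
    return [sum(m * vk for m, vk in zip(row, v)) for row in M]


H = [
    [10, 16, 12],
    [2, 4, 7],
]
j = [2, 3, 4]

print(matvec(H, j))  # [116, 44]
```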


A function is a rule which associates each element x of a set X with a value y of a set Y. In y = f(x), f is the function name and x is the input variable.

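The max function on set B = {1, 7, 9, 12} is denoted max_{b∈B} f(b), with b ∈ B written as a subscript under max. It returns the highest value of f(b) over all elements b of B. When f(b) = b, this is simply the largest element, which is 12. The argmax function is denoted argmax_{b∈B} f(b) and returns the element b of B at which f(b) is highest, rather than the value f(b) itself. For an indexed collection such as a vector or an array, argmax gives the position of the highest value; for the values listed as 1, 7, 9, 12, that is position 4.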
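
The difference between max and argmax is easier to see with a function other than f(b) = b. The sketch below uses the values of B as a list, and closeness_to_nine is a hypothetical function invented purely for illustration.

```python
B = [1, 7, 9, 12]


def identity(b):
    return b


def closeness_to_nine(b):
    # Hypothetical function: largest (zero) when b equals 9.
    return -(b - 9) ** 2


def argmax(values, f):
    """Return the element of values at which f is largest."""
    return max(values, key=f)


print(max(identity(b) for b in B))           # 12: highest value of f(b) = b
print(argmax(B, identity))                   # 12: element that achieves it
print(B.index(argmax(B, identity)) + 1)      # 4: its 1-based position in the list
print(max(closeness_to_nine(b) for b in B))  # 0: highest value of the second function
print(argmax(B, closeness_to_nine))          # 9: element that achieves it
```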

Features & Feature Vectors

{(xᵢ, yᵢ)}, with i running from 1 to N, is how we denote the collection of N labelled examples. Unlabelled examples are denoted simply as {xᵢ}. Each xᵢ is a feature vector, which is made up of several features, and yᵢ is its label. An individual feature within the vector is denoted x(j), where j is the position of the feature.

If we look at an example of xᵢ, where we have height, weight and gender inputs, we may have: [176, 85, F]. In this case, x(1) is height, x(2) is weight and x(3) is gender.
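
A small sketch of how a collection of labelled examples might be held in Python, as a list of (feature vector, label) pairs. The second example and both label values are hypothetical, since the text does not say what is being predicted.

```python
# Each entry is (feature vector, label); the labels are hypothetical.
examples = [
    ([176, 85, "F"], 1),
    ([182, 90, "M"], 0),
]

N = len(examples)                # number of labelled examples
x_1, y_1 = examples[0]           # first feature vector and its label
height, weight, gender = x_1     # the three features, in order

print(N)                         # 2
print(height, weight, gender)    # 176 85 F
```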
