Monday, 2 July 2018

Linear Regression - Part(2) [Matrix Calculus]

Previously, I posted Part (1) of linear regression, where we dealt with it from the machine learning perspective.  I had promised to present it from other mathematical angles too, and here I am!
In this article, we will look at it through matrix calculus.

The link to the previous article is given below.
http://avidsuraj.blogspot.com/2018/05/linear-algebra-part1-machine-learning.html

I have used the same notation as in the previous article, so readers are recommended to read it first, or at least its notation section.

The cost function or error function is written as:
$$ J = \sum  _{i=1}^{m} {{{e}^{(i)}}^{2}}$$
where ${e}^{(i)}$ is the error for the ${i}^{th}$ training example.

Let $E$ represent the matrix of all the errors.  It is given by:
$$ E = Y-{Y}_{pred} \\
E = Y-(XW+C) \\
E = Y-XW-C $$
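
Before moving on, here is a tiny NumPy sketch of $E$ and the cost $J$ on made-up data, just to make the shapes concrete.  The shapes, names, and numbers here are my own assumptions for illustration, not anything taken from Part (1).

```python
import numpy as np

# Tiny sketch of the error matrix E and the cost J on made-up data.
# Assumed shapes: X is m x n, W is n x 1, Y and C are m x 1.
rng = np.random.default_rng(0)
m, n = 5, 2
X = rng.normal(size=(m, n))
W = rng.normal(size=(n, 1))
C = np.full((m, 1), 0.5)      # intercept 0.5 repeated for every example
Y = rng.normal(size=(m, 1))

E = Y - X @ W - C             # error for each training example
J = np.sum(E ** 2)            # cost: sum of squared errors
print(E.shape, J)
```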
[Figure: differentiation of a vector $Y$ with respect to a vector $X$.]
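
For reference, the standard differentiation identities we will rely on below (and which the figure presumably illustrated) are, for a constant column matrix $a$, a constant square matrix $A$, and a variable column matrix $w$:
$$ \cfrac{\partial\, ({a}^{T}w)}{\partial w} = \cfrac{\partial\, ({w}^{T}a)}{\partial w} = a, \qquad \cfrac{\partial\, ({w}^{T}Aw)}{\partial w} = (A + {A}^{T})w $$
When $A$ is symmetric, as ${X}^{T}X$ is, the last one reduces to $2Aw$.  These are exactly the rules applied term by term to $J$ below.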
Then, the cost function can be written in matrix form as:
$$ J = {E}^{T}E \\
J = {(Y-XW-C)}^{T}(Y-XW-C) $$
Simplifying,
$$ J = ({Y}^{T} - {W}^{T}{X}^{T}-{C}^{T})(Y-XW-C) $$
Multiplying,
$$ J = {Y}^{T}Y - {W}^{T}{X}^{T}Y-{C}^{T}Y - {Y}^{T}XW + {W}^{T}{X}^{T}XW + {C}^{T}XW - {Y}^{T}C + {W}^{T}{X}^{T}C + {C}^{T}C $$
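
If you want to convince yourself of this expansion without redoing the algebra, a quick numerical check like the following sketch works.  The data is random and the shapes and names are my own assumptions.

```python
import numpy as np

# Check that the nine-term expansion above equals (Y - XW - C)^T (Y - XW - C).
rng = np.random.default_rng(1)
m, n = 20, 4
X = rng.normal(size=(m, n))
Y = rng.normal(size=(m, 1))
W = rng.normal(size=(n, 1))
C = rng.normal(size=(m, 1))

lhs = (Y - X @ W - C).T @ (Y - X @ W - C)
rhs = (Y.T @ Y - W.T @ X.T @ Y - C.T @ Y - Y.T @ X @ W + W.T @ X.T @ X @ W
       + C.T @ X @ W - Y.T @ C + W.T @ X.T @ C + C.T @ C)
print(np.allclose(lhs, rhs))  # True
```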
Note that the intercept is the same constant $c$ for every training example, so $C = c\,\mathbf{1}$, where $\mathbf{1}$ is the $m \times 1$ matrix of ones; like $Y$ and $XW$, $C$ has shape $m \times 1$.
The cost function or error function is minimum when both of its partial derivatives, one with respect to $W$ and one with respect to the intercept $c$, equal zero.
So, let's find the partial derivatives now.
$$ \cfrac {\partial J} {\partial W} = -{X}^{T}Y - {X}^{T}Y + 2{X}^{T}XW + {X}^{T}C + {X}^{T}C \\
\cfrac {\partial J} {\partial W} = -2{X}^{T}Y + 2{X}^{T}XW + 2{X}^{T}C \quad ... \quad (1)$$
Similarly, writing $C = c\,\mathbf{1}$ and differentiating with respect to the scalar $c$ (note that ${Y}^{T}\mathbf{1} = {\mathbf{1}}^{T}Y$ and ${W}^{T}{X}^{T}\mathbf{1} = {\mathbf{1}}^{T}XW$, each being a scalar, and ${\mathbf{1}}^{T}\mathbf{1} = m$),
$$ \cfrac {\partial J} {\partial c} = - {\mathbf{1}}^{T}Y + {\mathbf{1}}^{T}XW - {Y}^{T}\mathbf{1} + {W}^{T}{X}^{T}\mathbf{1} + 2mc \\
\cfrac {\partial J} {\partial c} = - 2\,{\mathbf{1}}^{T}Y + 2\,{\mathbf{1}}^{T}XW + 2mc \quad ... \quad (2) $$
Equating $(1)$ and $(2)$ to $0$ for the minimum of the cost function,
$$ -2{X}^{T}Y + 2{X}^{T}XW + 2{X}^{T}C = 0 \quad ... \quad (3) \\
- 2\,{\mathbf{1}}^{T}Y + 2\,{\mathbf{1}}^{T}XW + 2mc = 0 \quad ... \quad (4) $$
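
Before solving $(3)$ and $(4)$, here is a quick numerical sanity check of the gradients $(1)$ and $(2)$ against finite differences.  This is a minimal sketch; the shapes, names, and numbers are my own assumptions.

```python
import numpy as np

# Numerical sanity check of gradients (1) and (2) on random data.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
Y = rng.normal(size=(m, 1))
W = rng.normal(size=(n, 1))
c = 0.7
ones = np.ones((m, 1))

def J(W, c):
    E = Y - X @ W - c * ones
    return (E.T @ E).item()

# Analytic gradients, equations (1) and (2), with C = c * ones.
grad_W = -2 * X.T @ Y + 2 * X.T @ X @ W + 2 * c * X.T @ ones
grad_c = (-2 * ones.T @ Y + 2 * ones.T @ X @ W).item() + 2 * m * c

# Central finite differences for comparison.
eps = 1e-6
fd_W = np.zeros_like(W)
for i in range(n):
    Wp, Wm = W.copy(), W.copy()
    Wp[i, 0] += eps
    Wm[i, 0] -= eps
    fd_W[i, 0] = (J(Wp, c) - J(Wm, c)) / (2 * eps)
fd_c = (J(W, c + eps) - J(W, c - eps)) / (2 * eps)

print(np.allclose(grad_W, fd_W, atol=1e-4), abs(grad_c - fd_c) < 1e-4)
```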
From $(4)$, $ c = \cfrac{1}{m}\,{\mathbf{1}}^{T}(Y - XW) $, i.e., the intercept is the mean of the residuals $Y - XW$.
So, substituting $C = c\,\mathbf{1}$ into $(3)$,
$$ -2{X}^{T}Y + 2{X}^{T}XW + \cfrac{2}{m}{X}^{T}\mathbf{1}\,{\mathbf{1}}^{T}(Y - XW) = 0 \\
 -2{X}^{T}Y + 2{X}^{T}XW + \cfrac{2}{m}{X}^{T}\mathbf{1}{\mathbf{1}}^{T}Y - \cfrac{2}{m}{X}^{T}\mathbf{1}{\mathbf{1}}^{T}XW = 0 $$
Dividing by $2$ and regrouping,
$$ {X}^{T}\left(I - \cfrac{1}{m}\mathbf{1}{\mathbf{1}}^{T}\right)XW = {X}^{T}\left(I - \cfrac{1}{m}\mathbf{1}{\mathbf{1}}^{T}\right)Y $$
So, writing $H = I - \cfrac{1}{m}\mathbf{1}{\mathbf{1}}^{T}$ for the centering matrix, the matrix $W$ becomes:
$$ W = {({X}^{T}H{X})}^{-1}{X}^{T}H{Y} \quad ... \quad (5)$$
(If a column of ones is absorbed into $X$, so that the intercept is just one more entry of $W$, this reduces to the familiar normal equation $W = {({X}^{T}{X})}^{-1}{X}^{T}{Y}$.)
And, the value of $c$ can be found by substituting the value of $W$ from $(5)$ into $(4)$.
$$ c = \cfrac{1}{m}\,{\mathbf{1}}^{T}\left(Y - X{({X}^{T}H{X})}^{-1}{X}^{T}H{Y}\right) \quad ... \quad (6) $$
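
As a sanity check on $(5)$ and $(6)$, here is a minimal NumPy sketch.  The data is randomly generated and all names are my own; it also cross-checks against `numpy.linalg.lstsq` with a column of ones appended to $X$.

```python
import numpy as np

# Minimal sketch of the closed-form solution (5) and (6) on made-up data.
rng = np.random.default_rng(42)
m, n = 100, 3
X = rng.normal(size=(m, n))
true_W = np.array([[2.0], [-1.0], [0.5]])
Y = X @ true_W + 3.0 + 0.1 * rng.normal(size=(m, 1))   # true intercept 3.0 plus noise

ones = np.ones((m, 1))
H = np.eye(m) - ones @ ones.T / m                      # centering matrix H = I - (1/m) 1 1^T

W = np.linalg.solve(X.T @ H @ X, X.T @ H @ Y)          # equation (5)
c = (ones.T @ (Y - X @ W)).item() / m                  # equation (6): mean residual

# Cross-check: absorb the intercept as a column of ones and use ordinary least squares.
coef, *_ = np.linalg.lstsq(np.hstack([ones, X]), Y, rcond=None)
print(np.allclose(coef[1:], W), np.isclose(coef[0, 0], c))
print(W.ravel(), c)   # should be close to [2, -1, 0.5] and 3.0
```

Note that the sketch uses `np.linalg.solve` rather than forming the inverse explicitly, which is the usual numerically safer choice.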
As given in $(5)$ and $(6)$, the weight matrix $W$ and the intercept $c$ are found.  With those we can find the predictions, ${Y}_{pred}$, for new inputs ${X}_{pred}$ as:
$${Y}_{pred} = {X}_{pred}W + c $$
where the scalar $c$ is added to every row.
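
Continuing the sketch above, prediction is then just a one-liner (again, `X_pred`, `W`, and `c` are assumed to be a NumPy array of new inputs and the fitted parameters):

```python
import numpy as np

def predict(X_pred: np.ndarray, W: np.ndarray, c: float) -> np.ndarray:
    # Y_pred = X_pred W + c, with the scalar intercept c added to every row.
    return X_pred @ W + c
```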

In this way, the linear regression model is built.  That is it for today; next time, hopefully, I will present the same linear regression from yet another mathematical angle.
Thank you.

Please provide feedback if you have any.  I would love to hear from you.
