Linear regression iterative

Implementing linear regression as an iterative algorithm does not make too much sense, as there is a classic algorithm that calculates the result in one pass (see Linear regression). Nevertheless, it is a good way to understand the gradient descent algorithm that is used in many machine learning algorithms.

The linear regression approximates an arbitrary number of supporting points (x, y) by a straight line with the formulation


y = rise * x + offset

where the offset is the value at which the line crosses the zero line of x (the y-axis) and the rise is the slope of the line. The placement of the line is optimised so that the sum of the squared deviations from the supporting points becomes as small as possible.

For an iterative approach the idea is to search for the offset and rise for which the mean of the squared deviations becomes smallest. That means


(1/N) * Σ (y_i - (rise * x_i + offset))²  →  min

To start, the offset and the rise are both set to random starting values. With these starting values the deviation is (most probably) large. Now the formulation for the deviation is differentiated with respect to the rise (some articles differentiate with respect to the offset as well; I regard this as wrong, but more on this later).

The derivative of the squared deviation with respect to the rise is


d/d(rise) (y - (rise * x + offset))²


which can as well be written as


d/d(rise) (y - rise * x - offset)²

Where the searched minimum is, this derivative becomes 0, and so the algorithm should converge towards this minimum.

The derivative itself is:



-2 * x * (y - (rise * x + offset))

With this derivative a mean slope rise' is computed using all supporting points:




rise' = (1/N) * Σ -2 * x_i * (y_i - (rise * x_i + offset))

With this mean slope rise' and a small fixed factor, the so-called learning rate, the actual rise is adjusted:



rise = rise - learning_rate * rise'

In a C# function that’s:


double update_rise(double[] x, double[] y, double rise, double offs, double learning_rate)
{
     double rise_deriv = 0;

     // samples is the number of supporting points (a class field)
     for (int i = 0; i < samples; i++)
     {
         // Calculate partial derivative
         rise_deriv += -2 * x[i] * (y[i] - (rise * x[i] + offs));
     }
     // step downhill by the mean derivative scaled by the learning rate
     rise -= (rise_deriv / samples) * learning_rate;
     return rise;
}



The adjustment value is subtracted from the actual rise because we are searching for the minimum and want to move down the slope.

With this new rise the mean squared deviation is computed. This deviation is often called cost, and the function that computes it is the cost function


cost = (1/N) * Σ (y_i - (rise * x_i + offset))²

Or in a C# function:


double cost_function(double[] x, double[] y, double weight, double bias)
{
     double cost = 0.0;
     for (int i = 0; i < samples; i++)
     {
         // squared deviation of the supporting point from the line
         cost += (y[i] - (weight * x[i] + bias)) * (y[i] - (weight * x[i] + bias));
     }
     return cost / samples;
}




If these two operations are called iteratively, the deviation, or cost, will slowly decrease until it reaches a minimum. And as the derivative is 0 at the minimum, the change will be 0 there too, at least ideally. In reality there can be an overshoot. The minimum that is found will most probably not be the absolute minimum, as the offset might still not fit.
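To make this loop concrete, here is a minimal sketch of such an iteration (my own helper, not taken from the demo projects), assuming the same class as above and calling the update_rise and cost_function functions repeatedly while the offset is kept fixed:


double run_basic_descent(double[] x, double[] y, double rise, double offs, double learning_rate, int maxIters)
{
     double oldCost = double.MaxValue;

     for (int i = 0; i < maxIters; i++)
     {
         // one gradient step for the rise, with the offset kept fixed
         rise = update_rise(x, y, rise, offs, learning_rate);

         // the cost is only monitored to detect when the minimum is reached
         double cost = cost_function(x, y, rise, offs);
         if (Math.Abs(oldCost - cost) < 1e-5)
             break;
         oldCost = cost;
     }
     return rise;
}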

In the literature I found articles that say you have to differentiate the deviation function with respect to the offset as well and calculate


offset' = (1/N) * Σ -2 * (y_i - (rise * x_i + offset))

and


offset = offset - learning_rate * offset'

For this they use the same learning rate as for the rise, so the rise and offset are computed at once.
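For illustration, such a combined update could look like the following sketch. This is only my own sketch of the described formulation, not code from the demo projects; update_offset and combined_descent are my names, and samples, rise and offset are the same class fields as used in the functions above.


double update_offset(double[] x, double[] y, double rise, double offs, double learning_rate)
{
     double offs_deriv = 0;

     for (int i = 0; i < samples; i++)
     {
         // partial derivative of the squared deviation with respect to the offset
         offs_deriv += -2 * (y[i] - (rise * x[i] + offs));
     }
     offs -= (offs_deriv / samples) * learning_rate;
     return offs;
}

void combined_descent(double[] x, double[] y, double learning_rate, int iterations)
{
     for (int i = 0; i < iterations; i++)
     {
         // both partial derivatives use the old rise and offset values
         double newRise = update_rise(x, y, rise, offset, learning_rate);
         double newOffs = update_offset(x, y, rise, offset, learning_rate);
         rise = newRise;
         offset = newOffs;
     }
}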

I implemented the algorithm according to this formulation and tried it. The algorithm converged and returned a result, but at the first trial this result was absolutely wrong: I had stopped the iteration after 100 loops, which was much too few. It took 200 000 iterations to get the correct result. That's quite a lot; it makes the algorithm quite slow and computationally expensive.

I tried a slightly different approach and treated the offset separately in my implementation. With a starting offset of 0 I compute the rise with the smallest squared deviation. As the offset is most probably not correct yet, there will be a large squared deviation. The mean of the deviations (not squared) is large as well, and this mean deviation is closely related to the offset: the closer the offset gets, the smaller the mean deviation becomes. Therefore my idea was simply to correct the offset by the mean deviation and repeat the search for the best rise. If this sequence is repeated in a loop, the mean deviation decreases, and the offset gets closer and closer and finally converges to the correct value.
With this approach it takes 43 iterations to get the correct offset, and for each of them about 9 iterations to compute the rise. That makes roughly 400 iterations to get the same result as with the 200 000 iterations further above.

Gradient descent converges quite slowly and tends to oscillate. Computing the offset separately improves this behaviour quite a bit :-)


In a C# function that's:


double find_min(double[] x, double[] y, double rise, double offs, double learning_rate, int maxIters)
{
     int count = 0;
     double oldCost = -1;
     // cost is a class field so that linear_regression() can read it after this call
     double divCost = oldCost - cost;

     while (Math.Abs(divCost) > 1e-5 && count < maxIters)
     {
         rise = update_rise(x, y, rise, offs, learning_rate);
         cost = cost_function(x, y, rise, offs);
         divCost = oldCost - cost;
         oldCost = cost;
         count++;
         // the cost increased: we overshot the minimum, so reverse and halve the step
         if (divCost < -0.001)
         {
             learning_rate = -learning_rate / 2;
         }
     }
     return rise;
}




And the complete linear regression:



private void linear_regression()
{
     int count = 0;
     double oldCost = 1e60;
     double divCost = 10;
     double offset_step = 0.1;

     offset = 0;
     rise = startRise;
     while ((Math.Abs(divCost) > 0.00002) && (count < 1000))
     {
         // find the best rise for the current offset
         rise = find_min(xp, yp, rise, offset, learning_rate, iterations);
         divCost = oldCost - cost;
         // the cost increased: reverse and halve the offset step
         if (divCost < -0.001)
         {
              offset_step = -offset_step / 2.0;
         }
         oldCost = cost;
         count++;
         // correct the offset by the mean deviation of the current line
         offset = offset + mean_function(xp, yp, rise, offset);
     }
}





with



double mean_function(double[] x, double[] y, double weight, double offset)
{
     double mean = 0.0;
     for (int i = 0; i < samples; i++)
     {
         // deviation (not squared) of the supporting point from the line
         mean += y[i] - (weight * x[i] + offset);
     }
     return mean / samples;
}





With this implementation I get the following for my sample values:


(Plot: linear regression of the sample values)


The offset = 8.475 and rise = 0.2261. That makes a mean deviation of 22.6.

The same data in the classic linear regression gives an offset = 8.4676 and rise = 0.2263. That’s quite close :-)
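For reference, the classic closed-form solution can be computed directly from sums over the supporting points. The helper below is only a sketch of mine for comparison and not part of the demo projects:


void classic_regression(double[] x, double[] y, out double rise, out double offset)
{
     // classic least squares fit of a straight line, no iteration needed:
     // rise   = (N * sum(x*y) - sum(x) * sum(y)) / (N * sum(x*x) - sum(x)^2)
     // offset = (sum(y) - rise * sum(x)) / N
     double sx = 0, sy = 0, sxy = 0, sxx = 0;
     int n = x.Length;

     for (int i = 0; i < n; i++)
     {
         sx += x[i];
         sy += y[i];
         sxy += x[i] * y[i];
         sxx += x[i] * x[i];
     }
     rise = (n * sxy - sx * sy) / (n * sxx - sx * sx);
     offset = (sy - rise * sx) / n;
}

Called with the same sample arrays, this should reproduce the classic result quoted above without any iteration or tuning.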





Of course this implementation is now tuned for this sample data. For other data the learning rate, the offset step and the starting values for rise and offset might have to be changed. In general, the more data samples there are, the smaller the learning rate should be chosen.

The big point to remember is that an offset must be treated separately from the rise function. This is valid for rise functions of higher complexity as well and will be used in Gradient Descent.


C# Demo Projects Linear regression iterative
  • LinearRegression.zip: computing the offset separately
  • LinearRegression_offs.zip: computing the offset and rise at once (with 200 000 iterations)