Approximation with radial basis functions
Basically

The radial basis functions network can be used to approximate a multidimensional function given a set of input vectors X and their output values y. It looks a bit like a neural net, but it isn't one, and it does not work as well as a neural net does.

[Figure: a simple radial basis functions network]



Usually the radial basis functions network has 3 layers: an input layer, a hidden layer and an output layer. The hidden layer normally has more nodes than the input layer, and quite often the output layer has only one node, which builds the function value:


$$f(X) = \sum_{i=1}^{M} w_i H_i + O$$



with


$$H_i = \varphi(X, C_i)$$



where $C_i$ is called the prototype or center point of the $i$-th node, and


$$\Phi_i = \lVert X - C_i \rVert^2 = \sum_{k} \left(x_k - c_{i,k}\right)^2$$



and

$$\varphi(X, C_i) = e^{-\Phi_i / \sigma}$$



is the radial basis function.


There are other radial basis functions used in other applications, but here I use just this one. It is called the "Gaussian radial basis function". In some articles the factor σ is said to be the standard deviation of the vectors X around the centers C. I say it's not really like that :-)
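Written out for one input vector, this Gaussian radial basis function is just a few lines of C#. A minimal sketch (the function name and signature are mine, not from the sample project):

public static double GaussianRbf(double[] x, double[] c, double sigma)
{
     // Φ: squared euclidean distance between x and the center point c
     double phi = 0.0;
     for (int k = 0; k < x.Length; k++)
          phi += (x[k] - c[k]) * (x[k] - c[k]);
     // note the exponent -Φ/σ, not the -Φ/(2σ²) used in some other texts
     return Math.Exp(-phi / sigma);
}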

To find an approximating function to the input data X there are several approaches. I think the simplest and most efficient one is to separate the data into clusters and use the center points of these clusters as C points (C always has the same number of dimensions as X). With this the hidden layer gets as many nodes as clusters have been built.
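The code snippets below index a Collection<Cluster> and read a centerPoint array from each element. A minimal shape of this type could look like the following sketch (the real class comes with the K-means clustering project):

public class Cluster
{
     public double[] centerPoint;    // C_i: the center point of one cluster
}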

With these clusters the net must be trained to find a setting of the weights W, the offset O and σ. This is done with the gradient descent method (see Gradient descend for a detailed description) and the cost function

$$J = \frac{1}{N} \sum_{j=1}^{N} \left(f(X_j) - y_j\right)^2$$



with N as the number of data samples and $(X_j, y_j)$ one data sample.

For the gradient descent the derivative

$$\frac{\partial J}{\partial w_i}$$



must be built.

With the chain rule (see Differential calculus) the derivative with respect to $w_i$ is not too complicated:

$$\frac{\partial J}{\partial w_i} = \frac{2}{N} \sum_{j=1}^{N} \left(f(X_j) - y_j\right) H_i(X_j)$$


The constant factor 2 can be neglected, as it is absorbed by the learning rate:

$$\frac{\partial J}{\partial w_i} \approx \frac{1}{N} \sum_{j=1}^{N} \left(f(X_j) - y_j\right) H_i(X_j)$$


For the offset it’s even simpler:

$$\frac{\partial J}{\partial O} = \frac{1}{N} \sum_{j=1}^{N} \left(f(X_j) - y_j\right)$$


and finally there is σ. For σ the derivative is a bit more complicated:

$$\frac{\partial J}{\partial \sigma} = \frac{1}{N} \sum_{j=1}^{N} \left(f(X_j) - y_j\right) \sum_{i=1}^{M} w_i \frac{\partial H_i(X_j)}{\partial \sigma}$$


and with

$$H_i = e^{-\Phi_i / \sigma}$$


we get

$$\frac{\partial H_i}{\partial \sigma} = e^{-\Phi_i / \sigma} \cdot \frac{\Phi_i}{\sigma^2} = H_i \frac{\Phi_i}{\sigma^2}$$


Inserted into the above formulation this gives:

$$\frac{\partial J}{\partial \sigma} = \frac{1}{N} \sum_{j=1}^{N} \left(f(X_j) - y_j\right) \sum_{i=1}^{M} w_i H_i(X_j) \frac{\Phi_i(X_j)}{\sigma^2}$$


Quite a formula :-)


For the implementation of this algorithm we first need to cluster the input data (see Clustering data by a K-means algorithm).
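The following methods also use a few class fields that are not declared in the article. A plausible declaration could be (the names match the code below, but the declaration itself is an assumption):

int featuresIn;                  // number of hidden nodes = number of clusters
double[] H;                      // H_i: outputs of the hidden nodes
double[] radialF;                // Φ_i: squared distances to the center points
double[] w;                      // weights of the output node
double[] gradient;               // accumulated gradient for the weights
double sigma, offs;              // σ and the offset O
double deltaSigma, deltaOffs;    // accumulated changes for σ and the offset

With the center points of the computed clusters $H_i$ can be computed like this: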



public void CalculateH(double[] x, Collection<Cluster> cluster)
{
     int j, k;

     for (j = 0; j < H.Length; j++)
     {
          // Φ_j: squared distance of x to the center point of the j-th cluster
          radialF[j] = 0;
          for (k = 0; k < x.Length; k++)
          {
               radialF[j] += (x[k] - cluster[j].centerPoint[k]) * (x[k] - cluster[j].centerPoint[k]);
          }
          // Gaussian radial basis function: H_j = exp(-Φ_j / σ)
          H[j] = Math.Exp(-radialF[j] / sigma);
     }
}


The variable radialF is the Φ from above. I changed the name in the code to avoid confusion.


With $H_i$ the gradient can be computed:


public double CalculateGradient(double[] x, double y)
{
     int i;
     double tempSum = 0;

     // network output for this sample: f(X) = Σ w_i·H_i + offset
     for (i = 0; i < featuresIn; i++)
          tempSum += H[i] * w[i];
     for (i = 0; i < featuresIn; i++)
     {
          // accumulate ∂J/∂w_i: (f(X) - y)·H_i
          gradient[i] += H[i] * (tempSum + offs - y);
          // accumulate ∂J/∂σ: (f(X) - y)·w_i·H_i·Φ_i/σ²
          deltaSigma += H[i] * (tempSum + offs - y) * w[i] * radialF[i] / sigma / sigma;
     }
     // accumulate ∂J/∂O: (f(X) - y)
     deltaOffs += (tempSum + offs - y);
     // return the squared error of this sample: its part of the cost
     return (tempSum + offs - y) * (tempSum + offs - y);
}



This function accumulates the gradient of W and the changes for σ and the offset, and it returns the part of the cost that is generated by the sample (X, y).

In one iteration the functions CalculateH and CalculateGradient must be called for all samples. That gives the changes for W, σ and the offset, which are subtracted in the next step:


public void UpdateValues(int samples, double learning_rate)
{
     int i;

     for (i = 0; i < featuresIn; i++)
     {
          w[i] -= gradient[i] / samples * learning_rate;
     }
     // σ gets a fixed, larger step width of 10 instead of the learning rate
     sigma -= deltaSigma / samples * 10;
     offs -= deltaOffs / samples * learning_rate;
}
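Putting the pieces together, one training iteration calls CalculateH and CalculateGradient for every sample and then applies the averaged changes. A minimal sketch of such a loop (the driver code and the names maxIterations, xData and yData are my assumptions; note that the accumulators must be reset before each pass):

for (int iter = 0; iter < maxIterations; iter++)
{
     // reset the accumulators before each pass over the data
     Array.Clear(gradient, 0, gradient.Length);
     deltaSigma = 0;
     deltaOffs = 0;
     double cost = 0;
     for (int j = 0; j < samples; j++)
     {
          CalculateH(xData[j], clusters);                // fill H and radialF
          cost += CalculateGradient(xData[j], yData[j]); // accumulate gradients
     }
     UpdateValues(samples, learning_rate);               // apply averaged changes
     // cost / samples can be monitored to watch convergence
}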


Approximation using radial basis functions

In the sample project I use a data set I created for this algorithm. It consists of data points computed with the two-dimensional function:

[Image: the two-dimensional sample function]


To get a useful clustering there are 7 center points with some data points around them.

The 7 center points are

[Image: the 7 center points]


They are stored in the file Centerpoint.csv. The distribution looks like this:

[Image: distribution of the data points]


They represent a set of measuring points with some inaccuracy. There are 144 data points with random noise within ±0.01. That makes y-values between -0.16759 and 2.899164, and these y-values are the values to be approximated. The data is read by the following sequence:


public class TData
{
     public Collection<double[]> values = new Collection<double[]>();
     public int dataLength = 0;

     public void readData(string FileName)
     {
          foreach (var str in File.ReadLines(FileName, Encoding.Default))
          {
               int i;
               // one comma-separated line holds one data sample
               string[] strLine = str.Split(',');
               double[] line = new double[strLine.Length];
               for (i = 0; i < strLine.Length; i++)
                    line[i] = Convert.ToDouble(strLine[i]);
               dataLength = line.Length;
               values.Add(line);
          }
     }
}
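A hypothetical usage then looks like this (Centerpoint.csv is named in the article; the file name for the data samples is an assumption):

TData centers = new TData();
centers.readData("Centerpoint.csv");   // the 7 center points
TData samples = new TData();
samples.readData("Samples.csv");       // hypothetical file name for the data points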



With this training sequence carried out 1 000 000 times, the algorithm yields:

[Image: the resulting weights W]


Offset = 18.0268

σ = 122.0778

and

cost = 0.00034957

This cost value looks quite good. But if we compare the approximated values (the blue curve) to the reference values (the red curve), obviously not all values are approximated that well.


[Plot: approximated values (blue) vs. reference values (red)]


Beginning at data point 109, the rest of the data points are approximated rather badly. There is quite some difference between the two curves. That's not too useful.


Using the gradient descent algorithm

Using the gradient descent algorithm (see Gradient descend) to approximate the same data gives a much better result.

The algorithm is implemented with the target function


[Image: the target function]


With this target function, 2110 iterations and learning_rate = 0.001, the gradient descent algorithm finds the approximation


[Image: the resulting approximation]


with a cost = 1.671E-05. That looks much better :-) The graphic shows:


[Plot: approximation (red) vs. original shape (blue)]


It approximates the original shape (blue and invisible below the red shape) without the noise very nicely, as the very small cost already suggests :-)

Another interesting approach could be the Method of the least squares multi-dimensional.



C# Demo Projects for the radial basis functions network
  • RadialBasisFunc.zip
  • ClusteringKmeansRBF.zip
  • GradientDescendRBF.zip