This is my first post on artificial intelligence (AI). I promise to cover as much as I can, from understanding a simple Neural Network (NN) to deep learning, with a little theory and lots of practical implementation. I will also include simple projects where possible. Let's begin building then.

I like Python a lot, so most of the work will be done in Python. Later on I am hoping to port it to Java and Scala.
To learn NNs we will not be using any NN libraries, only a mathematical library, i.e. numpy.

Learn the basics of Numpy HERE.

To begin building an NN, which is supposed to mimic how our brain works, we have to understand a little bit about our own brain.
An average-sized brain contains around 100 billion neurons connected by synapses. Neurons are the basic units of the brain and play a major role in everything the brain does. Blah blah blah… it's better you go through this well-written article (A Basic Introduction To Neural Networks).

In this tutorial we will be building an artificial unit of this very neuron. I assume you know matrices, which we will be using as the mathematical foundation for building the NN with numpy.

Our simple ANN will have three inputs and one output (Inputs: 3, Output: 1). The neuron we build should solve a basic classification problem, and we will train it to perform that classification.

Our neuron will have a very small training dataset (a deep learning model needs a very large dataset for good performance), which will be enough for this problem.

Example #   Input A   Input B   Input C   Output Y
1           0         0         1         0
2           1         1         1         1
3           1         0         1         1
4           0         1         1         0
5           1         0         0         ??? (1 expected)

So what will be the output for the last row (Example #5)?

Our NN will try to classify what the output value should be based on the dataset provided.

In an ANN each input is given a weight, which determines how much impact that input has on the overall output of the neuron.

Steps we will follow:

  1. Set the input weights to random numbers, either positive or negative. The inputs and the weights are passed to a function that calculates the neuron's output. Computing the output this way, without using the error, is called the feedforward pass.
  2. The error is calculated from the actual output given by the neuron and the expected output from the dataset. This error is used to adjust the weight values of the input edges for later iterations. This is learning; pushing the error back to adjust the weights is called backpropagation.
  3. This process is repeated as many times as desired until a favorably small error rate is obtained.
  4. Now feed the unknown input to the neuron; it should follow the learned pattern to classify the output for that input.

Let's follow the steps one by one with code.

Step 1:

from numpy import exp, array, random, dot

random.seed(1)
training_dataset_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_dataset_output = array([[0, 1, 1, 0]]).T
weights = 2 * random.random((3, 1)) - 1  # generate a 3 x 1 matrix of random weights in [-1, 1)

Formula for calculating the output of the neuron. The first thing we need is the weighted sum of the inputs and the weights of each input.
Inputs = I_i
Weights = W_i

Weighted Sum = W_1 * I_1 + W_2 * I_2 + W_3 * I_3
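Just to make this concrete, the same weighted sum can be computed for all training examples at once with numpy's dot product (a small sketch using the arrays from Step 1; the actual values depend on the random seed):

from numpy import array, random, dot

random.seed(1)
training_dataset_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
weights = 2 * random.random((3, 1)) - 1

# each row of the result is W1*I1 + W2*I2 + W3*I3 for one training example
weighted_sums = dot(training_dataset_inputs, weights)
print(weighted_sums)  # 4 x 1 matrix of weighted sums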

What we do is pass the result to a function that normalizes the weighted sum to a value between 0 and 1; such functions are called Activation Functions. There are various types, like the Sigmoid function, ReLU function, Tanh function, Maxout function, etc. In this post we will be using the Sigmoid function, which is the simplest of all.
Sigmoid Function

f(x) = \frac{1}{1+e^{-x}}

ReLU function

f(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \geq 0 \end{cases}

Tanh function

f(x) = \tanh(x) = \frac{2}{1+e^{-2x}} - 1

def sigmoid(x):
    # squashes the weighted sum to a value between 0 and 1
    return 1 / (1 + exp(-x))

def derivative_sigmoid(x):
    # gradient of the sigmoid curve, where x is the sigmoid output
    return x * (1 - x)
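For comparison, the ReLU and Tanh functions mentioned above could be written in the same style (just a sketch; we stick with the sigmoid in this post):

from numpy import exp, maximum

def relu(x):
    # 0 for x < 0, x for x >= 0
    return maximum(0, x)

def tanh_activation(x):
    # tanh(x) = 2 / (1 + e^(-2x)) - 1
    return 2 / (1 + exp(-2 * x)) - 1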

Step 2:

Error calculation and adjustment of the input weights for later iterations.

We adjust the weights of the inputs by a value given by the “Error Weighted Derivative”. This equation makes the adjustment to the weights proportional to the size of the error.

Adjust Weight Rate  = error * input * SigmoidCurveGradient(output)

The gradient  of the sigmoid curve is

SigmoidCurveGradient(output)=output*(1-output)
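For reference, this comes straight from differentiating the sigmoid formula above:

f'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = f(x) (1 - f(x))

so the gradient can be computed from the output alone, which is exactly what derivative_sigmoid does with the sigmoid output.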

This is the simplest algorithm for a neuron to learn.

number_iteration = 10000  # how many training iterations to run

for iteration in range(number_iteration):
    # feedforward: compute the neuron's output for every training example
    output = sigmoid(dot(training_dataset_inputs, weights))
    # difference between the expected output and the neuron's output
    error = training_dataset_output - output

    # Error Weighted Derivative
    adjust_rate = error * derivative_sigmoid(output)

    # adjust the weights in proportion to the error
    adjustment_value = dot(training_dataset_inputs.T, adjust_rate)
    weights += adjustment_value
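If you want to watch the learning happen, an optional variant of the same loop prints the mean absolute error every 1000 iterations (the exact numbers depend on the random seed, but they should shrink towards zero):

for iteration in range(number_iteration):
    output = sigmoid(dot(training_dataset_inputs, weights))
    error = training_dataset_output - output
    if iteration % 1000 == 0:
        print("iteration", iteration, "mean absolute error:", abs(error).mean())
    weights += dot(training_dataset_inputs.T, error * derivative_sigmoid(output))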

Predict the new data with the trained model:

sigmoid(dot([1,0,0], weights))
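The result is a value between 0 and 1, so to read it as a class label you could simply threshold it at 0.5 (a common convention, not something the training requires):

prediction = sigmoid(dot(array([1, 0, 0]), weights))
predicted_class = 1 if prediction[0] > 0.5 else 0
print(prediction, predicted_class)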

Full source code

from numpy import exp, array, random, dot

random.seed(1)
weights = 2 * random.random((3, 1)) - 1  # 3 x 1 matrix of random weights in [-1, 1)

def sigmoid(x):
    return 1 / (1 + exp(-x))

def derivative_sigmoid(x):
    return x * (1 - x)

number_iteration = 10000
training_dataset_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_dataset_output = array([[0, 1, 1, 0]]).T

for iteration in range(number_iteration):
    output = sigmoid(dot(training_dataset_inputs, weights))
    error = training_dataset_output - output

    adjust_rate = error * derivative_sigmoid(output)

    adjustment_value = dot(training_dataset_inputs.T, adjust_rate)
    weights += adjustment_value

# predict the output
print(sigmoid(dot([1, 0, 0], weights)))  # expected output = ~1