9.6 Neural Networks
Neural networks (NN) are computational models inspired by the way biological neural systems, such as the brain, process information. In this section, we will explore the fundamental concepts behind artificial neural networks (ANNs), their types, and key algorithms used in their learning processes.
Biological Neural Networks
Biological neural networks refer to the complex networks of neurons in the human brain and nervous system that process information. A biological neuron consists of three main components:
Dendrites: Receive signals from other neurons.
Axon: Transmits signals to other neurons.
Synapse: The connection between neurons that enables signal transmission.
Neurons in the brain are highly interconnected and process information through electrical and chemical signals.
Artificial Neural Networks (ANNs)
Artificial neural networks are computational models that simulate the behavior of biological neural networks. They consist of artificial neurons (nodes) connected by edges (synapses) that transmit signals in the form of numerical values. ANNs are primarily used for pattern recognition, classification, and regression tasks.
The McCulloch-Pitts neuron, introduced in 1943, is a mathematical model of a simple artificial neuron. It is primarily designed to simulate the working of biological neurons in performing basic logical operations. This model laid the foundation for modern artificial neural networks.
Mathematically, the McCulloch-Pitts neuron can be represented as:
y = 1 if w₁x₁ + w₂x₂ + ... + wₙxₙ ≥ θ, otherwise y = 0
where the inputs, weights, and threshold are defined below.
Components of the McCulloch-Pitts Neuron
Inputs
The neuron accepts a set of binary inputs x₁, x₂, ..., xₙ, where each input is either 0 or 1.
These inputs represent the external signals received by the neuron.
Weights
Each input has an associated weight w₁, w₂, ..., wₙ.
Weights determine the importance or influence of each input on the output. A higher weight signifies a stronger influence.
Weighted Sum
The neuron calculates the weighted sum of the inputs:
S = w₁x₁ + w₂x₂ + ... + wₙxₙ
This sum determines the combined effect of all inputs based on their weights.
Threshold (θ)
A predefined threshold value, θ, is set.
The weighted sum, S, is compared against θ to decide the neuron's output.
Activation Function
A step function is used as the activation function:
y = 1 if S ≥ θ, otherwise y = 0
The output, y, is binary (0 or 1).
Mathematical Representation
y = 1 if w₁x₁ + w₂x₂ + ... + wₙxₙ ≥ θ, otherwise y = 0
Where:
xᵢ: Input signals (binary values, 0 or 1).
wᵢ: Weights of the inputs.
θ: Threshold value.
y: Output of the neuron (binary, 0 or 1).
Applications
AND Gate
Outputs 1 only when all inputs are 1.
Example:
Inputs: x₁ = 1, x₂ = 1
Weights: w₁ = 1, w₂ = 1
Threshold: θ = 2
Calculation:
S = (1 × 1) + (1 × 1) = 2. Since S ≥ θ (2 ≥ 2), the output is y = 1.
OR Gate
Outputs 1 if at least one input is 1.
Example:
Inputs: x₁ = 1, x₂ = 0
Weights: w₁ = 1, w₂ = 1
Threshold: θ = 1
Calculation:
S = (1 × 1) + (1 × 0) = 1. Since S ≥ θ (1 ≥ 1), the output is y = 1.
NOT Gate
Outputs the opposite of the input.
Example:
Input: x₁ = 1
Weight: w₁ = -1
Threshold: θ = -0.5
Calculation:
S = (−1 × 1) = −1. Since S < θ (−1 < −0.5), the output is y = 0.
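The three gate examples above can be reproduced with a short sketch (the function name mcculloch_pitts is illustrative, not from any library):

```python
# A minimal sketch of a McCulloch-Pitts neuron (illustrative helper name).
def mcculloch_pitts(inputs, weights, theta):
    """Fire (return 1) if the weighted sum of inputs reaches the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= theta else 0

# AND gate: fires only when both inputs are 1 (weights 1, 1; threshold 2).
assert mcculloch_pitts([1, 1], [1, 1], theta=2) == 1
assert mcculloch_pitts([1, 0], [1, 1], theta=2) == 0

# OR gate: fires when at least one input is 1 (weights 1, 1; threshold 1).
assert mcculloch_pitts([1, 0], [1, 1], theta=1) == 1

# NOT gate: inverts a single input (weight -1; threshold -0.5).
assert mcculloch_pitts([1], [-1], theta=-0.5) == 0
assert mcculloch_pitts([0], [-1], theta=-0.5) == 1
```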
Limitations
Linear Separability: McCulloch-Pitts neurons cannot solve problems that are not linearly separable (e.g., the XOR problem).
Static Model: The neuron lacks learning capabilities; weights and thresholds are fixed and not adjusted through training.
Significance
The McCulloch-Pitts neuron was an early milestone in AI and computational neuroscience. It demonstrated how a network of simple neurons could model complex logical operations, paving the way for modern neural networks and deep learning.
The basic unit of an artificial neural network is the artificial neuron, which is modeled mathematically by the following steps:
Inputs: The network receives a set of inputs (x₁, x₂, ..., xₙ).
Weights: Each input is multiplied by a corresponding weight (w₁, w₂, ..., wₙ).
Summation: The weighted inputs are summed.
Bias: An additional term called bias is added to the summation to adjust the output independently of the inputs.
Activation Function: The sum is passed through an activation function to produce the output.
Mathematically:
y = f(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)
Where:
xᵢ are the inputs, wᵢ the weights, b the bias, f the activation function, and y the output.
Activation functions are mathematical functions used to determine the output of a neuron. They introduce non-linearity into the neural network, allowing it to learn complex patterns. Common activation functions include:
Step Function: Outputs 1 if the input is greater than a threshold, otherwise 0.
Sigmoid Function: Outputs values between 0 and 1, often used in binary classification.
Hyperbolic Tangent (tanh): Outputs values between -1 and 1.
ReLU (Rectified Linear Unit): Outputs the input directly if it is positive, otherwise zero. It is widely used in modern neural networks.
Leaky ReLU: A variant of ReLU that allows small negative values when the input is less than zero.
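As an illustration, here is a minimal sketch of these activation functions, followed by a single artificial neuron that applies one of them (all names and values are illustrative):

```python
import math

# Sketches of the activation functions listed above.
def step(x, threshold=0.0):
    return 1.0 if x > threshold else 0.0    # binary output

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))       # squashes to (0, 1)

def tanh(x):
    return math.tanh(x)                      # squashes to (-1, 1)

def relu(x):
    return max(0.0, x)                       # passes positives, zeroes negatives

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x         # small slope for negative inputs

# A single artificial neuron: weighted sum plus bias, then activation.
inputs, weights, bias = [0.5, -0.2], [0.8, 0.4], 0.1
s = sum(w * x for w, x in zip(weights, inputs)) + bias
print(sigmoid(s))  # ≈ 0.60
```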
The architecture of a neural network refers to the arrangement of neurons in layers and the connections between them. Key types of architectures include:
Feedforward Neural Networks (FNN): Neurons are arranged in layers, and information moves only in one direction — from input to output.
Recurrent Neural Networks (RNN): In RNNs, information can flow in both directions, and they maintain memory of previous computations, making them useful for sequential data like time series or language modeling.
Convolutional Neural Networks (CNN): Specialized for processing grid-like data such as images, CNNs include layers that apply convolutions (filters) to extract features from data.
The Perceptron is a single-layer neural network model used for binary classification. It consists of a set of inputs, weights, a bias term, and an activation function. The perceptron algorithm updates weights based on the error between the predicted and actual output, typically using a simple learning rule.
Perceptron Learning Rule:
wᵢ ← wᵢ + η (t − y) xᵢ
Where:
η: the learning rate.
t: the target (actual) output.
y: the predicted output.
xᵢ: the i-th input.
Two essential steps for adjusting weights in a perceptron
Step 1: Calculate the Prediction
Formula:
S = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
If S ≥ 0, predict 1; otherwise, predict 0.
Step 2: Adjust Weights if Prediction is Wrong
If y ≠ t, update weights and bias:
Weight Update: wᵢ ← wᵢ + η (t − y) xᵢ
Bias Update: b ← b + η (t − y)
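A minimal sketch of this two-step procedure, applied to the linearly separable AND problem (the helper train_perceptron and its hyperparameters are illustrative choices, not a standard API):

```python
# Train a perceptron with the two steps above (illustrative helper name).
def train_perceptron(samples, lr=0.1, epochs=20):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            # Step 1: prediction from the weighted sum plus bias.
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if s >= 0 else 0
            # Step 2: update weights and bias only when the prediction is wrong.
            if y != t:
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                b = b + lr * (t - y)
    return w, b

# Learn the (linearly separable) AND function and verify it.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
for x, t in and_data:
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    assert (1 if s >= 0 else 0) == t
```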
Limitations
Linear separability: The perceptron can only solve problems where the classes are linearly separable. For example, it can easily solve problems like AND, OR, but it cannot solve problems like XOR.
Single-layer: The perceptron is a single-layer neural network, meaning it has no hidden layers. It can only model linear decision boundaries.
Advantages
Binary classification: The perceptron is primarily used for binary classification tasks where the goal is to classify inputs into one of two classes (e.g., 0 or 1).
Simple problems: It is often used in introductory machine learning tasks or when working with simple, linearly separable data.
The learning rate is a hyperparameter that controls the magnitude of the weight updates during training. A small learning rate makes the network learn slowly but more precisely, while a large learning rate speeds up learning but risks overshooting optimal solutions.
Gradient descent is an optimization algorithm used to minimize the loss function in neural networks. It works by iteratively adjusting the weights in the direction that reduces the error.
Batch Gradient Descent: Computes the gradient using the entire dataset.
Stochastic Gradient Descent (SGD): Computes the gradient using a single data point.
Mini-batch Gradient Descent: A compromise between batch and stochastic gradient descent, using a subset of the data.
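As a sketch, the following runs one epoch of mini-batch gradient descent on a one-parameter linear model with squared error; setting batch_size to the dataset size gives batch gradient descent, and setting it to 1 gives SGD (the function name and data are illustrative):

```python
import random

# One epoch of mini-batch gradient descent for a 1-D linear model y ≈ w * x.
def minibatch_gd_epoch(data, w, lr=0.01, batch_size=2):
    random.shuffle(data)  # visit samples in a new order each epoch
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of (w*x - y)^2 with respect to w, averaged over the batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return w

# batch_size = len(data) gives batch gradient descent; batch_size = 1 gives SGD.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
w = 0.0
for _ in range(100):
    w = minibatch_gd_epoch(data, w)
print(w)  # settles near 2.0, the slope that fits the data
```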
The Delta Rule is used in the perceptron and other neural networks to adjust weights during training. It updates the weights to reduce the difference (error) between the predicted and actual outputs. The update rule is:
Δwᵢ = η (t − y) xᵢ
Where:
η: the learning rate.
t: the target output.
y: the predicted output.
xᵢ: the i-th input.
Hebbian learning is a learning principle based on the idea that if two neurons are activated together, their connection (synapse) is strengthened. This is often summarized by the phrase "cells that fire together, wire together."
The update rule for Hebbian learning is:
Δwᵢⱼ = η xᵢ yⱼ
Where:
wᵢⱼ: the weight of the connection between neurons i and j.
η: the learning rate.
xᵢ: the activation of the presynaptic neuron.
yⱼ: the activation of the postsynaptic neuron.
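A one-line sketch of this rule (the helper name is illustrative):

```python
# Hebbian update: strengthen w in proportion to the product of
# pre- and postsynaptic activity.
def hebbian_update(w, x_pre, y_post, lr=0.1):
    return w + lr * x_pre * y_post

w = 0.5
w = hebbian_update(w, x_pre=1.0, y_post=1.0)  # both fire -> w grows to 0.6
w = hebbian_update(w, x_pre=0.0, y_post=1.0)  # pre is silent -> w unchanged
print(w)  # 0.6
```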
The ADALINE (Adaptive Linear Neuron) network is a type of neural network similar to the perceptron, but it trains on the neuron's continuous linear output rather than the thresholded prediction. It uses the least mean squares (LMS) rule for training, which minimizes the squared error between the actual and predicted outputs.
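A minimal sketch of ADALINE training under these assumptions (the helper name and hyperparameters are illustrative):

```python
# ADALINE with the LMS rule: the error is taken on the continuous linear
# output s, not on the thresholded prediction.
def train_adaline(samples, lr=0.01, epochs=50):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear output
            err = t - s                                    # LMS error
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

w, b = train_adaline([([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)])
```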
The Multilayer Perceptron (MLP) is a type of feedforward neural network that consists of multiple layers of neurons. It is a powerful model used for learning complex patterns, especially non-linear ones. Here's a breakdown of its key features:
Key Features of MLP:
Layers:
Input Layer: This is where the data is fed into the network.
Hidden Layers: These are the intermediate layers between the input and output. An MLP can have one or more hidden layers. The neurons in these layers process the input data and learn complex patterns.
Output Layer: This layer produces the final output, which can be a classification, prediction, etc.
Neurons: Each layer contains multiple neurons (also called nodes). Each neuron in one layer is connected to every neuron in the next layer, creating a fully connected network.
Activation Functions: MLP uses non-linear activation functions (such as ReLU, Sigmoid, or Tanh) in hidden layers to enable learning of non-linear relationships in the data.
Training using Backpropagation:
The training process of an MLP involves the use of an algorithm called backpropagation. Backpropagation helps the network adjust its weights by calculating the error (the difference between the predicted output and the actual target) and propagating it backward through the network to update the weights.
Gradient Descent is commonly used to minimize the error and find the optimal weights.
Why MLP is Powerful:
Non-Linear Learning: With multiple hidden layers and non-linear activation functions, MLPs are capable of learning complex, non-linear patterns that simpler models (like perceptrons) cannot.
Versatile: MLPs can be used for various tasks such as classification, regression, and time series prediction.
Example of MLP in Action:
Input: A set of features (e.g., images, text, or numerical data).
Hidden Layers: The data is processed through multiple layers where the model learns to extract relevant features and patterns.
Output: The final output could be a classification label (e.g., "cat" or "dog") or a numerical prediction.
MLP is a type of neural network that uses multiple layers to learn non-linear patterns in data and is trained using backpropagation.
The backpropagation algorithm is a key method for training neural networks, particularly multi-layer neural networks like Multilayer Perceptrons (MLP). It helps the network learn by adjusting its weights based on the error in the predictions. Here's how it works in two main steps:
1. Forward Pass
Input Data: The input data is fed into the network through the input layer.
Pass Through Layers: The data passes through the hidden layers where each neuron processes the data using weights and an activation function.
Output Calculation: The final output is produced in the output layer. This output is a prediction made by the neural network based on the input data.
In this step, the network computes the predicted output but doesn't adjust the weights yet. The output is then compared to the true target (label) to calculate the error (loss).
2. Backward Pass (Backpropagation)
Calculate Error: First, calculate the error (or loss), which is the difference between the predicted output and the actual target. A common loss function is mean squared error (MSE) for regression or cross-entropy for classification.
Apply Chain Rule: To update the weights, backpropagation uses the chain rule of calculus to compute the gradient of the loss function with respect to each weight. This tells us how much each weight contributed to the error. The gradient is calculated layer by layer, starting from the output layer and moving backward through the hidden layers.
Gradient Calculation: For each weight w, calculate the derivative of the loss function L with respect to w, using the chain rule to propagate errors backward through the layers.
Update Weights: Using gradient descent, the weights are updated to minimize the error. The update rule for each weight is:
w ← w − η ∂L/∂w
where:
η is the learning rate, which determines the step size for each weight update.
∂L/∂w is the gradient of the loss function with respect to the weight w, which tells how much to change the weight to reduce the error.
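Putting the two passes together, here is a minimal sketch of backpropagation for a 2-3-1 sigmoid MLP trained on XOR, the problem a single perceptron cannot solve (the architecture, learning rate, and squared-error loss are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input -> hidden
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden -> output
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass: compute hidden and output activations.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Backward pass: apply the chain rule, starting at the output layer.
    dY = (Y - T) * Y * (1 - Y)        # error signal at the output layer
    dH = (dY @ W2.T) * H * (1 - H)    # error propagated back to the hidden layer
    # Gradient descent: move each weight against its gradient.
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print(Y.round(2).ravel())  # typically approaches [0, 1, 1, 0]
```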
The Hopfield network is a type of recurrent neural network used for associative memory. It is capable of storing and recalling binary patterns. The network converges to a stable state, which corresponds to the closest stored pattern.
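A minimal sketch of this behavior, storing two bipolar (+1/−1) patterns with a Hebbian outer-product rule and recalling one from a corrupted cue (the patterns and update schedule are illustrative):

```python
import numpy as np

# Store two bipolar patterns in a Hopfield weight matrix.
p1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
p2 = np.array([1, 1, -1, -1, 1, 1, -1, -1])
W = (np.outer(p1, p1) + np.outer(p2, p2)).astype(float)
np.fill_diagonal(W, 0)               # no self-connections

def recall(state, sweeps=5):
    state = state.copy()
    for _ in range(sweeps):
        for i in range(len(state)):  # asynchronous update, one neuron at a time
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

noisy = p1.copy()
noisy[0] = -1                        # corrupt one bit of the first pattern
print(recall(noisy))                 # converges back to p1
```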
Key Differences between Multilayer Perceptron Neural Network & Hopfield Neural Network:
Architecture: Hopfield networks are recurrent (with feedback loops) and MLPs are feedforward (no feedback).
Training: Hopfield networks use an energy minimization approach, while MLPs use backpropagation.
Purpose: Hopfield is used for pattern storage and retrieval, whereas MLP is used for more general tasks like classification and regression.
Neural networks are a powerful class of models inspired by biological systems, and they play a fundamental role in AI. From simple perceptrons to deep multi-layer networks, these models use various algorithms like backpropagation and Hebbian learning to adjust weights and improve their performance. Understanding the core concepts like activation functions, gradient descent, and the delta rule is essential for building and training effective neural networks.