Neural Networks | ASW Hack Club


Introduction:

Neural networks are the key to modern machine learning and "artificial intelligence". I put artificial intelligence in quotes because the term is thrown around without much meaning. Most things referred to as artificial intelligence are just machine learning plus marketing. Real artificial intelligence is referred to as artificial general intelligence, or AGI. We are not there yet, and it is not right to call machine learning AI. Sorry for the rant, I just needed to put that out there.

Neural networks are mathematical recreations of how neurons behave in the human brain. On this page I will go over the details of how they do that, the history of how they were developed, and some of the mathematics that goes into making a neural network.

History:

The history of neural networks is a rather long one, but it is really interesting and can help you get a better grasp of them. You don't have to read this if you don't want to; just scroll past it.

The earliest forms of neural networks, developed by Warren McCulloch and Walter Pitts around 1943, were simple mathematical models of neurons, the connections between them, and how they react to each other.

After the work of the two men, Frank Rosenblatt began working on and ended up inventing the Perceptron around 1957. Rosenblatt was a psychologist, so he had a strong understanding of the processes that occur in the brain. This allowed him to create the Perceptron, which simulated the human visual process and was able to recognize basic geometric shapes.

In 1969 Marvin Minsky and Seymour Papert published a book called "Perceptrons" going over the limitations of the Perceptron. This caused many people to lose interest in AI, as it showed that the technology of the time could not get around the XOR problem. The XOR problem is that a single-layer network cannot learn non-linear relations like the one in the XOR truth table: XOR(0,0)=0, XOR(0,1)=1, XOR(1,0)=1, XOR(1,1)=0. Essentially, the network cannot think in multiple steps to discover relationships that are not directly connected. The core issue is that you cannot draw a single straight line that separates the 0 outputs from the 1 outputs in the XOR truth table, so the Perceptron is not able to predict the values accurately.

Nothing particularly notable happened after the discovery of the XOR problem until 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams demonstrated that you can get around the XOR problem by using multiple layers, known as hidden layers, in combination with backpropagation. This removed the restriction that neural networks could only solve linear problems, and with it many of their early limitations.
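To make this concrete, here is a minimal sketch (not from the original history, just an illustration in numpy) of a tiny network with one hidden layer learning XOR through backpropagation. The layer sizes, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# the XOR truth table as training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one hidden layer of 4 neurons; a single-layer perceptron cannot do this
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

for _ in range(10000):
    # forward pass through hidden and output layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backpropagation: gradients of the squared error through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient descent step with learning rate 0.5
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # approaches [[0], [1], [1], [0]]
```

Remove the hidden layer and the same training loop stalls, which is exactly the limitation Minsky and Papert pointed out.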

In 1989 Yann LeCun and a few others developed the convolutional neural network, a.k.a. the CNN, which is particularly good at image recognition tasks.

Now going into the 2000s: in 2006 Geoffrey Hinton started to popularize the term "deep learning" and showed the usefulness of deep belief networks, a.k.a. DBNs, and their performance in unsupervised learning. This made people even more interested in multilayered networks. In the following years, the ideas of unsupervised pretraining and contrastive divergence allowed networks to be trained more effectively on their datasets.

Then came a huge boom in the development of AI around 2012, when AlexNet, a deep convolutional neural network, won the ImageNet competition by a huge margin. This showed the power of deep learning for image classification and vision tasks.

In 2014 the technology known as generative adversarial networks, or GANs, was created by Ian Goodfellow, hugely expanding the generative model landscape.

In 2015 AlphaGo, a reinforcement learning model created by Google DeepMind, demonstrated a machine learning model performing complex decision-making tasks.

Between 2018 and now, a large number of huge neural network technologies, like GPT from OpenAI, and BERT and T5 (the Text-to-Text Transfer Transformer) from Google, have used the transformer architecture and been wildly impressive demonstrations of the real capabilities of the machine learning models being developed.

That's all of the important history, for now ...

Types of Neural Networks:

There are several different types of neural networks; some of them are better at certain tasks like image classification, identification, generation, or other applications of machine learning. Here is a table describing how each model works along with some key information about how it is usually used. They are in no particular order.

Neural Network Type | Function | Key Information
Feedforward Neural Network (FNN) | Basic neural network where data flows from input to output without loops. | Often used for classification and regression tasks. Each layer has direct connections to the next layer.
Convolutional Neural Network (CNN) | Good for grid-like data such as images, using convolutional layers. | Commonly used in image and video recognition. The key features are convolutional, pooling, and fully connected layers.
Recurrent Neural Network (RNN) | Designed for sequential data, with loops that allow information to persist (short-term memory). | Useful for speech recognition and natural language processing. Has issues with the vanishing gradient problem.
Long Short-Term Memory (LSTM) | A type of RNN designed to fix the vanishing gradient problem. | Key for tasks where long-term memory matters, such as text generation and language-to-language machine translation.
Gated Recurrent Unit (GRU) | A variant of the RNN, similar to the LSTM but with a simpler structure and fewer parameters. | Efficient for sequence prediction and some basic natural language processing.
Autoencoder | Learns a compressed, lower-dimensional representation of data, often for unsupervised learning. | Used for reducing dimensionality, detecting anomalies in data, and generating new data.
Generative Adversarial Network (GAN) | Has two networks, a generator and a discriminator, that are trained to create and evaluate realistic data. | Used for generating new synthetic data like images, videos, and music.
Radial Basis Function Network (RBFN) | Uses radial basis functions to determine the influence of center points on the output based on input distances. | Used in classification and regression tasks. The model works from the distances between data points.
Multilayer Perceptron (MLP) | A fully connected feedforward network with one or more hidden layers between the input and output layers. | Commonly used for classification, regression, and pattern recognition tasks.
Transformer Network | Processes text as tokens and relates them with attention, used for tokenized generative tasks like text generation. | Primarily used in natural language processing and machine translation.
Self-Organizing Map (SOM) | An unsupervised learning algorithm that maps high-dimensional data into fewer dimensions. | Used for clustering, dimensionality reduction, and visualizing complex data.
Capsule Network (CapsNet) | Uses capsules (groups of neurons) to encode spatial hierarchies in the data. | Effective for image classification and object recognition, where it improves generalization.
Deep Belief Network (DBN) | Composed of multiple layers of stochastic, generative units. | Used for pretraining deep networks and dimensionality reduction, and in unsupervised learning tasks.
Echo State Network (ESN) | A type of RNN with fixed, random weights in the recurrent layer. | Used in tasks that require dynamic memory, like forecasting and pattern recognition.
Neural Turing Machine (NTM) | Combines a neural network with external memory to simulate a Turing machine. | Used for tasks requiring external memory, such as algorithmic tasks and reasoning.
Attention Mechanism | Used to focus on important parts of the input data in sequence models, especially in transformers. | Helps models find the parts of their input that they need to pay attention to.
Siamese Network | Consists of two identical networks sharing weights, used for comparing two inputs. | Used in tasks like face verification, signature verification, and similarity matching.
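As a rough illustration of how two of these types differ in code, here is a hedged sketch assuming PyTorch is installed; all layer sizes are made up for the example:

```python
import torch
import torch.nn as nn

# Feedforward / MLP: dense layers only, data flows straight through
mlp = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),  # input -> hidden
    nn.Linear(128, 10),              # hidden -> output
)

# CNN: convolution + pooling before the fully connected head
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                            # pooling layer
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # fully connected layer
)

print(mlp(torch.randn(1, 784)).shape)        # torch.Size([1, 10])
print(cnn(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```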

Structure of a Neural Network:

A network consists of input, hidden, and output layers, like so:

Input Layer: x₁, x₂, x₃
Hidden Layer 1: h₁, h₂, h₃, h₄
Hidden Layer 2: h₅, h₆, h₇
Output Layer: y₁, y₂

Numeric values (often normalized to the range 0 to 1) are fed into the input layer, the hidden layers respond based on weights and biases learned during training, and values between 0 and 1 are produced at the output layer.

A weight defines how strongly a neuron's output is listened to by the neurons connected to it, and the bias shifts the weighted sum up or down, pulling the neuron's output further towards a 1 or a 0 before the activation is applied.
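Here is a minimal numpy sketch of the forward pass for the exact 3-4-3-2 network drawn above; the random weights and biases stand in for values a real network would learn during training:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0])  # input layer: x1, x2, x3

# each layer computes sigmoid(W @ x + b): the weights scale the incoming
# signals, and the bias shifts the weighted sum before the activation
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # hidden layer 1
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)  # hidden layer 2
W3, b3 = rng.normal(size=(2, 3)), rng.normal(size=2)  # output layer

h1 = sigmoid(W1 @ x + b1)
h2 = sigmoid(W2 @ h1 + b2)
y = sigmoid(W3 @ h2 + b3)  # two outputs, each between 0 and 1
print(y)
```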

The neural network in the example has each layer connected in a configuration called a dense connection, where every neuron in one layer has a connection to every neuron in the next layer. There are many other connection schemes; here is a table going over most of them:

Connection Schema | Description
Fully Connected (Dense) | Every neuron in one layer connects to every neuron in the next.
Convolutional | Neurons connect to local regions of the input or prior layer.
Recurrent | Neurons connect in loops to retain temporal memory.
LSTM/GRU | Recurrent connections with gates for long-term dependencies.
Skip Connections | Bypass layers to connect distant ones directly.
Dropout | Randomly drops connections during training.
Sparse Connections | Neurons connect to only a subset of the next layer.
Attention | Connections based on learned attention weights.
Self-Attention | Connections within the same layer based on attention weights.
Residual | Adds the output of a layer to the input of subsequent layers.
Gated Connections | Connections modulated by learned gates.
Weight Sharing | Same weights used across multiple connections.
Normalization Layers | Neurons connect through normalization operations.

The top four, those being dense, convolutional, recurrent, and LSTM, are the most important and most used; they are the ones you should pay the most attention to. A small sketch of the recurrent scheme follows below.
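As an illustration of the recurrent scheme, here is a minimal numpy sketch (the sizes and inputs are made up) of a hidden state looping back into itself across time steps:

```python
import numpy as np

rng = np.random.default_rng(2)

W_x = rng.normal(size=(4, 3))  # input -> hidden weights
W_h = rng.normal(size=(4, 4))  # hidden -> hidden (the recurrent loop)
b = np.zeros(4)

h = np.zeros(4)                     # initial hidden state
sequence = rng.normal(size=(5, 3))  # 5 time steps of 3 features each

for x_t in sequence:
    # each step mixes the new input with the previous hidden state
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(h)  # the final hidden state summarizes the whole sequence
```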

Writing on this article has been paused until more time becomes available. If you would like to continue it, you can make a pull request on the GitHub.


If there are any edits that you would like to request, please submit them as an issue on the GitHub or send an email to sysadmin@silverflag.net