Concepts

Neural Networks

Computational models inspired by brain structure that learn patterns from data, forming the foundation of modern artificial intelligence systems.

evergreen · #neural-networks #deep-learning #machine-learning #ai #backpropagation #transformers

What it is

A neural network is a computational model composed of layers of interconnected nodes (neurons) that process information. Each connection has a weight that is adjusted during training so the network learns to map inputs to desired outputs.
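A minimal sketch of this idea, assuming numpy: a two-layer dense network where each layer is just a weight matrix, a bias, and a non-linearity (all sizes and values here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output.
# The weights are the values that training would adjust.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)  # hidden layer with ReLU non-linearity
    return h @ W2 + b2              # linear output layer

x = np.array([1.0, -0.5, 2.0])
y = forward(x)                      # a single scalar prediction
```

Stacking more such layers is all that "deep" means; the mapping stays a chain of matrix multiplies and non-linearities.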

The concept originated in 1943 with the McCulloch-Pitts model, but modern neural networks took off in 2012 when AlexNet demonstrated that deep networks could surpass traditional methods in image classification. Today they are the foundation of LLMs, computer vision systems, and generative models.

Architectures

| Architecture | Structure | Primary application | Example |
|---|---|---|---|
| Feedforward (MLP) | Dense layers connected sequentially | Classification, regression | Price prediction |
| Convolutional (CNN) | Filters that detect spatial patterns | Computer vision | ResNet, EfficientNet |
| Recurrent (RNN/LSTM) | Connections that maintain temporal state | Sequences (text, audio) | Pre-2017 translation |
| Transformer | Attention mechanism without recurrence | NLP, vision, multimodal | GPT, BERT, ViT |
| Autoencoder | Encoder-decoder that compresses and reconstructs | Embeddings, generation | VAE, diffusion |
| GAN | Generator vs. discriminator in competition | Image generation | StyleGAN, BigGAN |

The Transformer architecture — introduced in the "Attention Is All You Need" paper (Vaswani et al., 2017) — dominates current AI because it parallelizes better than RNNs and captures long-range dependencies.
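The core of that attention mechanism can be sketched in a few lines of numpy (shapes and values here are illustrative): every query position compares itself to every key position in one matrix multiply, which is why the architecture parallelizes well and links distant positions directly.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (the building block of the Transformer)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V               # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq, d = 5, 8                        # toy sequence length and model width
Q = K = V = rng.normal(size=(seq, d))  # self-attention: all three from the same input
out = scaled_dot_product_attention(Q, K, V)  # one output vector per position
```

An RNN would need `seq` sequential steps to relate position 0 to position 4; here the attention matrix computes all pairwise interactions at once.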

How they learn

Neural network training follows a cycle:

  1. Forward pass: data flows through the network and produces a prediction
  2. Loss function: the error between prediction and expected value is calculated
  3. Backpropagation: the error propagates backward, computing the gradient of each weight
  4. Optimization: weights are adjusted using the gradient (SGD, Adam, AdamW)

This cycle repeats thousands or millions of times over the training dataset. The key is that backpropagation — formalized by Rumelhart, Hinton, and Williams in 1986 — enables efficiently computing how each weight contributes to the total error.
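The four-step cycle can be sketched on a toy one-parameter "network" y = w·x fitting the target y = 2x, with the gradient written out by hand via the chain rule (the data, learning rate, and step count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=100)
ys = 2.0 * xs                             # ground truth the model should learn

w, lr = 0.0, 0.1
for _ in range(50):
    pred = w * xs                         # 1. forward pass
    loss = np.mean((pred - ys) ** 2)      # 2. loss: mean squared error
    grad = np.mean(2 * (pred - ys) * xs)  # 3. backprop: dLoss/dw by the chain rule
    w -= lr * grad                        # 4. optimization step (vanilla SGD)
```

After the loop, `w` has converged close to 2.0. In a real network, step 3 is exactly this chain-rule computation repeated layer by layer, which is what autodiff frameworks automate.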

Training hyperparameters — learning rate, batch size, number of epochs, scheduler — have an enormous impact on the final result. Finding the right combination is more art than science, although techniques like learning rate warmup and cosine annealing have standardized good practices.
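A sketch of the warmup-plus-cosine pattern mentioned above, assuming illustrative values for `base_lr` and `warmup_steps` (these are not standards, just common defaults):

```python
import math

def lr_schedule(step, total_steps, base_lr=3e-4, warmup_steps=100):
    """Linear warmup followed by cosine annealing to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # ramp up from near zero
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # smooth decay
```

Warmup avoids large, destabilizing updates while the weights are still random; the cosine tail lets the model settle into a minimum at the end of training.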

Key concepts

| Concept | What it does | Why it matters |
|---|---|---|
| Activation function | Introduces non-linearity (ReLU, GELU, sigmoid) | Without it, the network can only learn linear functions |
| Dropout | Randomly deactivates neurons during training | Prevents overfitting |
| Batch normalization | Normalizes activations between layers | Stabilizes and accelerates training |
| Learning rate | Controls the size of weight updates | Too high diverges; too low converges too slowly |
| Transfer learning | Reuses pre-trained weights on a new task | Reduces data and training time |
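Dropout is simple enough to sketch directly (a minimal "inverted dropout" version in numpy; the shapes and drop probability are illustrative):

```python
import numpy as np

def dropout(h, p=0.5, training=True, seed=0):
    """Inverted dropout: zero random activations during training and
    rescale the survivors so the expected activation is unchanged."""
    if not training or p == 0.0:
        return h                            # at inference, dropout is a no-op
    rng = np.random.default_rng(seed)
    mask = rng.random(h.shape) >= p         # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)             # rescale so E[output] == input

h = np.ones((4, 8))
out = dropout(h, p=0.5)                     # each entry is either 0 or 2.0
```

Because each forward pass sees a different random subnetwork, no single neuron can be relied on exclusively, which is what discourages overfitting.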

When NOT to use neural networks

  • Small tabular data (< 10K rows) — XGBoost or random forests usually win
  • Interpretability requirements — linear models or decision trees are more explainable
  • No GPU available — training deep networks without acceleration is prohibitively slow
  • Insufficient data — deep networks need large data volumes or transfer learning

Why it matters

Neural networks are the fundamental building block of all modern AI — from LLMs that generate code to vision models that drive autonomous vehicles. Understanding their architectures, limitations, and training costs is essential for making informed decisions about which model to use, when to train your own, and when a simpler approach is sufficient. The choice between fine-tuning an existing model and training from scratch defines the cost and timeline of any AI project.
