Neural Networks
Computational models inspired by brain structure that learn patterns from data, forming the foundation of modern artificial intelligence systems.
What it is
A neural network is a computational model composed of layers of interconnected nodes (neurons) that process information. Each connection has a weight that is adjusted during training so the network learns to map inputs to desired outputs.
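The computation a single neuron performs can be sketched in a few lines of plain Python (the weights and inputs here are illustrative; training would adjust the weights):

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes the result into (0, 1)

# Two inputs with hand-picked weights; a real network has millions of these.
y = neuron([1.0, 2.0], [0.5, -0.3], bias=0.1)  # ≈ 0.5 with these values
```

A network is just many of these neurons arranged in layers, with each layer's outputs feeding the next layer's inputs.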
The concept originated in 1943 with the McCulloch-Pitts model, but modern neural networks took off in 2012 when AlexNet demonstrated that deep networks could surpass traditional methods in image classification. Today they are the foundation of LLMs, computer vision systems, and generative models.
Architectures
| Architecture | Structure | Primary application | Example |
|---|---|---|---|
| Feedforward (MLP) | Dense layers connected sequentially | Classification, regression | Price prediction |
| Convolutional (CNN) | Filters that detect spatial patterns | Computer vision | ResNet, EfficientNet |
| Recurrent (RNN/LSTM) | Connections that maintain temporal state | Sequences (text, audio) | Machine translation (pre-2017) |
| Transformer | Attention mechanism without recurrence | NLP, vision, multimodal | GPT, BERT, ViT |
| Autoencoder | Encoder-decoder that compresses and reconstructs | Embeddings, generation | VAE, denoising autoencoders |
| GAN | Generator vs discriminator in competition | Image generation | StyleGAN, BigGAN |
The Transformer architecture — introduced in the "Attention Is All You Need" paper (Vaswani et al., 2017) — dominates current AI because it parallelizes better than RNNs and captures long-range dependencies.
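The attention mechanism at the heart of the Transformer reduces to a short computation. This is a minimal single-head NumPy sketch (real implementations add masking, multiple heads, and learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # each output is a weighted mix of all values

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 tokens, embedding dimension 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed in parallel; this is the parallelization advantage over RNNs mentioned above.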
How they learn
Neural network training follows a cycle:
- Forward pass: data flows through the network and produces a prediction
- Loss function: the error between prediction and expected value is calculated
- Backpropagation: the error propagates backward through the network, computing the gradient of the loss with respect to each weight
- Optimization: weights are adjusted using the gradient (SGD, Adam, AdamW)
This cycle repeats thousands or millions of times over the training dataset. The key is that backpropagation — formalized by Rumelhart, Hinton, and Williams in 1986 — enables efficiently computing how each weight contributes to the total error.
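The full cycle can be sketched for the smallest possible network: a single weight fit by gradient descent, with the gradient computed by hand (illustrative, no framework):

```python
# Fit y = w * x to data generated by y = 2x, following the cycle above.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0    # initial weight
lr = 0.05  # learning rate

for epoch in range(200):
    for x, y_true in data:
        y_pred = w * x                    # forward pass
        loss = (y_pred - y_true) ** 2     # loss function: squared error
        grad = 2 * (y_pred - y_true) * x  # backpropagation: dloss/dw
        w -= lr * grad                    # optimization: SGD step

# w converges to 2.0, recovering the rule that generated the data
```

In a deep network backpropagation applies this same chain-rule calculation layer by layer, which is exactly what autograd engines like PyTorch's automate.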
Training hyperparameters — learning rate, batch size, number of epochs, scheduler — have an enormous impact on the final result. Finding the right combination is more art than science, although techniques like learning rate warmup and cosine annealing have standardized good practices.
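Warmup followed by cosine annealing reduces to a short formula. This is one common variant (exact schedules differ between libraries; the step counts and base rate here are illustrative):

```python
import math

def lr_at_step(step, total_steps, base_lr=3e-4, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # warmup: ramp up linearly
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay

schedule = [lr_at_step(s, total_steps=1000) for s in range(1000)]
```

The warmup avoids large, destabilizing updates while weights are still random; the cosine tail lets the network settle into a minimum with progressively smaller steps.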
Key concepts
| Concept | What it does | Why it matters |
|---|---|---|
| Activation function | Introduces non-linearity (ReLU, GELU, sigmoid) | Without it, the network can only learn linear functions |
| Dropout | Randomly deactivates neurons during training | Prevents overfitting |
| Batch normalization | Normalizes activations between layers | Stabilizes and accelerates training |
| Learning rate | Controls the size of weight adjustments | Too high diverges; too low trains slowly or stalls |
| Transfer learning | Reuse pre-trained weights on new task | Reduces data and training time |
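Two of these concepts, ReLU and dropout, are simple enough to show directly. A NumPy sketch using "inverted" dropout, the variant most frameworks implement (survivors are rescaled so expected activations match inference, when dropout is off):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """ReLU: zero out negatives — the source of non-linearity."""
    return np.maximum(0, x)

def dropout(x, p, rng):
    """Inverted dropout: zero a random fraction p of activations during
    training, scaling the survivors by 1 / (1 - p)."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1 - p)

h = relu(np.array([-1.0, 0.5, 2.0]))  # -> [0.0, 0.5, 2.0]
h = dropout(h, p=0.5, rng=rng)        # each survivor doubled, the rest zeroed
```

At inference time dropout is simply disabled; the rescaling during training is what makes that switch statistically consistent.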
When NOT to use neural networks
- Small tabular data (< 10K rows) — XGBoost or random forests usually win
- Interpretability requirements — linear models or decision trees are more explainable
- No GPU available — training deep networks without acceleration is prohibitively slow
- Insufficient data — deep networks need large data volumes or transfer learning
Why it matters
Neural networks are the fundamental building block of all modern AI — from LLMs that generate code to vision models that drive autonomous vehicles. Understanding their architectures, limitations, and training costs is essential for making informed decisions about which model to use, when to train your own, and when a simpler approach is sufficient. The choice between fine-tuning an existing model and training from scratch defines the cost and timeline of any AI project.
References
- Deep Learning — Goodfellow, Bengio, and Courville, 2016. The reference book on neural networks and deep learning.
- CS231n: Convolutional Neural Networks for Visual Recognition — Stanford, 2024. Stanford course on neural networks and computer vision.
- Build the Neural Network — PyTorch, 2024. Official tutorial for building neural networks with PyTorch.
- Neural Networks — 3Blue1Brown, 2024. Visual series explaining the intuition behind neural networks.