Computational models inspired by brain structure that learn patterns from data, forming the foundation of modern artificial intelligence systems.
A neural network is a computational model composed of layers of interconnected nodes (neurons) that process information. Each connection has a weight that is adjusted during training so the network learns to map inputs to desired outputs.
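As a minimal sketch of that idea (using PyTorch, which the article does not prescribe; the layer sizes and data are placeholders):

```python
import torch
import torch.nn as nn

# A small feedforward network. Each nn.Linear layer stores a weight matrix
# that training will adjust so that inputs map to the desired outputs.
model = nn.Sequential(
    nn.Linear(8, 16),   # 8 input features -> 16 hidden neurons
    nn.ReLU(),          # non-linearity between layers
    nn.Linear(16, 1),   # 16 hidden neurons -> 1 output (e.g. a predicted price)
)

x = torch.randn(4, 8)   # a batch of 4 examples with 8 features each
y_hat = model(x)        # forward pass produces predictions of shape (4, 1)
```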
The concept originated in 1943 with the McCulloch-Pitts model, but modern neural networks took off in 2012 when AlexNet demonstrated that deep networks could surpass traditional methods in image classification. Today they are the foundation of LLMs, computer vision systems, and generative models.
| Architecture | Structure | Primary application | Example |
|---|---|---|---|
| Feedforward (MLP) | Dense layers connected sequentially | Classification, regression | Price prediction |
| Convolutional (CNN) | Filters that detect spatial patterns | Computer vision | ResNet, EfficientNet |
| Recurrent (RNN/LSTM) | Connections that maintain temporal state | Sequences (text, audio) | Machine translation (pre-2017) |
| Transformer | Attention mechanism without recurrence | NLP, vision, multimodal | GPT, BERT, ViT |
| Autoencoder | Encoder-decoder that compresses and reconstructs | Embeddings, generation | VAE, denoising autoencoders |
| GAN | Generator vs discriminator in competition | Image generation | StyleGAN, BigGAN |
The Transformer architecture — introduced in the "Attention Is All You Need" paper (Vaswani et al., 2017) — dominates current AI because it parallelizes better than RNNs and captures long-range dependencies.
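As an illustrative sketch of that attention mechanism (a single scaled dot-product attention step, not a full Transformer block; the tensor shapes are arbitrary assumptions):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # similarity of every position with every other
    weights = scores.softmax(dim=-1)                   # attention weights sum to 1 per query
    return weights @ v                                 # weighted sum of the values

# All positions are processed in one batched matrix multiply, which is why this
# parallelizes across the sequence instead of stepping through it like an RNN.
q = k = v = torch.randn(2, 10, 64)            # batch=2, sequence length=10, d_k=64
out = scaled_dot_product_attention(q, k, v)   # shape (2, 10, 64)
```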
Neural network training follows a cycle:

1. Forward pass: the input flows through the layers to produce a prediction.
2. Loss computation: a loss function measures the error between the prediction and the expected output.
3. Backpropagation: the gradient of the loss with respect to each weight is computed, layer by layer, from output back to input.
4. Weight update: an optimizer (typically a gradient descent variant) adjusts the weights to reduce the loss.
This cycle repeats thousands or millions of times over the training dataset. The key is backpropagation, formalized by Rumelhart, Hinton, and Williams in 1986, which makes it possible to compute efficiently how much each weight contributes to the total error.
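A minimal sketch of that loop (PyTorch again; the model, optimizer, and toy data are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x, y = torch.randn(64, 8), torch.randn(64, 1)   # toy dataset

for step in range(1_000):
    y_hat = model(x)             # 1. forward pass
    loss = loss_fn(y_hat, y)     # 2. measure the error
    optimizer.zero_grad()
    loss.backward()              # 3. backpropagation: gradient of the loss w.r.t. every weight
    optimizer.step()             # 4. nudge each weight against its gradient
```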
Training hyperparameters — learning rate, batch size, number of epochs, scheduler — have an enormous impact on the final result. Finding the right combination is more art than science, although techniques like learning rate warmup and cosine annealing have standardized good practices.
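One common recipe for those schedules, sketched with PyTorch's built-in schedulers (the step counts, factors, and placeholder loss are made up for the example):

```python
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps, total_steps = 500, 10_000
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.01, total_iters=warmup_steps),  # warmup: ramp the LR up
        CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps),    # then decay it smoothly
    ],
    milestones=[warmup_steps],
)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(16, 8)).pow(2).mean()  # placeholder loss for illustration
    loss.backward()
    optimizer.step()
    scheduler.step()   # advance the learning-rate schedule once per optimizer step
```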
| Concept | What it does | Why it matters |
|---|---|---|
| Activation function | Introduces non-linearity (ReLU, GELU, sigmoid) | Without it, the network can only learn linear functions |
| Dropout | Randomly deactivates neurons during training | Prevents overfitting |
| Batch normalization | Normalizes activations between layers | Stabilizes and accelerates training |
| Learning rate | Controls the size of weight adjustments | Too high and training diverges; too low and it converges too slowly or stalls |
| Transfer learning | Reuses pre-trained weights on a new task | Reduces data and training time (see sketch below) |
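To make the transfer learning row concrete, a sketch with a torchvision model pretrained on ImageNet (the 10-class head and hyperparameters are arbitrary; downloading the weights requires network access):

```python
import torch
import torch.nn as nn
import torchvision

# Start from ImageNet-pretrained weights instead of training from scratch.
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so its weights stay fixed.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized for the new task (here, 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head is trained, which needs far less data and time.
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
```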
Neural networks are the fundamental building block of all modern AI — from LLMs that generate code to vision models that drive autonomous vehicles. Understanding their architectures, limitations, and training costs is essential for making informed decisions about which model to use, when to train your own, and when a simpler approach is sufficient. The choice between fine-tuning an existing model and training from scratch defines the cost and timeline of any AI project.
**Artificial intelligence (AI)**: Field of computer science dedicated to creating systems capable of performing tasks that normally require human intelligence, from reasoning and perception to language generation.
**Large language models (LLMs)**: Massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language, with emergent capabilities like reasoning, translation, and code generation.
**Embeddings**: Dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.
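As a toy sketch of that proximity idea (the 4-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model):

```python
import torch
import torch.nn.functional as F

# Hypothetical embeddings for three words.
cat   = torch.tensor([0.9, 0.1, 0.8, 0.0])
dog   = torch.tensor([0.8, 0.2, 0.7, 0.1])
stock = torch.tensor([0.1, 0.9, 0.0, 0.8])

# Cosine similarity is close to 1 for related concepts and lower for unrelated ones.
print(F.cosine_similarity(cat, dog, dim=0))    # high: "cat" and "dog" are semantically close
print(F.cosine_similarity(cat, stock, dim=0))  # low: "cat" and "stock" are not
```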