Concepts

Neural Networks

Computational models inspired by brain structure that learn patterns from data, forming the foundation of modern artificial intelligence systems.

evergreen · #neural-networks #deep-learning #machine-learning #ai #backpropagation #transformers

What it is

A neural network is a computational model composed of layers of interconnected nodes (neurons) that process information. Each connection has a weight that is adjusted during training so the network learns to map inputs to desired outputs.
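A minimal sketch of this idea, assuming numpy: a two-layer dense network where each layer is just a weight matrix, a bias, and a non-linearity (all sizes and values here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output.
# The weights are the values that training would adjust.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)  # hidden layer with ReLU non-linearity
    return h @ W2 + b2              # linear output layer

x = np.array([1.0, -0.5, 2.0])
y = forward(x)                      # a single scalar prediction
```

Stacking more such layers is all that "deep" means; the mapping stays a chain of matrix multiplies and non-linearities.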

The concept originated in 1943 with the McCulloch-Pitts model, but modern neural networks took off in 2012 when AlexNet demonstrated that deep networks could surpass traditional methods in image classification. Today they are the foundation of LLMs, computer vision systems, and generative models.

Architectures

| Architecture | Structure | Primary application | Example |
|---|---|---|---|
| Feedforward (MLP) | Dense layers connected sequentially | Classification, regression | Price prediction |
| Convolutional (CNN) | Filters that detect spatial patterns | Computer vision | ResNet, EfficientNet |
| Recurrent (RNN/LSTM) | Connections that maintain temporal state | Sequences (text, audio) | Pre-2017 translation |
| Transformer | Attention mechanism without recurrence | NLP, vision, multimodal | GPT, BERT, ViT |
| Autoencoder | Encoder-decoder that compresses and reconstructs | Embeddings, generation | VAE, diffusion |
| GAN | Generator vs. discriminator in competition | Image generation | StyleGAN, BigGAN |

The Transformer architecture — introduced in the "Attention Is All You Need" paper (Vaswani et al., 2017) — dominates current AI because it parallelizes better than RNNs and captures long-range dependencies.
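The core of that attention mechanism can be sketched in a few lines of numpy (shapes and values here are illustrative): every query position compares itself to every key position in one matrix multiply, which is why the architecture parallelizes well and links distant positions directly.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (the building block of the Transformer)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V               # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq, d = 5, 8                        # toy sequence length and model width
Q = K = V = rng.normal(size=(seq, d))  # self-attention: all three from the same input
out = scaled_dot_product_attention(Q, K, V)  # one output vector per position
```

An RNN would need `seq` sequential steps to relate position 0 to position 4; here the attention matrix computes all pairwise interactions at once.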

How they learn

Neural network training follows a cycle:

  1. Forward pass: data flows through the network and produces a prediction
  2. Loss function: the error between prediction and expected value is calculated
  3. Backpropagation: the error propagates backward, computing the gradient of each weight
  4. Optimization: weights are adjusted using the gradient (SGD, Adam, AdamW)

This cycle repeats thousands or millions of times over the training dataset. The key is that backpropagation — formalized by Rumelhart, Hinton, and Williams in 1986 — enables efficiently computing how each weight contributes to the total error.
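The four-step cycle can be sketched on a toy one-parameter "network" y = w·x fitting the target y = 2x, with the gradient written out by hand via the chain rule (the data, learning rate, and step count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=100)
ys = 2.0 * xs                             # ground truth the model should learn

w, lr = 0.0, 0.1
for _ in range(50):
    pred = w * xs                         # 1. forward pass
    loss = np.mean((pred - ys) ** 2)      # 2. loss: mean squared error
    grad = np.mean(2 * (pred - ys) * xs)  # 3. backprop: dLoss/dw by the chain rule
    w -= lr * grad                        # 4. optimization step (vanilla SGD)
```

After the loop, `w` has converged close to 2.0. In a real network, step 3 is exactly this chain-rule computation repeated layer by layer, which is what autodiff frameworks automate.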

Training hyperparameters — learning rate, batch size, number of epochs, scheduler — have an enormous impact on the final result. Finding the right combination is more art than science, although techniques like learning rate warmup and cosine annealing have standardized good practices.
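A sketch of the warmup-plus-cosine pattern mentioned above, assuming illustrative values for `base_lr` and `warmup_steps` (these are not standards, just common defaults):

```python
import math

def lr_schedule(step, total_steps, base_lr=3e-4, warmup_steps=100):
    """Linear warmup followed by cosine annealing to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # ramp up from near zero
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # smooth decay
```

Warmup avoids large, destabilizing updates while the weights are still random; the cosine tail lets the model settle into a minimum at the end of training.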

Key concepts

| Concept | What it does | Why it matters |
|---|---|---|
| Activation function | Introduces non-linearity (ReLU, GELU, sigmoid) | Without it, the network can only learn linear functions |
| Dropout | Randomly deactivates neurons during training | Prevents overfitting |
| Batch normalization | Normalizes activations between layers | Stabilizes and accelerates training |
| Learning rate | Controls the size of weight updates | Too high diverges; too low converges too slowly |
| Transfer learning | Reuses pre-trained weights on a new task | Reduces data and training time |
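Dropout is simple enough to sketch directly (a minimal "inverted dropout" version in numpy; the shapes and drop probability are illustrative):

```python
import numpy as np

def dropout(h, p=0.5, training=True, seed=0):
    """Inverted dropout: zero random activations during training and
    rescale the survivors so the expected activation is unchanged."""
    if not training or p == 0.0:
        return h                            # at inference, dropout is a no-op
    rng = np.random.default_rng(seed)
    mask = rng.random(h.shape) >= p         # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)             # rescale so E[output] == input

h = np.ones((4, 8))
out = dropout(h, p=0.5)                     # each entry is either 0 or 2.0
```

Because each forward pass sees a different random subnetwork, no single neuron can be relied on exclusively, which is what discourages overfitting.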

When NOT to use neural networks

  • Small tabular data (< 10K rows) — XGBoost or random forests usually win
  • Interpretability requirements — linear models or decision trees are more explainable
  • No GPU available — training deep networks without acceleration is prohibitively slow
  • Insufficient data — deep networks need large data volumes or transfer learning

Why it matters

Neural networks are the fundamental building block of all modern AI — from LLMs that generate code to vision models that drive autonomous vehicles. Understanding their architectures, limitations, and training costs is essential for making informed decisions about which model to use, when to train your own, and when a simpler approach is sufficient. The choice between fine-tuning an existing model and training from scratch defines the cost and timeline of any AI project.
