Jonatan Mata · jonmatum.com
© 2026 Jonatan Mata. All rights reserved. v2.1.1
Concepts

Neural Networks

Computational models inspired by brain structure that learn patterns from data, forming the foundation of modern artificial intelligence systems.

evergreen · #neural-networks #deep-learning #machine-learning #ai #backpropagation #transformers

What it is

A neural network is a computational model composed of layers of interconnected nodes (neurons) that process information. Each connection has a weight that is adjusted during training so the network learns to map inputs to desired outputs.
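The definition above can be sketched in a few lines of NumPy: a toy two-layer network whose layer sizes (3 inputs, 4 hidden units, 1 output) and random weights are illustrative, not taken from any particular model.

```python
import numpy as np

# Illustrative sketch: the forward pass of a tiny two-layer network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer: weights and biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer: weights and biases

def forward(x):
    h = np.maximum(0, W1 @ x + b1)  # ReLU activation on the hidden layer
    return W2 @ h + b2              # linear output (e.g. for regression)

y = forward(np.array([1.0, -2.0, 0.5]))
```

Training consists of adjusting `W1`, `b1`, `W2`, and `b2` so that `forward` maps inputs to the desired outputs.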

The concept originated in 1943 with the McCulloch-Pitts model, but modern neural networks took off in 2012 when AlexNet demonstrated that deep networks could surpass traditional methods in image classification. Today they are the foundation of LLMs, computer vision systems, and generative models.

Architectures

| Architecture | Structure | Primary application | Example |
| --- | --- | --- | --- |
| Feedforward (MLP) | Dense layers connected sequentially | Classification, regression | Price prediction |
| Convolutional (CNN) | Filters that detect spatial patterns | Computer vision | ResNet, EfficientNet |
| Recurrent (RNN/LSTM) | Connections that maintain temporal state | Sequences (text, audio) | Pre-2017 translation |
| Transformer | Attention mechanism without recurrence | NLP, vision, multimodal | GPT, BERT, ViT |
| Autoencoder | Encoder-decoder that compresses and reconstructs | Embeddings, generation | VAE, diffusion |
| GAN | Generator vs. discriminator in competition | Image generation | StyleGAN, DALL-E 1 |

The Transformer architecture — introduced in the "Attention Is All You Need" paper (Vaswani et al., 2017) — dominates current AI because it parallelizes better than RNNs and captures long-range dependencies.
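The attention mechanism at the heart of the Transformer is compact enough to sketch directly. This is single-head scaled dot-product attention as defined in the paper, softmax(QKᵀ/√d_k)V, with arbitrary toy dimensions (5 tokens, d_k = 8):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017), one head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 8))  # 5 tokens, d_k = 8
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = attention(Q, K, V)     # each row mixes information from all 5 tokens
```

Because every token attends to every other token in one matrix multiplication, the whole sequence is processed in parallel, which is precisely the advantage over the step-by-step recurrence of RNNs.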

How they learn

Neural network training follows a cycle:

  1. Forward pass: data flows through the network and produces a prediction
  2. Loss function: the error between prediction and expected value is calculated
  3. Backpropagation: the error propagates backward, computing the gradient of each weight
  4. Optimization: weights are adjusted using the gradient (SGD, Adam, AdamW)

This cycle repeats thousands or millions of times over the training dataset. The key is that backpropagation — formalized by Rumelhart, Hinton, and Williams in 1986 — enables efficiently computing how each weight contributes to the total error.
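The four-step cycle can be shown end to end on the smallest possible case: a one-weight linear model fit to synthetic data (y = 3x, invented here for illustration), with the gradient computed by hand and plain SGD as the optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x                   # synthetic target: the "true" weight is 3.0
w, lr = 0.0, 0.1              # initial weight and learning rate

for epoch in range(50):
    y_hat = w * x                        # 1. forward pass
    loss = np.mean((y_hat - y) ** 2)     # 2. loss function (MSE)
    grad = np.mean(2 * (y_hat - y) * x)  # 3. backpropagation: dL/dw
    w -= lr * grad                       # 4. optimization: SGD step

# w converges toward 3.0
```

In a real network the only change is scale: backpropagation applies the chain rule across millions of weights instead of one, and the optimizer (Adam, AdamW) adapts the step size per weight.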

Training hyperparameters — learning rate, batch size, number of epochs, scheduler — have an enormous impact on the final result. Finding the right combination is more art than science, although techniques like learning rate warmup and cosine annealing have standardized good practices.

Key concepts

| Concept | What it does | Why it matters |
| --- | --- | --- |
| Activation function | Introduces non-linearity (ReLU, GELU, sigmoid) | Without it, the network can only learn linear functions |
| Dropout | Randomly deactivates neurons during training | Prevents overfitting |
| Batch normalization | Normalizes activations between layers | Stabilizes and accelerates training |
| Learning rate | Controls the size of weight updates | Too high diverges; too low converges slowly or not at all |
| Transfer learning | Reuses pre-trained weights on a new task | Reduces data and training time |
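The claim about activation functions is easy to verify directly: stacking linear layers without a non-linearity collapses into a single linear map, so depth adds nothing. The matrix sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)   # "deep" network with no activation
one_layer = (W2 @ W1) @ x    # equivalent single linear layer
# np.allclose(two_layers, one_layer) is True: depth collapsed away

with_relu = W2 @ np.maximum(0, W1 @ x)  # ReLU breaks the equivalence
```

Inserting a ReLU (or GELU, sigmoid) between the layers destroys this collapse, which is what lets deep networks represent non-linear functions.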

When NOT to use neural networks

  • Small tabular data (< 10K rows) — XGBoost or random forests usually win
  • Interpretability requirements — linear models or decision trees are more explainable
  • No GPU available — training deep networks without acceleration is prohibitively slow
  • Insufficient data — deep networks need large data volumes or transfer learning

Why it matters

Neural networks are the fundamental building block of all modern AI — from LLMs that generate code to vision models that drive autonomous vehicles. Understanding their architectures, limitations, and training costs is essential for making informed decisions about which model to use, when to train your own, and when a simpler approach is sufficient. The choice between fine-tuning an existing model and training from scratch defines the cost and timeline of any AI project.

References

  • Deep Learning — Goodfellow, Bengio, and Courville, 2016. The reference book on neural networks and deep learning.
  • CS231n: Convolutional Neural Networks for Visual Recognition — Stanford, 2024. Stanford course on neural networks and computer vision.
  • Build the Neural Network — PyTorch, 2024. Official tutorial for building neural networks with PyTorch.
  • Feature Visualization — Distill, 2017. Advanced techniques for visualizing and interpreting what neural networks learn.
  • Neural Networks — 3Blue1Brown, 2024. Visual series explaining the intuition behind neural networks.
  • Understanding LSTM Networks — Christopher Olah, 2015. Visual explanation of recurrent networks and LSTMs.

Related content

  • Artificial Intelligence

    Field of computer science dedicated to creating systems capable of performing tasks that normally require human intelligence, from reasoning and perception to language generation.

  • Large Language Models

    Massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language with emergent capabilities like reasoning, translation, and code generation.

  • Embeddings

    Dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.
