
Vocabulary
A — C
- Ablation Study: A process where parts of an AI system (like specific layers or features) are removed to see how much they contribute to the overall performance.
- Agents (Agentic AI): Autonomous systems that use an LLM as a “brain” to operate tools, browse the web, and execute multi-step tasks toward a goal.
- Alignment: The process of ensuring an AI’s goals and behaviors match human values and intentions (e.g., preventing it from being harmful).
- Attention Mechanism: The core of the Transformer architecture; it allows the model to “focus” on specific parts of the input text when predicting the next word.
- Autoregressive: A model that predicts the next item in a sequence based on previous items. Most LLMs are autoregressive, generating text one token at a time.
- Chain-of-Thought (CoT): A prompting technique where the AI is encouraged to “think out loud” or show its reasoning steps before providing a final answer.
- Context Window: The maximum amount of information (tokens) a model can “read” or remember at one time.
- Constitutional AI: A method of training AI where the model is given a “constitution” (a set of rules) to self-evaluate and correct its own behavior.
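To make the Attention Mechanism and Autoregressive entries above concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of the Transformer. The matrices are toy random values, not weights from a real model; shapes and numbers are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query position mixes the value
    vectors, weighted by how well its query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    weights = softmax(scores)        # each row sums to 1: where to "focus"
    return weights @ V, weights

# Toy example: 3 token positions, 4-dimensional vectors (made-up values).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
print(w.sum(axis=1))  # every row of attention weights sums to 1.0
```

In an autoregressive model this computation runs once per generated token, with the new token's query attending over all previous keys and values.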
D — F
- Diffusion: The primary technology behind modern image generators (like Midjourney or DALL-E); the model generates an image by starting from pure random noise and gradually removing that noise, step by step, until a coherent image emerges.
- Distillation (Knowledge Distillation): The process of training a smaller, faster “student” model to mimic the performance of a large, complex “teacher” model.
- Embeddings: Numerical representations of words or concepts. In this “vector space,” words with similar meanings are mathematically close to each other.
- Emergent Behavior: Abilities that appear in large models that were not explicitly programmed or seen in smaller versions of those models.
- Epoch: One full pass of the entire training dataset through the neural network during training.
- Few-Shot Learning: Providing the model with a few examples of a task within the prompt to help it understand how to respond.
- Fine-Tuning: Taking a pre-trained model and training it further on a smaller, specific dataset to make it an expert in a certain field (e.g., medical or legal).
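The Embeddings entry above can be sketched in a few lines. The three-dimensional vectors below are hand-made toy values (real embeddings have hundreds or thousands of dimensions); cosine similarity is the standard way to measure how “close” two embeddings are.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (similar meaning); near 0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy hand-made "embeddings" for illustration only.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

print(cosine_similarity(emb["king"], emb["queen"]))  # high: related concepts
print(cosine_similarity(emb["king"], emb["apple"]))  # lower: unrelated
```

Few-shot learning, by contrast, needs no vectors at all: it is simply a prompt that includes worked examples before the real question.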
G — L
- Gantry: Infrastructure used to manage and deploy AI models at scale.
- Grounding: The practice of linking an AI’s responses to a specific, factual source of information (like a private document) to prevent hallucinations.
- Hallucination: When an LLM confidently generates information that is factually incorrect or nonsensical.
- Inference: The process of the model actually running and generating an output after it has already been trained.
- KV Cache (Key-Value Cache): A technical optimization that stores previous calculations so the model doesn’t have to re-process the entire conversation every time it generates a new word.
- Latent Space: The hidden mathematical “map” where the AI organizes everything it has learned; moving through this space changes the output’s style or content.
- LLM (Large Language Model): An AI trained on massive amounts of text to understand and generate human-like language.
- LoRA (Low-Rank Adaptation): A popular, efficient way to fine-tune models by only changing a tiny fraction of the model’s weights.
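The “tiny fraction of the weights” claim in the LoRA entry is easy to verify with arithmetic. This sketch uses made-up layer sizes (512×512, rank 8) purely to show the parameter-count math; it is not a working fine-tuning implementation.

```python
import numpy as np

# A toy weight matrix for one layer: d_out x d_in (sizes are illustrative).
d_out, d_in, rank = 512, 512, 8
W = np.zeros((d_out, d_in))      # frozen pretrained weights (placeholder)

# LoRA trains only two small low-rank factors A and B; the effective
# weight becomes W + B @ A, while W itself is never updated.
A = np.random.randn(rank, d_in) * 0.01
B = np.zeros((d_out, rank))      # B starts at zero, so W is unchanged at init

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # → 0.03125 (about 3% of the full matrix)
```

On a real multi-billion-parameter model the same idea applies per layer, which is why LoRA adapters are typically megabytes rather than gigabytes.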
M — R
- Multimodal: An AI that can process and generate different types of data, such as text, images, audio, and video, simultaneously.
- NLP (Natural Language Processing): The broader field of computer science focused on the interaction between computers and human language.
- Overfitting: When a model learns its training data too well, memorizing it rather than understanding it, which makes it perform poorly on new, unseen data.
- Parameters: The “knobs and dials” inside a neural network. Generally, more parameters mean a more capable (but more expensive) model.
- Perplexity: A metric used to judge how well a model predicts a sample. Lower perplexity means the model is less “confused” by the data.
- Quantization: Reducing the precision of a model’s numbers (e.g., from 16-bit to 4-bit) to make it run faster and fit on smaller hardware like phones.
- RAG (Retrieval-Augmented Generation): A system where the AI looks up information from a database or the web before answering, helping keep its answers up to date and grounded in real sources.
- RLHF (Reinforcement Learning from Human Feedback): Fine-tuning a model based on humans ranking its answers, helping it learn what humans “prefer” in a response.
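The Perplexity entry above has a one-line formula: the exponential of the average negative log-probability the model assigned to each actual next token. The per-token probabilities below are hypothetical, standing in for two models scoring the same text.

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-probability of the observed tokens).
    Lower perplexity = the model is less "surprised" by the text."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical per-token probabilities from two models on the same text.
confident = [0.9, 0.8, 0.95, 0.85]  # model usually expects the next token
confused  = [0.2, 0.1, 0.3, 0.15]

print(perplexity(confident))  # low: rarely surprised
print(perplexity(confused))   # high: often surprised
```

A model that assigned probability 1.0 to every token would have a perplexity of exactly 1, the theoretical floor.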
S — Z
- Scaling Laws: Empirical observations that show AI performance improves predictably as you increase data, computing power, and model size.
- System Prompt (Metaprompt): The hidden instructions given to an AI at the start of a session that define its personality, rules, and limitations.
- Temperature: A setting that controls the “randomness” of the output. Low temperature (0.1) is focused and factual; high temperature (0.8+) is creative and varied.
- Tokens: The basic units of text an LLM reads. A token is usually about 4 characters or 0.75 of a word.
- Transformer: The specific neural network architecture (invented in 2017) that made modern LLMs possible.
- Weights: The values within a neural network that determine how much importance to give to different inputs. These are “learned” during training.
- Zero-Shot: Asking a model to perform a task it has never seen an example of before, relying entirely on its pre-existing knowledge.
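The Temperature entry above corresponds to a simple transformation: divide the model's raw next-token scores (logits) by the temperature before converting them to probabilities. The logits below are made-up values, just to show how the distribution sharpens or flattens.

```python
import numpy as np

def next_token_distribution(logits, temperature):
    """Divide logits by temperature, then softmax: T < 1 sharpens the
    distribution toward the top token; T > 1 flattens it toward uniform."""
    z = np.array(logits) / temperature
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

print(next_token_distribution(logits, 0.1))  # nearly all mass on top token
print(next_token_distribution(logits, 1.0))  # the model's raw preferences
print(next_token_distribution(logits, 2.0))  # flatter: more varied sampling
```

This is why a low temperature makes output repeatable and focused, while a high one produces more surprising word choices.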
