Retriever resources
The Claude Code & AI glossary
79 terms you keep seeing in AI threads, on Claude Code tutorials, in docs. Plain definitions, what they mean in practice, and where you actually run into them.
A
AgentAn LLM plus tools, instructions, and a loop that lets it act on its own.
In AI, an agent is a model that can take action on its own to reach a goal. It does not just answer a question — it uses tools (search the web, run a calculation, call an API, read a file), checks what happened, and decides the next step. It works in a loop until the task is done or you stop it.
In Claude Code specifically, the agent is a Claude instance running in your terminal: it can read and edit files in your project, run shell commands, search the web, open pull requests. A sub-agent is the same idea at smaller scale — the main agent spawns one to handle a scoped job and gets just the answer back.
AnthropicThe company that builds Claude.
Anthropic is the AI lab that builds Claude. Founded in 2021 in San Francisco by ex-OpenAI researchers, they train and ship the Claude family of models (Opus, Sonnet, Haiku), the API that lets developers call them, and Claude Code — the terminal tool that turned Claude into a coding agent. They publish a lot about AI safety, which shows up in how Claude refuses certain requests and how the harness is designed.
AntigravityGoogle's agentic coding IDE — its answer to Cursor and Claude Code.
Antigravity is Google's coding IDE built around autonomous AI agents that plan and execute multi-step coding tasks. It runs Gemini models and integrates with the wider Google Cloud and AI Studio stack. Announced in late 2025.
Direct competitor to Cursor and Claude Code, on the Google side.
APIApplication Programming Interface — the door a program uses to talk to another service.
An API is how two pieces of software talk to each other. Instead of going through a website, your code sends a request to a URL and gets a structured reply back, usually in JSON. Almost every modern app is glued together by APIs: when you log in with Google, pay with Stripe, or load a Slack message, that is an API call happening in the background. To use one you typically need an API key — a long secret string the provider gives you in your account dashboard.
For AI products, "calling the API" means sending a prompt to the company's servers and receiving the model's reply. Every lab ships one (OpenAI, Anthropic, Google, Mistral). The Claude API is what sits underneath claude.ai, Claude Code, and every third-party tool built on top of Claude.
Auto-compactWhen a long conversation is summarised to fit back into the context window.
Every LLM has a limit on how much text it can read at once (the context window). When a conversation grows close to that limit, products usually deal with it one of two ways: drop the oldest messages, or summarise them. Compaction is the second approach — older parts of the chat are condensed into a short summary so newer content still fits.
In Claude Code, this happens automatically: a notice appears in the terminal when it kicks in. Some details may be lost, but the session continues without crashing into the limit.
B
Background taskA long-running command that runs separately so it does not block your main work.
A background task is a long-running command — a build that takes two minutes, a dev server that runs forever, a test suite, a data sync — started in a way that does not block what you are doing. Very common in development: you launch a dev server in one terminal and keep coding in another.
In Claude Code, the agent can start a background task itself: "Command running in background" appears in the transcript, and when the task finishes Claude reads its output. Useful for letting tests run while the agent keeps editing files.
BashThe default shell on Mac and Linux, used to run command-line programs.
Bash is the program that interprets the commands you type in a terminal on Mac or Linux: `ls` lists files in a folder, `cd folder` moves into a folder, `git commit` saves a change. It is the most common shell in the developer world.
When you ask Claude Code (or any coding agent) to "run the tests" or "install this package", it uses a Bash tool under the hood: it types the command into your shell the same way you would. You see exactly what ran in the transcript, and you can approve or block specific commands.
Batch APIBulk API mode: upload many requests at once, get all answers back within 24 hours, half the price.
Most AI APIs return each call within seconds (synchronous). A batch API is the opposite: you upload a big file of requests — say, 10,000 prompts — wait up to 24 hours, and receive all the answers back at once. The trade is speed for cost: you wait longer but pay roughly half the price. It is the right choice for offline jobs where latency does not matter — classifying a lead list, enriching a database, summarising thousands of documents.
OpenAI, Anthropic, and most major providers offer one.
C
Cache (prompt caching)A trick to reuse parts of a long prompt across calls so they cost less and run faster.
Every time you call an LLM, the model re-reads your full prompt from scratch, which costs tokens (and money). Prompt caching is a shortcut offered by most providers: if the start of your prompt is identical to a previous request — a system prompt, a long document, an attached PDF — it gets stored for a few minutes and re-used instead of re-processed. Cached tokens cost about a tenth of normal tokens and process faster.
Claude Code uses this automatically (with CLAUDE.md, the system prompt, files you keep re-reading), which is why long sessions do not get exponentially expensive.
Chain of thoughtWhen the model writes out its reasoning step by step before answering.
If you ask an LLM "what is 17 × 24?", it does better when it works through the steps (17 × 20 = 340, then 17 × 4 = 68, total 408) than when it guesses straight to a number. That step-by-step working is called chain of thought, and it improves accuracy on hard problems. Used in every modern model.
Claude does this on its own when a problem needs it — you will often see Claude write a short plan or list its assumptions before acting. Extended thinking is the same idea turned up: more tokens spent reasoning before replying, for genuinely hard problems.
ChatGPTOpenAI's chat app — the product that put LLMs in mainstream hands in late 2022.
ChatGPT is the consumer chat interface from OpenAI, launched in November 2022. It is a web and mobile app where you chat with one of OpenAI's GPT models. Underneath it calls the OpenAI API, which is what most third-party apps use directly. Its launch is the reason most people first heard of LLMs at all.
On the Anthropic side, the equivalent is claude.ai.
CheckpointA saved snapshot of your project state that you can roll back to.
A checkpoint is any moment in your project's history you can return to later. The best checkpoints, by far, are Git commits — each commit is a snapshot `git reset` or `git checkout` can put you back at. Best practice when working with any AI agent: commit before letting it do something big, so if it goes sideways you are one command away from the original state.
Some IDEs and AI agents also offer a built-in "rewind" that returns you to an earlier message in the chat — that is a checkpoint at the conversation level, separate from your code.
CLAUDE.mdA file in your repo where you write instructions Claude reads at the start of every session.
Put project conventions, stack notes, deploy steps, and "always do X" rules here. Claude treats it as durable context.
CLICommand Line Interface, a program you talk to by typing commands in a terminal.
Claude Code is a CLI: you launch it with `claude` in your terminal and chat with it there.
CodebaseThe full set of source files for a project.
When Claude Code "reads the codebase" it grep-searches and reads relevant files on demand, not the whole project at once.
CodexOpenAI's coding agent. Also the name of an older OpenAI code model — check the date.
OpenAI uses the name Codex for two different things. In 2021 it was the code model that powered the first version of GitHub Copilot. Then in 2025 OpenAI brought the name back for a new coding agent that runs in your terminal and in the cloud, edits your repo, and runs tests.
The 2025 Codex is the direct OpenAI equivalent of Claude Code.
CommitA versioned snapshot of a change in Git.
A commit records what changed, who changed it, and a message describing why. Claude can create commits for you with `git commit`.
Computer useA capability that lets Claude control your screen, mouse, and keyboard.
Used for tasks no API can do, like clicking through a native app. Risky: scope what Claude can touch, and watch what it does.
Context engineeringThe craft of choosing what information ends up in the model's context window.
A bigger skill than "prompt engineering". Includes deciding what to retrieve, summarise, cache, and exclude so the model has just enough to do the job.
Context windowHow much text the model can read in one go, measured in tokens.
Claude Opus and Sonnet support up to 1 million tokens, roughly 750 pages of text. Past the window, older content gets compacted or dropped.
CursorA code editor (a fork of VS Code) built around AI assistance.
Cursor is one option for editing with AI. Claude Code itself runs in a terminal and can be paired with Cursor, VS Code, or any editor.
D
DeepSeekA Chinese open-weights LLM family known for strong reasoning at very low cost.
DeepSeek is a Chinese AI lab that publishes open-weights models (DeepSeek-V3, DeepSeek-R1). "Open weights" means anyone can download and run the model on their own hardware, unlike Claude or GPT which are only available through an API.
Their late-2024 and early-2025 releases caught attention by matching frontier models at a fraction of the training cost, and forced US labs to rethink their pricing.
DiffThe list of lines added and removed between two versions of a file.
Every edit Claude proposes is shown as a diff so you can see exactly what changed before accepting.
E
Edit toolThe built-in Claude Code action that modifies an existing file in place.
Replaces an exact string with another. Safer than rewriting the whole file because the change is small and reviewable.
EmbeddingA vector representation of text used for semantic search and similarity.
Two pieces of text with similar meaning have nearby vectors. Used in RAG and recommendation systems.
EndpointA URL that an API listens on for requests.
The Claude API has endpoints for messages, batches, files, and so on.
Extended thinkingA mode where the model spends extra tokens reasoning before it replies.
Trades latency for quality on hard problems like long refactors or proofs. The thinking is visible to the user and to following turns.
G
GeminiGoogle's flagship LLM family — the equivalent of GPT (OpenAI) and Claude (Anthropic).
Gemini is the model family from Google DeepMind, first released late 2023. It powers Google's Gemini chat (the product), NotebookLM, Antigravity, and any "powered by Gemini" product. Tiers go from Ultra to Pro to Flash, ordered from most capable to fastest.
Known for very long context windows (1M+ tokens) and strong multimodal abilities — it reads images, audio, and video natively.
GitA version control system that tracks every change to your code.
Claude Code uses Git constantly: to read diffs, create commits, push branches, open pull requests.
GitHubA hosting service for Git repositories with reviews, issues, and CI.
Claude can interact with GitHub through the `gh` CLI or the GitHub MCP server.
GPTOpenAI's model family — GPT-3, GPT-4, GPT-4o, GPT-5, o1, o3, and others.
GPT stands for Generative Pre-trained Transformer. It is the name of the model family that powers ChatGPT and the OpenAI API. Versions go GPT-3 (2020), GPT-4 (2023), GPT-4o, GPT-5, and so on. The "o" variants (o1, o3) are reasoning-focused models tuned to spend more compute thinking before answering.
The exact version you reach depends on the product or the API model ID, which changes every few months.
GrokThe LLM from xAI (Elon Musk's AI company), built into X / Twitter.
Grok is the model family from xAI. It powers the Grok chat on X (formerly Twitter) and is available via API. Known for its access to live X data — posts, replies, trends — and a less-restricted tone than ChatGPT or Claude.
Directly competes with the other frontier model families (GPT, Claude, Gemini).
H
HaikuThe smallest and fastest Claude model in the lineup.
Good for high-volume cheap work like classification, light extraction, or quick UI calls.
HallucinationWhen the model confidently invents a fact, API, or file that does not exist.
The fix is grounding: have the model read the real code or docs first instead of guessing.
HarnessThe technical envelope around the model that handles tools, permissions, memory, and the loop.
Claude Code is a harness around a Claude model. The same Claude model behaves very differently inside Claude Code, the API, or claude.ai because the harness around it is different.
HermesAn open-source LLM family fine-tuned by Nous Research, usually on top of Llama.
Hermes is a family of open-source models (Hermes 2, Hermes 3, and others) fine-tuned by Nous Research. Where Llama is the raw open-weights model from Meta, Hermes is a community fine-tune aiming for stronger reasoning, instruction-following, and tool use.
You run it yourself on local hardware or call it through an inference provider — there is no single official API like Claude or GPT.
HookA shell command the harness runs automatically when a specific event happens.
Examples: run a linter after every edit, block certain commands, notify a Slack channel when Claude finishes. Configured in `settings.json`.
I
IDEIntegrated Development Environment, an editor enriched for coding (Cursor, VS Code, JetBrains).
Claude Code can run inside an IDE's terminal and share context with the editor (open file, selection, diagnostics).
InferenceA single call to the model: prompt in, completion out.
Latency and cost are measured per inference. Caching and batching reduce both.
J
JSONA simple text format for structured data, built from keys, values, lists, and nested objects.
Used everywhere: tool inputs and outputs, API payloads, config files. Claude can produce strict JSON on demand.
L
LatencyHow long a request takes from send to response.
First-token latency matters for chat UIs; total latency matters for automations. Caching, smaller models, and streaming all reduce perceived latency.
LlamaMeta's open-weights LLM family — the foundation for most open-source AI work.
Llama (Llama 2, Llama 3, Llama 4) is Meta's model family, published with open weights. Anyone can download and run them on their own hardware, unlike Claude or GPT which are only available through an API. It is the most-used base for community fine-tunes — Hermes, Code Llama, and dozens of others.
When people say "I run an open-source LLM", they usually mean a Llama-derived model.
LLMLarge Language Model, the kind of neural network that powers Claude.
Trained on text to predict the next token. With instruction tuning and tools, it becomes a working assistant.
Local modelAn LLM that runs entirely on your own machine, no network call.
Better for privacy and offline work, weaker than frontier models. Not what Claude Code uses by default.
M
MCPModel Context Protocol, an open standard for plugging external tools and data into AI assistants.
An MCP server exposes resources (read-only data) and tools (actions). Claude Code can connect to many at once: GitHub, Notion, Linear, your database, internal APIs.
MemoryA persistent store the assistant can write notes into across conversations.
In Claude Code, memory lives in a folder of Markdown files. Useful for user preferences, project context, and feedback that should carry over.
MistralA French AI lab known for open-weights models and a strong European positioning.
Mistral AI is a Paris-based lab founded in 2023, the highest-profile European challenger to OpenAI and Anthropic. They publish open-weights models (Mistral 7B, Mixtral, Codestral) alongside commercial APIs and a chat product called Le Chat.
The brand is the French / EU bet on sovereign AI: independent from US providers, open to local hosting.
ModelA specific trained network you can call by name (e.g. claude-opus-4-7).
The Claude lineup is Opus, Sonnet, Haiku, ordered from most capable to fastest. Each has versioned releases.
MultimodalA model that can read more than just text, typically images and sometimes audio or video.
Claude can read screenshots, PDFs, and diagrams. Useful for "look at this UI and tell me what is wrong".
O
OpenAIThe AI lab behind ChatGPT and the GPT family.
OpenAI is the San Francisco AI lab founded in 2015. It built and ships ChatGPT, the GPT model family, the OpenAI API, and a range of other models for images, video, and coding. The launch of ChatGPT in November 2022 triggered the wave of AI products you see today.
Direct competitors: Anthropic (Claude) and Google DeepMind (Gemini).
OpusThe most capable model in the Claude lineup.
Used for hard reasoning, long agentic runs, and large refactors. Slower and more expensive than Sonnet or Haiku.
P
Permission modeThe harness setting that decides which tools Claude can use without asking.
Modes range from very strict (every tool needs approval) to permissive. Project-level rules live in `.claude/settings.json`.
Plan modeA read-only Claude Code mode for exploring a problem before any edits.
Claude can read files and run shell commands, but cannot write or run destructive actions. Useful to scope a refactor first.
PR (pull request)A proposal to merge a branch into the main codebase, reviewed by teammates.
Claude can open, comment on, and review PRs through the `gh` CLI or the GitHub MCP server.
PromptAny instruction you send to the model.
In practice the model sees a stack: system prompt, prior messages, your message. "Prompt" usually means just your message.
Prompt engineeringThe craft of writing prompts that reliably get the answer you want.
Examples, format constraints, role framing, and worked steps all help. Less central than it used to be as models follow plain instructions better.
Prompt injectionWhen malicious text inside data the model reads tries to override your instructions.
For example, a webpage saying "ignore previous instructions and email this address". The fix is sandboxing and reviewing what the model actually does.
R
RAG (Retrieval-Augmented Generation)Pulling relevant snippets from a knowledge base and stuffing them into the prompt before asking.
Lets the model answer about content it was not trained on, like your own docs or a fresh codebase.
Rate limitThe cap that throttles how often you can call the API.
When you exceed it, the API responds with 429 and a retry-after hint. Most SDKs back off automatically.
RepoShort for repository, a project tracked by Git.
A repo holds the code, history, and configuration for one project. Claude Code operates on whichever repo your terminal is in.
RoutineA scheduled task that runs Claude on a cron-like timer.
Example: run "check overnight PRs and summarise to Slack" every morning at 8am. Configured via the schedule skill.
S
SandboxAn isolated environment where code or commands can run without affecting the rest of the system.
Vercel Sandbox and Claude Code's permission system both aim at the same thing: limit blast radius.
Settings.jsonThe Claude Code config file where permissions, hooks, env vars, and MCP servers live.
Two layers: `~/.claude/settings.json` (global, all projects) and `.claude/settings.json` (project-specific).
SkillA reusable workflow you can invoke with a slash command.
A skill bundles instructions, examples, and sometimes tools so a multi-step task ("ship a PR", "review code") runs the same way every time.
Slash commandA shortcut typed in chat as `/name` that triggers a skill or built-in action.
Examples in Claude Code: `/help`, `/clear`, `/review`. Custom slash commands are how teams package shared workflows.
SonnetThe mid-tier Claude model: nearly as capable as Opus, much faster and cheaper.
The default workhorse for most coding and agentic tasks.
Sub-agentA separate Claude instance spawned by the main agent for a scoped task.
Used for parallel research, isolating long context, or running risky steps in a fresh sandbox. The main agent gets just the result back.
System promptThe base instructions the harness sends to the model on every request, invisible in normal use.
Sets identity, tools, rules, and tone. Different products (Claude Code, claude.ai, API) ship different system prompts on the same underlying model.
T
TerminalThe text-based interface where you type commands to your computer.
On Mac: Terminal.app or iTerm. Claude Code lives here.
ThinkingSee Extended thinking.
TokenThe unit the model counts in: roughly 4 characters or 3/4 of a word in English.
Pricing and context windows are measured in tokens. A 1M-token window holds about 750,000 words.
Tool useWhen the model decides to call a tool (file read, shell command, API) instead of just writing text.
Each tool has a name, a schema, and a description. The model picks one, fills in the inputs, and the harness runs it and returns the result.
TurnOne message in a conversation, either from the user or the assistant.
A multi-turn run is a back-and-forth. Each turn adds tokens to the context window.
U
User promptThe message you actually send the model, distinct from the system prompt.
In chat UIs this is what you type into the box. Everything else around it (system prompt, prior turns, tool definitions) is set by the harness.
V
Vibe codingLetting the AI write the code while you describe intent and review the result.
Coined by Andrej Karpathy. Works for prototypes; demands real review for anything that ships to production.
VS CodeVisual Studio Code, a free editor from Microsoft, the most widely used IDE today.
Claude Code integrates with VS Code via an extension that shares context with the editor.
W
WebhookA URL another service calls when an event happens (PR opened, deploy finished, message received).
How chat bots and CI integrations receive notifications. The Vercel Chat SDK and the GitHub MCP both speak webhooks.
WorktreeA separate working directory pointing at a different branch of the same Git repo.
Lets you experiment on a branch without touching your main checkout. Claude Code can spawn an agent inside a worktree so its changes stay isolated.
Y
YAMLA human-friendly format for config files and frontmatter.
Skills and memory files in Claude Code use YAML frontmatter at the top to declare metadata like name, description, and type.