What's the difference between a checkpoint and a LoRA?

A checkpoint is the full base model — billions of parameters trained from scratch, gigabytes on disk. A LoRA is a small adapter that nudges a checkpoint toward a specific style or character, typically tens to hundreds of megabytes. You can stack many LoRAs on one checkpoint; you only run one checkpoint at a time.

CFG (classifier-free guidance) scale controls how closely the model tries to follow your prompt versus how much creative latitude it takes. Low CFG (3 to 5) gives looser, more creative output; high CFG (10 to 15) sticks closer to the prompt but can over-saturate or look artificial. 6 to 8 is the practical sweet spot for most images.

Why do I need a negative prompt?

A negative prompt tells the model what to push away from — common artifacts (extra fingers, distorted faces, watermarks), unwanted styles, or specific elements you do not want. It is not strictly required, but a short negative prompt with the most common failure modes (low quality, bad anatomy, blurry) routinely improves output.

An illustrated AI dictionary spread showing terminology related to image generation and chat models.

AI Image and Chat Glossary — Terms Explained

Q: What is multimodal AI?

Multimodal AI is a model that handles more than one kind of input or output — text, images, audio, video. GPT-4o, Claude 3, Gemini are multimodal — they accept images alongside text and reason across both. The term distinguishes them from single-mode models like a pure text-to-image generator.

Charmloop Team· Editorial

May 28, 202616 min read

AI image generation and AI character chat have their own vocabulary, and it grew faster than any glossary could keep up with. This page is the practical one — every term you will run into on a settings panel, in a tutorial, in a Reddit thread, or in another guide, with two or three sentences explaining what it means and why you actually care. Skim it, search it, bookmark it.

The terms are organized alphabetically. Where a term is interchangeable with another (denoising / denoise / strength, for instance), the entry sits under the most common form.

A

Attention / weight syntax

A way of telling the model that certain parts of a prompt matter more or less than others. In most Stable Diffusion-based tools, (word) increases attention on a token, ((word)) increases it more, and [word] decreases it. Used to emphasize specific traits — "(red hair:1.3)" is a common form.

Autoregressive

A model that generates output one piece at a time, conditioned on what came before. Language models like GPT and Claude are autoregressive — each token is predicted from the tokens before it. Image generators usually are not (they generate in parallel), which is part of why image generation is faster than long-form text generation.

B

Base model

The trained foundation model that everything else builds on. Stable Diffusion 1.5, SDXL, Flux, DALL-E 3, GPT-4, Claude 3 — these are base models. LoRAs, fine-tunes, and prompts all modify the behavior of a base model rather than replacing it.

Batch generation

Generating multiple images in a single run, usually from the same prompt with different seeds. Useful for picking the best output of several attempts. Most tools cap batch size by tier — free tiers might allow 1 to 4 at a time, paid tiers more.

C

CFG scale (classifier-free guidance)

A number — usually between 1 and 20 — that controls how closely the model follows your prompt. Low values produce looser, more creative output; high values stick rigidly to the prompt but can over-saturate. 6 to 8 is the practical sweet spot. CFG is the single most useful slider to experiment with.

Checkpoint

A trained model file containing the full set of model weights, usually gigabytes on disk. SD 1.5 is a checkpoint. SDXL is a checkpoint. So is any community-trained fine-tune of either. You load one checkpoint at a time; LoRAs, embeddings, and other adapters layer on top of it.

Compaction

The process of summarizing earlier turns of a chat to fit within the context window when the conversation grows long. Almost every chat platform uses some form of compaction once you cross a few thousand tokens. Done well, compaction preserves the things that matter (key facts, ongoing scenarios). Done poorly, it loses them.

Context window

The maximum amount of text — prompt + history + system instructions + response — that a chat model can attend to in a single turn. Modern models range from 8K to 1M+ tokens. Bigger context windows make long conversations feel more continuous; they also cost more per token.

ControlNet

A family of conditioning extensions for image generators that lets you specify structure alongside the prompt — pose, edges, depth, segmentation. Hand the model a stick-figure pose and a prompt, get an image of a character in that pose. Powerful for composing specific shots; common in production workflows.

CSAM

Child sexual abuse material. Universally blocked across every legitimate AI platform; the platforms that allow adult content all explicitly prohibit it. Detection systems (PhotoDNA, classifier-based filters) operate on inputs and outputs across the industry. Mentioned here so the line is clear: no platform that calls itself "uncensored" makes an exception for this.

D

DALL-E

OpenAI's image generation family. DALL-E 3, integrated into ChatGPT, is the version most users encounter in 2026. Known for strong prompt understanding and scene composition; SFW-only.

Denoise / denoising strength

For image-to-image generation, the amount the model is allowed to change the input image. 0.1 is a tiny nudge; 0.9 is "use the input as loose inspiration." Sometimes called "strength." Most useful for img2img workflows where you want to refine an existing image rather than start from scratch.

Diffusion

The underlying technique behind most modern image generators. The model learns to remove noise from images; generation works by starting with pure noise and iteratively denoising toward the prompted result. Most "AI image generator" tools in 2026 are diffusion models under the hood.

E

Embedding (or textual inversion)

A small file — kilobytes to a few megabytes — that teaches the model a specific concept, style, or token by training a custom embedding vector. Less powerful than a LoRA but smaller and faster to train. Often used for specific styles or character likenesses.

ERP

Erotic roleplay. A common term in AI chat communities for adult roleplay scenarios. Platforms that allow adult content explicitly support it; SFW-leaning platforms block it via safety classifiers regardless of user age.

F

Fine-tune

A retraining of a base model on additional data to specialize it. Civitai is full of fine-tuned checkpoints — an anime fine-tune, a photorealistic fine-tune, a Studio-Ghibli-style fine-tune. Fine-tunes are full checkpoints; LoRAs are a lighter alternative for similar use cases.

Flux

A model family released in 2024 by Black Forest Labs, notable for strong prompt adherence and clean output. By 2026, Flux-derived checkpoints are widely used across hosted and self-hosted workflows; the family is often discussed alongside SDXL as one of the two dominant open-weight options.

FPS (in video)

Frames per second in AI-generated video. Most consumer I2V tools output at 24 to 30 fps. Higher fps means smoother motion; longer-clip generations sometimes trade fps for length.

G

GPU

The hardware that runs AI inference. Image generation and chat both depend on GPUs (Nvidia's H100, A100, RTX 4090, 5090, and various data-center-only variants are the names you will see). When a service is slow, it is usually because the GPU pool is saturated.

H

Hallucination

When a chat model produces output that is confidently wrong — a wrong fact, an invented citation, a fabricated quote. Image models hallucinate too — extra fingers, impossible jewelry, doors that lead nowhere. Hallucination is not a bug to be patched; it is intrinsic to how these models work. Plan around it.

I

I2V (image-to-video)

A video generation workflow where you provide a starting image and the model animates it forward in time. The opposite of T2V (text-to-video), which generates both subject and motion from a prompt alone. I2V is more controllable and produces higher quality on most subjects; see the I2V workflow guide for the full flow.

Inpainting

Re-generating a specific masked region of an image while leaving the rest untouched. Used for fixing flaws (a warped hand, a strange face), editing details (changing an outfit), or extending elements. Most production AI workflows include inpainting as a regular cleanup step.

IP-Adapter

A conditioning extension that lets you provide a reference image alongside your prompt. The model uses the reference to influence the output — particularly useful for character consistency, where you provide a face reference and ask the model to keep it across generations.

L

Latent space

The compressed mathematical representation of images that diffusion models actually work in. You do not interact with latent space directly; it is the room the model thinks in. Most image generators encode the input to latent space, manipulate it, and decode it back to pixels at the end. Mentioned here because tutorials will mention it and the term sounds more mysterious than it is.

LoRA (Low-Rank Adaptation)

A small adapter file that nudges a checkpoint toward a specific style, character, concept, or aesthetic without retraining the full model. LoRAs are tens to hundreds of megabytes; checkpoints are gigabytes. You can stack multiple LoRAs at varying strengths to combine effects.

M

Memory

In chat contexts, persistent recall of facts and context across sessions. Different from context window — memory survives session boundaries, context window does not. Implementation varies wildly: some platforms summarize and store explicitly; some retrieve from a vector store; some claim memory without much of one underneath.

Midjourney

A subscription-based image generator known for high-quality, aesthetically polished output out of the box. Strong on cinematic and stylized work; tight constraints on workflow; SFW-only. One of the most-named tools in any AI image conversation.

Multimodal

A model that handles more than one kind of input or output — text, images, audio, video. GPT-4o, Claude 3, Gemini are multimodal — they accept images alongside text and reason across both. The term distinguishes them from single-mode models like a pure text-to-image generator.

N

Negative prompt

The list of things you do not want in the output. "low quality, bad anatomy, extra fingers, watermark" is a typical starter set. The negative prompt is often the difference between a tool that produces consistent quality and one that produces visible artifacts on a fraction of generations.

NSFW

Not safe for work — content involving adult themes, including but not limited to sexual content. Tools and platforms have explicit policies. Some allow it (Charmloop, Civitai with the right filter settings, several smaller services); most do not (Midjourney, DALL-E, Firefly, ImageFX). The tools that allow it have hard lines — CSAM, non-consensual real-person deepfakes, and other clearly-illegal categories are universally blocked.

NSFW filter

A safety classifier that runs on inputs, outputs, or both to detect and block adult content. SFW-leaning platforms run aggressive filters; adult-permitting platforms run targeted filters (CSAM, real-person deepfakes) without the broad adult-content block.

O

Outpainting

Extending an image beyond its original borders. The opposite of cropping — you give the model an image and ask it to imagine what would be just outside the frame. Useful for converting square outputs to widescreen, or for generating environments around a centered character.

P

Prompt

The text you write to describe the image or response you want. The single most impactful thing you control over the output.

PuLID / InstantID

Face-preservation techniques that let the model lock onto a specific face from a reference image and reproduce it across generations. More aggressive than IP-Adapter for facial identity; usually gated to higher tiers because they cost more GPU time per generation.

R

RAG (Retrieval-Augmented Generation)

A pattern where a chat model retrieves relevant documents or facts from a database before responding, rather than relying purely on its training. Used to give chat models access to up-to-date information, company-specific knowledge, or persistent memory.

RLHF (Reinforcement Learning from Human Feedback)

A training step where human raters review model outputs and the model learns to produce more outputs like the highly-rated ones. The dominant alignment technique for chat models in 2023–25; newer approaches (DPO, RLAIF, constitutional AI) extend or replace parts of it.

S

Safety filter / safety classifier

A separate model that runs alongside the main generator to detect and block specific kinds of outputs — adult content, violence, hate, CSAM, real-person likenesses. Different platforms run different filters; the philosophy differs across the industry.

Sampler

The algorithm that controls how the diffusion model steps from noise to output. Names you will see: Euler, Euler a, DPM++, DPM++ 2M Karras, UniPC. Different samplers produce slightly different outputs at the same prompt and seed. The default sampler is usually fine; experiment if you want to dial in a specific look.

Seed

A number that controls the random starting noise for an image generation. The same prompt with the same seed produces (approximately) the same image on the same model. Useful for reproducibility — save the seed of an image you like and you can regenerate variants from the same starting point.

Steps

The number of denoising iterations the model runs from pure noise to final output. More steps = generally higher quality, but with diminishing returns past 30 to 40 on most modern samplers. 20 to 30 is the typical sweet spot.

System prompt

The instructions given to a chat model that set its persona, behavior, and constraints. The model receives the system prompt before the user's messages; it shapes everything that follows. Persona cards in AI character platforms are an elaborated form of system prompt.

T

T2I / T2V

Text-to-image and text-to-video. The generation modes where you provide only a prompt. Contrast with I2I (image-to-image) and I2V (image-to-video), where you also provide a starting image.

Token

The unit of text a language model processes. Roughly one token equals 0.75 English words. Pricing on most chat APIs is per-token, in and out. Context windows are measured in tokens. The platform's in-product currency is sometimes also called tokens but is unrelated — Charmloop calls its in-product currency "charms" specifically to avoid this confusion.

U

Upscaler

A model or algorithm that increases the resolution of an image. AI upscalers (ESRGAN, SwinIR, and various commercial versions) intelligently add detail rather than just stretching pixels. Most image generators max out around 1024x1024 native and upscale from there; the upscale step matters more than the native maximum.

Uncensored

In chat and image generation contexts, "uncensored" usually means the platform does not run a broad adult-content filter. It does not mean "no rules" — every legitimate platform that uses this term still blocks CSAM, non-consensual deepfakes, and other clearly-illegal categories. See the ai chat without filter guide for the longer treatment of what the term actually denotes in 2026.

V

VAE (Variational Autoencoder)

The component of a diffusion model that translates between pixel space (what you see) and latent space (where the model thinks). A different VAE can produce subtly different colors and details from the same checkpoint. Usually you do not touch the VAE; sometimes a specific model recommends a paired VAE.

Vector store / embedding store

A database that stores text or images as high-dimensional vectors, allowing similarity search. The backbone of most RAG systems and many AI memory implementations. Pinecone, Weaviate, Qdrant, Chroma are common names.

W

Weights

The learned parameters of a trained model — billions of numbers that encode what the model knows. Used interchangeably with "model" in casual speech. "Open-weight" models are ones where the weights are publicly downloadable; "closed-weight" or "proprietary" models keep the weights private.

Z

Zero-shot

The model performing a task it was never explicitly trained on, based purely on the prompt. Modern chat and image models are extensively zero-shot capable — you can describe a novel character or scenario and the model handles it without prior training on that exact thing.

That is the practical glossary. If a term you ran into is missing, the honest guide to choosing an AI image generator and the AI image prompts guide cover the most common ones in workflow context. For chat-specific terminology around memory and continuity, the AI chat with memory guide goes deeper on the practical side. And for the umbrella technical concepts that span both, the LoRA guide is the closest deep-dive on the adapter side of model customization.

The vocabulary moves fast; some of these terms will be outdated in twelve months and there will be five new ones to know. The shape of what each describes — a knob on the model, a piece of the pipeline, a category of content, a workflow pattern — stays stable even as the names rotate.

Поширені запитання

Почніть творити

Подивіться, що може згенерувати Charmloop

Генерація AI-зображень студійної якості. Картка не потрібна.

Спробувати студію безкоштовно Переглянути персонажів

Схожі статті

A grid of stylistically varied AI-generated images representing different image generator outputs.

Image Generation

The terms are organized alphabetically. Where a term is interchangeable with another (denoising / denoise / strength, for instance), the entry sits under the most common form.

A

Attention / weight syntax

Autoregressive

B

Base model

Batch generation

C

CFG scale (classifier-free guidance)

Checkpoint

Compaction

Context window

ControlNet

CSAM

D

DALL-E

OpenAI's image generation family. DALL-E 3, integrated into ChatGPT, is the version most users encounter in 2026. Known for strong prompt understanding and scene composition; SFW-only.

Denoise / denoising strength

Diffusion

E

Embedding (or textual inversion)

ERP

F

Fine-tune

Flux

FPS (in video)

Frames per second in AI-generated video. Most consumer I2V tools output at 24 to 30 fps. Higher fps means smoother motion; longer-clip generations sometimes trade fps for length.

G

GPU

H

Hallucination

I

I2V (image-to-video)

Inpainting

IP-Adapter

L

Latent space

LoRA (Low-Rank Adaptation)

M

Memory

Midjourney

Multimodal

N

Negative prompt

NSFW

NSFW filter

O

Outpainting

P

Prompt

The text you write to describe the image or response you want. The single most impactful thing you control over the output.

PuLID / InstantID

R

RAG (Retrieval-Augmented Generation)

RLHF (Reinforcement Learning from Human Feedback)

S

Safety filter / safety classifier

Sampler

Seed

Steps

System prompt

T

T2I / T2V

Text-to-image and text-to-video. The generation modes where you provide only a prompt. Contrast with I2I (image-to-image) and I2V (image-to-video), where you also provide a starting image.

AI Image and Chat Glossary — Terms Explained

Поширені запитання

What's the difference between a checkpoint and a LoRA?

What is CFG scale?

Why do I need a negative prompt?

What is multimodal AI?

Подивіться, що може згенерувати Charmloop

Схожі статті

Honest Guide to Choosing an AI Image Generator

How to Write AI Image Prompts That Work

What Is a LoRA in AI Image Generation?

AI Chat With Memory — What It Means

AI Image and Chat Glossary — Terms Explained

A

Attention / weight syntax

Autoregressive

B

Base model

Batch generation

C

CFG scale (classifier-free guidance)

Checkpoint

Compaction

Context window

ControlNet

CSAM

D

DALL-E

Denoise / denoising strength

Diffusion

E

Embedding (or textual inversion)

ERP

F

Fine-tune

Flux

FPS (in video)

G

GPU

H

Hallucination

I

I2V (image-to-video)

Inpainting

IP-Adapter

L

Latent space

LoRA (Low-Rank Adaptation)

M

Memory

Midjourney

Multimodal

N

Negative prompt

NSFW

NSFW filter

O

Outpainting

P

Prompt

PuLID / InstantID

R

RAG (Retrieval-Augmented Generation)

RLHF (Reinforcement Learning from Human Feedback)

S

Safety filter / safety classifier

Sampler

Seed

Steps

System prompt

T

T2I / T2V

Token

U

Upscaler

Uncensored

V

VAE (Variational Autoencoder)

Vector store / embedding store

W