Поширені запитання
Почніть творити
Подивіться, що може згенерувати Charmloop
Генерація AI-зображень студійної якості. Картка не потрібна.


Почніть творити
Генерація AI-зображень студійної якості. Картка не потрібна.
AI image generation and AI character chat have their own vocabulary, and it grew faster than any glossary could keep up with. This page is the practical one — every term you will run into on a settings panel, in a tutorial, in a Reddit thread, or in another guide, with two or three sentences explaining what it means and why you actually care. Skim it, search it, bookmark it.
The terms are organized alphabetically. Where a term is interchangeable with another (denoising / denoise / strength, for instance), the entry sits under the most common form.
A way of telling the model that certain parts of a prompt matter more or less than others. In most Stable Diffusion-based tools, (word) increases attention on a token, ((word)) increases it more, and [word] decreases it. Used to emphasize specific traits — "(red hair:1.3)" is a common form.
A model that generates output one piece at a time, conditioned on what came before. Language models like GPT and Claude are autoregressive — each token is predicted from the tokens before it. Image generators usually are not (they generate in parallel), which is part of why image generation is faster than long-form text generation.
The trained foundation model that everything else builds on. Stable Diffusion 1.5, SDXL, Flux, DALL-E 3, GPT-4, Claude 3 — these are base models. LoRAs, fine-tunes, and prompts all modify the behavior of a base model rather than replacing it.
Generating multiple images in a single run, usually from the same prompt with different seeds. Useful for picking the best output of several attempts. Most tools cap batch size by tier — free tiers might allow 1 to 4 at a time, paid tiers more.
A number — usually between 1 and 20 — that controls how closely the model follows your prompt. Low values produce looser, more creative output; high values stick rigidly to the prompt but can over-saturate. 6 to 8 is the practical sweet spot. CFG is the single most useful slider to experiment with.
A trained model file containing the full set of model weights, usually gigabytes on disk. SD 1.5 is a checkpoint. SDXL is a checkpoint. So is any community-trained fine-tune of either. You load one checkpoint at a time; LoRAs, embeddings, and other adapters layer on top of it.
The process of summarizing earlier turns of a chat to fit within the context window when the conversation grows long. Almost every chat platform uses some form of compaction once you cross a few thousand tokens. Done well, compaction preserves the things that matter (key facts, ongoing scenarios). Done poorly, it loses them.
The maximum amount of text — prompt + history + system instructions + response — that a chat model can attend to in a single turn. Modern models range from 8K to 1M+ tokens. Bigger context windows make long conversations feel more continuous; they also cost more per token.
A family of conditioning extensions for image generators that lets you specify structure alongside the prompt — pose, edges, depth, segmentation. Hand the model a stick-figure pose and a prompt, get an image of a character in that pose. Powerful for composing specific shots; common in production workflows.
Child sexual abuse material. Universally blocked across every legitimate AI platform; the platforms that allow adult content all explicitly prohibit it. Detection systems (PhotoDNA, classifier-based filters) operate on inputs and outputs across the industry. Mentioned here so the line is clear: no platform that calls itself "uncensored" makes an exception for this.
OpenAI's image generation family. DALL-E 3, integrated into ChatGPT, is the version most users encounter in 2026. Known for strong prompt understanding and scene composition; SFW-only.
For image-to-image generation, the amount the model is allowed to change the input image. 0.1 is a tiny nudge; 0.9 is "use the input as loose inspiration." Sometimes called "strength." Most useful for img2img workflows where you want to refine an existing image rather than start from scratch.
The underlying technique behind most modern image generators. The model learns to remove noise from images; generation works by starting with pure noise and iteratively denoising toward the prompted result. Most "AI image generator" tools in 2026 are diffusion models under the hood.
A small file — kilobytes to a few megabytes — that teaches the model a specific concept, style, or token by training a custom embedding vector. Less powerful than a LoRA but smaller and faster to train. Often used for specific styles or character likenesses.
Erotic roleplay. A common term in AI chat communities for adult roleplay scenarios. Platforms that allow adult content explicitly support it; SFW-leaning platforms block it via safety classifiers regardless of user age.
A retraining of a base model on additional data to specialize it. Civitai is full of fine-tuned checkpoints — an anime fine-tune, a photorealistic fine-tune, a Studio-Ghibli-style fine-tune. Fine-tunes are full checkpoints; LoRAs are a lighter alternative for similar use cases.
A model family released in 2024 by Black Forest Labs, notable for strong prompt adherence and clean output. By 2026, Flux-derived checkpoints are widely used across hosted and self-hosted workflows; the family is often discussed alongside SDXL as one of the two dominant open-weight options.
Frames per second in AI-generated video. Most consumer I2V tools output at 24 to 30 fps. Higher fps means smoother motion; longer-clip generations sometimes trade fps for length.
The hardware that runs AI inference. Image generation and chat both depend on GPUs (Nvidia's H100, A100, RTX 4090, 5090, and various data-center-only variants are the names you will see). When a service is slow, it is usually because the GPU pool is saturated.
When a chat model produces output that is confidently wrong — a wrong fact, an invented citation, a fabricated quote. Image models hallucinate too — extra fingers, impossible jewelry, doors that lead nowhere. Hallucination is not a bug to be patched; it is intrinsic to how these models work. Plan around it.
A video generation workflow where you provide a starting image and the model animates it forward in time. The opposite of T2V (text-to-video), which generates both subject and motion from a prompt alone. I2V is more controllable and produces higher quality on most subjects; see the I2V workflow guide for the full flow.
Re-generating a specific masked region of an image while leaving the rest untouched. Used for fixing flaws (a warped hand, a strange face), editing details (changing an outfit), or extending elements. Most production AI workflows include inpainting as a regular cleanup step.
A conditioning extension that lets you provide a reference image alongside your prompt. The model uses the reference to influence the output — particularly useful for character consistency, where you provide a face reference and ask the model to keep it across generations.
The compressed mathematical representation of images that diffusion models actually work in. You do not interact with latent space directly; it is the room the model thinks in. Most image generators encode the input to latent space, manipulate it, and decode it back to pixels at the end. Mentioned here because tutorials will mention it and the term sounds more mysterious than it is.
A small adapter file that nudges a checkpoint toward a specific style, character, concept, or aesthetic without retraining the full model. LoRAs are tens to hundreds of megabytes; checkpoints are gigabytes. You can stack multiple LoRAs at varying strengths to combine effects.
In chat contexts, persistent recall of facts and context across sessions. Different from context window — memory survives session boundaries, context window does not. Implementation varies wildly: some platforms summarize and store explicitly; some retrieve from a vector store; some claim memory without much of one underneath.
A subscription-based image generator known for high-quality, aesthetically polished output out of the box. Strong on cinematic and stylized work; tight constraints on workflow; SFW-only. One of the most-named tools in any AI image conversation.
A model that handles more than one kind of input or output — text, images, audio, video. GPT-4o, Claude 3, Gemini are multimodal — they accept images alongside text and reason across both. The term distinguishes them from single-mode models like a pure text-to-image generator.
The list of things you do not want in the output. "low quality, bad anatomy, extra fingers, watermark" is a typical starter set. The negative prompt is often the difference between a tool that produces consistent quality and one that produces visible artifacts on a fraction of generations.
Not safe for work — content involving adult themes, including but not limited to sexual content. Tools and platforms have explicit policies. Some allow it (Charmloop, Civitai with the right filter settings, several smaller services); most do not (Midjourney, DALL-E, Firefly, ImageFX). The tools that allow it have hard lines — CSAM, non-consensual real-person deepfakes, and other clearly-illegal categories are universally blocked.
A safety classifier that runs on inputs, outputs, or both to detect and block adult content. SFW-leaning platforms run aggressive filters; adult-permitting platforms run targeted filters (CSAM, real-person deepfakes) without the broad adult-content block.
Extending an image beyond its original borders. The opposite of cropping — you give the model an image and ask it to imagine what would be just outside the frame. Useful for converting square outputs to widescreen, or for generating environments around a centered character.
The text you write to describe the image or response you want. The single most impactful thing you control over the output.
Face-preservation techniques that let the model lock onto a specific face from a reference image and reproduce it across generations. More aggressive than IP-Adapter for facial identity; usually gated to higher tiers because they cost more GPU time per generation.
A pattern where a chat model retrieves relevant documents or facts from a database before responding, rather than relying purely on its training. Used to give chat models access to up-to-date information, company-specific knowledge, or persistent memory.
A training step where human raters review model outputs and the model learns to produce more outputs like the highly-rated ones. The dominant alignment technique for chat models in 2023–25; newer approaches (DPO, RLAIF, constitutional AI) extend or replace parts of it.
A separate model that runs alongside the main generator to detect and block specific kinds of outputs — adult content, violence, hate, CSAM, real-person likenesses. Different platforms run different filters; the philosophy differs across the industry.
The algorithm that controls how the diffusion model steps from noise to output. Names you will see: Euler, Euler a, DPM++, DPM++ 2M Karras, UniPC. Different samplers produce slightly different outputs at the same prompt and seed. The default sampler is usually fine; experiment if you want to dial in a specific look.
A number that controls the random starting noise for an image generation. The same prompt with the same seed produces (approximately) the same image on the same model. Useful for reproducibility — save the seed of an image you like and you can regenerate variants from the same starting point.
The number of denoising iterations the model runs from pure noise to final output. More steps = generally higher quality, but with diminishing returns past 30 to 40 on most modern samplers. 20 to 30 is the typical sweet spot.
The instructions given to a chat model that set its persona, behavior, and constraints. The model receives the system prompt before the user's messages; it shapes everything that follows. Persona cards in AI character platforms are an elaborated form of system prompt.
Text-to-image and text-to-video. The generation modes where you provide only a prompt. Contrast with I2I (image-to-image) and I2V (image-to-video), where you also provide a starting image.
The unit of text a language model processes. Roughly one token equals 0.75 English words. Pricing on most chat APIs is per-token, in and out. Context windows are measured in tokens. The platform's in-product currency is sometimes also called tokens but is unrelated — Charmloop calls its in-product currency "charms" specifically to avoid this confusion.
A model or algorithm that increases the resolution of an image. AI upscalers (ESRGAN, SwinIR, and various commercial versions) intelligently add detail rather than just stretching pixels. Most image generators max out around 1024x1024 native and upscale from there; the upscale step matters more than the native maximum.
In chat and image generation contexts, "uncensored" usually means the platform does not run a broad adult-content filter. It does not mean "no rules" — every legitimate platform that uses this term still blocks CSAM, non-consensual deepfakes, and other clearly-illegal categories. See the ai chat without filter guide for the longer treatment of what the term actually denotes in 2026.
The component of a diffusion model that translates between pixel space (what you see) and latent space (where the model thinks). A different VAE can produce subtly different colors and details from the same checkpoint. Usually you do not touch the VAE; sometimes a specific model recommends a paired VAE.
A database that stores text or images as high-dimensional vectors, allowing similarity search. The backbone of most RAG systems and many AI memory implementations. Pinecone, Weaviate, Qdrant, Chroma are common names.
The learned parameters of a trained model — billions of numbers that encode what the model knows. Used interchangeably with "model" in casual speech. "Open-weight" models are ones where the weights are publicly downloadable; "closed-weight" or "proprietary" models keep the weights private.
The model performing a task it was never explicitly trained on, based purely on the prompt. Modern chat and image models are extensively zero-shot capable — you can describe a novel character or scenario and the model handles it without prior training on that exact thing.
That is the practical glossary. If a term you ran into is missing, the honest guide to choosing an AI image generator and the AI image prompts guide cover the most common ones in workflow context. For chat-specific terminology around memory and continuity, the AI chat with memory guide goes deeper on the practical side. And for the umbrella technical concepts that span both, the LoRA guide is the closest deep-dive on the adapter side of model customization.
The vocabulary moves fast; some of these terms will be outdated in twelve months and there will be five new ones to know. The shape of what each describes — a knob on the model, a piece of the pipeline, a category of content, a workflow pattern — stays stable even as the names rotate.