# Image Generation

Use local or cloud-based Stable Diffusion, FLUX or DALL-E APIs to generate images.

Automatically generate images as replies to your messages for full immersion, generate from chat history and character information from the wand menu or slash commands, or use the /sd (anything_here) command in the chat input bar to make an image with your own prompt.

Most common Stable Diffusion generation settings are customizable within the SillyTavern UI.

Supports multiple image generation sources, both local and cloud-based
Various generation modes for characters, scenes, and custom prompts
Slash commands for easy image generation within chats
Interactive mode to trigger image generation based on natural language requests
Customizable prompt templates and prefixes for consistent style and quality
Character-specific prompt prefixes for tailored character images
Style presets to quickly switch between different image generation settings
Flexible visibility options for generated images in chat
Advanced ComfyUI integration for highly customizable workflows
Ability to view all generated images in a character gallery
Image swipes feature to regenerate images while keeping the same prompt
Options to edit prompts before generation and extend free-mode prompts
Integration with AI function calling for automatic image generation detection

# Supported sources

Source	Remarks
Block Entropy	All services will be discontinued on December 31, 2024
ComfyUI	Local, open source (GPL3), free of charge, see ComfyUI Configuration.
Draw Things	Local, Mac/iOS, free of charge
HuggingFace Serverless	Cloud, free of charge
NovelAI Diffusion	Cloud, requires an active subscription
OpenAI DALL-E 2/3	Cloud, requires an active subscription
Pollinations	Cloud, open source (MIT), free of charge
SD.Next / vladmandic	Local, open source (AGPL3), free of charge
SillyTavern Extras	Deprecated, not recommended
Stability AI	Cloud, paid
Stable Diffusion WebUI / AUTOMATIC1111	Local, open source (AGPL3), free of charge
Stable Horde	Cloud, open source (AGPL3), free of charge
TogetherAI	Cloud

# Generation modes

Wand menu item	Slash command argument	Description	Remarks
"Yourself"	`you`	A full-body portrait of the current character.	-
"Your Face"	`face`	A close-up portrait of the current character.	Forces a portrait aspect ratio.
"Me"	`me`	A portrait of the user persona.	-
"The Whole Story"	`scene`	A visual recap of the chat events.	-
"The Last Message"	`last`	A visual recap of the last chat message.	-
"Raw Last Message"	`raw_last`	Last message used as a prompt verbatim.	-
"Background"	`background`	A chat background based on story context.	Forces a wide landscape aspect ratio.

# How to generate an image

Use the "Image Generation" item in the extensions context menu (wand).
Type a /sd (argument) slash command with an argument from the Generation modes table. Anything else would trigger a "free mode" to make SD generate whatever you prompted. Example: /sd apple tree would generate a picture of an apple tree.
Look for a paintbrush icon in the context actions for chat messages. This will force the "Raw Message" mode for the selected message.

Every generation mode besides raw message and free mode will trigger a prompt generation using your currently selected main generation API to convert chat context into the SD prompt. You can configure the instruction template for generating prompts for every generation mode using the "SD Prompt Templates" settings drawer in the extensions panel.

# Tips and tricks for the `/sd` command usage

# View all generated images

To view all saved images for a character (including other chats), open a gallery from a "More..." dropdown menu on a character info panel, or use a /show-gallery slash command.

# Specify a negative prompt

Use a negative named argument before the prompt to enforce a specific negative prompt for this generation.

stscript
/sd negative="fries" cute tater farmer holding a tayto in a spud-field

# Include a character-specific prefix

Use a special {{charPrefix}} macro in free-prompt mode to include positive and negative prompt prefixes (if defined) for the current character.

stscript
/sd {{charPrefix}}, riding a bike

# Suppress a chat message

You can avoid posting a generated image into the chat by passing a quiet=true named argument. The image will still be added into the character gallery, and the command will produce a relative URL to the image that can be consumed by other commands.

The example below will send the generated image using Markdown as a user persona.

stscript
/sd quiet=true me | /send Here's a picture of me: ![my portrait]({{pipe}})

# Image swipes

Images swipes allow to reroll the image generation while keeping the same prompt. If a fixed seed is set, it will be randomized for the next generation.

To cycle through images, hover a mouse cursor (tap on mobile) over a generated image to reveal arrow buttons and swipes counter. Tapping right arrow on the latest image will generate a new one.

'Swipes' here is just a name, don't try the actual swiping gesture, as this will regenerate the message itself, not the attached image.

# Options

# Edit prompts before generation

Allow to edit the automatically generated prompts manually before sending them to the Stable Diffusion API.

# Use function tool

Uses function calling to automatically detect the intention to generate an image.

Requirements:

Must have image generation configured with a supported source.
Must use a supported Chat Completion API model and have function tool calling enabled in the AI Response settings.
The "Use function tool" option must be enabled in the Image Generation settings.
The user should express an intent to generate an image in the chat message, e.g. "Send me a picture of a cat".

The interactive mode will not trigger when the function tool is enabled.

# Use interactive mode

Allows to trigger an image generation instead of text as a reply to a user message that follows the special pattern:

Contains one of the following verbs: send, mail, imagine, generate, make, create, draw, paint, render
Followed by one of the following nouns (not further than 10 characters away): pic, picture, image, drawing, painting, photo, photograph
Followed by a target subject of image generation, could be optionally preceded by phrases like "of a" or "of this".

Examples of valid requests and captured subjects:

Can you please send me a picture of a cat => cat
Generate a picture of the Eiffel tower => Eiffel tower
Let's draw a painting of Mona Lisa => Mona Lisa

Some special subjects trigger a predefined generation mode:

'you, 'yourself' => "Yourself"
'your face', 'your portrait', 'your selfie' => "Your Face"
'me', 'myself' => "Me"
'story', 'scenario', 'whole story' => "The Whole Story"
'last message' => "The Last Message"
'background', 'scene background', 'scene', 'scenery', 'surroundings', 'environment' => "Background"

# Extend free-mode prompts

When using the interactive mode of the slash command, automatically extend free-mode generation subject descriptions by prompting your main API.

# Snap auto-adjusted resolutions

Snap image generation requests with a forced aspect ratio (portraits, backgrounds) to the nearest known resolution, while trying to preserve the absolute pixel counts. Refer to the "Resolution" dropdown for the list of possible options.

Recommended for SDXL models.

# Common prompt prefix

Added before every generated or free-mode prompt. Commonly used for setting the overall style of the picture.

Example: best quality, anime lineart.

Pro tip: Use {prompt} macro to specify where exactly the generated prompt will be inserted.

# Negative prompt

Characteristics of the image you don't want to be present in the output.

Example: bad quality, watermark.

# Character-specific prompt prefix

Any characteristics that describe the currently selected character. Will be added after a common prefix.

Example: female, green eyes, brown hair, pink shirt.

You can also specify a negative prompt prefix for any unwanted content. It will be combined with the general negative prompt.

Limitations:

Works only in 1-to-1 chats. Will not be used in groups.
Won't be used for backgrounds and free mode generations.

If you want to share the prefixes with others, tick the "Shareable" checkbox. This will save them with the character data, rather than your local settings.

Pro tip: If supported by the generation source, you can also use LoRAs/embeddings here, for example: <lora:DonaldDuck:1>.

# Styles

Use this to quickly save and restore your favorite style/quality presets to use them later or when switching between models. The following is included in the Style preset:

Common Prompt Prefix
Negative Prompt

You can also switch between styles using the /imagine-style command (or /sd-style or /img-style).

# Chat Message Visibility

Generated images inserted into the chat are hidden in the main API prompts by default, but this can be overriden individually per generation initiator ("Magic wand" icon, slash command, interactive mode). This can be used for making the experience more immersive by letting the characters "acknowledge" the images. Multimodal models in Chat Completions API may also 'see' the images if "Send inline images" is enabled.

A text message can be customized by changing the "Chat Message Template" under Image Prompt Templates. All regular macros can be used in this template, plus a special {{prompt}} macro to specify where the image prompt will be added.

# ComfyUI Configuration

ComfyUI is a fast and very flexible option for image generation.

If you're familiar with ComfyUI, the tl;dr is: make your workflow in ComfyUI, download it in API format, and paste it into the SillyTavern ComfyUI Workflow Editor. ST will submit your workflow to ComfyUI's API and you will get an image in your chat. But with great power comes great responsibility, and the main responsibility is inserting placeholders in your workflow JSON so you can change settings from SillyTavern.

If you're not familiar with ComfyUI, you can still use it to generate images in SillyTavern using the default workflow. Later, when you want great power, you can learn how to use ComfyUI...

# Controls

This panel allows you to configure and manage your ComfyUI integration with SillyTavern.

Enter the URL of your ComfyUI server in the ComfyUI URL input field. The default value is http://127.0.0.1:8188. If you are using SwarmUI, the default port for the managed ComfyUI server is 7821, 20 ports higher than the default port for SwarmUI.

After entering the URL, choose Connect to validate and establish a connection. The ComfyUI server must be accessible from the SillyTavern host machine.

# Workflow Management

Select a ComfyUI workflow from the dropdown menu. Two default workflows are provided:

Default_Comfy_Workflow.json: A basic text-to-image workflow supporting the most common image generation settings.
Char_Avatar_Comfy_Workflow.json: A sample image-to-image workflow that uses the character avatar, plus the prompt, to generate an image.

Use the following buttons to manage your workflows:

Open workflow editor to view and modify the selected workflow.
Create new workflow to create a new workflow with a custom name.
Delete workflow to remove the selected workflow.

# Workflow Editor

The ComfyUI Workflow Editor allows you to view and modify ComfyUI workflows for use with SillyTavern.

The main component of the editor is a large text area where you can insert or edit your ComfyUI workflow in JSON format.

To add a ComfyUI workflow to the editor, follow these steps:

Enable 'Dev Mode' in ComfyUI settings.
Use the 'Save (API Format)' option in ComfyUI to download the JSON data.
Create a new workflow in SillyTavern and open the editor.
Paste the downloaded JSON data into the text area.
Replace specific values with placeholders as needed for your use case.

Tips

You can add the API-format JSON file directly to the data/default-user/user/workflows directory in your SillyTavern installation. This will save you from steps 3 and 4.

Retain the original JSON file. If you need to open the workflow again in ComfyUI to make changes, it is much more convenient to edit the original file than the one with all the placeholders.

# Placeholders

The editor provides a list of predefined placeholders that can be used in your workflow JSON. These placeholders are replaced with dynamic values when the workflow is executed in SillyTavern.

Placeholders marked with ✅ are present in your workflow JSON. Placeholders marked with ❌ are not present in your workflow JSON. You can add these placeholders to your workflow JSON as needed. You do not need to add all the placeholders, only the ones that your workflow uses and you want to replace dynamically.

# Prompts

The %prompt% and %negative_prompt% placeholders are used to insert the image generation prompts into the workflow. These contain the final prompts generated by SillyTavern, including the generated prompt for your chosen /sd mode, the common prompt prefix, negative prompt, and character-specific prompt prefix.

For example, you may have tested your workflow with a prompt like "forest elf" in ComfyUI. To use this workflow in SillyTavern, you can replace the "forest elf" prompt with the %prompt% placeholder:

{
    "class_type": "CLIPTextEncode",
    "inputs": {
        "clip": ["4", 1],
        "text": "%prompt%"
    }
}

{
    "class_type": "CLIPTextEncode",
    "inputs": {
        "clip": ["4", 1],
        "text": "forest elf"
    }
}

Notice that the placeholder is wrapped in double quotes. This is important for the JSON format, and required by SillyTavern's placeholder replacement system. Even for numbers, you must use double quotes in the template JSON.

Sometimes the prompt (or other value) doesn't appear where you might expect. ComfyUI will remove nodes from the API version of the workflow if they are not necessary for the workflow to function in API mode.

For instance, this workflow uses a LoRA tag loader node with a prompt primitive so the workflow is clearer in UI mode:

The prompt primitive node will be removed from the API version of the workflow, so you insert the placeholder in the LoraTagLoader node. Find the text "apple tree" in the workflow and replace it with the %prompt% placeholder:

{
    "inputs": {
      "text": "%prompt%",
      "model": ["112", 0],
      "clip": ["112", 1]
    },
    "class_type": "LoraTagLoader",
    "_meta": {"title": "Load LoRA Tag"}
}

{
    "inputs": {
      "text": "apple tree",
      "model": ["112", 0],
      "clip": ["112", 1]
    },
    "class_type": "LoraTagLoader",
    "_meta": {"title": "Load LoRA Tag"}
}

In some cases you may need to make several replacements in the workflow JSON, even if the prompt appears only once in the UI.

# Model

The %model% placeholder will insert the value of the selected model in the image generation settings.

An example from the default text-to-image workflow:

{
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
        "ckpt_name": "%model%"
    }
}

{
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
        "ckpt_name": "sd15.safetensors"
    }
}

To load GGUF-quantized UNets, use a UNet Loader (GGUF) node in your workflow, choose a GGUF model in the SillyTavern model dropdown, and use the %model% placeholder in the node's settings like this:

{
    "inputs": {
      "unet_name": "%model%"
    },
    "class_type": "UnetLoaderGGUF",
    "_meta": {
      "title": "Unet Loader (GGUF)"
    }
}

{
    "inputs": {
      "unet_name": "flux1-dev-Q4_0.gguf"
    },
    "class_type": "UnetLoaderGGUF",
    "_meta": {
      "title": "Unet Loader (GGUF)"
    }
}

If you have model types other than the usual SD checkpoints in ComfyUI

Stable Diffusion checkpoints, SD UNets, and GGUF-quantized UNets all appear in the Model dropdown. Models of one type will not work with workflows/loader nodes expecting another type. If you choose an incompatible model type in ST, ComfyUI will report a problem with the loader node.

# Avatar images

Use the %user_avatar% and %char_avatar% placeholders to include the user and character avatars in the workflow. These placeholders are replaced with the PNG data of the avatars when the workflow is executed. The image data is encoded in base64 format, so you must decode it in your workflow. A popular choice for this task is the Load image (Base64) node.

In this example, the character avatar is loaded with a Load Image (Base64) node. It also uses an Image Resize node to rescale the image to whatever size is specified in the image generation settings:

Load image from base64 string and resize

Insert the %char_avatar%, %width%, and %height% placeholders into the JSON for the Load Image (Base64) and Image Resize nodes:

{
    "97": {
        "inputs": {
            "image": "%char_avatar%"
        },
        "class_type": "ETN_LoadImageBase64",
        "_meta": {"title": "Load Image (Base64)"}
    },
    "98": {
        "inputs": {
            "mode": "resize",
            "resize_width": "%width%",
            "resize_height": "%height%",
            "image": ["97", 0]
        },
        "class_type": "Image Resize",
        "_meta": {"title": "Resize image"}
    }
}

To get a base64-encoded image string for testing your workflow in ComfyUI, use any online tool that converts images to base64 strings. Here's an example string you can use for initial testing: sd-comfy-base64-test-string.txt.

# Other placeholders

Most other placeholders use the values of the corresponding controls in image generation settings, or the values that you specify with the /sd command:

%vae%, but most SD models include a VAE so the default workflows do not use this placeholder. Use it with custom workflows to load a VAE alongside a UNet, override the default VAE, etc.
%sampler%
%scheduler%
%steps%
%scale%
%width%
%height%
%denoise%: for the sample image-to-image workflow, vary the denoise amount between about 0.5 (barely-noticeable changes to the source image) and 1.0 (a completely different image as if no source image was used). Not used by the default text-to-image workflow because there's no point using a value other than 1.0 for text-to-image.
%clip_skip%: not used by the default workflows but available for custom workflows.

The %seed% placeholder will insert the seed value from the control if you have specified one. If you set the seed to -1, SillyTavern will generate a new random seed for each image in %seed%.

# Custom placeholders

You can add custom placeholders to your workflow:

Look for the "Custom" section below the predefined placeholders.
Click the "+" button to add a new custom placeholder.
Enter a name for the placeholder in the find field.
Enter the value that you want to replace the placeholder with in the replace field.

Custom placeholders will appear in a separate list below the predefined ones.

For example, you could replace the "SillyTavern" prefix for saved image filenames in the default workflow with a custom placeholder. Add a new custom placeholder with find set to filename_prefix and replace set to ServiceTesnor. Insert the new %filename_prefix% placeholder into your workflow JSON. Now you can change the filename prefix from SillyTavern to ServiceTesnor by changing the value of the custom placeholder.

{
    "class_type": "SaveImage",
    "inputs": {
        "filename_prefix": "%filename_prefix%",
        "images": ["8", 0]
    }
}

{
    "class_type": "SaveImage",
    "inputs": {
        "filename_prefix": "SillyTavern",
        "images": ["8", 0]
    }
}

# Comfy tricks

Read all the general information on this page so you're familiar with the image generation options. Options such as switchable styles and common prompt prefixes, when combined wih the total flexibility of ComfyUI workflows, allow you to create a wide variety of image generation setups.

# Loading LoRAs

Use a LoRA tag loader node (such as Load LoRA Tag) to load any LoRAs specified in the prompt. Now you can add as many LoRAs as you like to your prompt with tags like <lora:CroissantStyle:0.8>, and they will be loaded into your workflow. This will also make the "pro-tip" of using LoRAs in character-specific prompt prefixes work with ComfyUI.

# Setting workflow values from styles or slash-commands

You can use macros in custom placeholder values. As a practical example, let's say you sometimes want to generate images without a background, and you'd like this to be switchable with a slash-command or image style. Here's how you could do it:

Make a ComfyUI workflow that removes the image background, or not, depending on the value of an input
Use a custom placeholder to set the value of that input, but use {{getvar::remove_background}} as the replace value
Now you can set the value of remove_background with /setvar key=remove_background true or /setvar key=remove_background false before generating an image
The workflow will use the value you set to determine whether to remove the background
Make an image style "No background" with common prompt prefix {{setvar::remove_background::true}}
Use the style control or /imagine-style No background to set the value of remove_background to true before generating an image