How to Use Ovis Image - Text-to-Image Quickstart and Tips

Table of Contents

Fast start with Diffusers (2 steps)

Official PyTorch script (repo workflow)

Prompt and sampling tips (for crisp text)

Try it without setup

Hardware and model notes

Fast start with Diffusers (2 steps)

Install the Diffusers build that includes Ovis-Image support:

pip install git+https://github.com/DoctorKey/diffusers.git@ovis-image
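Because this installs a fork branch rather than a tagged release, it can help to confirm that the installed package actually exposes the pipeline class before downloading any weights. The following is a minimal sketch; `has_symbol` is a hypothetical helper name introduced here, not part of Diffusers.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` can be imported and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
        return hasattr(module, symbol)
    except Exception:
        # Deliberately broad: a missing or broken install both count as "not available".
        return False


if __name__ == "__main__":
    if has_symbol("diffusers", "OvisImagePipeline"):
        print("OvisImagePipeline is available")
    else:
        print("OvisImagePipeline not found; re-run the pip install command")
```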

Run the minimal Python snippet:

import torch
from diffusers import OvisImagePipeline

pipe = OvisImagePipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = (
    'A creative 3D artistic render where the text "OVIS-IMAGE" '
    "is written in a bold, expressive handwritten brush style using thick, wet oil paint. "
    "The paint is a mix of vibrant rainbow colors swirling together like toothpaste. "
    "The background is a clean artist's canvas with soft shadows and glossy texture, 4k detail."
)

image = pipe(prompt, negative_prompt="", num_inference_steps=50, true_cfg_scale=5.0).images[0]
image.save("ovis_image.png")
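To render several prompts in one session, the same call can be wrapped in a small loop. This is a sketch under assumptions: `slugify` and `render_all` are hypothetical helper names, and `pipe` is expected to be the already-loaded pipeline object from the snippet above. Only the keyword arguments shown there are used.

```python
import re
from pathlib import Path


def slugify(text: str, max_len: int = 40) -> str:
    """Turn a prompt into a short, filesystem-safe file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "image"


def render_all(pipe, prompts, out_dir="outputs", steps=50, cfg=5.0):
    """Render each prompt with the quickstart settings and save one PNG per prompt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        result = pipe(
            prompt,
            negative_prompt="",
            num_inference_steps=steps,
            true_cfg_scale=cfg,
        )
        # Prefix with the index so files sort in prompt order.
        path = out / f"{index:02d}_{slugify(prompt)}.png"
        result.images[0].save(path)
        saved.append(path)
    return saved
```

Loading the pipeline once and reusing it this way avoids paying the model-loading cost for every image.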

Official PyTorch script (repo workflow)

This follows the commands published by the authors.

git clone https://github.com/AIDC-AI/Ovis-Image.git
conda create -n ovis-image python=3.10 -y
conda activate ovis-image
cd Ovis-Image
pip install -r requirements.txt
pip install -e .

python ovis_image/test.py \
  --model_path AIDC-AI/Ovis-Image-7B/ovis_image.safetensors \
  --vae_path AIDC-AI/Ovis-Image-7B/ae.safetensors \
  --ovis_path AIDC-AI/Ovis-Image-7B/Ovis2.5-2B \
  --image_size 1024 \
  --denoising_steps 50 \
  --cfg_scale 5.0 \
  --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The background is a clean artist's canvas with soft shadows, 4k detail."
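Prompts that contain quotation marks are easy to mis-escape in a shell. One way around this is to build the argument list in Python and let `shlex` handle the quoting. The sketch below uses only the flags shown in the command above; `build_command` and `run` are hypothetical helper names, and `weights_dir` is assumed to be a local directory that already holds the three weight files.

```python
import shlex
import subprocess


def build_command(prompt, weights_dir="AIDC-AI/Ovis-Image-7B",
                  image_size=1024, steps=50, cfg_scale=5.0):
    """Assemble the argv list for ovis_image/test.py without any shell escaping."""
    return [
        "python", "ovis_image/test.py",
        "--model_path", f"{weights_dir}/ovis_image.safetensors",
        "--vae_path", f"{weights_dir}/ae.safetensors",
        "--ovis_path", f"{weights_dir}/Ovis2.5-2B",
        "--image_size", str(image_size),
        "--denoising_steps", str(steps),
        "--cfg_scale", str(cfg_scale),
        "--prompt", prompt,
    ]


def run(prompt, **kwargs):
    """Run the script from the repository root; raises if it exits non-zero."""
    subprocess.run(build_command(prompt, **kwargs), check=True)


if __name__ == "__main__":
    cmd = build_command('A poster where the text "OVIS-IMAGE" is painted in thick oil paint.')
    # Print a shell-quoted form that can be pasted into a terminal.
    print(shlex.join(cmd))
```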

Prompt and sampling tips (for crisp text)

Use 45-50 denoising steps for clean edges; you can go as low as 35 for speed, but text edges may soften.

Set true_cfg_scale to around 5.0 to balance prompt adherence against natural-looking textures.

For text-heavy layouts (posters, banners, UI mockups), describe font weight, material, and background simplicity explicitly.

Stick to a 1024-pixel output size when legibility matters most; wider aspect ratios also work but may need more steps.
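The tips above can be folded into two small helpers: one that spells out font weight, material, and background in the prompt, and one that returns the sampling settings. This is an illustrative sketch only; `TextLayout`, `build_prompt`, and `sampling_kwargs` are hypothetical names, the default wording is an assumption, and the prompt template is just one possible phrasing.

```python
from dataclasses import dataclass


@dataclass
class TextLayout:
    """The attributes the tips suggest describing explicitly."""
    text: str
    font_weight: str = "bold"
    material: str = "matte ink"
    background: str = "a plain, uncluttered background"


def build_prompt(layout: TextLayout, scene: str = "A poster") -> str:
    """Compose a prompt that names the text, its weight, material, and background."""
    return (
        f'{scene} where the text "{layout.text}" is written in a '
        f"{layout.font_weight} typeface rendered in {layout.material}, "
        f"set against {layout.background}."
    )


def sampling_kwargs(fast: bool = False) -> dict:
    """Return 50 steps by default, or 35 when speed matters more than edge sharpness."""
    return {
        "negative_prompt": "",
        "num_inference_steps": 35 if fast else 50,
        "true_cfg_scale": 5.0,
    }
```

With a loaded pipeline, the two pieces combine as `pipe(build_prompt(layout), **sampling_kwargs()).images[0]`.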

Hardware and model notes

The model runs in BF16 and was profiled on single high-end GPUs; for the fastest results, plan on a modern GPU with ample VRAM.

Ovis-Image prioritizes bilingual text rendering quality while keeping a compact 7B parameter budget, so it is a good fit for posters, UI comps, and any scene where words must remain legible.
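As a rough guide to what "plenty of VRAM" means, the memory needed for the weights alone can be estimated from the parameter count and the 2 bytes per value that BF16 uses. The sketch below is back-of-the-envelope arithmetic, not a measured figure: it assumes the 7B count quoted above and ignores the text encoder, the VAE, and activation memory, all of which add to the total. `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate weight storage in GiB; BF16 stores each parameter in 2 bytes."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # 7e9 parameters * 2 bytes is about 13 GiB for the weights alone.
    print(f"{weight_memory_gib(7e9):.1f} GiB")
```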
