Ovis-Image is a 7B text-to-image model built on the Ovis 2.5 multimodal backbone. It is tuned for clear, accurate text inside images while staying light enough for a single high-end GPU.
Fast start with Diffusers (2 steps)
- Install the Diffusers build that includes Ovis-Image support:
pip install git+https://github.com/DoctorKey/diffusers.git@ovis-image
- Run the minimal Python snippet:
import torch
from diffusers import OvisImagePipeline

# Load the weights in BF16 and move the pipeline to the GPU.
pipe = OvisImagePipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Put the exact string that should appear in the image inside double quotes.
prompt = (
    'A creative 3D artistic render where the text "OVIS-IMAGE" '
    "is written in a bold, expressive handwritten brush style using thick, wet oil paint. "
    "The paint is a mix of vibrant rainbow colors swirling together like toothpaste. "
    "The background is a clean artist's canvas with soft shadows and glossy texture, 4k detail."
)

# 50 steps and true_cfg_scale=5.0 are the settings used throughout this guide.
image = pipe(prompt, negative_prompt="", num_inference_steps=50, true_cfg_scale=5.0).images[0]
image.save("ovis_image.png")
Official PyTorch script (repo workflow)
This follows the commands published by the authors.
git clone https://github.com/AIDC-AI/Ovis-Image.git
conda create -n ovis-image python=3.10 -y
conda activate ovis-image
cd Ovis-Image
pip install -r requirements.txt
pip install -e .
python ovis_image/test.py \
    --model_path AIDC-AI/Ovis-Image-7B/ovis_image.safetensors \
    --vae_path AIDC-AI/Ovis-Image-7B/ae.safetensors \
    --ovis_path AIDC-AI/Ovis-Image-7B/Ovis2.5-2B \
    --image_size 1024 \
    --denoising_steps 50 \
    --cfg_scale 5.0 \
    --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The background is a clean artist's canvas with soft shadows, 4k detail."
Prompt and sampling tips (for crisp text)
- Keep 45-50 denoising steps for clean edges; you can drop to about 35 for speed, but text sharpness may soften.
- true_cfg_scale around 5.0 balances adherence and natural textures.
- For text-heavy layouts (posters, banners, UI mockups), describe font weight, material, and background simplicity explicitly.
- Stick to 1024-pixel outputs (the --image_size 1024 setting used above) when you want the best legibility; wider aspect ratios also work but may need higher step counts.
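The tips above can be folded into a small helper so that every prompt states font weight, material, and background, and so that the step count and guidance scale stay in the suggested range. This is a minimal sketch, not part of Ovis-Image or Diffusers: `TextSpec`, `build_text_prompt`, and `sampling_settings` are hypothetical names, and the example poster text is made up for illustration. Only `num_inference_steps` and `true_cfg_scale` come from the snippet earlier in this guide.

```python
from dataclasses import dataclass


@dataclass
class TextSpec:
    """Hypothetical container for the details the tips recommend spelling out."""
    text: str
    font_weight: str = "bold"
    material: str = "matte ink"
    background: str = "plain, uncluttered background"


def build_text_prompt(layout: str, spec: TextSpec) -> str:
    """Compose a prompt that names font weight, material, and background."""
    return (
        f'A {layout} where the text "{spec.text}" is written in a '
        f"{spec.font_weight} typeface rendered in {spec.material}, "
        f"set against a {spec.background}."
    )


def sampling_settings(fast: bool = False) -> dict:
    """Return keyword arguments in the step and guidance range from the tips.

    fast=True uses 35 steps instead of 50, trading some text sharpness for speed.
    """
    return {
        "num_inference_steps": 35 if fast else 50,
        "true_cfg_scale": 5.0,
    }


# Illustrative values only; swap in your own layout and wording.
prompt = build_text_prompt(
    "minimalist concert poster",
    TextSpec(text="OVIS LIVE", font_weight="heavy", material="gold foil"),
)
settings = sampling_settings(fast=False)
print(prompt)
print(settings)
```

With the pipeline from the Diffusers snippet loaded as `pipe`, the result would be passed along as `pipe(prompt, negative_prompt="", **settings).images[0]`.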
Try it without setup
- Hugging Face Space: AIDC-AI/Ovis-Image-7B lets you generate in the browser.
- FAL playground: fal-ai/ovis-image provides hosted inference with a simple form input.
Hardware and model notes
- The model runs in BF16 and was profiled on single high-end GPUs; plan for a modern GPU with plenty of VRAM for the fastest results.
- Ovis-Image prioritizes bilingual text rendering quality while keeping a compact 7B parameter budget, so it is a good fit for posters, UI comps, and any scene where words must remain legible.
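Before downloading the weights, it can help to check what the local GPU offers. The sketch below is an assumption-laden convenience, not an official requirement check: `describe_gpu` is a hypothetical helper, and the memory figure is plain arithmetic (7B parameters at 2 bytes each in BF16). It covers only the 7B figure quoted above and ignores activations, the VAE, and any other components, so treat it as a rough lower bound.

```python
def describe_gpu() -> str:
    """Summarise the first CUDA device, or explain why none can be inspected."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA device visible"
    props = torch.cuda.get_device_properties(0)
    vram_gib = props.total_memory / 1024**3
    bf16 = torch.cuda.is_bf16_supported()
    return f"{props.name}: {vram_gib:.1f} GiB VRAM, BF16 supported: {bf16}"


# Rough weight footprint: parameter count times bytes per BF16 value.
PARAM_COUNT = 7_000_000_000  # the "7B" figure from this guide
BYTES_PER_BF16 = 2
weights_gib = PARAM_COUNT * BYTES_PER_BF16 / 1024**3

report = describe_gpu()
print(report)
print(f"Approximate BF16 weights for 7B parameters: {weights_gib:.1f} GiB")
```

If the reported VRAM is close to or below the weight estimate, the hosted options listed earlier are likely the more practical route.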