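Ovis-Image is a 7B text-to-image model built on the Ovis 2.5 multimodal backbone. Its main selling point is crisp, legible text inside images, and it runs on a single high-end GPU.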
2-step quick start (Diffusers)
- Install the Diffusers branch that includes Ovis-Image support:
pip install git+https://github.com/DoctorKey/diffusers.git@ovis-image
- Run the example code directly:
import torch
from diffusers import OvisImagePipeline

# Load the pipeline in BF16 and move it to the GPU.
pipe = OvisImagePipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# The text to render goes in double quotes inside the prompt.
prompt = (
    'A creative 3D artistic render where the text "OVIS-IMAGE" '
    "is written in a bold, expressive handwritten brush style using thick, wet oil paint. "
    "The paint is a mix of vibrant rainbow colors swirling together like toothpaste. "
    "The background is a clean artist's canvas with soft shadows and glossy texture, 4k detail."
)
image = pipe(prompt, negative_prompt="", num_inference_steps=50, true_cfg_scale=5.0).images[0]
image.save("ovis_image.png")

Running the official PyTorch script
The commands published by the authors work as-is:
git clone https://github.com/AIDC-AI/Ovis-Image.git
conda create -n ovis-image python=3.10 -y
conda activate ovis-image
cd Ovis-Image
pip install -r requirements.txt
pip install -e .
python ovis_image/test.py \
--model_path AIDC-AI/Ovis-Image-7B/ovis_image.safetensors \
--vae_path AIDC-AI/Ovis-Image-7B/ae.safetensors \
--ovis_path AIDC-AI/Ovis-Image-7B/Ovis2.5-2B \
--image_size 1024 \
--denoising_steps 50 \
--cfg_scale 5.0 \
--prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The background is a clean artist's canvas with soft shadows, 4k detail."

Prompting and sampling tips (for sharper text)
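- Keep the denoising steps at 45-50. To speed things up you can drop to 35 or more, but text edges will come out slightly softer.
- A true_cfg_scale of about 5.0 is usually the most stable: the image follows the prompt without looking artificial.
- For text-dense layouts (posters, banners, UI), spell out the font weight, the material, and how clean the background should be.
- When sharpness matters most, prefer 1024 resolution; wide aspect ratios work but may need more steps.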
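The tips above can be collected into a small helper. This is only a sketch: the names SamplingPreset, QUALITY, FAST, build_text_prompt, and sampling_kwargs are made up for illustration and are not part of Diffusers or the Ovis-Image repository. The only pipeline arguments it produces are the ones already used in the example earlier (negative_prompt, num_inference_steps, true_cfg_scale).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    """Hypothetical container for the two sampling knobs discussed in the tips."""
    num_inference_steps: int
    true_cfg_scale: float


# Values come from the tips; the preset names are illustrative only.
QUALITY = SamplingPreset(num_inference_steps=50, true_cfg_scale=5.0)
FAST = SamplingPreset(num_inference_steps=35, true_cfg_scale=5.0)  # softer text edges


def build_text_prompt(text, font_weight, material,
                      background="a clean, uncluttered background"):
    """Compose a prompt that states font weight, material, and background explicitly."""
    return (
        f'A poster where the text "{text}" is written in {font_weight} lettering '
        f"made of {material}, set against {background}."
    )


def sampling_kwargs(preset=QUALITY):
    """Return keyword arguments matching the pipeline call shown earlier."""
    return {
        "negative_prompt": "",
        "num_inference_steps": preset.num_inference_steps,
        "true_cfg_scale": preset.true_cfg_scale,
    }
```

With the pipe object from the quick start, a faster draft would then look like pipe(build_text_prompt("SALE", "extra-bold", "brushed gold"), **sampling_kwargs(FAST)).images[0]. Resolution is deliberately left out of the sketch, since the example above does not show how the pipeline takes it.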
Don't want to set up a local environment? Try it online
- Hugging Face Space: AIDC-AI/Ovis-Image-7B, generate directly in the browser.
- FAL Playground: fal-ai/ovis-image, fill in the form to get an image.
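Hardware and model notes
- The model uses BF16 and is designed to run on a single high-end GPU; plenty of VRAM is recommended for the best speed.
- Bilingual text rendering is Ovis-Image's core capability, and its 7B size suits posters, UI drafts, and other scenarios that need clearly legible text.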
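For a rough sense of what "plenty of VRAM" means, here is a back-of-the-envelope estimate of the weight footprint. It assumes the nominal 7B parameter count and 2 bytes per parameter for BF16, and it covers the weights only: the text encoder, the VAE, and activations during sampling are not included, so real usage will be higher. The function name weight_memory_gib is made up for this sketch.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GiB. BF16 stores 2 bytes per parameter."""
    return num_params * bytes_per_param / 2**30


# Assumption: 7e9 is the nominal "7B" figure, not a measured parameter count.
print(f"{weight_memory_gib(7e9):.1f} GiB")  # prints 13.0 GiB
```

Treat the result as a lower bound when picking a GPU, not as a measured requirement.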