Introduction
CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's CogView (image generation) and CogVideoX (video generation) models. It streamlines multimodal tasks such as text-to-image (T2I), text-to-video (T2V), and image-to-video (I2V). Users must comply with legal and ethical guidelines to ensure responsible implementation.
Supported Models
Please refer to the Model Card for more details.
Environment Testing
This repository has been tested in environments with 1×A100 and 8×A100 GPUs, using CUDA 12.4 and Python 3.10.16.
- Cog series models typically do not support FP16 precision (only CogVideoX-2B does); GPUs such as the V100 cannot fine-tune them properly (training produces loss=nan, for example). At a minimum, use an A100 or another GPU that supports BF16 precision.
- We have not yet systematically tested the minimum GPU memory requirements for each model. For LoRA (bs=1 with offload), a single A100 GPU is sufficient. For SFT, our tests have passed in an 8×A100 environment.
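The precision constraint above can be checked before starting a fine-tuning run. The following is a minimal sketch, not part of CogKit: `pick_precision` and `detect_capability` are hypothetical helper names, and the threshold assumes that native BF16 support starts at CUDA compute capability 8.0 (the A100's generation), while the V100 reports 7.0. The PyTorch import is deferred so the selection logic can be used on its own.

```python
from typing import Tuple

# Assumption: native BF16 support starts at compute capability 8.0 (e.g. A100).
BF16_MIN_CAPABILITY = (8, 0)
# Per the note above, CogVideoX-2B is the only model listed as supporting FP16.
FP16_CAPABLE_MODELS = {"CogVideoX-2B"}


def pick_precision(capability: Tuple[int, int], model_name: str) -> str:
    """Hypothetical helper: return "bf16" or "fp16", or raise if neither applies."""
    if tuple(capability) >= BF16_MIN_CAPABILITY:
        return "bf16"
    if model_name in FP16_CAPABLE_MODELS:
        return "fp16"
    raise RuntimeError(
        f"{model_name} needs BF16, but compute capability {capability} "
        "does not support it; fine-tuning would likely produce loss=nan."
    )


def detect_capability() -> Tuple[int, int]:
    """Query the current CUDA device through PyTorch (imported only when called)."""
    import torch

    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA device is visible.")
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    print(pick_precision(detect_capability(), "CogVideoX-2B"))
```

With these assumptions, an A100 resolves to BF16 for every model, a V100 resolves to FP16 only for CogVideoX-2B, and any other model on a V100 fails early instead of diverging during training.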