From a947459bdad6e00a9169f4b97ce7c03ff161af3a Mon Sep 17 00:00:00 2001
From: Artiprocher
Date: Thu, 7 Aug 2025 16:32:01 +0800
Subject: [PATCH] refine README

---
 examples/qwen_image/README.md    | 1 +
 examples/qwen_image/README_zh.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/examples/qwen_image/README.md b/examples/qwen_image/README.md
index cff180c..5509539 100644
--- a/examples/qwen_image/README.md
+++ b/examples/qwen_image/README.md
@@ -164,6 +164,7 @@ After enabling VRAM management, the framework will automatically choose a memory
 * `vram_limit`: VRAM usage limit in GB. By default, it uses all free VRAM on the device. Note that this is not a strict limit: if the configured limit is too low but the actual free VRAM is sufficient, the model will run with minimal VRAM usage. Set it to 0 for the smallest possible VRAM usage.
 * `vram_buffer`: VRAM buffer size in GB. Default is 0.5GB. A buffer is needed because large network layers may use more VRAM than expected during loading. The best value is the VRAM size of the largest model layer.
 * `num_persistent_param_in_dit`: Number of parameters kept in VRAM in the DiT model. Default is no limit. This option will be removed in the future; do not rely on it.
+* `enable_dit_fp8_computation`: Whether to enable FP8 computation in the DiT model. This is only applicable to GPUs that support FP8 operations (e.g., H200). Disabled by default.
diff --git a/examples/qwen_image/README_zh.md b/examples/qwen_image/README_zh.md
index 9af0efd..f8ea52a 100644
--- a/examples/qwen_image/README_zh.md
+++ b/examples/qwen_image/README_zh.md
@@ -164,6 +164,7 @@ FP8 quantization greatly reduces VRAM usage but does not speed things up; some models
 * `vram_limit`: VRAM usage limit (GB); defaults to all free VRAM on the device. Note that this is not a hard limit: if the configured VRAM is insufficient for inference but the actually available VRAM is enough, inference will run in a VRAM-minimizing mode. Setting it to 0 yields the theoretical minimum VRAM usage.
 * `vram_buffer`: VRAM buffer size (GB), default 0.5GB. A buffer is necessary because some large network layers can unpredictably use more VRAM during the onload phase; the theoretical optimum is the VRAM used by the largest layer in the model.
 * `num_persistent_param_in_dit`: Number of parameters kept resident in VRAM in the DiT model; no limit by default. This parameter will be removed in the future; do not rely on it.
+* `enable_dit_fp8_computation`: Whether to enable FP8 computation in the DiT model; only applicable to GPUs that support FP8 operations (e.g., H200). Disabled by default.
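
The new `enable_dit_fp8_computation` option only makes sense on FP8-capable GPUs, so callers may want to gate it on the device's CUDA compute capability. The sketch below is illustrative only: the helper names are hypothetical (only the option keys come from the README), and FP8 tensor-core support is assumed to begin with Ada Lovelace (sm_89) and Hopper (sm_90, e.g. H100/H200).

```python
# Hypothetical helpers for assembling the VRAM-management options documented
# in the patched README. Only the dictionary keys are taken from the README;
# everything else is an illustrative assumption.

def gpu_supports_fp8(major: int, minor: int) -> bool:
    """True if a GPU with this CUDA compute capability has FP8 tensor-core
    support (assumed: Ada Lovelace sm_89 and newer, Hopper sm_90 and newer)."""
    return (major, minor) >= (8, 9)

def vram_management_kwargs(major: int, minor: int) -> dict:
    """Build the option set described in the README for a given GPU."""
    return {
        "vram_limit": 0,      # 0 -> theoretical minimum VRAM usage
        "vram_buffer": 0.5,   # GB; README default
        "enable_dit_fp8_computation": gpu_supports_fp8(major, minor),
    }

# H200 is Hopper (sm_90), so FP8 is enabled; an Ampere GPU (sm_86) is not.
print(vram_management_kwargs(9, 0)["enable_dit_fp8_computation"])  # True
print(vram_management_kwargs(8, 6)["enable_dit_fp8_computation"])  # False
```

In practice the capability pair would come from something like `torch.cuda.get_device_capability()` on the target device rather than being hard-coded.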