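TRELLIS - Complete Tutorial

**Level:** From zero to a working first project
**Estimated study time:** 8-10 hours
**Prerequisites:** Python programming basics, basic deep learning concepts (model inference, GPUs), command-line usage, basic 3D graphics concepts (meshes, textures)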

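Environment Setup Guide

System Requirements

Item	TRELLIS v1	TRELLIS.2
Operating system	Linux	Linux
GPU	NVIDIA, 16GB+ VRAM (A100, A6000)	NVIDIA, 24GB+ VRAM (A100, H100)
CUDA	11.8 or 12.2	12.4
Python	3.8+	3.8+
Conda	Miniconda recommended	Miniconda recommended

Note: Windows users can refer to the community project Window_Trellis. macOS is not supported, because TRELLIS requires an NVIDIA GPU with CUDA.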

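Installation Steps (TRELLIS v1)

# 1. Clone the repository (including submodules)
git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
cd TRELLIS

# 2. Run the one-step setup script (creates the conda environment and installs all dependencies)
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast

# The script automatically:
# - creates a conda environment named "trellis"
# - installs PyTorch (CUDA build)
# - builds and installs spconv (sparse convolution library)
# - installs FlashAttention and xFormers (attention acceleration)
# - installs Nvdiffrast, FlexiCubes, and Kaolin (rendering and geometry processing)

Installation Steps (TRELLIS.2)

# 1. Clone the repository (including submodules)
git clone -b main https://github.com/microsoft/TRELLIS.2.git --recursive
cd TRELLIS.2

# 2. Run the one-step setup script
. ./setup.sh --new-env --basic --flash-attn --nvdiffrast --nvdiffrec --cumesh --o-voxel --flexgemm

# Dependencies that are new in TRELLIS.2:
# - o-voxel: core library for the O-Voxel representation
# - cumesh: CUDA-accelerated mesh processing
# - flexgemm: flexible GEMM (matrix multiplication) acceleration
# - nvdiffrec: differentiable rendering library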

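Verifying the Installation

# Verify the TRELLIS v1 installation (save as check_install.py and run it)
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")
# total_memory is reported in bytes
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

# Verify key dependencies
import spconv  # sparse convolution
print(f"spconv version: {spconv.__version__}")

try:
    import flash_attn
    print(f"FlashAttention version: {flash_attn.__version__}")
except ImportError:
    print("FlashAttention is not installed; the xFormers backend will be used")

# Verify the TRELLIS module
from trellis.pipelines import TrellisImageTo3DPipeline
print("TRELLIS pipeline imported successfully!")

Sample output:

PyTorch version: 2.1.0+cu118
CUDA available: True
GPU device: NVIDIA A100-SXM4-80GB
GPU memory: 80.0 GB
spconv version: 2.3.6
FlashAttention version: 2.5.6
TRELLIS pipeline imported successfully!

Common cause of installation failure: building spconv requires a compatible CUDA toolkit and GCC. If the build fails, check that the CUDA version reported by nvcc --version matches the CUDA version PyTorch was built with (torch.version.cuda), and that the compiler reported by gcc --version is one that this CUDA toolkit supports.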

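Part 1: Getting Started

1.1 Understanding TRELLIS's Core Concept: the SLAT Representation

Concept:

The core innovation in TRELLIS is SLAT (Structured LATent). Understanding SLAT is the foundation for using TRELLIS.

Imagine building a 3D model out of blocks. You do not need to fill the whole room (a dense representation); you only place blocks where the object actually is. SLAT follows the same idea:

Sparse grid: 3D space is divided into a 64×64×64 grid (262,144 positions in total)
Active voxels: only positions occupied by the object are marked "active" (about 20,000 on average, roughly 7.6% of the grid)
Local latent vectors: each active voxel carries an 8-dimensional vector that encodes the shape and appearance at that position

This design lets TRELLIS represent complex 3D structures efficiently, and it naturally supports decoding the same representation into different formats (Gaussian, NeRF, Mesh).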

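Code example:

# The structure of a SLAT representation (conceptual demo, not a call into TRELLIS)
import torch

# Simulate a SLAT representation
# coords: [N, 4]; the first column is batch_id, the last three are xyz coordinates
coords = torch.tensor([
    [0, 10, 20, 15],   # voxel at (10,20,15) in batch 0
    [0, 10, 21, 15],   # neighboring voxel at (10,21,15) in batch 0
    [0, 11, 20, 16],   # voxel at (11,20,16) in batch 0
])

# feats: [N, 8]; one 8-dimensional latent vector per voxel
feats = torch.randn(3, 8)

print(f"Active voxels: {coords.shape[0]}")
print(f"Grid size: 64x64x64 = {64**3}")
# Fraction of grid cells that are occupied (lower means sparser)
print(f"Occupancy: {coords.shape[0] / 64**3 * 100:.4f}%")
print(f"Latent dimension: {feats.shape[1]}")

Output:

Active voxels: 3
Grid size: 64x64x64 = 262144
Occupancy: 0.0011%
Latent dimension: 8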
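
To make the storage argument concrete, here is a minimal arithmetic sketch comparing a sparse SLAT with a dense 64³ grid that stores the same 8-dimensional feature everywhere. It assumes 4-byte values (float32 features, int32 coordinates); the helper names `slat_bytes` and `dense_bytes` are made up for this illustration and are not part of TRELLIS.

```python
# Storage comparison: sparse SLAT vs. a dense 64^3 feature grid (arithmetic only)
GRID = 64
LATENT_DIM = 8
BYTES_PER_VALUE = 4  # assumption: float32 features and int32 coordinates


def slat_bytes(num_active: int) -> int:
    """Bytes needed for coords [N, 4] plus feats [N, 8]."""
    coords = num_active * 4 * BYTES_PER_VALUE          # (batch_id, x, y, z)
    feats = num_active * LATENT_DIM * BYTES_PER_VALUE  # one latent per active voxel
    return coords + feats


def dense_bytes() -> int:
    """Bytes needed if every grid cell stored an 8-dim feature."""
    return GRID ** 3 * LATENT_DIM * BYTES_PER_VALUE


sparse = slat_bytes(20_000)   # the average active-voxel count quoted above
dense = dense_bytes()
print(f"sparse: {sparse / 1024:.1f} KiB")
print(f"dense:  {dense / 1024:.1f} KiB")
print(f"sparse / dense: {sparse / dense:.3f}")
```

The sparse form pays for explicit coordinates, yet still comes out far smaller because most of the grid is empty.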

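Exercises:
1. A 3D asset has 25,000 active voxels, and each voxel's 8-dimensional vector is stored as float32. Compute the memory used by the SLAT representation (hint: coords take 25,000×4×4 bytes, feats take 25,000×8×4 bytes).
2. Why does SLAT use a 64³ grid rather than 128³ or 32³? Consider the trade-off between quality and efficiency.

1.2 Your First 3D Generation: Image to 3D

Concept:

The most commonly used TRELLIS feature is Image-to-3D: given a single 2D image, it outputs a complete 3D model. Internally, generation runs in two stages:

Stage 1 (sparse structure generation): from the image, predict which positions in the 64³ grid contain the object (a binary occupancy grid)
Stage 2 (latent generation): generate an 8-dimensional latent vector on each active voxel to encode detail

After generation, different decoders turn the SLAT into Gaussian, Radiance Field, or Mesh output.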
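
The data flow between the two stages can be sketched with NumPy alone. This is only a toy stand-in: a hand-made sphere plays the role of the Stage 1 occupancy grid and random numbers play the role of the Stage 2 latents, so nothing here reflects what the TRELLIS models actually predict.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID, LATENT_DIM = 64, 8

# Stage 1 stand-in: a binary occupancy grid (a solid sphere, not a model output)
idx = np.indices((GRID, GRID, GRID)).transpose(1, 2, 3, 0)  # shape [64, 64, 64, 3]
center = (GRID - 1) / 2
occupancy = np.linalg.norm(idx - center, axis=-1) < 12

# Coordinates of the active voxels, shape [N, 3]
coords = np.argwhere(occupancy)

# Stage 2 stand-in: one 8-dim latent per active voxel (random values here)
feats = rng.standard_normal((coords.shape[0], LATENT_DIM)).astype(np.float32)

print("coords:", coords.shape, "feats:", feats.shape)
print(f"occupied fraction: {occupancy.mean():.4f}")
```

The point of the sketch is the shapes: Stage 2 only ever touches the N active positions that Stage 1 selected, never the full grid.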

Code example:

# Based on example.py from the official TRELLIS v1 repository
import os
os.environ['SPCONV_ALGO'] = 'native'  # use native mode for a single run to avoid benchmark overhead

import imageio
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Step 1: load the pretrained model (downloaded from HuggingFace on first run, about 4.7GB)
pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
pipeline.cuda()

# Step 2: prepare the input image
# An object on a white or solid-color background works best; any size is fine (it is preprocessed internally)
image = Image.open("assets/example_image/T.png")

# Step 3: run the generation pipeline
outputs = pipeline.run(
    image,
    seed=1,  # random seed; the same seed reproduces the same result
)

# Step 4: inspect the output formats
print(f"Output formats: {list(outputs.keys())}")
print(f"Gaussian count: {len(outputs['gaussian'])}")
print(f"Radiance Field count: {len(outputs['radiance_field'])}")
print(f"Mesh count: {len(outputs['mesh'])}")

Output:

Output formats: ['gaussian', 'radiance_field', 'mesh']
Gaussian count: 1
Radiance Field count: 1
Mesh count: 1

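Exercises:
1. Change the seed to different values (for example 42 and 100) and compare the 3D results generated from the same image, to see the role randomness plays in generation.
2. Use your own photo of an object as input and observe how background complexity affects generation quality.

1.3 Understanding the Three Output Formats

Concept:

A single TRELLIS generation can be output in three formats, each suited to different uses:

Format	Characteristics	Best used for
Gaussian (3D Gaussians)	Highest appearance quality, supports real-time rendering	Visual previews, real-time rendering engines
Radiance Field	Continuous volumetric representation, renders from arbitrary viewpoints	NeRF research and experiments
Mesh	Highest geometric quality, most widely compatible	Game engine import, 3D printing, downstream editing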

Code example:

# Continues from the code in 1.2: render and export each format (based on the official example.py)
import imageio
from trellis.utils import render_utils, postprocessing_utils

# --- Format 1: 3D Gaussians ---
# Render a turntable video (best appearance)
video_gs = render_utils.render_video(outputs['gaussian'][0])['color']
imageio.mimsave("output_gaussian.mp4", video_gs, fps=30)
print("Gaussian video saved: output_gaussian.mp4")

# Save as a PLY file (viewable in tools such as CloudCompare or Splat viewers)
outputs['gaussian'][0].save_ply("output_gaussian.ply")
print("Gaussian PLY saved: output_gaussian.ply")

# --- Format 2: Radiance Field ---
video_rf = render_utils.render_video(outputs['radiance_field'][0])['color']
imageio.mimsave("output_rf.mp4", video_rf, fps=30)
print("Radiance Field video saved: output_rf.mp4")

# --- Format 3: Mesh ---
video_mesh = render_utils.render_video(outputs['mesh'][0])['normal']
imageio.mimsave("output_mesh.mp4", video_mesh, fps=30)
print("Mesh normal video saved: output_mesh.mp4")

# --- Export as GLB (a common format for game engines) ---
# The GLB export combines the Gaussian's texture with the Mesh's geometry
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],      # used for texture extraction
    outputs['mesh'][0],          # used for geometry
    simplify=0.95,               # remove 95% of the triangles (reduces face count)
    texture_size=1024,           # texture resolution 1024×1024
)
glb.export("output.glb")
print("GLB file saved: output.glb")

Output:

Gaussian video saved: output_gaussian.mp4
Gaussian PLY saved: output_gaussian.ply
Radiance Field video saved: output_rf.mp4
Mesh normal video saved: output_mesh.mp4
GLB file saved: output.glb

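Exercises:
1. Compare the file size and visual quality of GLB files exported with simplify=0.95 and simplify=0.5.
2. Export GLB files with different texture_size values (512, 1024, 2048) and observe how texture sharpness changes.

Part 2: Intermediate

2.1 Tuning Generation Parameters

Details:

TRELLIS generation quality is controlled mainly by two parameters:

steps: the number of denoising iterations. More steps generally give higher quality but slower generation. This tutorial treats 50 steps for Stage 1 and 50 steps for Stage 2 as the defaults; the values a checkpoint actually ships with live in its pipeline configuration, so check there if the exact defaults matter.
cfg_strength (guidance strength): the Classifier-Free Guidance strength. Higher values make the result follow the condition (image or text) more closely, but values that are too high cause oversaturation. Recommended range: 3-7.5.

Each stage has its own parameters:
- sparse_structure_sampler_params controls Stage 1 (structure generation)
- slat_sampler_params controls Stage 2 (detail generation)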
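
To see why a large cfg_strength can overshoot, it helps to look at the guidance formula itself. The sketch below uses one common formulation of classifier-free guidance, which extrapolates from the unconditional prediction toward the conditional one; treating this as the exact scaling used inside TRELLIS is an assumption, and `apply_cfg` is a hypothetical helper, not a TRELLIS function.

```python
import numpy as np


def apply_cfg(cond_pred, uncond_pred, cfg_strength):
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return (1.0 + cfg_strength) * cond_pred - cfg_strength * uncond_pred


# Toy predictions standing in for the model's conditional / unconditional outputs
cond = np.array([1.0, 0.5])
uncond = np.array([0.8, 0.7])

for w in (0.0, 3.0, 7.5):
    print(f"cfg_strength={w}: {apply_cfg(cond, uncond, w)}")
```

With a strength of 0 the conditional prediction is returned unchanged; as the strength grows, the difference between the two predictions is amplified, which is consistent with the oversaturation described above.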

Code example:

# Compare generation results for different parameter combinations (based on the official API)
import os
os.environ['SPCONV_ALGO'] = 'native'

from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline

pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
pipeline.cuda()

image = Image.open("assets/example_image/T.png")

# --- Fast preview mode (trades quality for speed) ---
outputs_fast = pipeline.run(
    image,
    seed=1,
    sparse_structure_sampler_params={
        "steps": 12,       # reduced from 50 to 12
        "cfg_strength": 7.5,
    },
    slat_sampler_params={
        "steps": 12,       # reduced from 50 to 12
        "cfg_strength": 3,
    },
)
print("Fast mode done (~10 s on A100)")

# --- High-quality mode ---
outputs_quality = pipeline.run(
    image,
    seed=1,
    sparse_structure_sampler_params={
        "steps": 50,
        "cfg_strength": 7.5,
    },
    slat_sampler_params={
        "steps": 50,
        "cfg_strength": 3,
    },
)
print("High-quality mode done (~30 s on A100)")

# --- Batch generation (several different variants) ---
outputs_multi = pipeline.run(
    image,
    num_samples=4,    # generate 4 different results in one call
    seed=42,
    formats=['mesh', 'gaussian'],  # only produce the formats you need, to save VRAM
)
print(f"Batch generation done, {len(outputs_multi['gaussian'])} results")

Output:

Fast mode done (~10 s on A100)
High-quality mode done (~30 s on A100)
Batch generation done, 4 results

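Notes:
- A cfg_strength that is too high (>10) leads to oversaturated results and distorted geometry. For Stage 2, keep cfg_strength at 5 or below.
- Below 8 steps, quality drops noticeably and structures come out incomplete. Below 5 steps, generation may fail.
- With num_samples > 1, VRAM use grows linearly; on a 24GB GPU, stay at 4 samples or fewer.
- The formats parameter lets you request only the formats you need (for example only mesh), which significantly reduces VRAM use and compute.

Exercises:
1. With steps fixed at 50, test cfg_strength values of 1, 3, 5, 7.5, and 12, and find the best value for your image.
2. Measure generation time for steps from 10 to 100 and check how close the growth is to linear.

2.2 Text-to-3D Generation

Details:

TRELLIS v1 can generate 3D models directly from a text description (TRELLIS.2 does not support this). Use TrellisTextTo3DPipeline, which converts the text into conditioning features with a CLIP text encoder.

The recommended model for text-to-3D is TRELLIS-text-xlarge (2.0B parameters, highest quality). Choosing a model size:
- Quick experiments: TRELLIS-text-base (342M, fastest)
- Balanced quality and speed: TRELLIS-text-large (1.1B)
- Best quality: TRELLIS-text-xlarge (2.0B)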

Code example:

# Text-to-3D generation (based on the official example_text.py)
import os
os.environ['SPCONV_ALGO'] = 'native'

import imageio
from trellis.pipelines import TrellisTextTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Load the text-to-3D pipeline (largest model)
pipeline = TrellisTextTo3DPipeline.from_pretrained("microsoft/TRELLIS-text-xlarge")
pipeline.cuda()

# Tips for writing prompts:
# - describe concrete shapes and colors; avoid abstract descriptions
# - include material information (metallic, wooden, fabric, etc.)
# - avoid overly long descriptions (20 words or fewer is a good target)
prompt = "A chair looking like an avocado."

outputs = pipeline.run(
    prompt,
    seed=1,
)

# Export as GLB
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    simplify=0.95,
    texture_size=1024,
)
glb.export("text_chair.glb")
print(f"Prompt: '{prompt}'")
print("GLB saved: text_chair.glb")

Output:

Prompt: 'A chair looking like an avocado.'
GLB saved: text_chair.glb

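Notes:
- Text-to-3D quality is usually lower than image-to-3D, because a text condition constrains the result less tightly.
- English prompts work best. Translate prompts in other languages to English first.
- Avoid scene descriptions in the prompt (such as "on a table" or "in a room"); focus on the object itself.

Exercises:
1. Describe the same object with different prompts while keeping the seed fixed, and compare how consistent the results are.
2. Try the different model sizes (base/large/xlarge) and compare quality and speed.

2.3 Multi-Image-to-3D Generation

Details:

TRELLIS v1 can generate a 3D model from several images, which is especially useful when a single image does not carry enough information (for example, the back of the object is not visible). Use the run_multi_image() method, which supports two fusion modes:

stochastic: each denoising step is conditioned on one of the views, so the views take turns guiding the sampling; suited to views that overlap
multidiffusion: every denoising step takes all views into account at once; suited to views that differ strongly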
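
The difference between the two modes can be illustrated with a toy NumPy sketch. It is a simplified picture under stated assumptions, not the TRELLIS implementation: `stochastic_schedule` and `multidiffusion_step` are hypothetical helpers, and constant arrays stand in for the model's per-view predictions.

```python
import numpy as np


def stochastic_schedule(num_steps, num_images, rng):
    """One conditioning image per denoising step, cycling from a random start."""
    offset = rng.integers(0, num_images)
    return (np.arange(num_steps) + offset) % num_images


def multidiffusion_step(preds_per_image):
    """Fuse one denoising step by averaging the per-image predictions."""
    return np.mean(np.stack(preds_per_image), axis=0)


rng = np.random.default_rng(0)
schedule = stochastic_schedule(num_steps=12, num_images=3, rng=rng)
print("image index used at each step:", schedule)

# Stand-ins for the predictions obtained from three different views
preds = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 6.0)]
print("fused prediction:", multidiffusion_step(preds))
```

In this picture, the stochastic mode needs one model evaluation per step regardless of the number of views, while the averaging mode needs one per view per step, which is why it is the slower of the two.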

Code example:

# Multi-image-to-3D generation (based on the official example_multi_image.py)
import os
os.environ['SPCONV_ALGO'] = 'native'

import numpy as np
import imageio
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils

pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
pipeline.cuda()

# Prepare several images from different viewpoints
images = [
    Image.open("assets/example_multi_image/character_1.png"),  # front
    Image.open("assets/example_multi_image/character_2.png"),  # side
    Image.open("assets/example_multi_image/character_3.png"),  # back
]

# Fuse the multi-view information in multidiffusion mode
outputs = pipeline.run_multi_image(
    images,
    seed=1,
    mode='multidiffusion',  # 'stochastic' or 'multidiffusion'
    sparse_structure_sampler_params={
        "steps": 12,
        "cfg_strength": 7.5,
    },
    slat_sampler_params={
        "steps": 12,
        "cfg_strength": 3,
    },
)

# Render a side-by-side comparison video (Gaussian color next to Mesh normals)
video_gs = render_utils.render_video(outputs['gaussian'][0])['color']
video_mesh = render_utils.render_video(outputs['mesh'][0])['normal']
video = [np.concatenate([frame_gs, frame_mesh], axis=1)
         for frame_gs, frame_mesh in zip(video_gs, video_mesh)]
imageio.mimsave("multi_image_result.mp4", video, fps=30)
print("Multi-image generation done, video saved: multi_image_result.mp4")

Output:

Multi-image generation done, video saved: multi_image_result.mp4

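Notes:
- Multi-image mode uses more VRAM than single-image mode; a GPU with 24GB+ is recommended.
- The input images should show the same object from different viewpoints; do not mix in images of other objects.
- multidiffusion works better when the views differ strongly, but it is about 2× slower than stochastic.

Exercises:
1. Generate a 3D model from the same image set in stochastic and in multidiffusion mode, and compare the results.
2. Try 2 images versus 3 images and observe the difference in quality.

2.4 Using the Gradio Web Interface

Details:

TRELLIS ships a Gradio-based web interface for generating 3D models interactively without writing code. It is well suited to quick experiments and demos.

Code example:

# Start the Gradio web interface (TRELLIS v1)
cd /path/to/TRELLIS

# Web interface for image-to-3D
python app.py
# Then open http://127.0.0.1:7860 in a browser

# Web interface for text-to-3D
python app_text.py
# Then open http://127.0.0.1:7860 in a browser

# Web interface for TRELLIS.2
cd /path/to/TRELLIS.2
python app.py              # image-to-3D
python app_texturing.py    # texture generation

Output:

Running on local URL:  http://127.0.0.1:7860

Notes:
- The Gradio app keeps the model resident on the GPU and can occupy most of its VRAM, so avoid running it directly on a shared server.
- To change the port or create a public link, Gradio's standard mechanisms are the server_port and share arguments of launch() (or the GRADIO_SERVER_PORT environment variable). Whether app.py also exposes --server_port and --share as command-line flags should be checked in the script itself.

Exercises:
1. Generate a 3D model through the Gradio interface, download the Gaussian, Mesh, and GLB outputs, and compare their file sizes.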

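Part 3: Advanced

3.1 Asset Variant Generation (Same Structure, Different Appearance)

Details:

A distinctive capability of TRELLIS v1 is asset variant generation: given an existing 3D model, it generates new variants with a different appearance but the same geometric structure.

How it works: the two-stage design of TRELLIS separates structure (Stage 1) from appearance (Stage 2). Variant generation reuses the voxel structure of the existing model (skipping Stage 1) and regenerates only the latent vectors (Stage 2), with a new text prompt guiding the change in appearance.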

Code example:

# Asset variant generation (based on the official example_variant.py)
import os
os.environ['SPCONV_ALGO'] = 'native'

import imageio
import numpy as np
import open3d as o3d
from trellis.pipelines import TrellisTextTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Note: variant generation uses the TextTo3D pipeline (not ImageTo3D)
pipeline = TrellisTextTo3DPipeline.from_pretrained("microsoft/TRELLIS-text-xlarge")
pipeline.cuda()

# Load the base mesh as an Open3D triangle mesh
base_mesh = o3d.io.read_triangle_mesh("assets/T.ply")

# Describe the target appearance in text
variant_prompt = "Rugged, metallic texture with orange and white paint finish, suggesting a durable, industrial feel."

outputs = pipeline.run_variant(
    base_mesh,
    variant_prompt,
    seed=1,
    # Note: variant mode only takes slat_sampler_params (there is no sparse_structure_sampler_params)
    # slat_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 7.5,
    # },
)

# Export the variant
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    simplify=0.95,
    texture_size=1024,
)
glb.export("variant_industrial.glb")
print(f"Variant prompt: '{variant_prompt}'")
print("Variant GLB saved: variant_industrial.glb")

# Generate several variants
for i, style in enumerate(["wooden carved texture", "neon glowing cyberpunk style", "weathered stone texture"]):
    outputs = pipeline.run_variant(base_mesh, style, seed=i)
    glb = postprocessing_utils.to_glb(
        outputs['gaussian'][0], outputs['mesh'][0],
        simplify=0.95, texture_size=1024
    )
    glb.export(f"variant_{i}.glb")
    print(f"Variant {i} ({style}) saved")

Output:

Variant prompt: 'Rugged, metallic texture with orange and white paint finish, suggesting a durable, industrial feel.'
Variant GLB saved: variant_industrial.glb
Variant 0 (wooden carved texture) saved
Variant 1 (neon glowing cyberpunk style) saved
Variant 2 (weathered stone texture) saved

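Notes:
- The input mesh must be loaded as an Open3D triangle mesh. The example uses a PLY file; meshes in a format Open3D cannot read need to be converted first.
- Variant generation only changes appearance (color, material), not geometry. To change the geometry, rerun the full generation.
- run_variant voxelizes the input mesh onto the 64³ grid, so very fine detail may be lost during voxelization.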
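
The loss of fine detail during voxelization is easy to see in a small NumPy sketch. It assumes the shape has been normalized into the cube [-0.5, 0.5]³ and simply bins surface points into 64³ cells; `voxelize_points` is a hypothetical helper written for this illustration and does not reproduce the voxelizer TRELLIS uses.

```python
import numpy as np


def voxelize_points(points, grid=64):
    """Map points in [-0.5, 0.5]^3 to unique integer voxel coordinates."""
    idx = np.floor((points + 0.5) * grid).astype(np.int64)
    idx = np.clip(idx, 0, grid - 1)   # keep boundary points inside the grid
    return np.unique(idx, axis=0)     # many points collapse into one voxel


rng = np.random.default_rng(0)
# Hypothetical surface samples: 100k points on a sphere of radius 0.4
pts = rng.standard_normal((100_000, 3))
pts = 0.4 * pts / np.linalg.norm(pts, axis=1, keepdims=True)

coords = voxelize_points(pts)
print(f"{len(pts)} surface points -> {len(coords)} active voxels")
```

Every point that falls into the same cell becomes indistinguishable from its neighbors, so any feature smaller than one cell (1/64 of the bounding cube) cannot survive as structure.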

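3.2 TRELLIS.2 Advanced Features: PBR Textures and Resolution Control

Details:

TRELLIS.2 introduces several important upgrades:

PBR (Physically-Based Rendering) textures: generates four texture maps (Base Color, Roughness, Metallic, Opacity), suited to game engines and real-time rendering
Resolution control: supports three resolutions (512³, 1024³, 1536³), implemented through cascaded upsampling
O-Voxel representation: supports arbitrary topology (open surfaces, non-manifold geometry)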

Code example:

# TRELLIS.2 image-to-3D generation (based on the official TRELLIS.2 example.py)
import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# Load an HDR environment map (used for PBR rendering)
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# Load the TRELLIS.2 pipeline (4B-parameter model)
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

image = Image.open("assets/example_image/T.png")

# --- Choosing a resolution ---
# Option A: 512 quick preview (~3 s on H100)
mesh_512 = pipeline.run(image, pipeline_type='512', seed=1)[0]
print("512 resolution generation done")

# Option B: 1024 standard quality (~17 s on H100)
mesh_1024 = pipeline.run(image, pipeline_type='1024_cascade', seed=1)[0]
print("1024 cascade generation done")

# Option C: 1536 highest quality (~60 s on H100)
mesh_1536 = pipeline.run(image, pipeline_type='1536_cascade', seed=1)[0]
mesh_1536.simplify(16777216)  # simplify the mesh to about 16M faces
print("1536 cascade generation done")

# --- PBR rendering ---
video = render_utils.make_pbr_vis_frames(
    render_utils.render_video(mesh_1024, envmap=envmap)
)
imageio.mimsave("trellis2_pbr.mp4", video, fps=15)
print("PBR render video saved")

# --- Export GLB (with PBR textures) ---
glb = o_voxel.postprocess.to_glb(
    vertices=mesh_1024.vertices,
    faces=mesh_1024.faces,
    attr_volume=mesh_1024.attrs,
    coords=mesh_1024.coords,
    attr_layout=mesh_1024.layout,
    voxel_size=mesh_1024.voxel_size,
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target=1000000,  # target of 1 million faces
    texture_size=4096,          # 4K texture
    remesh=True,
    remesh_band=1,
    remesh_project=0,
    verbose=True
)
glb.export("trellis2_output.glb", extension_webp=True)
print("TRELLIS.2 GLB saved (with PBR textures)")

Output:

512 resolution generation done
1024 cascade generation done
1536 cascade generation done
PBR render video saved
TRELLIS.2 GLB saved (with PBR textures)

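Notes:
- TRELLIS.2 needs 24GB+ of VRAM. If you hit OOM, make sure PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is set.
- The 1536 resolution only runs on an H100 (80GB). On an A100 (40GB), use 1024 at most.
- The _cascade suffix in pipeline_type stands for cascaded upsampling: generate at 512 first and then upsample, which gives better quality than generating at high resolution directly.

3.3 The TRELLIS.2 Texturing Pipeline

Details:

TRELLIS.2 provides a standalone texturing pipeline that adds PBR textures, based on a reference image, to an existing 3D mesh. This is especially useful for automatically texturing meshes that were modeled by hand or obtained from scans.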

Code example:

# TRELLIS.2 texture generation (based on the official example_texturing.py)
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import trimesh
from PIL import Image
from trellis2.pipelines import Trellis2TexturingPipeline

# Note: the texturing pipeline is loaded with a different config_file
pipeline = Trellis2TexturingPipeline.from_pretrained(
    "microsoft/TRELLIS.2-4B",
    config_file="texturing_pipeline.json"  # important: select the texturing configuration
)
pipeline.cuda()

# Load the target mesh and the reference image
mesh = trimesh.load("assets/example_texturing/the_forgotten_knight.ply")
image = Image.open("assets/example_texturing/image.webp")

# Run texture generation
output = pipeline.run(
    mesh,
    image,
    seed=42,
    resolution=1024,       # internal processing resolution
    texture_size=2048,     # output texture resolution
)

# Export the textured GLB
output.export("textured_output.glb", extension_webp=True)
print("Texture generation done, saved: textured_output.glb")

Output:

Texture generation done, saved: textured_output.glb

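Notes:
- The texturing pipeline must be loaded with config_file="texturing_pipeline.json"; do not use the default configuration.
- The input mesh should be reasonably watertight and have a UV unwrap.
- The style of the reference image is mapped directly onto the texture, so a reference with a clear style gives better results.

3.4 Performance Optimization and VRAM Management

Details:

TRELLIS is a VRAM-intensive application. The key optimization strategies are listed below.

Strategy 1: control the spconv algorithm selection

import os

# Use 'native' for a single run (avoids the benchmark overhead)
# Use 'auto' for batch production (benchmarks once, then picks the fastest algorithm)
os.environ['SPCONV_ALGO'] = 'native'  # recommended for single inference
# os.environ['SPCONV_ALGO'] = 'auto'  # recommended for batch inference

Strategy 2: generate only selected formats

# Produce only the formats you need to avoid unnecessary decoding
outputs = pipeline.run(
    image,
    seed=1,
    formats=['mesh'],  # Mesh only; skip Gaussian and Radiance Field
)

Strategy 3: reclaim VRAM

import gc

import torch
from trellis.utils import postprocessing_utils

def generate_with_cleanup(pipeline, image, seed=1):
    """Generate, export, then release GPU memory."""
    outputs = pipeline.run(image, seed=seed, formats=['mesh', 'gaussian'])

    # Export the needed format right away
    glb = postprocessing_utils.to_glb(
        outputs['gaussian'][0], outputs['mesh'][0],
        simplify=0.95, texture_size=1024
    )
    glb.export("output.glb")

    # Drop the references first, then return cached blocks to the driver
    del outputs
    gc.collect()
    torch.cuda.empty_cache()

    return glb

Strategy 4: TRELLIS.2 low-VRAM mode

TRELLIS.2 enables low_vram mode by default: models are loaded onto the GPU on demand and moved back to the CPU after use:

# TRELLIS.2 runs in low_vram mode by default; no extra configuration is needed
# If you hit OOM, check:
# 1. whether PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is set
# 2. whether other processes using the GPU have been closed
# 3. whether the chosen resolution is too high (lower pipeline_type)

Notes:
- 24GB of VRAM (for example an RTX 4090) is enough for TRELLIS v1; for TRELLIS.2, an A100 40GB or larger is recommended.
- In batch generation, calling torch.cuda.empty_cache() after each item helps avoid VRAM fragmentation.
- Generating only the Mesh with formats=['mesh'] saves about 40% of VRAM compared with generating all three formats.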

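3.5 Best Practices

Input image preparation: use images with a white or solid-color background. A complex background interferes with the model's understanding of the object. A background-removal tool such as rembg can be used for preprocessing.
Seed management: fix the seed to make results reproducible. In batch generation, vary the seed to get diversity.
GLB export parameters: simplify=0.95 suits most cases (5% of the faces are kept); lower it to 0.8 for higher quality. texture_size=1024 is a good balance between quality and file size.
Choosing a version: use v1 for rapid prototyping and batch generation (fast, low VRAM); use TRELLIS.2 for final deliverables (PBR textures, high resolution).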
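
The effect of the export parameters can be estimated with simple arithmetic before running an export. The sketch below follows the tutorial's reading of simplify as the fraction of faces removed; the 500,000-face input mesh, the 3-channel 8-bit texture, and both helper names are assumptions chosen for illustration.

```python
def faces_after_simplify(num_faces: int, simplify: float) -> int:
    """simplify is the fraction of faces removed, so (1 - simplify) of them remain."""
    if not 0.0 <= simplify < 1.0:
        raise ValueError("simplify must be in [0, 1)")
    return round(num_faces * (1.0 - simplify))


def texture_bytes(texture_size: int, channels: int = 3) -> int:
    """Uncompressed size of a square texture with 8 bits per channel."""
    return texture_size * texture_size * channels


# Hypothetical raw mesh with 500,000 faces
for s in (0.8, 0.95, 0.98):
    print(f"simplify={s}: about {faces_after_simplify(500_000, s)} faces remain")

for t in (512, 1024, 2048):
    print(f"texture_size={t}: {texture_bytes(t) / 2**20:.2f} MiB uncompressed")
```

Doubling texture_size quadruples the uncompressed texture, whereas moving simplify from 0.95 to 0.8 quadruples the remaining face count, so both knobs have a strong effect on the final file size.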

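Part 4: Hands-On Project

Project Requirements

Build a batch 3D asset generation pipeline that:
1. reads multiple 2D images from a given directory
2. generates 3D models in batch
3. exports them automatically as GLB
4. produces a quality report (file size, face count, vertex count)
5. supports configurable parameters (quality/speed trade-off)

Topics covered by this project:
- Section 1.2: basic image-to-3D pipeline calls
- Section 2.1: tuning generation parameters
- Section 1.3: GLB export
- Section 3.4: VRAM management and batch-processing optimization

Project Layout

batch_3d_generator/
├── generate.py          # main script: batch generation logic
├── config.json          # parameter configuration file
├── input_images/        # input image directory
│   ├── image1.png
│   ├── image2.png
│   └── image3.png
└── output/              # output directory (created automatically)
    ├── image1/
    │   ├── model.glb
    │   ├── preview.mp4
    │   └── metadata.json
    └── report.json      # global quality report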

Full Implementation

# generate.py: batch 3D asset generation pipeline
# Based on the official TRELLIS v1 API

import os
os.environ['SPCONV_ALGO'] = 'native'  # native mode avoids the benchmark overhead

import json
import time
import gc
import imageio
import torch
from pathlib import Path
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils


def load_config(config_path):
    """Load the configuration file on top of the built-in defaults."""
    default_config = {
        "model": "microsoft/TRELLIS-image-large",
        "input_dir": "input_images",
        "output_dir": "output",
        "seed": 42,
        "formats": ["mesh", "gaussian"],
        "export_glb": True,
        "export_video": True,
        "sparse_structure_sampler_params": {
            "steps": 12,
            "cfg_strength": 7.5
        },
        "slat_sampler_params": {
            "steps": 12,
            "cfg_strength": 3
        },
        "glb_simplify": 0.95,
        "glb_texture_size": 1024
    }

    if Path(config_path).exists():
        with open(config_path, 'r') as f:
            user_config = json.load(f)
        # Shallow merge: a nested dict in the user config replaces the default one entirely
        default_config.update(user_config)

    return default_config


def collect_images(input_dir):
    """Collect all image files in the input directory."""
    supported_ext = {'.png', '.jpg', '.jpeg', '.webp', '.bmp'}
    images = []
    for ext in supported_ext:
        images.extend(Path(input_dir).glob(f'*{ext}'))
        images.extend(Path(input_dir).glob(f'*{ext.upper()}'))
    return sorted(set(images))


def generate_single(pipeline, image_path, output_dir, config):
    """Run the full flow for a single image."""
    name = image_path.stem
    item_dir = Path(output_dir) / name
    item_dir.mkdir(parents=True, exist_ok=True)

    # Load the image
    image = Image.open(image_path).convert('RGB')
    print(f"  Processing: {image_path.name} ({image.size[0]}x{image.size[1]})")

    # Generate the 3D asset
    start_time = time.time()
    outputs = pipeline.run(
        image,
        seed=config["seed"],
        formats=config["formats"],
        sparse_structure_sampler_params=config["sparse_structure_sampler_params"],
        slat_sampler_params=config["slat_sampler_params"],
    )
    gen_time = time.time() - start_time
    print(f"  Generation done in {gen_time:.1f} s")

    # Collect metadata
    metadata = {
        "source_image": str(image_path),
        "image_size": list(image.size),
        "generation_time_seconds": round(gen_time, 1),
        "config": {
            "seed": config["seed"],
            "steps_structure": config["sparse_structure_sampler_params"]["steps"],
            "steps_slat": config["slat_sampler_params"]["steps"],
        },
        "outputs": {}
    }

    # Export GLB
    if config["export_glb"] and 'gaussian' in outputs and 'mesh' in outputs:
        glb = postprocessing_utils.to_glb(
            outputs['gaussian'][0],
            outputs['mesh'][0],
            simplify=config["glb_simplify"],
            texture_size=config["glb_texture_size"],
        )
        glb_path = str(item_dir / "model.glb")
        glb.export(glb_path)
        glb_size = Path(glb_path).stat().st_size / 1024 / 1024
        metadata["outputs"]["glb"] = {
            "path": glb_path,
            "file_size_mb": round(glb_size, 2),
            "vertices": len(glb.vertices),
            "faces": len(glb.faces),
        }
        print(f"  GLB exported: {glb_size:.1f} MB, {len(glb.vertices)} vertices, {len(glb.faces)} faces")

    # Export the preview video
    if config["export_video"] and 'gaussian' in outputs:
        video = render_utils.render_video(outputs['gaussian'][0])['color']
        video_path = str(item_dir / "preview.mp4")
        imageio.mimsave(video_path, video, fps=30)
        metadata["outputs"]["preview_video"] = video_path
        print("  Preview video saved")

    # Save the metadata
    with open(item_dir / "metadata.json", 'w') as f:
        json.dump(metadata, f, indent=2, ensure_ascii=False)

    # Release VRAM (topic 3.4: VRAM management)
    del outputs
    gc.collect()
    torch.cuda.empty_cache()

    return metadata


def main():
    """Entry point."""
    config = load_config("config.json")
    print("=" * 60)
    print("TRELLIS batch 3D asset generation pipeline")
    print("=" * 60)

    # Collect images
    images = collect_images(config["input_dir"])
    print(f"Found {len(images)} images")

    if not images:
        print(f"Error: no image files found in {config['input_dir']}")
        return

    # Load the model once (topic 1.2: pipeline loading)
    print(f"Loading model: {config['model']}...")
    pipeline = TrellisImageTo3DPipeline.from_pretrained(config["model"])
    pipeline.cuda()
    print("Model loaded")

    # Batch processing (topic 2.1: parameter tuning; topic 3.4: VRAM management)
    all_metadata = []
    total_start = time.time()

    for i, image_path in enumerate(images):
        print(f"\n[{i+1}/{len(images)}] Processing: {image_path.name}")
        try:
            metadata = generate_single(pipeline, image_path, config["output_dir"], config)
            all_metadata.append(metadata)
        except Exception as e:
            # Record the failure and continue with the next image
            print(f"  Error: {e}")
            all_metadata.append({"source_image": str(image_path), "error": str(e)})

    total_time = time.time() - total_start

    # Build the global report
    success_count = sum(1 for m in all_metadata if "error" not in m)
    report = {
        "total_images": len(images),
        "successful": success_count,
        "failed": len(images) - success_count,
        "total_time_seconds": round(total_time, 1),
        "avg_time_per_image": round(total_time / max(len(images), 1), 1),
        "results": all_metadata,
    }

    report_path = Path(config["output_dir"]) / "report.json"
    report_path.parent.mkdir(parents=True, exist_ok=True)
    with open(report_path, 'w') as f:
        json.dump(report, f, indent=2, ensure_ascii=False)

    print("\n" + "=" * 60)
    print(f"All done! Succeeded: {success_count}/{len(images)}")
    print(f"Total time: {total_time:.1f} s")
    print(f"Average per image: {total_time/len(images):.1f} s")
    print(f"Report saved: {report_path}")
    print("=" * 60)


if __name__ == "__main__":
    main()

Configuration file (config.json):

{
    "model": "microsoft/TRELLIS-image-large",
    "input_dir": "input_images",
    "output_dir": "output",
    "seed": 42,
    "formats": ["mesh", "gaussian"],
    "export_glb": true,
    "export_video": true,
    "sparse_structure_sampler_params": {
        "steps": 12,
        "cfg_strength": 7.5
    },
    "slat_sampler_params": {
        "steps": 12,
        "cfg_strength": 3
    },
    "glb_simplify": 0.95,
    "glb_texture_size": 1024
}

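Code Walkthrough

Pipeline loading (topic 1.2): main() loads the model once with from_pretrained and reuses it for the whole batch, which avoids the cost of reloading it for every image.
Parameter configuration (topic 2.1): config.json manages steps and cfg_strength in one place. Fast mode (steps=12) takes about 10 s per image and high-quality mode (steps=50) about 30 s per image, using the A100 timings from section 2.1.
GLB export (topic 1.3): generate_single() combines the Gaussian's texture with the Mesh's geometry to export a GLB, with simplify controlling the face count and texture_size the texture resolution.
VRAM management (topic 3.4): at the end of generate_single(), the output objects are released and the GPU cache is cleared after every generation, which keeps memory from accumulating during long batch runs.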
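
One detail of load_config() is worth a closer look: dict.update() merges only the top level, so a config.json that overrides a single key inside slat_sampler_params replaces the whole nested dictionary. The sketch below shows the effect and one possible remedy; `deep_merge` is a hypothetical helper, not part of the project code above.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict in which nested dicts are merged key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"seed": 42, "slat_sampler_params": {"steps": 12, "cfg_strength": 3}}
user = {"slat_sampler_params": {"steps": 50}}  # the user only wants more steps

shallow = {**defaults, **user}                 # same effect as dict.update()
print("shallow:", shallow["slat_sampler_params"])
print("deep:   ", deep_merge(defaults, user)["slat_sampler_params"])
```

With the shallow merge the default cfg_strength silently disappears from the nested dictionary; the deep merge keeps it while still applying the override.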

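Extension Challenges

Add image preprocessing: integrate the rembg library to remove the background of input images automatically and improve generation quality. Hint: pip install rembg, then call rembg.remove(image).
Multi-GPU parallelism: change the code so that several GPUs generate in parallel, each handling different images. Hint: use torch.cuda.set_device() together with multiprocessing.
Automated quality evaluation: integrate CLIP Score to measure how consistent each generated 3D model is with its input image, and write the score into the report.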
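
As a starting point for the multi-GPU challenge, the work first has to be split between workers. The sketch below shows only that partitioning step, in plain Python; `shard_round_robin` is a hypothetical helper, and the part where each worker process selects its GPU and loads its own pipeline is deliberately left out.

```python
from pathlib import Path


def shard_round_robin(items, num_workers):
    """Split items into num_workers lists; item i goes to worker i % num_workers."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    return [list(items[w::num_workers]) for w in range(num_workers)]


# Hypothetical input list; in the project this would come from collect_images()
images = [Path(f"input_images/image{i}.png") for i in range(1, 8)]
shards = shard_round_robin(images, num_workers=2)

for gpu_id, shard in enumerate(shards):
    print(f"GPU {gpu_id}: {[p.name for p in shard]}")
```

Round-robin assignment keeps the shards within one item of each other in size, which is a reasonable default when per-image generation time is roughly uniform.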

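Part 5: FAQ and Troubleshooting Guide

Common Errors and Solutions

Error	Cause	Solution
`CUDA out of memory`	Not enough GPU VRAM	1. Set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` 2. Use `formats=['mesh']` to generate only the formats you need 3. Lower the `steps` parameter 4. Close other GPU processes
`spconv build fails`	CUDA/GCC version mismatch	Check that `nvcc --version` matches the CUDA version of PyTorch. Try `pip install spconv-cu118` or `spconv-cu121`
`FlashAttention import fails`	CUDA version not supported	Set `os.environ['ATTN_BACKEND'] = 'xformers'` to use the xFormers backend
`Model download times out`	HuggingFace network problems	Use a mirror: `HF_ENDPOINT=https://hf-mirror.com`, or download the model files manually
`Generated 3D model has ghosting or phantom structures`	Complex background in the input image	Use an image with a solid-color background, or remove the background with rembg first
`Generated mesh is incomplete or missing parts`	cfg_strength too low or too few steps	Raise `cfg_strength` to 5-7.5 and increase `steps` to 30+
`GLB file is too large`	Face count and texture resolution too high	Raise the `simplify` value (for example to 0.98) so that more faces are removed, and lower `texture_size` (for example to 512)
`TRELLIS.2 OOM on 24GB GPU`	The 1536 resolution needs more VRAM	Use `pipeline_type='1024_cascade'` instead of `'1536_cascade'`
`ModuleNotFoundError: No module named 'trellis'`	Script not run from the project root	Run the script from the TRELLIS repository root, or add the repository path to `PYTHONPATH`
`xFormers and FlashAttention conflict`	The two attention backends get mixed	Use only one backend at a time: prefer FlashAttention (faster) and switch to xFormers via `ATTN_BACKEND` when FlashAttention is not supported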

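Debugging Tips

Check GPU memory use: before generating, print torch.cuda.memory_allocated() / 1024**3 to see the current VRAM use in GB. If it is close to the limit, test at a low resolution (512) first and raise the resolution once the parameters are confirmed.
Troubleshoot generation quality step by step: if the result is unsatisfactory, check in this order:
1. Check the input image quality first (is it sharp, is the background clean)
2. Check that the seed is fixed (to make results reproducible)
3. Adjust cfg_strength (start within the 3-7.5 range)
4. Adjust steps last (start with 30-50)
Environment variable for debugging: setting CUDA_LAUNCH_BLOCKING=1 gives an accurate stack trace when a CUDA error occurs (it reduces performance, so use it for debugging only).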

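Part 6: Recommended Learning Path

Recommended Reading Order for the Official Material

TRELLIS GitHub README - focus on the installation steps and the minimal example to get a first generation running quickly
TRELLIS example.py - the complete image-to-3D example; read it line by line
TRELLIS example_text.py - the text-to-3D example; shows how the pipelines differ
TRELLIS example_variant.py - the variant generation example; shows how the two stages are decoupled
TRELLIS paper (arXiv:2412.01506) - Section 3 (method) for an in-depth view of the SLAT representation and Rectified Flow
TRELLIS.2 GitHub README - the API changes and new features in v2
TRELLIS.2 example.py - the complete v2 usage example, covering PBR and resolution control
TRELLIS training code - if you need to fine-tune, read train.py and the configuration files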
