Training a TTS Voice Model

To get a sense of the speech quality Bert-VITS2 can produce, you can first watch some demo videos on bilibili.

Source Download and Environment Setup

This article uses the open-source project Bert-VITS2 to train the model.


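git clone git@github.com:fishaudio/Bert-VITS2.git
cd Bert-VITS2

# Install the required pip packages
pip install -r requirements.txt

# Install PyTorch and related packages (CUDA 11.8 builds)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Check that the dependencies are fully installed
python text\chinese_bert.py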
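
Before running the project's own check, it can help to confirm that the PyTorch packages are importable and that CUDA is visible. The following is only a minimal sketch: the helper name `missing_packages` and the package list are assumptions made for illustration and are not part of Bert-VITS2.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list for illustration; extend it to match requirements.txt.
    required = ["torch", "torchvision", "torchaudio"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch

        print("torch version:", torch.__version__)
        # False here usually means a CPU-only build or a driver problem.
        print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False` after installing from the cu118 index, the installed wheel is probably a CPU-only build, or the GPU driver is not visible to PyTorch.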

WebUI

Run the following command to start a WebUI:

python webui_preprocess.py

Result:

Model Download

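According to the guide shown in the WebUI, before training can start you also need to download several model files that the project depends on from Hugging Face.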

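Downloading Hugging Face models with git lfs tends to be error-prone. As an alternative, a few lines of Python can do the download, and in my testing this worked very well. The code is as follows: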


from pycrawlers import huggingface

# Create the downloader object
hg = huggingface()

# Repository pages of the models to download
urls = ['https://huggingface.co/hfl/chinese-roberta-wwm-ext-large/tree/main',
        'https://huggingface.co/microsoft/wavlm-base-plus/tree/main',
        'https://huggingface.co/ku-nlp/deberta-v2-large-japanese-char-wwm/tree/main',
        'https://huggingface.co/microsoft/deberta-v3-large/tree/main']

# Download every repository in the list
hg.get_batch_data(urls)

Of course, you first need to run: pip install pycrawlers
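
If pycrawlers is not an option, the same repositories can probably be fetched with the official `huggingface_hub` package (`pip install huggingface_hub`). The sketch below is not the method used in this article. The helper names `repo_id_from_url` and `download_all`, and the `models` target directory, are hypothetical, so check the WebUI guide for where Bert-VITS2 actually expects each model.

```python
from urllib.parse import urlparse


def repo_id_from_url(url):
    """Extract 'owner/name' from a Hugging Face repository page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url}")
    return "/".join(parts[:2])


def download_all(urls, target_root="models"):
    """Download each repository into target_root/<repo name>."""
    # Imported here so the URL helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for url in urls:
        repo_id = repo_id_from_url(url)
        # target_root is a hypothetical location, not the project's required layout.
        local_dir = f"{target_root}/{repo_id.split('/')[-1]}"
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_all(urls)` with the same `urls` list as above would place each repository in its own subdirectory. The files then need to be moved or copied to the paths the project expects.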

Other Resources

Troubleshooting Installation Problems

pynini fails to install

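While running pip install -r requirements.txt, you may see a message that pynini failed to install. I will skip the details of why; the official documentation itself notes that "Pynini is neither designed for nor tested on Windows". Here is the fix: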

conda install -c conda-forge pynini

torch installation fails repeatedly

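The install has to download more than 3 GB, so on an unstable network it is time for the brute-force approach (sheer persistence works wonders):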