unit_hifigan_HK_layer12.km2500_frame_TAT-TTS
Hokkien unit HiFiGAN based vocoder from fairseq:
- Trained with TAT-TTS data with 4 speakers in Taiwanese Hokkien accent. See here
for more training details.
Usage
import json
import os
from pathlib import Path
import IPython.display as ipd
from fairseq import hub_utils
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.speech_to_text.hub_interface import S2THubInterface
from fairseq.models.text_to_speech import CodeHiFiGANVocoder
from fairseq.models.text_to_speech.hub_interface import VocoderHubInterface
from huggingface_hub import snapshot_download
import torchaudio
cache_dir = os.getenv("HUGGINGFACE_HUB_CACHE")
# speech synthesis
library_name = "fairseq"
cache_dir = (
cache_dir or (Path.home() / ".cache" / library_name).as_posix()
)
cache_dir = snapshot_download(
f"facebook/unit_hifigan_HK_layer12.km2500_frame_TAT-TTS", cache_dir=cache_dir, library_name=library_name
)
x = hub_utils.from_pretrained(
cache_dir,
"model.pt",
".",
archive_map=CodeHiFiGANVocoder.hub_models(),
config_yaml="config.json",
fp16=False,
is_vocoder=True,
)
with open(f"{x['args']['data']}/config.json") as f:
vocoder_cfg = json.load(f)
assert (
len(x["args"]["model_path"]) == 1
), "Too many vocoder models in the input"
vocoder = CodeHiFiGANVocoder(x["args"]["model_path"][0], vocoder_cfg)
tts_model = VocoderHubInterface(vocoder_cfg, vocoder)
tts_sample = tts_model.get_model_input(unit)
wav, sr = tts_model.get_prediction(tts_sample)
ipd.Audio(wav, rate=sr)
数据统计
相关导航
The African Regional Intellectual Property Organization (ARIPO) is an inter-governmental organization (IGO) that facilitates cooperation among Member States in intellectual property matters, with the objective of pooling financial and human resources and seeking technological advancement for economic, social, technological, scientific and industrial development.

