I've trained a 22 kHz Fre-GAN to 150,000 iterations using Clipper's dataset, then fine-tuned it on TalkNet Twilight.
Talking, HiFi-GAN:
https://u.smutty.horse/mcdyfbrvzru.ogg
Talking, Fre-GAN:
https://u.smutty.horse/mcdyfbwwkwm.ogg
Singing, HiFi-GAN:
https://u.smutty.horse/mcdyjcbbhas.ogg
Singing, Fre-GAN:
https://u.smutty.horse/mcdyjbzzoop.ogg
Generator (not fine-tuned):
https://drive.google.com/file/d/1igzaoSx5iPiokFyCzoFSglor5H7Ip-3N/view?usp=sharing
Discriminator (not fine-tuned):
https://drive.google.com/file/d/11KLm-NO0-kHnwhDY5Oswof7Ao3y2FZ1y/view?usp=sharing

The breathing sounds a little less robotic on Fre-GAN, but there's no dramatic difference aside from that. I don't have the patience to train it until the noise cleans up (1M+ steps?), so I think I'll stick with HiFi-GAN and do something else.
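If anyone wants to try the shared generator checkpoint, here's a rough sketch of how you'd load it and vocode a mel spectrogram. This assumes a HiFi-GAN-style Fre-GAN repo (a Generator class and AttrDict config helper in the repo, weights saved under a "generator" key) and made-up filenames; adjust the imports, checkpoint name, and mel parameters to match whatever repo and config you actually train with.

import json
import torch
import soundfile as sf

from models import Generator   # Generator class from the training repo (assumed layout)
from env import AttrDict       # dict-with-attributes helper, as in HiFi-GAN-style repos

# Build the generator with the same config it was trained with.
with open("config.json") as f:
    h = AttrDict(json.load(f))
generator = Generator(h)

# Hypothetical filename for the downloaded generator checkpoint; assumes the
# HiFi-GAN convention of storing weights under a "generator" key.
state = torch.load("g_00150000", map_location="cpu")
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()  # usual step before inference in these repos

# mel: float tensor of shape [1, n_mels, frames], computed with the SAME mel
# parameters (n_fft, hop length, n_mels, fmin/fmax) used during training.
mel = torch.load("twilight_line_mel.pt")  # placeholder input

with torch.no_grad():
    audio = generator(mel).squeeze().cpu().numpy()

sf.write("twilight_line.wav", audio, 22050)  # 22 kHz model -> 22050 Hz output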