This is a refactored and enhanced version of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism, based on the original paper and implementation, which provides:
- Cleaner code structure: useless and redundant files are removed and the others are re-organized.
- Better sound quality: the sampling rate of synthesized audio is raised to 44.1 kHz from the original 24 kHz.
- Higher fidelity: improved acoustic models and diffusion sampling acceleration algorithms are integrated.
- More controllability: variance models and parameters are introduced for prediction and control of pitch, energy, breathiness, etc.
- Production compatibility: functionalities are designed to match the requirements of production deployment and the SVS communities.
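
As a rough illustration of what the sampling-rate change above means, the sketch below computes the Nyquist limit (half the sampling rate) for both rates. The function name `nyquist_hz` is hypothetical and not part of this repository; this is plain arithmetic, not a description of how the synthesis pipeline works.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Return the highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


# Sampling rates quoted in the feature list above.
ORIGINAL_SR = 24_000
CURRENT_SR = 44_100

for label, sr in (("original", ORIGINAL_SR), ("this fork", CURRENT_SR)):
    print(f"{label}: {sr} Hz sampling, up to {nyquist_hz(sr):.0f} Hz of audio bandwidth")
```

At 44.1 kHz the representable bandwidth covers roughly the full range of human hearing, whereas 24 kHz audio is limited to content below 12 kHz.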
Overview | Variance Model | Acoustic Model |
---|---|---|
![]() | ![]() | ![]() |
- Installation & basic usage: See Getting Started
- Dataset creation pipelines & tools: See MakeDiffSinger
- Best practices & tutorials: See Best Practices
- Editing configurations: See Configuration Schemas
- Deployment & production: OpenUTAU for DiffSinger, DiffScope (under development)
- Communication groups: QQ Group (907879266), Discord server
- Progress since we forked into this repository: See Releases
- Roadmap for future releases: See Project Board
- Thoughts, proposals & ideas: See Discussions
TBD
TBD
- Paper: DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
- Implementation: MoonInTheRiver/DiffSinger
- Denoising Diffusion Probabilistic Models (DDPM): paper, implementation
- DDIM for diffusion sampling acceleration
- PNDM for diffusion sampling acceleration
- DPM-Solver++ for diffusion sampling acceleration
- UniPC for diffusion sampling acceleration
- Rectified Flow (RF): paper, implementation
- HiFi-GAN and NSF for waveform reconstruction
- pc-ddsp for waveform reconstruction
- RMVPE and yxlllc's fork for pitch extraction
- Vocal Remover and yxlllc's fork for harmonic-noise separation
Any organization or individual is prohibited from using any functionalities included in this repository to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.
This forked DiffSinger repository is licensed under the Apache 2.0 License.