DiffSinger (OpenVPI maintained version)

This is a refactored and enhanced version of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism, based on the original paper and implementation, which provides:

Cleaner code structure: useless and redundant files are removed and the others are re-organized.
Better sound quality: the sampling rate of synthesized audio is raised to 44.1 kHz from the original 24 kHz.
Higher fidelity: improved acoustic models and diffusion sampling acceleration algorithms are integrated.
More controllability: variance models and parameters are introduced for prediction and control of pitch, energy, breathiness, etc.
Production compatibility: functionalities are designed to match the requirements of production deployment and the SVS communities.
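To make the sampling-rate change above concrete, here is a minimal sketch of what it means for audio bandwidth. It relies only on the Nyquist relation (the highest representable frequency is half the sampling rate); the helper name `nyquist_hz` is hypothetical and not part of this repository.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at `sample_rate_hz` can represent."""
    return sample_rate_hz / 2.0


# Sampling rates named in the feature list above.
ORIGINAL_SAMPLE_RATE_HZ = 24_000
CURRENT_SAMPLE_RATE_HZ = 44_100

original_bandwidth = nyquist_hz(ORIGINAL_SAMPLE_RATE_HZ)
current_bandwidth = nyquist_hz(CURRENT_SAMPLE_RATE_HZ)

print(f"24 kHz audio covers frequencies up to {original_bandwidth:.0f} Hz")
print(f"44.1 kHz audio covers frequencies up to {current_bandwidth:.0f} Hz")
```

At 44.1 kHz the representable band extends to 22.05 kHz, compared with 12 kHz at the original rate, which is where the additional high-frequency detail comes from.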

Overview	Variance Model	Acoustic Model

User Guidance

Chinese Tutorials: Text, Video

Installation & basic usage: See Getting Started
Dataset creation pipelines & tools: See MakeDiffSinger
Best practices & tutorials: See Best Practices
Editing configurations: See Configuration Schemas
Deployment & production: OpenUTAU for DiffSinger, DiffScope (under development)
Communication groups: QQ Group (907879266), Discord server

Progress & Roadmap

Progress since we forked into this repository: See Releases
Roadmap for future releases: See Project Board
Thoughts, proposals & ideas: See Discussions

Architecture & Algorithms

TBD

Development Resources

TBD

References

Original Paper & Implementation

Paper: DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Implementation: MoonInTheRiver/DiffSinger

Generative Models & Algorithms

Denoising Diffusion Probabilistic Models (DDPM): paper, implementation
- DDIM for diffusion sampling acceleration
- PNDM for diffusion sampling acceleration
- DPM-Solver++ for diffusion sampling acceleration
- UniPC for diffusion sampling acceleration
Rectified Flow (RF): paper, implementation
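As a rough illustration of what "diffusion sampling acceleration" means for the samplers listed above, the sketch below shows the general DDIM idea: sample over a short subsequence of the training timesteps using a deterministic update (eta = 0). This is a minimal scalar sketch under stated assumptions, not this repository's implementation. The function names, the linear beta schedule, and the step counts (1000 training steps, 10 sampling steps) are all illustrative assumptions.

```python
import math


def make_ddim_timesteps(num_train_steps: int, num_sample_steps: int) -> list:
    """Pick an evenly spaced, descending subsequence of training timesteps."""
    stride = num_train_steps // num_sample_steps
    steps = list(range(0, num_train_steps, stride))[:num_sample_steps]
    return steps[::-1]


def linear_alpha_bars(num_train_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> list:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def ddim_step(x_t: float, eps_pred: float,
              alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """One deterministic DDIM update (eta = 0) from timestep t to an earlier one."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Re-noise that estimate to the (lower) noise level of the earlier timestep.
    return (math.sqrt(alpha_bar_prev) * x0_pred
            + math.sqrt(1.0 - alpha_bar_prev) * eps_pred)


alpha_bars = linear_alpha_bars(1000)
timesteps = make_ddim_timesteps(1000, 10)  # 10 network calls instead of 1000
print(timesteps)
```

In a real sampler, `eps_pred` would come from the denoising network at each selected timestep; the speedup comes from calling the network once per entry in `timesteps` rather than once per training step.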

Dependencies & Submodules

HiFi-GAN and NSF for waveform reconstruction
pc-ddsp for waveform reconstruction
RMVPE and yxlllc's fork for pitch extraction
Vocal Remover and yxlllc's fork for harmonic-noise separation

Disclaimer

Any organization or individual is prohibited from using any functionalities included in this repository to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

License

This forked DiffSinger repository is licensed under the Apache 2.0 License.
