SiliconSense

🎙 A great voice dataset

A voice model is 90% about the dataset. Garbage in, garbage out — the model can never sound better than the audio you train it on. This guide shows how to prepare it right, step by step, with our tools.

In short (TL;DR)

📏 How much you need (and the limit)

5–8 min: our sweet spot. Fits the limit and gives a quality model.
~9–10 min: upper edge for WAV (see the limit below).
<3 min: too little; the voice will be flat.

⚠️ Upload limit: 50 MB. A 48 kHz mono WAV takes ~5–6 MB per minute, so 50 MB holds about 8–9 minutes. A 14-minute dataset comes out to ~80 MB and simply won’t upload. Aim for 5–10 minutes (ideally 5–8) of dense, clean voice without pauses (Vocal Stitch strips them); 5–8 varied minutes are plenty for a good model.
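The size arithmetic behind the limit can be checked with a few lines. This is a minimal sketch that assumes 16-bit PCM (2 bytes per sample) and ignores the small WAV header; the function names are illustrative and not part of any tool here.

```python
def wav_megabytes(minutes, sample_rate=48_000, channels=1, bytes_per_sample=2):
    """Approximate size of uncompressed PCM audio in MB (10**6 bytes).

    Assumes 16-bit samples by default; the WAV header is ignored.
    """
    return minutes * 60 * sample_rate * channels * bytes_per_sample / 1_000_000


def max_minutes(limit_mb=50, **fmt):
    """Longest duration (in minutes) that fits under limit_mb."""
    return limit_mb / wav_megabytes(1, **fmt)


for m in (5, 8, 10, 14):
    print(f"{m:>2} min -> {wav_megabytes(m):.1f} MB")
print(f"50 MB holds about {max_minutes():.1f} min")
```

Under these assumptions one minute is 5.76 MB, 14 minutes is about 80.6 MB, and 50 MB holds roughly 8.7 minutes, which matches the figures above.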

✅ What the audio must be like

👍 Do
  • Vocals only (a cappella), no instrumental
  • Dry sound — no reverb or echo
  • One voice, no backing vocals
  • Clean, no noise or hiss
  • No clipping or distortion
  • Lossless (WAV / FLAC) for the source
  • Mono, 48 kHz
👎 Kills quality
  • Music / instrumental in the background
  • Reverb and echo — RVC’s #1 enemy
  • Noise, hiss, hum
  • A second voice, choir, backing
  • Heavy effects (autotune, distortion)
  • Long pauses and silence
  • MP3/OGG as the dataset source
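The measurable items in the checklist (mono, 48 kHz, length, file size) can be verified before uploading. Below is a minimal sketch using Python's standard `wave` module; the function name and the default thresholds are assumptions taken from this guide, and it cannot detect reverb, noise, or a second voice, which still need your ears.

```python
import wave


def check_wav_format(path, want_rate=48_000, want_channels=1,
                     min_minutes=3, limit_mb=50):
    """Return a list of problems found; an empty list means the file passes."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        frames = w.getnframes()
        data_bytes = frames * channels * w.getsampwidth()

    minutes = frames / rate / 60
    size_mb = data_bytes / 1_000_000   # audio data only, header ignored

    problems = []
    if channels != want_channels:
        problems.append(f"{channels} channels, expected {want_channels}")
    if rate != want_rate:
        problems.append(f"{rate} Hz, expected {want_rate} Hz")
    if minutes < min_minutes:
        problems.append(f"only {minutes:.1f} min of audio")
    if size_mb > limit_mb:
        problems.append(f"{size_mb:.1f} MB exceeds the {limit_mb} MB limit")
    return problems
```

Calling `check_wav_format("dataset.wav")` (a hypothetical file name) returns the list of issues to fix; the `wave` module reads PCM WAV files only, so FLAC sources would need converting first.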

🎭 Variety beats quantity

Five to eight varied minutes beat the same amount of monotone, single-note reading. Make sure the dataset covers:

🗣 So the model doesn’t lisp — the Ж, З, Р, Ш, Щ, Ч sounds

RVC’s weak spot is sibilants and the rolled “r”. If they’re scarce or slurred in the data, the model mangles them. So deliberately add clear, unhurried Ж, З, Р, Ш, Щ, Ч sounds (roughly zh, z, rolled r, sh, shch, ch). The easiest way is to record a few Russian tongue-twisters.

Read them clearly and slowly: that gives the model clean samples of the hard sounds so it stops “swallowing” them.

🛠 The pipeline, step by step

1. Gather source audio of the voice

Gather recordings of the target voice so that after cleanup you have 5–10 minutes of clean vocals: songs, podcasts, voice notes. The cleaner the source, the less cleanup later.

2. Separate vocals from music (if from songs)

If the voice is already clean (a cappella or a mic recording) — skip this. If it’s a song — split out the vocals with any vocal remover and keep only the vocal stem, ideally with reverb removed.

🔗 vocalremover.org (free, in-browser)

3. Cut the pauses and join into one file

Drop the vocal files into our Vocal Stitch — it cuts the silence between phrases and joins everything into one continuous 48 kHz mono WAV. That’s your ready dataset.
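To make the idea of "cutting silence and joining" concrete, here is a minimal sketch of one common approach: split the audio into short frames, drop long runs of quiet frames, and concatenate what remains. It is not Vocal Stitch's actual algorithm; the function names, the peak threshold, and the frame sizes are all illustrative assumptions, and the input is assumed to be 16-bit mono samples.

```python
from array import array


def strip_silence(samples, rate=48_000, frame_ms=20, threshold=500, keep_ms=100):
    """Drop quiet stretches longer than keep_ms from 16-bit mono samples.

    A frame counts as quiet when its peak is below `threshold` (an assumed
    value). The first keep_ms of each pause is kept so phrases don't collide.
    """
    frame = rate * frame_ms // 1000          # samples per frame
    keep_frames = keep_ms // frame_ms        # quiet frames kept per pause
    out, quiet_run = array("h"), 0
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        peak = max(abs(s) for s in chunk)
        quiet_run = quiet_run + 1 if peak < threshold else 0
        if quiet_run <= keep_frames:
            out.extend(chunk)
    return out


def stitch(clips, **kw):
    """Strip pauses from every clip and join the results into one array."""
    joined = array("h")
    for clip in clips:
        joined.extend(strip_silence(clip, **kw))
    return joined


# Synthetic demo: 1 s of "voice" (constant level), a 2 s pause, 1 s of "voice".
tone = array("h", [10_000] * 48_000)
gap = array("h", [0] * 96_000)
result = stitch([tone + gap + tone])
print(len(result) / 48_000, "seconds after stitching")
```

In the demo the 2-second pause shrinks to 0.1 seconds, so 4 seconds of input becomes 2.1 seconds of dense audio. A production tool would typically use a smarter detector (for example RMS energy with fades at the cut points) to avoid clicks.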

🎚 Open Vocal Stitch →

4. Check the material (optional)

You can run it through the Track Analyzer — check duration and loudness, make sure there’s no clipping or dropouts.
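The same kind of check can be sketched offline. The snippet below computes duration, peak and RMS level in dBFS, and a count of samples at full scale (a rough clipping indicator) for 16-bit mono samples. It is an illustration under those assumptions, not how the Track Analyzer is implemented, and `analyze` is a hypothetical name.

```python
import math
from array import array

FULL_SCALE = 32768  # magnitude of the largest 16-bit PCM value


def analyze(samples, rate=48_000):
    """Return duration, peak/RMS level in dBFS, and a clipped-sample count."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE - 1)

    def to_dbfs(value):
        return 20 * math.log10(value / FULL_SCALE) if value else float("-inf")

    return {
        "duration_s": len(samples) / rate,
        "peak_dbfs": to_dbfs(peak),
        "rms_dbfs": to_dbfs(rms),
        "clipped_samples": clipped,
    }


# Synthetic demo: one second of a 220 Hz sine at half of full scale.
sine = array("h", (int(16384 * math.sin(2 * math.pi * 220 * n / 48_000))
                   for n in range(48_000)))
report = analyze(sine)
print(report)
```

For the half-scale sine the peak lands near -6 dBFS and the RMS near -9 dBFS with no clipped samples. On real vocals, a non-zero `clipped_samples` count or a peak sitting at 0 dBFS is a sign to go back to a cleaner source.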

🎛 Open Track Analyzer →

5. Train the voice

Upload your finished dataset to Train a voice, and that’s it. Training is fully automatic: every parameter (sample rate, method, training length) is tuned for you, so there is nothing to choose. In a few minutes you get a model. It stays available for 6 hours: download it or use it right away in Change Timbre.
Quality depends only on the dataset — that’s why steps 1–4 are everything.

🎙 Open Train a voice →

6. Apply the voice

In Change Timbre: upload the vocal you want to re-voice, pick your model and set the Pitch — “No change” if both voices are the same gender, or “Male → female” / “Female → male” when converting across genders. Nothing else to tweak — the rest is automatic.
If the voice mangles sounds, that’s not a setting — it’s the dataset: go back to clean vocals and the Ж/З/Р sounds (see the block above).

💡 From the community’s experience: the source vocal (the one you re-voice) works best when it’s close in pitch range to the target voice; if it isn’t, match it with the Pitch shift. Clear diction matters too: if the source swallows words, the model copies that. A well-articulated vocal in the right range gives a noticeably better result.
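How far apart two voices are in pitch can be expressed in semitones with the standard equal-temperament relation, where 12 semitones correspond to a doubling of frequency. The sketch below uses illustrative pitch values; the actual shift applied by the “Male → female” and “Female → male” presets is not documented here, so treat the numbers as an assumption.

```python
import math


def semitone_shift(source_hz, target_hz):
    """Semitones needed to move source_hz to target_hz (positive = up)."""
    return 12 * math.log2(target_hz / source_hz)


def shift_ratio(semitones):
    """Frequency ratio produced by shifting by the given number of semitones."""
    return 2 ** (semitones / 12)


# Illustrative values: a source voice centered near 110 Hz, a target near 220 Hz.
print(semitone_shift(110, 220))   # one octave up
print(shift_ratio(-12))           # one octave down halves the frequency
```

If the estimated gap between the source vocal and the target voice is close to an octave, a cross-gender preset is the likely match; if it is within a few semitones, “No change” is usually the safer choice.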
🎚 Open Change Timbre →

🚫 Common mistakes

⚖️ Use only your own voice or one you have the rights/permission for. Cloning someone’s voice without consent is prohibited.

🎚 Start: Vocal Stitch →