Practical Guide to Foo DSP SoundTouch: Time Stretching & Pitch Shifting Explained

Foo DSP SoundTouch: Best Settings for Clean Tempo and Pitch Changes

Changing tempo or pitch without introducing obvious artifacts is one of the most common needs in audio processing for DJs, audio engineers, app developers, and hobbyists. Foo DSP’s SoundTouch algorithm (commonly distributed as the SoundTouch library) is designed for high-quality time-stretching and pitch-shifting. This article walks through how the algorithm works in practice, which settings most affect audio quality, recommended defaults for different use cases, practical tips for minimizing artifacts, and examples of how to integrate and test settings.


What SoundTouch does (brief technical overview)

SoundTouch provides three related controls for time-stretching and pitch-shifting:

  • Tempo change: altering the playback speed without changing pitch.
  • Pitch change: shifting pitch without changing tempo.
  • Rate change: changing both tempo and pitch together (resampling).
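
The arithmetic behind these three controls is simple enough to sketch. The helpers below are hypothetical (they are not part of the SoundTouch API) and assume the usual equal-tempered convention that one semitone is a frequency ratio of 2^(1/12), with percent changes expressed as multipliers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration; not part of the SoundTouch API.

// Convert a percent change (e.g. +10.0 for 10% faster) to a multiplier.
double percentToFactor(double percent) {
    assert(percent > -100.0);  // a factor of zero or below is meaningless
    return 1.0 + percent / 100.0;
}

// Convert a pitch shift in semitones to a frequency ratio (equal temperament).
double semitonesToRatio(double semitones) {
    return std::pow(2.0, semitones / 12.0);
}

// A rate change resamples the audio, so duration and pitch move together.
struct RateEffect {
    double durationFactor;  // new length divided by old length
    double pitchSemitones;  // resulting pitch shift in semitones
};

RateEffect rateChangeEffect(double ratePercent) {
    const double factor = percentToFactor(ratePercent);
    return { 1.0 / factor, 12.0 * std::log2(factor) };
}
```

For example, rateChangeEffect(100.0) describes a +100% rate change: the audio plays in half the time and sounds 12 semitones (one octave) higher. Tempo-only and pitch-only changes are the cases where one of those two effects is held at zero.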

The library uses a time-domain approach based on the synchronized overlap-add (SOLA) method, along with other techniques for transient handling and adaptive crossfading. Key parameters control the sequence length, seek window, and overlap; these directly affect transient fidelity, smearing, and computational load.
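
To make the overlap-add idea concrete, here is a deliberately simplified mono sketch. It is not SoundTouch's implementation: it assumes a brute-force correlation search and a linear crossfade, takes all lengths in samples, and the function name stretchSola is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Simplified overlap-add time stretcher with alignment search (mono).
// Illustrative only; SoundTouch's real processing is more elaborate.
// tempo > 1 shortens the audio, tempo < 1 lengthens it.
std::vector<float> stretchSola(const std::vector<float>& in, double tempo,
                               std::size_t sequence, std::size_t overlap,
                               std::size_t seek) {
    assert(tempo > 0.0 && overlap > 0 && sequence > overlap);
    if (in.size() < sequence) return in;  // too short to process

    // Start the output with the first sequence, copied unchanged.
    std::vector<float> out(in.begin(), in.begin() + sequence);

    // Each output hop of (sequence - overlap) samples consumes this much input.
    const double hopIn = tempo * static_cast<double>(sequence - overlap);

    for (double pos = hopIn; ; pos += hopIn) {
        const std::size_t nominal = static_cast<std::size_t>(pos);
        if (nominal + seek + sequence > in.size()) break;  // remaining tail is dropped

        // Last `overlap` samples of the output, which the next segment blends into.
        const std::size_t tailStart = out.size() - overlap;

        // Find the offset inside the seek window that best matches the tail.
        std::size_t best = 0;
        double bestCorr = std::numeric_limits<double>::lowest();
        for (std::size_t off = 0; off <= seek; ++off) {
            double corr = 0.0;
            for (std::size_t i = 0; i < overlap; ++i)
                corr += static_cast<double>(out[tailStart + i]) * in[nominal + off + i];
            if (corr > bestCorr) { bestCorr = corr; best = off; }
        }
        const std::size_t segStart = nominal + best;

        // Linear crossfade from the old tail into the new segment.
        for (std::size_t i = 0; i < overlap; ++i) {
            const float w = static_cast<float>(i) / static_cast<float>(overlap);
            out[tailStart + i] = (1.0f - w) * out[tailStart + i] + w * in[segStart + i];
        }
        // Append the rest of the segment after the crossfaded region.
        out.insert(out.end(), in.begin() + segStart + overlap,
                   in.begin() + segStart + sequence);
    }
    return out;
}
```

With tempo set to 2.0 the output comes out at roughly half the input length, and with 0.5 at roughly double. The three length arguments correspond to the sequence, overlap, and seek-window parameters discussed in this article.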


Key parameters that affect quality

  • Analysis window (also called sequence length): length of the audio segment used to compute overlaps and crossfades. Longer windows produce smoother, more natural-sounding results for sustained tones but can smear transients and rhythmic detail. Shorter windows preserve transients but may introduce phasing and roughness on sustained sounds.

  • Overlap length: amount of overlap between consecutive analysis windows. Higher overlap improves pitch consistency and reduces artifacts at window boundaries but increases CPU usage.

  • Seek window / search window: used to find the best alignment between segments for overlap; tighter search reduces tempo artifacts but can miss optimal alignment during expressive material.

  • Quick-seek and anti-aliasing filter flags (SETTING_USE_QUICKSEEK and SETTING_USE_AA_FILTER in the SoundTouch header): these change how aggressively the algorithm searches or filters, trading quality for speed.

  • Pitch/tempo separation mode (rate vs tempo vs pitch): choose the right mode depending on whether you want pitch-only, tempo-only, or both changed.
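
Because these parameters are given in milliseconds, it helps to see what they mean in samples and how they interact. The sketch below uses a hypothetical TimingMs struct and a rough cost model that assumes a brute-force alignment search (every candidate offset in the seek window is correlated over the full overlap). Real builds with quick seek enabled will behave differently, so treat the numbers as relative only.

```cpp
#include <cassert>

// Hypothetical container for the three timing parameters, in milliseconds.
struct TimingMs {
    int sequence;
    int overlap;
    int seekWindow;
};

// Convert a duration in milliseconds to a whole number of samples (rounded down).
int msToSamples(int ms, int sampleRate) {
    assert(ms >= 0 && sampleRate > 0);
    return static_cast<int>(static_cast<long long>(ms) * sampleRate / 1000);
}

// Basic sanity check: all values positive and the crossfade shorter than the sequence.
bool isPlausible(const TimingMs& t) {
    return t.sequence > 0 && t.overlap > 0 && t.seekWindow > 0
        && t.overlap < t.sequence;
}

// Rough multiply-adds per second of output spent on alignment search,
// ASSUMING a brute-force search. Useful only for comparing settings.
double relativeSeekCost(const TimingMs& t, int sampleRate) {
    assert(isPlausible(t));
    const double seq  = msToSamples(t.sequence, sampleRate);
    const double ovl  = msToSamples(t.overlap, sampleRate);
    const double seek = msToSamples(t.seekWindow, sampleRate);
    const double sequencesPerSecond = sampleRate / (seq - ovl);
    return seek * ovl * sequencesPerSecond;
}
```

At 44100 Hz, a 60 ms sequence is 2646 samples and a 12 ms overlap is 529 samples. Under this model, lengthening the sequence lowers the search cost, while a larger overlap or seek window raises it.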


Recommended starting settings by use case

Below are starting points. Always A/B test with representative material.

  • General music (balanced):

    • Sequence length: 50–80 ms
    • Overlap: 8–16 ms
    • Search/seek window: 5–20 ms
    • Rationale: Balances transient clarity and tonal smoothness.
  • Percussive/beat-heavy tracks (preserve transients):

    • Sequence length: 20–40 ms
    • Overlap: 6–12 ms
    • Search window: small (3–10 ms)
    • Rationale: Shorter sequences reduce smearing on drums and sharp attacks.
  • Vocal-centric material (naturalness):

    • Sequence length: 70–120 ms
    • Overlap: 12–24 ms
    • Search window: medium (10–30 ms)
    • Rationale: Voices benefit from longer windows to retain formant stability.
  • Extreme pitch shifts (more than ±1 octave) or extreme tempo changes (more than ±50%):

    • Expect artifacts regardless of settings. Consider multiband techniques, formant correction, or granular synthesis as a replacement.
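
The ranges above can be captured as a small preset table. This is a sketch only: the Material enum, the TimingPreset struct, and the choice of range midpoints (rounded down) as starting values are assumptions made for this example, not values defined by SoundTouch.

```cpp
#include <cassert>

// Hypothetical material categories matching the list above.
enum class Material { GeneralMusic, Percussive, Vocal };

// Timing values in milliseconds.
struct TimingPreset {
    int sequenceMs;
    int overlapMs;
    int seekWindowMs;
};

// Returns the midpoint of each recommended range (rounded down) as a
// starting point. These are assumptions to tune by ear, not fixed answers.
TimingPreset startingPreset(Material material) {
    switch (material) {
        case Material::Percussive:   return {30, 9, 6};    // 20-40, 6-12, 3-10 ms
        case Material::Vocal:        return {95, 18, 20};  // 70-120, 12-24, 10-30 ms
        case Material::GeneralMusic: break;
    }
    return {65, 12, 12};                                   // 50-80, 8-16, 5-20 ms
}
```

A caller would then pass the three fields to the library's sequence, overlap, and seek-window settings and adjust from there while A/B testing.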

Practical tips to minimize artifacts

  • Prefer small incremental changes: multiple small adjustments (e.g., 2–5% steps) often sound cleaner than a single large leap, especially for real-time adjustments.

  • Use anti-aliasing/resampling filters when using rate change to avoid high-frequency artifacts.

  • For pitch shifting, apply formant correction if available to avoid “chipmunk” or “monster” vocal timbres when shifting vocals.

  • Pre-process: apply gentle noise gating, de-essing, or transient shaping before time/pitch processing to give the algorithm cleaner material to work with.

  • Post-process: mild EQ, transient enhancement, or reverb can mask minor artifacts and make results sound more natural in context.

  • Test with multiple genres and instruments — a setting that works well for synth pads may fail for live drums or solo voice.
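
The first tip, moving in small increments, can be sketched as a helper that splits one large tempo change into equal steps no bigger than a chosen limit. The function name tempoRamp is hypothetical. In a real-time setting each value would be applied between processing blocks, for instance through the library's tempo-change setter.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Split a tempo change (in percent) into equal steps, each no larger than
// maxStepPercent. Returns the intermediate targets, ending exactly at toPercent.
// Hypothetical helper; not part of SoundTouch.
std::vector<double> tempoRamp(double fromPercent, double toPercent,
                              double maxStepPercent) {
    assert(maxStepPercent > 0.0);
    const double span = toPercent - fromPercent;
    const int steps = static_cast<int>(std::ceil(std::fabs(span) / maxStepPercent));
    std::vector<double> ramp;
    for (int i = 1; i <= steps; ++i)
        ramp.push_back(fromPercent + span * i / steps);
    return ramp;
}
```

For example, tempoRamp(0.0, 10.0, 4.0) yields three steps of about 3.33% each, ending at 10%. If the start and target are equal, the ramp is empty.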


Example parameters in code (C++ pseudocode using SoundTouch)

#include "SoundTouch.h"

using namespace soundtouch;  // SoundTouch classes live in this namespace

SoundTouch st;
st.setSampleRate(44100);
st.setChannels(2);

// For balanced music
st.setTempoChange(10.0f);     // +10% tempo
st.setPitchSemiTones(0.0f);   // no pitch change
st.setSetting(SETTING_SEQUENCE_MS, 60);
st.setSetting(SETTING_OVERLAP_MS, 12);
st.setSetting(SETTING_SEEKWINDOW_MS, 15);

// Feed audio and receive processed output...

Note: exact API names may differ depending on build/version. Check the SoundTouch header for current constants.


Testing methodology

  • Use a short, diverse test set: drum loop, solo vocal, sustained instrument (pad), and a complex full mix.
  • Render 0%, small (±5–10%), moderate (±20–30%), and extreme changes to hear behavior across ranges.
  • Listen for specific artifacts: transient smearing, flanging/phasiness, pitch instability, and formant distortion.
  • Measure CPU usage and latency for real-time contexts; increase overlap or seek window only if the CPU budget allows, and remember that longer sequences add latency.
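
The render plan in the list above is easy to generate programmatically so that no combination gets skipped. The sketch below uses hypothetical names (RenderCase, buildRenderMatrix). It pairs every test clip with an unprocessed reference plus each tempo change in both directions. The rendering itself is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One planned render: which clip, and what tempo change in percent.
struct RenderCase {
    std::string material;
    double tempoPercent;
};

// Build the full list of renders: for each material, a 0% reference plus
// +magnitude and -magnitude for every requested magnitude.
std::vector<RenderCase> buildRenderMatrix(const std::vector<std::string>& materials,
                                          const std::vector<double>& magnitudes) {
    assert(!materials.empty());
    std::vector<RenderCase> cases;
    for (const std::string& material : materials) {
        cases.push_back({material, 0.0});  // unprocessed reference
        for (double magnitude : magnitudes) {
            assert(magnitude > 0.0);
            cases.push_back({material, +magnitude});
            cases.push_back({material, -magnitude});
        }
    }
    return cases;
}
```

With four clips (drum loop, solo vocal, pad, full mix) and magnitudes of 5, 10, 20, 30, and 60 percent, this produces 44 renders to listen through.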

Troubleshooting common issues

  • Smearing/blurred transients: reduce sequence length, reduce overlap, or apply more transient-preserving pre-processing.
  • Metallic/phasey sound: increase sequence length (short sequences tend to cause phasing on sustained sounds) or adjust the seek window to improve alignment.
  • Vocals sounding unnatural after pitch shift: add formant correction or increase sequence length.
  • High CPU usage: lower the overlap, narrow the seek window, or increase sequence length slightly; alternatively, enable speed optimizations such as quick seek or use a faster build.

When to choose other algorithms

For extreme transformations, highest-fidelity mastering, or pitch-accurate musical transposition, consider alternatives:

  • Phase vocoder / hybrid algorithms for smoother time-frequency control.
  • Granular synthesis for creative effects.
  • Dedicated commercial pitch-correction tools with formant modeling for vocals.

Summary (quick reference)

  • Balanced music: sequence 50–80 ms, overlap 8–16 ms.
  • Percussive: sequence 20–40 ms, overlap 6–12 ms.
  • Vocals: sequence 70–120 ms, overlap 12–24 ms.
  • Keep search/seek windows moderate and test changes incrementally.
