Stem Splitter

  • Overview:

    Stem Splitter is a web application that allows users to upload .mp3 or .wav files and automatically split them into two or four separate audio stems using the Demucs AI model. Once processed, the application delivers downloadable links to the separated stems, hosted securely on AWS S3.

  • My Role:

    Software Engineer

  • Tools Used:

    Python, Flask, Demucs, Next.js, TypeScript, WaveSurfer.js, AWS (S3), Google Cloud Platform (GCP), Vercel Blob Storage

Overview

System Architecture:

  1. Users upload audio files through a clean UI. Files are stored temporarily on Vercel Blob Storage, with auto-deletion set for 24 hours.

  2. The frontend sends the file URL and the selected stem mode to the backend via a POST request.

  3. The backend downloads the audio file from Vercel Blob Storage to a local VM for processing.

  4. Using Python and CLI-based commands, the backend runs the Demucs model in either a 2-stem or 4-stem configuration.

  5. Once the stems are generated, they are uploaded to AWS S3, and the resulting links are sent back to the frontend.

  6. Within 2 to 3 minutes, users receive links to the processed stems.

  7. Playback is enabled with interactive waveforms powered by WaveSurfer.js.
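The Demucs step in the flow above can be sketched in Python as follows. This is a minimal sketch, not the project's actual code: the helper names (`build_demucs_command`, `expected_stem_paths`, `split_stems`) are hypothetical, and it assumes Demucs v4 is installed with the `demucs` command on the PATH and the default `htdemucs` model, which writes stems to `<out>/<model>/<track>/<stem>.wav`. The 2-stem mode uses Demucs's `--two-stems=vocals` option, which produces `vocals` and its complement `no_vocals`.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from the UI's stem mode to the stem files Demucs writes.
# --two-stems=vocals yields "vocals" plus the complement "no_vocals";
# the default 4-stem model yields drums, bass, other and vocals.
STEM_NAMES = {
    2: ["vocals", "no_vocals"],
    4: ["drums", "bass", "other", "vocals"],
}


def build_demucs_command(input_path: str, stem_mode: int, out_dir: str) -> list[str]:
    """Return the Demucs CLI arguments for a 2- or 4-stem split."""
    if stem_mode not in STEM_NAMES:
        raise ValueError(f"stem_mode must be 2 or 4, got {stem_mode!r}")
    cmd = ["demucs", "-o", out_dir]
    if stem_mode == 2:
        # Separate vocals from everything else instead of producing four stems.
        cmd.append("--two-stems=vocals")
    cmd.append(input_path)
    return cmd


def expected_stem_paths(input_path: str, stem_mode: int, out_dir: str,
                        model: str = "htdemucs") -> list[Path]:
    """Paths Demucs writes to by default: <out>/<model>/<track>/<stem>.wav."""
    track = Path(input_path).stem  # file name without its extension
    return [Path(out_dir) / model / track / f"{name}.wav"
            for name in STEM_NAMES[stem_mode]]


def split_stems(input_path: str, stem_mode: int, out_dir: str) -> list[Path]:
    """Run Demucs and return the stem files to upload (requires Demucs installed)."""
    subprocess.run(build_demucs_command(input_path, stem_mode, out_dir), check=True)
    return expected_stem_paths(input_path, stem_mode, out_dir)
```

Because `subprocess.run(..., check=True)` raises `CalledProcessError` when Demucs exits with an error, a failed split surfaces before anything is uploaded to S3.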

Key Challenges

  • 1. Running ML Models on a Budget:

    One of the biggest hurdles in this project was finding an affordable, scalable way to run a machine learning model. Demucs needs at least 8GB of RAM to run effectively, which immediately ruled out many common serverless setups with lower memory ceilings. To get around this, I turned to Google Cloud Functions, which supports higher memory allocations and runs tasks on demand without an always-on virtual machine, keeping costs low while still meeting the model's requirements.

  • 2. Handling Large File Uploads and Long Processing Times:

    Another significant challenge involved file size and request duration limits on the frontend. Vercel caps the request body of its serverless functions at 4.5MB, which is too small for most audio files. To address this, I integrated Vercel Blob Storage, which supports larger uploads and temporary storage. Once uploads worked, I ran into a second issue: Vercel’s request timeouts. Because splitting audio with Demucs can take up to 3 minutes, requests were failing before the backend finished its work. The solution was to deploy the frontend on Google Cloud, which supports longer request durations and integrates more smoothly with the backend, while keeping operational costs low.
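A quick back-of-the-envelope calculation shows why a request-body limit of a few megabytes is a problem for audio. This sketch assumes CD-quality WAV (44.1 kHz, 16-bit, stereo, ignoring the small header), a 320 kbps MP3, and a limit of 4.5MB treated as 4,500,000 bytes; the function names are hypothetical and the figures are illustrative, not measurements from this project.

```python
# Assumed limit: 4.5MB, treating 1MB as 10**6 bytes.
REQUEST_BODY_LIMIT_BYTES = 4_500_000


def wav_size_bytes(seconds: int, sample_rate: int = 44_100,
                   channels: int = 2, bit_depth: int = 16) -> int:
    """Approximate size of uncompressed PCM audio (WAV header ignored)."""
    return seconds * sample_rate * channels * (bit_depth // 8)


def mp3_size_bytes(seconds: int, bitrate_kbps: int = 320) -> int:
    """Approximate size of a constant-bitrate MP3."""
    return seconds * bitrate_kbps * 1000 // 8


def fits_in_request_body(size_bytes: int) -> bool:
    return size_bytes <= REQUEST_BODY_LIMIT_BYTES


if __name__ == "__main__":
    for label, size in [("3-min WAV", wav_size_bytes(180)),
                        ("3-min 320 kbps MP3", mp3_size_bytes(180))]:
        print(f"{label}: {size / 1e6:.1f} MB, fits: {fits_in_request_body(size)}")
```

Under these assumptions, a 3-minute WAV is about 31.8 MB and even a 320 kbps MP3 of the same length is about 7.2 MB, so both exceed the limit and need to go through Blob Storage instead of the request body.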