Ajay Arora

SongGen: AI-Powered Song Co-Creation Framework

Aug 31, 2024

Live demo showing SongGen's interactive song creation process, featuring real-time genre changes, instrumental editing, and lyric refinement

SongGen is an AI-based songwriting and co-creation framework I developed as part of my MEng thesis at MIT. Unlike existing AI music tools, SongGen features a chat interface with a trained AI songwriter assistant that can make granular song edits and emulate the natural back-and-forth of human collaboration.

Full thesis here: SongGen: AI-Powered Song Co-Creation Framework

The Problem with Current AI Music Tools

While tools like Suno.ai have made song creation accessible to everyone, they often fall short in terms of user interaction and creative control. Users typically input a description and get a complete song, but there's no way to iteratively refine or guide the generation process. This creates a disconnect between the user's vision and the final output.

How SongGen Works

SongGen addresses these limitations in four ways:

  1. Interactive Chat Interface: Powered by GPT-4, the system engages users in natural conversation to understand their musical vision and preferences.

  2. Modular Architecture: The system combines:

    • An AI songwriter assistant for processing user input
    • A specialized lyrics generation model
    • Integration with Suno.ai's audio generation capabilities
    • A robust frontend for real-time interaction

  3. Granular Control: Users can:

    • Generate and edit lyrics section by section
    • Modify individual instruments and change the genre of an existing song
    • Provide feedback and see immediate changes
    • Guide the overall direction of the song

  4. Advanced Audio Editing: The system integrates multiple specialized models:

    • Instrument separation models to isolate and modify individual tracks
    • Genre transformation models to adapt existing songs to different styles
    • Real-time audio processing for immediate feedback
    • Custom audio editing pipelines for precise control over musical elements
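
To make the granular-control idea concrete, here is a minimal Python sketch of song state that supports section-by-section lyric edits and per-instrument changes. The `SongDraft` class, its fields, and its method names are hypothetical and chosen only for illustration; they are not taken from SongGen's code. A real system would pass the stems to separation and genre-transformation models rather than flip flags in memory.

```python
from dataclasses import dataclass, field


@dataclass
class SongDraft:
    """Hypothetical in-memory song state (an assumption, not SongGen's data model)."""

    genre: str
    # Section name -> lyrics text, e.g. "verse 1", "chorus".
    sections: dict[str, str] = field(default_factory=dict)
    # Stem (instrument) name -> whether it is currently kept in the mix.
    stems: dict[str, bool] = field(default_factory=dict)

    def edit_section(self, name: str, new_lyrics: str) -> None:
        """Replace one section's lyrics and leave every other section untouched."""
        if name not in self.sections:
            raise KeyError(f"unknown section: {name}")
        self.sections[name] = new_lyrics

    def mute_stem(self, name: str) -> None:
        """Drop a single instrument from the mix without affecting the others."""
        if name not in self.stems:
            raise KeyError(f"unknown stem: {name}")
        self.stems[name] = False

    def active_stems(self) -> list[str]:
        """Return the instruments still in the mix, in their original order."""
        return [name for name, kept in self.stems.items() if kept]


draft = SongDraft(
    genre="pop",
    sections={"verse 1": "City lights are fading", "chorus": "We run all night"},
    stems={"vocals": True, "drums": True, "bass": True},
)
draft.edit_section("chorus", "We dance all night")  # only the chorus changes
draft.mute_stem("drums")                            # only the drums are removed
draft.genre = "jazz"                                # genre change keeps the lyrics
```

Keeping lyrics, stems, and genre as separate fields is what lets one edit leave the rest of the song alone, which is the property the list above describes.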

Technical Implementation

The system uses the following components:

  • The OpenAI Chat Completions API handles user interaction and coordinates different components
  • A custom songwriting assistant class manages eight core functions for lyrics and audio generation
  • The system maintains state throughout the conversation to ensure consistency
  • Robust error handling ensures reliable operation even with complex requests
  • External audio models are integrated through a modular API architecture
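
One plausible way to wire these pieces together is the function-calling pattern of the Chat Completions API, where the model returns a function name plus a JSON string of arguments and the application runs the matching method. The sketch below assumes that pattern. The `SongwritingAssistant` class, the single `edit_lyrics_section` function, and the `dispatch` helper are hypothetical stand-ins, not SongGen's actual eight functions, and no API call is made so the example runs offline.

```python
import json
from dataclasses import dataclass, field

# Tool schema in the shape the Chat Completions API accepts via its `tools`
# parameter. The function name and fields are illustrative assumptions.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "edit_lyrics_section",
            "description": "Replace the lyrics of one song section.",
            "parameters": {
                "type": "object",
                "properties": {
                    "section": {"type": "string"},
                    "lyrics": {"type": "string"},
                },
                "required": ["section", "lyrics"],
            },
        },
    },
]


@dataclass
class SongwritingAssistant:
    """Hypothetical assistant that keeps song state across conversation turns."""

    lyrics: dict[str, str] = field(default_factory=dict)

    def edit_lyrics_section(self, section: str, lyrics: str) -> str:
        self.lyrics[section] = lyrics
        return f"updated {section}"

    def dispatch(self, name: str, arguments_json: str) -> str:
        """Run one model-requested call; always return a string for the model."""
        allowed = {tool["function"]["name"] for tool in TOOLS}
        if name not in allowed:
            return f"error: unknown function {name}"
        try:
            kwargs = json.loads(arguments_json)
            return getattr(self, name)(**kwargs)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or missing/unexpected arguments end up here.
            return f"error: bad arguments for {name}: {exc}"


assistant = SongwritingAssistant()
ok = assistant.dispatch(
    "edit_lyrics_section",
    '{"section": "chorus", "lyrics": "We dance all night"}',
)
unknown = assistant.dispatch("delete_song", "{}")
```

In a live loop, `name` and `arguments_json` would come from each entry in the response message's `tool_calls`, and the returned string would be sent back as a `tool` role message. Returning errors as strings instead of raising lets the model see the failure and retry.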

Results and Impact

Our human evaluations showed that SongGen significantly outperformed Suno.ai in:

  • Steerability: Users have precise control over the generation process
  • Expressiveness: The system better captures the user's artistic vision
  • Personalization: Songs reflect user preferences more accurately
  • User satisfaction: Higher engagement and creative fulfillment

Future Directions

The framework is designed to be extensible, with planned features including:

  • Voice-based interaction
  • Real-time voice conversion
  • Artist emulation for on-demand song generation
  • Enhanced personalization capabilities

SongGen is just one step towards bridging the gap between human creativity and AI capabilities.