Ajay Arora

Live demo showing SongGen's interactive song creation process, featuring real-time genre changes, instrumental editing, and lyric refinement

SongGen: AI-Powered Song Co-Creation Framework

SongGen is an AI-based songwriting and co-creation framework I developed as part of my MEng thesis at MIT. Unlike existing AI music tools, SongGen features a chat interface with a trained AI songwriter assistant that can make granular song edits and emulate the natural back-and-forth of human collaboration.

Full thesis here: SongGen: AI-Powered Song Co-Creation Framework

The Problem with Current AI Music Tools

While tools like Suno.ai have made song creation accessible to everyone, they often fall short in terms of user interaction and creative control. Users typically input a description and get a complete song, but there's no way to iteratively refine or guide the generation process. This creates a disconnect between the user's vision and the final output.

How SongGen Works

SongGen addresses these limitations in four ways:

Interactive Chat Interface: Powered by GPT-4, the system engages users in natural conversation to understand their musical vision and preferences.
Modular Architecture: The system combines:
- An AI songwriter assistant for processing user input
- A specialized lyrics generation model
- Integration with Suno.ai's audio generation capabilities
- A robust frontend for real-time interaction
Together, these form a plug-and-play framework that can be used with any audio and lyric generation model.
Granular Control: Users can:
- Generate and edit lyrics section by section
- Modify individual instruments and existing genres via tags
- Provide feedback and see immediate changes
- Guide the overall direction of the song
Advanced Audio Editing: The system integrates multiple specialized models:
- Instrument separation models to isolate and modify individual tracks
- Real-time audio processing for immediate feedback and voice conversion
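As a rough illustration of the plug-and-play idea and of section-level and tag-level edits, here is a minimal Python sketch. It is not SongGen's actual code: the names LyricsModel, AudioModel, SongSession, EchoLyrics, and SilentAudio are hypothetical, and the two stand-in backends only return placeholders so the example runs without any external model.

```python
from dataclasses import dataclass, field
from typing import Protocol


class LyricsModel(Protocol):
    """Any lyrics backend only needs this one method (hypothetical interface)."""
    def write_section(self, section: str, prompt: str) -> str: ...


class AudioModel(Protocol):
    """Any audio backend only needs this one method (hypothetical interface)."""
    def render(self, lyrics: str, tags: list[str]) -> bytes: ...


class EchoLyrics:
    """Stand-in lyrics backend: echoes the prompt under the section name."""
    def write_section(self, section: str, prompt: str) -> str:
        return f"[{section}] {prompt}"


class SilentAudio:
    """Stand-in audio backend: returns one zero byte per tag."""
    def render(self, lyrics: str, tags: list[str]) -> bytes:
        return bytes(len(tags))


@dataclass
class SongSession:
    lyrics_model: LyricsModel
    audio_model: AudioModel
    sections: dict[str, str] = field(default_factory=dict)
    tags: list[str] = field(default_factory=list)

    def edit_section(self, section: str, prompt: str) -> str:
        # Regenerate a single section and leave the others untouched.
        self.sections[section] = self.lyrics_model.write_section(section, prompt)
        return self.sections[section]

    def set_tags(self, add=(), remove=()) -> list[str]:
        # Drop removed tags, then append new ones without duplicates.
        self.tags = [t for t in self.tags if t not in remove]
        for tag in add:
            if tag not in self.tags:
                self.tags.append(tag)
        return self.tags

    def render(self) -> bytes:
        # Hand the current lyrics and tags to whichever audio backend is plugged in.
        return self.audio_model.render("\n".join(self.sections.values()), self.tags)
```

Swapping in a different lyrics or audio model would then mean passing a different object to SongSession, with no change to the session logic itself.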

Technical Implementation

The system uses the following components:

The OpenAI Chat Completions API handles user interaction and coordinates different components
A custom songwriting assistant class manages eight core functions for lyrics and audio generation
The system maintains state throughout the conversation to ensure consistency
Robust error handling ensures reliable operation even with complex requests
A modular API architecture integrates external audio models
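The components above could be wired together roughly as follows. This is a sketch under assumptions, not the thesis implementation: the class layout and the two function names (write_lyrics, set_genre_tags) are hypothetical stand-ins for the eight core functions, which are not named here. The schemas follow the "tools" format of the Chat Completions API, and dispatch expects the function name and JSON argument string that the API returns with each tool call.

```python
import json


class SongwritingAssistant:
    """Holds song state and exposes functions the chat model may call."""

    def __init__(self):
        # State persists across turns so later edits build on earlier ones.
        self.state = {"lyrics": {}, "tags": []}

    def write_lyrics(self, section: str, text: str) -> dict:
        self.state["lyrics"][section] = text
        return {"ok": True, "section": section}

    def set_genre_tags(self, tags: list) -> dict:
        self.state["tags"] = list(tags)
        return {"ok": True, "tags": self.state["tags"]}

    def tool_schemas(self) -> list:
        """Function descriptions in the Chat Completions `tools` format."""
        return [
            {"type": "function", "function": {
                "name": "write_lyrics",
                "description": "Replace the lyrics of one song section.",
                "parameters": {
                    "type": "object",
                    "properties": {"section": {"type": "string"},
                                   "text": {"type": "string"}},
                    "required": ["section", "text"],
                },
            }},
            {"type": "function", "function": {
                "name": "set_genre_tags",
                "description": "Set the genre and instrument tags.",
                "parameters": {
                    "type": "object",
                    "properties": {"tags": {"type": "array",
                                            "items": {"type": "string"}}},
                    "required": ["tags"],
                },
            }},
        ]

    def dispatch(self, name: str, arguments: str) -> str:
        """Run one tool call; return a JSON string to send back to the model."""
        handlers = {"write_lyrics": self.write_lyrics,
                    "set_genre_tags": self.set_genre_tags}
        try:
            result = handlers[name](**json.loads(arguments))
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Report bad calls to the model instead of crashing the session.
            result = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        return json.dumps(result)
```

In a live session, tool_schemas() would be passed as the tools argument of the chat request, and each tool call in the response would be routed through dispatch, with its return value sent back to the model as a tool message.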

Results and Impact

Our human evaluations showed that SongGen significantly outperformed Suno.ai in:

Steerability: Users have precise control over the generation process
Expressiveness: The system better captures the user's artistic vision
Personalization: Songs reflect user preferences more accurately
User satisfaction: Higher engagement and creative fulfillment

Future Directions

The framework is designed to be extensible, with planned features including:

Voice-based interaction
Artist emulation for on-demand song generation
Enhanced personalization capabilities

SongGen is just one step towards bridging the gap between human creativity and AI capabilities.