SongGen: AI-Powered Song Co-Creation Framework
Aug 31, 2024
Live demo showing SongGen's interactive song creation process, featuring real-time genre changes, instrumental editing, and lyric refinement
SongGen is an AI-based songwriting and co-creation framework I developed as part of my MEng thesis at MIT. Unlike existing AI music tools, SongGen features a chat interface with a trained AI songwriter assistant that can make granular song edits and emulates the natural back-and-forth of human collaboration.
Full thesis here: SongGen: AI-Powered Song Co-Creation Framework
The Problem with Current AI Music Tools
While tools like Suno.ai have made song creation accessible to everyone, they often fall short in terms of user interaction and creative control. Users typically input a description and get a complete song, but there's no way to iteratively refine or guide the generation process. This creates a disconnect between the user's vision and the final output.
How SongGen Works
SongGen addresses these limitations through four main features:
- Interactive Chat Interface: Powered by GPT-4, the system engages users in natural conversation to understand their musical vision and preferences.
- Modular Architecture: The system combines:
  - An AI songwriter assistant for processing user input
  - A specialized lyrics generation model
  - Integration with Suno.ai's audio generation capabilities
  - A robust frontend for real-time interaction
- Granular Control: Users can:
  - Generate and edit lyrics section by section
  - Modify individual instruments and existing genres
  - Provide feedback and see immediate changes
  - Guide the overall direction of the song
- Advanced Audio Editing: The system integrates multiple specialized models:
  - Instrument separation models to isolate and modify individual tracks
  - Genre transformation models to adapt existing songs to different styles
  - Real-time audio processing for immediate feedback
  - Custom audio editing pipelines for precise control over musical elements
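To make section-by-section control concrete, here is a minimal Python sketch. It is not SongGen's actual code: the `Song` and `Section` classes, their fields, and the `edit_section` method are hypothetical names chosen for illustration. The point it shows is that an edit targets one section and leaves the rest of the song untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One labelled part of a song, e.g. a verse or chorus (hypothetical structure)."""
    name: str
    lyrics: str


@dataclass
class Song:
    """Hypothetical song state: a genre plus an ordered list of sections."""
    genre: str
    sections: list[Section] = field(default_factory=list)

    def edit_section(self, name: str, new_lyrics: str) -> None:
        """Replace the lyrics of a single section, leaving the others unchanged."""
        for section in self.sections:
            if section.name == name:
                section.lyrics = new_lyrics
                return
        raise KeyError(f"no section named {name!r}")

    def render(self) -> str:
        """Return the full lyric sheet with a header line per section."""
        return "\n\n".join(f"[{s.name}]\n{s.lyrics}" for s in self.sections)


song = Song(
    genre="indie folk",
    sections=[
        Section("Verse 1", "Morning light on an empty street"),
        Section("Chorus", "We keep on walking"),
    ],
)

# Only the chorus changes; the verse and the genre stay as they were.
song.edit_section("Chorus", "We keep on walking home")
print(song.render())
```

Keeping the song as structured state rather than one block of text is what would let a chat assistant apply a request like "rewrite just the chorus" without regenerating everything else.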
Technical Implementation
The system uses the following components:
- The OpenAI Chat Completions API handles user interaction and coordinates different components
- A custom songwriting assistant class manages eight core functions for lyrics and audio generation
- The system maintains state throughout the conversation to ensure consistency
- Robust error handling ensures reliable operation even with complex requests
- External audio models are integrated through a modular API architecture
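The coordination pattern in the list above can be sketched as follows. This is an illustrative sketch under assumptions, not the thesis implementation: the class name `SongwriterAssistant`, the two functions shown (`write_lyrics`, `change_genre`), and the state layout are all hypothetical, and the eight actual functions are not reproduced here. The only detail borrowed from the Chat Completions API is that a tool call arrives as a function name plus a JSON-encoded argument string. The model call itself is replaced by hand-written tool calls so the example runs offline.

```python
import json


class SongwriterAssistant:
    """Hypothetical assistant that routes tool calls to song-editing functions."""

    def __init__(self) -> None:
        # Conversation-wide state, kept so later edits build on earlier ones.
        self.state = {"genre": None, "lyrics": {}}
        # Registry mapping tool names to bound methods (only two shown here).
        self.functions = {
            "write_lyrics": self.write_lyrics,
            "change_genre": self.change_genre,
        }

    def write_lyrics(self, section: str, text: str) -> str:
        self.state["lyrics"][section] = text
        return f"Updated lyrics for {section}."

    def change_genre(self, genre: str) -> str:
        self.state["genre"] = genre
        return f"Genre set to {genre}."

    def handle_tool_call(self, name: str, arguments: str) -> str:
        """Run one tool call; return an error message instead of raising."""
        func = self.functions.get(name)
        if func is None:
            return f"Error: unknown function {name!r}."
        try:
            kwargs = json.loads(arguments)
            return func(**kwargs)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or wrong argument names: report and keep state intact.
            return f"Error: could not run {name}: {exc}"


assistant = SongwriterAssistant()

# Hand-written stand-ins for tool calls a chat model might emit.
calls = [
    ("change_genre", '{"genre": "synth-pop"}'),
    ("write_lyrics", '{"section": "chorus", "text": "Neon hearts tonight"}'),
    ("add_solo", '{"instrument": "sax"}'),   # not registered
    ("change_genre", '{"style": "jazz"}'),   # wrong argument name
]
replies = [assistant.handle_tool_call(name, args) for name, args in calls]
for reply in replies:
    print(reply)
```

In a full integration, each returned string would typically be sent back to the model as the tool result, so a failed call becomes something the assistant can explain to the user rather than a crash.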
Results and Impact
Our human evaluations showed that SongGen significantly outperformed Suno.ai in:
- Steerability: Users have precise control over the generation process
- Expressiveness: The system better captures the user's artistic vision
- Personalization: Songs reflect user preferences more accurately
- User satisfaction: Higher engagement and creative fulfillment
Future Directions
The framework is designed to be extensible, with planned features including:
- Voice-based interaction
- Real-time voice conversion
- Artist emulation for on-demand song generation
- Enhanced personalization capabilities
SongGen is just one step towards bridging the gap between human creativity and AI capabilities.