Hey Dad, Can We Get Suno? No, We Have a 70B Music Gen Model At Home (This Is an AI Submission)
By FrogCity on April 12, 2026 6:04 pm
Not my usual slop, artificial slop in this case.
I am aware that uploading AI slop is against the rules, but in this case I thought it would be a fun reflection of this week, since I spent a lot of time working on the system that made this. It was mostly an exercise to get up to speed on systems that leverage multiple pretrained models and tokens all blending together. As for the spirit of the rules on human creativity, I think I met it, since this was not a 10 second "make a metal song" followed by swiftly uploading the output. It was much debugging, trial and error, learnings, false starts, AND THEN IT was a "make a metal song" which I swiftly uploaded.
My motivation is not "this that this that" -> produced music -> "oh more of this actually" -> produced music. It's more a system of rich intermediate representations that can be converted to human intermediate representations such as transcription, MIDI, chord progressions, lyrics, genres, style, etc. Something I can grow from as a human musician. But this was a necessary first step to better understand what can be leveraged.
Also, one thing that shocked me: I always assumed music gen models were similar to language models, in that there's a pretraining phase followed by Q&A RL, distribution loss, where there's a subjective "many answers are right" way of steering the model. So I thought music gen would have many learnings different from my speech enhancement background, which is very much a "one right answer" space for what the model should do. Much to my surprise, SOTA in music gen is basically just the pretraining portion, where you fit the model with the "one right answer" methodology and just use things like top-n and temperature to get different results out of the distribution at inference time. (This assumes autoregressive, but diffusion has a similar concept; it just happens in the conditioning part.) (Like most things, flow matching being one recent example, there's no reason that can't be brought into the music gen domain. Probably Suno has enough thumbs up and down feedback to already be doing it.)
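For anyone curious what the top-n and temperature knobs look like on an autoregressive model, here's a toy sketch in plain Python. It is not the code from the system that made this track; the function name `sample_next_token`, its parameters, and the example logits are all made up for illustration. The idea is just: keep the n highest-scoring tokens, rescale by temperature, softmax, and draw one.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_n=None, rng=random):
    """Sample one token index from raw logits (hypothetical helper, illustration only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Rank token indices from highest to lowest logit.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # Top-n filtering: keep only the n most likely tokens (all tokens if top_n is None).
    kept = ranked[:top_n] if top_n else ranked
    # Temperature rescales logits: below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in kept]
    # Numerically stable softmax weights over the kept tokens.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(kept, weights=weights, k=1)[0]


# Made-up scores for four audio tokens.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
print(sample_next_token(logits, temperature=0.8, top_n=2, rng=rng))
```

Same trained distribution every time; the only thing that changes between "takes" is how you draw from it.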
Once again I don't feel too bad about "not doing a decent musical wb this week" because I hit some other music progression. Played two live songs and had a great time. We crushed!
For one of the songs I played bass live for the first time, which was amazing and fun. So fucking funny to listen back to it: in the sections where "the band sounds pretty shitty", really it's just that the bass is off and missed the first note of the chord progression. And in the sections where "damn this band has a good sound", the bass was locked in. Much different from an instrument like vocals or guitar, where if you are dragging the band down, the finger is squarely pointed at you hahaha.
Also pulled off a full bandwidth scream of at least a good 8 seconds, 100% distortion, zero pain, great technique. It was the scream entering into the keyboard solo. Have made a lot of progress in vocals, especially distortion. Things are starting to come together in terms of control, consistency, comfort, and not losing the coordination under pressure, timing alignment to the band, etc. Tone is also really really improved. That one instance sounded quite good imo and got a number of comments and questions such as "omg that sounded amazing but are you killing your voice?" and "ah I really want to do that, can you teach me?", and I made some new vocalist friends to progress on our distortions together.
I'd love to meet some vocal friends here too and collaborate. One of my goals is to collaborate this year, and I also only plan on doing 26 weeks this year (may go more but that's my goal) so the time is half up already.
Now to work on sidechaining in latent token space!
Audio works licensed by author under:
CC0 Creative Commons Zero (Public Domain)
