Stability AI unveils a new audio model capable of generating 6-minute songs

Last Updated May 21, 2026

Written by Lucas Ropek

Stability AI has unveiled a powerful new audio model capable of generating full-length songs of more than six minutes from just a text prompt, marking a significant leap forward in AI-assisted music creation. The launch positions the company, best known for its image generator Stable Diffusion, as a serious contender in the fast‑moving world of AI music tools.

A new era of AI‑composed songs

The new release, grouped under the “Stable Audio 3.0” banner, is designed to produce what Stability describes as “full tracks with complex, dynamic musical structure up to six minutes in length.” This goes far beyond the short loops and clips typical of earlier music models, allowing users to create full songs complete with intros, builds and outros.

Stability AI stresses that these models are trained on fully licensed material and are intended to be safe for commercial use, a key point as AI audio tools face increasing scrutiny from artists, labels and regulators. The company presents Stable Audio 3.0 as “a model family trained on fully licensed data, designed to be the foundation for what the audio community builds next.”

Four models for different use cases

Under the Stable Audio 3.0 umbrella, Stability AI is rolling out four distinct models tailored to different hardware and creative needs. At the compact end are a small sound‑effects model and a small music model, each with 459 million parameters, designed to run directly on consumer‑grade devices such as phones and laptops. These versions can generate shorter music and SFX clips locally, making them attractive for mobile creators, indie developers and game studios.

For more ambitious projects, the company is offering medium and large variants with 1.4 billion and 2.7 billion parameters respectively. These more powerful models can generate songs lasting up to about six minutes and 20 seconds while maintaining musical coherence and melodic development across the entire track. They are aimed at creators who want long‑form compositions with richer arrangements and more nuanced sonic textures.
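To give a rough sense of why the 459‑million‑parameter models can plausibly fit on a phone or laptop while the larger ones are pitched at heavier hardware, the parameter counts above can be turned into approximate weight sizes. The sketch below is illustrative arithmetic only: the article does not say how the weights are stored, so the bytes‑per‑parameter figures (and the helper name `weight_gigabytes`) are assumptions, and real memory use during generation would be higher than the raw weights alone.

```python
# Parameter counts as reported in the article. The small music model and the
# small sound-effects model share the same size.
PARAMS = {
    "small": 459_000_000,
    "medium": 1_400_000_000,
    "large": 2_700_000_000,
}

# Bytes per parameter for common storage precisions. This is an assumption:
# the article does not state the precision of the released weights.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gigabytes(num_params: int, precision: str) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, count in PARAMS.items():
        sizes = ", ".join(
            f"{p}: {weight_gigabytes(count, p):.2f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name:<6} {sizes}")
```

Under the 16‑bit assumption, the small models come to a little under 1 GB of weights, while the large model comes to roughly 5.4 GB before any working memory is counted.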

From open weights to enterprise‑grade access

One of the most notable aspects of the announcement is the way Stability AI is choosing to distribute these models. The small and medium models are being released as open weights, meaning developers and researchers can download, inspect and fine‑tune them for their own applications. The company frames this move as an “open invitation to experiment with generative audio,” encouraging the broader community to “see what’s under the hood” and build their own tools on top of the technology.

The largest, most capable model will not be freely downloadable. Instead, it will be accessible via paid API and self‑hosted enterprise deployments. Stability is targeting businesses that need to generate audio at scale, promising high performance and lower latency for platforms that may want to create large volumes of music or sound design on demand. Companies above a certain revenue threshold will need a dedicated enterprise license, underscoring that this tier is aimed squarely at commercial customers.

“Generation is the beginning, not the end”

In its messaging, Stability AI goes out of its way to stress that Stable Audio 3.0 is a professional‑grade creative toolkit rather than a novelty demo. On its product site, the company organizes its promise to users around three pillars: “Built open,” “Built to customize,” and “Built to own.” The emphasis is on control, flexibility and legal clarity for anyone integrating the models into their workflows.

“Generation is the beginning of the process, not the end,” the company says, describing how creators will be able to modify specific sections within an AI‑generated track, extend or shorten passages, and effectively treat the output as raw material for further production work. Stability also highlights what it calls “artist‑first controllability,” saying the system is designed to follow prompts closely so producers can specify genre, mood and instrumentation and receive outputs that align with their creative intent.

From mobile sketches to studio‑ready tracks

By offering multiple model sizes, Stability AI is clearly aiming to cover the entire spectrum of music‑making, from rough sketches to polished, release‑ready songs. The smaller models, which can run on consumer devices, are ideal for quick idea generation: a beat for a social video, a looping ambient bed for a game level, or a short jingle for a product demo.

The medium and large models move into more demanding territory, where a track needs to hold together over several minutes and support layered instrumentation, transitions and evolving arrangements. Stability presents the medium model as a fully open, full‑song engine suitable for independent studios that want to host and customize it on their own infrastructure. The large model, accessed through managed services, is pitched as the flagship choice for platforms and enterprises that require reliability, scale and commercial support.

Launching into a contentious AI music landscape

Stable Audio 3.0 is arriving at a time when AI music technology is advancing rapidly, but also attracting lawsuits and intense debate. Several high‑profile AI music startups have been challenged over how they sourced training data and whether their models unlawfully imitate copyrighted recordings. Those disputes have put a spotlight on licensing practices across the entire sector.

Stability AI is attempting to set itself apart by foregrounding the provenance of its training data. The company says the new models are built on “fully licensed datasets” and points to ongoing work with major labels and music companies as it develops new tools. That positioning is designed to reassure artists, rights holders and corporate clients that they can experiment with AI‑generated music without stepping straight into legal gray zones.

New leadership and a push toward professionals

The launch also aligns with a broader internal shift at Stability AI toward professional audio and the music business. The company has brought in experienced industry figures to help shape its offering for working musicians, producers and studios, signaling that it wants Stable Audio 3.0 to sit inside real production workflows rather than exist solely as a tech showcase.

This mirrors a wider trend in the AI music space, where tools that began as experimental web apps are now being recast as serious platforms for labels, publishers and streaming services. By combining open models for the developer community with an enterprise‑grade flagship system, Stability AI is trying to straddle both worlds at once.

Creative potential and open questions

For creators, the promise of typing in a description like “a 1980s summertime acid jazz track with vintage synths and funky psychedelic leads” and receiving a fully formed, six‑minute song is both exciting and unsettling. Stable Audio 3.0 can generate everything from club‑ready drum and bass and festival‑scale techno to lo‑fi beats and cinematic soundscapes, giving artists and producers an expansive palette to prototype ideas or even finish tracks.

But as AI systems cross the threshold from short clips to entire compositions, familiar questions become more urgent: who owns the resulting music, how should royalties be handled, and what does this mean for human musicians trying to earn a living? Stable Audio 3.0 does not answer those questions on its own, but its capabilities ensure they will be increasingly difficult for the music industry to ignore.
