Alibaba’s Qwen research group has released Qwen-Image-Edit, an open-source model designed to carry out image edits through text prompts. The system builds on Qwen-Image, a 20-billion-parameter foundation model introduced earlier in the month, and extends its strengths in text rendering to a wider range of editing tasks.
Dual Encoding Design
At the core of the system is a dual encoding pipeline. One branch, the Qwen2.5-VL vision-language model, interprets the meaning of a scene; the other, a variational autoencoder, preserves its visual detail. This arrangement gives the model two levels of control: semantic edits can restructure or restyle an image, while appearance edits make local, precise changes.
The Qwen team describes semantic edits as higher-level modifications. A portrait can be reimagined in a Studio Ghibli style. A street scene can be reskinned to resemble a Lego model. Objects can be rotated to show angles not visible in the original, including full 180-degree views. These are broad transformations that shift the scene but keep its identity intact.
Appearance edits, on the other hand, address smaller details. A strand of hair can be erased, a single letter recolored, or a signboard added with a reflection generated in the water beside it. These edits leave most of the original unchanged, touching only the requested regions.
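The distinction between the two control levels can be sketched in code. The snippet below is a hypothetical illustration, not the model's actual API: it models an edit request where a global instruction behaves like a semantic edit and a region-scoped instruction behaves like an appearance edit. All names (`EditRequest`, `level`) are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class EditRequest:
    """Hypothetical edit request distinguishing the two control levels."""
    prompt: str  # natural-language instruction
    # Optional bounding box (x0, y0, x1, y1); None means the whole image.
    region: Optional[Tuple[int, int, int, int]] = None

    @property
    def level(self) -> str:
        # Region-scoped requests act like appearance edits, which touch
        # only the requested area; unscoped requests act like semantic
        # edits, which may transform the whole scene.
        return "appearance" if self.region is not None else "semantic"

# A global restyle versus a precise local touch-up.
restyle = EditRequest(prompt="render the portrait in a Studio Ghibli style")
touch_up = EditRequest(prompt="erase the stray strand of hair",
                       region=(120, 40, 180, 90))
```

In this framing, the same interface serves both kinds of request; only the presence of a region changes how much of the image is open to modification.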
Text Editing Features
Another significant part of the model is its ability to work with text in images. It can add, remove, or correct writing in both English and Chinese. The edits preserve font, size, and layout. This feature has been used on posters, signage, and calligraphy.
In one demonstration, errors in a generated calligraphy piece were corrected step by step: the model adjusted individual characters through bounding-box instructions until the final version matched the intended classical form. This chained editing approach gives users fine control in cases where accuracy cannot be compromised.
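The chained, region-scoped correction process can be illustrated with a minimal sketch. This is not the model's interface; it simply shows the idea of applying a sequence of bounded instructions, each replacing one region while leaving everything else untouched, using a character grid as a stand-in for the image.

```python
def apply_edits(canvas, edits):
    """Apply chained, region-scoped corrections to a character grid.

    canvas: list of strings, one per row.
    edits:  list of (row, col, replacement) instructions, applied in order;
            each overwrites only the characters it targets.
    """
    rows = [list(r) for r in canvas]
    for row, col, replacement in edits:
        rows[row][col:col + len(replacement)] = list(replacement)
    return ["".join(r) for r in rows]

# One chained correction fixing a single wrong character in the title
# of the classical poem "Quiet Night Thoughts" (静夜思).
draft = ["静夜恩", "床前明月光"]
final = apply_edits(draft, [(0, 2, "思")])
```

Each instruction in the chain is independent and local, which mirrors why the method suits accuracy-critical work: every step can be inspected before the next is applied.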
Range of Uses
The demonstrations published so far cover both creative and practical applications. In one case, Qwen-Image-Edit was used to refine a wedding photograph, adding graffiti to an archway for one version and removing it for another. In another, the system generated a series of MBTI-themed emoji packs based on Qwen’s capybara mascot. A different example showed how the model could reskin a Manhattan cityscape to resemble a miniature Lego set.
Potential uses extend from advertising and design to casual personal edits. Designers could adjust logos or signage, while individuals could change backgrounds, modify clothing, or clean up portraits. The team has also pointed to cultural preservation, where the model has been applied to correcting classical Chinese calligraphy for archiving purposes.
Benchmarks and Performance
The developers report that Qwen-Image-Edit delivers state-of-the-art performance across public benchmarks. Specific scores for editing tasks have not been released, but Qwen-Image itself has ranked among the strongest systems for image generation and text rendering in independent evaluations such as AI Arena, where human raters often favored its outputs over those of competing models.
Access and Licensing
The model is available under an Apache 2.0 license. Developers can download and run it locally, deploy it on cloud infrastructure, or integrate it into applications. Access is also possible through Qwen Chat, Hugging Face, ModelScope, and GitHub.
For enterprises, Alibaba Cloud provides an API through its Model Studio platform. The service is priced at $0.045 per image after an initial quota of 100 free images, valid for 180 days. The current deployment is in the Singapore region, with a limit of five requests per second and two concurrent tasks per account. Supported image resolutions range from 512 to 4,096 pixels, with file sizes up to 10 MB. Outputs are stored temporarily on Alibaba Cloud Object Storage for download.
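For budgeting, the quoted pricing and limits translate into a simple back-of-the-envelope helper. The sketch below assumes the 512–4,096 pixel range applies to each image dimension; the function names are illustrative, not part of any Alibaba Cloud SDK.

```python
FREE_QUOTA = 100          # free images, valid for 180 days
PRICE_PER_IMAGE = 0.045   # USD per image beyond the quota (Singapore region)

def estimated_cost(images: int) -> float:
    """Estimated API cost in USD for a given number of edited images."""
    billable = max(0, images - FREE_QUOTA)
    return round(billable * PRICE_PER_IMAGE, 2)

def within_limits(width: int, height: int, size_mb: float) -> bool:
    """Check an input against the documented resolution and file-size caps."""
    return all(512 <= d <= 4096 for d in (width, height)) and size_mb <= 10
```

For example, a batch of 1,100 images would exhaust the free quota and bill the remaining 1,000 at $0.045 each, for $45.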
Industry Context
Qwen-Image-Edit reflects a broader shift in generative AI tools. Early systems focused on single-purpose generation; newer releases, including this one, combine creation with correction, making them more practical in production settings.
By offering fine-grained editing, bilingual text support, and open licensing, the system lowers barriers for professional users while also remaining approachable for casual experimentation. Whether for advertising design, cultural preservation, or individual photo edits, the tool adds another option in a growing field of AI-driven image software.
Notes: This post was created using GenAI tools.