- Published on
MiniMax-M3 debuts with major benchmark performance at just 5 to 10% of the cost of GPT-5.5 and Gemini 3.1 Pro
- Authors

- Name
- aimode.news
- @aimode_news
Big news in enterprise AI arrived over the weekend, when Chinese AI startup MiniMax released its highly anticipated M3 large language model on Sunday night. M3 combines frontier-level coding and agentic performance with a 1-million-token context window and native multimodality at a fraction of the cost of leading proprietary models, with prices under its new monthly Token Plans starting at just US$20.

Company leadership also announced plans to release the model under an open-source license within the next ten days, including open weights, which would allow enterprises to download the full model and customize it freely. For now it is available through the MiniMax API at a promotional price of $0.30 per million input tokens and $1.20 per million output tokens (new cache) for the next week. At that price it not only offers better value for money than proprietary US giants such as Google, OpenAI and Anthropic, but also exceeds the performance of the latest models from the first two on selected benchmarks. Even at its full price of $0.60/$2.40 per million input/output tokens, MiniMax-M3 costs only 8–20% as much as leading proprietary US models.

The conventional trade-off governing large language model development has long dictated a rigid choice: software developers can either access top-tier closed-source intelligence behind restrictive APIs, or use flexible, cost-effective open models that get stuck on multi-step reasoning, dense coding tasks and very long data sequences. MiniMax-M3 turns this paradigm on its head. By unifying these two historically separate frontier capabilities, M3 brings a level of overall utility that was previously confined to expensive, closed ecosystems. That effectively shifts the baseline for open systems while drastically reducing the compute needed to run complex development loops.
VentureBeat frontier AI model API pricing overview
Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Total cost |
MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 |
deepseek-v4-flash | $0.14 | $0.28 | $0.42 |
deepseek-v4-pro | $0.435 | $0.87 | $1.305 |
MiniMax-M3 | $0.30 | $1.20 | $1.50 (limited time only) |
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 |
MiMo-V2.5 | $0.40 | $2.00 | $2.40 |
Grok 4.3 low context | $1.25 | $2.50 | $3.75 |
GLM-5 | $1.00 | $3.20 | $4.20 |
Kimi-K2.6 | $0.95 | $4.00 | $4.95 |
GLM-5.1 | $1.40 | $4.40 | $5.80 |
Grok 4.3 high context | $2.50 | $5.00 | $7.50 |
Qwen3.7-Max | $2.50 | $7.50 | $10.00 |
Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 |
Gemini 3.1 Pro Preview ≤200K | $2.00 | $12.00 | $14.00 |
GPT-5.4 | $2.50 | $15.00 | $17.50 |
Gemini 3.1 Pro Preview >200K | $4.00 | $18.00 | $22.00 |
Claude Opus 4.8 | $5.00 | $25.00 | $30.00 |
GPT-5.5 | $5.00 | $30.00 | $35.00 |
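The cost comparisons quoted in this article can be sanity-checked with a few lines of arithmetic. The sketch below copies a handful of per-million-token prices from the table and, like the table's "Total cost" column, assumes a workload of one million input tokens plus one million output tokens. The function names `total_cost` and `cost_share` are illustrative and not part of any vendor tooling.

```python
# Per-million-token API prices in USD (input, output), copied from the table above.
PRICES = {
    "MiniMax-M3 (promo)": (0.30, 1.20),
    "MiniMax-M3 (full)": (0.60, 2.40),
    "DeepSeek-V4 Pro": (0.435, 0.87),
    "Gemini 3.1 Pro Preview <=200K": (2.00, 12.00),
    "GPT-5.5": (5.00, 30.00),
}


def total_cost(model, input_millions=1.0, output_millions=1.0):
    """Cost in USD for a workload measured in millions of tokens."""
    input_price, output_price = PRICES[model]
    return input_price * input_millions + output_price * output_millions


def cost_share(model, baseline, input_millions=1.0, output_millions=1.0):
    """Cost of `model` as a percentage of the cost of `baseline`."""
    return 100.0 * (
        total_cost(model, input_millions, output_millions)
        / total_cost(baseline, input_millions, output_millions)
    )


for m3 in ("MiniMax-M3 (promo)", "MiniMax-M3 (full)"):
    for baseline in ("Gemini 3.1 Pro Preview <=200K", "GPT-5.5"):
        print(f"{m3} vs {baseline}: {cost_share(m3, baseline):.1f}% of the cost")

# Price gap between promotional M3 and DeepSeek-V4 Pro on the same workload.
promo_gap = total_cost("MiniMax-M3 (promo)") - total_cost("DeepSeek-V4 Pro")
print(f"Promo M3 minus DeepSeek-V4 Pro: ${promo_gap:.3f}")
```

Under the 1:1 input/output assumption, full-price M3 comes out at roughly 8.6% of GPT-5.5's cost and 21.4% of Gemini 3.1 Pro's, in line with the article's "8–20%" range. A workload with a different input/output mix would shift these shares, which is why both quantities are parameters.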
New MiniMax Sparse Attention (MSA) technology helps keep the model's cost low
The core of the model's efficiency lies in an architectural departure from classical Transformer networks. Standard attention mechanisms scale quadratically ($O(N^2)$), which means that compute and cost explode as the text sequence grows longer. To combat this "inherent flaw", the engineering team built MiniMax Sparse Attention (MSA), a clean, scalable sparse-attention scheme.

To illustrate the innovation, picture traditional full attention as an editor who rereads an entire library from scratch every time a single sentence needs checking. MSA acts as a smart indexing clerk, using a pre-filtering stage to partition the key-value (KV) matrices into high-precision blocks. At the operator level, MSA uses a "KV Outer Gather Q" approach: the system treats KV blocks as the outer loop and dynamically gathers only the specific queries that hit them. Because each data block is read exactly once and memory access stays strictly contiguous, hardware utilization rises sharply. In internal tests, MSA runs more than four times faster than alternative open-source solutions such as Flash-Sparse-Attention or Flash-MoBA. At the maximum context length of 1 million tokens, M3's per-token compute requirement drops to just 1/20 of the previous-generation model, which yields a 9-fold speedup in the prefill phase and a 15-fold speedup during decoding.

Instead of taking a pre-trained text network and bolting a separate vision model onto it, MiniMax developed M3 as a natively multimodal system from "step zero". The company rebuilt its data pipeline to mix naturally interleaved sequences of text, images and visual components, scaling the overall pre-training corpus to more than 100 trillion tokens. This deep data alignment lets the model translate complex visual geometry, such as programming diagrams or coordinate maps, into structured code without losing contextual fidelity. M3 validates this technical path on standardized evaluations.
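The "KV Outer Gather Q" idea described for MSA above can be illustrated with a small pure-Python sketch. This is not MiniMax's kernel: the pre-filter used here (ranking blocks by the dot product between a query and each block's mean key), the function names and the toy sizes are all assumptions made for illustration. What the sketch does show is the loop order: KV blocks form the outer loop, each block is read once, and only the queries that selected that block are gathered and updated with a running (online) softmax.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(queries, keys, block_size, top_k):
    """Assumed pre-filter: score each KV block by its mean key, keep top_k per query."""
    n_blocks = len(keys) // block_size
    dim = len(keys[0])
    block_means = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        block_means.append([sum(row[j] for row in block) / block_size for j in range(dim)])
    chosen = []
    for q in queries:
        ranked = sorted(range(n_blocks), key=lambda b: -dot(q, block_means[b]))
        chosen.append(set(ranked[:top_k]))
    return chosen


def kv_outer_sparse_attention(queries, keys, values, block_size, top_k):
    """Block-sparse attention with KV blocks as the outer loop."""
    chosen = select_blocks(queries, keys, block_size, top_k)
    n_blocks = len(keys) // block_size
    scale = 1.0 / math.sqrt(len(keys[0]))
    run_max = [-math.inf] * len(queries)   # running softmax maximum per query
    denom = [0.0] * len(queries)           # running softmax denominator per query
    acc = [[0.0] * len(values[0]) for _ in queries]
    for b in range(n_blocks):              # each KV block is sliced (read) exactly once
        kb = keys[b * block_size:(b + 1) * block_size]
        vb = values[b * block_size:(b + 1) * block_size]
        gathered = [i for i, blocks in enumerate(chosen) if b in blocks]
        for i in gathered:                 # only queries that selected block b
            scores = [dot(queries[i], k) * scale for k in kb]
            new_max = max(run_max[i], max(scores))
            rescale = math.exp(run_max[i] - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            denom[i] = denom[i] * rescale + sum(weights)
            acc[i] = [a * rescale + sum(w * v[j] for w, v in zip(weights, vb))
                      for j, a in enumerate(acc[i])]
            run_max[i] = new_max
    return [[a / denom[i] for a in acc[i]] for i in range(len(queries))]


def dense_masked_reference(queries, keys, values, block_size, top_k):
    """Reference: ordinary softmax attention restricted to the same selected blocks."""
    chosen = select_blocks(queries, keys, block_size, top_k)
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for i, q in enumerate(queries):
        idx = [t for t in range(len(keys)) if t // block_size in chosen[i]]
        scores = [dot(q, keys[t]) * scale for t in idx]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * values[t][j] for w, t in zip(weights, idx)) / total
                    for j in range(len(values[0]))])
    return out


rng = random.Random(0)
Q = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
K = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]
V = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]

sparse_out = kv_outer_sparse_attention(Q, K, V, block_size=4, top_k=2)
dense_out = dense_masked_reference(Q, K, V, block_size=4, top_k=2)
max_diff = max(abs(a - b) for ra, rb in zip(sparse_out, dense_out) for a, b in zip(ra, rb))
print("largest difference vs. dense masked attention:", max_diff)
```

With `top_k` blocks of `block_size` keys per query, each query touches `top_k * block_size` keys instead of all `N`, which is where the saving over quadratic attention comes from. The 1/20 compute figure and the speedups quoted in the article are MiniMax's own numbers and are not reproduced by this toy example.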
The model scores 59.0% on SWE-Bench Pro, a benchmark for autonomous agents, which puts it ahead of closed models such as GPT-5.5 and Gemini 3.1 Pro. It reaches 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas and 83.5 on BrowseComp, beating Claude Opus 4.7's score of 79.3 for autonomous browsing and information retrieval.

Compared with Claude Opus 4.8, the premium frontier model Anthropic released last week, the competitive limits of M3's efficient sparse-attention design become clear on directly comparable, tool-heavy agent benchmarks. For pure code modification on SWE-Bench Pro, M3's 59.0% falls short of Opus 4.8's peak of 69.2%. A similar performance delta shows up in automated system environments on Terminal-Bench 2.1: M3's terminal execution score of 66.0% effectively matches the previous-generation Opus 4.7 baseline of 66.1%, but trails the updated Opus 4.8, which reaches 74.6%. Evaluations of sustained GUI interaction in the OSWorld-Verified sandbox likewise put M3's automated computer use at 70.0%, compared with the higher 84.4% achieved by Opus 4.8.

These standardized evaluations illustrate the structural trade-offs that currently define the ecosystem: closed-source systems such as Opus 4.8 retain an absolute margin on highly complex reasoning tasks, while M3 provides a high-performance foundation for local, automated tier-1 operations without the added premium of closed API subscription fees. Set against the strong results of the recently released open-weight model DeepSeek-V4 Pro Max, M3 holds its own across all core agentic categories while keeping a narrow lead in specialized code synthesis.
On the software engineering tasks of SWE-Bench Pro, M3's resolution rate of 59.0% is clearly above DeepSeek-V4 Pro Max's 55.4%. In command-line environments, however, the competition tightens: on Terminal-Bench, DeepSeek-V4 Pro Max edges past M3's 66.0% with an accuracy of 67.9%. In web orchestration and open-world browsing simulations, the two architectures are practically at statistical parity, with M3 at 83.5% on BrowseComp versus 83.4% for DeepSeek. On the MCP Atlas tool-use benchmark, M3's 74.2% also gives it a narrow lead over DeepSeek's 73.6%. This close match shows that while DeepSeek, with its dedicated high-effort reasoning modes, carries a massive footprint of 1.6 trillion total parameters, MiniMax's block-filtered sparse attention mechanism delivers directly competitive execution efficiency without requiring extensive scaling of activated parameters.

MiniMax Code AI agent offers agentic team features
MiniMax turns these architectural advantages into immediate utility through an updated product suite split into standalone applications, customizable subscription tiers and bare developer infrastructure. For end-user orchestration, the flagship is MiniMax Code, an AI agent product designed to make the most of M3's multi-step capabilities. MiniMax Code runs on the web or as a native desktop app and operates an "agentic team" that can break large technical tasks into multi-stage, concurrent workflows.

The system is built on an adversarial "Producer + Verifier" harness loop. While one agent instance generates code, a secondary verifier instance aggressively tests and reviews it, so that the system can correct itself and work autonomously for days without human supervision. Thanks to its native visual foundation, MiniMax Code supports direct computer use. A developer can issue a cross-application voice request from a phone, prompting the model to open a local enterprise ERP client and batch-fill data tables directly from an open Excel spreadsheet.

For custom setups, developers can integrate M3 directly into existing workflows using an API key (sk-cp) that is compatible with popular alternative IDE environments such as Claude Code, Cursor, Roo Code and Cline. The API introduces a switchable "thinking mode". When it is enabled, M3 devotes compute to deep reasoning and long-horizon planning. When it is disabled, the model runs with minimal latency for fast text completion.

The accompanying Token Plan follows an aggressive pricing strategy built on shared multimodal quotas. Three options are available with annual billing:
- Plus ($20/month): Provides approximately 1.7 billion tokens per month and supports 3–4 concurrent agents.
- Max ($50/month): Provides approximately 5.1 billion tokens per month, supports 4–5 concurrent agents and adds 3 automated video clips per day via Hailuo 2.3.
- Ultra ($120/month): Provides approximately 9.8 billion tokens per month, supports 6–7 concurrent agents and expands video capacity to 5 clips per day.

Open weights make M3 far more attractive for enterprise use
MiniMax's promise to release M3 under an open-weights licensing model, with weights and technical documentation on Hugging Face and GitHub, carries considerable strategic significance for enterprise infrastructure leaders. However, the company has yet to specify exactly which license the weights will be released under and whether it will permit commercial use, e.g. MIT, Apache 2.0 or the new OpenMDW license. If it does, the calculus looks like this:
Feature / model attribute | Closed API providers (e.g. GPT-5.5, Opus 4.7) | Open-weights frontier (MiniMax M3) |
Data privacy and boundaries | Requires external API requests; potential data-harvesting vectors. | Complete local isolation; runs entirely within private user clusters. |
Custom optimization | Limited to basic fine-tuning wrappers or prompt engineering. | Full pipeline control; the architecture allows comprehensive adapter and weight customization. |
Cost predictability | Tied to open-ended per-token API pricing. | Compute requirement reduced to 1/20; mitigates hardware overhead. |
By shipping the underlying model weights directly to the community, MiniMax departs from the closed approach favored by the large American AI labs. For enterprise users subject to strict compliance and data protection rules, open weights mean they can run M3 locally on internal hardware. This setup eliminates the data-leak risk associated with public APIs. In addition, engineering teams can run custom fine-tuning, modify internal architectures, or embed specific system requirements deep into the model itself, turning a standard system into a highly targeted proprietary asset.

Initial community reactions are consistently positive
The developer ecosystem responded immediately to M3's operational benchmarks, highlighting its long-running autonomous behavior and its cost-performance profile. One focal point of the discussion is a 12-hour automated verification test in which M3 was tasked with reproducing a winner of the ICLR 2025 Outstanding Paper Award, titled "Learning Dynamics of LLM Finetuning". As MiniMax's own researcher @MikaStars39 highlighted on X:
“M3 ran autonomously for almost 12 hours, independently produced 18 commits and 23 experimental figures, and got the core experiments running:
it matched the predicted probability trends in the SFT stage
clearly observed the central squeezing effect in the DPO experiments
validated the extend mitigation method proposed in the original paper.”
At the same time, makers of developer tools stressed the practical economic advantages of the new model's attention mechanism. The official team behind the agentic AI coding tool Cline posted a notice confirming day-one compatibility, stating:
“The new MiniMax-M3 is their first model with 1-million-token context, multimodality and agentic coding capability. Congratulations to @MiniMax AI on the breakthrough in sparse-attention architecture, which cuts compute and cost to one twentieth of the previous generation.”
This sharp drop in execution costs changes how developers view the relationship between financial investment and performance. The technical commentator @jumperz described this disruption, pointing out that M3 breaks a historical pattern in machine learning pricing.
By removing context-scaling limits through fundamental optimizations at the attention layer rather than brute-force hardware scaling, MiniMax has created a highly efficient open-source baseline. M3 shows that the next stage of agent development will be driven not only by larger datasets, but also by efficient architectural decisions that make frontier-level capability accessible to the wider open-source community.

For companies building autonomous software development or agent infrastructure, MiniMax M3 offers outstanding value for money. While DeepSeek-V4 Pro has a microscopic price advantage of US$0.195 per million tokens, MiniMax M3 justifies its small premium with superior autonomous software engineering resolution rates (59.0% on SWE-Bench Pro).

More importantly, the calculus extends well beyond the API price sheet, because M3 is an open-weights model. By deploying the M3 weights locally in a private enterprise cloud, companies can bypass cloud data egress tracking entirely, eliminate structural vendor lock-in and implement custom prefix-caching schemes on internal hardware. This technical approach turns a highly efficient runtime budget into a sustainable, private corporate asset.
