TL;DR:
- Alibaba’s Qwen3-Max-Thinking achieved perfect scores on the AIME and HMMT, reportedly the first flawless AI math performance by a Chinese-developed model.
- OpenAI’s GPT-5 Pro also self-reported perfect results, setting up a new East–West rivalry in reasoning AI.
- Verification concerns linger, as Alibaba’s results lack third-party validation or evidence of closed-book testing.
- API access opens doors for developers and investors, with potential cost-performance advantages across Asia-Pacific markets.
Alibaba’s artificial intelligence division has unveiled Qwen3-Max-Thinking, an advanced reasoning model that stunned observers by scoring a perfect 100% in two of the world’s toughest mathematics competitions, the American Invitational Mathematics Examination (AIME) and the Harvard-MIT Mathematics Tournament (HMMT).
This marks a significant milestone for China’s AI industry. It is reportedly the first time a Chinese-developed model has matched or exceeded leading Western models on reasoning-heavy academic benchmarks.
The announcement places Alibaba’s AI efforts shoulder-to-shoulder with OpenAI’s GPT-5 Pro, which also self-reported flawless results in the same contests earlier this year.
A Leap for China’s AI Ambitions
According to Alibaba, Qwen3-Max-Thinking is built atop Qwen3-Max, the company’s largest AI model boasting over one trillion parameters. Released in late September, the Qwen3-Max architecture represents Alibaba’s boldest step toward creating general-purpose reasoning models that can compete globally in complex problem-solving tasks.
The math victories are symbolic as much as technical. For years, elite competitions like the AIME and HMMT have been used as unofficial benchmarks for evaluating the reasoning depth and abstract thinking capacity of large language models (LLMs). Perfect accuracy in such events signals that Qwen3-Max-Thinking is closing the performance gap with Western-developed systems.
However, questions remain about transparency and verification. Alibaba’s claims, while headline-grabbing, lack third-party confirmation. Neither the AIME nor HMMT maintains public leaderboards for AI models, and no independent audit has yet verified whether the results were achieved under closed-book, internet-free conditions (a crucial factor in determining authenticity).
Verification Gaps Raise Skepticism
Despite the celebration, experts have urged caution. The absence of public verification means it is unclear whether Qwen3-Max-Thinking truly achieved 100% accuracy under standardized conditions.
Unverified results have become a recurring issue in AI benchmarking, as companies race to claim superiority in domains like reasoning, coding, and mathematics.
Further complicating the picture, details remain murky about whether the 2025 versions of the contest problems were used and whether the model had prior exposure to similar data during training. Without contamination controls (safeguards ensuring the model had not seen the test data before evaluation), perfect scores are difficult to validate.
While Alibaba’s announcement has sparked excitement, critics warn that without reproducibility, the victory could remain symbolic rather than scientific.
Developers and Investors Eye API Potential
Beyond benchmark bragging rights, Alibaba’s AI strategy has real commercial implications. The company recently opened API access to Qwen3-Max-Thinking, inviting developers to test its reasoning capabilities in real-world applications.
For software and data teams, this introduces new possibilities for cost-performance routing, dynamically choosing between AI providers based on pricing, accuracy, or latency. Developers in the Asia-Pacific region, particularly those seeking local AI infrastructure options, may find Qwen’s ecosystem attractive if it offers competitive pricing and reliable regional support beyond Singapore.
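The routing idea above can be sketched in a few lines. The sketch below is a minimal illustration, not a description of any actual product: the provider names, prices, accuracy figures, and latencies are invented placeholders, and the scoring weights are arbitrary. A real router would pull live pricing and latency metrics rather than hard-coded values.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative numbers only
    accuracy: float            # benchmark accuracy in [0, 1], illustrative
    latency_ms: float          # typical response latency, illustrative

def route(providers, max_latency_ms=2000.0, w_accuracy=1.0, w_cost=10.0):
    """Pick the provider with the best accuracy-vs-cost score under a latency cap.

    The linear score and the default weights are arbitrary assumptions;
    teams would tune these to their own workload.
    """
    eligible = [p for p in providers if p.latency_ms <= max_latency_ms]
    if not eligible:
        raise ValueError("no provider meets the latency requirement")
    return max(eligible,
               key=lambda p: w_accuracy * p.accuracy - w_cost * p.cost_per_1k_tokens)

# Hypothetical catalog; none of these numbers are published figures.
providers = [
    Provider("qwen3-max-thinking", cost_per_1k_tokens=0.006, accuracy=0.95, latency_ms=900),
    Provider("gpt-5-pro",          cost_per_1k_tokens=0.015, accuracy=0.96, latency_ms=700),
]

best = route(providers)                       # cheapest adequate option wins
fast = route(providers, max_latency_ms=800)   # tighter latency cap changes the pick
```

With the weights shown, the cheaper model wins the default case, while tightening the latency cap flips the choice, which is the essence of routing on price, accuracy, or latency rather than on a single fixed provider.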
Investors are also watching closely. If Qwen3-Max-Thinking can handle complex reasoning tasks while maintaining affordability, Alibaba could carve out a niche among enterprise developers and AI startups looking for alternatives to U.S. providers. The success of such models could signal a new balance in global AI infrastructure, where Chinese models rival or even outperform Western ones in specific tasks.