• China's DeepSeek has published a new research paper introducing a novel AI training method designed to scale models more easily.
  • The method, called Manifold-Constrained Hyper-Connections (mHC), allows models to share richer internal communication while preserving training stability and computational efficiency.
  • Counterpoint Research analyst Wei Sun described the approach as a "striking breakthrough" that keeps the extra cost of training minimal while potentially yielding much higher performance.
  • The release of the paper, co-authored by founder Liang Wenfeng, underscores DeepSeek's internal capabilities and its willingness to share important findings with the industry.

Quick Summary

China's DeepSeek has opened 2026 with the publication of a new AI training method that industry analysts are calling a significant advancement for the sector. The research paper introduces a technique designed to scale large language models more effectively, without the instability that often accompanies growing model sizes. By enabling models to share richer internal communication in a constrained manner, the method preserves training stability and computational efficiency.

The paper, co-authored by founder Liang Wenfeng, details a method the company calls Manifold-Constrained Hyper-Connections (mHC). The approach addresses the challenge of maintaining performance as models grow, a critical hurdle in current AI development. Analysts suggest the innovation could shape the evolution of foundational models and allow the company to bypass compute bottlenecks, potentially unlocking new leaps in intelligence.

The Technical Innovation: Manifold-Constrained Hyper-Connections

The Chinese AI startup published a research paper on Wednesday describing a method for training large language models that could shape "the evolution of foundational models." The paper introduces what DeepSeek calls Manifold-Constrained Hyper-Connections, or mHC, a training approach designed to let models scale without becoming unstable or breaking down altogether.

As language models grow, researchers often try to improve performance by allowing different parts of a model to share more information internally. That extra cross-talk, however, raises the risk of training becoming unstable. DeepSeek's latest research enables models to share richer internal communication in a constrained manner, preserving training stability and computational efficiency even as models scale.
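The paper's exact formulation isn't reproduced in this article, but the general idea behind hyper-connection-style designs can be sketched. The minimal PyTorch illustration below is a hypothetical example, not DeepSeek's published method: the class name, the choice of four parallel residual streams, and the softmax row-normalization standing in for the "constraint" are all assumptions made for illustration. It shows the core intuition of letting several internal streams exchange information through a learned mixing matrix that is kept constrained so activations stay well-behaved.

```python
import torch
import torch.nn as nn


class ToyHyperConnectionBlock(nn.Module):
    """Illustrative sketch: widen the usual single residual stream into
    several parallel streams that communicate via a constrained mixing matrix.
    This is NOT DeepSeek's mHC formulation, only a conceptual stand-in."""

    def __init__(self, d_model: int, n_streams: int = 4):
        super().__init__()
        self.sublayer = nn.Linear(d_model, d_model)    # stand-in for an attention/MLP sublayer
        self.mix = nn.Parameter(torch.eye(n_streams))  # learned stream-to-stream mixing weights

    def constrained_mix(self) -> torch.Tensor:
        # Hypothetical constraint: row-normalize the mixing matrix so every
        # output stream is a bounded combination of input streams, which keeps
        # the richer cross-stream communication from blowing up activations.
        return torch.softmax(self.mix, dim=-1)

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, d_model)
        mixed = torch.einsum("ij,jbd->ibd", self.constrained_mix(), streams)
        update = self.sublayer(mixed[0])               # run the sublayer on one mixed stream
        return mixed + update.unsqueeze(0)             # residual-style update broadcast to all streams


# Quick check: four streams, a batch of 2 positions, hidden width 8
x = torch.randn(4, 2, 8)
y = ToyHyperConnectionBlock(d_model=8, n_streams=4)(x)
print(y.shape)  # torch.Size([4, 2, 8])
```

The point of a constraint in a design like this is that unconstrained mixing gives the model more expressive internal routing but can amplify signals uncontrollably during training; restricting the mixing weights to a well-behaved set is one way to keep the added communication both cheap and stable.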

By redesigning the training stack end-to-end, the company is signaling that it can pair rapid experimentation with highly unconventional research ideas. This technical feat is viewed by industry observers as a statement of DeepSeek's internal capabilities.

Industry Analysts React to the Breakthrough

Analysts have reacted positively to the publication. Wei Sun, the principal analyst for AI at Counterpoint Research, described the approach as a "striking breakthrough" and noted that DeepSeek combined various techniques to keep the extra cost of training a model to a minimum. Even with a slight increase in cost, she added, the new training method could yield much higher performance.

Sun further said that DeepSeek can "once again, bypass compute bottlenecks and unlock leaps in intelligence," a reference to the company's "Sputnik moment" in January 2025, when it unveiled its R1 reasoning model and shook the tech industry and the US stock market by matching top competitors at a fraction of the cost.

Lian Jye Su, the chief analyst at Omdia, told Business Insider that the published research could have a ripple effect across the industry, with rival AI labs developing their own versions of the approach. "The willingness to share important findings with the industry while continuing to deliver unique value through new models showcases a newfound confidence in the Chinese AI industry," Su said. He added that openness is embraced as "a strategic advantage and key differentiator."

Context: The Road to R2 and Market Position

The paper comes as DeepSeek is reportedly working toward the release of its next flagship model, R2, following an earlier postponement. R2, which had been expected in mid-2025, was delayed after Liang expressed dissatisfaction with the model's performance. The launch was also complicated by shortages of advanced AI chips, a constraint that has increasingly shaped how Chinese labs train and deploy frontier models.

While the paper does not mention R2, its timing has raised eyebrows. DeepSeek previously published foundational training research ahead of its R1 model launch. Su said DeepSeek's track record suggests the new architecture will "definitely be implemented in their new model."

However, Wei Sun is more cautious regarding the timeline. "There is most likely no standalone R2 coming," Sun said. Since DeepSeek has already integrated earlier R1 updates in its V3 model, she believes the technique could form the backbone of DeepSeek's V4 model instead. Despite these innovations, reports suggest that DeepSeek's updates to its R1 model failed to generate much traction in the tech industry, with distribution remaining a challenge compared to leading AI labs like OpenAI and Google, particularly in Western markets.

"Deepseek can 'once again, bypass compute bottlenecks and unlock leaps in intelligence.'"

Wei Sun, Principal Analyst for AI at Counterpoint Research

"The willingness to share important findings with the industry while continuing to deliver unique value through new models showcases a newfound confidence in the Chinese AI industry."

Lian Jye Su, Chief Analyst at Omdia

"Openness is embraced as 'a strategic advantage and key differentiator.'"

Lian Jye Su, Chief Analyst at Omdia

"There is most likely no standalone R2 coming."

Wei Sun, Principal Analyst for AI at Counterpoint Research

Frequently Asked Questions

What is DeepSeek's new AI training method?

DeepSeek introduced a method called Manifold-Constrained Hyper-Connections (mHC), designed to scale large language models more easily while maintaining stability and computational efficiency.

Why is this development considered a breakthrough?

Analysts describe it as a "striking breakthrough" because it allows models to share richer internal communication without instability, potentially bypassing compute bottlenecks and yielding higher performance at minimal extra cost.

How does this relate to DeepSeek's upcoming R2 model?

While the paper was released as DeepSeek reportedly works on R2, analysts are divided on its implementation. Some believe the new architecture will be used in R2, while others suggest it may be integrated into a V4 model instead of a standalone R2 release.