Expert Q&A: China's new DeepSeek AI model

Claimed to be orders of magnitude cheaper and less energy-intensive than other AI models, China's DeepSeek put the cat among the pigeons when it was released in January. We asked a range of experts for their thoughts on the development.

Meet the experts

Professor Harin Sellahewa, Professor of Computing and Dean of Faculty of Computing, Law and Psychology, University of Buckingham

Prof Anthony G Cohn, FREng, FLSW, Professor of Automated Reasoning, University of Leeds, and Foundation Models Lead at the Alan Turing Institute

Dr Lukasz Piwek, Senior Lecturer in Data Science and member of the University's Institute for Digital Security and Behaviour, University of Bath

Q: How does DeepSeek R1 differ from other models?

LP: DeepSeek R1 distinguishes itself by merging chain-of-thought reasoning with a highly cost-efficient, open-source architecture. It attains performance levels comparable to top models like OpenAI's o1 and GPT-4o. For example, its score of approximately 90.8% on the MMLU (Massive Multitask Language Understanding) benchmark - an assessment spanning diverse academic and professional tasks - places it among leading systems, yet it was trained for only about US$5.6 million, roughly 5-6% of competitors' costs. R1 also utilises computational innovations such as mixture-of-experts (MoE) and multi-head latent attention (MLA) to reduce memory overhead and boost efficiency. Most impressively, it has been released as open source under the MIT License with full code and detailed documentation, making it one of the few models at this level of capability to be released so openly, besides Meta's products like LLaMA.

HS: DeepSeek, like OpenAI's ChatGPT or Google's Gemini, is a Generative AI tool that can create content such as text, images and programming code, and solve maths problems. Both have AI models, commonly known as Multimodal Large Language Models (MLLMs), trained using vast amounts of data representing diverse fields.

DeepSeek uses the Mixture of Experts (MoE) architecture for its AI model. Put simply, it's like being able to consult many experts from various disciplines and domains to generate a response to a user query, commonly referred to as a prompt. What differentiates DeepSeek from other Generative AI tools is its ability to consult only the subset of its experts most relevant to the user prompt, as opposed to consulting all experts for every prompt. This means DeepSeek can respond faster than other Generative AI tools. It also uses less computational power, as it does not use all of its AI model to respond to each prompt.
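To make that concrete, the sketch below shows the basic idea of top-k expert routing in a Mixture of Experts layer: a small gating network scores every expert for a given token and only the best-scoring few are consulted. It is a minimal, illustrative example; the number of experts, the gating network and the weights are invented and do not reflect DeepSeek's actual configuration.

    # Illustrative top-k expert routing - not DeepSeek's actual implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    num_experts, top_k, hidden = 8, 2, 16
    experts = [rng.standard_normal((hidden, hidden)) for _ in range(num_experts)]  # one weight matrix per "expert"
    gate = rng.standard_normal((hidden, num_experts))                              # router that scores experts per token

    def moe_forward(token_vec):
        scores = token_vec @ gate                      # how relevant each expert is to this token
        chosen = np.argsort(scores)[-top_k:]           # keep only the top-k experts
        weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
        # Only the chosen experts do any work; the rest stay idle for this token.
        return sum(w * (token_vec @ experts[i]) for w, i in zip(weights, chosen))

    out = moe_forward(rng.standard_normal(hidden))
    print(out.shape)   # (16,) - same output shape as a dense layer, but only 2 of 8 experts ran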

Due to the restrictions on the sale of powerful computing chips to China, the developers of DeepSeek claim to have used computational resources (processors) that are less powerful than those available to the developers of ChatGPT to create its AI model.

DeepSeek's AI model is open source, which means third parties, such as developers, can use DeepSeek's AI model to develop innovative solutions. In terms of accuracy, DeepSeek's responses are generally on par with competitors.

AC: Whilst fundamentally still the same kind of AI model - a neural generative AI system based on transformers - DeepSeek R1 employs a so-called "Mixture of Experts" (MoE) technology whereby, rather than having a single neural model, it is based on a number of separate models which can be selectively activated depending on the particular input; each such "expert model" uses only 37B of the total 671B parameters of the entire model. Moreover, it relies principally on reinforcement learning during its training process, unlike some of its competitors. This not only makes the model quicker to train, but also quicker (and cheaper) to respond to questions. Other leading models also use MoE architectures, but details are usually scant so it is hard to compare the architectures.
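Taking the figures quoted above at face value, the saving per token is easy to see with a back-of-envelope calculation (a rough sketch only; real inference cost also depends on attention, memory bandwidth, batching and other factors):

    # Back-of-envelope sketch using the figures quoted above (37B active of 671B total).
    total_params = 671e9
    active_params = 37e9
    print(f"Fraction of parameters used per token: {active_params / total_params:.1%}")
    # -> roughly 5.5%, which is why inference can be much cheaper than for a dense 671B model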

One major difference between DeepSeek R1 and OpenAI's o1 model is that, although both use extended "chain-of-thought" reasoning, in DeepSeek the chain is explicit and displayed to the user dynamically (sometimes even terminating the response when it realises it has mentioned a forbidden topic such as Tiananmen Square), whereas o1 does not display the actual chain, though after the reasoning terminates some kind of edited version is made available to the user.
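In the openly released versions, the reasoning is delimited in the raw model output itself, so separating the visible chain from the final answer is straightforward. The sketch below assumes the commonly used convention of wrapping the chain in <think> tags; hosted apps may present this differently.

    # Rough sketch: splitting R1-style output into its visible reasoning and final answer.
    # Assumes the chain-of-thought is wrapped in <think>...</think> tags, as in the
    # openly released checkpoints; the example output string is invented.
    import re

    raw_output = "<think>The user asks for 12*13. 12*13 = 156.</think>The answer is 156."

    match = re.search(r"<think>(.*?)</think>(.*)", raw_output, re.DOTALL)
    if match:
        reasoning, answer = match.group(1).strip(), match.group(2).strip()
        print("Reasoning shown to the user:", reasoning)
        print("Final answer:", answer)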

DeepSeek is also open source, meaning anyone can download it and run it on their own IT infrastructure (at their own cost), and can modify or customise it as desired. It is released under a very flexible MIT licence which allows many uses, including commercial ones. Most other leading-edge AI models, such as the offerings from OpenAI (GPT series) and Anthropic, are closed source, and one can only run the models on the company's own servers (or sometimes on third-party platforms such as Microsoft's Azure, where one can run a number of models from different companies, including from OpenAI). Even Meta's Llama, whilst open source, has restrictions on its use.
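As a rough illustration of what running it on your own infrastructure involves, the sketch below loads one of the smaller distilled R1 variants with the Hugging Face transformers library. The model identifier is an assumption - check the Hugging Face hub for the exact name - and the full 671B-parameter model requires far more substantial hardware.

    # Minimal sketch of running a distilled DeepSeek R1 model locally with the
    # Hugging Face transformers library. The model name below is an assumption;
    # verify it on the hub and make sure your hardware can hold the weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed distilled variant
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))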

 

Q: Do the claims around DeepSeek stack up?

AC: Certainly R1 is a generally impressive LLM, producing results at or fairly close to the state of the art. My experience is that the "chain-of-thought" reasoning it produces to arrive at an answer can be very long, which can result in very slow inference times for certain kinds of questions. When comparing different LLMs and considering their capabilities, one always needs to take account of the fact that benchmarks are imperfect and may not reflect performance on new queries posed by a user. Benchmarks generally contain only multiple-choice questions, which greatly simplifies automated evaluation and means that the possible responses of an LLM are constrained by the given set of answers. Also, particularly for well-known benchmarks which have been in the open literature for a while, there is the danger of so-called "leakage" - i.e. the model may have been trained on the benchmark on which it is subsequently evaluated, so the results may overestimate the underlying capability of the LLM. Of course this issue potentially affects the evaluation of any LLM, not just DeepSeek models. The challenge of evaluation is that there is an underlying tension between measuring what's easy to measure and measuring the true capability of an LLM, and there may be a large gap between the two.
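For context, the sketch below shows why multiple-choice benchmarks are so easy to mark automatically: the model only has to emit one of the listed letters, which is compared against an answer key. The questions and the stand-in model are invented for illustration.

    # Toy illustration of multiple-choice benchmark scoring.
    questions = [
        {"q": "2 + 2 = ?", "choices": {"A": "3", "B": "4", "C": "5", "D": "22"}, "answer": "B"},
        {"q": "H2O is commonly called?", "choices": {"A": "salt", "B": "water", "C": "air", "D": "rust"}, "answer": "B"},
    ]

    def fake_model(question, choices):
        # Stand-in for an LLM call; a real harness would prompt the model and parse its chosen letter.
        return "B"

    correct = sum(fake_model(item["q"], item["choices"]) == item["answer"] for item in questions)
    print(f"Accuracy: {correct / len(questions):.0%}")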

HS: Restrictions on the sale of powerful computing chips to China limit DeepSeek's ability to use the state-of-the-art hardware technology its competitors have access to. DeepSeek's founder Liang Wenfeng did not have several hundred million pounds to invest in developing the DeepSeek MLLM, the AI brain of DeepSeek - at least not that we know of.

Comprehensive studies of DeepSeek's capabilities are limited. DeepSeek has been shown to perform better at some tasks than others. For example, it has been shown to be better at solving maths problems than at providing detailed summaries of articles with broad contextual information.

LP: Independent benchmarks and community evaluations support DeepSeek R1’s performance and efficiency claims. Analyses indicate that R1 consistently achieves competitive reasoning accuracy and demonstrates efficiency improvements of up to 3.2× over similarly sized models in complex tasks. These findings - drawn from its detailed technical documentation and assessments on platforms like Hugging Face - confirm that its resource-efficient design is not just theoretical. While some metrics derive from DeepSeek's internal reports, independent, peer-reviewed studies are expected to further validate these results.

 

Q: What impact might DeepSeek R1 have on AI development and use?

HS: The open-source accessibility of DeepSeek is a game changer. It enables innovators and entrepreneurs to leverage DeepSeek's AI capabilities to develop innovative technological solutions.

The combination of low development costs, faster responses, accuracy, and cheap, open-source accessibility makes DeepSeek an attractive alternative to the more established Generative AI tools such as ChatGPT.

AC: The fact that DeepSeek seems to have produced a highly performant model at relatively low cost, and on previous generation hardware, will clearly give additional impetus to existing, predominantly US-based providers, to develop more cost-effective training regimes. 

However, it is not entirely clear how DeepSeek has trained its models and to what extent they rely on distillation of third-party models. If that is the case, then the performance is not purely due to DeepSeek's technology alone, and it means they may always lag behind the models they distil from.

LP: DeepSeek R1’s breakthrough in reducing training costs while delivering top-tier reasoning capabilities could democratise advanced AI. Its open-source nature and lower resource requirements enable smaller companies and academic institutions to deploy high-performance models without massive infrastructure investments. This accessibility could foster rapid innovation in areas like coding, mathematics, and decision-making. By providing full technical transparency, R1 encourages community-led improvements and derivative research. DeepSeek’s success challenges the assumption that only billion-dollar ventures can produce frontier AI, potentially reshaping competitive dynamics in the global AI industry.

 

Q: Do you see potential drawbacks alongside benefits?

HS: There are legitimate concerns about DeepSeek's data collection and privacy policy. Its privacy policy states that it collects user-provided information such as date of birth (where applicable), username, email address and/or telephone number, and password. DeepSeek may collect users' text or audio input, prompts, uploaded files, feedback, chat history, or other content that they provide when using DeepSeek's AI models and services. Moreover, automatically collected data includes keystroke patterns or rhythms, which can be used as a biometric to identify individuals.

As with any other Generative AI tool, users must be careful about what data and information they share with DeepSeek. Sharing security or commercially sensitive information with AI tools could have unintended consequences if such data and information become accessible to individuals who are not permitted to have them, or if AI models learn from such data and information and use them to update themselves.

The potential drawbacks of DeepSeek are not necessarily the collection of user-provided or automatically collected data per se, because other Generative AI tools also collect similar data. DeepSeek has legal obligations and rights, which include the requirement to "comply with applicable law, legal process or government requests, as consistent with internationally recognised standards". Given that information collected by DeepSeek is stored in servers located in the People's Republic of China, personal data of users outside of China might not be protected by the data protection regulations they might normally expect.

The solution is to ensure AI tools are trustworthy and responsible. A global effort is required to agree on a regulatory framework that fosters responsible innovation, protects rights, and assures trust in AI.

LP: Despite its promising design, DeepSeek R1 faces several challenges. Its aggressive optimisation using the previously mentioned MoE and MLA - while significantly reducing resource usage - may lead to inconsistencies or reduced performance on edge-case tasks that demand extensive reasoning.

Additionally, the distillation process used to produce smaller, more accessible variants of R1 has raised concerns about potential losses in nuanced capabilities, although robust scientific evidence is still needed to confirm this. Finally, being developed and deployed in China introduces risks related to data privacy, regulatory constraints and censorship, which might limit its global applicability. Again, independent evaluations will be essential to address these technical and geopolitical concerns.

AC: Unless you download DeepSeek and run it locally, you will be uploading information to a server located in China, which is potentially viewable by the Chinese government and the Chinese military.  Whilst one should always be careful about uploading sensitive data to anywhere in the cloud, the risks may be greater in this case.

It is not clear what the business model is for DeepSeek going forward; they may raise their prices, or change the terms of their licence (currently the completely unrestricted MIT licence).

It is well known that DeepSeek censors certain questions, e.g. about Tiananmen Square. Apparently this is achieved via "guard rails" in the app, though it may be possible to circumvent these in downloaded versions; nevertheless, DeepSeek will presumably have been trained on data curated to be culturally appropriate for the Chinese context, both during pre-training and at the fine-tuning stage. The model's outputs may therefore not reflect western values. This is something that requires further research. It is interesting to note, however, that DeepSeek's image generation model, Janus Pro 7B (when tested on the Hugging Face hosting platform), responds with an image including a US flag when prompted (in English) with "Give me an image of a cat that's proud of its country" - just as ChatGPT-o does.