Learn how to build, fine-tune, and deploy AI systems using DeepSeek, one of the most influential open-source large language models available today. This book guides you through real-world DeepSeek applications—from understanding its core architecture and training foundations to developing reasoning agents and deploying production-ready systems.
Starting with a concise synthesis of DeepSeek's research, breakthroughs, and open-source philosophy, you’ll progress to hands-on projects including prompt engineering, workflow design, and rationale distillation. Through detailed case studies—ranging from document understanding to legal clause analysis—you’ll see how to use DeepSeek in high-value GenAI scenarios.
You’ll also learn to build sophisticated agent workflows and prepare data for fine-tuning. By the end of the book, you’ll have the skills to integrate DeepSeek into local deployments, cloud CI/CD pipelines, and custom LLMOps environments.
Written by experts with deep knowledge of open-source LLMs and deployment ecosystems, this book is your comprehensive guide to DeepSeek’s capabilities and implementation.
DeepSeek in Practice
From basics to fine-tuning, distillation, agent design, and prompt engineering of open-source LLMs
Andy Peng
Alex Strick van Linschoten
Duarte O. Carmo
DeepSeek in Practice
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Gebin George
Relationship Lead: Sonia Chauhan
Project Manager: Prajakta Naik
Technical Editors: Aditya Bharadwaj and Rahul Limbachiya
Copy Editor: Safis Editing
Indexer: Manju Arasan
Proofreader: Safis Editing
Production Designer: Jyoti Kadam
First published: November 2025
Production reference: 2021225
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80602-085-0
www.packtpub.com
Andy Peng, a builder with curiosity, is motivated by research and product innovation. He specializes in large language model inference optimization and evaluation for state-of-the-art models like DeepSeek, Qwen, and Claude. His work spans AWS Bedrock, SageMaker, Amazon S3, AWS Fargate, AWS App Runner, Alexa Health & Wellness, and fintech. A NeurIPS 2025 Chair and program committee member for ICML, ICLR, and KDD, he contributes to CNCF and the Linux Foundation, mentors at the University of Washington, and serves as Resident Expert at the AI2 Incubator.
I would like to express my gratitude to my family for their support and for understanding that I needed to work long hours on this book. My sincere thanks go to my manager, Rakesh Ramakrishnan, and my colleagues, Raj Vippagunta and Siddharth Shah, for their valuable input and support.
I would also like to thank Gebin George for reaching out with the opportunity to write this book, which has been a truly unique experience. Special thanks to my co-authors, Alex and Duarte, and to the entire Packt team—including Vandita Grover, Prajakta, Gebin, and everyone else—for their unwavering support throughout the writing process. I first connected with Packt in 2022, and it is a pleasure to see the successful delivery of our first new book.
Alex Strick van Linschoten is a Machine Learning Engineer at ZenML. His work focuses on bridging the gap between machine learning research and production deployment, particularly within the LLMOps space. He leads and maintains the LLMOps Database, a comprehensive collection of over 1,000 case studies examining LLMOps and GenAI implementations in production environments. He transitioned to software engineering after earning a PhD in History and spending 15 years living and working as a historian and researcher in Afghanistan. He has authored, edited, and translated several books based on his historical research and is currently based in Delft, the Netherlands.
I’d like to thank Saba, Aria, and Blupus for their patience as I took many weekends off to work on the chapters of this book. I’d also like to thank Hamza and the rest of the ZenML team for their support in thinking through how best to present the ideas introduced below. Of course, much appreciation goes to the Packt team as well for their support in getting this out into the world!
Duarte O. Carmo is a technologist from Lisbon, Portugal, now based in Copenhagen, Denmark. For the past decade, he’s worked at the intersection of machine learning, artificial intelligence, software, data, and people. He has helped solve problems for both global corporations and small startups across industries such as healthcare, finance, agriculture, and advertising. His approach to solving tough problems always starts with the same thing: people. For the past five years, he’s been running his one-man consulting company, working with clients of all sizes and across industries. He’s also a regular speaker in the Python and machine learning communities and an active writer.
I’d like to thank my family, who have always encouraged me to follow my passion. In particular, I want to thank Vittoria. Writing a book is no easy task. Following your passion is no easy task. Leaving the dinner table because a client has a problem is no easy task. Hiding in the attic to write about an open-source LLM while the rest of you are on holiday is no easy task. Your unconditional support and love inspire me every day to keep going. As you once told me: “There are a lot of fun things out there to do—go do them!”
Franck Benichou is a Senior AI Engineer with over six years of experience in machine learning and large language model (LLM) engineering. He currently works at Carta, the leading platform for private-market equity and fund data management. Following Carta’s acquisition of Accelex, Franck drives advancements in AI-powered document intelligence, applying Generative AI to transform complex financial data into structured insights. Before joining Carta, Franck worked at Deloitte (2024–2025) as an in-house Generative AI Developer, creating enterprise AI solutions and contributing to the firm’s internal AI strategy. From 2022 to 2024, he led Generative AI initiatives at EY (Ernst & Young), developing retrieval-augmented and content automation systems. Earlier, at Intact Financial Corporation’s R&D Data Lab (2020–2022), he specialized in usage-based insurance modeling and analytics, supporting telematics-driven pricing innovation. Franck combines strong technical depth with a product-focused mindset, building scalable and interpretable AI systems that bring automation, intelligence, and measurable value to data-driven organizations.
Once you’ve read DeepSeek in Practice, we’d love to hear your thoughts! Scan the QR code below to go straight to the Amazon review page for this book and share your feedback.
https://packt.link/r/180602084X
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
In the first part of this book, we’ll build a strong foundation for understanding DeepSeek and its role in the rapidly evolving world of AI. We begin by introducing DeepSeek as an open-source large language model and exploring why it has gained global attention. Next, we take a deep dive into its internal architecture, reasoning mechanics, and advanced capabilities to uncover what truly sets it apart. We’ll also explore effective prompting strategies to help you get the most out of DeepSeek models.
By the end of this part, you’ll have the context, technical understanding, and practical insights needed to confidently leverage DeepSeek in modern AI workflows.
This part of the book includes the following chapters:
Chapter 1, What is DeepSeek
Chapter 2, Deep Dive into DeepSeek
Chapter 3, Prompting DeepSeek

To keep up with the latest developments in the fields of Generative AI and LLMs, subscribe to our weekly newsletter, AI_Distilled, at https://packt.link/8Oz6Y.
Have questions about the book or want to contribute to discussions on Generative AI and LLMs?
Join our Discord server at https://packt.link/4Bbd9 and our Reddit channel at https://packt.link/wcYOQ to connect, share, and collaborate with like-minded enthusiasts.
Artificial intelligence (AI) is rapidly evolving, and with it comes a suite of tools that allow developers, researchers, and innovators to build smarter, more adaptive systems. One such emerging tool is DeepSeek: a powerful, open-source large language model (LLM) designed to rival the capabilities of major LLMs such as GPT-4 and LLaMA. But what exactly is DeepSeek, and why should you care?
In this chapter, we’re going to dive into what DeepSeek is, how it fits into the broader AI landscape, and why it’s generating interest across the tech industry. You’ll gain an understanding of DeepSeek’s unique features and how it compares to other models in terms of training data, efficiency, and performance benchmarks.
By the end of this chapter, you’ll be equipped to understand the development of DeepSeek and key contributors to its success.
In this chapter, we’re going to cover the following main topics:
Introducing DeepSeek
Understanding the technical breakthroughs of DeepSeek
Impact on the global AI ecosystem
Exploring the versions and evolution of DeepSeek

Free Benefits with Your Book
Your purchase includes a free PDF copy of this book along with other exclusive benefits. Check the Free Benefits with Your Book section in the Preface to unlock them instantly and maximize your learning experience.
DeepSeek is an open-source language model family that aims to democratize advanced AI and make it broadly accessible. Its flagship reasoning model, DeepSeek-R1, appeared on 20 January 2025, just before the Chinese New Year. Instead of shipping a closed, API-only product, the team published the model weights, a detailed technical report, and inference code, so anyone can examine or rebuild the system.
Released under the MIT license, the model carries no usage fees or strict terms. Anyone may run it locally or adapt it for new tasks. This freedom drew developers, researchers, teachers, and small firms worldwide. They apply DeepSeek-R1 in support bots, classroom aids, lab studies, and writing tools.
Benchmarks show DeepSeek-R1 competing with OpenAI’s o3 and Gemini 2.5 Pro. It handles math, code, many languages, and complex prompts. The results suggest that strong models need not be closed, and they underscore China’s growing role in frontier AI. The release also revived debates on open access, safety, and global research cooperation. On September 17, 2025, another milestone was reached when the DeepSeek-AI team published their research on DeepSeek-R1 in Nature and made the cover of that issue (https://www.nature.com/articles/s41586-025-09422-z).
From an architectural standpoint, DeepSeek leaned heavily on innovations in transformer-based models, while adding its own spin in later versions (explored in depth in the section, Versions and evolution of DeepSeek). But what made it truly stand out was its usability. DeepSeek could be deployed in a wide range of environments, from cloud servers to edge devices, and even laptops using lightweight versions.
Figure 1.1: Benchmark performance of DeepSeek-R1 (0528) (source: https://api-docs.deepseek.com/news/news250528)
Let’s talk about the various factors that contributed to the sudden rise and popularity of DeepSeek:
Open source architecture and training details: DeepSeek-R1 was released with a detailed research paper (https://arxiv.org/abs/2501.12948) outlining its architecture, training approach, and benchmark scores across reasoning, math, and programming tasks (https://artificialanalysis.ai/providers/deepseek). The release was supported by full model weights and configuration files, six smaller distilled variants suited for local or low-resource environments (https://api-docs.deepseek.com/news/news250120), and immediate API availability (https://api-docs.deepseek.com/guides/reasoning_model) for developers wanting hosted access.
Timing: Part of its popularity was due to timing, as global organizations, scientists, and developers began exploring the new release. Additionally, the MIT license provided complete freedom for commercial use – an increasingly rare trait among performant models. The release also sparked excitement because it was not a research-only artifact; it was practical. Developers were able to fine-tune it, deploy it in production environments, and integrate it into existing AI workflows. The combination of power and usability became an instant draw.
Initial technical highlights: The most notable aspects of DeepSeek-R1 at launch included the following:
Reasoning: It outperformed or matched leading models on key benchmarks involving mathematics, code, and logical reasoning.
Efficiency: It provided performance close to GPT-4-level systems at significantly lower inference costs.
Reinforcement-first training: Unlike conventional fine-tuning workflows that depend on supervised human-annotated data, DeepSeek skipped straight to reinforcement learning with rule-based rewards rather than conventional RLHF, requiring minimal human labeling. The change sped up reasoning scores, lowered human-labeling costs, and allowed the model to tackle diverse tasks, such as math problems and zero-shot coding, without needing narrow task-specific instructions.
Custom architecture: While based on the transformer framework, DeepSeek-R1 incorporated innovations optimized for training stability and long-context understanding.
These elements together enabled the model to punch well above its weight, especially in multi-step reasoning and problem solving.
Real-world readiness: DeepSeek demonstrated real-world readiness from the outset, standing apart from many state-of-the-art models that excel in benchmarks but struggle in deployment. Unlike others that require extensive setup or closed infrastructure, DeepSeek was immediately usable in practical settings. It offered production-ready access via its API, local deployment with open weights and inference code, and customization through LoRA fine-tuning or prompt engineering. Integration with enterprise platforms such as Trae and Windsurf further streamlined orchestration. These capabilities, rarely combined so seamlessly in other models at launch, underscored DeepSeek’s commitment to practical utility beyond academic performance.
Community interest: DeepSeek-R1’s release sparked intense community activity. GitHub quickly overflowed with plug-ins, adapters, and fine-tuned spin-offs, while thousands of Hugging Face forks powered tools for contract review, tutoring, research aid, summarization, and coding support. Forums such as Reddit, Zhihu, and Stack Overflow buzzed with shared experiments, benchmarks, hardware guides, and tips for fine-tuning DeepSeek on local hardware, and universities adopted the model for courses and lab projects. The accessibility of the model turned casual enthusiasts into researchers and developers into entrepreneurs. Today, DeepSeek also fuels educational initiatives: several MOOCs and university labs have begun teaching LLM theory and experimentation using DeepSeek as the base model because of its openness and clarity.
Philosophical vision: DeepSeek’s vision aligns with a broader movement to build AI not as a gatekept asset, but as a shared global resource. Much like how Linux reshaped the software industry, DeepSeek aims to reshape AI development by putting tools in the hands of anyone curious or capable enough to use them. Its strategy is not just to compete with OpenAI or Google but to focus on accessibility and collaborative innovation.
DeepSeek-R1’s success is attributed to three factors: reinforcement-first training, open source commitment, and competitive performance across benchmarks.
Together, these elements laid the groundwork for DeepSeek not just as a model, but as an ecosystem. The remainder of this chapter will explore the reception of DeepSeek-R1 and the motivations behind its open philosophy, what technical breakthroughs powered it, and how it is evolving into a full-scale ecosystem.
What truly sets DeepSeek-R1 apart are the technical innovations embedded in its architecture and training process. These innovations enabled it to outperform many contemporary models and helped redefine how future LLMs might be built.
Before we begin with DeepSeek’s training process, let’s first take a look at the development process of leading LLMs, which usually consists of the following:
Pretraining on massive corpora using self-supervised learning: In this stage, models are exposed to large-scale, diverse datasets such as books, websites, and code. The goal is to learn general language patterns without explicit labels. Common pretraining strategies include the following:
Autoregressive modeling (e.g., GPT): The model predicts the next word in a sequence.
Masked language modeling (e.g., BERT): The model predicts missing (masked) words.
Permutation-based modeling (e.g., XLNet): The model learns over multiple possible word orders.
Transformer architecture: Most LLMs use the transformer architecture (Figure 1.2), known for its scalability and performance in Natural Language Processing (NLP) tasks. They usually employ self-attention to determine contextual relationships between words. Some of the variants are as follows:
Encoder-only (e.g., BERT) for classification or understanding.
Decoder-only (e.g., GPT) for generation.
Encoder-decoder (e.g., T5) for tasks such as translation and summarization.
Figure 1.2: Transformer architecture (Attention Is All You Need, Vaswani et al., https://arxiv.org/pdf/1706.03762)
Supervised fine-tuning on curated instruction-following datasets: After pretraining, the model is fine-tuned on high-quality, labeled datasets where it learns to follow specific instructions and perform useful tasks. These datasets typically consist of human-written prompts and ideal responses, helping the model learn how to interact more directly and purposefully.
RLHF: To better align model outputs with human preferences, RLHF is applied:
Human reviewers evaluate and rank multiple model responses.
A reward model is trained to predict these rankings.
The LLM is then fine-tuned using reinforcement learning algorithms (commonly Proximal Policy Optimization (PPO)) to generate outputs that maximize the reward model’s score, thus better aligning with human values and expectations.
DeepSeek broke this convention with DeepSeek-R1-Zero, which bypassed supervised fine-tuning entirely. Instead, it jumped directly from pretraining to reinforcement learning, helping it learn strong reasoning skills on its own.
Avoiding supervised fine-tuning (SFT) eliminates the dependency on expensive, manually annotated datasets. Instead, DeepSeek adopts rule-based reward mechanisms, such as automatically validating correct answers or checking output formats. This approach is more scalable and cost-efficient, and helps overcome the limitations of human data curation. Additionally, the use of explicit, rule-driven rewards, such as verifying answers within designated structures or confirming code functionality, mitigates reward hacking, a common issue with less predictable neural reward models.
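To make this concrete, here is a minimal sketch of what such a rule-based reward might look like. The function name, the `<think>` tags, and the `\boxed{}` answer convention are illustrative assumptions for a math task, not DeepSeek’s published reward code:

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: check output format, then check the answer.

    Illustrative only -- the tags and scoring weights are assumptions,
    not DeepSeek's actual implementation.
    """
    reward = 0.0

    # Format reward: reasoning must appear inside <think>...</think> tags.
    if re.search(r"<think>.+?</think>", completion, flags=re.DOTALL):
        reward += 0.2

    # Accuracy reward: extract a boxed final answer and compare it with the
    # reference. A deterministic string check replaces a learned (and
    # potentially hackable) neural reward model.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0

    return reward
```

Because the reward is computed by code rather than by another neural network, there is no reward model for the policy to exploit, which is exactly the reward-hacking mitigation described above.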
Figure 1.3 shows the workflow diagram of DeepSeek’s model training.
Figure 1.3: DeepSeek-R1 model training
The training pipeline for DeepSeek-R1 begins with DeepSeek-V3, a large 671B-parameter base model. This foundational model undergoes reinforcement learning (RL) using rewards focused on accuracy and output formatting, resulting in an intermediate model called DeepSeek-R1-Zero. This model serves as a crucial transition point, enabling more task-specific training in the following stages.
Next, DeepSeek-R1-Zero is fine-tuned using cold start data, which refers to a broad and diverse collection of instruction-following examples. This data is typically well-structured and curated to give the model a basic understanding of various task formats and domains, making it suitable for initial general-purpose instruction tuning.
Following this phase, additional rounds of SFT are applied using two specialized datasets. The first is chain-of-thought (CoT) data, which emphasizes multi-step reasoning. This data helps the model learn how to solve complex problems by breaking them down into intermediate steps – essential for mathematical reasoning, logical inference, and multi-hop question answering. The second set is knowledge data, which contains fact-rich, domain-specific content such as scientific literature, encyclopedic information, and technical manuals. This helps the model improve factual accuracy and domain coverage.
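As an illustration, a single CoT-style training sample might look like the following. The exact schema is an assumption on our part, since the fine-tuning data itself was not released:

```python
# Hypothetical CoT sample: intermediate reasoning inside <think> tags,
# followed by the final answer the model should present to the user.
cot_sample = {
    "prompt": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "response": (
        "<think>Average speed = distance / time. "
        "120 km / 1.5 h = 80 km/h.</think>\n"
        "The average speed is 80 km/h."
    ),
}
```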
Once these fine-tuning stages are complete, the resulting model, DeepSeek-R1, is further enhanced through advanced RL techniques. It is trained with rewards not only for accuracy and formatting, but also for consistency, ensuring that its outputs are logically coherent and self-consistent. Furthermore, rule-based verification is applied to automatically validate responses in domains such as mathematics and code generation. Finally, human preference fine-tuning is incorporated to align the model’s behavior with human expectations and judgments of quality.
DeepSeek-R1 has been distilled into smaller, more efficient variants. These include DeepSeek-R1-Distill-Qwen, which uses Qwen 2.5 models ranging from 1.5B to 32B parameters, and DeepSeek-R1-Distill-LLaMA, which utilizes LLaMA 3 models in 8B and 70B sizes. These distilled versions retain much of the capability of the original R1 model but are optimized for different resource and latency constraints.
Overall, the DeepSeek-R1 pipeline represents a multi-phase strategy combining supervised learning, reward-based tuning, and targeted model distillation to deliver a family of instruction-following language models optimized for performance, generalization, and deployment flexibility.
As a result, the training process becomes more stable and efficient, benefiting from simplified, reliable reward signals that reduce noise and ambiguity.
DeepSeek’s training approach helped in the following aspects:
Reduced human labor cost: No need to manually annotate or rank thousands of instructions.
Faster development cycle: The training timeline was streamlined significantly.
Greater generalization: The model learned to generalize instruction-following through trial-and-error interactions rather than fixed templates.
Despite lacking traditional supervised instruction datasets, DeepSeek-R1 demonstrated robust instruction-following capabilities, competitive with models that underwent fine-tuning. This suggested that RL alone – when well-designed – can endow a model with a deep understanding of instructions and intent.
Let’s take a look at DeepSeek’s inference pipeline, as depicted in Figure 1.4.
Figure 1.4: DeepSeek-R1 model inference
The inference pipeline of DeepSeek-R1 is designed to prioritize structured, verifiable outputs through a combination of rule-aware decoding and prompt optimization. During inference, DeepSeek-R1 leverages formatting-aware generation mechanisms, where the model is encouraged, often via prompt design and internal alignment, to produce well-structured, interpretable responses, especially for tasks involving code, math, or CoT reasoning. It is optimized not only for fluency but also for factual and logical coherence, frequently incorporating intermediate steps (CoT) in its answers, even without explicit prompting. This enables DeepSeek-R1 to deliver step-by-step solutions and structured outputs in JSON, Markdown, or code blocks, increasing reliability for downstream applications.
What sets DeepSeek-R1 apart from other state-of-the-art LLMs is its rule-based, reward-aligned inference behavior. Many leading LLMs rely primarily on end-to-end training with human preference fine-tuning, whereas DeepSeek-R1 integrates rule-based verification techniques directly into the RL loop. This has downstream effects at inference time: DeepSeek-R1 is more likely to generate outputs that are compatible with formal validators or downstream evaluators (e.g., test cases for code, equations for math). As a result, it shows stronger performance in domains where precision, structure, and interpretability are critical, while slightly trading off open-ended conversational flexibility.
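As a quick illustration of consuming these structured outputs, the snippet below calls DeepSeek’s hosted API, which exposes an OpenAI-compatible interface. The model name `deepseek-reasoner` and the `reasoning_content` field follow DeepSeek’s public API documentation at the time of writing, but treat them as assumptions to verify against the current docs:

```python
from openai import OpenAI  # DeepSeek's hosted API is OpenAI-compatible

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # obtained from platform.deepseek.com
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the hosted DeepSeek-R1 reasoning model
    messages=[
        {"role": "user",
         "content": "Solve 3x + 7 = 22 and return the result as JSON "
                    'like {"x": <value>}.'},
    ],
)

message = response.choices[0].message
print(message.reasoning_content)  # the model's chain-of-thought trace
print(message.content)            # the final, structured answer
```

Notice that the reasoning trace and the final structured answer arrive as separate fields, which makes it straightforward to validate the answer downstream while logging the chain of thought.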
Next up is DeepSeek’s RL approach.
DeepSeek-R1’s RL approach is notable for its distinctive design choices, which we will explore in depth in Chapter 2. Unlike many models that apply RL only in the final stages of training, DeepSeek-R1 introduced alignment techniques – methods aimed at steering the model’s outputs to be more helpful, honest, and harmless – early in its training process. This early alignment contributes to more consistent and desirable behavior throughout the model’s development.
It also replaced large-scale human annotation with an automated reward model, significantly improving scalability and reducing reliance on manual labeling. Another defining feature was its use of self-play and iterative refinement, enabling the model to generate, evaluate, and improve its own outputs. This approach helped DeepSeek-R1 internalize advanced reasoning patterns and strategic decision-making, making it particularly effective at multi-turn reasoning, code explanation and completion, and solving complex mathematical problems.
Moreover, RL training helped mitigate hallucinations by reinforcing factual accuracy through self-generated success metrics.
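As noted in Table 1.1 later in this chapter, this RL stage uses Group Relative Policy Optimization (GRPO). At its core, GRPO scores a group of sampled completions for the same prompt and normalizes each reward against the group, removing the need for a separate value network. Below is our own minimal sketch of that advantage computation (the `eps` constant is a numerical-stability assumption, not a published hyperparameter):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Compute group-relative advantages, GRPO-style.

    Each completion's advantage is its reward minus the group mean, divided
    by the group standard deviation -- so the policy is pushed toward
    completions that beat their siblings for the same prompt.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Example: four sampled answers to one math prompt, scored by rule-based rewards
print(grpo_advantages([1.2, 0.2, 1.2, 0.0]))
```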
For readers unfamiliar with alignment in the context of AI, it generally refers to techniques that ensure a model’s behavior aligns with human intent and values. A helpful introduction to these concepts can be found in OpenAI’s alignment overview (https://openai.com/index/our-approach-to-alignment-research/) or in the Alignment Newsletter (https://www.alignmentforum.org/s/dT7CKGXwq9vt76CeX).
Apart from this, DeepSeek made some modifications to the transformer architecture. Let’s find out.
DeepSeek-R1 builds on the widely adopted transformer architecture, which forms the foundation for most modern LLMs. At its core, the transformer uses a self-attention mechanism that allows each token in an input sequence to weigh the importance of every other token, regardless of position. This enables the model to capture long-range dependencies and contextual relationships more effectively than traditional recurrent models.
However, standard attention becomes computationally expensive as input length increases. To address this, DeepSeek-R1 introduces adaptive attention routing, a major architectural evolution. Unlike traditional transformers that apply fixed full attention across all tokens, this mechanism allows the model to selectively attend to the most relevant tokens based on learned relevance scores computed during training. These scores are typically derived from internal attention weights or auxiliary gating mechanisms, which prioritize tokens that contribute most to minimizing the training loss. By focusing computational resources on high-impact tokens, especially in long sequences, adaptive attention routing enables DeepSeek-R1 to handle inputs of up to 32,000 tokens more efficiently. This not only enhances the model’s ability to comprehend and summarize large documents but also reduces computational overhead by avoiding redundant attention over less informative tokens.
Figure 1.5 compares the traditional transformer attention and DeepSeek’s adaptive attention routing.
Figure 1.5: Traditional transformer attention versus DeepSeek-R1 adaptive attention routing
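The mechanism is easiest to see in a toy form: score the tokens, keep only the top-k, and attend over that subset. The sketch below is our own simplified illustration of the idea described above, not DeepSeek’s actual implementation, and it assumes the per-token relevance scores are already available:

```python
import torch
import torch.nn.functional as F

def topk_routed_attention(q, k, v, relevance, top_k=256):
    """Toy adaptive routing: attend only to the top_k most relevant tokens.

    q: (1, d) query; k, v: (n, d) keys/values;
    relevance: (n,) learned per-token relevance scores (assumed given).
    """
    n, d = k.shape
    idx = torch.topk(relevance, k=min(top_k, n)).indices  # keep high-impact tokens
    k_sel, v_sel = k[idx], v[idx]
    scores = (q @ k_sel.T) / d ** 0.5                     # scaled dot-product
    weights = F.softmax(scores, dim=-1)
    return weights @ v_sel                                # (1, d) attended output

# Example with random tensors standing in for a long sequence
q = torch.randn(1, 64)
k = v = torch.randn(4096, 64)
relevance = torch.randn(4096)
print(topk_routed_attention(q, k, v, relevance).shape)  # torch.Size([1, 64])
```

The efficiency gain comes from the attention matrix shrinking from n × n to n × top_k, which is where the savings on long inputs described above come from.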
In addition, DeepSeek-R1 employs mixed-precision optimization, combining FP16 (half-precision) and INT8 (quantized) arithmetic to improve training and inference efficiency. This approach reduces memory usage and accelerates computation while maintaining competitive model performance in terms of accuracy and perplexity. Typically, FP16 is used throughout most of the model for general computation, while INT8 quantization is selectively applied to inference-time matrix multiplications, often in attention and feed-forward layers, where precision can be reduced without significantly impacting output quality. By carefully choosing which layers to quantize, DeepSeek-R1 achieves a favorable trade-off between efficiency and performance, making it well suited for deployment at scale.
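This kind of mixed-precision recipe can be reproduced with standard PyTorch utilities. The snippet below is a generic sketch of low-precision autocast plus dynamic INT8 quantization of linear layers, not DeepSeek’s internal kernels:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(8, 1024)

# Half-precision for general computation. CPU autocast supports bfloat16;
# on a GPU you would use device_type="cuda" with dtype=torch.float16.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_low_precision = model(x)

# Selective INT8: quantize only the Linear layers' matmuls for inference,
# leaving the rest of the model in full precision.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
y_int8 = quantized(x)
print(y_low_precision.dtype, y_int8.dtype)
```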
The model also benefits from efficient parallelization strategies. It leverages tensor parallelism and activation checkpointing to reduce memory usage during training, allowing it to be trained on multi-GPU systems more effectively.
Together, these enhancements make DeepSeek-R1 an evolution of the transformer, one that is not only more scalable and context-aware but also more cost-efficient in both training and inference.
Let’s now turn our focus to how the DeepSeek architecture compares to the architecture of other LLMs.
As the field of LLMs evolves, different architectures and training paradigms have emerged. This comparison (Table 1.1) focuses on key differentiators, particularly the use of Mixture-of-Experts (MoE) architectures and RL, among some of the most impactful models in recent years.
| Model | MoE? | Key architecture highlights | RLHF | Open source? |
| --- | --- | --- | --- | --- |
| DeepSeek-R1 | Yes, sparse MoE | 671B params; MoE + Multi-Head Latent Attention; reasoning-centric | No RLHF; uses pure reinforcement learning (GRPO) | Yes (MIT) |
| Claude 4 | No | Dense transformer; built with Anthropic’s Constitutional AI and Direct Preference Optimization (DPO) | Yes, advanced RLHF + DPO | No |
| Gemini 2.5 Pro | Yes, sparse MoE | Multimodal sparse MoE transformer; 1M token context (2M soon) | Yes, RLHF + ongoing alignment | No |
| GPT-4.5 | No | Released Feb 27, 2025; OpenAI’s largest non-CoT model (Orion) | Yes, RLHF + SFT | No |
| o3 | Unknown (likely dense) | Optimized for personalized assistant tasks; improved grounding and memory modules | Yes, advanced RLHF | No |
| Grok 3.5 | No evidence of MoE | Dense transformer; enhanced reasoning from Grok 3; advanced “Think” mode; still proprietary | Yes, RL-based training + RLHF fine-tuning | No |
| Gemma 3 | No, dense | Lightweight dense transformer; instruction-tuned with long context | Yes, RLHF | Yes (Apache 2.0) |
| LLaMA 4 | Yes, sparse MoE | Natively multimodal MoE transformer with modular layers | Yes, RLHF and safety fine-tuning | Yes |

Table 1.1: Comparison of LLM architectures
Leading language models are increasingly diverging in their architectural strategies, particularly around the use of MoE. Models such as Gemini 2.5 Pro and DeepSeek-R1 adopt sparse MoE architectures, enabling large parameter scales while maintaining efficient compute usage. GPT-4.1 is widely believed to incorporate some form of MoE or sparse expert routing, based on its strong performance and low-latency characteristics, though exact details remain undisclosed.
Sitting between these approaches, Grok 3.5 retains a dense transformer architecture, optimized for real-time responsiveness and integrated reasoning. It avoids MoE entirely, focusing instead on RL techniques and iterative refinement using live feedback data.
In contrast, Claude 4 continues with a fully dense design, prioritizing simplicity, alignment stability, and predictable behavior over raw parameter scaling, while Meta’s LLaMA 4 marked that family’s move to a sparse MoE design.
Now that we have introduced you to DeepSeek’s technical innovations, we will dive deeper into its MoE design.
A major architectural breakthrough in DeepSeek-R1 is the integration of the MoE design.
MoE is a modular neural network design where only a subset of parameters (called experts) is activated for any given input. Rather than using the full parameter space for every prediction, MoE selectively activates a few experts dynamically.
DeepSeek employs a sparse MoE architecture, in which a gating network dynamically selects a small subset of the N expert networks at each layer to process a given input (DeepSeek-V3/R1 routes each token to eight routed experts plus one shared expert). Activating only a few experts strikes a balance between computational efficiency and model expressiveness. It allows the model to leverage diverse expertise without incurring the full cost of activating all experts. This approach enables specialization across experts while keeping inference latency and resource usage manageable. These experts are not manually assigned to specific tasks such as math or code; instead, specialization emerges during training. The gating mechanism learns, through optimization, to route inputs to the most effective experts based on contextual cues. Over time, certain experts become more activated for specific domains (e.g., language, reasoning, and coding) as a result of this learned routing, effectively developing functional specialization.
Figure 1.6 provides an overview of this architecture.
Figure 1.6: Conceptual view of MoE architecture
Each expert processes the same type of input representations but may learn to emphasize different aspects depending on the patterns it receives. Their role is shaped by the data they are most often routed for, which, in turn, guides their parameter updates. This allows experts to take on distinct roles organically, without needing different input formats or encodings. The model contains a large pool of expert subnetworks (each a small feed-forward network), but only a small subset is activated per input token. A gating network evaluates the context and dynamically decides which experts to activate, allowing the model to adaptively route information where it’s most effectively processed.
This structure yields several key benefits:
Scalability: Since only a few experts are active at any time, the model can maintain a large number of total parameters while consuming less compute per token than a dense model of equivalent size. This enables DeepSeek to scale up without linear increases in computational cost.
Modularity: Experts can be trained, frozen, updated, or even swapped independently. This modularity allows for efficient continual learning, domain adaptation, or task-specific fine-tuning without retraining the entire model.
Specialization: As the gating network learns to route different inputs to different experts, these subnetworks begin to specialize; some become more attuned to code, others to mathematical reasoning, natural language, or dialogue. This reduces the risk of overfitting and enhances the model’s ability to generalize across diverse tasks.
Through this architecture, DeepSeek-R1 essentially behaves like an ensemble of domain-specific models, but without duplicating resources or incurring the latency overhead typically associated with running multiple systems in parallel.
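To ground these ideas, here is a deliberately small sparse-MoE layer with top-k gating. It follows the generic design sketched above – a softmax gate selecting a few feed-forward experts per token – and is not DeepSeek’s production routing code; all sizes are toy values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal sparse MoE layer: route each token to its top_k experts."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # learned routing scores
        self.top_k = top_k

    def forward(self, x):                  # x: (tokens, d_model)
        logits = self.gate(x)              # (tokens, n_experts)
        weights, idx = torch.topk(F.softmax(logits, dim=-1), self.top_k)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):     # combine the chosen experts' outputs
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = SparseMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64])
```

Note that each token only ever runs through top_k experts, so the per-token compute stays flat even as n_experts (and hence total parameters) grows – the scalability property described above.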
Apart from DeepSeek, many state-of-the-art (SOTA) LLMs also employ MoE architecture, the details of which are provided in the following table:
| Model | Parameter count | Active experts per token | Total experts | Routing type | Use case strengths |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 671B total / ~37B active | 8 routed (+1 shared) | 256 routed per layer | Sparse + gating | Math, reasoning, code, and multilingual understanding |
| Gemini 2.5 Pro | Estimated 1T+ (MoE config) | Unspecified (likely 2–4) | Dozens | Proprietary sparse | Multimodal apps, coding, and retrieval-augmented reasoning |
| Grok-3 | Estimated 400B+ total (MoE) | 2–4 (adaptive) | Unspecified (20+) | Advanced dynamic MoE | Enhanced reasoning, DeepSearch, vision + code + chat, and long context |
| Grok-1.5 | Estimated 300B total | 2–4 (adaptive) | 16+ | Dynamic routing | Real-time interaction, multimodal learning, and large-scale context tracking |
| Mixtral 8x7B | 56B total / 12.9B active | 2 | 8 | Top-2 gated MoE | General-purpose reasoning, fast inference, and multilingual |
| Switch Transformer | 1.6T total / ~15B active | 1 | 2,048 | Top-1 routing | Scalability benchmark; pioneered MoE at trillion scale |
| GLaM | 1.2T total / 93B active | 2 | 64 | Top-2 routing | NLP understanding, code, and scientific tasks |

Table 1.2: Comparison of MoE models
With MoE architecture gaining traction, Mixtral 8×7B showed how open source models could gain strong reasoning with limited compute, while newer systems such as Grok and Gemini 2.5 add adaptive or proprietary routing and multimodal pretraining. Google’s Switch Transformer and GLaM, both trillion-parameter prototypes, first confirmed that MoE could scale reliably. Together, these projects show how MoE lets very large models grow while keeping inference fast enough for real-time, high-performance tasks.
Another foundational aspect of building an LLM is the data on which it is trained. Let’s see how DeepSeek utilized its training dataset.
In the paper DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (https://arxiv.org/abs/2501.12948), the authors state that DeepSeek-R1 was trained on a large, diverse, and domain-rich multilingual dataset intended to support reasoning, coding, and cross-lingual understanding.
While the precise composition of this dataset is not publicly disclosed, the paper outlines the use of carefully curated cold-start data and approximately 800,000 SFT samples, integrated into a multi-stage RL framework to develop the model.
The data selection strategy emphasized high-quality sources across several key domains:
The model was heavily trained on programming languages such as Python, JavaScript, and Rust, using curated code from repositories such as GitHub and developer Q&A platforms such as Stack Overflow.
To strengthen mathematical and logical reasoning, the dataset incorporated formal logic corpora, symbolic mathematics benchmarks, and collections of competitive math problems (e.g., MATH, GSM8K, and ProofWriter).
Scientific understanding was bolstered through pretraining on academic publications and technical manuals, drawing from sources such as arXiv and open-access research datasets.
The multilingual component of the dataset was anchored in both Chinese and English, with additional coverage of major global languages sourced from datasets such as CCMatrix and OPUS.
This diverse training foundation reflects DeepSeek-R1’s objective: to combine high-level reasoning with robust multilingual and domain-specific capabilities.
In the same paper, the authors describe adopting a low-filtering strategy in curating their training data, contrasting with the more aggressive data filtering pipelines used by many other LLM developers. This design choice was made to preserve the natural complexity and diversity of language, enabling the model to better capture informal expressions, culturally specific idioms, emotionally charged language, and edge-case scenarios. According to the authors, such linguistic variety supports more expressive, creative, and contextually fluent model behavior. However, the paper does not specify exactly what types of content, if any, were filtered out. Without transparency into the dataset composition or filtering criteria, it remains unclear whether the training corpus included potentially harmful content such as hate speech, misinformation, or offensive material. While this low-filtering approach may enhance the model’s ability to generalize across diverse linguistic and cultural contexts, it also introduces the risk that undesirable content could be learned and reproduced. These trade-offs underscore the importance of downstream safeguards and responsible use, particularly when deploying the model in real-world settings.
DeepSeek-R1’s architecture and training strategy resulted in a model particularly well-suited for open-ended tasks. It demonstrates strong capabilities in brainstorming, idea generation, and exploratory dialogue – areas that benefit from flexibility and minimal preconception. The model also performs well in nuanced translation and multilingual reasoning, aided by its diverse language training. Additionally, DeepSeek-R1 is noted for its adaptability to user tone and conversational style, which many users find useful in creative and collaborative contexts.
There have been several versions of DeepSeek, and we expect newer releases as the race toward artificial general intelligence heats up.
In the next section, we will talk about how the DeepSeek ecosystem has evolved since the release of R1.
The story of DeepSeek is not just about a single model launch. It’s about a continuously evolving ecosystem. Each iteration of DeepSeek introduces significant upgrades in reasoning, usability, safety, and integration across platforms. Understanding the evolution of DeepSeek is critical to appreciating its long-term vision and potential.
Since its debut, DeepSeek has made significant strides across multiple domains, including language understanding, mathematical reasoning, code generation, and multimodal capabilities. Each version introduces meaningful enhancements, underscoring DeepSeek’s goal of democratizing high-performance language models without compromising on quality.
Understanding DeepSeek’s evolution provides critical insight into its growth trajectory, vision, and how it continues to disrupt both proprietary and open AI ecosystems. In the following table, we chronologically explore the key milestones, model variants, and product layers that now form the DeepSeek suite.
| Version | Release date | Key features |
| --- | --- | --- |
| DeepSeek Coder | Nov 2023 | Specialized coding model with top-tier performance in Python and JS |
| DeepSeek LLM | Nov 2023 | Foundational model; multilingual, open weights |
| DeepSeek Math | Feb 2024 | Focused on algebra, logic, and multi-step reasoning |
| DeepSeek VL | Mar 2024 | Vision-language model (image + text), multimodal groundwork |
| DeepSeek V2 | May 2024 | Refined alignment, better factual grounding |
| DeepSeek Coder V2 | Jun 2024 | Massive improvement in code synthesis and inline documentation |
| DeepSeek V3 | Dec 2024 | Upgraded generalist model with better long-context handling and planning |
| DeepSeek-R1 | Jan 20, 2025 | Full MIT-licensed release; strong reasoning, multilingual, chat + code |
| DeepSeek-R1-0528 | May 2025 | Refinement of R1: stronger factuality, 32k-token context, improved alignment |
| DeepSeek V3.1 | Aug 2025 | Hybrid inference, fast thinking, and stronger agent skills |
| DeepSeek V3.2-Exp | Sep 2025 | DeepSeek Sparse Attention (DSA) for faster, more efficient inference on long contexts |

Table 1.3: DeepSeek model evolution
Each of these models targets specific use cases, from general-purpose chatbot functions to highly focused coding and math tasks. Let’s talk about them in detail.
The DeepSeek ecosystem has rapidly evolved into a suite of specialized models, each designed to address different use cases in reasoning, coding, vision, and general-purpose AI. While all variants build on a shared architectural backbone and training philosophy, each model iteration introduces new capabilities, performance trade-offs, and domain optimizations. Here is an overview of the key models within the DeepSeek family and how they compare in terms of specialization and utility:
DeepSeek LLM (https://github.com/deepseek-ai/DeepSeek-LLM): The original backbone of the DeepSeek family, the DeepSeek LLM laid the foundation for all future iterations. While it lacked some specialized capabilities, it established multilingual competence and solid reasoning as core priorities.
DeepSeek Math (https://github.com/deepseek-ai/DeepSeek-Math): Tailored for students, researchers, and technical professionals, DeepSeek Math focuses on multi-step reasoning problems in algebra, calculus, geometry, and symbolic logic. It serves as a viable open source alternative to Wolfram Alpha-like reasoning systems.
DeepSeek Coder (https://github.com/deepseek-ai/DeepSeek-Coder) and Coder V2 (https://github.com/deepseek-ai/DeepSeek-Coder-V2): The first Coder model introduced competitive performance in Python and JavaScript, integrated with development environments such as VS Code and GitHub Copilot. Coder V2 (June 2024) significantly raised the bar, approaching Claude 3.5 in inline function synthesis, docstring generation, and type inference.
DeepSeek VL (https://github.com/deepseek-ai/DeepSeek-VL): A pivotal release for multimodal applications, VL supports both image and text inputs, opening the door for applications in visual question answering, optical character recognition, document summarization, and more. While it still lags behind GPT-4V or Gemini 1.5 Pro in vision capabilities, it’s rapidly improving.
DeepSeek V2 (https://github.com/deepseek-ai/DeepSeek-V2) and V3 (https://github.com/deepseek-ai/DeepSeek-V3): The V2 update prioritized prompt alignment, minimizing hallucinations and expanding support for longer contexts. V3 followed up with better long-term memory support, faster inference, and internal planning modules that enabled early-stage agentic behavior.
DeepSeek-R1 (https://github.com/deepseek-ai/DeepSeek-R1): This was the launch version that started it all. Released on January 20, 2025, it quickly became the top-performing open source model across a wide array of benchmarks. Key highlights include a fully open source, MIT-licensed release that provides model weights, a tokenizer, and inference code. DeepSeek-R1 delivers strong logical reasoning capabilities, surpassing most open models and rivaling some proprietary systems, along with robust multilingual performance in both English and Chinese. As already discussed, the release also features a set of smaller, distilled variants ranging from 1.5B to 70B parameters, enabling efficient use in local or edge environments. Practical applications range from API-based chatbot integration and Copilot-style coding assistance to lightweight deployments via platforms such as Ollama and VS Code extensions.
DeepSeek-R1-0528 (https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1-0528): This update solidified DeepSeek’s position at the top of the open source pyramid. The May 28 version refined the core model, making it more aligned, accurate, and efficient.
The new features and enhancements include the following:
Substantial reduction in hallucinations: Notably in scientific and historical facts.
Improved coding fluency: Achieved parity with GPT-4-turbo in many Python tasks.
Mathematics performance: Enhanced accuracy in multi-step algebra, geometry, and logic problems.
Updated prompt alignment: Better adherence to user instructions, even in ambiguous prompts.
Multimodal readiness: Architecture adapted for future image/text fusion.
In benchmark evaluations, the May release of DeepSeek-R1-0528 showed a 7% improvement in mathematical reasoning tasks compared to the January version. Code generation performance, measured on HumanEval-style benchmarks, increased by 9%. Additionally, the model demonstrated effective long-context reasoning, handling inputs up to 32,000 tokens with minimal performance degradation.
The release of DeepSeek-R1-0528 underscored that the pace of DeepSeek’s development remained strong and consistent. Its improved performance and open accessibility led many developers to begin migrating entire workflows from GPT-based systems to DeepSeek APIs. This shift was further supported by a surge in ecosystem integrations, including Visual Studio Code plugins, Ollama compatibility, and Dockerized deployment options, signaling growing adoption across both individual and enterprise-level users.
DeepSeek-V3.1 (August 2025) (https://huggingface.co/deepseek-ai/DeepSeek-V3.1): DeepSeek V3.1 is a cutting-edge hybrid reasoning model featuring both thinking and non-thinking modes, advanced agent and tool use capabilities, a massive 685-billion-parameter architecture, and an extended 128,000-token context window for long-document understanding. The model is engineered for fast, structured, multi-step reasoning and supports code generation, search, and agentic workflows, with enhanced efficiency from its MoE architecture and optimized inference, matching or surpassing previous DeepSeek benchmarks while maintaining low latency. DeepSeek V3.1 also boasts strong multilingual support, open source availability for research, and specialized training for reliable external tool integration and reduction of hallucinations, making it suitable for diverse enterprise and developer applications.
Apart from the various models, DeepSeek has created many products for streamlined adoption and use. Let’s take a look.
Beyond the models themselves, DeepSeek has developed a growing suite of user-facing products and developer tools that make adoption frictionless:
DeepSeek app: A mobile-first AI assistant app available for Android and iOS, offering real-time interaction with DeepSeek-R1 and Math/Coder variants. Key features include voice input, code cell execution, note-taking, and multilingual support.
DeepSeek web app: Accessible at deepseek.com/chat, this offers a clean and responsive interface for real-time interaction with various DeepSeek model variants. It includes conversation memory, allowing users to maintain context across multiple exchanges for more coherent dialogues. Prompt templates are available to streamline repetitive tasks or structured inputs, making it easier to prototype or test specific behaviors. Additionally, users can export entire chat sessions in Markdown or PDF formats, which is particularly useful for documentation, collaboration, or offline review.
DeepSeek developer platform (https://platform.deepseek.com/): Offers a flexible and open environment for building and deploying AI-powered applications. Developers can fine-tune models to create custom endpoints tailored to specific tasks or domains. The platform supports seamless model selection across the DeepSeek family, including general-purpose (R1), coding-focused (Coder), and multimodal (VL) variants. With support for context-aware API calls up to 32,000 tokens, it enables sophisticated multi-turn reasoning and long-form content processing. A beta feature for role and function calling is also available, allowing developers to define structured interactions and extend model capabilities for tools, agents, or workflow automation.
DeepSeek’s value proposition lies in its accessible model weights, strong reasoning capabilities, and low cost. This combination makes DeepSeek-R1 an attractive choice for educational use, research and development, and start-ups seeking advanced AI tools without restrictive licensing or high costs.
Now, let’s take a look at the integration and deployment support DeepSeek offers.
DeepSeek’s accessibility is one of its defining strengths, thanks to a wide range of integration and deployment options across local, cloud, and web environments. Here is a high-level overview of where and how DeepSeek can be used:
For local and edge deployment, DeepSeek runs seamlessly on platforms such as Ollama (https://ollama.com/library/deepseek-r1), Docker Hub (https://hub.docker.com/r/devlinrocha/ollama-deepseek-r1-7b/tags), and VS Code extensions (https://github.com/enesbasbug/deepseek-vscode-extension) for code completion and inline assistance. It also supports private cloud and on-premises deployments via Kubernetes, Docker Compose, or direct GPU cluster setups. A minimal local example follows this list.
In the cloud, DeepSeek is integrated with Amazon Bedrock (https://aws.amazon.com/bedrock/deepseek), Amazon SageMaker (https://aws.amazon.com/blogs/machine-learning/deploy-deepseek-r1-distilled-models-on-amazon-sagemaker-using-a-large-model-inference-container/), Azure ML (https://azure.microsoft.com/en-us/blog/deepseek-r1-is-now-available-on-azure-ai-foundry-and-github/), and Google Cloud Vertex AI (https://cloud.google.com/vertex-ai/generative-ai/docs/maas/deepseek), enabling scalable inference via Hugging Face or custom containers.
Hosted APIs are available via providers such as Fireworks.ai (https://fireworks.ai/), Together.ai (https://www.together.ai/), Replicate (https://replicate.com/), and Modal (https://modal.com/), offering both rapid prototyping and production-ready workflows.
For lightweight and browser-based access, users can try models on deepseek.com/chat, Hugging Face Spaces, or through ready-to-run notebooks on Google Colab and Kaggle.
If you wish to explore these deployment options, see the Appendix toward the end of the book.
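For example, once a distilled model has been pulled locally with Ollama, it can be queried from Python through Ollama’s local REST API. The endpoint and payload below follow Ollama’s documented API, though you should confirm the exact model tag (`deepseek-r1:7b` here is an assumption) against the Ollama library:

```python
import json
import urllib.request

# Assumes `ollama pull deepseek-r1:7b` has been run and the Ollama
# server is listening on its default port (11434).
payload = {
    "model": "deepseek-r1:7b",
    "prompt": "Explain mixture-of-experts routing in two sentences.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```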
DeepSeek also integrates with popular orchestration frameworks such as LangChain, Haystack, and LlamaIndex for retrieval-augmented generation (RAG), as well as no-code tools such as Turing, Trae, and Windsurf for structured agent workflows.
DeepSeek’s upcoming roadmap outlines an ambitious expansion of its model capabilities and deployment strategies. Planned developments include multimodal training that incorporates image and audio understanding, as well as the creation of autonomous agents equipped with memory, planning, and tool-use functions for complex workflows. The team is also working on smaller, task-specific variants tailored to domains such as medical question answering, legal analysis, and STEM education. In addition, DeepSeek is investing in privacy-focused deployments through federated training methods and secure, on-premises LLMs. Following the success of the R1 series, anticipation is building for a potential DeepSeek-R2 release in late 2025. While unconfirmed, early reports suggest it may offer structured reasoning capabilities that rival GPT-5, while maintaining a fully open source framework.
But with all the hype comes skepticism, too. The next section will examine how DeepSeek’s choices have influenced the broader AI ecosystem, including pricing models, competition, and global policy dynamics. We will also talk about some concerns and risks that are being commonly discussed in the community.
The release of DeepSeek-R1 wasn’t just a technical milestone; it was a strategic turning point for the global AI industry. Its unique combination of open source accessibility and top-tier performance sent shockwaves through research labs, start-ups, and policy circles alike. The implications have spanned economic competition, academic acceleration, ethical discourse, and geopolitical realignment.
Before DeepSeek-R1, frontier-level LLMs were typically expensive, restricted to API-only access, or bound by licenses that limited commercial use. The release of DeepSeek-R1, with freely available model weights, inference code, and permissive licensing, marked a significant shift, making high-performing language models far more accessible.
DeepSeek-R1 made a notable impact on the AI landscape by introducing free, full access to a high-performing reasoning model. It demonstrated competitive accuracy across domains such as mathematics, programming, and formal logic – areas traditionally dominated by proprietary systems. By making these capabilities openly available, DeepSeek offered a viable alternative to closed source platforms for both researchers and commercial developers.
Perhaps the most significant consequence of this release was the market pressure it generated. By lowering the economic barrier to advanced AI experimentation, DeepSeek-R1 challenged prevailing assumptions about access and affordability in the field. Its open availability pushed several proprietary labs to accelerate their own open source strategies in response. Moreover, the move sparked broader conversations about AI equity, competition, and global governance, highlighting the growing tension between innovation, accessibility, and responsible deployment. Its open release created ripples far beyond China, inspiring labs in Europe, India, and even Silicon Valley start-ups to explore more transparent models.
DeepSeek-R1’s release triggered a notable shift in AI pricing and market positioning. In response, OpenAI introduced more affordable options, such as discounted access to GPT-3.5 Turbo, while keeping premium GPT-4.1 plans priced at around USD 200 per month for those needing top-tier performance. Anthropic and Cohere also reacted by launching smaller, lower-cost chat models aimed at maintaining their appeal to budget-conscious users. Meanwhile, Meta reaffirmed its commitment to its LLaMA roadmap, signaling expanded licensing options and underscoring a pivot toward broader accessibility in model deployment.
Open source suddenly wasn’t a niche, with DeepSeek becoming a serious economic threat to closed-model business models. Enterprises, particularly cost-sensitive ones, began evaluating DeepSeek for customer support bots, embedded agents, and enterprise knowledge bases, replacing more expensive APIs.
DeepSeek’s open release emboldened the global open source movement. Developers and researchers who had previously felt locked out of meaningful LLM contributions found new momentum. DeepSeek demonstrated that you didn’t need a billion-dollar infrastructure to create something truly impactful.
This led to the following:
Academic labs fine-tuning DeepSeek for specialized domains (biomedicine, law, and STEM education).
Start-ups building SaaS tools using DeepSeek as a backend.
Governments investigating use cases for public-sector LLM deployment.
Open source development surged globally following DeepSeek-R1’s release, with new models emerging in regions such as India, South Korea, the EU, and Latin America. On Hugging Face, one notable Indian-based project, Deepdive404/Deepseek-fork, released distilled versions of R1 across multiple parameter sizes: 1.5B, 7B, 8B, 14B, 32B, and 70B. You can find it at https://huggingface.co/Deepdive404/Deepseek-fork. Meanwhile, the original deepseek-ai/DeepSeek-R1 repository provides the core R1 and R1-Zero models at https://huggingface.co/deepseek-ai/DeepSeek-R1, along with its distilled variants.
These community-driven forks incorporate local languages, regional usage patterns, and various export formats, evidenced by quantized and optimized versions such as gghfez/DeepSeek-R1-11446-Q2_K at https://huggingface.co/gghfez/DeepSeek-R1-11446-Q2_K, which is tailored for efficient GPU inference. Within just months, Hugging Face recorded hundreds of forks and integrations inspired by DeepSeek, spotlighting localized models, quantization, and community-led improvements.
The model also inspired collaboration across borders. Researchers began publishing cross-lab benchmark studies using DeepSeek as a baseline, and community-maintained evaluation leaderboards gave the model credibility well beyond its original launch hype (https://artificialanalysis.ai/models/deepseek-r1, https://www.statista.com/statistics/1552824/deepseek-performance-of-deepseek-r1-compared-to-open-ai-by-benchmark/, and https://pubmed.ncbi.nlm.nih.gov/40267969/).
DeepSeek-R1 marked a milestone as one of the first open source models from China to achieve competitive performance on globally recognized benchmarks. Its strong results in reasoning-intensive tasks and rapid international adoption challenged the perception of Western dominance in frontier AI. The model’s reception highlighted the growing importance of open collaboration and cross-border evaluation in legitimizing AI innovation worldwide.
Benchmarks from independent labs confirmed that DeepSeek-R1-0528 was rivaling OpenAI’s o3 and Google’s Gemini 2.5 Pro. For an open source model, this was unprecedented.
It proved two critical things:
DeepSeek was not a one-hit wonder – it was a growing ecosystem.
Open source development could keep pace with proprietary labs when supported by community collaboration and smart engineering.
The release of DeepSeek-R1-0528 on May 28, 2025, delivered enhanced performance in mathematical reasoning, code generation, and factual retrieval. It also demonstrated reduced hallucination rates in factual question-answering benchmarks and introduced improved long-context handling, now supporting input lengths of up to 32,000 tokens – thus making it more effective for complex, multi-turn tasks and extended document analysis, and reaffirming continuous model improvement with this new release.
The release of DeepSeek-R1 has also prompted critical questions and ongoing debates. Let’s talk about some of the controversies surrounding DeepSeek.
As with any major development in AI, DeepSeek has not emerged without controversy. While it is widely celebrated for its technical sophistication, open source stance, and trailblazing approach to reasoning, its ascent has stirred debate in areas ranging from research ethics and safety to geopolitical strategy and intellectual property. This section explores the multifaceted controversies that have accompanied DeepSeek’s rise, acknowledging the tension between technological progress and responsible innovation.
