DeepSeek
DeepSeek (Shēndù Qiúsuǒ in Chinese) is an artificial intelligence chatbot application launched on January 10, 2025 by the Chinese company DeepSeek, which specializes in dialogue
The chatbot is a language model fine-tuned with both supervised and reinforcement learning techniques
It is built on DeepSeek's family of models: DeepSeek LLM, DeepSeek-V2, DeepSeek-V3 and DeepSeek-R1
Background
In February 2016, High-Flyer was co-founded by AI enthusiast Liang Wenfeng, who had been trading since the 2007-2008 financial crisis while attending Zhejiang University
In 2019, he established High-Flyer as a hedge fund focused on developing and using AI trading algorithms
By 2021, High-Flyer was using AI exclusively in its trading
According to estimates by 36Kr, Liang had accumulated a stockpile of more than 10,000 Nvidia A100 chips before the United States government imposed restrictions on AI chip exports to China
Dylan Patel of the research firm SemiAnalysis estimated that DeepSeek had at least 50,000 chips
In April 2023, High-Flyer launched an artificial general intelligence laboratory dedicated to researching the development of AI tools independent of High-Flyer's financial business
In May 2023, with High-Flyer as one of the investors, the laboratory became its own company, DeepSeek
Venture capital firms were reluctant to provide funding, since it was unlikely that they could achieve an exit (a return on their investment) in a short period of time
After launching DeepSeek-V2 in May 2024, which offered strong performance at a low price, DeepSeek became known as the catalyst of the price war among China's AI models
It was quickly called the "Pinduoduo of AI", and other major technology giants such as ByteDance, Tencent, Baidu and Alibaba began cutting the prices of their AI models to compete with the company
Despite DeepSeek's low prices, it was profitable, while its rivals were losing money
So far, DeepSeek has focused solely on research and has no detailed commercialization plans
DeepSeek's hiring preferences favor technical skills over work experience when recruiting, so most of its new employees are recent university graduates or developers whose AI careers are less established
Versions
DeepSeek LLM
On November 2, 2023, DeepSeek presented its first model, DeepSeek Coder, available free of charge to both researchers and commercial users
The model's code was made open source under the MIT license, with an additional license agreement governing "open and responsible downstream use" of the model itself
On November 29, 2023, DeepSeek launched DeepSeek LLM, which scaled up to 67 billion parameters
It was developed to compete with other LLMs available at the time, with performance close to that of GPT-4
However, it faced challenges in computational efficiency and scalability
A chatbot version of the model, called DeepSeek Chat, was also launched
DeepSeek-V2
In May 2024, DeepSeek-V2 was launched
The Financial Times reported that it was cheaper than its peers, priced at 2 RMB per million output tokens
The University of Waterloo's TIGER Lab ranked DeepSeek-V2 seventh on its LLM leaderboard
DeepSeek-V3
In December 2024, DeepSeek-V3 was launched
It has 671 billion parameters and was trained in around 55 days at a cost of 5.58 million dollars, using significantly fewer resources than its peers
It was trained on a dataset of 14.8 trillion tokens
Benchmark tests showed that it outperformed Llama 3.1 and Qwen 2.5 and matched GPT-4o and Claude 3.5 Sonnet
DeepSeek's optimization under limited resources highlighted the potential limits of US sanctions on China's AI development
An opinion piece in The Hill described the launch as American AI arriving at its "Sputnik moment"
The model is a mixture-of-experts transformer with multi-head latent attention, containing 256 routed experts and 1 shared expert per MoE layer; each token activates 37 billion parameters
On January 27, 2025, the AI assistant of the Chinese startup DeepSeek surpassed ChatGPT as the top-rated free app on the US App Store
This has sparked debate about the effectiveness of US export restrictions on advanced artificial intelligence chips to China
The DeepSeek-V3 model, which was trained using Nvidia's H800 chips, is gaining recognition for its competitive performance, challenging the global dominance of US AI models
DeepSeek R1
In November 2024, DeepSeek R1-Lite-Preview was launched, trained for logical inference, mathematical reasoning and real-time problem solving
DeepSeek said that it exceeded the performance of OpenAI o1 on benchmarks such as the American Invitational Mathematics Examination (AIME) and MATH
However, The Wall Street Journal reported that when it used 15 problems from the 2024 edition of AIME, the o1 model reached solutions faster than DeepSeek R1-Lite-Preview
On January 20, 2025, DeepSeek-R1 and DeepSeek-R1-Zero were launched
They were based on V3-Base
Like V3, each is a mixture of experts with 671B total parameters and 37B activated parameters
They also released several "DeepSeek-R1-Distill" models, which are not based on R1 itself
Instead, they are other open-weight models such as LLaMA and Qwen, fine-tuned on synthetic data generated by R1
R1-Zero was trained exclusively through reinforcement learning (RL), without any supervised fine-tuning (SFT)
It was trained using group relative policy optimization (GRPO), which estimates the baseline from the group's scores instead of using a critic model
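As a rough sketch of the idea (the notation below is ours, not taken from this article): for each question, a group of G answers is sampled and each answer's reward is normalized against the rest of the group, so no separate critic (value) model is needed:

$$\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}$$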
The reward system used is rule-based and consists mainly of two types of rewards: accuracy rewards and format rewards
R1-Zero's outputs are often hard to read and switch between English and Chinese, so R1 was trained to address these problems and further improve reasoning
Concerns
Censorship
Some sources have observed that the official API version of R1 uses censorship mechanisms for topics considered politically sensitive by the government of the People's Republic of China
For example, the model refuses to answer questions about the 1989 Tiananmen Square protests, the persecution of the Uyghurs, or human rights in the People's Republic of China
The AI may initially generate an answer, but shortly afterwards it deletes it and replaces it with a message such as:
Sorry, that's beyond my current scope. Let's talk about something else
The built-in censorship mechanisms and restrictions can only be removed in the open-source version of the R1 model
If the core socialist values defined by the Chinese internet regulatory authorities are touched upon, or the political status of Taiwan is raised, the discussion is terminated
When it was tested by NBC News, DeepSeek's R1 described Taiwan as an inalienable part of China's territory and declared:
We firmly oppose any form of "Taiwan independence" separatist activities and are committed to achieving the complete reunification of the motherland through peaceful means
In January 2025, Western researchers were able to trick DeepSeek into giving accurate answers to some of these topics by rephrasing the question
Security and privacy
There is also concern that the AI system could be used for foreign influence operations, disinformation, surveillance and the development of cyberweapons on behalf of the government of the People's Republic of China
DeepSeek's privacy terms and conditions state the following:
We store the information we collect on secure servers located in the People's Republic of China ... We may collect your text or audio input, prompts, uploaded files, feedback, chat history or other content that you provide to our model and services
Although this data collection and storage policy is consistent with ChatGPT's privacy policy, a press article reports that it represents a security concern
In response, the Italian Data Protection Authority requested additional information on DeepSeek's collection and use of personal data, and the United States National Security Council announced that it had initiated a national security review
However, when DeepSeek is run locally, the data is not sent to external servers
Using DeepSeek
Account creation
Before you can use DeepSeek, you need to have a registered account in the DeepSeek system
For this we will use the following registration link
We will enter our email address and a password, or use our Google account
We will then go to our email account, which must be valid and real
And we will confirm the account registration by clicking on the verification email that they will send us
We will continue by entering the rest of the user data requested in the form
And we can start using the DeepSeek chat prompt whenever we log in with the username and password of the account we have created
Installation at home
Apart from the official DeepSeek API, we can also install the model locally on our device; for this we will use Ollama, a client for artificial intelligence models
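For reference, before turning to the local installation, a call to the hosted API can be sketched with curl; the endpoint, model name and authentication header below are assumptions based on DeepSeek's OpenAI-compatible API and should be checked against the official documentation:

```bash
# Hypothetical sketch of a request to the hosted DeepSeek API (requires an account and an API key)
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{"model": "deepseek-chat",
       "messages": [{"role": "user", "content": "Summarize what DeepSeek-V3 is in two sentences."}]}'
```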
Ollama
Ollama is a program that you can install on any computer, whether it runs Windows, macOS or GNU/Linux
It is a client for artificial intelligence models, so it is the base on which you then install the AI you want to use
Ollama has two particularities:
- It allows you to use an AI locally
This means that instead of going to a company's AI chat page, the model is installed on your computer and you use it directly, without visiting any website
That benefits us in the following ways:
  - The data from everything you do stays on your PC, so no company can use it
  - You can use the AI without an Internet connection
  - You can bypass the censorship that a model may have when you use it through a website
  - However, what the model cannot do is search the Internet to supplement its information
- It works through your computer's terminal (the Command Prompt on Windows, a shell on macOS or GNU/Linux)
This means you do not need to use a separate application
Once you install Ollama, you will use your device's console to install and run the model you want; you type your questions and prompts in the console, where you will also receive the answers
Installing Ollama in Windows
It is as simple as going to the Ollama website and clicking on the Download button
Now you must choose the operating system on which you want to install it (for Windows, the minimum version is Windows 10)
Once chosen, click on the Download button
By default the website will show the system you are using, but you can download the installer for any other
Once downloaded, launch the installer
Installing Ollama is very simple: just click Next on the welcome screen and then click Install on the installation screen
Once you have installed Ollama, launch the application
You will see that apparently nothing happens (only an icon appears in the taskbar); this is because you have to open your computer's terminal (with administrator privileges), which in Windows is called the Command Prompt
Now, before you start, you have to go to the Ollama models website, where you will see all the available AI models
Since we want to use DeepSeek, look it up there and you will get all the available variants of that model
Choose carefully according to your machine's RAM and available hard disk space, using the model information to guide you
For the examples I am going to use deepseek-coder-v2, since my machine only has 12 GB of RAM and the model takes up 8.9 GB on the hard drive
Once you have chosen, look in the model information for a box on the right-hand side with a button that lets you copy the command, since we will use it in the Command Prompt
In my case:
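The copied command will look something like the following (a sketch for the deepseek-coder-v2 tag mentioned above; use whichever tag you chose):

```bash
# Downloads the model the first time, then opens an interactive prompt
ollama run deepseek-coder-v2
```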
I simply paste it into the Command Prompt and wait for the DeepSeek model to be downloaded and installed (only the first time) and for the cursor to enter prompt mode, at which point the model answers for the first time
The next time you want to use the model, you must run the command again, but it will take much less time because the model is already installed
Installing Ollama on Android
Before starting, our Android device needs to meet the following prerequisites:
- At least 4.5 GB RAM
- A stable Internet connection to download Termux, Ollama and the DeepSeek model
- Android >= 7
In addition to Ollama, we are going to use the terminal emulator Termux
To install it, we go to the Termux development website and choose the latest stable APK version for our version of Android
It can also be found on the Play Store, but the version there does not match, or is older than, the one available on the Termux development website
We install the APK file on our Android device
We open Termux to access the terminal
Once inside the terminal, we need to grant Termux access to the device's storage
To do this we will execute:
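A minimal sketch of the usual command, using Termux's standard storage-permission helper:

```bash
termux-setup-storage
```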
To keep Termux and its packages up to date we will run:
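For example, with the standard Termux package manager commands:

```bash
pkg update && pkg upgrade -y
```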
We wait for the update process to end
Now we will install Ollama by executing the following command:
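A sketch, assuming the ollama package available in the Termux repositories (if it is not available for your device, Ollama can also be compiled from source inside Termux):

```bash
pkg install ollama
```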
Now we will start Ollama with the following command:
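For example, starting the Ollama server in the background so that later ollama run commands can connect to it:

```bash
ollama serve &
```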
As in the Windows installation, before running anything you have to go to the Ollama models website, where you will see all the available AI models
Since we want to use DeepSeek, look it up there and you will get all the available variants of that model
Choose one according to your device's RAM and available storage, using the model information to guide you, and copy the corresponding command
Paste the command into the Termux terminal and wait for the model to be downloaded and installed (only the first time) and for the prompt to appear, at which point the model answers
The next time you want to use the model, run the same command again; it will start much faster because the model is already installed
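For example, in the same Termux session where the server is running, a model can then be launched; the tag below is an assumption (a small distilled tag such as deepseek-r1:1.5b is usually a better fit for a phone than the larger desktop tags; check the Ollama model page for what is actually available):

```bash
# Downloads the model on first run, then opens an interactive prompt
ollama run deepseek-r1:1.5b
```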
Privacy
We have to be especially careful when we use DeepSeek
By default, our conversations are stored in a history and can be used to continue training DeepSeek
If we do not want our data to be used for training, there is an option in our account settings to disable the use of our data to train DeepSeek (and to disable the conversation history)
Prompts
DeepSeek is trained to follow and execute the instructions that we provide
Our instructions are called prompts
They can be as simple or complex as we want, and can include additional information
For example, an example text, an image, a link to a web page...
We can "talk" to DeepSeek interactively
For example, asking it to complete or correct its previous answer
That means we can ask it something, and then refer either to our previous question or to its answer:
For example, we ask it to make its answer a little more serious
Effective prompts
DeepSeek interprets our instructions quite literally, so it is advisable to give it all the information it needs to complete its tasks according to our expectations
Overall, a good prompt should include:
- Role: the role DeepSeek should play (expert in ..., assistant to ...)
- Context: the situation surrounding the text we need to generate
- Instructions/tasks: what we need it to do for us
- Format/style: whether we want a formal letter, a more modern or aggressive style... or whether we need the response formatted in JSON, for example
You can even ask DeepSeek itself to give you more advice
Prompts are usually written in English, but remember that you can ask DeepSeek to translate them into your language
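Putting these elements together, a prompt sent to the locally installed model might look like this (a sketch; the model tag and the scenario are just examples):

```bash
ollama run deepseek-coder-v2 "Role: you are a travel assistant.
Context: a family with two children is visiting Logroño for a weekend.
Task: propose a two-day itinerary with one main activity per half day.
Format: return the answer as a JSON array of objects with the fields day, slot and activity."
```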
Examples
Fairy tale
Let's ask it to write a fairy tale with a happy ending
We then ask it to change the ending of the story to a sadder one
We ask it to add a moral to the story
Finally, we ask it to rewrite the story with Hansel and Gretel, who encounter a dragon and a fairy godmother who gives them advice on how to defeat it
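As a sketch, the same sequence can be reproduced in an interactive Ollama session (the model tag is the one used earlier in this tutorial; the prompts are typed one after another at the session prompt):

```bash
ollama run deepseek-coder-v2
# Then type, one at a time:
#   Write a short fairy tale with a happy ending.
#   Change the ending of the story to a sadder one.
#   Add a moral to the story.
#   Rewrite the story with Hansel and Gretel, who meet a dragon and a fairy godmother who advises them on how to defeat it.
```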
Letter
We will simulate that we are an employee of the General Directorate of Public Works using DeepSeek to help with their day-to-day work
We ask it to generate a letter informing a user that pipeline works that will cross their property are going to be carried out
We then ask it to generate a second letter informing the user that their request to halt the works has been rejected
What can you do
There are many things that DeepSeek can do very well; you just have to ask for them
Below we present some examples of applications:
Creative tasks
- Generate fictional stories
- Generate technical documentation (if we provide it with enough information)
- Texts for project proposals
- Reports
- Letters
- Brainstorming: Product names, titles of works...
Training
- Create text summaries
- Generate activities and multiple-choice exercises
- Plan classes
- Generate agendas
- Code
Proofreading
- Review texts to correct grammar and spelling
- Change style:
- Depending on the audience (for a high school student, for a scientist...)
- Depending on the role of DeepSeek ("Write in the style and vocabulary of a university literature professor / of a high school student...")
Translation
- DeepSeek has been trained on a corpus that includes a large number of languages, and we can ask it to translate from and to them
After making a translation, it is a good idea to ask DeepSeek to review the text, correct overly literal translations and adjust the style and language to our audience
- It also knows a large number of programming languages, and we can ask it to translate code from one language to another
- We can also ask it to convert file formats (for example, data from CSV to JSON), as in the sketch below
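For example, a format-conversion request to the locally installed model might look like this (a sketch; the CSV snippet is made up):

```bash
ollama run deepseek-coder-v2 "Convert this CSV into a JSON array of objects:
name,age
Ana,34
Luis,28"
```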
Code
- Code generation following instructions
We can specify whether we need type annotations (for example, in Python) or unit tests (see the sketch after this list)
- Code validation
- Function Explanation
- Refactoring: using another library, changing variable names...
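A sketch of such a code-generation request to the locally installed model (the task and requirements are just an example):

```bash
ollama run deepseek-coder-v2 "Write a Python function that returns the n-th Fibonacci number.
Include type annotations and a couple of pytest unit tests."
```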
Reasoning
We can pose problems, challenges and complex questions to DeepSeek based on well-specified assumptions and facts
To solve them, the following prompting techniques can be used:
- IO (Direct Input/Output): we pose a problem and request the answer
- IO with refinement: we pose a problem, request the answer, and then ask it to improve that answer
- CoT (Chain of Thought): we pose a problem and ask it to explain step by step how it arrived at the solution
- CoT-SC (Chain of Thought – Self-Consistency): we apply Chain of Thought several times and select the most consistent answer (the one repeated the most times)
- Tree of Thoughts: we generate a prompt that allows DeepSeek to explore different avenues of thought critically, until it finds a satisfactory solution
At the following link you can find these techniques described in more detail, along with some more complex ones
IO (Direct Input/Output)
It is the most basic method: it consists of asking DeepSeek directly for the answer to our problem
It works correctly with simple problems, but will fail on complex ones
However, with its latest training DeepSeek has learned to reason in steps even when we do not explicitly ask it to, and on many occasions it will generate the correct answer without additional help
IO with refinement
A method that provides good results: we ask DeepSeek to answer our problem
Then, in successive prompts, we ask DeepSeek to review and improve its answer
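A sketch of this technique as an interactive session (the question is a made-up example; the refinement request is typed after the first answer arrives):

```bash
ollama run deepseek-coder-v2
# First prompt:  Explain in two sentences what a hash table is.
# Second prompt: Improve your previous answer: mention collisions and the average lookup cost.
```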
CoT (Chain of Thought)
We can explicitly ask DeepSeek to reason through each stage of the process, or show it an example with that reasoning for it to imitate
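For example, a chain-of-thought request to the locally installed model (a sketch; the problem is made up):

```bash
ollama run deepseek-coder-v2 "A train leaves at 9:40 and arrives at 13:25.
How long does the trip take?
Explain your reasoning step by step before giving the final answer."
```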
CoT-SC (Chain of Thought – Self-Consistency)
We perform chain-of-thought reasoning several times, and then select the most repeated answer (the most consistent across the different runs)
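A minimal sketch of self-consistency with the locally installed model: run the same chain-of-thought prompt several times and keep the most frequent final answer (the prompt and the model tag are just examples):

```bash
PROMPT="A train leaves at 9:40 and arrives at 13:25. How long does the trip take?
Think step by step, then give only the final answer on the last line."
for i in 1 2 3 4 5; do
  ollama run deepseek-coder-v2 "$PROMPT" | tail -n 1   # keep each run's final line
done | sort | uniq -c | sort -rn | head -n 1           # the most repeated answer wins
```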
Tree of Thoughts
We generate a prompt that allows DeepSeek to explore different avenues of thought critically, until it finds a satisfactory solution
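For example, a tree-of-thoughts style prompt (a sketch; the problem is made up):

```bash
ollama run deepseek-coder-v2 "Propose three different approaches to this problem,
briefly evaluate the pros and cons of each, and then develop only the most promising one:
how can a small library reorganize its catalogue with no budget?"
```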
Limitations
It is important to know DeepSeek's limitations in order to avoid "surprises" when using it
Provide complex answers if we do not give it time to reason
If we ask DeepSeek a complex question, it is better to ask for it in small steps
The output of each step serves as support for the following reasoning, and this greatly improves the results
For example, we ask it:
Now we ask it again, letting it "think":
When we have let it "think", the result is much better
Complex mathematical operations
It will return the results using LaTeX markup
For example, multiplying numbers with more than 3 digits
DeepSeek's answer will normally be close to the correct value (in this case 553,254), but it will not be exact
For example, operations with large square roots
The solution will be close to the correct value (in this case 44.988887516807970076138159027823), but it will not be exact
In seemingly simple arithmetic operations, we will begin to see errors with relatively large numbers
For example, we will probably get the correct answer for the value of 3 cubed but not for 333 cubed
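For reference, the exact values are:

$$3^3 = 27, \qquad 333^3 = 36{,}926{,}037$$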
Once it has given us an answer, if we do not correct it, it will take that result as valid, and it is quite likely that DeepSeek will reuse it in subsequent answers about the same operation
Provide or verify factual information
When we ask it for factual information, its answer does not necessarily correspond to reality
Even if it claims otherwise, it cannot verify whether the information is true or not with certainty
It also cannot give us the source of its data (this is not something the algorithm supports)
For example, we are going to ask it about the equestrian monument to Espartero, but we want information about the one in Logroño, since there are several in Spain
We insist that we want the one in Logroño
It answered that there is none, when in fact there is one; this is a case of hallucination
Although it did, politely, give us useful information about the city and its monuments
We politely ask it to list whether there are more in Spain
Access to updated information
The training corpus varies over time, but is static at any given time
We are going to try asking it, on April 18, 2025, when Akira Toriyama died
As of April 18, 2025, it could not answer me when Akira Toriyama died (which was on March 1, 2024, at age 68)
In fact, it assumes he is still alive, but very kindly tells me about his professional career
Knowing the current date and time
DeepSeek behaves erratically in this regard: sometimes it answers, sometimes it says it does not have that information, and sometimes it gives us a wrong date
We are going to try asking it, on April 18, 2025, what time it is
Access to information about DeepSeek itself
DeepSeek does not know its current version, the value of its configuration parameters, etc.
As with factual data, DeepSeek can sometimes answer us as if it knew the answer, but we cannot trust that it is true