Bits, Bytes and Neural Networks

A tech blog focusing on AI/ML paper reviews and the latest research trends.
Feed: https://bits-bytes-nn.github.io/feed.xml (generated by Jekyll, updated 2026-01-29T13:46:16+00:00)

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Published: 2025-12-02T09:25:14+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2025/12/02/deepseek-v3.2--pushing-the-frontier-of-open-large-language-models
Author: DeepSeek AI

Kimi K2: Open Agentic Intelligence
Published: 2025-07-28T05:35:43+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2025/07/28/kimi-k2--open-agentic-intelligence
Author: Moonshot AI

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Published: 2025-07-07T17:36:04+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/multimodal-learning/2025/07/07/gemini-2.5--pushing-the-frontier-with-advanced-reasoning--multimodality--long-context--and-next-generation-agentic-capabilities
Author: Google DeepMind

Qwen3 Technical Report
Published: 2025-05-14T13:41:34+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2025/05/14/qwen3-technical-report
Author: Alibaba Group

Gemma 3 Technical Report
Published: 2025-03-25T15:52:34+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/multimodal-learning/2025/03/25/gemma-3-technical-report
Author: Google DeepMind

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Published: 2025-01-22T15:19:35+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2025/01/22/deepseek-r1--incentivizing-reasoning-capability-in-llms-via-reinforcement-learning
Author: DeepSeek AI

DeepSeek-V3 Technical Report
Published: 2024-12-27T04:03:16+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/12/27/deepseek-v3-technical-report
Author: DeepSeek AI

Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Published: 2024-11-22T18:44:04+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/11/22/tulu-3--pushing-frontiers-in-open-language-model-post-training
Author: Allen Institute for AI

Pixtral 12B
Published: 2024-10-09T17:16:22+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/multimodal-learning/2024/10/09/pixtral-12b
Author: Mistral AI

Gemma 2: Improving Open Language Models at a Practical Size
Published: 2024-07-31T19:13:07+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/07/31/gemma-2--improving-open-language-models-at-a-practical-size
Author: Google DeepMind

The Llama 3 Herd of Models
Published: 2024-07-31T17:54:27+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/07/31/the-llama-3-herd-of-models
Author: Meta AI

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Published: 2024-05-07T15:56:43+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/05/07/deepseek-v2--a-strong--economical--and-efficient-mixture-of-experts-language-model
Author: DeepSeek AI

From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Published: 2024-04-24T18:38:11+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/retrieval-augmented-generation/2024/04/24/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization
Author: Microsoft Research

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Published: 2024-02-05T18:55:32+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/02/05/deepseekmath-pushing-the-limits-of-mathematical-reasoning-in-open-language-models
Author: DeepSeek AI

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Published: 2024-01-31T18:30:21+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/retrieval-augmented-generation/2024/01/31/raptor-recursive-abstractive-processing-for-tree-organized-retrieval
Author: Stanford University

DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence
Published: 2024-01-25T14:17:53+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/01/25/deepseek-coder-when-the-large-language-model-meets-programming-the-rise-of-code-intelligence
Author: DeepSeek AI

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Published: 2024-01-11T17:31:42+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/01/11/deepseekmoe-towards-ultimate-expert-specialization-in-mixture-of-experts-language-models
Author: DeepSeek AI

Mixtral of Experts
Published: 2024-01-08T18:47:34+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/01/08/mixtral-of-experts
Author: Mistral AI

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Published: 2024-01-05T18:59:13+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/language-models/2024/01/05/deepseek-llm-scaling-open-source-language-models-with-longtermism
Author: DeepSeek AI

Gemini: A Family of Highly Capable Multimodal Models
Published: 2023-12-19T02:39:27+00:00
Link: https://bits-bytes-nn.github.io/paper%20reviews/multimodal-learning/2023/12/19/gemini--a-family-of-highly-capable-multimodal-models
Author: Google DeepMind