Qwen3-235B-A22B-Thinking-2507: A Major Upgrade to Open-Source Reasoning Models

2025-07-25

Qwen3-235B-A22B-Thinking-2507 is a significant upgrade to open-source large language models, with marked advances in reasoning capability. It achieves state-of-the-art results among open-source models on logical reasoning, mathematics, science, coding, and academic benchmarks, demonstrating strong performance across complex tasks. The model also shows improved general capabilities, including instruction following, tool usage, text generation, and alignment with human preferences, along with enhanced 256K long-context understanding. Note that this release supports only thinking mode, which is enabled by default, and is recommended for highly complex reasoning tasks.
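
A minimal generation sketch with Hugging Face transformers, assuming the checkpoint lives at Qwen/Qwen3-235B-A22B-Thinking-2507 and follows the standard Qwen chat template (both reasonable but unconfirmed by this note):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID; the announcement does not spell out the exact path.
model_id = "Qwen/Qwen3-235B-A22B-Thinking-2507"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard the 235B MoE across available GPUs
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
# Thinking mode is the default in this release, so no extra flag is needed.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```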

Read more

SmolLM3: A Tiny, Multilingual, Long-Context Reasoner

2025-07-09

SmolLM3 is a fully open 3B-parameter multilingual language model that strikes a compelling balance between efficiency and performance. It outperforms Llama-3.2-3B and Qwen2.5-3B on various benchmarks and competes with larger 4B-parameter models. Supporting six languages and context lengths of up to 128k tokens, SmolLM3 features a dual-mode reasoning capability (think/no_think), sketched below. Beyond the model itself, the team is releasing the complete engineering blueprint, including architecture details, data mixtures, and training methodology: a valuable resource for anyone building or studying models at this scale.
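
A sketch of the dual-mode toggle, assuming the checkpoint is published as HuggingFaceTB/SmolLM3-3B and that the chat template reads /think and /no_think flags from the system prompt (both taken as assumptions here):

```python
from transformers import pipeline

# Assumed Hub ID for the 3B release.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM3-3B")

# Assumption: the chat template switches reasoning modes via /think and
# /no_think flags placed in the system prompt.
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Summarize the Chinchilla scaling law in two sentences."},
]
print(generator(messages, max_new_tokens=256)[0]["generated_text"])
```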

Read more

Nanonets-OCR-s: Beyond Traditional OCR with Intelligent Document Processing

2025-06-16

Nanonets-OCR-s is a state-of-the-art image-to-markdown OCR model that goes well beyond traditional text extraction. It transforms documents into structured markdown with intelligent content recognition and semantic tagging, making the output ideal for downstream processing by large language models (LLMs). Key features include LaTeX equation recognition, intelligent image description, signature detection, watermark extraction, smart checkbox handling, and complex table extraction. The model can be run via transformers, vLLM, or docext, as sketched below.
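
A hedged sketch of the transformers path; the nanonets/Nanonets-OCR-s Hub ID and the image-text-to-text interface are assumptions based on the model's description, not spelled out in this note:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

# Assumed Hub ID and model class for this release.
model_id = "nanonets/Nanonets-OCR-s"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

image = Image.open("invoice.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this document to structured markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=4096)
print(processor.decode(outputs[0], skip_special_tokens=True))
```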

Read more

Penny-1.7B: A 19th-Century Irish Prose Style Language Model

2025-06-02

Penny-1.7B is a 1.7-billion-parameter causal language model fine-tuned with Group Relative Policy Optimization (GRPO) to mimic the 19th-century prose style of the 1840 Irish Penny Journal. During training, a reward model that distinguishes original journal text from modern translations steers generations toward maximal authenticity. It is well suited to creative writing, educational content, or stylistic pastiche in Victorian-era Irish English, but is not recommended for tasks that require contemporary factual accuracy.
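
A minimal generation sketch; the repository path below is a placeholder, since the note does not give the exact Hub ID:

```python
from transformers import pipeline

# Placeholder Hub ID: the announcement does not state the exact repository.
generator = pipeline("text-generation", model="your-org/Penny-1.7B")

prompt = "On the banks of the Liffey there stood, in days not long gone by,"
out = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.8)
print(out[0]["generated_text"])
```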

Read more

Hugging Face Hosts New 685B Parameter DeepSeek LLM

2025-05-28

A new large language model, DeepSeek-R1-0528, with a massive 685 billion parameters, has been released on Hugging Face. The model ships in Safetensors format with BF16, F8_E4M3, and F32 tensor types. As of release, no inference providers have deployed the model, but its Hugging Face page provides the model card, files, and version history.

Read more

Hugging Face Launches Free MCP Course: Your Gateway to Model Context Protocol

2025-05-21

Hugging Face has launched a free Model Context Protocol (MCP) course designed to take learners from beginner to expert. The course covers MCP theory, design, and practice, including building applications with established MCP SDKs and frameworks. Participants can earn a certificate of completion by finishing assignments and can compete in challenges. The curriculum also includes units built with Hugging Face partners, giving access to the latest MCP implementations and tools. Prerequisites are a basic understanding of AI and LLMs, familiarity with software development principles and APIs, and experience with at least one programming language (examples are provided in Python and TypeScript).
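
For a flavor of what the Python track builds, here is a minimal MCP server sketch using the FastMCP helper from the official MCP Python SDK; the API names follow the SDK's documentation but may evolve:

```python
# Minimal MCP server sketch; install with: pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """Return a personalized greeting."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```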

Read more

Critical Analysis: The Case Against Fully Autonomous AI Agents

2025-02-08

This critique examines a paper arguing against the development of fully autonomous AI agents. The paper is structured and rigorous, and it highlights real risks such as safety hazards and privacy breaches, but it suffers from an overly absolute stance, a vague definition of 'fully autonomous', an unbalanced risk-benefit analysis, and insufficient exploration of mitigation strategies; it also shows hints of technological determinism. Suggested improvements include softening the absolute rejection, clarifying the definition of autonomy, balancing the analysis, developing mitigation strategies, and strengthening the empirical basis. Ultimately, the paper is a valuable contribution to the ongoing AI ethics debate, but not a definitive conclusion.

Read more

Open-R1: Open-Source Reproduction of DeepSeek-R1 Reasoning Model

2025-01-28

DeepSeek-R1's impressive reasoning capabilities have captivated the AI community, but its training details remain undisclosed. The Open-R1 project aims to reproduce DeepSeek-R1 fully in the open, including its datasets and training pipeline. The plan involves distilling a high-quality reasoning dataset from DeepSeek-R1, replicating its pure reinforcement learning training process, and exploring multi-stage training methods. The ultimate goal is a transparent, reproducible reasoning model that drives advancement within the open-source community.

Read more

Janus-Pro-7B: A Unified Multimodal Understanding and Generation Model

2025-01-27

DeepSeek introduces Janus-Pro-7B, an autoregressive framework that unifies multimodal understanding and generation. Unlike previous approaches, Janus-Pro decouples visual encoding into separate pathways for understanding and for generation while processing both within a single transformer architecture. This decoupling resolves the conflict between the visual encoder's two roles and makes the framework more flexible. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. Its simplicity, flexibility, and effectiveness make it a strong contender among next-generation unified multimodal models.

Read more

DeepSeek-R1: A Reasoning Model Trained via Reinforcement Learning and its Distilled Versions

2025-01-20

DeepSeek has released its first-generation reasoning models. DeepSeek-R1-Zero was trained via large-scale reinforcement learning without supervised fine-tuning; DeepSeek-R1 builds on it by incorporating cold-start data before RL, which fixes R1-Zero's endless repetition and poor readability. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across various benchmarks. DeepSeek has also open-sourced six distilled models based on Llama and Qwen; DeepSeek-R1-Distill-Qwen-32B surpasses OpenAI-o1-mini on multiple benchmarks, setting new state-of-the-art results for distilled models. These models, along with a user-friendly API and chat interface, are available on Hugging Face.

Read more

400x Faster Static Embedding Models with Sentence Transformers

2025-01-15

This blog post introduces a method to train static embedding models that are 100x to 400x faster on CPU than state-of-the-art embedding models, while maintaining most of the quality. This unlocks exciting use cases like on-device and in-browser execution. Two highly efficient models are presented: sentence-transformers/static-retrieval-mrl-en-v1 for English retrieval and sentence-transformers/static-similarity-mrl-multilingual-v1 for multilingual similarity. These models achieve at least 85% of the performance of counterparts like all-mpnet-base-v2 and multilingual-e5-small, while being significantly faster on CPU.
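
Usage follows the standard Sentence Transformers API; a minimal retrieval sketch with the English model named above:

```python
from sentence_transformers import SentenceTransformer

# Static embeddings run comfortably on CPU; no GPU required.
model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1", device="cpu"
)

queries = ["What is the capital of France?"]
docs = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
]

q_emb = model.encode(queries)
d_emb = model.encode(docs)

# Similarity matrix: one row per query, one column per document;
# higher scores mean better matches.
print(model.similarity(q_emb, d_emb))
```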

Read more

ModernBERT: A Revolutionary BERT Replacement

2024-12-19

Answer.AI and LightOn introduce ModernBERT, a family of state-of-the-art encoder-only models that outperform BERT in both speed and accuracy. ModernBERT incorporates numerous advancements from recent LLM research, boasting an extended context length (8192 tokens), faster processing, and superior performance across various benchmarks. Its particularly strong code retrieval capabilities unlock new applications like large-scale code search and enhanced IDE features. ModernBERT is a drop-in replacement for BERT models and is available on Hugging Face.
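
Because it is a drop-in encoder replacement, the familiar fill-mask pipeline works unchanged; the answerdotai/ModernBERT-base Hub ID below is assumed to be the base-size checkpoint:

```python
from transformers import pipeline

# Assumed Hub ID for the base-size checkpoint.
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

for pred in fill("The capital of France is [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```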

Read more

Hugging Face Spaces Launches ZeroGPU: Dynamic GPU Allocation for Enhanced AI Model Efficiency

2024-12-15

Hugging Face Spaces has introduced ZeroGPU, a shared infrastructure that dynamically allocates NVIDIA A100 GPUs to optimize GPU usage for AI models and demos. ZeroGPU offers free GPU access, multi-GPU support, and lowers the barrier to entry for deploying AI models. Users simply select ZeroGPU hardware when creating a Gradio Space and use the `@spaces.GPU` decorator for GPU-dependent functions. ZeroGPU is compatible with PyTorch and optimized for Hugging Face's transformers and diffusers libraries, but currently only works with the Gradio SDK. Personal accounts (PRO users) can create up to 10 ZeroGPU Spaces, while organization accounts (Enterprise Hub) can create up to 50.
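
In practice the pattern looks like the following sketch; the SDXL checkpoint is illustrative, and only the `@spaces.GPU` decorator is specific to ZeroGPU:

```python
import gradio as gr
import spaces  # provided inside Hugging Face Spaces
import torch
from diffusers import DiffusionPipeline

# Loaded once at startup; ZeroGPU attaches an A100 only while a
# decorated function is executing.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.to("cuda")

@spaces.GPU  # request a GPU for the duration of this call
def generate(prompt: str):
    return pipe(prompt).images[0]

demo = gr.Interface(fn=generate, inputs=gr.Textbox(label="Prompt"), outputs=gr.Image())
demo.launch()
```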

Read more