Qodo Command Achieves Stunning 71.2% on SWE-bench Verified

2025-08-12

Qodo Command, a command-line AI coding agent, scored 71.2% on SWE-bench Verified, a leading benchmark for evaluating AI agents on real-world software engineering tasks. The score was achieved with the production version of Qodo Command, without fine-tuning or benchmark-specific adjustments. Qodo attributes the result to features such as context summarization, execution planning, retry and fallback mechanisms, and the LangGraph framework. Built to support multiple LLMs, Qodo Command currently works with Anthropic's Claude 4 to power adaptive, learning-oriented coding agents.
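The summary credits retry and fallback mechanisms but does not say how Qodo Command implements them. Below is a generic sketch of the pattern, assuming each LLM is wrapped as a plain Python callable that takes a prompt and returns text; `call_with_retry_and_fallback` and its parameters are hypothetical, not part of Qodo Command.

```python
import time
from typing import Callable, List, Sequence

ModelFn = Callable[[str], str]  # hypothetical: prompt in, completion text out


def call_with_retry_and_fallback(
    prompt: str,
    models: Sequence[ModelFn],
    max_retries: int = 2,
    base_delay: float = 0.5,
) -> str:
    """Try each model in order, retrying failures with exponential backoff."""
    errors: List[Exception] = []
    for model in models:
        for attempt in range(max_retries + 1):
            try:
                return model(prompt)
            except Exception as exc:  # real code should catch narrower error types
                errors.append(exc)
                if attempt < max_retries:
                    time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        # Every attempt on this model failed: fall back to the next model.
    last_error = errors[-1] if errors else None
    raise RuntimeError(f"all models failed ({len(errors)} attempts)") from last_error


# Usage with stand-in models: the primary always fails, so the backup answers.
def flaky_model(prompt: str) -> str:
    raise TimeoutError("rate limited")


def backup_model(prompt: str) -> str:
    return f"patch for: {prompt}"


print(call_with_retry_and_fallback("fix failing test", [flaky_model, backup_model], base_delay=0.0))
```

Backoff absorbs transient failures such as rate limits, while the ordered fallback list keeps the agent working when one provider is unavailable.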

Development

GPT-5 Excels in Qodo's Code Review Benchmark

2025-08-08

Qodo evaluated leading language models, including GPT-5, on its private PR Benchmark, which simulates real-world code review workflows. GPT-5 performed strongly at understanding code diffs, identifying bugs, and suggesting improvements, and its 'minimal' variant struck a good balance between speed and quality. GPT-5 still showed weaknesses, such as false positives and inconsistent labeling, but its overall code review performance marks significant progress in AI-assisted code review.

Development

Qodo Gen CLI: Automate Your SDLC with AI Agents

2025-06-25

Qodo Gen CLI is a command-line interface for building, managing, and running AI agents. Developers can create custom agents that automate workflows across the entire software development lifecycle (SDLC) and bring AI capabilities into any IDE. Qodo Gen CLI supports leading LLMs and flexible deployment options, and it offers both terminal and browser-based interfaces. Agents can automate tasks such as code review, documentation generation, and test coverage, freeing developers to focus on building features.

Development SDLC automation

AI Code Generation: Accuracy and Confidence are Key

2025-06-12

Only 3.8% of developers report both low hallucination rates and high confidence in shipping AI-generated code. These are the teams truly benefiting from AI in production: they trust the suggestions, ship faster, and close the loop with high-quality feedback. Developers who report low hallucinations are 1.3x more likely to see code quality gains (44% vs. 35% overall) and 2.5x more likely to be confident shipping AI code (24% vs. 9%). Within the low-hallucination group, those who are also confident (17%) form the 'sweet spot', where over half (53%) report clear improvements in code quality. This highlights the strong link between accuracy, quality, and confidence: developers who see fewer errors and higher-quality output are much more likely to trust and use AI in production. Even so, most developers with accurate output remain hesitant, and automated quality checks can help bridge this gap.

Development developer confidence

Debugging Java Logic Errors with Unit Tests

2025-05-07

Logic errors in Java are notoriously difficult to track down with traditional debugging methods. This article introduces a test-driven debugging approach that uses unit tests to discover and pinpoint logic errors. It covers several testing techniques, including hypothesis testing, state progression tests, and regression testing, and explains how to use test results to understand code behavior and correct faulty logic. The article also mentions AI-assisted unit testing tools that can help developers uncover potential logic flaws more effectively.
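A minimal sketch of the three techniques, written in Python with `unittest` for consistency with the other examples here (the article itself targets Java, where JUnit plays the same role). The `Order` class and its off-by-one bug are invented for illustration.

```python
import unittest


class Order:
    """Hypothetical order: 10% off once the quantity reaches 10 units."""

    DISCOUNT_THRESHOLD = 10

    def __init__(self, unit_price: float) -> None:
        self.unit_price = unit_price
        self.quantity = 0

    def add_units(self, count: int) -> None:
        self.quantity += count

    def total(self) -> float:
        subtotal = self.quantity * self.unit_price
        # The original logic error used `>` here, so exactly 10 units paid full price.
        if self.quantity >= self.DISCOUNT_THRESHOLD:
            subtotal *= 0.9
        return round(subtotal, 2)


class OrderTests(unittest.TestCase):
    def test_hypothesis_boundary(self):
        # Hypothesis: "the bug sits at the threshold", so probe exactly 10 units.
        order = Order(10.0)
        order.add_units(10)
        self.assertEqual(order.total(), 90.0)

    def test_state_progression(self):
        # Record the total after each step; the first mismatch shows where logic diverges.
        order = Order(10.0)
        observed = []
        for _ in range(3):
            order.add_units(5)
            observed.append(order.total())
        self.assertEqual(observed, [50.0, 90.0, 135.0])

    def test_regression_exactly_at_threshold(self):
        # Pins the fix so the off-by-one cannot silently return.
        order = Order(5.0)
        order.add_units(Order.DISCOUNT_THRESHOLD)
        self.assertEqual(order.total(), 45.0)
```

Run it with `python -m unittest <file>`. The state progression test is the one that localizes the bug: with the original `>` comparison, the second recorded total would be 100.0 instead of 90.0.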

Development Logic Errors

LangGraph: Building a Flexible, Opinionated AI Coding Assistant

2025-03-24

Qodo built an AI coding assistant on the LangGraph framework, balancing flexibility with adherence to coding best practices. Qodo initially relied on predefined workflows for coding tasks, but as more capable LLMs such as Claude 3.5 Sonnet arrived, it moved to LangGraph's graph-based approach. LangGraph supports agents ranging from completely open-ended to fully structured, deterministic flows, which lets Qodo adjust how much structure its flows impose as LLM capabilities change. The framework's clean API, reusable components, and built-in state management simplified development, and LangGraph supports persistence, checkpoints, and branch points. Documentation and testing posed some challenges, but LangGraph gave Qodo a solid foundation for a robust AI coding assistant.
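LangGraph's actual API (its `StateGraph` class, checkpointers, and so on) is richer than anything shown here. The plain-Python sketch below does not use LangGraph; it only illustrates the idea the summary describes: nodes that transform shared state, conditional routing between them, and a snapshot after each step. `MiniGraph` and the plan/code/test nodes are hypothetical.

```python
from copy import deepcopy
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Node = Callable[[State], State]            # returns an updated copy of the state
Router = Callable[[State], Optional[str]]  # picks the next node, or None to stop


class MiniGraph:
    """Toy state graph: each node updates shared state, then a router picks the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.routes: Dict[str, Router] = {}

    def add_node(self, name: str, fn: Node, route: Router) -> None:
        self.nodes[name] = fn
        self.routes[name] = route

    def run(self, start: str, state: State, max_steps: int = 20) -> List[State]:
        """Execute from `start`, returning a snapshot (checkpoint) after every node."""
        checkpoints: List[State] = []
        current: Optional[str] = start
        steps = 0
        while current is not None and steps < max_steps:
            state = self.nodes[current](state)
            checkpoints.append(deepcopy(state))  # snapshots allow resuming or branching
            current = self.routes[current](state)
            steps += 1
        return checkpoints


# Hypothetical plan -> code -> run_tests loop that retries coding until tests pass.
def plan(s: State) -> State:
    return {**s, "plan": "fix off-by-one"}


def code(s: State) -> State:
    return {**s, "attempts": int(s.get("attempts", 0)) + 1}


def run_tests(s: State) -> State:
    return {**s, "passed": int(s["attempts"]) >= 2}  # pretend the 2nd attempt passes


graph = MiniGraph()
graph.add_node("plan", plan, lambda s: "code")
graph.add_node("code", code, lambda s: "run_tests")
graph.add_node("run_tests", run_tests, lambda s: None if s["passed"] else "code")
history = graph.run("plan", {})
print(len(history), history[-1])
```

The routers are where the structured-versus-open-ended trade-off lives: a fixed router yields a deterministic flow, while a router that asks an LLM to choose the next node gives the agent more freedom.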

Development

Qodo Gen 1.0: Agentic AI Coding with LangGraph and MCP

2025-03-18

Qodo Gen 1.0 introduces agentic workflows to its AI coding and testing IDE plugin, letting the AI decide dynamically how to approach complex coding tasks. To achieve this, Qodo restructured its infrastructure around LangGraph for structured workflows and Anthropic's Model Context Protocol (MCP) for standardized integration with external tools. The architecture supports asynchronous communication, on-demand context retrieval, and improved error handling and reliability, so the AI can operate autonomously, retrieve real-time data, and adapt its strategy based on tool execution results. LangGraph provides flexibility and control, while MCP simplifies external tool integration. The result is more intelligent automation, an extensible system, and a structured approach to AI autonomy.
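MCP messages use JSON-RPC 2.0, and tools are invoked with a `tools/call` request carrying the tool `name` and its `arguments`. The sketch below builds such a request and dispatches it to a toy in-process handler rather than a real MCP server or transport; `read_file`, the `TOOLS` registry, and `handle` are hypothetical. Tool failures come back as results flagged with `isError` instead of being raised, so the agent can inspect them and adjust its strategy.

```python
import asyncio
import itertools
import json
from typing import Any, Awaitable, Callable, Dict

_request_ids = itertools.count(1)


def make_tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP-style `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


async def read_file(path: str) -> str:
    """Hypothetical tool backed by an in-memory 'repository'."""
    await asyncio.sleep(0)  # stands in for real I/O
    files = {"README.md": "# demo"}
    if path not in files:
        raise FileNotFoundError(path)
    return files[path]


TOOLS: Dict[str, Callable[..., Awaitable[str]]] = {"read_file": read_file}


async def handle(raw: str) -> Dict[str, Any]:
    """Toy stand-in for an MCP server: run the tool, report failures via isError."""
    request = json.loads(raw)
    params = request["params"]
    try:
        # Simplification: an unknown tool name also lands in the except branch here.
        text = await TOOLS[params["name"]](**params["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    except Exception as exc:
        result = {"content": [{"type": "text", "text": repr(exc)}], "isError": True}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


async def main() -> None:
    # Two calls run concurrently; the agent can branch on isError afterwards.
    found, missing = await asyncio.gather(
        handle(make_tool_call("read_file", {"path": "README.md"})),
        handle(make_tool_call("read_file", {"path": "missing.py"})),
    )
    print(found["result"]["isError"], missing["result"]["isError"])


asyncio.run(main())  # prints: False True
```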

Development

Qodo-Embed-1: A Family of Efficient, Small Code Embedding Models

2025-03-03

Qodo announced Qodo-Embed-1, a family of code embedding models that achieves state-of-the-art performance with a significantly smaller footprint than existing models. The 1.5B-parameter model scored 68.53 on the CoIR benchmark, surpassing larger 7B-parameter models. Qodo trained the models using synthetic data generation to overcome existing models' limitations in accurately retrieving code snippets, significantly improving code retrieval accuracy and efficiency. The 1.5B-parameter model is open source, while the 7B-parameter model is commercially available.
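Embedding-based code retrieval commonly ranks snippets by the cosine similarity between a query vector and each snippet's vector. This sketch assumes the vectors were already produced by an embedding model such as Qodo-Embed-1; loading and calling the model is not shown, and the three-dimensional vectors below are made up for illustration (real embeddings have far more dimensions).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank indexed snippets by similarity to the query and keep the best k."""
    scored = [(snippet, cosine(query, vec)) for snippet, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up 3-d vectors standing in for real embedding-model output.
index = {
    "def parse_csv(path): ...": [0.9, 0.1, 0.0],
    "def read_json(path): ...": [0.6, 0.5, 0.2],
    "def send_email(to, body): ...": [0.0, 0.2, 0.95],
}
query = [0.85, 0.2, 0.05]  # pretend embedding of "load records from a file"

for snippet, score in top_k(query, index):
    print(f"{score:.3f}  {snippet}")
```

A smaller embedding model helps exactly here: every indexed snippet and every query must be embedded, so per-call cost dominates at repository scale.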

Building a Robust Evaluation Framework for RAG Systems

2025-02-14

Qodo built a Retrieval Augmented Generation (RAG)-based AI coding assistant and developed a robust evaluation framework to ensure its answers are accurate and comprehensive. A key challenge was verifying the correctness of RAG outputs derived from large, private datasets. The framework evaluates both the final retrieved documents and the final generated output, focusing on 'answer correctness' and 'retrieval accuracy'. Because the outputs are natural language, Qodo adopted an 'LLM-as-judge' approach and built a ground-truth dataset of real questions, answers, and context. For efficiency, it used LLMs to help construct the dataset and used LLMs and RAGAS to evaluate answer correctness. Ultimately, Qodo built its own LLM judge, combined it with RAGAS for greater reliability, and integrated the result into its workflow as regression testing, dramatically reducing the effort needed to verify how code changes affect quality.
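A minimal sketch of such a harness, assuming a hypothetical RAG system that returns an answer plus the IDs of the documents it retrieved, and a hypothetical `judge` callable standing in for the LLM judge (RAGAS is not used here). Retrieval accuracy is approximated as recall of the ground-truth documents, and `regression_check` flags metrics that drop below a stored baseline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Example:
    """One ground-truth record: real question, reference answer, relevant context."""
    question: str
    reference_answer: str
    relevant_doc_ids: Sequence[str]


RagSystem = Callable[[str], Tuple[str, List[str]]]  # question -> (answer, retrieved ids)
Judge = Callable[[str, str, str], bool]             # (question, reference, answer) -> correct?


def evaluate(system: RagSystem, judge: Judge, dataset: List[Example]) -> Dict[str, float]:
    if not dataset:
        raise ValueError("empty evaluation dataset")
    correct = 0
    recall_total = 0.0
    for ex in dataset:
        answer, retrieved = system(ex.question)
        correct += judge(ex.question, ex.reference_answer, answer)
        relevant = set(ex.relevant_doc_ids)
        # Retrieval accuracy, approximated as recall of the ground-truth documents.
        recall_total += (len(relevant & set(retrieved)) / len(relevant)) if relevant else 1.0
    n = len(dataset)
    return {"answer_correctness": correct / n, "retrieval_recall": recall_total / n}


def regression_check(current: Dict[str, float], baseline: Dict[str, float], tolerance: float = 0.02) -> List[str]:
    """Names of metrics that fell more than `tolerance` below the baseline."""
    return [name for name, base in baseline.items() if current[name] < base - tolerance]


# Toy usage: a keyword "system" and a substring "judge" in place of real LLM calls.
dataset = [
    Example("Where is auth handled?", "auth/middleware.py", ["auth/middleware.py"]),
    Example("Which DB driver do we use?", "psycopg", ["db/conn.py", "requirements.txt"]),
]


def toy_system(question: str) -> Tuple[str, List[str]]:
    if "auth" in question:
        return "It lives in auth/middleware.py", ["auth/middleware.py"]
    return "We use psycopg", ["db/conn.py"]


def toy_judge(question: str, reference: str, answer: str) -> bool:
    return reference.lower() in answer.lower()


scores = evaluate(toy_system, toy_judge, dataset)
print(scores)
print(regression_check(scores, {"answer_correctness": 1.0, "retrieval_recall": 0.9}))
```

Running the harness on every code change and comparing against the last accepted scores is what turns it into a regression test.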

Development LLM Evaluation

Qodo Merge 1.0: AI-Powered Code Review Evolves

2025-02-02

Qodo Merge 1.0, an AI-driven code review tool released after more than a year of development, addresses inherent challenges in AI-assisted coding. The new version features a focus-on-problems mode that prioritizes critical issues such as bugs and security flaws; dynamic learning that refines suggestions based on accepted changes; real-time integration of ticket context; and an `/implement` command that turns review feedback into concrete code changes. Qodo Merge 1.0 makes code review more precise, adaptive, and efficient.

Development AI Code Review

Effective AI Code Suggestions: Less is More

2025-01-29

Qodo (formerly Codium) learned a crucial lesson about using LLMs for code review while building its AI-powered tool, Qodo Merge. Initially, asking the model to prioritize bug detection over style suggestions proved ineffective: the model was overwhelmed by the easier-to-find style issues, and developers suffered from suggestion fatigue. The breakthrough came from simplifying the model's task to focus solely on finding meaningful bugs and problems. This narrower focus increased bug detection rates and improved the signal-to-noise ratio, producing a 50% jump in suggestion acceptance rates and an 11% increase in overall impact. The key takeaway: eliminating distractions can be more effective than complex prioritization.
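The post describes narrowing the model's instructions rather than filtering its output afterwards, so the sketch below covers only the measurement side: computing the acceptance rate and the share of suggestions that flag real problems (a simple signal-to-noise proxy) for a batch of reviewed suggestions. The category labels, the `Suggestion` type, and the toy batch are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Suggestion:
    category: str   # hypothetical labels: "bug", "security", "style", ...
    accepted: bool  # did the developer apply the suggestion?


PROBLEM_CATEGORIES = {"bug", "security"}


def review_metrics(suggestions: List[Suggestion]) -> Dict[str, float]:
    """Acceptance rate plus the share of suggestions that flag real problems."""
    n = len(suggestions)
    if n == 0:
        return {"acceptance_rate": 0.0, "problem_share": 0.0}
    accepted = sum(s.accepted for s in suggestions)
    problems = sum(s.category in PROBLEM_CATEGORIES for s in suggestions)
    return {"acceptance_rate": accepted / n, "problem_share": problems / n}


# Toy batch: two style nits (ignored) and two bug reports (one applied).
batch = [
    Suggestion("style", False),
    Suggestion("style", False),
    Suggestion("bug", True),
    Suggestion("bug", False),
]
print(review_metrics(batch))  # {'acceptance_rate': 0.25, 'problem_share': 0.5}
```

In this toy batch, the same single accepted bug fix yields a higher acceptance rate once the ignored style nits are no longer produced, which is the effect the post attributes to a narrower task.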

Development

Open-Source LLM DeepSeek-R1 Integrated into Qodo Gen

2025-01-27

Qodo (formerly Codium) announced the integration of DeepSeek-R1, a powerful open-source large language model comparable to OpenAI's o1, into its AI-powered coding assistant, Qodo Gen. Known for its strong reasoning capabilities and cost-effectiveness, DeepSeek-R1 handles complex coding challenges and generates responses faster and at lower cost than many proprietary models. Qodo Gen supports multiple top-tier LLMs, giving developers a secure and reliable AI-assisted coding experience.

Development AI Coding Assistant

VS Code's Python Debugger: Beyond Print Statements

2025-01-10

Tired of peppering your Python code with print statements? Visual Studio Code's debugging features offer a more efficient workflow. This tutorial covers setting up VS Code's Python debugger, managing breakpoints, and inspecting variables, as well as advanced techniques such as exception handling, remote debugging, and performance analysis. Learn how to debug Python code efficiently without relying on print statements and boost your development productivity.
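For the remote-debugging case, one common pattern is to have the program open a debug port with the `debugpy` package (which the VS Code Python debugger is built on) and wait for VS Code to connect through an "attach" launch configuration pointing at the same port. The opt-in `DEBUG_ATTACH` environment variable and the function name are hypothetical; the sketch only shows the shape of the setup.

```python
import os


def wait_for_debugger_if_requested(port: int = 5678) -> bool:
    """Block until a debugger attaches, but only when DEBUG_ATTACH=1 is set.

    DEBUG_ATTACH is a hypothetical opt-in variable chosen for this example.
    """
    if os.environ.get("DEBUG_ATTACH") != "1":
        return False
    import debugpy  # separate package: pip install debugpy

    debugpy.listen(("localhost", port))  # open the debug port on this machine
    debugpy.wait_for_client()            # pause until VS Code attaches
    debugpy.breakpoint()                 # then stop here as if on a breakpoint
    return True


def main() -> None:
    wait_for_debugger_if_requested()
    totals = [n * n for n in range(5)]  # set a breakpoint here and inspect `totals`
    print(sum(totals))
```

Binding to `localhost` keeps the port private; to debug a process on another machine, forwarding the port (for example over SSH) is generally safer than listening on all interfaces.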

Development Python debugging