AI Writes Code, AI Reviews It? Is That Silly?

2025-05-01

Daksh, co-founder of Greptile, noticed that Devin, an AI code generation tool, was submitting more pull requests than any human engineer. This raises an intriguing question: should AI-generated code be reviewed by AI? LLMs are stateless, so each call is independent of the one that wrote the code, but this doesn't mean AI reviews its own code perfectly. AI-generated code boosts efficiency, yet it may introduce bugs that humans struggle to find. Research shows AI is more effective than humans at finding certain types of bugs, although its accuracy still needs improvement. Ultimately, the article argues that AI code review, while not perfect, is more effective than human review at catching the specific types of bugs that AI itself introduces.

Development

Getting LLMs to Generate Funny Memes: Surprisingly Hard

2025-01-06

A University of Waterloo intern set out to build an app that uses LLMs and the Greptile API to generate memes roasting GitHub repositories. The process proved unexpectedly challenging. Prompting the LLM directly for roasts yielded generic results. The solution was to split the task into two steps: code analysis (using Greptile to pinpoint specific issues) and roast generation (using the LLM to turn those issues into targeted humor). Image generation also proved difficult because of its limitations in handling text, so the project switched to pre-built meme templates, with node-canvas inserting the text. Despite the hurdles, the project culminated in reporoast.com, a website that generates custom code-roasting memes.
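The two-step split described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `analyze` and `generate` are hypothetical callables standing in for the Greptile query and the LLM call, and the prompt wording is invented for the example.

```python
from typing import Callable, List


def build_roast_prompt(repo_name: str, issues: List[str]) -> str:
    """Turn concrete findings into a prompt, so the humor targets real issues."""
    bullet_list = "\n".join(f"- {issue}" for issue in issues)
    return (
        f"Write a short, funny roast of the repository '{repo_name}'.\n"
        f"Base every joke on these specific findings:\n{bullet_list}"
    )


def roast_repo(
    repo_name: str,
    analyze: Callable[[str], List[str]],  # hypothetical: code-analysis step
    generate: Callable[[str], str],       # hypothetical: LLM text-generation step
) -> str:
    """Step 1: collect specific issues. Step 2: ask the LLM to roast them."""
    issues = analyze(repo_name)
    if not issues:
        # Without specifics the roast would fall back to generic jokes.
        raise ValueError("analysis returned no issues to roast")
    return generate(build_roast_prompt(repo_name, issues))
```

Passing the two steps in as functions keeps the sketch independent of any particular API and makes each step easy to stub out when experimenting.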

Development Meme Generation

How an AI Code Review Bot Learned to Shut Up

2024-12-21

Greptile's AI code review bot initially drew criticism for generating too many comments. The team first tried prompt engineering and having the LLM evaluate its own comments, but neither method worked. The breakthrough came from vectorizing past comments, clustering them in a vector database, and filtering out new comments similar to those that had previously been downvoted. This approach raised the address rate (the share of the bot's comments that developers act on) from 19% to over 55%, significantly reducing LLM noise.
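The filtering idea can be illustrated with a small sketch. It assumes each comment already has an embedding vector, and it uses plain cosine similarity over an in-memory list in place of a vector database. The function names, the 0.9 threshold, and the `min_matches` parameter are assumptions made for the example, not details from the article.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_suppress(
    candidate: Sequence[float],
    downvoted: List[Sequence[float]],
    threshold: float = 0.9,  # assumed cutoff, would need tuning in practice
    min_matches: int = 1,    # assumed: how many similar downvotes trigger a drop
) -> bool:
    """True if the candidate is close to enough previously downvoted comments."""
    matches = sum(
        1 for vec in downvoted if cosine_similarity(candidate, vec) >= threshold
    )
    return matches >= min_matches


def filter_comments(
    candidates: List[Tuple[str, Sequence[float]]],
    downvoted: List[Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    """Keep only the comment texts that do not resemble downvoted comments."""
    return [
        text
        for text, vec in candidates
        if not should_suppress(vec, downvoted, threshold)
    ]
```

In a real system the embeddings would come from an embedding model and the nearest-neighbor lookup from a vector store; the linear scan here only shows the decision rule.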

Development Code Review