Gemini 2.5 Object Detection: A Surprisingly Good Match for YOLOv3?

2025-07-10

This benchmark tests Google's Gemini 2.5 Pro multimodal large language model on object detection. The evaluation uses the MS-COCO dataset and focuses on bounding box accuracy. Gemini 2.5 Pro achieves a mean Average Precision (mAP) of roughly 0.34, comparable to YOLOv3 from 2018 but well behind state-of-the-art models at roughly 0.60 mAP. While Gemini's versatility across open-ended tasks is impressive, CNNs remain faster, cheaper, and easier to reason about, especially when good training data is available.
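
Bounding box accuracy in COCO-style evaluation rests on intersection over union (IoU): a predicted box counts as a match only if its overlap with a ground-truth box meets a threshold. The sketch below shows that building block only, not the benchmark's actual evaluation code. The function names, the corner-style `(x_min, y_min, x_max, y_max)` box format, and the 0.5 default threshold are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, truth_box, threshold=0.5):
    """Hypothetical helper: does a prediction overlap the truth enough to count?"""
    return iou(pred_box, truth_box) >= threshold
```

Average precision is then computed per class from the matched and unmatched predictions, and mAP averages it across classes (and, in the standard COCO metric, across several IoU thresholds).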


Spegel: A Terminal Browser Using LLMs to Rework Webpages

2025-07-02

Spegel is a proof-of-concept terminal web browser that uses LLMs to transform HTML into markdown and renders the result directly in your terminal. It began as a weekend project, and the release of Google's faster Gemini 2.5 Flash-Lite made it far more practical. Custom prompts let users define personalized views, such as extracting only the essential information from a recipe page. Although it does not support POST requests, Spegel streamlines browsing by focusing on user-defined needs, offering a cleaner, less cluttered experience than traditional terminal browsers.
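
The idea of a prompt-defined view can be sketched in a few lines. This is not Spegel's code: `build_prompt`, `render_view`, and the stand-in `fake_llm` are hypothetical names, and the model call is passed in as a plain function so that no particular LLM client API is assumed.

```python
from typing import Callable


def build_prompt(html: str, view_prompt: str) -> str:
    """Combine the user's view instructions with the raw page HTML."""
    return (
        "Convert the following HTML page to markdown.\n"
        f"View instructions: {view_prompt}\n\n"
        f"HTML:\n{html}"
    )


def render_view(html: str, view_prompt: str, llm: Callable[[str], str]) -> str:
    """Send the combined prompt to the model and return its markdown output."""
    return llm(build_prompt(html, view_prompt))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns a fixed recipe view."""
    return "# Pancakes\n\n- 2 eggs\n- 1 cup flour"


page = "<html><body><h1>Pancakes</h1><p>Long story...</p></body></html>"
markdown = render_view(page, "Show only the ingredients and steps.", fake_llm)
```

Swapping `fake_llm` for a function that calls a real model is the only change needed to try different views on the same page.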
