A Senior Data Scientist's Pragmatic Take on Generative AI

2025-05-05

A senior data scientist at BuzzFeed shares his pragmatic approach to using large language models (LLMs): not a silver bullet, but a tool for boosting efficiency when paired with careful prompt engineering. The article details his successful use of LLMs for tasks like data categorization, text summarization, and code generation, while acknowledging their limitations in complex data science scenarios, where both accuracy and efficiency can suffer. His conclusion: used judiciously, LLMs can significantly improve productivity; the key lies in selecting the right tool for the job.
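One way to make LLM-based data categorization reliable is to constrain the model to a fixed label set. The sketch below is illustrative only (the category names and `build_prompt` helper are hypothetical, not the author's actual setup); the returned string would be sent to any chat-model API:

```python
# Hypothetical sketch of a constrained categorization prompt.
# Restricting the answer to a known label set makes the LLM's
# output trivial to validate programmatically.
CATEGORIES = ["Politics", "Entertainment", "Food", "Science"]

def build_prompt(headline: str) -> str:
    # Ask for exactly one label, nothing else, so the response can be
    # checked against CATEGORIES before it enters the pipeline.
    return (
        "Classify the headline into exactly one of these categories: "
        + ", ".join(CATEGORIES)
        + ". Reply with the category name only.\n\nHeadline: "
        + headline
    )

prompt = build_prompt("New telescope spots a distant exoplanet")
```

A downstream check such as `response.strip() in CATEGORIES` then catches any off-list answer before it pollutes the dataset.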


Beyond Vector Databases: Efficient Text Embedding Processing with Parquet and Polars

2025-02-24

This article presents a method for efficient text embedding processing without relying on vector databases. The author uses Parquet files to store tabular data containing Magic: The Gathering card embeddings and their metadata, and leverages the Polars library for fast similarity search and data filtering. Polars' zero-copy feature and excellent support for nested data make this approach faster and more efficient than traditional CSV or Pickle methods, maintaining high performance even when filtering the dataset. The author compares other storage methods such as CSV, Pickle, and NumPy, concluding that Parquet combined with Polars is the optimal choice for handling medium-sized text embeddings, with vector databases only becoming necessary for extremely large datasets.


Can Iterative Prompting Make LLMs Write Better Code?

2025-01-03

This blog post details an experiment exploring whether repeatedly prompting a Large Language Model (LLM) to "write better code" improves code quality. Using Claude 3.5 Sonnet, the author starts with a simple Python coding problem and iteratively prompts the LLM. Performance improves dramatically across iterations, with the final version running roughly 100x faster than the first. However, naive iterative prompting also leads to over-engineering, whereas precise, targeted prompt engineering yields far more efficient code. The experiment shows that LLMs can assist with code optimization, but human intervention and expertise remain crucial for quality and efficiency.
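The iterative setup can be sketched as a simple loop. Here `ask` is a hypothetical stub standing in for a real LLM API call (e.g. an Anthropic or OpenAI client); it is not the author's actual harness:

```python
# Sketch of the iterative "write better code" loop.
def ask(prompt: str) -> str:
    # Placeholder: in practice this would call an LLM API and return
    # the model's revised code as a string.
    return f"<code improved from: {prompt[:30]}...>"

task = "Write Python code that sums the digits of numbers 1 to 1,000,000."
code = ask(task)

# The post repeats the same vague instruction several times, feeding
# each response back in as context for the next attempt.
for _ in range(4):
    code = ask(f"write better code\n\nPrevious attempt:\n{code}")
```

The post's finding is that replacing the vague "write better code" instruction with specific asks (vectorize the hot loop, avoid redundant allocations) gets better results in fewer iterations.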
