Sampling Big Data: Small Samples, Big Answers

2025-05-31

In a recent interview, Hadley Wickham pointed out that many big data problems are really small data problems once you have the right subset, sample, or summary. This post looks at efficient sampling for big data analysis. Using Goatly, a company serving narcoleptic goats, as its example, it shows how to calculate an appropriate sample size for logistic regression and concludes that approximately 2345 samples are enough to represent 100,000 farms accurately. It also covers Python scripts and online tools for calculating sample size, and briefly touches on statistical power.
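As a rough illustration of how a figure like 2345 can arise, here is a minimal sketch using Cochran's formula with a finite population correction. The summary does not say which method or settings the post used, so the 95% confidence level, 2% margin of error, and 0.5 expected proportion below are assumptions chosen for illustration, and the function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.02, proportion=0.5):
    """Estimate a sample size with Cochran's formula plus finite population correction.

    All defaults are illustrative assumptions, not values taken from the post.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < confidence < 1 or not 0 < margin < 1 or not 0 < proportion < 1:
        raise ValueError("confidence, margin, and proportion must be in (0, 1)")

    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Cochran's sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin ** 2)

    # Finite population correction shrinks n0 when the population is limited.
    n = n0 / (1 + (n0 - 1) / population)

    return math.ceil(n)


# 100,000 farms, as in the Goatly example.
print(required_sample_size(100_000))  # prints 2345
```

Under these assumed settings the result matches the post's figure. Loosening the margin of error to 5% would cut the requirement to a few hundred farms, which is why the choice of margin matters more than the size of the population.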

Hacker News: A Decade of Tech Growth

2025-03-18

The author began reading Hacker News in 2011, at first understanding very little of the technical jargon or the companies mentioned. Through daily reading and deep dives into unfamiliar concepts, the author went from data analyst to engineer, confidently deploying code to millions of users. Hacker News offered both learning resources and a supportive community, which helped the author improve technical and writing skills and ultimately led to a significant career leap.

Development technical learning

LLMs: Exploring Arithmetic Capabilities in the Pursuit of AGI

2024-12-24

This article explores why large language models (LLMs) are being used for calculation. Although LLMs excel at natural language processing, researchers are trying to make them perform mathematical operations ranging from simple addition to complex theorem proving. The goal is not to replace calculators but to probe the reasoning capabilities of LLMs on the way to artificial general intelligence (AGI). The article points out that humans have always tried to use new technology for computation, and that testing an LLM's mathematical ability is one way to test its reasoning. The way an LLM calculates, however, is very different from the way a calculator does: the former relies on vast knowledge bases and probabilistic models, while the latter is based on deterministic algorithms. As a result, LLM calculations are not always accurate or reliable, which highlights the tension between practical usefulness and research value.
