Benchmarking LLMs for Long-Form Creative Writing

2025-04-10

This benchmark assesses how well large language models write long-form narratives. Each model is evaluated on brainstorming, revision, and writing eight chapters of 1000 words each. Metrics include chapter length, fluency (avoiding overused phrases), repetition, and how much writing quality degrades across chapters. An evaluation LLM assigns a final score from 0 to 100.
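To make the metrics concrete, here is a minimal Python sketch of how such measurements could be computed. It is not the benchmark's actual implementation: the function names, the phrase list, the trigram-based repetition measure, and the slope-based degradation measure are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical phrase list; the benchmark's real list is not given here.
OVERUSED_PHRASES = ["a testament to", "shivers down", "tapestry of"]


def word_count(text):
    """Chapter length in whitespace-separated words."""
    return len(text.split())


def overused_phrase_rate(text, phrases=OVERUSED_PHRASES):
    """Overused-phrase hits per 1000 words (assumed fluency proxy)."""
    lowered = text.lower()
    hits = sum(lowered.count(p) for p in phrases)
    words = word_count(text)
    return 1000.0 * hits / words if words else 0.0


def repeated_trigram_fraction(text):
    """Fraction of word trigrams that occur more than once (assumed repetition proxy)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)


def degradation_slope(chapter_scores):
    """Least-squares slope of per-chapter scores; negative means quality falls."""
    n = len(chapter_scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(chapter_scores) / n
    num = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(chapter_scores))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

In this sketch, the per-chapter scores passed to `degradation_slope` would come from the evaluation LLM; a slope near zero suggests the model holds its quality across all eight chapters, while a clearly negative slope suggests it fades toward the end.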
