Brett Owers

Haiku Detector

Production

June 1, 2023

A syllable-counting validation tool that determines whether text follows the 5-7-5 haiku form. Loads a library of syllable counts and checks content with good accuracy. Currently useful as a validation gate for AI-generated haikus — generate daily haikus, validate with the detector, regenerate if they do not pass.

Purpose

Built a tool that loads a large syllable dictionary and validates whether a given piece of text follows the 5-7-5 haiku syllable pattern. Useful for quality-controlling the 101 Potato Haikus pipeline and potentially for a daily haiku feature on the Potato Literature website.

Stack

JavaScript, Syllable Dictionary, NLP, Validation, Text Processing

What I Learned

  • Syllable counting in English is harder than it sounds. Is "fire" one syllable or two? What about "poem"? Is "comfortable" three or four? English lacks consistent syllable rules, unlike Japanese, where haiku originated and each mora is counted unambiguously. The approach: load a large dictionary of known syllable counts (such as the CMU Pronouncing Dictionary) and, for words it does not contain, fall back to heuristic rules (count vowel groups, subtract silent e's, handle common suffixes).
  • The CMU Pronouncing Dictionary maps ~130,000 English words to their phonetic pronunciation using ARPAbet notation. Each vowel phoneme carries a stress marker (0=unstressed, 1=primary, 2=secondary), and consonants carry none, so counting the phonemes that end in a digit gives you the syllable count. For example: "POTATO" → P AH0 T EY1 T OW2 → three vowel phonemes → three syllables. This dictionary is the backbone of most English syllable counters.
  • Haiku validation is a binary gate: 5-7-5 or not. This makes it a perfect automated quality check: no subjective judgment, just counting. AI can generate haikus all day, but AI models frequently miscount syllables (they tokenize text differently than humans syllabify it). A deterministic syllable counter as a validation gate catches what the AI misses.
  • The accuracy gap comes from words not in the dictionary — proper nouns, slang, neologisms, brand names. "Potatuhs" is not in any syllable dictionary. The heuristic fallback has to handle it (Po-ta-tuhs → 3 syllables, which is correct). Getting the heuristics right for edge cases is where the tool goes from "cool demo" to "actually useful."
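
The dictionary-plus-fallback approach described above can be sketched roughly as follows. This is a minimal illustration, not the detector's actual code: the function names (parseCmuDict, heuristicSyllables, isHaiku) and the three-entry sample dictionary are assumptions made for the example, and the fallback covers only vowel groups and a silent trailing "e", with no suffix handling.

```javascript
// A few entries in CMUdict line format (WORD, then ARPAbet phonemes).
// POTATO is the entry quoted above; QUIET and COMES are illustrative and
// should be checked against the real dictionary file.
const SAMPLE_CMUDICT = `
;;; comment lines in the dictionary file start with three semicolons
POTATO  P AH0 T EY1 T OW2
QUIET  K W AY1 AH0 T
COMES  K AH1 M Z
`;

// Build a Map of lowercase word -> syllable count.
function parseCmuDict(text) {
  const counts = new Map();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith(";;;")) continue;
    const [word, ...phonemes] = trimmed.split(/\s+/);
    // Only vowel phonemes end in a stress digit; each one is a syllable.
    const syllables = phonemes.filter((p) => /\d$/.test(p)).length;
    // Alternate pronunciations look like WORD(1); keep the first one seen.
    const key = word.replace(/\(\d+\)$/, "").toLowerCase();
    if (!counts.has(key)) counts.set(key, syllables);
  }
  return counts;
}

// Fallback for unknown words: count vowel groups, drop a silent final "e".
function heuristicSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w === "") return 0;
  let count = (w.match(/[aeiouy]+/g) || []).length;
  const silentE = /[^aeiouy]e$/.test(w) && !/[^aeiouy]le$/.test(w);
  if (silentE && count > 1) count -= 1;
  return Math.max(count, 1);
}

// Dictionary first, heuristic only when the word is missing.
function countSyllables(word, dict) {
  const key = word.toLowerCase();
  return dict.has(key) ? dict.get(key) : heuristicSyllables(key);
}

function lineSyllables(line, dict) {
  const words = line.toLowerCase().match(/[a-z']+/g) || [];
  return words.reduce((sum, w) => sum + countSyllables(w, dict), 0);
}

// Binary gate: exactly three non-empty lines with 5, 7 and 5 syllables.
function isHaiku(text, dict) {
  const lines = text.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  if (lines.length !== 3) return false;
  const counts = lines.map((l) => lineSyllables(l, dict));
  return counts[0] === 5 && counts[1] === 7 && counts[2] === 5;
}

const dict = parseCmuDict(SAMPLE_CMUDICT);
const sample =
  "Potatuhs in soil\nquiet roots drink the spring rain\nharvest comes at last";
console.log(dict.get("potato"));             // 3
console.log(heuristicSyllables("potatuhs")); // 3
console.log(isHaiku(sample, dict));          // true
```

The sample shows why the dictionary matters: the heuristic alone would count "quiet" as one syllable and "comes" as two, so both words need dictionary entries for the haiku to pass.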

Key Insights

  • The haiku detector has a clear application in the Potatuhs ecosystem: Potato Literature wants a daily haiku on potatoliterature.com. AI can generate candidate haikus on any topic (potatoes, seasons, cooking, farming). The detector validates syllable counts. If a haiku fails validation, generate another. This creates an automated pipeline: prompt → generate → validate → publish if valid, retry if not. Zero human intervention for daily content.
  • Validation tools are more valuable than generation tools. Anyone can generate content with AI. The bottleneck is knowing whether the content is correct. A syllable counter for haikus, a grammar checker for prose, a linter for code — these are the quality gates that make automated content pipelines trustworthy. The generator is the engine. The validator is the brakes. You need both.
  • The 5-7-5 rule is actually debated among haiku practitioners — traditional Japanese haiku counts morae (sound units), not syllables, and many modern English haiku poets use fewer syllables for a closer approximation of the Japanese brevity. But for the 101 Potato Haikus series, 5-7-5 is the standard, and the detector enforces it. Knowing the rule well enough to debate it is different from enforcing it consistently at scale.
  • This tool connecting to the Potato Literature daily haiku feature is a small example of the larger Potatuhs pattern: build tools that serve the ecosystem. The haiku detector is not a standalone product. It is infrastructure for a content pipeline that feeds a division of the brand. Tools that serve the ecosystem are more durable than tools that serve the market.
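
The prompt → generate → validate → retry loop described above might look something like the sketch below. Everything here is hypothetical: generateValidHaiku is an invented name, generate stands in for whatever AI call produces a candidate, isValid stands in for the detector, and the maxAttempts cap is an added assumption so the loop cannot run forever.

```javascript
// generate: async function returning a candidate haiku string (hypothetical AI call).
// isValid:  function returning true when the candidate passes the 5-7-5 check.
// maxAttempts: assumed safety cap, not something the original pipeline specifies.
async function generateValidHaiku(generate, isValid, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await generate(attempt);
    if (isValid(candidate)) {
      return { haiku: candidate, attempts: attempt };
    }
  }
  // Nothing passed within the cap; the caller decides whether to skip or alert.
  return null;
}

// Demo with stand-ins: a canned list instead of a model, and a
// three-line check instead of real syllable counting.
const stubCandidates = ["too short", "line one\nline two\nline three"];
const stubGenerate = async (attempt) => stubCandidates[attempt - 1];
const stubIsValid = (text) => text.split("\n").length === 3;

generateValidHaiku(stubGenerate, stubIsValid, 2).then((result) => {
  console.log(result.attempts); // 2
});
```

Passing the validator in as a parameter keeps the retry loop independent of how syllables are counted, so the same loop would work with any other deterministic check.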
#NLP #syllable-counting #haiku #validation #Potato-Literature #Potatuhs #CMU-dictionary #automation #content-pipeline #text-processing

This post was composed through a conversation between Brett Owers and Claude Code (Anthropic). The content reflects Brett's recollection of each project and the lessons drawn from it. Some details may be approximate or omitted — the purpose is to paint an honest picture of a software engineer's development over time, not to serve as a precise historical record.