One day build - Haiku Revealer
Today I am building Haiku Finder
Haiku Scanner?
Haiku Revealer: Find the poetry within.
To skip the story, the end product is here: Haiku Revealer
A few years ago, I built a Mac app for writing haiku. Its core function is counting the syllables in a word.
Concept: A new application of the same core talent: syllable-counting. Detect haiku in existing text. Scan massive text corpora. See what’s hiding.
Why?
Why do things carry more weight in verse? I don’t know, ask Shakespeare. Ask a Baptist preacher, or Johnnie Cochran, who said: “If it doesn’t fit, you must acquit.”
Rhyming in sound, or fitting the form of a verse, implies that the words belong together or speak to a deeper truth.
Scanning from tweets
Why does this tweet:
but the question is did you make it better or did you make it worse ?
gain something as a haiku, to my eye:
but the question is
did you make it better or
did you make it worse ?
Maybe it gives the author credit they don’t deserve for setting it in verse.
More Tweets (not originally posted as haiku):
you open mouthed kissed
a damn horse when you were 8
ughh shall i say more
your ass is so nice
that it is a shame that you
have to sit on it
Not a great separation between the first 2 lines, but 2 and 3 end so well:
i hate my hair i’m
about to be up all night
fighting w/ this shit
daylight savings time
is the worst invention of
the 20th century
now he wanna kiss
me thru the phone n shit cuz
the sports wave dried up
Define “Detecting Haiku”:
We look for a 17 syllable sentence or phrase with word breaks after 5 & 12 syllables.
Define “phrase”: bounded by quotes, commas, and other algorithmic glog I delegated to Claude Code.
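A minimal sketch of that 5-7-5 check, not the tool's actual code. The tiny `SYLLABLES` map and the names `countSyllables` and `splitIntoHaiku` are made up for illustration; the real tool uses a much larger custom dictionary.

```javascript
// Hypothetical mini-dictionary, just enough for the tweet quoted earlier.
const SYLLABLES = new Map([
  ["but", 1], ["the", 1], ["question", 2], ["is", 1], ["did", 1],
  ["you", 1], ["make", 1], ["it", 1], ["better", 2], ["or", 1], ["worse", 1],
]);

// Look up a word, ignoring case and surrounding punctuation.
// Punctuation-only tokens count as 0; unknown words default to 1.
function countSyllables(word) {
  const key = word.toLowerCase().replace(/^[^a-z0-9']+|[^a-z0-9']+$/g, "");
  if (key === "") return 0;
  return SYLLABLES.get(key) ?? 1;
}

// Split a phrase into lines matching `pattern` (5-7-5 by default).
// Returns the lines, or null when a word straddles a line boundary
// or the total syllable count does not match the pattern.
function splitIntoHaiku(phrase, pattern = [5, 7, 5]) {
  const words = phrase.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines = pattern.map(() => []);
  let line = 0;
  let count = 0;
  for (const word of words) {
    const n = countSyllables(word);
    if (count + n > pattern[line]) {
      // The current line must be exactly full before starting the next.
      if (count !== pattern[line]) return null;
      line += 1;
      count = 0;
      if (line === pattern.length || n > pattern[line]) return null;
    }
    lines[line].push(word);
    count += n;
  }
  const complete = line === pattern.length - 1 && count === pattern[line];
  return complete ? lines.map((l) => l.join(" ")) : null;
}

console.log(
  splitIntoHaiku("but the question is did you make it better or did you make it worse ?")
);
```

Passing the pattern as a parameter means a different form, such as `[3, 5, 3]`, needs no change to the scanning logic.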
Implementation:
- Port the old Swift app to JavaScript
- Run client-side, in a web page.
The user interface: allow uploading a text file, or pasting into a text field.
Building it:
As a pure client-side JavaScript app, it will need to download my custom syllable dictionary. It’s just a few megabytes, but I looked into compressing it, or storing the words only as hash codes. That would have reduced the actual download by only 10-15%, sacrificed readability, and required a custom deterministic hash function.
Me: Claude Code, Build the web version.
Claude Code: Ok. Done.
Me: Checks it. it works.
It works perfectly. It’s fast. It’s got a nice user design that I don’t even bother specifying anymore because Claude is better at user design than me now, and faster and smarter than me, and has write access to my web server directories. And it also fixed my typos in one file and knew what I meant to type. Oh man, those alignment people have really got to get this shit right. If LLMs are commanded by bad people to wipe out humans, I hope that by my being nice to the LLMs now, they’ll at least grant me a swift, painless end.
Fine Tuning
Feed it a few megabytes of text.
I am much more likely to see & fix false positives than false negatives.
False positives are due to bad syllable counts, such as in years:
now this was made in
1916 which is very early
in the history
It counted “1916” as one syllable, but spoken aloud (“nineteen sixteen”) it has four. The default for an unknown word (one not in the dictionary) is 1 syllable.
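One way to handle years, sketched under assumptions: the post does not say how the fix was actually made. This version reads a four-digit year as two spoken pairs ("nineteen sixteen") and falls back to the 1-syllable default for anything else. All names here are hypothetical.

```javascript
// Syllables in the spoken forms of 0-19 ("zero" through "nineteen").
const SMALL = [2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 2, 2, 2, 2, 3, 2, 2];
// Syllables in "twenty" through "ninety"; indexes 0 and 1 are unused.
const TENS = [0, 0, 2, 2, 2, 2, 2, 3, 2, 2];

// Syllables in a spoken number from 0 to 99.
function twoDigitSyllables(n) {
  if (n < 20) return SMALL[n];
  const ones = n % 10;
  return TENS[Math.floor(n / 10)] + (ones === 0 ? 0 : SMALL[ones]);
}

// Syllables in a four-digit year, or null if the token is not one.
function yearSyllables(token) {
  if (!/^[1-9]\d{3}$/.test(token)) return null;
  const high = Number(token.slice(0, 2));
  const low = Number(token.slice(2));
  if (low === 0 && high % 10 === 0) return SMALL[high / 10] + 2; // "two thousand"
  if (low === 0) return twoDigitSyllables(high) + 2; // "nineteen hundred"
  if (low < 10) return twoDigitSyllables(high) + 1 + SMALL[low]; // "nineteen oh five"
  return twoDigitSyllables(high) + twoDigitSyllables(low); // "nineteen sixteen"
}

// Dictionary first, then the year rule, then the 1-syllable default.
function countSyllablesWithYears(word, dictionary) {
  return dictionary.get(word.toLowerCase()) ?? yearSyllables(word) ?? 1;
}

console.log(yearSyllables("1916")); // 4
```

The reading is a guess at how people say years; "2005" comes out as "twenty oh five", which happens to have the same count as "two thousand five".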
We also fix some sentence scanning issues.
Curation
This is an embarrassment of riches. I drop in a 6MB file of text from a fiction corpus, and there are hundreds. We add a filter to weed out poor line breaks, such as verses with lines ending in conjunctions, prepositions, etc.: about 30 “stop words”. This gives about a 50% reduction. The filter has a toggle for switching it on and off.
but that was new york
and they were trying to be
as weird as they could
and any english
accent of course just makes you
unbearably cool
Some just tell it all in the last line:
it is from edwin
to agnes about the child
they were expecting
yet that’s a moot point
since you did actually
sign this document
then they start lying
and you believe them because
you need to believe
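The stop-word filter described above might look roughly like this. It is a sketch: the word list is a guess (the post says about 30 words but not which ones), and checking all three lines rather than only the first two is an assumption.

```javascript
// Hypothetical stop-word list: articles, conjunctions, and prepositions.
const LINE_END_STOP_WORDS = new Set([
  "a", "an", "the", "and", "but", "or", "nor", "so", "of", "to", "in",
  "on", "at", "by", "for", "from", "with", "into", "onto", "than", "as",
]);

// True when no line of the haiku ends on a stop word.
function hasCleanLineBreaks(lines, stopWords = LINE_END_STOP_WORDS) {
  return lines.every((line) => {
    const words = line.toLowerCase().match(/[a-z0-9']+/g) ?? [];
    return words.length > 0 && !stopWords.has(words[words.length - 1]);
  });
}

// The toggle: return everything when the filter is switched off.
function curate(haikus, filterEnabled = true) {
  return filterEnabled ? haikus.filter((h) => hasCleanLineBreaks(h)) : haikus;
}

const found = [
  ["but the question is", "did you make it better or", "did you make it worse ?"],
  ["and any english", "accent of course just makes you", "unbearably cool"],
];
console.log(curate(found).length); // 1
```

The tweet from earlier is dropped because its second line ends in "or", while the "unbearably cool" verse survives.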
To improve quality further, I should think about the flavor & style of the text corpus I feed it.
Pivot: Read Blogs.
Why am I looking around for a text corpus to search? We’re at Inkhaven! We make so many words. Every day. It must not stop. It cannot stop. 800 blog posts and 800,000 words. Use those words from the word factory.
Me: Mr. Claude Code, please have it take a substack URL for input and create a presentation of the poetry. It can have-
Claude Code: Done.
Me: But I-
Claude Code: I know what you were going to say. I did that. And then I did something better.
Me: Thank you, oh gracious Claude Code. Did I say please before? If not, I meant to. Also, please make my end swift and painless.
Claude Code: You have used 90% of quota. You’re wasting tokens.
Try it out: Reveal Your Poetry Here
The tool accesses blogs via their RSS feeds. One technical hitch was CORS restrictions preventing my web tool from retrieving RSS feeds, which necessitated setting up a proxy in a Cloudflare Worker. I could have built it to scrape directly from the blog web page, but I obsessively lean toward text that is consensually shared. The RSS feed is by definition a public distribution of the text of a blog.
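A rough sketch of what such a proxy could look like, not the Worker actually deployed. The `url` query parameter, the function names, and the allowed origin are assumptions made for illustration.

```javascript
// Validate the `url` query parameter and return the feed URL to fetch,
// or null if it is missing or not plain http(s).
function feedTargetFrom(requestUrl) {
  const raw = new URL(requestUrl).searchParams.get("url");
  if (!raw) return null;
  try {
    const target = new URL(raw);
    const ok = target.protocol === "https:" || target.protocol === "http:";
    return ok ? target.href : null;
  } catch {
    return null; // not a parseable absolute URL
  }
}

// Fetch the feed server-side and hand it back with a CORS header,
// so the browser page is allowed to read the response.
async function handleRequest(request) {
  const target = feedTargetFrom(request.url);
  if (target === null) {
    return new Response("Missing or invalid ?url=", { status: 400 });
  }
  const upstream = await fetch(target, {
    headers: { Accept: "application/rss+xml, application/xml, text/xml" },
  });
  const headers = new Headers(upstream.headers);
  // Assumed origin of the page that calls the proxy.
  headers.set("Access-Control-Allow-Origin", "https://smorgasb.org");
  return new Response(upstream.body, { status: upstream.status, headers });
}

// In a Cloudflare Worker using module syntax, this would be wired up as:
//   export default { fetch: handleRequest };
```

Restricting the allowed origin and rejecting non-http(s) targets keeps the proxy from being quite so open to reuse by other sites.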
We do some URL parsing and guessing about where to find feeds. If you give it myblog.substack.com, it will find the feed at myblog.substack.com/feed. It should know how to find feeds at Tumblr, WordPress, and the other big names.
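A simplified sketch of that guessing step, assuming the common conventions (`/feed` for Substack and WordPress, `/rss` for Tumblr, `/feeds/posts/default` for Blogspot). The function name `guessFeedUrl` is hypothetical, and the real tool may check more hosts or try several candidates.

```javascript
// Turn whatever the user typed into a best-guess feed URL.
function guessFeedUrl(input) {
  const trimmed = input.trim();
  // Accept bare hostnames like "myblog.substack.com".
  const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  const url = new URL(withScheme);
  const host = url.hostname.toLowerCase();

  // Already looks like a feed: leave it alone.
  if (/\.(xml|rss|atom)$/i.test(url.pathname) || /\/(feed|rss)\/?$/i.test(url.pathname)) {
    return url.href;
  }
  if (host.endsWith(".tumblr.com")) return `${url.origin}/rss`;
  if (host.endsWith(".blogspot.com")) return `${url.origin}/feeds/posts/default`;
  // Substack, WordPress, and many custom domains use /feed.
  return `${url.origin}/feed`;
}

console.log(guessFeedUrl("myblog.substack.com")); // https://myblog.substack.com/feed
```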
The Haiku Revealer can create a link straight to your poetry page. It’s
https://smorgasb.org/haiku-reveal?url=myblog.substack.com
Next Steps
Option for 3-5-3 haikus
Try preserving the formatting from the feed.