[Max Woolf] sometimes struggles to create ideal headlines for his blog posts, and decided to apply his experience with machine learning to the problem. He asked: could an AI be trained to optimize his blog titles? It is a fascinating application of natural language processing, and [Max] explains all about what it does and how it works.
The model [Max] uses is GPT-3, a language model that generates natural-seeming human language and can be tweaked in different ways. [Max] uses OpenAI’s GPT-3 API (which, by the way, is much easier to experiment with than one might think), and here is the basic workflow for his title optimizer:
1. The optimizer takes as input a blog post title to optimize.
2. OpenAI’s pre-trained GPT-3 engine is used to generate six alternate titles.
3. For each of those alternate titles, a fine-tuned version of GPT-3 is consulted to judge how “good” it is based on custom training data. (“Good” in this context means “similar to titles of successful submissions on Hacker News”, but more on that in a moment.)
4. The results are printed.
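The workflow above can be sketched in a few lines of Python. This is only an illustration of the shape of the pipeline, not [Max]’s actual code: `optimize_title`, `placeholder_generate`, and `placeholder_score` are hypothetical names, and the two placeholder functions stand in for the real calls to the pre-trained and fine-tuned GPT-3 models so the example runs without an API key.

```python
from typing import Callable, List, Tuple


def optimize_title(
    title: str,
    generate_alternates: Callable[[str, int], List[str]],
    score_title: Callable[[str], float],
    n_alternates: int = 6,
) -> List[Tuple[str, float]]:
    """Generate alternate titles, score each one, and rank them best-first."""
    candidates = generate_alternates(title, n_alternates)
    scored = [(candidate, score_title(candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def placeholder_generate(title: str, n: int) -> List[str]:
    # Hypothetical stand-in for the pre-trained model: makes n trivial variants.
    return [title + "!" * (i + 1) for i in range(n)]


def placeholder_score(title: str) -> float:
    # Hypothetical stand-in for the fine-tuned model: shorter titles score higher.
    return 1.0 / len(title)


ranked = optimize_title("My Post", placeholder_generate, placeholder_score)
for candidate, score in ranked:
    print(f"{score:.3f}  {candidate}")
```

Passing the generator and the scorer in as functions keeps the pipeline itself independent of any particular API, so real model calls could be swapped in for the placeholders later.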
The custom training data in step 3 comes from bulk submission data from Hacker News, obtained via Google’s BigQuery service. [Max] separated Hacker News submissions into ‘good’ and ‘bad’ depending on how many points each submission ended up with. Step 3 simply asks the fine-tuned GPT-3 to grade each potential headline based on this data. The hypothesis that a submission’s score on Hacker News is directly correlated with the quality of its headline is an interesting one, and the Title Optimizer can be thought of as an experiment in seeing whether this idea can be applied in the other direction: making posts more successful with the help of a good headline.
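To make the ‘good’ versus ‘bad’ split concrete, here is a minimal sketch of how submissions might be labeled by point count. The cutoff of 100 points, the record layout, and the function names are all assumptions made for illustration; the write-up does not say what threshold or data format [Max] actually used.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical cutoff: the source does not state the real threshold.
GOOD_THRESHOLD = 100


def label_submission(points: int, threshold: int = GOOD_THRESHOLD) -> str:
    """Label a submission 'good' if it reached the threshold, else 'bad'."""
    return "good" if points >= threshold else "bad"


def build_training_records(
    submissions: Iterable[Dict], threshold: int = GOOD_THRESHOLD
) -> List[Dict[str, str]]:
    """Turn raw submissions (title + points) into title/label pairs."""
    return [
        {"title": s["title"], "label": label_submission(s["points"], threshold)}
        for s in submissions
    ]


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Made-up example rows, not real Hacker News data.
sample = [
    {"title": "Example title A", "points": 250},
    {"title": "Example title B", "points": 3},
]
print(to_jsonl(build_training_records(sample)))
```

A hard cutoff like this is the simplest possible labeling rule; where the line is drawn will change which titles the fine-tuned model learns to favor.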
So, does [Max] now just use the highest-scoring headlines for his blog posts and call it a day? Sadly, no. Many of the results aren’t terribly suitable for one reason or another. They may neglect to emphasize the right elements, sound too much like clickbait, or be lacking in some other way.
The AI-generated headlines might be a mixed bag, but that doesn’t mean they are not useful. There is genuine variety in the machine-generated suggestions, and they provide useful inspiration even when none of the results themselves are a home run.
[Max]’s GPT-3 Blog Title Optimizer is available on GitHub if you’d like a closer look. It’s an interesting application of natural language AI, and is also a perfect example of how machine learning’s best creative results so often come from having a human in the loop.