Selecting the right prompts for AIO / GEO tracking could use a human touch

The Stats

We compared our earliest UX reports against our most recent one. The data is messy and limited to only 7 studies, but it's something to start learning from. Here is what we're seeing in how people's prompting has changed over the past year:

3x+ increase - In the frequency of people putting personal information into their AI prompts (0.36 to 1.24 personal details per prompt, which is roughly a 244% increase).

270% increase - In what we categorize as delegation prompts (i.e. do this thing for me).

83% decrease - In prompts that look more like keyword searches than structured thoughts.

25% increase - in median prompt length

Here's the full contrast from July 2025 to June 2026, in one frame:

	Jul 2025 (first study)	Jun 2026 (most recent)	Change
Prompt style: keyword/search	18%	3%	↓ Decrease
Prompt style: task delegation	10%	37%	↑ Increase
Personal details per prompt	0.36	1.24	↑ Increase
Follow-up rate	Not captured	25–40% of session	New data
Median words per prompt	16	20	↑ Increase
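
If you want to sanity check the headline percentages, they can be re-derived from the table. This is a minimal sketch that uses only the rounded figures shown above; the function and variable names are ours, for illustration.

```python
def pct_change(old: float, new: float) -> float:
    """Percent change from old to new."""
    return (new - old) / old * 100

# (first study, most recent study) values copied from the table above
metrics = {
    "keyword/search share (%)": (18, 3),
    "task delegation share (%)": (10, 37),
    "personal details per prompt": (0.36, 1.24),
    "median words per prompt": (16, 20),
}

changes = {name: round(pct_change(old, new)) for name, (old, new) in metrics.items()}

for name, change in changes.items():
    print(f"{name}: {change:+d}%")
```

Note that 0.36 to 1.24 is about 3.4x, which works out to a percent increase of roughly 244%. A multiple and a percent increase are easy to conflate.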

Question your AI prompt selection status quo:

Ask yourself these questions relative to your GEO brand tracking efforts:

1 - How did I pick the prompts I am tracking today?

2 - How long have I been tracking those prompts, and when is the last time I edited or deleted any of them?

3 - How much “influence” did reviewing human inputs have on the prompts I selected?

4 - Who is the person / team / job title responsible for selecting prompts? How much time do they spend talking to customers or collaborating with people who do before selecting prompts?

How did you select the current prompts to track in your GEO tracker?

For me, when AI "search" was new, it started with prompts I made up based on keywords.

Some of us advanced quickly to using “people also ask” questions, as they felt more human.

We got a baseline of prompts approved, whew, now time to optimize, right?

How long have I been tracking these prompts?

This is a million dollar self reflection and a problem.

When we take over client accounts, many times we see a "set it and forget it" approach to prompts. They kept running the same ones because they wanted a consistent baseline.

Yet there’s nothing consistent about the moment we are in.

If you are really into optimizing for AI, you are learning new things on the daily. Prompts, memory, tools, etc., so if you are tracking the same prompts that you were tracking 6 months ago, you might have a problem.

The first problem is your team is optimizing for “baseline,” so they can have something to show you that is consistent and shows they have improved your visibility.

How is that a problem?

Here’s why…

It is easier to show your boss the same consistent dataset improving than to have the harder strategic conversation, the one that goes: "I shouldn’t show you much trended over time because the world is changing, so I will only offer snapshots." I got this from Jennifer Brain Mennes.

She's the Global Head of Digital Marketing, Strategy & Innovation at Mondelez, and she said at a recent panel we were on together about AI ... “I don’t trend out any of my AI reports, everything is a snapshot because I don't want my C levels expecting a consistent report.”

That is an example of having the harder convo.

The second problem, if you are using prompts from 8+ months ago: is your team questioning itself and asking, "Has our customer changed at all in 8 months?"

The closer you get to customer inputs, the more you realize your prompts (at least for now) should be in some state of flux, because customers are just learning how to search AI.

This is AI optimization. Your prompt selection strategy should keep up with an AI response world that is changing daily.

Do we compare our tracked prompts against human inputs?

This is why I say that every attempt to optimize for AI via generative engine optimization should include as much customer input into prompt selection as possible.

Where to Find Real Customer Language

Paid Query Data

Real customer language in short phrases. Flawed, but it forces you to ask: do your prompts sound like your customers?

Trade-off: Low context, but high volume and authenticity.

Best for: Aligning prompt language to real customer voice

Call Transcripts

Full conversational context. Even 5 calls compared against your prompt library can reveal meaningful gaps.

Trade-off: Token limits require selective use.

Best for: Grounding prompts in real conversational context

UX Interviews

Direct observation of how people interact with AI answers. Small sample, but the deepest insight available.

Trade-off: Small sample size and higher cost.

Best for: Understanding how people interact with answers

This post is mostly about UX interviews: what we’ve started seeing longitudinally in how people use ChatGPT to solve problems, observed in the studies we run for our clients. It is rife with small data, opinions, etc., but in spite of that it is rooted in observed behavior. It has made me question our prompts and helped us find gaps our clients' competitors are likely ignoring.

We reviewed 387 real prompts across 7 studies spanning July 2025 to June 2026.

The median prompt in June 2026 was 20 words, up 25% from 16 words in July 2025. Year over year, people are using AI to complete the specific tasks our clients care about.

People volunteer 3x more personal details in AI prompts than they did a year ago

Most GEO tracking tools contain zero personal details in their prompts

Jul 2025: 0.36 personal details per prompt

Jun 2026: 1.24 personal details per prompt

12-month change: 3x+ increase in details volunteered

This is people handing over details automatically, details that will allow for future personalization.

What happens with all that personal context people are voluntarily handing over? Garrett Sussman ran an experiment connecting Gmail to Google's AI Mode and found that email signals alone nearly tripled brand visibility in AI responses without a single change to a brand's content. The implications for how you show up in AI are worth understanding.

Now ask yourself: How many of the prompts in your GEO tracking tool contain personal details?
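
One rough way to answer that question is to scan your tracked prompts for detail-like phrases. The sketch below is a crude heuristic, not how we coded the studies: the pattern names and regular expressions are hypothetical and tuned to the electricity example, so expect both misses and false positives until you adapt them to your category.

```python
import re

# Hypothetical, deliberately crude patterns; tune them to your own category.
DETAIL_PATTERNS = {
    "home size": re.compile(r"\b\d[\d,]*\s*(?:square foot|square feet|sq\.? ?ft)\b", re.I),
    "energy usage": re.compile(r"\b\d[\d,]*\s*kwh\b", re.I),
    "household": re.compile(r"\b(?:my|our) (?:husband|wife|kids|family|pets)\b", re.I),
    # Very rough: "in" followed by a capitalized word or a [placeholder].
    "location": re.compile(r"\bin (?:[A-Z][a-z]+|\[[^\]]+\])"),
}

def personal_details(prompt: str) -> list[str]:
    """Return the names of the detail patterns found in a prompt."""
    return [name for name, pattern in DETAIL_PATTERNS.items() if pattern.search(prompt)]

tracked_prompts = [
    "cheapest electricity providers",
    "what are the cheapest providers for electricity in [x city] for a "
    "1500 square foot house and usage of around 700 kwh a month",
]

# Details found per prompt, and the share of prompts with at least one detail.
counts = [len(personal_details(p)) for p in tracked_prompts]
share_with_details = sum(c > 0 for c in counts) / len(counts)
```

If `share_with_details` comes back near zero for your inventory, that is the gap this section is describing.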

The Prompt Inventory Problem

The standard GEO tracking setup looks something like this: you define prompts, you test prompts, you optimize for those prompts. Almost everyone is tracking only non-branded prompts. We believe a large part of your prompts should be branded, so you can understand what the AI is saying about you to the people who already know you, and optimize that.

Our tests have shown:

The fastest-growing prompt category in our research is now task delegation (prompts where the user hands the AI a job: produce an output for them or make a decision on their behalf).
In July 2025, delegation was 10% of prompts. In June 2026, it's 37%.
Keyword/search-style prompts dropped from 18% to 3% in the same period.

Are you tracking things like:

"Narrow my list to two providers you recommend based on my previous criteria."

"Create a family friendly decision matrix."

"Or, can you do it for me?"

Those are real prompts from real participants in these studies. On its own, “can you do it for me” with no context isn’t useful to track. But looking at the nudges from the AI that lead to it helps you see what to add to your prompts, so they reflect the follow-up nudges your customers are seeing when they run prompts in their AI tool of choice.

The Humanity Stack: How to Fix Your Prompt Inventory

When we built Seer, we wanted to bring humanity to the SERP: How do we build things that help people and avoid the algorithmic optimization that helps search engines but annoys people?

What we realized was that paid search queries are the customer speaking in their own voice about how to solve a problem. The keywords we picked for SEO were what we as marketers thought people were typing. Combining the two improved keyword lists and injected humanity into a process that had gone completely marketer-driven.

Wil Reynolds

Following the Data: 13 Years of Staying Ahead

2012

Setting the Standard

I said I'm going to lead a charge to call out the low quality work and hire the kinds of people who want to win in search and win for people, the people who wanted to do the harder work.

2017

Unlocking Hidden Data

I discovered that keyword conversion data Google was blocking from organic reports was sitting in paid search all along and built a system to bridge the gap others weren't seeing.

2025

The Same Lens, Applied to AI

As AI and social channels started pulling traffic from traditional search, I applied the same data-first framework to track disruption across all three in Looker Studio.

If you're overwhelmed and overloaded and just need to get something real into your GEO tracking tool today, your PPC query data is the fastest path to modifying the prompts you are selecting for AI prompt tracking - it gives you actual human voices in them.

It's not perfect. It's a floor. But it's a better floor than a spreadsheet your content team brainstormed in a conference room.

The next level up is pulling sales call transcripts and comparing them against the prompts you're currently tracking. Customers describe their problems in their language there.

The job of an AI optimization professional... mine that gap.

One practical constraint to be honest about: transcript volume will blow out your context window fast. If you try to analyze 200 calls in a single pass, you'll hit the wall.

What you actually need is a metadata layer over your transcripts — something that flags calls for specific signals (objection types, competitor mentions, decision criteria, constraint language) — so you only pull the exact quotes when you need them. The flags are searchable, so the full transcripts don't need to be loaded every time.
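
To make the metadata layer idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the flag names, the keyword lists, the `CallRecord` structure, and the sample transcripts are made up, and simple keyword matching stands in for whatever tagging method you actually use.

```python
from dataclasses import dataclass, field

# Hypothetical signal vocabulary; a real one would come from your own calls.
SIGNAL_KEYWORDS = {
    "competitor_mention": ["switch from", "compared to", "versus"],
    "constraint_language": ["budget", "must have", "can't afford"],
    "objection": ["too expensive", "not sure", "worried"],
}

@dataclass
class CallRecord:
    call_id: str
    sentences: list[str]
    # flag name -> indexes of the sentences that triggered it
    flags: dict[str, list[int]] = field(default_factory=dict)

def tag_call(call_id: str, transcript: str) -> CallRecord:
    """Split a transcript into sentences and record which ones carry a signal."""
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    record = CallRecord(call_id, sentences)
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        for flag, keywords in SIGNAL_KEYWORDS.items():
            if any(k in lowered for k in keywords):
                record.flags.setdefault(flag, []).append(i)
    return record

def quotes_for(records: list[CallRecord], flag: str) -> list[tuple[str, str]]:
    """Pull only the flagged sentences, not the full transcripts."""
    return [
        (r.call_id, r.sentences[i])
        for r in records
        for i in r.flags.get(flag, [])
    ]

# Made-up transcripts, for illustration only.
records = [
    tag_call("call-001", "We have a tight budget this year. The plan looks fine."),
    tag_call("call-002", "I am worried about the contract length. "
                         "We want to switch from our current provider."),
]
constraint_quotes = quotes_for(records, "constraint_language")
```

The point of the design is that tagging happens once per call, and later questions only load the handful of flagged sentences, so the context window holds quotes instead of whole transcripts.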

The final stage is watching real customers use an AI tool to make a decision in your category. Not surveying them after. Not asking them to describe it. Watching it happen in real time.

The Humanity Stack

Higher signal requires more investment, and reveals what lower levels structurally cannot

LEVEL 1 · START HERE: PPC Query Data

Fastest, most accessible. Gets humanity into your prompts today.

LEVEL 2 · HIGHER SIGNAL: Sales & Support Transcripts + Metadata Layer

Higher signal, requires some infrastructure to not drown in volume.

LEVEL 3 · HIGHEST SIGNAL: Live UX Research

Highest signal, almost nobody doing it. Reveals things the other two structurally cannot.

Each level requires more investment. Each level up also shows you behaviors that let you see around corners your competitors aren't. While they brag about visibility going up fast, you know the real question is: visibility for what?

People told Google things they wouldn't tell anyone else. AI is that, 10x'ed.

My favorite thing about Google Suggest was that I could get a glimpse into how people really searched. You could literally just type in something and, boom, autocomplete gave you a clue on whether there was any "there, there"... these are some oldies but goodies:

[Three screenshots of Google autocomplete suggestions]

Ewww, yall need therapy, not Google. Anyway...

Real people, real prompts

These are real prompts typed by real people in these studies that are unprompted and unguided, just a person and a chat window trying to solve a problem.

Energy/Utilities:

"what are the cheapest providers for electricity in [x city] for a 1500 square foot house and usage of around 700 kwh a month 1yr – 3yrs"

That person just told an AI their city, their home size, their monthly kWh usage, and their contract preference. In one sentence. Without being asked. Four personal data points in a natural question.

Consumer finance:

"I am interested in getting a new checking or savings account for the specific purpose of using it for all of my online purchases. I am not comfortable using my main brick and mortar bank account numbers online. I want to have a smaller account dedicated to online purchases only. I would like to get some interest back. I anticipate keeping over [xxxx] dollars at all times. I would be interested in online banks."

Existing bank relationship revealed. Minimum balance stated. Risk posture disclosed. Use case specified. Account type preference given.

Home improvement:

"I'm looking to purchase a new refrigerator. We like having a French door style. It must have an ice maker, ideally a water filter too. I'd like the front to be magnetic, and stainless is ideal. I like adjustable shelves and climate control within the fridge. Give me top 5 options. Include reviews, price, and any specials."

This indicates a specific style a user has in mind and what features they want their appliance to have.

And then the delegation prompts:

"Please find me the best option for electrical service at my home [ZIP]. I want to prioritize price per kw first, but also look for free nights or weekends and other promotions. Please evaluate the plans available and take into account all promotions and credits so you can display a list for me in order from the cheapest overall to most expensive. Please look for all available providers, not just the top 10."

This changes my content strategy for AI because now I’m thinking, if we have promotions…

Who is eligible?
Where are they posted?
How consistently are they posted on other sites?
Can I partner with those sites to make sure AI sees the same promotion across ALL sites consistently?

Personal details change the decision context

Here's the GEO implication: when a person types "1500 square foot house, 700 kWh a month, [location]" into an AI, they have just created a decision context. Everything that comes next, every recommendation, every follow-up question, every comparison happens with that context.

If your brand isn't positioned to surface in that specific constrained context, you don't exist in that conversation. And it’s not just in that query. You don’t exist in that entire thread.

In traditional search, someone typed "cheap electricity [state]" and you optimized for that. The next person got the same blank slate. In AI search, context accumulates within a session and, given memory, over a lifetime.

The 700 kWh detail in prompt one is still active in prompt five. That context is sticky.

The square footage of your home could be used 6 months from now without you ever entering it again.

A great example from a client:

User prompt: Now I want to know what types of materials should I use on the deck - what woods would you suggest for the deck?

ChatGPT response: “Good—this is the decision that quietly determines whether your space feels like a Havana courtyard… or just a basic backyard deck. I’ll be direct: not all woods age well in full afternoon sun + humidity swings (Ohio). You want something that looks better over time, not tired." [proceeds to suggest Cedar, Ipe, Pressure-treated wood, and composite with pros and cons for all]

This user told ChatGPT their location earlier in the thread, and may never have to tell it again to get more personalized answers.

How much do answers & citations differ based on square footage of my house?

That is the testing I would want to be doing. What is the "elasticity" of an answer based on changing square footage?

I would want to know early in this game: if I put in the square footage of my house as 750, 1500, 3000, and 5000 - do the answers differ? Now you might say, “Wil that is 4x’ing my number of prompts tracked and my cost.”

To that I would say “Welcome to the big leagues!” You might want to test this temporarily across models every few months. Because if you see massive swings in answers, you might need to track them more or less.
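
Here is one way that elasticity test could be sketched. The scoring method is our assumption, not an established metric: we treat each answer as the set of brands it cites and use 1 minus the average pairwise Jaccard overlap as the "elasticity" score. The brand sets below are placeholders, not real results, and the code assumes at least two variants.

```python
from itertools import combinations

TEMPLATE = (
    "what are the cheapest providers for electricity in [x city] "
    "for a {sqft} square foot house"
)
SQUARE_FOOTAGES = [750, 1500, 3000, 5000]

def build_variants(template: str, values: list[int]) -> dict[int, str]:
    """One tracked prompt per value of the detail being varied."""
    return {v: template.format(sqft=v) for v in values}

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of cited brands (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def elasticity(brands_by_variant: dict[int, set[str]]) -> float:
    """1 minus the average pairwise overlap; higher means answers swing more."""
    pairs = list(combinations(brands_by_variant.values(), 2))
    avg_overlap = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1 - avg_overlap

variants = build_variants(TEMPLATE, SQUARE_FOOTAGES)

# Placeholder results, NOT real data: brands you extracted from each answer.
observed = {
    750: {"Brand A", "Brand B"},
    1500: {"Brand A", "Brand B"},
    3000: {"Brand A", "Brand C"},
    5000: {"Brand C", "Brand D"},
}
score = elasticity(observed)
```

A score near 0 suggests square footage barely moves the answer, so one variant may be enough. A score near 1 suggests each variant deserves its own tracked prompt.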

The Follow-Ups in Prompts - Are you tracking them?

In every 2026 study where follow-up prompts were measurable, 25 to 50 percent of prompts within a session were follow-ups, not new queries, but continuations of an existing conversation.

In the July 2025 study, that number was effectively zero in the data, because the methodology wasn’t capturing the follow-ups yet. When we fixed the data collection, the follow-up rate increased significantly.

If we had taken the July 2025 number at face value, we would have built an optimization strategy around a behavior pattern that was already slipping.

Prompt 1 is the natural question we all track:

"what are the cheapest providers for electricity in [X] area 1500 square foot house and usage of around 700 kwh a month 1yr – 3yrs"

The AI returns a list. But our observed behaviors help us see what the nudges are after the original prompt, and how people react to them.

Nudge: "Would you like a comparison of the top two providers side by side, or factor in any additional preferences like green energy?"

Nudge: "Would you like to see the highest rated providers?"

Prompt 2 is the beginning of that task delegation step, filter it down for me to a list I can manage:

"Narrow my list to two providers you recommend based on my previous criteria."

Prompt 3 (task delegation):

"Create a comparison table I can show my husband."

That's three citation moments. Typical GEO tracking measures the response to prompt 1, but rarely 2 or 3.
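
If your tracker can store a session as a sequence of turns, you can check whether a brand survives from prompt 1 through prompt 3. This is a minimal sketch under that assumption: the `Turn` structure and `brand_survival` function are hypothetical names, the prompts are the ones from this session, and the answers and provider names are invented placeholders.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    prompt: str
    answer: str  # the AI response you captured for this turn

def brand_survival(turns: list[Turn], brand: str) -> list[bool]:
    """For each turn in the session, is the brand mentioned in the answer?"""
    return [brand.lower() in t.answer.lower() for t in turns]

# Prompts are from the session above; the answers are invented placeholders.
session = [
    Turn("what are the cheapest providers for electricity in [X] area 1500 square "
         "foot house and usage of around 700 kwh a month 1yr – 3yrs",
         "Options include Provider A, Provider B, and Provider C."),
    Turn("Narrow my list to two providers you recommend based on my previous criteria.",
         "I'd recommend Provider A and Provider C."),
    Turn("Create a comparison table I can show my husband.",
         "Provider A vs Provider C: price, contract length, promotions."),
]

survival = brand_survival(session, "Provider B")
```

In this made-up session, Provider B is visible at prompt 1 and gone by prompt 2, which a prompt-1-only tracker would report as a win.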

In the conversation below, you can see that this user didn't even have to go through all these prompts in sequence because the AI did it all for them in one step.

This is a perfect example of how AI can behave differently and provide different results from person to person.

[Two screenshots of the AI conversation]

Here's what a real consumer session looked like, stage by stage:

Stage 1 — Early Exploration (what most GEO tools track)

"What are some ideas for a deck?"
"I'm looking to build a deck"

This is a broad opener, very little to no personal context. These are the kind of prompts being tracked by most GEO practitioners. Yet it might be the least important moment in the decision.

Stage 2 — Specific Ideas (personal details start flowing)

The participant adds constraints unprompted:

"16'x16', family with kids and pets, keep costs low, Ohio weather"

Dimensions. Family composition. Budget signal. Climate. Four personal data points in one follow-up, none of which appeared in the opening prompt.

Stage 3 — Narrow Options (brand comparison begins)

"I want composite decking"
"Tell me about Trex vs. TimberTech"

The Ohio weather detail? That's the reason one Gemini session specifically recommended TimberTech over Trex for warmer climates. A personal detail in prompt two changed the brand recommendation in prompt three.

Stage 4 — Decision Making (format delegation)

"Put that in a comparison table"
"Give me a bullet point material list"

Stage 5 — Real World Action (conversion intent)

"Where can I buy this near me?"
"Help me find a contractor"

This is the moment GEO has been promising to capture. It happens at prompt five, but GEO tracking tools are measuring prompt one.

The brand that surfaces in prompt one gets tested again in response to prompt two. The brand that makes it to the comparison table in prompt three is the one that gets taken into a spousal conversation, and this is the actual buying decision.

The AI is now partially authoring the follow-up prompts users send.

When an AI ends its answer with "Would you like me to narrow this down by customer ratings?" it is shaping the vocabulary of the next prompt.

If it frames the follow-up around a criterion that your brand doesn't invest in, you're being filtered out by a nudge prompt the user didn't write, but the AI suggested.

This is why cross-divisional work is so important. Run a single prompt often enough, and the AI answers will show you patterns in what the likely follow-ups will be: ratings, reviews, green, etc.
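
A simple way to surface those patterns is to collect the closing question from each run of a prompt and count the themes. The sketch below assumes you have already extracted the closing questions; the theme names and keyword lists are hypothetical, and the sample nudges are the ones quoted in this post.

```python
from collections import Counter

# Hypothetical theme keywords; extend with whatever your category's nudges mention.
THEMES = {
    "ratings": ["rated", "ratings", "reviews"],
    "green": ["green energy", "renewable"],
    "comparison": ["side by side", "comparison", "compare"],
}

def nudge_themes(closing_question: str) -> set[str]:
    """Themes mentioned in the follow-up question an AI answer ends with."""
    lowered = closing_question.lower()
    return {theme for theme, words in THEMES.items() if any(w in lowered for w in words)}

def theme_frequency(nudges: list[str]) -> Counter:
    """How many runs surfaced each theme at least once."""
    counts = Counter()
    for nudge in nudges:
        counts.update(nudge_themes(nudge))
    return counts

# Closing questions collected from repeated runs of the same prompt.
nudges = [
    "Would you like a comparison of the top two providers side by side, "
    "or factor in any additional preferences like green energy?",
    "Would you like to see the highest rated providers?",
    "Would you like me to narrow this down by customer ratings?",
]
frequency = theme_frequency(nudges)
```

The themes that show up most often are the ones to take to the teams that own ratings, reviews, or sustainability, because those are the follow-ups your brand will be judged on.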

You can’t invent ratings, reviews, and environmental friendliness. Well, you can if you do listicles and other low quality tactics, but your competitors can just copy it and it’s likely short lived.

How do you, as someone managing GEO and prompt selection, show these follow ups as a way to partner with your other divisions to see what they are doing to make us legitimately a good answer for the follow on query? Writing content saying “we’re highly rated” gets you visibility without credibility. As we say, it gets you seen but not believed.

Be Seen. Be Believed. Be Chosen.

The cleanest session we captured in the research: a participant opened with a 78-word contextual prompt about a deck project, used the AI's response to learn the vocabulary of the decision, then came back with a sharp delegating prompt in 6 words: "Create a family friendly decision matrix." The complexity compressed. The decision got made. And the brand in that matrix was the brand that got chosen.

If you're only optimizing for the opening prompt, you're optimizing for the part of the conversation that matters least to the final decision.

The opening prompt is where people figure out what to ask. The follow-up prompts are where they decide.

Start Here

Before any of the intricate stuff: watch a real customer use an AI tool for 30 minutes on a decision in your category.

Then work the stack. PPC query data gets you humanity fast. Sales transcripts (with a metadata layer so you don't drown) get you the vocabulary customers use. Live UX research gets you the session behavior, the accumulation, the follow-ups, and the AI suggestions that generate the next prompt.

The GEO teams that build prompt inventories from customer behavior instead of marketer intuition are going to look back at 2025 the way we look back at the days before we combined paid and organic data. It seems obvious in retrospect. It's not obvious right now, which is exactly when it matters.

This research comes from Seer's AI Search User Testing methodology — 387 prompts across 7 studies, July 2025 to June 2026, observing real participants in real sessions. For the full methodology, see our overview here.

Wil Reynolds

CEO & Vice President

Wil Reynolds is the founder and CEO of Seer Interactive, which he started in 2002 and has grown into a 200+ person digital marketing agency. A sought-after voice on SEO and AI search, Wil has spoken at 100+ conferences worldwide, including MozCon and SearchLove, and his research has led to industry-defining insights that have helped thousands of businesses grow.

Denise Baginski

Lead, UX
