A glimpse into Google's machine learning?

You’ve likely seen the People Also Ask (Related Questions) boxes in SERPs. These accordion-like question and answer boxes are Google’s way of saying, “Hey, you beautiful searcher, you! These questions also relate to your search... maybe you're interested in exploring these too? Kick off your shoes, stay a while!”

However, few people have come across infinite PAAs. These occur when you expand a PAA question box to see 2 or 3 other related questions appear at the bottom. These infinite PAA lists can continue into the hundreds, and I've been lucky enough to come across 75+ of these gems!

So, grab a coffee and buckle up! I’d like to take you on a journey of my infinite PAA research, discoveries, machine learning hypothesis, and how you can find PAA opportunities.



Why PAAs should matter to you

PAAs have seen a 1,723% growth in SERPs since 7/31/15 via Mozcast!

Compare that to featured snippets, which have seen only a 328% growth over the same timeframe.

Research has also shown that a single PAA can show up in 21 unique SERPs! How 'bout dem apples?! PAA opportunities can take over some serious SERP real estate.

My infinite PAA obsession

These mini-FAQs within search results have fascinated me since Google started testing them in 2015. Then in November 2016, I discovered Google's PAA dynamic testing:



You guys, I've discovered a SERP black hole! I'm on #200 suggested PAA for this SERP?! Has anyone else seen an infinite PAA SERP before? pic.twitter.com/YgZDVWdWJ9

— Britney Muller (@BritneyMuller) November 23, 2016

The above infinite PAA expanded into the hundreds! This became an obsession of mine as I began to notice them across multiple devices (for a variety of different searches) and coined them “PAA Black Holes.”

I began saving data from these infinite PAAs to see if I could find any patterns, explore how Google might be pulling this data, and dive deeper into how the questions/topics changed as a result of my expanding question boxes, etc.

After seeing a couple dozen infinite PAAs, I began to wonder if this was actually a test to implement in search, but several industry leaders assured me this was more likely a bug.



They were wrong.

Infinite People Also Ask boxes are live

Infinite PAAs are now integrated into U.S. SERPs (sorry foreign friends, but get ready for this to potentially migrate your way), and you can play with them on desktop & mobile:

If you're in the US and like exploring topics, there's a nifty feature for you to try with "People also ask" on Google. :-) pic.twitter.com/s2WtwyYvun

— Satyajeet Salgar (@salgar) February 10, 2017

I’m fascinated by Satyajeet's use of “exploring topics”.

Why does Google want people to spend more time on individual SERPs (instead of looking at several)? Could they charge more for advertisements on SERPs with these sticky, expansive PAAs? Might they eventually start putting ads in PAAs? These are the questions that follow me around like a shadow.



To get a better idea of the rise of PAAs, here's a timeline of my exploratory PAA research:

PAA timeline

April 17, 2015 - Google starts testing PAAs

July 29, 2015 - Dr. Pete gets Google to confirm preferred “Related Questions” name

Aug 15, 2015 - Google tests PAA Carousels on desktop

Dec 30, 2015 - Related Questions (PAAs) grow +500% in 5 months

Mar 11, 2016 - See another big uptick in Related Questions (PAAs) in Mozcast

Nov 11, 2016 - Robin Rozhon notices PAA Black Hole

Nov 23, 2016 - Brit notices PAA Black Hole

Nov 29, 2016 - STAT Analytics publishes a research study on PAAs

Dec 12, 2016 - Realized new PAA results would change based on expanded PAA

Dec 14, 2016 - Further proof PAAs dynamically load based on what you click

Dec 19, 2016 - Still seeing PAA Black Holes

Dec 22, 2016 - Discovered a single PAA result (not a 3-pack)

Jan 11, 2017 - Made a machine learning (TensorFlow) discovery and hypothesis!

Jan 22, 2017 - Discovered a PAA Black Hole on a phone

Jan 25, 2017 - Discovered a PAA Black Hole that maxed out at 9

Feb 10, 2017 - PAA Black Holes go live!

Feb 14, 2017 - Britney Muller is still oblivious to PAA Black Holes going live and continues to hypothesize how they are being populated via entity graph-based ML.

3 big infinite PAA discoveries:

#1 - Google caters to browsing patterns in real time

It took me a while to grasp that I can manipulate the newly populated question boxes based on what I choose to expand.

Below, I encourage more Vans-related PAAs by clicking “Can I put my vans in the washing machine?” Then, I encourage more “mildew”-related ones simply by clicking a “How do you get the mildew smell out of clothes” PAA above:



Another example of this is when I clicked “organic SEO” at the very top of a 100+ PAA Black Hole (the gif would make you dizzy, so I took a screenshot instead). It altered my results from “how to clean leather” to “what is seo” and “what do you mean by organic search”:



#2 - There are dynamic dead ends



When I reach an exhaustive point in my PAA expansions (typically ~300+), Google will resurface the first two PAAs, as if to say: “We aren’t sure what else to provide, are you interested in these again?”

Here is an example of that happening: I go from “mitosis”-related PAAs (~300 PAAs deep) to a repeat of the first two PAAs: “What is Alexa ranking based on?” and “What is the use of backlinks?”:

This reminds me of a story told by Google machine learning engineers: whenever an early ML model couldn’t identify a photograph, it would fall back to a default ‘I don’t know’ answer: “Men talking on cell phone.” It could have been a picture of an elephant dancing, and if the ML model wasn’t sure what it was, it would say “Men talking on cell phone.”

My gut tells me that Google reverts to the strongest edges for your original query (the first two PAAs) once it runs out of PAAs above a certain relational threshold.

It will then suggest the third and fourth PAA when you push these limits to repeat again, and so on.
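To make that gut feeling concrete, here is a minimal Python sketch of a dead-end fallback. Everything in it is an assumption for illustration: the function name, the 0.5 threshold, the edge-strength scores, and the two "hypothetical" questions are made up, and none of it is confirmed Google behavior.

```python
def next_paas(candidates, original_paas, fallback_round, threshold=0.5, k=2):
    """Pick up to k questions to show next.

    candidates: (question, edge_strength) pairs that have not been shown yet.
    original_paas: the PAAs from the original query, in their original order.
    fallback_round: how many times we have already hit a dead end.
    Returns (questions, updated_fallback_round).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    strong = [question for question, strength in ranked if strength >= threshold]
    if strong:
        return strong[:k], fallback_round
    if not original_paas:
        return [], fallback_round
    # Dead end: recycle the original PAAs two at a time
    # (first and second, then third and fourth, and so on).
    start = (fallback_round * k) % len(original_paas)
    return original_paas[start:start + k], fallback_round + 1


ORIGINAL = [
    "What is Alexa ranking based on?",
    "What is the use of backlinks?",
    "Hypothetical third PAA?",
    "Hypothetical fourth PAA?",
]

# ~300 PAAs deep, only a weak (made-up) mitosis edge is left.
weak = [("What happens during mitosis?", 0.1)]
first, round_ = next_paas(weak, ORIGINAL, fallback_round=0)
second, round_ = next_paas(weak, ORIGINAL, fallback_round=round_)
print(first)
print(second)
```

The first call returns the first two original PAAs, and pushing the limit again returns the third and fourth, which matches the repeat pattern described above.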

#3 - Expand & retract one question to explore the most closely related questions

This not only provides you with the most relevant PAAs to the query you're expanding and retracting, but if it’s in your wheelhouse, you can quickly discover other very relevant PAA opportunities.

Here I keep expanding and retracting "What is the definition of SEO?":

Notice how “SEO” or “search engine optimization” is in every subsequent PAA!? This is no coincidence and has a lot to do with the entity graph.

First, let's better understand machine learning and why an entity-based, semi-supervised model is so relevant to search. I’ll then draw out what I think is happening with the above results (like a 5-year-old), and go over ways you can capture these opportunities! Woohoo!

Training data's role in machine learning

Mixups are commonplace in machine learning, mostly due to a lack of quality training data.



Well-labeled training data is typically the biggest component necessary in training an accurate ML model.



Fairly recently, the voice search team at Google came across an overwhelming amount of EU voice data that was interpreted as “kdkdkdkd.” The sound was an obvious exclusion from their training data (who says “kdkdkdkd”?!), and the engineers had no idea what could be prompting it. After some confusion, they finally figured out that it was the trains and subways making that noise!

This is a silly example, but Google is now able to account for these pesky "kdkdkdkd" inclusions.

Relational data to the rescue

Because we don’t always have enough training data to properly train an ML model, we look to relational data for help.

Example: If I showed you the following picture, you could gather a few things from it, right? Maybe that it appears to be a female walking down a street, and that perhaps it’s fall by her hat, scarf, and the leaves on the ground. But it’s hard to determine a whole lot else, right?

What about now? Here are two other photos from the above photo’s timeline:

Aha! She appears to be a U.S. traveler visiting London (with her Canon T3i camera). Now we have some regional, demographic, and product understanding. It’s not a whole lot of extra information, but it provides much more context for the original cryptic photo, right?

Perhaps, if Google had integrated geo-relational data with their voice machine learning, they could have more quickly identified that these noises were occurring at the same geolocations. This is just an example; Google engineers are WAY smarter than I am and have surely thought of much better solutions.

Google leverages entity graphs similarly for search

Google leverages relational data (in a very similar way to the above example) to form better understandings of digital objects to help provide the most relevant search results.

A kind of scary example of this is Google’s Expander: A large-scale ML platform to “exploit relationships between data objects.”

Machine learning is typically “supervised” (labeled training data is provided, which is more common) or “unsupervised” (no labeled training data). Expander, however, is “semi-supervised,” meaning that it’s bridging the gap between provided and not-provided data. ← SEO pun intended!

Expander leverages a large, graph-based system to infer relationships between datasets. Ever wonder why you start getting ads about a product you started emailing your friend about?

Expander is bridging the gap between platforms to better understand online data and is only going to get better.
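For a feel of what "semi-supervised" on a graph can look like, here is a small Python sketch of label propagation, a standard graph-based technique in which a few labeled nodes spread their labels to unlabeled neighbors. This is a stand-in I've chosen for illustration, not Expander's actual algorithm (which isn't described here); the graph, edge weights, and the "holiday"/"marketing" labels are all made up.

```python
def propagate_labels(edges, seed_labels, iterations=10):
    """Spread a few known labels across a weighted graph.

    edges: dict of node -> {neighbor: edge_weight} (symmetric).
    seed_labels: dict of node -> label for the few nodes we have labels for.
    Returns dict of node -> inferred label (None if no label ever reached it).
    """
    labels = sorted(set(seed_labels.values()))
    scores = {node: {label: 0.0 for label in labels} for node in edges}
    for node, label in seed_labels.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, neighbors in edges.items():
            if node in seed_labels:
                updated[node] = scores[node]  # labeled nodes stay fixed
                continue
            total = sum(neighbors.values())
            # Each unlabeled node takes the weighted average of its neighbors.
            updated[node] = {
                label: (sum(w * scores[nb][label] for nb, w in neighbors.items()) / total
                        if total else 0.0)
                for label in labels
            }
        scores = updated

    result = {}
    for node, node_scores in scores.items():
        best = max(node_scores, key=node_scores.get)
        result[node] = best if node_scores[best] > 0 else None
    return result


EDGES = {
    "halloween tradition": {"what are the traditions of halloween": 0.9},
    "what are the traditions of halloween": {"halloween tradition": 0.9,
                                             "halloween costume ideas": 0.4},
    "halloween costume ideas": {"what are the traditions of halloween": 0.4},
    "what is seo": {"what is the definition of seo": 0.8},
    "what is the definition of seo": {"what is seo": 0.8,
                                      "what is organic search": 0.5},
    "what is organic search": {"what is the definition of seo": 0.5},
}
SEEDS = {"halloween tradition": "holiday", "what is seo": "marketing"}

inferred = propagate_labels(EDGES, SEEDS)
for query, label in inferred.items():
    print(f"{query!r} -> {label}")
```

Only two of the six queries start with a label; the other four pick one up purely from their connections, which is the "bridging the gap" idea in miniature.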

Relational entity graphs for search

Here is a slide from a Google I/O 2016 talk that showcases a relational word graph for search results:

Solid edges represent stronger relationships between nodes than the dotted lines. The above example shows there is a strong relationship between “What are the traditions of halloween” and “halloween tradition,” which makes sense. People searching for either of those would each be satisfied by quality content about “halloween traditions.”

Edge strength can also be determined by distributional similarity, lexical similarity, similarity based on word embeddings, etc.
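As a rough illustration of one of those signals, the Python sketch below scores lexical similarity between two queries using word overlap (Jaccard similarity). It's an assumption-heavy toy: the crude plural stripping and the function names are mine, and it says nothing about how Google actually weights edges. Distributional and embedding-based similarity would need far more data than fits here.

```python
import re


def tokens(text):
    """Lowercase word set with a crude plural strip ("traditions" -> "tradition")."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words}


def lexical_similarity(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


strong = lexical_similarity("What are the traditions of halloween", "halloween tradition")
weak = lexical_similarity("What are the traditions of halloween", "What is a smo brace?")
print(f"halloween pair: {strong:.2f}")
print(f"unrelated pair: {weak:.2f}")
```

The halloween pair scores noticeably higher than the unrelated pair, which is the kind of difference a solid versus dotted edge is meant to capture.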

Infinite PAA machine learning hypothesis:

Google is providing additional PAAs based on the strongest relational edges to the expanded query.

You can continue to see this occur in infinite PAA datasets. When a term with two meanings appears in the suggested PAAs, the topic changes because of it:

The above topic change occurred through a series of small relational suggestions. A PAA above this screenshot was “What is SMO stands for?” (not a typo, just a neural network doing its best, people!) which led to "What is the meaning of SMO?" and then to “What is a smo brace?” (for ankles).

This immediately made me think of the relational word graph and what I envision Google is doing:

My hypothesis is that the machine learning model computes that because I’m interested in “SMO,” I might also be interested in ankle brace “SMO.”
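Here is how that hypothesis might look as a few lines of Python: a tiny question graph where each expansion surfaces the not-yet-shown questions with the strongest edges to whatever was just clicked. The edge weights are invented, and two of the questions ("What is social media optimization?" and "What is an ankle brace used for?") are hypothetical fillers, so treat this as a sketch of the idea rather than a description of Google's system.

```python
def suggest(graph, expanded, shown, k=2):
    """Return up to k unseen questions with the strongest edges to `expanded`."""
    neighbors = graph.get(expanded, {})
    ranked = sorted(neighbors.items(), key=lambda item: item[1], reverse=True)
    return [question for question, _ in ranked if question not in shown][:k]


GRAPH = {
    "What is SMO stands for?": {
        "What is the meaning of SMO?": 0.9,
        "What is social media optimization?": 0.7,
        "What is a smo brace?": 0.3,
    },
    "What is the meaning of SMO?": {
        "What is SMO stands for?": 0.9,
        "What is a smo brace?": 0.6,
        "What is social media optimization?": 0.5,
    },
    "What is a smo brace?": {
        "What is an ankle brace used for?": 0.8,
        "What is the meaning of SMO?": 0.6,
    },
}

shown = {"What is SMO stands for?"}
step_one = suggest(GRAPH, "What is SMO stands for?", shown)
shown.update(step_one)

step_two = suggest(GRAPH, "What is the meaning of SMO?", shown)
shown.update(step_two)

step_three = suggest(GRAPH, "What is a smo brace?", shown)
print(step_one, step_two, step_three, sep="\n")
```

Three clicks take the list from social media to ankle braces, with each hop following the strongest remaining edge.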

There are ways for SEOs and digital marketers to leverage topical relevance and capture PAA opportunities.

4 ways to optimize for machine learning & expand your topical reach for PAAs:

Topical connections can always be made within your content, and by adding high-quality, topically related content, you can strengthen your content’s edges (and expand your SERP real estate). Here are some quick and easy ways to discover related topics:

#1: Quickly discover Related Topics via MozBar

MozBar is a free SEO browser add-on that allows you to do quick SEO analysis of web pages and SERPs. The On-Page Content Suggestions feature is a quick and simple way to find other topics related to your page.

Step 1: Activate MozBar on the page you are trying to expand your keyword reach with, and click Page Optimization:

Step 2: Enter in the word you are trying to expand your keyword reach with:

Step 3: Click On-Page Content Suggestions for your full list of related keyword topics.

Step 4: Evaluate which related keywords can be incorporated naturally into your current on-page content. In this case, it would be beneficial to incorporate “seo tutorial,” “seo tools,” and “seo strategy” into the Beginner’s Guide to SEO.

Step 5: Some may seem like an awkward add to the page, like “seo services” and “search engine ranking,” but are relevant to the products/services that you offer. Try adding these topics to a better-fit page, creating a new page, or putting together a strong FAQ with other topically related questions.

#2: Wikipedia page + SEOBook Keyword Density Checker*

Let’s say you're trying to expand your topical keywords in an industry you’re not very familiar with, like "roof repair." You can use this free hack to pull in frequent and related topics.

Step 1: Find and copy the roof Wikipedia page URL.

Step 2: Paste the URL into SEOBook’s Keyword Density Checker:

Step 3: Hit submit and view the most commonly used words on the Wikipedia page:

Step 4: You can dive even deeper (and often more topically related) by clicking on the "Links" tab to evaluate the anchor text of on-page Wikipedia links. If a subtopic is important enough, it will likely have another page to link to:

Step 5: Use any appropriate keyword discoveries to create stronger topic-based content ideas.

*This tactic was mentioned in an Experts On The Wire episode on keyword research tools.
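If you'd rather script the word-count part of this hack, a rough Python equivalent is below. It is not how SEOBook's tool works internally (that's unknown to me); it simply counts the most frequent non-trivial words in text you've already copied from a page. The stopword list is deliberately tiny and the sample paragraph is made up, not quoted from Wikipedia.

```python
import re
from collections import Counter

STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "with",
}


def top_terms(text, n=10):
    """Return the n most common words, ignoring stopwords and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    meaningful = [w for w in words if w not in STOPWORDS and len(w) > 2]
    return Counter(meaningful).most_common(n)


SAMPLE = (
    "A roof is the top covering of a building. Roof materials include "
    "shingles, tiles, and metal. Shingles are common on a pitched roof, "
    "while a flat roof often uses a membrane."
)

for word, count in top_terms(SAMPLE, n=2):
    print(word, count)
```

On the sample text this surfaces "roof" and "shingles" as the top two terms; on a full page you would want a longer stopword list and probably two- and three-word phrases as well.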

#3: Answer the Public

Answer the Public is a great free resource to discover questions around a particular topic. Just remember to change your country if you’re not seeking results from the UK (the default).

Step 1: Enter in your keyword/topic and select your country:

Step 2: Explore the visualization of questions people are asking about your keyword:

Note: Not all questions will be relevant to your research, like “why roof of mouth hurts” and “why roof of mouth itches.”

Step 3: Scroll back up to the top to export the data to CSV by clicking the big yellow button (top right corner):

Step 4: Clean up the data and upload the queries to your favorite keyword research tool (Moz Keyword Explorer, SEMRush, Google Keyword Planner, etc.) to discover search volume and SERP feature data, like featured snippets, reviews, related questions (PAA boxes), etc.

Note: Google’s Keyword Planner does not support SERP features data and provides vague, bucket-based search volume.
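The clean-up in Step 4 can be as simple as the Python sketch below, which drops duplicates and anything containing terms you know are off-topic. The "question" column name is an assumption (check the header row of your actual export), and the two roof-repair questions in the sample are invented for the demo.

```python
import csv
import io


def clean_questions(csv_text, exclude_terms, column="question"):
    """Return unique, normalized questions that contain none of the exclude terms."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    kept = []
    for row in reader:
        # Lowercase and collapse extra whitespace so near-duplicates match.
        question = " ".join((row.get(column) or "").lower().split())
        if not question or question in seen:
            continue
        if any(term in question for term in exclude_terms):
            continue
        seen.add(question)
        kept.append(question)
    return kept


EXPORT = """question
why roof of mouth hurts
how to repair a roof leak
How to repair a roof  leak
why roof of mouth itches
when to replace a roof
"""

cleaned = clean_questions(EXPORT, exclude_terms=["mouth"])
print(cleaned)
```

To run it on a real file, read the file's contents into a string first and pass that in place of EXPORT.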

#4: Keyword research “only questions”

Moz Keyword Explorer provides an “only questions” filter to uncover potential PAA opportunities.

Step 1: Enter your keyword into KWE:

Step 2: Click Keyword Suggestions:

Step 3: Filter by “are questions”:

Pro tip: Find grouped question keyword opportunities by grouping keywords by “low lexical similarity” and ordering them from highest search volume to lowest:

Step 4: Select keywords and add to a new or previous list:

Step 5: Once in a list, KWE will tell you how many “related questions” (People Also Ask boxes) opportunities are within your list. In this case, we have 18:

Step 6: Export your keyword list to a Campaign in Moz Pro:

Step 7: Filter SERP Features by “Related Questions” to view PAA box opportunities:

Step 8: Explore current PAA box opportunities and evaluate where you currently rank for “Related Questions” keywords. If you’re on page 1, you have a better chance of stealing a PAA box.

+Evaluate what other SERP features are present on these SERPs. Here, Dr. Pete tells me that I might be able to get a reviews rich snippet for “gutter installation”. Thanks, Dr. Pete!
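If your keyword list lives in a spreadsheet rather than in Keyword Explorer, a similar "only questions" pass can be approximated in Python. This is my own rough stand-in, not KWE's filter logic: it treats a keyword as a question when it starts with a common question word, then sorts by search volume. The keywords and volumes are invented sample data.

```python
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which",
    "can", "do", "does", "is", "are", "should",
}


def is_question(keyword):
    """True when the keyword starts with a common question word."""
    words = keyword.lower().split()
    return bool(words) and words[0] in QUESTION_WORDS


def question_keywords(keywords):
    """Keep question-style keywords, ordered from highest volume to lowest.

    keywords: list of (keyword, monthly_search_volume) pairs.
    """
    questions = [(kw, volume) for kw, volume in keywords if is_question(kw)]
    return sorted(questions, key=lambda pair: pair[1], reverse=True)


KEYWORDS = [
    ("gutter installation", 2900),
    ("what is a gutter guard", 400),
    ("roof repair cost", 1900),
    ("how to install gutters", 1300),
    ("how much does gutter installation cost", 700),
]

for keyword, volume in question_keywords(KEYWORDS):
    print(volume, keyword)
```

A prefix check like this will miss questions phrased without a question word, so treat the output as a starting list rather than a complete one.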

Hopefully, this research can help energize you to do topical research of your own to grab some relevant PAAs! PAAs aren't going away anytime soon and I'm so excited for us to learn more about them.

Please share your PAA experiences, questions, or comments below.