The search marketing community is trying to make sense of the leaked Yandex repository containing files listing what looks like search ranking factors.
Some may be looking for actionable SEO clues, but that’s probably not the real value.
The consensus is that the leak will be helpful for gaining a general understanding of how search engines work.
If you want hacks or shortcuts those aren’t here. But if you want to understand more about how a search engine works. There’s gold.
— Ryan Jones (@RyanJones) January 29, 2023
There’s A Lot To Learn
Ryan Jones (@RyanJones) believes that this leak is a big deal.
He’s already loaded up some of the Yandex machine learning models onto his own machine for testing.
Ryan is convinced that there’s a lot to learn but that it’s going to take a lot more than just examining a list of ranking factors.
Ryan explains:
“While Yandex isn’t Google, there’s a lot we can learn from this in terms of similarity.
Yandex uses lots of Google invented tech. They reference PageRank by name, they use Map Reduce and BERT and lots of other things too.
Obviously the factors will vary and the weights applied to them will also vary, but the computer science methods of how they analyze text relevance and link text and perform calculations will be very similar across search engines.
I think we can glean a lot of insight from the ranking factors, but just looking at the leaked list alone isn’t enough.
When you look at the default weights applied (before ML) there’s negative weights that SEOs would assume are positive or vice versa.
There’s also a LOT more ranking factors calculated in the code than what’s been listed in the lists of ranking factors floating around.
That list appears to be just static factors and doesn’t account for how they calculate query relevance or many dynamic factors that relate to the resultset for that query.”
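Ryan’s point about default static weights can be illustrated with a toy sketch. The factor names, values, and weights below are invented for illustration and are not from the leaked Yandex code; the point is only that a “static” score is often a weighted sum, and some default weights can be negative where SEOs might expect positive ones.

```python
# Hypothetical factor names and default weights (not Yandex's actual values).
FACTOR_WEIGHTS = {
    "text_relevance": 0.45,
    "link_popularity": 0.30,
    "page_freshness": 0.10,
    "url_length": -0.05,  # a factor some SEOs might assume is neutral or positive
}

def static_score(doc_factors: dict) -> float:
    """Combine normalized factor values (0..1) with their default weights."""
    return sum(FACTOR_WEIGHTS[name] * doc_factors.get(name, 0.0)
               for name in FACTOR_WEIGHTS)

page = {"text_relevance": 0.8, "link_popularity": 0.6,
        "page_freshness": 0.9, "url_length": 1.0}
print(round(static_score(page), 3))
```

In a real system like the one described, a machine learning model would then adjust or replace these defaults, which is why the static list alone doesn’t tell the whole story.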
More Than 200 Ranking Factors
It’s commonly repeated, based on the leak, that Yandex uses 1,923 ranking factors (some say fewer).
Christoph Cemper (LinkedIn profile), founder of Link Research Tools, says that friends have told him that there are many more ranking factors.
Christoph shared:
“Friends have seen:
- 275 personalization factors
- 220 “web freshness” factors
- 3,186 image search factors
- 2,314 video search factors
There is a lot more to be mapped.
Probably the most surprising for many is that Yandex has hundreds of factors for links.”
The point is that it’s far more than the 200+ ranking factors Google used to claim.
And even Google’s John Mueller said that Google has moved away from the 200+ ranking factors.
So maybe that will help the search industry move away from thinking of Google’s algorithm in those terms.
Nobody Knows Google’s Entire Algorithm?
What’s striking about the data leak is that the ranking factors were collected and organized in such a simple way.
The leak calls into question the idea that Google’s algorithm is highly guarded and that nobody, not even at Google, knows the entire algorithm.
Is it possible that there’s a spreadsheet at Google with over a thousand ranking factors?
Christoph Cemper questions the idea that nobody knows Google’s algorithm.
Christoph commented to Search Engine Journal:
“Someone said on LinkedIn that he could not imagine Google ‘documenting’ ranking factors just like that.
But that’s how a complex system like that needs to be built. This leak is from a very authoritative insider.
Google has code that could also be leaked.
The often repeated statement that not even Google employees know the ranking factors always seemed absurd for a tech person like me.
The number of people that have all the details will be very small.
But it must be there in the code, because code is what runs the search engine.”
Which Parts Of Yandex Are Similar To Google?
The leaked Yandex files tease a glimpse into how search engines work.
The data doesn’t show how Google works. But it does offer an opportunity to view part of how a search engine (Yandex) ranks search results.
What’s in the data shouldn’t be confused with what Google might use.
Nevertheless, there are interesting similarities between the two search engines.
MatrixNet Is Not RankBrain
One of the interesting insights being dug up relates to the Yandex neural network called MatrixNet.
MatrixNet is an older technology introduced in 2009 (archive.org link to announcement).
Contrary to what some are claiming, MatrixNet is not the Yandex version of Google’s RankBrain.
Google RankBrain is a limited algorithm focused on understanding the 15% of search queries that Google hasn’t seen before.
An article in Bloomberg revealed RankBrain in 2015. The article states that RankBrain was added to Google’s algorithm that year, six years after the introduction of Yandex MatrixNet (Archive.org snapshot of the article).
The Bloomberg article describes the limited purpose of RankBrain:
“If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.”
MatrixNet on the other hand is a machine learning algorithm that does a lot of things.
One of the things it does is to classify a search query and then apply the appropriate ranking algorithms to that query.
Here is part of what the 2016 English-language announcement of the 2009 algorithm states:
“MatrixNet allows [us to] generate a very long and complex ranking formula, which considers a multitude of various factors and their combinations.
Another important feature of MatrixNet is that [it] allows [us to] customize a ranking formula for a specific class of search queries.
Incidentally, tweaking the ranking algorithm for, say, music searches, will not undermine the quality of ranking for other types of queries.
A ranking algorithm is like complex machinery with dozens of buttons, switches, levers and gauges. Commonly, any single turn of any single switch in a mechanism will result in global change in the whole machine.
MatrixNet, however, allows [us] to adjust specific parameters for specific classes of queries without causing a major overhaul of the whole system.
In addition, MatrixNet can automatically choose sensitivity for specific ranges of ranking factors.”
MatrixNet does a whole lot more than RankBrain; clearly, the two are not the same.
But what’s notable about MatrixNet is that ranking factors are dynamic: it classifies search queries and applies different sets of factors to each class.
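That per-class behavior can be sketched in miniature. Everything here is hypothetical (the query classes, the classifier, the factor names and weights are invented for illustration, not taken from the Yandex code); the sketch only shows why tuning one class of queries doesn’t disturb the others.

```python
# Hypothetical per-class weight sets; in MatrixNet these would be learned formulas.
QUERY_CLASSES = {
    "music": {"artist_match": 0.6, "freshness": 0.1, "link_score": 0.3},
    "news":  {"artist_match": 0.0, "freshness": 0.7, "link_score": 0.3},
}

def classify_query(query: str) -> str:
    # Trivial keyword stand-in for a learned query classifier.
    return "music" if ("lyrics" in query or "album" in query) else "news"

def rank_score(query: str, factors: dict) -> float:
    """Score a document using the weight set for the query's class."""
    weights = QUERY_CLASSES[classify_query(query)]
    return sum(w * factors.get(name, 0.0) for name, w in weights.items())
```

Because each class has its own weight set, adjusting the `"music"` weights leaves `"news"` rankings untouched, which is the property the Yandex announcement describes.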
MatrixNet is referenced in some of the ranking factor documents, so it’s important to put MatrixNet into the right context so that the ranking factors are viewed in the right light and make more sense.
It may be helpful to read more about the Yandex algorithm in order to help make sense out of the Yandex leak.
Read: Yandex’s Artificial Intelligence & Machine Learning Algorithms
Some Yandex Factors Match SEO Practices
Dominic Woodman (@dom_woodman) has some interesting observations about the leak.
Some of the leaked ranking factors coincide with certain SEO practices such as varying anchor text:
Vary your anchor text baby!
4/x pic.twitter.com/qSGH4xF5UQ
— Dominic Woodman (@dom_woodman) January 27, 2023
Alex Buraks (@alex_buraks) has published a mega Twitter thread about the topic that has echoes of SEO practices.
One such factor Alex highlights relates to optimizing internal links in order to minimize crawl depth for important pages.
Google’s John Mueller has long encouraged publishers to make sure important pages are prominently linked to.
Mueller discourages burying important pages deep within the site architecture.
John Mueller shared in 2020:
“So what will happen is, we’ll see the home page is really important, things linked from the home page are generally pretty important as well.
And then… as it moves away from the home page we’ll think probably this is less critical.”
It matters to keep key pages close to the main pages through which site visitors enter.
So if links point to the home page, then the pages linked from the home page are, in turn, viewed as more important.
John Mueller didn’t say that crawl depth is a ranking factor. He simply said that it signals to Google which pages are important.
The Yandex rule cited by Alex uses crawl depth from the home page as a ranking rule.
#1 Crawl depth is a ranking factor.
Keep your important pages closer to main page:
– top pages: 1 click from the main page
– important pages: <3 clicks pic.twitter.com/BB1YPT9Egk
— Alex Buraks (@alex_buraks) January 28, 2023
It makes sense to consider the home page as the starting point of importance and to assign less importance the further a page sits, in clicks, from it.
There are also Google research papers with similar ideas (the Reasonable Surfer Model, the Random Surfer Model), which calculate the probability that a random surfer will end up at a given webpage simply by following links.
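The Random Surfer Model can be illustrated with a minimal power-iteration sketch over a tiny, made-up link graph. This is the classic PageRank-style calculation, not anything taken from the leaked code; the score approximates the probability that a surfer who follows random links (with occasional random jumps) lands on each page.

```python
# Hypothetical three-page site: each page maps to the pages it links to.
links = {
    "home":  ["about", "blog"],
    "about": ["home"],
    "blog":  ["home", "about"],
}

def pagerank(links, damping=0.85, iters=50):
    """Power iteration over the link graph (Random Surfer Model)."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Each page starts with the "random jump" probability mass...
        new = {p: (1 - damping) / len(pages) for p in pages}
        # ...then receives a share of rank from every page linking to it.
        for page, outs in links.items():
            for out in outs:
                new[out] += damping * rank[page] / len(outs)
        rank = new
    return rank

ranks = pagerank(links)
# "home" accumulates the most probability because every page links to it.
```

The same intuition underlies the crawl-depth observation: pages reachable from many well-linked pages accumulate more of the surfer’s probability mass.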
Alex found a factor that prioritizes important main pages:
#3 Backlinks from main pages are more important than from internal pages.
Make sense. pic.twitter.com/Mts9jHsRjE
— Alex Buraks (@alex_buraks) January 28, 2023
The rule of thumb for SEO has long been to keep important content not more than a few clicks away from the home page (or from inner pages that attract inbound links).
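That rule of thumb is easy to check mechanically: click depth from the home page is just a breadth-first search over internal links. The site structure below is hypothetical (a real audit would crawl and parse the actual pages), but the calculation is the standard one.

```python
from collections import deque

def click_depth(links: dict, start: str = "home") -> dict:
    """Shortest click distance from `start` to every reachable page (BFS)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for out in links.get(page, []):
            if out not in depth:  # first visit = shortest path in clicks
                depth[out] = depth[page] + 1
                queue.append(out)
    return depth

# Hypothetical internal link graph.
site = {
    "home": ["products", "blog"],
    "products": ["widget-a"],
    "blog": ["post-1"],
    "post-1": ["widget-a"],
}
print(click_depth(site))
```

Pages that come back more than about three clicks deep would be the candidates for stronger internal linking under the guideline above.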
Yandex Update Vega… Related To Expertise And Authoritativeness?
Yandex updated their search engine in 2019 with an update named Vega.
The Yandex Vega update featured neural networks that were trained with topic experts.
This 2019 update had the goal of introducing search results with expert and authoritative pages.
But search marketers who are poring over the documents haven’t yet found anything that correlates with signals like author bios, which some believe relate to the expertise and authoritativeness Google looks for.
Learn, Learn, Learn
We’re in the early days of the leak and I suspect it will lead to a greater understanding of how search engines generally work.
Featured image: Shutterstock/san4ezz