Caffeine, Mayday, Panda & Penguin: four updates to Google's indexation and ranking algorithms since 2009 that have changed the SEO landscape forever. Let's see what these four updates brought to the table.
Google made between 350 and 550 changes to its organic search algorithms in 2009, and this was before the Caffeine & Mayday updates. This is one of the reasons why SEO specialists and webmasters should not lose sleep over total pages indexed or individual ranking factors, whether on-page or off-page. It's physically not possible to adjust to every change Google makes every single day.
Then came the Caffeine and Mayday updates in mid-2010.
Google's June 2010 Caffeine update to its indexation system was a revolution: it changed how web pages are indexed. Google described it this way:
With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.
Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
The Mayday update (read the review from Vanessa Fox) was targeted at eCommerce websites that create thousands of URLs based on long-tail keywords and try to get them all indexed in Google. This is how Mayday impacted such sites:
Last week at Google I/O, I was on a panel with Googler Matt Cutts who said, when asked during Q&A, “this is an algorithmic change in Google, looking for higher quality sites to surface for long tail queries. It went through vigorous testing and isn’t going to be rolled back.”
I asked Google for more specifics and they told me that it was a rankings change, not a crawling or indexing change, which seems to imply that sites getting less traffic still have their pages indexed, but some of those pages are no longer ranking as highly as before. Based on Matt’s comment, this change impacts “long tail” traffic, which generally is from longer queries that few people search for individually, but in aggregate can provide a large percentage of traffic.
This change seems to have primarily impacted very large sites with “item” pages that don’t have many individual links into them, might be several clicks from the home page, and may not have substantial unique and value-added content on them. For instance, ecommerce sites often have this structure. The individual product pages are unlikely to attract external links and the majority of the content may be imported from a manufacturer database. Of course, as with any change that results in a traffic hit for some sites, other sites experience the opposite. Based on Matt’s comment at Google I/O, the pages that are now ranking well for these long tail queries are from “higher quality” sites (or perhaps are “higher quality” pages).
In simple English – this means –
a) for an eCommerce website, whether B2B or B2C, creating a few static or dynamic URLs plus millions of keyword-rich, search-based URLs will not TRANSLATE into higher rankings in Google for the site, nor into more traffic.
b) the number of pages indexed may jump from a few hundred thousand to millions, but after Mayday, pages (or URLs) with low visibility and thin or low-quality content are ranked so far down the order that the chances of people clicking on them all but disappear.
Thus, in short: Caffeine made getting millions of pages indexed easier, and Mayday made it easier for the top-quality, well-optimized pages among those indexed to rank. Focus shifted from:
> getting pages indexed to
> getting all pages indexed to…
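The Mayday description quoted above singles out item pages that are several clicks from the home page and have few links pointing to them. As a rough illustration, the Python sketch below computes both signals from an internal link graph. The sample graph, the page paths and the function names are all hypothetical, and this is only one way to audit your own site, not a description of how Google measures anything.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first search: fewest clicks needed to reach each page from `home`.

    `links` maps a page to the list of pages it links to (hypothetical data).
    Pages that cannot be reached from `home` are simply absent from the result.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def inbound_counts(links):
    """Count how many distinct pages link to each page."""
    counts = {}
    for source, targets in links.items():
        for target in set(targets):
            counts[target] = counts.get(target, 0) + 1
    return counts

# Hypothetical internal link graph for a small shop.
SAMPLE_LINKS = {
    "/": ["/shoes", "/bags"],
    "/shoes": ["/shoes/red", "/"],
    "/shoes/red": ["/shoes/red/size-9"],
    "/bags": [],
}

depths = click_depths(SAMPLE_LINKS, "/")
inbound = inbound_counts(SAMPLE_LINKS)

# Pages that are deep in the site AND have at most one internal link
# pointing at them match the profile described in the Mayday quote.
at_risk = sorted(p for p, d in depths.items() if d >= 3 and inbound.get(p, 0) <= 1)
```

With this sample graph, only the size-specific product page ends up in `at_risk`; the thresholds (3 clicks, 1 inbound link) are arbitrary and would need tuning for a real site.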
Then came the Panda update in 2011, which changed everything again.
Panda was aimed at thin and low-quality content. The SEO model has changed with Panda in that, rather than getting as many URLs as you can indexed, you now want only your highest-quality, most important URLs indexed. As outlined in the article (link provided above):
Consistent signals should be sent as to which pages are most important:
- Decide which URLs are canonical and create strong signals (rel canonical, robot exclusion, internal link profile, XML sitemaps)
- Decide which URLs are your most valuable and ensure they are indexed and well optimized
- Remove any extraneous, overhead, duplicate, low value and unnecessary URLs from the index
- Build internal links to canonical, high-value URLs from authority pages (strong mozRank, unique referring domains, total links, are example metrics)
- Build high-quality external links via social media efforts (this was specifically targeted in Penguin)
Pay special attention to number 3 above. If your properties have low-quality or significantly duplicative content, it is best to remove those URLs from the indexes. Even a site with some high-quality content and lots of thin or low-quality content could see traffic deterioration because of Panda.
The new SEO, at least as far as Panda is concerned, is about pushing your best quality stuff and the complete removal of low-quality or overhead pages from the indexes. Which means it’s not as easy anymore to compete by simply producing pages at scale, unless they’re created with quality in mind. Which means for some sites, SEO just got a whole lot harder.
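One of the signals listed above is rel canonical. As a minimal sketch of the idea, the Python below collapses URL variants that differ only in tracking or session parameters into one canonical URL, groups the duplicates, and builds the matching `<link rel="canonical">` tag. The list of ignored parameters, the example URLs and the function names are assumptions made for illustration; which parameters are safe to drop depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonical_url(url):
    """Normalize a URL: lowercase scheme and host, drop ignored
    parameters, sort the remaining ones, strip trailing slash and fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )

def group_duplicates(urls):
    """Map each canonical URL to the list of variants that collapse into it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return groups

def canonical_link_tag(url):
    """Build the tag to place in the <head> of every variant of the page."""
    return '<link rel="canonical" href="{}">'.format(escape(canonical_url(url), quote=True))

# Hypothetical crawl output: three variants of one page, plus one distinct page.
CRAWLED = [
    "https://Example.com/shoes/?utm_source=newsletter&color=red",
    "https://example.com/shoes?color=red&sessionid=abc123",
    "https://example.com/shoes?color=red",
    "https://example.com/bags",
]

groups = group_duplicates(CRAWLED)
```

Any group with more than one variant is a candidate for a canonical tag, a redirect, or removal from the XML sitemap.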
With the Penguin update, the focus is on “how to have great-quality content on the site, so that other sites will wish to connect with us (by referring to us on their pages)”.
Which means that links from social media channels (Facebook, LinkedIn, Twitter, Pinterest and, most importantly, WordPress/Tumblr blogs) will get indexed very quickly and show up in results the same day or the next, provided the content is rich, has value and adds visibility to the “topic or site in question”. Two examples from my own experience:
a) I wrote a blog post on Elitify.com, a fashion site for men, 2 days ago; the post was indexed from my WordPress blog the same day, and it now ranks 4th in Google for the term “elitify”.
b) I wrote a blog post in 2011 on the passing away of our former CTO & friend, Vijay Marada. When one searches for the keyword “toboc” in Google image search, the image embedded in that post, written many months ago, now comes up as the first image. The image is called “toboc-training.png”.
Thus, post Penguin, social media will play an ever-increasing role in how pages rank.
To illustrate, say site A, selling item “XYZ”, has 5,000 pages indexed. Of these, 500 are URLs with good-quality content relevant to the overall meaning of the brand; the remaining 4,500 were created for the sake of creation and got indexed during the Panda update. Another site, B, selling the same item “XYZ”, has only 800 pages created and indexed, from its website, from social media channels (Facebook, Pinterest, Tumblr, YouTube and WordPress) and from other PR websites, of which over 600 are good-quality content pages or links from sites with ‘good to better to excellent’ PR. Post Penguin, site B is more likely than site A to get the MAJORITY of its URLs indexed and ranked, which also means site B is likely to get more visits (hits) than site A from people searching for item “XYZ”.
But how is that possible? Site A has 5,000 pages indexed, Site B only has 800 pages indexed.
a) for site A, 500 URLs out of 5,000 (10%) are good; for site B, 600 URLs out of 800 (75%) are good;
b) post Panda & Penguin, “quality content”, “relevant content from search” & “social media backlinks to site” are the governing factors.
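The comparison above comes down to simple arithmetic: the share of each site's indexed URLs that are good. The short Python sketch below works it out with the numbers from the example. The `quality_share` name is made up here, and the ratio only illustrates the argument; it is not a metric Google publishes.

```python
def quality_share(good_urls, indexed_urls):
    """Fraction of a site's indexed URLs that are considered good."""
    if indexed_urls <= 0:
        raise ValueError("indexed_urls must be positive")
    return good_urls / indexed_urls

# Numbers taken from the site A / site B example in the text.
site_a = quality_share(500, 5000)
site_b = quality_share(600, 800)

print(f"Site A: {site_a:.0%} of indexed URLs are good")  # Site A: 10% ...
print(f"Site B: {site_b:.0%} of indexed URLs are good")  # Site B: 75% ...
```

Site B has fewer pages in total but more good pages in absolute terms (600 vs 500) and a far higher share of them, which is the point of the example.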
Timeline with important factors:
- 2009 – Google made between 350 and 550 changes to its organic search algorithms
- 2010 – Caffeine update – targeted at “indexation” – resulted in faster, more complete indexation and fresher results in Google search
- 2010 – Mayday update – targeted at “ranking” – resulted in better ranking of URLs with greater visibility and lower ranking of long-tail URLs with lower visibility
- 2011 – Panda update – targeted at “indexation” – webmasters should make an effort to remove duplicate URLs and duplicate content, and build better links from good PR sites
- 2012 – Penguin update – targeted at “ranking” – social media backlinks to a site become more important and relevant for indexation and ranking in Google search results