Utilizing LDA to Silo Your Website

by on October 14, 2010 | posted in SEO Theory

EDITOR’S NOTE: It has come to my attention that a decent amount of this post is lacking in Information Retrieval fundamentals, as pointed out by the fine folks over at SEO Dojo Radio. Please take the following recommendations with a grain of salt.

Latent Dirichlet Allocation (LDA) has ridden the rollercoaster of hype as of late – going from being called the #1 ranking factor to an onslaught of criticism and, eventually, a retraction of the claim that it is likely the #1 ranking factor. And now here we are, the dust has settled, and what are we left with?

A practice that still matters. Even if it is eventually proved to have zero ranking significance today, it is a practice that should be implemented and weighed heavily moving forward.

If you look at LDA more holistically, you can see why it has to be implemented as a best practice. It just makes sense for search engines to rank websites this way. Even if it’s being completely ignored at present, or is too complex to implement accurately in the algorithm, using it would improve the search engines. If I mention tennis four times in this blog post and throw tennis in the front of the title tag, it shouldn’t rank for tennis. There has to be more to it.

We shouldn’t be getting disruptive links today, even though they pass value, and for the same reason this LDA thing is something we should be doing from here on. It’s a best practice. It’s going to protect us, and more importantly, it’s not going to hurt – especially if we’re in the practice of procuring great content.

On that point, content writers everywhere should rejoice. The more systematic tweaking content requires, both to blend in SEO intricacies and to maintain the all-important user experience, the more outdated and unsatisfactory cheap, outsourced content factories become.

Great content isn’t dead – although overly natural content might be.

Silos and LDA

Siloing your website is an advanced topic, and as such, you might not be too up on it. I suggest you read Bruce Clay’s post on SEO and Siloing and also Michael Gray’s recent multi-piece series on How to Silo Your Website. Both are authoritative, comprehensive treatments of the subject, and they cover more than I can possibly address in this entry.

For a high-level view of siloing, the main idea is to segregate your website into separate sections without diluting the theme of any of them. For example, you might be able to rank for both “toys” and “books” on the same domain, even though the subjects are completely different. Doing this requires organizing the website so that unrelated subjects and topics don’t blend together within pages, such as randomly linking to a women’s toy section from your page on Twilight books.

The difficulty lies in segregating these sections while still passing PageRank effectively and ensuring the crawlability of your website. If you completely disconnect your toys section from your books section, you likely lose some PageRank, along with some of the domain strength you would’ve gained by connecting them.

So, the best practice here is to connect your top-level pages, such as a /toys page to a /books page, and let all the /books/twilight type pages take care of themselves within each separate section.
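
To make the linking rule concrete, here is a minimal sketch of how you might audit internal links against it. It assumes the silo is simply the first segment of the URL path and that a “top-level page” has exactly one segment; the function names and sample links are hypothetical, not taken from any real site or tool.

```python
from urllib.parse import urlparse

def path_segments(url):
    """Split a URL path into its non-empty segments."""
    return [part for part in urlparse(url).path.split("/") if part]

def silo_of(url):
    """Assumption: the silo is the first path segment, e.g. '/books/twilight' -> 'books'."""
    segments = path_segments(url)
    return segments[0] if segments else ""

def is_top_level(url):
    """Assumption: a top-level page has at most one path segment, e.g. '/books'."""
    return len(path_segments(url)) <= 1

def cross_silo_violations(links):
    """Return (source, target) links that cross silos below the top level."""
    violations = []
    for source, target in links:
        same_silo = silo_of(source) == silo_of(target)
        both_top_level = is_top_level(source) and is_top_level(target)
        if not same_silo and not both_top_level:
            violations.append((source, target))
    return violations

# Hypothetical internal links as (source page, target page) pairs.
links = [
    ("/toys", "/books"),                # top level to top level: allowed
    ("/books", "/books/twilight"),      # stays inside the books silo: allowed
    ("/books/twilight", "/toys/dolls"), # deep page to another silo: flagged
]
print(cross_silo_violations(links))
```

A crawl export of your own internal links could be fed in the same way; how strict to be about the homepage or shared navigation is a judgment call this sketch does not try to settle.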

However, even when you’ve done that, you’ve still managed to dilute your site. You have random text on each of your subsection pages that says “Kitchen”, or “Electronics”, or “Jewelry”, as in HSN’s case. It should be no surprise, then, that when we look at their rankings on Google for the page shown, which targets the keyword “NFL shop”, they’re around 20th.

Even with a super-strong domain, they seemingly have little chance against other websites that are entirely NFL-focused. Their LDA score is approximately 65%, and they are competing against front page websites that average a score of better than 90%.
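
To get a feel for why unrelated navigation text drags a page down, here is a rough sketch. It is not LDA: a real LDA score compares topic distributions learned from a large corpus, whereas this stand-in only computes the cosine similarity between raw term counts of the query and the page. The sample page text is invented for illustration.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text and count its alphanumeric terms."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = "nfl shop"
# Hypothetical page text: the same content with and without department navigation.
focused_page = "nfl shop nfl jerseys nfl team gear shop"
diluted_page = focused_page + " kitchen electronics jewelry fashion beauty"

focused_score = cosine_similarity(term_counts(query), term_counts(focused_page))
diluted_score = cosine_similarity(term_counts(query), term_counts(diluted_page))
print(round(focused_score, 3), round(diluted_score, 3))
```

The diluted page scores lower even though its NFL content is identical, which is the same direction of effect the LDA numbers above suggest. The absolute values here mean nothing and should not be compared with the 65% and 90% figures.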

But wait! HSN still has hope.

NFL Sites Walk into the Right Answer – Shopping Sites Have to Work for It

The thing about the competing NFL sites is they often stumble into a high LDA score. They created content on page revolving around their main focus, and naturally, it matched machine data for what was relevant for the query they wanted to rank for, NFL shop.

Shopping sites likely created on-page content revolving around their target query, but it seems likely they didn’t use LDA to fundamentally tweak the score and show as much on-page relevancy as possible. NFL sites probably don’t do this, either.

For shopping sites like HSN, LDA-focused content writing is an absolute requirement. Every time they include “Jewelry” or “Electronics” in the website text, they must offset that by being laser-focused with the rest of the on-page content. By doing this while competing with a more laissez-faire, NFL-specific site, they can sometimes actually deliver a page with more content relevancy than the NFL-specific sites themselves.
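
As a back-of-the-envelope illustration of that offsetting, suppose (and this is purely an assumption, not how LDA is defined) that relevancy behaves like the share of on-topic words on the page. Then the amount of focused copy needed to absorb a fixed block of off-topic navigation words follows from simple arithmetic. The function name and numbers are hypothetical.

```python
def on_topic_words_needed(off_topic_words, target_percent):
    """Smallest on-topic word count n such that
    n / (n + off_topic_words) >= target_percent / 100.

    Assumption: relevancy is modeled as a plain word share, not a real LDA score.
    """
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be in the range 0-99")
    numerator = target_percent * off_topic_words
    denominator = 100 - target_percent
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-numerator // denominator)

# Hypothetical page with 20 off-topic navigation words.
print(on_topic_words_needed(20, 65))
print(on_topic_words_needed(20, 90))
```

Under this toy model, 20 navigation words need 38 on-topic words to reach a 65% share but 180 to reach 90%, so each extra department link gets more expensive the more focused you want the page to look.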

Those that don’t are the ones that end up on the third or fourth page. Some websites with a strong niche focus can probably drift by and potentially dominate SERPs simply by chance – but for those mammoth, gargantuan domains with millions of uniques, this approach can be crippling.

A Methodology for Improvement

Since we’re using HSN as an example, there are explicit things they can do to improve their on-page relevancy that can also be applied universally for whatever website you might be working on.

The Sidebar Navigation

The first is in their sidebar. Almost all of their navigation elements lack real relevancy to the term “NFL shop” itself.

I question whether many of their sidebar navigation elements are even used, or are otherwise worth the negative cost their inclusion creates. I find it highly doubtful that people prefer to see the “shop by price” option over the ability to click straight to their team without needing an extra click on the “See All” option. Price categories are more important for high-ticket items, not jerseys. Similarly, the brands that make NFL gear are pretty much a non-variable to casual NFL fans – to the extent that they might not help an LDA score for “NFL shop”.

The important point, though, is that they can be much more deliberate about including NFL-specific anchor text in their internal links. It’s possible that this will help the linked pages rank for each anchor and also help this page seem more relevant for “NFL shop”. Changing the above things will go a long way toward upping their LDA score.

Page-Level Content

They also lack focused on-page content, choosing instead to make the product display the focus. They have some unique content, but it’s short and somewhat topically scattered. They could expand it by a few more sentences and tighten the terminology to help pull up the LDA score and drive more organic traffic.

The Footer

Finally, HSN’s footer element has a lot of fluff and links that probably go unused. It is my estimation that the footer largely goes ignored on pages other than the homepage, so for a site such as this, especially one with so many unnecessary links and so much text, I would recommend completely eliminating the footer on the page level.

It’s possible the search engines completely discount this part of the page as it pertains to topical relevancy. If they’re capable of knowing which parts of the page are most important under the reasonable surfer model, it’s very possible they similarly devalue, or just don’t care about, the text in this footer section. BUT, as good practice, it seems beneficial to eliminate something that is, at best, a largely unused navigational element.


Besides this specific example, here are some things that many of these large, “shopping” focused sites should be very conscious of so they don’t get outdone by the smaller, topically relevant competitors in their space:

  • Use only one top-level navigational element: If HSN links to “Jewelry”, “Fashion”, etc. in their masthead navigation, they should take care not to do so again in the side navigation. Although it might slightly improve UX in some places, the gain will not be worth the resulting drop in LDA score.
  • Be very deliberate in creating page-level anchors for side navigation elements. If possible, modify your sidebar on a page-by-page basis to match the topical relevancy on page. If that’s not possible, at the very least create very strong vertical-specific internal anchors on the sidebar.
  • Eliminate content fluff. Each of these shopping sites has to be extremely “lean” – eliminating any elements that don’t measurably improve user experience, especially if they also hurt the focus on-page. The actual written content must be heavy in keyword-specific terms, and not wander into more casual language.
  • Remove the footer where necessary. On websites like SEOMoz, the footer isn’t something that’s going to hurt their scores page-to-page – mostly because they do a good job of including search-specific terms. On sites like HSN, though, the footer is completely irrelevant and only serves to dilute the page theme. If your site has a minimalistic footer with little text, its usefulness may outweigh the small amount it takes away from your page-to-page LDA score – so always weigh these factors when deciding whether or not to include it on your page.
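
The page-by-page sidebar idea from the list above can be sketched in a few lines. This assumes the page’s section is the first segment of its URL path and that you keep a lookup of vertical-specific links per section; the link data, paths, and function name are all hypothetical and not HSN’s actual structure.

```python
# Hypothetical sidebar links grouped by site section (anchor text, URL path).
SIDEBAR_LINKS = {
    "nfl": [("NFL Jerseys", "/nfl/jerseys"), ("NFL Hats", "/nfl/hats")],
    "jewelry": [("Rings", "/jewelry/rings"), ("Necklaces", "/jewelry/necklaces")],
}

def sidebar_for(page_path, links_by_section=SIDEBAR_LINKS):
    """Return only the sidebar links that belong to the page's own section.

    Assumption: the section is the first path segment, e.g. '/nfl/shop' -> 'nfl'.
    Pages in an unknown section get an empty sidebar rather than site-wide links.
    """
    segments = [segment for segment in page_path.split("/") if segment]
    section = segments[0] if segments else ""
    return links_by_section.get(section, [])

print(sidebar_for("/nfl/shop"))
```

Whether an unknown section should fall back to an empty sidebar or to a short generic one is a UX decision; the sketch only shows that the anchors can be chosen per page instead of repeated site-wide.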

A Case for MayDay and a Disclaimer

If you at all remember the MayDay update – and if you’re a serious SEO, you do – you remember the apocalyptic situation where many enterprise-level sites lost much of their long tail traffic.

If we think about the above situation – how these sites were impacted – and the recent discovery of LDA – there seems to be some inference that perhaps, during this infamous period, LDA was increased/implemented as a ranking factor. Suddenly, subpar, third-tier content pages on these massive sites began dropping back.

Many were the above-mentioned types of websites, which had the two following characteristics:

  1. Poor, barely passable content deep on the website
  2. Need for advanced siloing

These kinds of pages, whose content creation was probably cheaply outsourced to uncaring, unknowledgeable 3rd parties, frequently suffered. On the top level pages where content was stronger and link building was heavier, it wasn’t so much the case. Some domains weren’t so heavily affected, despite their massive size – and it seems possible that these domains may have been the beneficiaries of accidental LDA.

This is only an inference, and of course, I have no data to prove it, but there seems to be some connection there, if only from an opinionated person close to the situation.

As for the earlier recommendations, please note that it is not clear how much LDA itself, on one page, can offset other thematic factors of your website. It seems likely that some information about your website’s thematic relevancy comes from incoming links and from the entire body of content your website presents. How much this factors in is hard to tell, and might have even changed by the time you finish reading this post.

Despite this disclaimer, I still think the main point remains: Keeping your page-to-page topical relevancy as laser-focused as possible is absolutely required when your website requires advanced siloing – especially in competitive niches.

Good luck – and let a high LDA score be with you.
