
Hash-bang: Good for the Web, but Not Ready for Prime Time


In light of Gawker’s new design and the hash-bang controversy, I offer our clients my personal opinion. My opinion may change as new developments and information appear. I will update this as I learn more, but for now, here’s the bottom line:

I don’t recommend that our clients build web pages using the single-page, non-refreshing hash-bang technique if they want those pages to rank on search engines, unless they account for the added technical effort involved.

If you do want to develop a page using the hash-bang technique, make sure your team considers how to make AJAX pages crawlable by Google. Given the added technical effort involved, you may see better results by focusing on marketing metrics such as product buzz and PR rather than on SEO metrics such as rank.

A non-technical history of the subject, and my personal opinion:

The hash-bang technique tries to solve a computer science problem rooted in a 20-year-old anachronism: every time the page refreshes, such as when you click a link, any information gathered on the first page is lost before you reach the second page.

Twenty years ago, everything was static. No logins, no JavaScript, no AdWords, no analytics; nothing was personalized. It was always the same static page. Because there was no need to pass information between pages, browsers and web servers were never built to do so.

The implication travels miles deep.  Think about it:  If you lose information gathered on one page before seeing the next, how can you enter your username and password on one page to get access to the next?  How does Google Analytics know your travel path from one page to the next?

Well, fast-forward a little and the problem got solved. Sort of. The browser stores the information gathered from the first page in a cookie, which is kept on your computer. The second page reads the cookie, so it can see the information the first page gathered. It’s a programmatic headache and a security problem, but I’ll spare you the details.
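To make the idea concrete, here is a minimal sketch of how a page might read that cookie back. This is not any particular site’s code; the cookie names are hypothetical.

```javascript
// Hypothetical sketch: recovering state the previous page stored in a cookie.
// parseCookies turns the raw cookie string into an object the new page can read.
function parseCookies(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split(';')) {
    const index = pair.indexOf('=');
    if (index === -1) continue; // skip malformed fragments
    const name = pair.slice(0, index).trim();
    const value = decodeURIComponent(pair.slice(index + 1).trim());
    cookies[name] = value;
  }
  return cookies;
}

// The first page stored a session id; the second page reads it back.
const state = parseCookies('sessionId=abc123; theme=dark');
```

In a browser, the raw string would come from `document.cookie`; on a server, from the `Cookie` request header.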

The fundamental problem still exists: you cannot keep information around if you refresh the page.  The problem is so deeply rooted on the Internet, it’s almost impossible to root out.  It has twenty years of web sites, content, browsers, companies, and people on top of it.

Ultimately, the hash-bang technique attempts to solve that problem by never refreshing the page. No refresh means nothing gets lost. But no refresh also means nothing new appears on the page. To show new content, the browser uses JavaScript and AJAX to fetch it and update the page in place.
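A rough sketch of how that works, under hypothetical URL conventions: everything after the `#!` names the content to load, and the page swaps it in without ever reloading.

```javascript
// Hypothetical sketch of hash-bang routing. The fragment after "#!"
// identifies the content; the page itself never reloads.
function contentPathFromHash(hash) {
  // "#!/stories/42" -> "/stories/42"
  if (!hash.startsWith('#!')) return null; // not a hash-bang URL
  return hash.slice(2);
}

// In a real page, a hashchange listener would fetch this path via AJAX
// and swap the result into the page, e.g.:
//   window.addEventListener('hashchange', () => {
//     const path = contentPathFromHash(window.location.hash);
//     if (path) loadContentViaAjax(path); // hypothetical helper
//   });

const path = contentPathFromHash('#!/stories/42');
```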

This is great for websites like Twitter.  The page can constantly change without needing to be refreshed.  It’s more like a desktop application now.  Gawker wanted to move toward a more constant flow of stories.  The content is always fresh and the page is always up to date.

But GoogleBot is not a browser. It copies the text on the page, sends it back to Google, and moves on. It does not click, scroll, or even read. It doesn’t run JavaScript, so on a hash-bang page nothing appears at all, and it has nothing to send back to Google.

And as with any innovation in web development, Google needs to change its search algorithms and crawlers to accommodate it. An AJAX page that never refreshes and can change its content endlessly is fundamentally different from a traditional page. To crawl it, the bots have to act more like browsers and less like bots.

To make this easier, Google has proposed that developers provide HTML snapshots to help GoogleBot along. That opens a can of worms for Google, but that’s another discussion. For now, make sure your developers study how Google wants AJAX pages made crawlable. It does add work for them, which is why you should weigh whether the effort is worth it for the goal your project is trying to achieve.
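The core of Google’s scheme is a URL rewrite: the crawler turns a hash-bang URL into an “ugly” URL containing `_escaped_fragment_`, and the server is expected to answer that request with a pre-rendered HTML snapshot. A simplified sketch of the rewrite (the example URL is hypothetical, and Google’s specification defines the exact escaping rules; this sketch simply percent-encodes the whole fragment):

```javascript
// Sketch of the rewrite in Google's AJAX crawling scheme:
// "http://example.com/#!/stories/42"
//   -> "http://example.com/?_escaped_fragment_=%2Fstories%2F42"
function escapedFragmentUrl(url) {
  const bang = url.indexOf('#!');
  if (bang === -1) return url; // nothing to rewrite
  const base = url.slice(0, bang);
  const fragment = url.slice(bang + 2);
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + encodeURIComponent(fragment);
}

const crawlerUrl = escapedFragmentUrl('http://example.com/#!/stories/42');
```

Your server would detect `_escaped_fragment_` in incoming requests and return the HTML snapshot for that piece of content — that snapshot is the extra work the technique demands.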

In my opinion, the real problem is that HTML is a 20-year-old anachronism, and it needs to be squished out of existence to move the Internet and technology forward. Technically and philosophically, the problem can be solved with techniques like hash-bang. The Internet will benefit from a richer user experience that’s easier to use and drives customer happiness, but for now, those techniques add a layer of complexity to make pages work for crawlers.

So at this time, consider it a bleeding-edge web development technique. Developers should study Google’s guidance on making AJAX applications crawlable. Project managers and SEO advisors should weigh the added technical effort needed to make pages crawlable by Google. Is it worth the added effort, or do you need to focus on different goals? Our account managers can help you decide whether developing hash-bang pages is right for you and how to maximize the potential return.