In my career I have come across many online businesses, from large e-commerce websites and content portals to small profile websites for professionals such as lawyers and surgeons, that have suffered a platform migration in the name of improvement, leading to lost or dramatically decreased organic search engine rankings and, in some cases, even unhappy visitors. The emphasis is on “suffered” because almost all of them were so excited about, and wrapped up in, the idea of a better-looking or better-functioning website that they either ignored, or deemed it a good compromise to ignore, one of the most basic aspects of the move: the addresses by which their pages are connected to the world, i.e. URLs. The following is meant to help you understand why your URL infrastructure deserves consideration as an important aspect of a migration, and one that can be retained in part or in whole with varying levels of effort, from simple work-arounds such as one-to-one mapping to more technical implementations such as URL rewriting.
Why Do the Search Engines or I Care about My URL Structure?
Whether static or dynamic, the URL addresses of the web pages that make up a website are the direct connection between them and the outside world: they are used to advertise, share and even access the pages and their content. If direct page URLs did not exist, we would always have to enter a website at the homepage level and follow one or more usually repetitive steps every time to get to the page in mind. A not-so-well-thought-out real-life analogy would be that of having one community doorbell for all apartments in a building, which creates extra steps–and in this case even chaos–every time that one tries to access one of the units.
The tangible, even monetary, value of the public URL structure of an e-commerce website can be realized in the revenue generated from the organic, hard-to-obtain, free search engine traffic that it brings, amongst many other things such as return and referral visits by customers, social bookmarks, etc. Without getting into the granular details of search engine ranking algorithms: as search engines visit a website, they paint a picture of its URL structure in the search index they use to determine which web page is most suitable for a given keyword. They analyze the siblings, parents and children of URLs to determine a context for them and, needless to say, they use those very URLs to retrieve the content that also weighs in on relevance to a search term.
It is not hard to imagine that changing or removing a URL that search engines have come to know over time, as explained in the previous paragraph, is quite similar to pulling the proverbial rug from under the rankings that the URL in question enjoys at the discretion of the search engines. I hardly think there is a need to justify that organic search engine traffic, as compared to pay-per-click (PPC) search advertising, holds quite a high ROI value for any online business, but the need to change the URL structure of a website is real and sometimes unavoidable.
If Change is Unavoidable, but Search Engines Do Not Like It, then What to Do?
Most importantly, one must always leave the burden of proof on the “URL change.” This will automatically guide any migration at least towards the right path: if compromises are required, they’ll be pushed on to the correct side; if work-arounds are explored, they’ll be to save the right thing.
If it is determined that the URL structure must indeed change due to a different hardware or software infrastructure (e.g. new platform, different programming language, change in operating system or web server), or perhaps to improve web usability or even search engine placement (e.g. more memorable URLs, fewer duplicate URLs), then there are certain ways to help mitigate the risk of losing your organic search engine rankings.
A URL’s Identity: What Exactly Am I Protecting?
By now, you have realized that your URLs, or at least some of them, deserve attention prior to being tossed about violently and left to their own devices to survive a website move. This attention is aimed at retaining as much of each URL’s identity as possible from the following angles:
- The URL Itself: The address can either be kept as it is or redirected with a server-side 301 redirect to the new destination of the URL. A less universally accepted approach is to let the old URL deliver the same content as the new URL and, instead of redirecting old to new, use a canonical link tag (rel="canonical") that tells the search engines to treat only the new URL as the main URL of the page. Most major search engines generally honor this tag, but the approach hasn’t been shown to add any value compared to the 301-redirect method, and it is not an appropriate use of the canonical tag, since the 301 redirect is specifically meant for “permanent moves.”
- The URL’s Page: The URL itself is seldom the only thing changing during a website migration. Most changes are within the page that the URL points to, such as the title, content, images, design (i.e. HTML). Close attention has to be paid not to dramatically increase the HTML-to-text ratio of pages, which is not a difficult matter these days with universal use of CSS and better HTML coding standards than ever before. After all, notifying search engines that you have moved but showing them an entirely different page than they are used to, is not really going to score the vote of confidence you’re looking for from them; they’ll want to reconsider everything since you will seem to be offering entirely new pages. If this seems limiting, you just need a better web designer or developer, as virtually any improvement can be made to a page while maintaining its search engine identity. More detailed discussion is beyond the scope of this article, but will hopefully be covered in future ones.
- The URL’s Context: Imagine moving your organic, locally sourced cafe business according to protocol, notifying your customers of the new address and maintaining your menu and ingredient choices, but relocating from the serene countryside into the food court of a heavily commercialized shopping mall. The very change of venue is upsetting, but the lack of familiarity between your business and its surroundings means that even your loyal patrons will need to rebuild their confidence in your business. This is precisely the same with URLs: you need to make sure important URLs have the same or similar parents (URLs that link to them), children (URLs they link to) and siblings (URLs that their parents link to, and URLs that link to their children). This might seem complicated, but unless you move a category URL to the customer service section or a product review article to the shipping policy section, you should be covered. Just be mindful of the overall place of your URLs within the link structure of your website.
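To make the first angle above a little more concrete, here is a minimal sketch of a server-side 301 redirect, written as a plain Python WSGI application. The paths in REDIRECTS are made up for illustration, and a real site would more likely do this in the web server or platform configuration; the point is only what the old URL should answer with.

```python
# Hypothetical old-to-new pairs; a real site would load these from its own data.
REDIRECTS = {
    "/catalog.php?cat=12": "/garden/tools",
}


def app(environ, start_response):
    """Minimal WSGI app: answer known old URLs with a permanent (301) redirect."""
    path = environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    # Rebuild the requested URL, including the query string if there is one.
    old_url = path + ("?" + query if query else "")

    new_url = REDIRECTS.get(old_url)
    if new_url is None:
        # Unknown URL: this sketch simply reports it as not found.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]

    # 301 tells browsers and search engines that the move is permanent.
    start_response("301 Moved Permanently", [("Location", new_url)])
    return [b""]
```

The status line is the part that matters: a 302 or a JavaScript redirect would send visitors to the same place, but would not carry the same "permanent move" signal.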
Go into It Prepared: Compile Lists.
Prepare for the decision by aggregating factual and statistical data into actionable lists. Do not delve into the actual remedy yet, as you’ll read later on that sometimes it is easier to fix everything at once than treat URLs one by one. Some important lists with brief usage explanations are below, but not all may be available or even needed in different scenarios. However, the more data that is gathered, the easier it will be to make a case for each compromise and work-around.
- List of URLs listed by search engines for your highest-ranking organic keywords in Google, Yahoo and even Bing, as well as any niche search engines that may be sending you a lot of traffic. These URLs will not only have to be either maintained or carefully redirected, but also the pages they deliver need to be carefully examined from an SEO standpoint and meticulously reconstructed within the new website. Some things to watch out for are meta title changes, HTML-to-text ratios, in- and out-bound links (count, anchors and URLs), and of course content itself.
- List of URLs with highest non-search-engine referral traffic from social networks such as Facebook and Twitter, bookmarking sites such as Digg and Del.icio.us, and any website that somehow has linked to your website and is sending traffic to it. These URLs will have to be handled as delicately as the ones in the previous list; however you may have more freedom in changing content as long as the overall value and message for the visitors is retained.
- List of URLs (if any) you provide on your website for direct bookmark creation i.e. “add to favorites.” Just remember to provide your new URLs for manual bookmarking and redirect the old ones to their new destinations.
- List of your site’s external in-bound links with destinations to determine which URL these links point to. If able to compile this list, you may sometimes find URLs that add value to your website rankings but did not appear in the first two lists mentioned above. Treat them the same way you would an organically ranking URL.
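If the lists above come out of different tools as separate exports, merging them into one inventory can be done with a few lines of scripting. The following is only a sketch in Python; the source names and URLs are invented, and it assumes each list has already been reduced to plain URL strings.

```python
from collections import defaultdict


def build_inventory(lists_by_source):
    """Merge several URL lists into one mapping of URL -> set of sources listing it."""
    inventory = defaultdict(set)
    for source, urls in lists_by_source.items():
        for url in urls:
            inventory[url.strip()].add(source)
    return dict(inventory)


def prioritize(inventory):
    """Order URLs so those that appear in the most lists come first."""
    return sorted(inventory, key=lambda url: (-len(inventory[url]), url))


if __name__ == "__main__":
    # Hypothetical data standing in for analytics and link-report exports.
    lists = {
        "organic_search": ["/product.php?ProductID=1001", "/catalog.php?cat=12"],
        "referrals": ["/product.php?ProductID=1001"],
        "inbound_links": ["/product.php?ProductID=1001", "/about.php"],
    }
    for url in prioritize(build_inventory(lists)):
        print(url)
```

A URL that shows up in several lists at once is a strong candidate for the most careful treatment during the move.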
Once you have these lists, and remember how similar changing the URL of a well-ranked or -linked page is to shutting down your business and moving without notifying your customers, you will already feel more certain that you are well-equipped to make a decision in either side’s favor. If the lists are meager and unimportant–for example because most of your business is due to paid advertising or offline traffic generation venues–you’ll confidently change your URLs to the most optimized state as mandated by the new requirements; you might even go far enough to implement mere best-practice recommendations to avoid any further changes in the URL structure for as long as possible. However, far more often than not, the lists and traffic related to your current URLs are substantial and need more consideration and attention.
Quick-and-Dirty Or Thorough: How to Actually Make the Move with the URLs.
Now that you know why and what to protect, the path to it will be more easily trodden. In general there are two different approaches: Static or Dynamic–this is my wording for it, and can definitely be improved. The former calls for directing each old URL to a new one often individually, while the latter pushes all URLs through a filter that will find a new home for them dynamically based on patterns programmed into it.
- Static URL Mapping: Contrary to what its name may suggest, this can be done either manually or using some scripting and a database, but what makes it static is the fact that each URL is mapped to its new counterpart by patterns invisible to the application that runs the new website. For instance, a human editor may have decided that all category pages now point to a new URL where some parameters are re-ordered, the name of the category is used instead of the ID, and unnecessary data such as sort-order is dropped from the URL. This logic may then have been applied in a spreadsheet to a large list of URLs, with only the end result fed to the website application, telling it where each of the old URLs is supposed to point. This is sometimes desirable when the number and exact format of URLs being handled are mostly known. It is very quick, but lacks scalability and requires redoing if a new batch of old URLs is discovered to have been ignored initially.
- Dynamic URL Mapping: Just as in the previous method, patterns in URL sets are detected and then used to map the old to the new. However, the difference is that these patterns are programmed into a filter within the website application instead of living in the mind of a human being. This means that the website can map all URLs that match the programmed patterns. Furthermore, mapping any new set of URLs is as simple as telling the program what pattern to look for. The tool usually used for this is either some type of URL rewriting module (e.g. Apache mod_rewrite or ISAPI_Rewrite for IIS) or small scripts written in the language the website was developed in (e.g. Perl, PHP, Java). For more complex URL infrastructures both are used, the former to funnel URLs that match a certain pattern to the latter, where they get paired with their new destinations. The patterns themselves are usually written in some type of regular expression format, a means of pattern matching supported by many popular website development and programming languages. While this method requires more programming, and certainly access to layers of the website application that some shared web hosts will not provide, it does save large amounts of manual editing time for larger sets of data such as the URLs of an e-commerce website or news portal.
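The difference between the two approaches is easier to see side by side. Below is a rough sketch in Python (chosen only for illustration; the same idea applies in PHP, Perl or a rewriting module). The old and new URL formats, STATIC_MAP and DYNAMIC_RULES are all assumptions made up for the example.

```python
import re

# Static mapping: one explicit entry per old URL, e.g. exported from a spreadsheet.
STATIC_MAP = {
    "/catalog.php?cat=12&sort=price": "/category/garden-tools",
}

# Dynamic mapping: each rule pairs a regular expression with a function that
# builds the new URL from whatever the pattern captured.
DYNAMIC_RULES = [
    (re.compile(r"^/product\.php\?ProductID=(\d+)$"),
     lambda m: f"/product/{m.group(1)}"),
    (re.compile(r"^/catalog\.php\?cat=(\d+)(?:&sort=\w+)?$"),
     lambda m: f"/category/{m.group(1)}"),
]


def map_url(old_url):
    """Return the new URL for old_url, or None if nothing matches."""
    # Explicit one-to-one entries win over pattern rules.
    if old_url in STATIC_MAP:
        return STATIC_MAP[old_url]
    for pattern, build in DYNAMIC_RULES:
        match = pattern.match(old_url)
        if match:
            return build(match)
    return None
```

With this layout, covering a newly discovered batch of old URLs means adding one rule to DYNAMIC_RULES, whereas the static table needs a new row for every single URL.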
Whichever method you choose needs to make sense given your budget and the possible adverse impact of ignoring the matter altogether. Choosing the dynamic approach when programming resources are scarce can get the whole idea shelved and leave you with orphaned URLs, while choosing the manual static method in the face of a large set of URLs can increase the margin of error and cost you time when you have to return and fix mistakes.
Real Life Example: URL Rewriting when Migrating to Magento Enterprise Edition
Since this article has turned into a virtual run-away that I only hope you were able to follow with some sense of coherence, I am providing a quick overview of our most recent migration, from a proprietary PHP platform to Varien’s Magento e-commerce platform, as a simple list of steps:
- We knew that we needed to address the major URL change and there was no avoiding it since our current URLs were meaningless, long and over-duplicated.
- Due to the sheer number of URLs but the limited number of patterns (i.e. top-level category, sub-category, product detail, other content) we decided to employ the dynamic method.
- We reviewed the URL structure and determined the patterns that matched each group of similar URLs listed in the previous step, and turned these patterns into regular expressions.
- Since almost all URLs in Magento are fed to a central filter that decides where they point, we simply had to feed these regular expression patterns to that filter.
- Then we had the filter pass on each URL that matched any of our patterns to a small script that determined where to point it based on the pattern it matched. For instance, an old URL with ProductID=x in its query string would now point to the new URL for this product after the small script looked up the new URL for product “x” in the database.
- We did have one extra step, which I have left out of this article due to its detailed nature: our Product and Category IDs had also changed. The solution was quite simple: we added a new attribute for each product and category (very easily done in Magento) and told the mapping script to first look up the new ID for the old ID passed by the old URL, and then find the new URL for the new ID, which became the new URL for the old URL. Try that for a tongue-twister.
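For the curious, the two-step lookup from the last step can be sketched roughly as follows. This is not our actual script and it does not use Magento's API; it is plain Python in which two dictionaries stand in for the database lookups, and every ID and URL in it is made up.

```python
from urllib.parse import parse_qs, urlsplit

# Stand-in for the extra attribute that stores each product's old ID.
OLD_TO_NEW_ID = {"1001": "57"}

# Stand-in for the platform's table of current product URLs.
NEW_ID_TO_URL = {"57": "/garden/tools/steel-trowel.html"}


def resolve(old_url):
    """Map an old product URL to its new URL, or return None if it cannot be resolved."""
    query = parse_qs(urlsplit(old_url).query)
    old_ids = query.get("ProductID")
    if not old_ids:
        return None  # not an old product URL
    # Step 1: old ID -> new ID.
    new_id = OLD_TO_NEW_ID.get(old_ids[0])
    if new_id is None:
        return None
    # Step 2: new ID -> new URL.
    return NEW_ID_TO_URL.get(new_id)
```

Whatever resolve returns would then be used as the destination of a 301 redirect, and URLs that come back as None are worth logging so that missed patterns can be added later.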
Well, I hope this lengthy and somewhat discombobulated article helps you at least consider URL rewriting or some other method of URL identity retention during any website migration. Talk to your developers and push them to propose viable solutions for it; there is almost always one.