Regardless of whether you are an SEO yourself or your responsibilities for a particular website are a bit more all-encompassing, the phrase “duplicate content” is one that likely strikes fear into your heart.
When I first started working for Launch a little over a year ago, I heard those two terrifying words tossed around quite a bit and felt I had a pretty good grasp on the meaning. Seems obvious, right? Is there identical copy on both pages? Well, there you go! And while I might not have been able to provide concrete support to back this up, I knew duplicate content was bad bad bad.
I knew it was a problem that needed to be called out immediately and shamed for its “duplicate-ness.” It was so despicable that Google would even dole out punishment if it was found on your site—or so I’d thought.
But as time went on, it became apparent that the phrase is actually a very popular catch-all that we sometimes use to describe very different things, some of which aren’t really duplicate content at all, at least by Google’s standards.
Furthermore, Google doesn’t treat all these instances the same; in fact, some of the things we might mistakenly view as duplicate content might be perfectly innocent. Ultimately, it’s important for us to come to a consensus on the terminology so we can hopefully stop seeing a boogeyman where none exists.
What Is Duplicate Content?
For my own part, it’s ironic that, while supposedly about 29% of what’s on the web is duplicate content, the instances that fall under Google’s definition of duplicate content were not at all the examples I was marking with a shameful scarlet letter.
In actuality, Google defines duplicate content as “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.” That part of it I was on board with. But then they add, “Mostly, this is not deceptive in origin.” Rather than it being an intentional form of dishonest practice, Google typically applies this label to technical website issues, such as:
- Discussion forums that generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- A blog post (where you have the original post and then a version that shows up on the blog’s home page and in the archives)
- Domain (www vs. non-www) and protocol (http vs. https) discrepancies
- Sites with printer-only versions of pages
You’ll notice in all these cases that, rather than seeming like any kind of deceptive practice, these feel like honest oversights on the part of the webmaster. Luckily, if you make a point to arm yourself with some knowledge of noindex meta tags and canonical URLs, you should be able to prevent unintentional duplicate content issues within your site like the most common examples mentioned above.
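To make the www vs. non-www and http vs. https case concrete, here is a minimal Python sketch of how a site might collapse those variants onto one preferred address and produce the matching canonical tag. The host `www.example.com`, the preference for https, and the helper names are all assumptions made for illustration; nothing here is prescribed by Google.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the site prefers https and the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"  # hypothetical domain

# Standard robots meta tag, e.g. for a printer-only version of a page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def _strip_www(host: str) -> str:
    """Return the host without a leading 'www.' so variants compare equal."""
    return host[4:] if host.startswith("www.") else host


def canonical_url(url: str) -> str:
    """Map www/non-www and http/https variants onto the preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if _strip_www(host) != _strip_www(PREFERRED_HOST):
        return url  # a different site: leave it untouched
    path = parts.path or "/"
    # Keep the query string, drop the fragment (it never reaches the server).
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    for variant in ("http://example.com/shoes", "https://www.example.com/shoes"):
        print(canonical_link_tag(variant))
```

Both variants in the usage lines resolve to the same address, which is the point: you tell the search engine which version you want indexed instead of leaving it to chance.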
But let’s say you haven’t yet done any investigation, so you’re blissfully unaware of any duplicate content on your website. Or, better yet, you are aware but are choosing to defiantly thumb your nose at Google. What happens then? In a nutshell, Google chooses one of the duplicates to list in the SERPs and filters the other one out.
While that might seem harmless enough—certainly in conflict with this notion of a “penalty”—not being proactive about instances of duplicate content leaves the decision of which page to display completely in the search engine’s hands. Which page gets indexed? Which page should accumulate authority? These are just a few of the matters for which you’re basically rolling the dice.
Is There a Duplicate Content Penalty?
While there continues to be a great deal of confusion on this issue, Google continues to assure us that there is no penalty for duplicate content. Hopefully, though, the potential problems we just looked at are reason enough to steer clear of duplicate content on your site whenever possible.
A better way to look at the work that Google does, and at the ranking changes that followed the release of the Panda algorithm update, is this: Google does not punish sites with duplicate content (even if the duplicate information comes from a different site, as in the case of republished syndicated news stories). Instead, it rewards high-quality sites that offer added value for the user. Why should Google rank your site higher than one that provides users with more unique, valuable content?
Try looking at it from a completely different perspective. Imagine you’re a runner and you run a 10K—if you come in 12th or 30th or 1,000th place, are you being “penalized” for being slower?
No, of course not. The person who came in 1st is simply rewarded for doing the best job at the task everyone else was working toward, too. Only one person was the fastest—who was it? You can’t have 1,000 1st place winners.
But that’s where we have people crying “penalty.” In reality, they’re upset either because an algorithm update reshuffled the rankings (presumably not in their favor) or because Google filtered out their page, not to penalize it, but because it was similar enough to something else that there was seemingly no reason users would want to view it. There’s a difference between these scenarios and an actual penalty.
Google wants to reward the best possible user experience, so if you are explicitly doing something spammy or deceptive in an attempt to manipulate the user, well, Google really hates that.
It’s worth noting, though, that from Google’s point of view, this sort of practice is a different problem from duplicate content. With the sorts of technical duplicate content we’ve already reviewed, you don’t get punished per se, but you are at the mercy of the search engine when it comes to your subsequent ranking.
With deceptive content, on the other hand, Google can actually issue a penalty, and you’ll know definitively if it happens: you’ll receive a notification in Google Webmaster Tools (since renamed Google Search Console).
Here are just a couple types of deceptive practices you’ll want to stay away from:
- Doorway Pages: This is when essentially the same content exists across multiple pages or domains with only a detail such as the city name changed for the purposes of ranking. A user might click on a result that touts “Horror Movie Showings in Hoffman Estates” or “Horror Movie Showings in Austin,” depending on their location, only to be funneled to the same place, which might be just a general movie showings database, not specific to a particular city.
- Spun Content: When you take a piece of writing that already exists and try to rewrite select parts of it to reduce the similarity to the original, you have article spinning. These pieces try to pass themselves off as “new” content when, really, no time was put into their creation; in fact, the process for this can even be automated.
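If you want a rough feel for what “appreciably similar” might look like in practice, one common textbook approach is to compare overlapping word sequences (often called shingles) between two pieces of text. The Python sketch below does that with a Jaccard score. To be clear, this is an assumption-laden illustration and not how Google measures similarity; the function names and the three-word shingle size are arbitrary choices.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short to shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by total distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    hoffman = "Horror movie showings in Hoffman Estates"
    austin = "Horror movie showings in Austin"
    print(jaccard_similarity(hoffman, austin))
```

A score near 1.0 suggests two pages are saying almost exactly the same thing, which is a reasonable cue to take a closer look at whether both deserve to exist.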
The bottom line, of course, is that we want to avoid these types of practices like the plague—or bad clichés! Not only can sites that utilize these techniques be subject to actual penalties from Google, but they provide a really poor user experience. Why spend so much time and effort trying to trick people into going to your site when you could instead spend that time producing helpful, engaging content for your users?
Time to Clean House
Ultimately, you only really need to worry about getting Google-shamed if you’re actively engaging in the latter instances of purposely sneaky online behavior. And if you are, knock it off already! You’re not doing your readers any favors, which means you’re not doing any favors for your business either. Conversely, if it’s beneficial to your readers, it will likely be beneficial to you, too, so always keep that in mind.
But it’s worth noting that the types of innocent duplicate content discussed earlier can still pose a real problem for your site. The best thing you can do for yourself is be aware of what you’re putting out there, although having this kind of awareness can certainly be tricky for sites that have been active for quite some time.
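For an older site, a quick first pass at that kind of awareness could be as simple as grouping pages whose text is identical. Here is a minimal Python sketch, assuming you already have each page’s text in a dictionary keyed by URL (how you collect it is up to you, and the function name is hypothetical). It only catches exact matches after whitespace and case are normalized, so treat it as a starting point.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(pages: dict) -> list:
    """Group URLs whose body text is identical after light normalization.

    `pages` maps URL -> page text. Returns a list of URL groups, each
    containing two or more URLs that share the same content.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace and ignore case so trivial differences don't hide a match.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    site = {
        "http://example.com/widgets": "Our best   widgets",
        "https://www.example.com/widgets": "our best widgets",
        "https://www.example.com/about": "About us",
    }
    for group in find_exact_duplicates(site):
        print(group)
```

Each group it reports is a candidate for a canonical URL or a noindex tag, depending on which version you actually want people to find.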
Do you have what feels like a metric ton of live pages on your website? It could be time to do a little spring cleaning and thin the herd a bit. Either way, open up a discussion with your webmaster regarding your duplicate content concerns, and make sure tools like noindex meta tags are being put to good use where needed. While it may involve more work upfront, you’ll ultimately be able to rest easy knowing that your online presence and the overall health of your site are now in capable hands: yours.