How to Identify and Fix Duplicate Content Issues

It isn’t unusual for businesses to use the same text across multiple websites. You may have the same product for sale on your own website as well as on a marketplace like Shopify or Amazon, and use the same description in each place. While this repeated content can be valuable for building a consistent brand message for each of the products you sell, it can prove fatal to your search engine optimization efforts. For example, Wayfair and Allmodern both show the same content for a product that is sold on both sites.

According to Google, duplicate content happens when two websites have substantial blocks of similar information. Google’s algorithms must then choose between those two sites, since the search engine prefers to display unique information in results.

There’s a common misconception that Google will penalize your site and delist it altogether due to duplicate content. Google contends that duplicate content isn’t grounds for negative action unless you’re deliberately trying to be deceptive. Even so, you may find that only one page with that content ranks, which leads to problems. In the Wayfair and Allmodern example, multiple sites rank for the exact same content, but Google chose the more authoritative site to rank highest.

This is especially important for ecommerce sites that have multiple product pages with similar information. Multiply duplicate content across hundreds or thousands of products and your site will be considered low quality because it doesn’t have unique and useful content on it.

Duplicate Content and SEO

Moz identifies three major issues duplicate content can cause for your search engine optimization (SEO) efforts. Those are:
  • Search engines are unsure which version to include while indexing.
  • Search engines are unsure whether to direct links to one page or split them across multiple pages.
  • Search engines can’t determine the order in which to rank pages with duplicate content.
For you as a site owner, that means you’ll find yourself dealing with an issue involving “link equity”: the ranking value that links pass to a page gets split across the duplicates instead of building up on one URL. When search engines can’t decide which page to display, they’ll choose what they determine to be the best one, reducing visibility for other pages on your site. As search engines pick those pages up, they could prioritize a third-party site over your own, further dropping your page’s visibility. Worst of all, this means that the page that ranks on the first page of search results, which captures 75 percent of all clicks, may not be the page you want potential customers to see.

Identifying Duplicate Content on Your Site

Once you understand the effects duplicate content can have on your site’s performance in search results, it’s important to check your product page content occasionally as new products are added to the site. A quick test is to take a handful of products, copy the description content, and search for that exact text in Google to see what comes up. Keep in mind that duplicate content can hurt your site’s organic growth in other ways too, so you need to learn how to recognize it on your own website. Several of these issues are specific to ecommerce sites. By knowing what they are and how to identify them, you can take steps to avoid them.

Faceted Navigation

If you run an ecommerce site, it’s very possible faceted navigation is causing problems. Faceted navigation makes it easier for customers to find products within a website. It lets visitors apply specific filters to their searches within your site, which helps them narrow down products to only those they want to see. Unfortunately, this type of navigation structure can be disastrous to your SEO efforts. The problem is that you’re creating a unique URL for each combination of filters, even though every one of those URLs shows the same products, just organized a different way.

Take, for example, allmodern.com. They have a massive number of home living products across numerous categories. If you navigate to the Outdoor Fireplaces section of the site, you can apply a number of filters or facets to narrow down your results even more. So if a customer filters by fuel type > propane, product type > Fire Pit, that search will lead them to a URL listing those types of products, like this:

https://www.allmodern.com/outdoor/sb2/fire-pit-propane-outdoor-fireplaces-c528113-a2673~5832-a73097~326923.html

Apply more filters and the URL narrows further, to something like this:

https://www.allmodern.com/filters/outdoor/sb3/outdoor-fireplaces-c528113-a2673~5832-a73097~326923-p6192~500~700.html

Having numerous facets not only generates duplicate content, which is confusing for search engines, but it also dilutes your website’s equity, potentially leading search engines to index pages you don’t even want showing up in search results. Google spends a limited amount of time crawling a site (known as crawl budget), so the goal is to help search engines be efficient and crawl only the information that is unique and useful to users. As the example above shows, you need to be careful about how you set up filters in order to avoid issues with search engines.
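
To get a feel for how quickly facets multiply URLs, here is a rough sketch in Python. The facet names, their values, and the example.com query-string format are all made up for illustration (the Allmodern URLs above encode filters in the path instead), and the sketch assumes each filter is either unset or set to a single value.

```python
from itertools import product

# Hypothetical facets for one category page; names and values are illustrative only.
facets = {
    "fuel_type": ["propane", "natural-gas", "wood"],
    "product_type": ["fire-pit", "fire-table", "chiminea"],
    "price": ["under-500", "500-700", "over-700"],
}


def facet_urls(base, facets):
    """Yield one URL per combination of facet values; each facet may also be left unset."""
    names = sorted(facets)
    # None stands for "this filter is not applied".
    choices = [[None] + facets[name] for name in names]
    for combo in product(*choices):
        params = [f"{name}={value}" for name, value in zip(names, combo) if value is not None]
        yield base + ("?" + "&".join(params) if params else "")


urls = list(facet_urls("https://example.com/outdoor-fireplaces", facets))
print(len(urls))  # (3 + 1) ** 3 = 64 crawlable URLs for a single category
```

Three small filters already turn one category page into 64 addresses showing overlapping product lists, which is why unmanaged facets eat into crawl budget so quickly.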

One way to identify if faceted navigation is causing problems for your website is to try a few searches. If you sell a variety of scarves in different colors, try searching by terms customers would use, such as “black scarf” or “yellow plaid knit scarf.” If you know your product inventory well, you’ll immediately see an issue if far more items appear in rankings than you actually offer. This is a possible sign search engines are indexing duplicate content.

Prefix Disparities

Customers pull up websites in different ways. Some may enter “www” before the site name, and some simply type in sitename.com. Setting up sites under both addresses is a way to easily capture the largest audience, but it can cause issues with search engines. The same thing applies if you have both “http” and “https” versions of your site. Make sure to check the various versions of your site and set a preferred version that you can 301 redirect the others to.

Internal Search Issues

When someone conducts a search on your site, it creates a results page that could be picked up by search engines. This can appear as duplicate content to search engines and possibly push those results above other pages you’d rather see on the first page. The best way to determine if this is a problem for your site is to use Google’s Index Status Report. This will give you a list of all the indexed URLs on your site, as well as any URLs you’ve requested be hidden from indexing.

Another quick way to see if this could be an issue is to run a site:yoursite.com search, then add search parameters to see what Google is indexing. For example, checking wayfair.com’s internal search this way surfaces the issue. The search query I used was: site:wayfair.com/ keyword=glass+rock+firepit which returned 20,200 pages being indexed. If your CMS does not block these pages by default (typically done using a robots.txt file), you can and should go a step further and noindex your search result pages so they don’t create duplicate content on your site.
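
If you want to spot-check whether a search results page already carries a noindex directive, something like the following Python sketch could help. It only looks at robots meta tags in HTML you hand it; it does not fetch pages or check the X-Robots-Tag HTTP header, and the sample HTML strings are invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> or <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html):
    """Return True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Invented sample pages: a search results page and a product page.
search_page = '<html><head><meta name="robots" content="noindex, follow"></head><body>Results</body></html>'
product_page = "<html><head><title>Fire pit</title></head><body>Details</body></html>"

print(is_noindexed(search_page))   # True
print(is_noindexed(product_page))  # False
```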

International Pages

If your company does business outside of the U.S., setting up a page specific to each country may have been a top priority. This can be a problem for your SEO efforts, though, showing up to search engines as duplicate content, even if it isn’t in the same language. However, it’s important to know that as long as the content on your website is written authentically, using correct grammar and punctuation, you won’t have a problem. Sites that try to get away with using translation software to convert to a different language for a separate site are the ones that hurt their SEO.

Syndicated Blog Content

One innocent mistake you may be making is in repurposing your content. While this can help you get the most out of items like blog posts, it can also hurt your SEO efforts. Although the research used to create the information can be reused, it’s important to have original content on your site as much as possible. If you do syndicate your content on other sites like Medium, LinkedIn, industry blogs, or forums, make sure you state where the post was originally published and add a link to where the original content can be found.

User-Generated Content

Reviews are an important part of helping customers gain trust in your products. Studies have shown that the more reviews a product has, the more likely it is that clicks will result in sales. However, they also play an important role in SEO. The more reviews there are on a product page, the more keywords there are for search algorithms. But review sites can interfere with your SEO efforts, especially if you’re hoping to rank for the information customers input there. If a reviewer copies the same review across multiple sites, results for that review could be diluted. You’ll also run into problems if you use services like Bazaarvoice, which syndicates reviews across multiple sites.

With a review syndication service, reviews of your product on one site are automatically captured and copied onto other sites. Someone might review a product you sell on the manufacturer’s site, for instance, but not in your store. The syndication service could grab that review and ensure customers see it when they’re scrolling through reviews on your own site. Although more reviews are better, it’s important to make sure they’re original reviews to avoid harming your SEO efforts.

When done the right way, though, syndication can work. When you’re first starting out, syndication can be a great way to have review content on your site while you’re soliciting feedback from your brand-new customers. Over the first few months, as reviews begin streaming in, you’ll find the syndicated reviews make up a smaller portion of your page content. You should also make sure your syndicated reviews include a link back to the original source. This will improve your site’s relevancy, which is an important ranking factor. Lastly, if you use syndicated reviews on your site, confine each review to only one page, since having the same review duplicated across your website can lead to some of those pages being disregarded in search results.

Manufacturer Information

Ecommerce sites can be especially vulnerable to duplicate information between their own sites and manufacturer product sites. If you copy and paste text from your manufacturer’s site to your own product pages, you could see your site fail to rank well, if at all, for the exact product because the information is duplicated across numerous sites. An example of this is this product: https://www.allmodern.com/outdoor/pdp/palmas-propanenatural-gas-fire-pit-table-lsin1094.html If you search for this content: “Due to its handcrafted nature, minor variations in finish should be expected and are an intentional and desirable aspect of this product” you will see that the same description exists on numerous sites.
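
One rough way to see how close your own product copy is to a manufacturer’s text is to compare overlapping word sequences. The Python sketch below is only an illustration: the three-word window and the scoring (shared sequences divided by all sequences) are assumptions, not how any search engine measures duplication, and the rewritten description is invented.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(a, b, size=3):
    """Score from 0.0 (no shared shingles) to 1.0 (identical shingle sets)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


manufacturer = (
    "Due to its handcrafted nature, minor variations in finish should be expected "
    "and are an intentional and desirable aspect of this product"
)
# Invented rewrite, standing in for your own product copy.
rewritten = (
    "Each table is finished by hand, so expect small differences in color "
    "and texture from one piece to the next."
)

print(overlap(manufacturer, manufacturer))  # 1.0, pasted text is a full match
print(overlap(manufacturer, rewritten))
```

A score near 1.0 suggests the description was pasted more or less as-is; a score near 0.0 suggests it was genuinely rewritten.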

In that search, anything after page 4 of the results is omitted; those pages are part of Google’s supplemental index and are unable to rank.

Although importing manufacturer product information can be quick and easy, especially if your website provider allows you to do so easily with an import tool, it’s worth the extra effort to rewrite each product description. The manufacturer’s page can serve as a useful jumping-off point, but reword it in a unique way that will both appeal to search algorithms and match your own brand voice.

Duplicate Video

For businesses that transcribe their videos, the transcription will show up as duplicate content if you post it in more than one place. But posting your videos in multiple places could also cause you problems if you choose to share them on YouTube. YouTube will now remove videos that are detected as copied from other sites, since posting duplicate content is against YouTube Partner Program policies.

Potential Impact of Too Much Duplicate, Thin or Low Quality Content

A side note here. We do not work with any of these businesses, so we can’t confirm this is or was the exact issue that impacted the businesses’ organic rankings earlier this year when the August 1st core algorithm update hit (a.k.a. Google’s Medic Update). These insights are solely from what we have seen by looking at different industries, competitors’ sites, and reports about the update on Search Engine Land, seroundtable and other sites.

Like many factors of Google’s algorithm, there is a threshold or percentage that Google will allow before a trigger is set off to indicate a site is in violation or is deemed questionable. The Google Medic update seemed to be Google resetting the bar on how much duplicate, low quality or thin content is allowed before a site is flagged and starts to drop in organic visibility. This decline can be gradual, as with some of the sites discussed below, or it can be drastic once a major algorithm update hits.

In either case, duplicate or low-quality content can have a major impact on your site’s ability to rank well. We shared examples above of duplicate content found on sites in the home goods industry. Since August 2018, one of those sites appears to have seen a large decline in organic visibility.

Another site saw a massive hit once the August 1st update landed. This site, which offers custom printed clothing & apparel, had a bunch of low quality and duplicate content and lost nearly 70% of its organic keyword visibility according to SEMRush. One more example, if you need it, is dictionary.com.

Attack Duplicate Content with the Right Tools

An easy way to quickly identify your problems, though, is to use a tool that lets you search to see if your content is showing as duplicate. Here are a few top tools that can help with your duplicate content detection.
  • DeepCrawl—Enter your domain name and you’ll get a full analysis of your website’s architecture. Best of all, the service regularly checks your site and alerts you to any issues you might be facing.
  • Screaming Frog—This UK-based site audits your website’s links, images, CSS, scripts, and internal apps to detect issues that might be affecting your SEO efforts.
  • Raven—This tool not only audits your site today, but it provides reports of your site’s progress over time to show how any changes you make from one week to the next affect your SEO.
  • Google—Of course, one tool you always have available is to simply Google exact sentences and see how many pages come up. This will help you quickly identify duplicate content issues, but it may be a tough strategy to implement long-term.
  • Copyscape—There are various ways to check for plagiarism, including sites like Plagium, but Copyscape is designed specifically for website copies, making it the best choice for detecting duplicate content. Although the free version lets you search for copies of your own site, the premium version has many features that will make things easier if you regularly source content for your site.
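
For the manual Google check in the list above, a small helper can turn a sentence from your product copy into an exact-match search link that you open in a browser. This is just a convenience sketch in Python: the function name and the optional exclude_site argument (which adds Google’s -site: operator to hide your own domain) are my own additions, and it only builds the URL rather than running the search.

```python
from urllib.parse import urlencode


def exact_match_query(sentence, exclude_site=None):
    """Build a Google search URL that looks for the sentence as an exact phrase."""
    query = f'"{sentence}"'
    if exclude_site:
        # Hypothetical convenience: leave out results from your own domain.
        query += f" -site:{exclude_site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_match_query("black scarf"))
# https://www.google.com/search?q=%22black+scarf%22
print(exact_match_query("black scarf", exclude_site="example.com"))
# https://www.google.com/search?q=%22black+scarf%22+-site%3Aexample.com
```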

Repairing Duplicate Content Issues

Luckily, there are some things you can do to repair your duplicate content issues once you’ve discovered them. One is to use noindex tags, which instruct search bots not to index a particular page. You can do this to resolve the issue of your faceted navigation and internal search pages showing up in search results.

You can also use your robots.txt to disallow crawling of certain sections of your site. This is ideal if you want to exclude entire sections of your site, such as your search pages (if they are the same), reviews, size charts, orphaned pages, paginated pages, etc.

Overall, though, one of the best things you can do is ensure you create original content wherever possible. If you transcribe a video to provide text for your SEO efforts, don’t copy and paste that text to a blog post. If you guest post on a blog and you want to share it with followers, do so using a link rather than copying and pasting it onto your own site. It’s worth the extra effort and expense to create original content if it keeps your content at the top of search rankings.
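
Before relying on robots.txt rules, it can help to confirm they block what you expect. Here is a minimal sketch using Python’s standard urllib.robotparser; the rules and the example.com URLs are hypothetical, so substitute your own site’s search and filter paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules that block internal search and filter pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /filters/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """Return True if the rules above allow the given user agent to crawl the URL."""
    return parser.can_fetch(agent, url)


print(is_crawlable("https://example.com/outdoor/fire-pits"))        # True
print(is_crawlable("https://example.com/search?q=fire+pit"))        # False
print(is_crawlable("https://example.com/filters/outdoor/propane"))  # False
```

Keep in mind that robots.txt controls crawling: a crawler that is blocked from a page cannot read a noindex tag on that page, so pick one approach per URL rather than stacking both.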

TJ has worked in the digital marketing space since 2006. He has worked at a number of agencies and helped hundreds of clients grow their business through SEO, PPC, Social Media and Content Marketing. He currently lives in Lehi, UT and enjoys spending time with his family.
