The impact of duplicate PRODUCTS on Google?


Jarvis
18-Oct-2008, 09:32 AM
I have been following the 'Duplicate Content' thread elsewhere on this forum.
A lot of the discussion is about finding solutions to Duplicating Products, and suggests that the need arises from Google's poorer rankings for sites containing duplicate content.

I have created this separate thread as I have a different angle on it.

I want to test the basis for that assumption.
I have researched a few SEO sites and really can't find any hard evidence of this being a fact.
It appears to me to be a bit of an urban myth.

The nearest I can get to fact is from the Google Webmaster's Help page, which states: "Don't create multiple copies of a page under different URLs. Many sites offer text-only or printer-friendly versions of pages that contain the same content as the corresponding graphic-rich pages. To ensure that your preferred page is included in our search results, you'll need to block duplicates from our spiders using a robots.txt file."
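For anyone unsure what "block duplicates using a robots.txt file" amounts to in practice, here is a minimal Python sketch. The /print/ folder and the example.com URLs are made up for illustration, and Python's standard-library parser only approximates how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the printer-friendly copies live under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the hypothetical robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# The preferred page stays crawlable, the printer-friendly copy does not.
print(is_crawlable("http://www.example.com/widgets.html"))
print(is_crawlable("http://www.example.com/print/widgets.html"))
```

Note that this only tells a spider which URLs to stay away from. It says nothing about how the pages that remain are ranked.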

My understanding is that it is 'Duplicate Pages' that cause Google an issue, NOT 'Duplicate Products'.

If pages are constructed to contain 'Manufacturer' or 'Colour' sorted pages, then these are going to have pages that contain different content.

True that some of the content is duplicate, but the pages themselves will be fundamentally different.

Has anybody on this forum got any hard evidence of whether this is right or wrong?

leehack
18-Oct-2008, 09:48 AM
My understanding is that it is 'Duplicate Pages' that cause Google an issue, NOT 'Duplicate Products'.
On an SPP setup a page contains the product, so surely that is a play on words? With duplicate SPPs in a number of places across the site, wouldn't that meet both of the above points? It sounds like you are focussing on the listing page, whereas the concern has to be the product page, which is the real key page IMO.

You'll need to block duplicates from our spiders using a robots.txt file.
I think that's the B&W evidence, can't be much clearer.

Mike Hughes
18-Oct-2008, 10:25 AM
It's pages with duplicate 'content' that get removed and that doesn't just mean 'identical' content either.

Google's aims are relatively straightforward: to avoid including similar pages in the SERPs. It's very sensible really. If a user searches for 'widgets' you want to present them with several different options that rank well, not the same or slightly altered content on different pages or websites.

There's been a lot of discussion about how 'similar' the content has to be for Google to see it as duplicate but no-one knows for sure and there's really no way to tell because Google are always introducing new algorithms and tweaking the ones they're already using. Testing is also impossible because Google often publicises the concept of something long before implementing it.

The key thing is that if you want to avoid pages being counted as duplicate then the content has to be significantly different. That's more than just a few different words here and there.
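Nobody outside Google knows how it measures similarity, so the following is only a textbook illustration and not Google's algorithm. It is a rough Python sketch that compares two texts by their overlapping word pairs ("shingles"). The function names and the sample descriptions are invented for the example.

```python
def shingles(text: str, size: int = 2) -> set:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a: str, b: str, size: int = 2) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = "plain gold wedding ring with a polished finish and comfort fit"
tweaked = "plain gold wedding band with a polished finish and comfort fit"
rewritten = "a classic court-shaped band in solid gold, hand finished in our workshop"

print(f"one word changed: {similarity(original, tweaked):.2f}")
print(f"fully rewritten:  {similarity(original, rewritten):.2f}")
```

Swapping a single word still leaves most word pairs in common, whereas a genuinely rewritten description shares none. Where any real search engine draws the line between the two is exactly the part nobody can tell you.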

I do agree with you on ranking though. I don't believe sites with duplicate content suffer any penalties. Google doesn't see this as a sign of anything being wrong, it just doesn't want many copies of the same thing clogging up the results. As far as I know, all that happens is that all but one of the pages that are essentially similar are removed from the index.

Mike

Jarvis
18-Oct-2008, 11:24 AM
My thinking is this:

There are 3 suppliers, Supplier-1, Supplier-2, Supplier-3.
Each Supplier has 3 product ranges, Widgets, Thingies and Whatsits.

The cart has 6 sections:
Supplier-1 containing products, Supplier-1-Widgets, Supplier-1-Thingies and Supplier-1-Whatsits.
Supplier-2 containing products, Supplier-2-Widgets, Supplier-2-Thingies and Supplier-2-Whatsits.
Supplier-3 containing products, Supplier-3-Widgets, Supplier-3-Thingies and Supplier-3-Whatsits.
Widgets containing Duplicates of products, Supplier-1-Widgets, Supplier-2-Widgets and Supplier-3-Widgets.
Thingies containing Duplicates of products, Supplier-1-Thingies, Supplier-2-Thingies and Supplier-3-Thingies.
Whatsits containing Duplicates of products, Supplier-1-Whatsits, Supplier-2-Whatsits and Supplier-3-Whatsits.

Now by my reckoning that is 6 distinct pages each containing a unique combination of products.
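To put a number on that, here is a small Python sketch that models each of the six section pages as the set of products listed on it and measures how much any two pages share. It only counts products. Real pages also carry navigation, headings and other text, which this ignores.

```python
from itertools import combinations

suppliers = ["Supplier-1", "Supplier-2", "Supplier-3"]
ranges = ["Widgets", "Thingies", "Whatsits"]

# Each section page is modelled as the set of products listed on it.
sections = {}
for s in suppliers:
    sections[s] = {f"{s}-{r}" for r in ranges}
for r in ranges:
    sections[r] = {f"{s}-{r}" for s in suppliers}

def overlap(a: set, b: set) -> float:
    """Share of products common to both pages (Jaccard index)."""
    return len(a & b) / len(a | b)

for (name_a, prods_a), (name_b, prods_b) in combinations(sections.items(), 2):
    print(f"{name_a:<11} vs {name_b:<11} {overlap(prods_a, prods_b):.0%}")
```

Two supplier pages share nothing, two product-range pages share nothing, and a supplier page and a product-range page share exactly one product out of five distinct ones, i.e. 20%.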

Someone searching for products by a Supplier would reasonably expect to land on a relevant Supplier page,
e.g. "Supplier-1" would return page "Supplier-1".
Someone searching for a particular Product would reasonably expect to land on a relevant Product page,
e.g. "Widgets" would return page "Widgets".

Surely Google is not ranking anything lower simply because a single product may appear in more than one page.
Is this not what is meant by "Relevance"?

Rich Brady
18-Oct-2008, 11:34 AM
But on a Single Product per Page you would only have 1 Product on it...

For instance one of my clients sells Gore-tex Camouflage Jackets.

So the Product would go under a Section called Gore-Tex and its duplicate(s) could go under Camouflage, Soldier 95, hell even a Hunting, Shooting, Fishing Section.

So there are 4 pages exactly the same. One original, 3 dups...

Some will argue that you should change the text in the duplicate fields, but then they are no longer duplicates, and from a client point of view he doesn't want to have to write 4 different descriptions for the same product; he's got more important things to do!

It takes too much time. This is especially pertinent for another client who sells Compatible Ink Cartridges. The same cartridge may fit 15 printer models, and I don't care how creative you are, there are only so many things you can say about an ink cartridge ;).

So, I was very grateful to see Gabe's code in the other thread and have already used it for U-Need-Ink.

Mike Hughes
18-Oct-2008, 01:31 PM
Roger, as long as the pages are sufficiently different with unique content then they'll probably be included in the index. If Google decide the content isn't unique enough then they won't. It doesn't matter where they come from or how you're structuring your website; it all boils down to how unique the content on the pages is.

Most of the discussion we've been having is about the impact on the product pages rather than the section pages. As long as the section pages are unique enough there shouldn't be a problem with them.

The quote from Google also seems to be quite clear that you only need to do something about it if you'd rather one particular 'duplicate' page was in the index rather than letting them choose.

To ensure that your preferred page is included in our search results, you'll need to block duplicates from our spiders using a robots.txt file.

Mike

leehack
18-Oct-2008, 02:43 PM
Surely Google is not ranking anything lower simply because a single product may appear in more than one page.
Is this not what is meant by "Relevance"?
As I said in my first answer, you are focusing too much on section listing pages. It is almost impossible for these to be classed as duplicate pages, as they will often bear no resemblance to any other page. They often fare quite poorly in search engines too, often due to a lack of any real content. The problem is at product level, which is where we want the vast majority of searches to take people, although the thread you refer to has detailed a couple of ways that can be stopped. A section listing page with duplicates from other places in your catalog has no issues at all in my book.

RuralWeb
18-Oct-2008, 02:50 PM
Am I dreaming or did we not put all this to bed last week

leehack
18-Oct-2008, 02:53 PM
LOL Mal, no ur not dreaming, it's another angle on the same thing :p.

RuralWeb
18-Oct-2008, 03:31 PM
zzzzzzzzzzzzzzzzzzzzzzzzzzz:rolleyes:

Jarvis
18-Oct-2008, 04:01 PM
Mal, no, you are mistaken.
The other thread refers to how to address a perceived problem.
What I am questioning here is whether there is a problem at all.
Or at least to be able to scope and scale the problem if there is one.

Lee, I take your point about specific product pages.
What I am referring to in my example is a section listing of products.
"Extended Information" pages I realise would be 'at risk'.
Equally, Single Product Sections would similarly be an issue.
It is the 'default' layout of many products to a page that I suggest would not be an issue.

RuralWeb
18-Oct-2008, 04:12 PM
It's pretty basic really - if two or more single pages have a significant duplication of content, i.e. text, then only one will usually be indexed. The page that is usually indexed is the older page, to avoid competitor sites stealing content.

If you want to control which page you have indexed then you need to block the others AND make sure that the page you have created is not the same (in Google's eyes) as any others out there.

Jarvis
18-Oct-2008, 05:43 PM
I agree with what you say there Mal.
However, I think that you have highlighted the real issue.
The key word here is "significant".
What is significant?

I have a client's site that has some 60 suppliers each producing say, on average 30-40 products.
He has about 200 sections and subsections.
All the products are listed as products in manufacturer's sections.
These act as the 'master products'
All other sections are made up of 'duplicates' of those products.

Let's say for argument's sake that each section has about 4 or 5 suppliers contributing to the page content.
Would that count as 'significant' duplication?
If it does, then my client, and I would imagine a huge number of other Actinic sites, would have a major problem.

On the other hand, what sort of percentage of duplicate content would be considered 'significant'?
There has to be a threshold if the premise of penalisation for duplication is true.

Duncan Rounding
18-Oct-2008, 05:45 PM
How long was that piece of string again.

Jarvis
18-Oct-2008, 05:48 PM
How long was that piece of string again.
Three hundred

leehack
18-Oct-2008, 05:51 PM
How long was that piece of string again.
Twice as long as half of it.

leehack
18-Oct-2008, 05:52 PM
With 4 or 5 suppliers contributing to a page, that will create a page at most around 20-25% the same as any other, so no way could that count as duplicate IMO. The product page where you actually view the product almost certainly would, though, if it is just a straight copy.
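That 20-25% figure is simple arithmetic. As a back-of-envelope sketch in Python, assuming (and it is only an assumption) that every supplier contributes the same amount of content to the section page:

```python
def max_shared_fraction(contributing_suppliers: int) -> float:
    """Fraction of a section page taken up by one supplier's products,
    assuming every supplier contributes the same amount of content."""
    if contributing_suppliers < 1:
        raise ValueError("need at least one supplier")
    return 1 / contributing_suppliers

for n in (4, 5):
    print(f"{n} suppliers -> {max_shared_fraction(n):.0%} of the page")
```

With 4 suppliers that gives 25% and with 5 it gives 20%. A single product page that is a straight copy would sit at 100% on the same scale.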

RuralWeb
18-Oct-2008, 05:54 PM
On the other hand, what sort of percentage of duplicate content would be considered 'significant'.
There has to be a threshold if the premise of penalisation for duplication is true.
There is a limit BUT all SEO specialists have their secrets;)

Jarvis
18-Oct-2008, 06:05 PM
There is a limit BUT all SEO specialists have their secrets;)
LOL, And so does Google:o

Lee, a dedicated Product page will, as you say, clearly be seen as a duplicate.
To be honest, that really does not matter too much.
The Section listing of products will already have done its work, being the landing page for, say, a search for 'Widgets'.
Most of my sites have products purchasable from the Section listing anyway.
'Extended Info' pages are not attached to all products.
It would still be interesting to understand what percentage of duplication is considered significant.

Mike Hughes
20-Oct-2008, 09:16 AM
It would still be interesting to understand what percentage of duplication is considered significant

Search on Google and start reading. There's lots of discussion on this.

Mike

CymraegKev
21-Oct-2008, 09:41 AM
Matt Cutts has a [reasonably] similar discussion here: http://www.mattcutts.com/blog/duplicate-content-question/

His blog really is worth sitting down one day with a large coffee and trawling through. Some incredible SEO tips in there I think.

HTH

Kev

Jarvis
21-Oct-2008, 06:05 PM
Search on Google and start reading. There's lots of discussion on this. Mike
Thanks for the advice Mike.
In fact I DID search the forum but did not find a thread that addressed the particular question of whether duplicates, used in the way I have detailed at the start of the thread, actually have an impact on Google.

Out of over 500 threads that include the word Google, some were irrelevant, dealt with v7, didn't quite hit the button or had been hijacked by flippancy.
It has to be said, the 'search' and 'Advanced Search' on the forum aren't the best tool and can sometimes be difficult to narrow down.

As a regular visitor, (and more recently a contributor), to the forum over many years, I have read virtually every post on the forum and am always happy to see 'Old Chestnuts' freshened up and viewed from a different angle.

Nonetheless, I DO take your point Mike, but on this occasion feel that the new thread is justified.

Stereo Steve
21-Oct-2008, 08:11 PM
Interesting one this. We have SPP pages which we aim to get 120 words or more on. We also may have 1 or more duplicates above this which each will have 40 unique words. Then on the home page we have new products. For this, we have created a variable where we can add yet more unique text in the product itself to appear on the home page. If this variable is empty, it reverts to using the first 30 words code from the AUG. The aim of this is to be able to put targeted keyword rich text on the home page which will not conflict with the actual products there but not look bad if we haven't had time to fill it in.
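The fallback described above boils down to something like the following Python sketch. It is not the actual Actinic or AUG code, and the function and parameter names are invented. It just shows the logic: use the hand-written home page text if there is any, otherwise take the first 30 words of the product description.

```python
def home_page_blurb(custom_text: str, full_description: str, max_words: int = 30) -> str:
    """Use the hand-written home page text if present, else the first
    max_words words of the product description."""
    if custom_text.strip():
        return custom_text.strip()
    words = full_description.split()
    blurb = " ".join(words[:max_words])
    # Mark the cut only when the description was actually shortened.
    return blurb + "..." if len(words) > max_words else blurb

print(home_page_blurb("Hand-picked text for the home page.", "Full product description"))
print(home_page_blurb("", "A short description that needs no trimming."))
```

The catch is that the fallback branch reuses product text word for word, so the home page ends up carrying the same copy as the product page.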

Recently, we have been trying to catch up with getting stuff on the site, putting SEO second for the time being. This has meant creating a brief 40 word desc for each item and copying this into the duplicates, and it also appears on the home page under new products. The net result has been that the text is indexed for the home page first and the SPP page goes supplemental. Bad news. We figured this might happen but took the decision to go for it as a quick fix to get all stock listed.

The very best solution is to have unique text wherever a duplicate appears. It's hard work but it pays in spades. I decided to experiment with what other sites seem to do (duplicate content) and you usually end up with a non-relevant page being indexed over the SPP.

Experiment done. Back to the unique text...

Mike Hughes
22-Oct-2008, 08:45 AM
Nonetheless, I DO take your point Mike, but on this occasion feel that the new thread is justified.

Sorry, I wasn't suggesting there shouldn't be a thread on this or anything. Merely that if you want to find out what percentage of content needs to be unique then you need to start reading up on the subject. There's been a huge amount of discussion on this (not on the forum, which is why I suggested a Google search rather than a forum search) and unsurprisingly views vary.

As I've said before though, the percentage figure, whatever it is or may be, isn't really important. The important thing is to write unique content rather than try and fudge existing stuff.

Mike

RuralWeb
22-Oct-2008, 10:16 AM
I think we are "duplicating" posts now. As Mike says, we have been here many times, and to answer your question - there is no published % for duplicate content - only Google know this and anything people say is only speculation. SEO specialists know what works for them and you are unlikely to get any of them to say what they do, as that is their business and how they make money - it's a bit like designers giving away free templates.

SEO is big business and part of the marketing effort is to create confusion and panic - then offer a solution at a price.

Buzby
22-Oct-2008, 06:00 PM
Just to throw a spanner in the works.

I sell a plain wedding ring which is available in 5mm, 6mm, 7mm and 8mm widths. The 4 rings are exactly the same in every way except for the width.

The rings are very plain and it is very difficult to be descriptive about them at the best of times, even more difficult to have 4 different lots of content.

As a result each item has a (99%) duplication of content, but when you search for it by product name as described on the website, each comes up on the first page of Google, in the top 3. Each name is identical except for the prefix of width.

If Google penalises for duplicate content, why are all 4 products showing on page 1 when all I have to do is change 1 character in the search term?

I can understand (and agree with) not having duplicated content, but sometimes content does need to be duplicated, even to a spammy level, in order to give the visitor the experience that they require. For instance, if your website is a specialist site selling 100 types of widgets then every product and page name would include the word "widget", and this would be incorrectly interpreted as spam.

Could it be the case that although Google recognises duplicate content, it has worked out that they are different products in their own right and worthy of listing under slightly different search terms?

Kind regards

Jason

RuralWeb
22-Oct-2008, 07:32 PM
Ok in an attempt to put this whole subject to bed this is what google says:

Duplicate content

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:
Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
Store items shown or linked via multiple distinct URLs
Printer-only versions of web pages
However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.
Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a "regular" and "printer" version of each article, and neither of these is blocked in robots.txt or with a noindex meta tag, we'll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.

There are some steps you can take to proactively address duplicate content issues, and ensure that visitors see the content you want them to.
Consider blocking pages from indexing: Rather than letting Google's algorithms determine the "best" version of a document, you may wish to help guide us to your preferred version. For instance, if you don't want us to index the printer versions of your site's articles, disallow those directories or make use of regular expressions in your robots.txt (http://www.google.com/support/webmasters/bin/answer.py?answer=35303) file.
Use 301s: If you've restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, Googlebot, and other spiders. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.)
Be consistent: Try to keep your internal linking consistent. For example, don't link to http://www.example.com/page/ and http://www.example.com/page and http://www.example.com/page/index.htm.
Use top-level domains: To help us serve the most appropriate version of a document, use top-level domains whenever possible to handle country-specific content. We're more likely to know that www.example.de (http://www.example.de) contains Germany-focused content, for instance, than www.example.com/de (http://www.example.com/de) or de.example.com.
Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to block the version on their sites with robots.txt.
Use Webmaster Tools to tell us how you prefer your site to be indexed: You can tell Google your preferred domain (http://www.google.com/support/webmasters/bin/answer.py?answer=44231) (for example, www.example.com (http://www.example.com) or http://example.com).
Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
Avoid publishing stubs: Users don't like seeing "empty" pages, so avoid placeholders where possible. For example, don't publish pages for which you don't yet have real content. If you do create placeholder pages, use robots.txt (http://www.google.com/support/webmasters/bin/answer.py?answer=35303) to block these from being crawled.
Understand your content management system: Make sure you're familiar with how content is displayed on your web site. Blogs, forums, and related systems often show the same content in multiple formats. For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label.
Minimize similar content: If you have many pages that are similar, consider expanding each page or consolidating the pages into one. For instance, if you have a travel site with separate pages for two cities, but the same information on both pages, you could either merge the pages into one page about both cities or you could expand each page to contain unique content about each city.
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don't follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.
However, if our review indicated that you engaged in deceptive practices and your site has been removed from our search results, review your site carefully. If your site has been removed from our search results, review our webmaster guidelines (http://www.google.com/support/webmasters/bin/answer.py?answer=35769) for more information. Once you've made your changes and are confident that your site no longer violates our guidelines, submit your site for reconsideration (https://www.google.com/webmasters/tools/reconsideration?hl=en).
If you find that another site is duplicating your content by scraping (misappropriating and republishing) it, it's unlikely that this will negatively impact your site's ranking in Google search results pages. If you do spot a case that's particularly frustrating, you are welcome to file a DMCA request (http://www.google.com/dmca.html) to claim ownership of the content and request removal of the other site from Google's index.
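The "Be consistent" and "Use 301s" points above both come down to picking one spelling of each URL and sticking to it. As a rough illustration only, here is a Python sketch that maps the variants Google mentions onto a single form. The preferred host, the trailing-slash rule and the list of index file names are assumptions chosen for the example, not anything Google prescribes.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choices: prefer the "www" host and a trailing slash.
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"
INDEX_FILES = ("index.htm", "index.html")

def canonical_url(url: str) -> str:
    """Map equivalent spellings of an internal URL to one preferred form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Drop a trailing index file: /page/index.htm -> /page/
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Treat an extension-less last segment as a directory: /page -> /page/
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        path += "/"
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

for variant in (
    "http://www.example.com/page/",
    "http://www.example.com/page",
    "http://www.example.com/page/index.htm",
):
    print(variant, "->", canonical_url(variant))
```

All three variants come out as http://www.example.com/page/. On a live site the same mapping would be enforced with consistent internal links and 301 redirects rather than a script like this.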

That's it, so make of it what you will - people here are trying to make something black and white when it is much more complex.

jont
22-Oct-2008, 08:14 PM
people here are trying to make something black and white when it is much more complex.

Indeed ... there are a million shades of grey in between, depending on individual circumstances. As ever, the only solution is to change one thing and sit back and watch what happens - which is essentially what SEO companies do over many, many sites to get a benchmark on what is doing what.

Jarvis
22-Oct-2008, 09:58 PM
Thanks for the 'Google' perspective on this Mal.

Interesting phrasing used by them at the start of the article which implies that there is human intervention in the process.

Again, towards the start of the article, it does rather bear out what I am suggesting, that 'honest' duplication is recognised and not penalised.

Regarding what you say about people trying to make something black and white when it is much more complex, I think that it would be more accurate to say that we are attempting to understand one element of what is perceived to be a complex issue.

With regard to SEO businesses making capital out of 'confusion and panic', personally I view most, (not all), SEO 'experts' in the same way as I view Astrologers, Palmists and Faith Healers; each believes in what they do but base their trade on misinformation and myth.

However, I am not sure that we touch too deeply into SEO, when trying to understand, and thus avoid the potential pitfalls of, the specific use of duplicated products in Actinic.

One thing that is apparent is that there is divided opinion and a lack of clear understanding on this specific topic.

Nonetheless, it is not impossible to measure it and it should not be difficult to set up a robust experiment in order to gain some decent data to work from. I can feel the academic in me rising to this one. I'll have a think on how it could be approached.

Mike Hughes
23-Oct-2008, 09:17 AM
Nonetheless, it is not impossible to measure it and it should not be difficult to set up a robust experiment in order to gain some decent data to work from. I can feel the academic in me rising to this one. I'll have a think on how it could be approached.

Waste of time but go ahead if you want to try it.

Mike