Google Sitemap errors


AndyBorrett
25-Feb-2009, 03:17 PM
I have made some drastic SEO changes to our site that have resulted in us getting lots more visits, but also lots and lots of errors within Google's Webmaster Tools.

I have resubmitted the sitemap created using the Moleend Product Mash.

There are 85 errors for URLs in the sitemap.
The vast majority point to renamed pages and are linked from what appears to be a search result of some kind. Here is an example:

http://www.seriouslysilver.co.uk/cgi-bin/ss000001.pl?PR=1&SS=bangle&SX=0&TB=A

Is there a file somewhere that I need to 'flush' to get rid of these old pagenames?

I also get 233 "not found" errors. These all seem to point at a webpage I deleted some time ago, from the days when we were in a link exchange scheme.

http://www.seriouslysilver.co.uk/Links.php


Same question as above really - how can I clear these errors?

leehack
25-Feb-2009, 03:24 PM
Ban the search engines from the cgi-bin using robots.txt UNLESS you are using cgi-bin navigation (a la smart theme etc.). That way Google will not spider them and your sitemap creation tool should not list them in the first place.

AndyBorrett
25-Feb-2009, 03:35 PM
I don't use CGI for navigation apart from things like the marketing lists.

I have checked the output of the Moleend Product Mash and there is no mention of anything apart from the sections.

If I could find the file that is/was holding this out-of-date information, I could delete it and let Actinic recreate it, then set up robots.txt to keep Google out of the bin ;)

meden
25-Feb-2009, 03:42 PM
What you may find is that the file no longer exists and Google is referring to a link in a page it indexed a while ago. A possible fix is to submit a removal request via the Google dashboard (once you've confirmed that the source page is no longer there).

AndyBorrett
25-Feb-2009, 03:54 PM
A couple of months after I removed the links page, I discovered that the actual link to it was just commented out. I read on the forum that Google would still read this as a link, so I deleted it completely.

I did however forget that Actinic seems to leave deleted pages on the server, so I manually deleted all the HTML pages and then refreshed. This was a couple of days ago.

I will have a go at requesting a removal of the links page from Google, but I don't think that will work for the CGI pages as they are not real pages AFAIK.

Dorian
28-Jul-2009, 11:50 AM
Hi Lee,

I was thinking about using a robots.txt file:

User-agent: *
Allow: /

.. and I'd like to use it to stop Google coming up with some errors in its indexing (according to Webmaster Tools, like Andy was getting).

How do I tell if I'm using cgi-bin navigation?

I'd like it to prevent Google seeing duplicate descriptions such as these:

/acatalog/The_Original_Green_Log_Maker.html

&

/cgi-bin/ss000001.pl?PRODREF=28&NOLOGIN=1

Currently it is saying they are the same thing. Is using the robots.txt file to stop Google indexing these the way to go?

Dorian.

pinbrook
28-Jul-2009, 12:34 PM
"How do I tell if I'm using cgi-bin navigation?"

Hover over your links and you will see whether the left-hand nav uses cgi links or not.

Even if the nav is not cgi, you may still encounter issues if you use bestseller lists, as these use cgi.

Dorian
29-Jul-2009, 07:50 AM
Thanks Jo, sorry to be ignorant - but I've hovered over - what then am I looking for? I currently see the alt text of the link I'm hovering over and yes, I have new products and best sellers on the home page.

pinbrook
29-Jul-2009, 07:58 AM
make sure your browser shows the status bar, then hover over a link - the url will then be visible (bottom left of browser - i'm using firefox)

If the site is the mustbegreen site, then your left nav is not cgi - it's in the format domain/acatalog/page.html. However, your special offers links are cgi (hover over August special offers), i.e. domain/cgi-bin/ss000000 etc.

Dorian
29-Jul-2009, 08:53 AM
O.K. I see that now - yes, my section links are normal URLs but my New Products and Special Offers have CGI in the link.

So - if the CGI is NOT used for the main navigation, should I add it to a robots.txt file to prevent Google seeing it as duplicated descriptions, as it currently does? If so - how do I do this?

animal dreams
29-Jul-2009, 03:18 PM
Try googling robots.txt cgi-bin :rolleyes:

Dorian
29-Jul-2009, 03:33 PM
Will do - thanks for the tip Alan.
O.K. I'll add a robots.txt file with Disallow: /cgi-bin/ in it and then see if this fixes it.
Are there any downsides to doing this?

Dorian
30-Jul-2009, 09:37 AM
O.K. - this morning I checked Webmaster Tools and my sitemap has a great big red X against it. I can only presume this is because I added the robots.txt file yesterday? I'll remove the Disallow: /cgi-bin/ and see if my sitemap comes back.

cobbler
30-Jul-2009, 12:31 PM
Dorian, have you tried testing your robots.txt file in Google's Webmaster Tools?

(Go to Webmaster Tools | Site config. | Crawler Access | open the Test Robots.txt Tab )

This will tell you if you have a problem with it without the need to remove it and wait for your sitemap to be resubmitted.
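
As a complement to the Webmaster Tools tester, the rules can also be checked locally before uploading. The following is only a minimal sketch using Python's standard `urllib.robotparser`; the `example.com` URLs and the `is_allowed` helper are made up for illustration, and Google's own parser may treat edge cases differently from this one.

```python
from urllib.robotparser import RobotFileParser

# The rules discussed in this thread (assumed to be the whole robots.txt file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs in the same shape as the ones mentioned in the thread.
for url in (
    "http://www.example.com/acatalog/page.html",
    "http://www.example.com/cgi-bin/ss000001.pl?PRODREF=28&NOLOGIN=1",
):
    print(url, "->", "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

Testing a specific cgi-bin URL rather than the site root is the useful check here, since the root itself is not covered by the Disallow rule.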

Dorian
30-Jul-2009, 12:56 PM
Hi, yes I did test the original robots.txt and got a 200 code back, which I believe means all's O.K. However, I hadn't tried the CGI one - so I will do that now.

Dorian
31-Jul-2009, 07:34 AM
I've tested this one:

User-agent: *
Disallow: /cgi-bin/

..and I got back:

Allowed
Detected as a directory; specific files may have different restrictions.

I also wonder why Google doesn't allow the sitemap found here: acatalog/sitemap.html. It finds it regularly, but again gives it a big red cross each time.

cobbler
31-Jul-2009, 08:32 AM
If you click on the name of the sitemap in webmaster tools it should display "errors and warnings". This should then explain the reasons behind the red cross.

pinbrook
31-Jul-2009, 09:35 AM
"I also wonder why Google doesn't allow the sitemap found here: acatalog/sitemap.html. It finds it regularly, but again gives it a big red cross each time."

Actinic creates a sitemap and places it in acatalog, i.e. sitemap.html. This is not the file that Google wants. The Google sitemap is a different format (XML) and sits in the root of the site.
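
To show what the XML format looks like, here is a minimal sketch that builds a sitemap with Python's standard `xml.etree.ElementTree`. The `build_sitemap` helper and the `example.com` URLs are hypothetical; only the `urlset`/`url`/`loc` elements and the sitemaps.org namespace come from the sitemap protocol, and a real generator tool will normally add more detail.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as text) listing each URL once, in order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # drops repeats, keeps first-seen order
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # '&' etc. are escaped for us
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page URLs; the result would be saved as sitemap.xml in the site root.
print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/acatalog/page.html",
]))
```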

Dorian
31-Jul-2009, 03:27 PM
Hi, I get: This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit.
However - it does not give me the ability to delete any of the 3 sitemaps (all failed) up there - not sure why not?

cobbler
31-Jul-2009, 03:50 PM
As Pinbrook said, you only need one sitemap submitted to Google. It should have a file extension of .xml.
There are two buttons on the sitemap page of Webmaster Tools: "delete" and "resubmit". You select a sitemap with a check box and then delete it.
You can remove or exclude a duplicate URL using your sitemap generation tool and then resubmit the sitemap.
I can recommend a read through the help files on the same page in google:
Understanding Sitemaps, Creating Sitemaps, Sitemap Errors etc. They are pretty good, if not exactly bedtime reading!

Dorian
31-Jul-2009, 03:57 PM
Thanks - I have read these many times - and I cannot see a delete button. I've been trying to find that for ages and have even contacted support about it.

http://www.itmustbegreen.co.uk/acatalog/sitemaps.JPG

When it first asked for a sitemap I tried the one on the website (not knowing that these weren't acceptable to Google). It didn't work, so I tried creating one using the Google tool (which again didn't work). I then made one using a third-party bit of software which I got from the Webmaster Tools FAQs, which did work until a few days ago.

So, I now have 3 up there not working.

pinbrook
31-Jul-2009, 05:47 PM
delete them using ftp

Dorian
01-Aug-2009, 08:56 AM
Hi Jo - they're already deleted via FTP (or rather not showing up there) - the only one showing in my FTP is the .xml sitemap that was working until a couple of days ago.

Brightstar
10-May-2011, 01:14 PM
Hi Everyone,

I have found this thread and read it with interest. I have implemented the suggestion about disallowing the Google bot from the cgi-bin with:

User-agent: *
Disallow: /cgi-bin/

However, forgive me if I have not got this correct, the sitemap still has references to the cgi-bin in it and possibly has duplicate URLs:

www.acaciamasonic.com/sitemap.xml

I use an online XML sitemap generator. Is there a way of creating a sitemap that excludes the path to the cgi-bin and does not duplicate URLs?
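
One possible approach, if the generator itself has no exclude option, is to post-process the file it produces. The sketch below is an assumption-laden illustration rather than a feature of any particular generator: `clean_sitemap` is a hypothetical helper, the sample URLs are made up, and it only handles a plain `urlset` sitemap (not a sitemap index).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def clean_sitemap(xml_data: bytes, blocked_prefix: str = "/cgi-bin/") -> str:
    """Drop <url> entries under blocked_prefix and any repeated <loc> values."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(xml_data)
    seen = set()
    for entry in list(root.findall(f"{{{NS}}}url")):
        loc = (entry.findtext(f"{{{NS}}}loc") or "").strip()
        if urlsplit(loc).path.startswith(blocked_prefix) or loc in seen:
            root.remove(entry)  # cgi-bin link or duplicate
        else:
            seen.add(loc)
    return ET.tostring(root, encoding="unicode")


# Hypothetical generator output with one cgi-bin URL and one duplicate.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/acatalog/page1.html</loc></url>
  <url><loc>http://www.example.com/cgi-bin/ss000001.pl?PRODREF=28&amp;NOLOGIN=1</loc></url>
  <url><loc>http://www.example.com/acatalog/page1.html</loc></url>
  <url><loc>http://www.example.com/acatalog/page2.html</loc></url>
</urlset>"""

print(clean_sitemap(SAMPLE))
```

In practice the input would be read from the downloaded sitemap.xml in binary mode, and the cleaned text written back out before resubmitting.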