CMS MADE SIMPLE FORGE

SEOTools2

 

[#8403] Formatting rules

Created By: Nicholas Wittering (swarfega)
Date Submitted: 2012-09-15 12:03

Assigned To: Prue Rowland (psy)
Resolution: Fixed
State: Open
Summary:
Formatting rules
Detailed Description:
I was checking several robots.txt generators and various search engine webmaster
tools sites and all seem to state that allow/disallow entries should start with
a / char. In other words /cms/modules/ instead of
http://domain.com/cms/modules/.
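
To illustrate the point about rule format, here is a minimal PHP sketch that reduces an absolute URL to the root-relative path a robots.txt rule expects. The helper name to_robots_path is hypothetical and is not part of SEOTools2.

```php
<?php
// Hypothetical helper, not part of SEOTools2: reduce an absolute URL
// to the root-relative path that robots.txt rules expect.
function to_robots_path(string $url): string
{
    // parse_url() returns null when the URL has no path component.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return '/';
    }
    // Guarantee exactly one leading slash.
    return '/' . ltrim($path, '/');
}

echo 'Disallow: ' . to_robots_path('http://domain.com/cms/modules/') . "\n";
// prints "Disallow: /cms/modules/"
```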

I would be interested in your opinion on this.

Also could we have the ability to select which robots are allowed or disallowed
such as googlebot, bingbot, baiduspider, etc.
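
As a rough sketch of what per-robot rules could look like, the PHP below builds one robots.txt group per user agent. The function name build_bot_groups and the example paths are assumptions made for illustration; nothing here reflects how the module works.

```php
<?php
// Hypothetical sketch: build one robots.txt group per user agent.
// Keys are user-agent tokens, values are lists of disallowed paths.
function build_bot_groups(array $rulesByBot): string
{
    $out = '';
    foreach ($rulesByBot as $bot => $paths) {
        $out .= "User-agent: {$bot}\n";
        foreach ($paths as $path) {
            $out .= "Disallow: {$path}\n";
        }
        $out .= "\n"; // a blank line separates groups
    }
    return $out;
}

// Example paths are made up for illustration.
echo build_bot_groups([
    'Googlebot'   => ['/cms/tmp/'],
    'bingbot'     => ['/cms/tmp/', '/cms/doc/'],
    'Baiduspider' => ['/'],
]);
```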

Also, I think the robots.txt file should be put into the root folder. If you have
installed CMSMS into a folder such as http://domain.com/cms, the module
should take that into account and install the txt file in the root.

History

Comments
Date: 2012-09-17 06:53
Posted By: Prue Rowland (psy)

To be honest, I took that as done from the original SEOTools and didn't
investigate too much.

From my reading, the robots ignore the 'http://domain.com' bit so it was
superfluous.

Also, there are millions of bots so giving users the choice was in the too hard
basket.

Instead, in svn but not yet released (will be in 1.11.2):

1. Regardless of which dir your CMSMS is installed in, the robots.txt will go
into the root dir.
2. The rules will start with a /, eg /cms/
3. You have the choice of adding your own custom rules before and after the
standard CMSMS robots.txt rules, eg if you have your CMSMS in a sub dir and have
other dirs you want to hide from all bots, you can add them in the 'after'
rules.  Similarly, if you want to give access permission to particular dirs to
certain bots, you can add those rules in the 'before' text.
4. While digging around, I also found and fixed a bug in the output of the robots.txt
file which had been there since the original SEOTools.
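
The layout described in points 1 to 3 can be sketched in PHP roughly as follows. This is an illustration under assumptions, not the SEOTools2 code: the function name, its parameters, the "User-agent: *" header and the shortened directory list are all hypothetical.

```php
<?php
// Hypothetical sketch of the layout described above; names are
// illustrative and do not come from SEOTools2.
function build_robots_txt(string $cmsPath, string $before, string $after): string
{
    // Normalise the install path: '' for a root install, '/cms' for a sub dir.
    $prefix = trim($cmsPath, '/');
    $prefix = ($prefix === '') ? '' : '/' . $prefix;

    $lines = [];
    if (trim($before) !== '') {
        // Custom 'before' rules form their own group(s) ahead of the defaults.
        $lines[] = trim($before);
        $lines[] = '';
    }
    // Assumed header for the standard rules block.
    $lines[] = 'User-agent: *';
    // Shortened, illustrative directory list.
    foreach (['admin', 'doc', 'lib', 'modules', 'plugins', 'tmp'] as $dir) {
        $lines[] = "Disallow: {$prefix}/{$dir}/";
    }
    if (trim($after) !== '') {
        // Custom 'after' rules extend the group that applies to all bots.
        $lines[] = trim($after);
    }
    return implode("\n", $lines) . "\n";
}

echo build_robots_txt(
    '/cms/',
    "User-agent: Googlebot\nAllow: /cms/uploads/",
    'Disallow: /private/'
);
```

To follow point 1, the returned string would be written to the web root rather than the CMSMS directory, for example with file_put_contents($_SERVER['DOCUMENT_ROOT'] . '/robots.txt', ...), assuming the server sets DOCUMENT_ROOT to the site root.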

Please note also that since 1.1.1, each page can have a robots meta tag to
index/noindex or follow/nofollow which kinda makes the robots.txt file for CMSMS
pages at least superfluous too.

Hope this all helps and thank you for your feedback and support
psy



      
Date: 2012-09-21 04:03
Posted By: Nicholas Wittering (swarfega)

Thanks for the explanation.
      
Date: 2012-11-13 09:47
Posted By: Metamorphose (metamorphose)

Hello

I've tested the new version 1.2 from SVN. Thanks for the enhancement in
creating robots.txt.
But this release creates double slashes in the paths instead of one:

Disallow:  //admin
Disallow: //contrib
Disallow: //doc
.....
Please correct class.seo2_utils.php, lines 483 to 490 and line 500, so that
only one slash is written.



      
Date: 2012-11-13 15:12
Posted By: Prue Rowland (psy)

Thanks metamorphose.  I did not get the same result in my testing. Must be a
server difference.
I have updated SVN to take this into account. Please let me know if the solution
works for you.
      
Date: 2012-11-17 05:24
Posted By: Metamorphose (metamorphose)

Hi Prue

Sorry, no, I still get the double slash, because the second slash has only been
moved from the string into a variable.
That two slashes are written seems logical to me: the first comes from the variable
and the second from the following string before the directory name.

This code works for me (I also removed the additional empty line after contrib):

fwrite($fp, "Disallow: ".$cms_path."/".$config['admin_dir']."/\r\n");
fwrite($fp, "Disallow: ".$cms_path."/contrib/\r\n");
fwrite($fp, "Disallow: ".$cms_path."/doc/\r\n");
fwrite($fp, "Disallow: ".$cms_path."/lib/\r\n");
fwrite($fp, "Disallow: ".$cms_path."/modules/\r\n");
fwrite($fp, "Disallow: ".$cms_path."/plugins/\r\n");
fwrite($fp, "Disallow: ".$cms_path."/scripts/\r\n");
fwrite($fp, "Disallow: ".$cms_path."/tmp/\r\n");
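
The lines above give a single slash only while $cms_path has no trailing slash. As a small sketch of the general pitfall, the PHP below uses a hypothetical $cms_path value of '/' (an assumed root install; the thread does not show the real value) and strips the trailing slash before concatenating.

```php
<?php
// Hypothetical value: $cms_path as it might look for a root install.
$cms_path = '/';

// Concatenating a path that ends in '/' with a literal that starts
// with '/' yields two slashes.
$naive = 'Disallow: ' . $cms_path . '/admin/';

// Stripping the trailing slash first leaves exactly one separator.
$fixed = 'Disallow: ' . rtrim($cms_path, '/') . '/admin/';

echo $naive, "\n"; // prints "Disallow: //admin/"
echo $fixed, "\n"; // prints "Disallow: /admin/"
```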



      
Date: 2012-11-20 13:23
Posted By: Metamorphose (metamorphose)

Hi Prue

I've now tested on multiple installations and servers. With your code I
always get double slashes in the robots.txt.
Only with the code from my last post do I get a correctly formatted file with only
one slash before the directory names.
      
Updates

Updated: 2012-11-13 15:12
resolution_id: 6 => 7

Updated: 2012-09-17 06:53
resolution_id: 5 => 6

Updated: 2012-09-15 12:05
resolution_id: => 5