Introduction:
First of all, you should have a robots.txt file. It tells bots (spiders such as Google, Yahoo, MSN, etc.) which parts of your website they may reach and crawl, and where your sitemaps are, whether the address is a “Damn looking url”, an “Ugly looking url!!” or a “Nice looking url”. Whatever you call it, for a standard understanding we name it SEO.
ex: www.site.com/robots.txt:
User-agent: *
Disallow: /css/
Disallow: /test.php
Allow: /file/myfile.html
Sitemap: http://www.site.com/sitemap/xml.xml
Sitemap: http://www.site.com/sitemap/txtmode.txt
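To see what a bot would make of rules like these, here is a minimal sketch using Python's standard urllib.robotparser. It assumes the paths start with a slash and the sitemaps are full URLs on www.site.com (both assumptions made for this sketch); build_parser and allowed are hypothetical helper names. Python's parser applies the first rule that matches, which may differ from how a particular search engine resolves conflicting rules.

```python
from urllib.robotparser import RobotFileParser

# Example rules, written with leading slashes and absolute sitemap URLs
# (assumed for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /test.php
Allow: /file/myfile.html
Sitemap: http://www.site.com/sitemap/xml.xml
Sitemap: http://www.site.com/sitemap/txtmode.txt
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, url, agent="*"):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.site.com/css/style.css",
        "http://www.site.com/test.php",
        "http://www.site.com/file/myfile.html",
    ):
        print(url, "allowed" if allowed(parser, url) else "blocked")
    # site_maps() needs Python 3.8 or newer.
    print(parser.site_maps())
```

Parsing from a string keeps the check offline, so you can try a rule change before uploading the file.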
Other methods / ways of doing it:
1. Text based (KISS, keep it simple STUPID!!!): one URL per line, at most 50,000 URLs per file, and at most 50 MB per file (the limit was originally 10 MB)
a. http://mysite/searchlist1.txt:
http://www.mysite.com/index.php?a=b=c=d=e=f=g=h=j=uglylooking_url
http://www.mysite.com/index.php?a=b=c=d=e=f=g=h=j=damnlooking_url
http://www.mysite.com/hello-world-whats-up
b. http://mysite/searchlist2.txt:
http://www.mysite.com/index.php?a=b=c=d=e=f=g=h=j=uglylooking_url
http://www.mysite.com/index.php?a=b=c=d=e=f=g=h=j=damnlooking_url
http://www.mysite.com/hello-world-whats-up
c. http://mysite/robots.txt:
....
Sitemap: http://mysite/searchlist1.txt
Sitemap: http://mysite/searchlist2.txt
2. XML based (not KISS)
3. Meta tag/title tag (KISS)
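The text and XML methods in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: chunk_urls, text_sitemap and xml_sitemap are hypothetical helper names, the 50,000 figure is the per-file URL cap from the sitemaps.org protocol, and the file size limit is not checked here.

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000  # per-file URL cap from the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=MAX_URLS):
    """Split a URL list into pieces that each fit in one sitemap file."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def text_sitemap(urls):
    """Text method: one URL per line, nothing else."""
    return "\n".join(urls) + "\n"


def xml_sitemap(urls):
    """XML method: a <urlset> holding one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    urls = [
        "http://www.mysite.com/index.php?a=b=c=d=e=f=g=h=j=uglylooking_url",
        "http://www.mysite.com/hello-world-whats-up",
    ]
    for number, chunk in enumerate(chunk_urls(urls), start=1):
        print(f"--- searchlist{number}.txt ---")
        print(text_sitemap(chunk), end="")
    print(xml_sitemap(urls))
```

The XML string comes out without an XML declaration, so add one (and save the file as UTF-8) before publishing. The text version shows why it counts as KISS: it is just the URLs joined by newlines.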
You are upset / the world is so cruel:
Nice-looking URLs have, all of a sudden, become the bible of web technologies, and you don’t have any solution. Wait! Try this at least:
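vi /etc/httpd/conf/httpd.conf
# Alias only maps a URL path to a filesystem path, so use mod_rewrite (must be loaded)
RewriteEngine On
RewriteRule "^/my-nice-urls$" "/index.php?a=old=b=nasty=c=ugly" [PT,L]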
Before:
your bad url was: http://mysite/index.php?a=old=b=nasty=c=ugly
After:
your working url is: http://mysite/my-nice-urls
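If you have more than one ugly URL to hide, typing the config lines by hand gets tedious. Here is a rough sketch that generates them, assuming Apache's mod_rewrite is doing the mapping (Alias itself only points a URL path at a filesystem path). rewrite_rules and SAFE_PATH are hypothetical names, and the sketch only accepts nice paths made of letters, digits, slashes, underscores and hyphens so that no regex escaping is needed.

```python
import re

# Only plain path characters are accepted, so the path can be dropped
# into the RewriteRule pattern without any regex escaping.
SAFE_PATH = re.compile(r"/[A-Za-z0-9/_-]+")


def rewrite_rules(mapping):
    """Return Apache config lines mapping each nice path to its real target."""
    lines = ["RewriteEngine On"]
    for nice_path, target in mapping.items():
        if not SAFE_PATH.fullmatch(nice_path):
            raise ValueError(f"unsupported characters in {nice_path!r}")
        # PT passes the rewritten URL on to normal URL mapping; L stops
        # processing further rewrite rules for this request.
        lines.append(f'RewriteRule "^{nice_path}$" "{target}" [PT,L]')
    return lines


if __name__ == "__main__":
    rules = rewrite_rules({"/my-nice-urls": "/index.php?a=old=b=nasty=c=ugly"})
    print("\n".join(rules))
```

Paste the printed lines into the server or virtual host config and reload Apache. The old ugly URL keeps working, so existing links do not break.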
More reading:
http://en.wikipedia.org/wiki/Sitemaps
http://en.wikipedia.org/wiki/Robots_exclusion_standard
http://www.sitemaps.org/protocol.php