  #1 (permalink)  
Old 05-07-2008, 04:23 PM
josephpettry josephpettry is offline
WMG Newcomer

Join Date: Apr 2008
Posts: 6
way to prevent a spider from grabbing URLs that you want to

Is there any way to prevent a spider from grabbing URLs that you want to
keep off search engines?
  #2 (permalink)  
Old 05-07-2008, 06:32 PM
Crafterz Crafterz is offline
WMG Deputy Sheriff
 
Join Date: Apr 2008
Posts: 121

Yeah, robots.txt

Oh, to block all bots from a certain area you do

Quote:
User-agent: *
Disallow: /directory/
That /directory/ could instead be a single file, such as /warez.php or /directory/warez.php.

It can block any file.

If you didn't know, just create a file called robots.txt in your main (root) directory and add this information.
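
One way to sanity-check rules like these before uploading them is Python's standard urllib.robotparser module. This is only a rough sketch: "ExampleBot" and www.example.com are placeholder names, not anything from this thread, and the parser only tells you what a well-behaved bot would do.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and www.example.com are placeholders for illustration.
for path in ("/directory/warez.php", "/index.html"):
    url = "http://www.example.com" + path
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(path, verdict)
```

With these rules the path under /directory/ should come back as blocked and /index.html as allowed, for any bot that actually honours the file.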
  #3 (permalink)  
Old 05-16-2008, 06:26 PM
Hafsoh Hafsoh is offline
WMG Resident Alien
 
Join Date: Jan 2008
Posts: 57

The reputable ones - robots.txt or the nofollow attribute... or put the content behind a password protected area.
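
A minimal sketch of the password-protection option, assuming HTTP Basic authentication. The credentials and the is_authorized helper are made up for illustration; a real site would normally let the web server or framework do this. The point it shows is that the check looks only at the Authorization header, so whatever User-Agent a client claims makes no difference.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "member"
EXPECTED_PASSWORD = "s3cret"


def is_authorized(headers: dict) -> bool:
    """Return True only if the request carries valid Basic credentials."""
    auth = headers.get("Authorization", "")
    scheme, _, encoded = auth.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons; the User-Agent header is never consulted.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok


def basic_header(user: str, password: str) -> str:
    """Build the Authorization header value a client would send."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return "Basic " + token
```

A request that only sets its User-Agent to a crawler name, with no valid credentials, is rejected the same as any other anonymous request.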
  #4 (permalink)  
Old 05-18-2008, 04:32 AM
Jacob Jacob is offline
WMG Deputy Sheriff
 
Join Date: Apr 2008
Posts: 161

Quote:
Originally Posted by Hafsoh View Post
The reputable ones - robots.txt or the nofollow attribute... or put the content behind a password protected area.
Doesn't work.
I can configure (or used to be able to configure) my web browser to behave as Googlebot, and I could access any webpage I want: password or no password.
  #5 (permalink)  
Old 05-18-2008, 06:19 PM
Crafterz Crafterz is offline
WMG Deputy Sheriff
 
Join Date: Apr 2008
Posts: 121

Yeah, non-reputable bots won't listen to anything, so just ignore them. You can also find lists of bots that do follow robots.txt but are still bad.
  #6 (permalink)  
Old 05-19-2008, 08:21 PM
chrishirst chrishirst is offline
WMG Sheriff
 
Join Date: Nov 2007
Posts: 170

Quote:
Originally Posted by Jacob View Post
Doesn't work.
I can configure (or used to be able to configure) my web browser to behave as Googlebot, and I could access any webpage I want: password or no password.
Browsers DON'T read the robots.txt!!!

The idea of the robots.txt protocol is not that it blocks user agents ad hoc, but that the user agents that ARE bots can read the text file for the directives that pertain to them specifically, and act on those directives.

If the text file specifies that certain files or folders should not be indexed by any bot
Code:
User-agent: *
Disallow: /folder/
or by a specific bot;
Code:
User-agent: Bot_UA
Disallow: /
the robots that honour the protocol should NOT index the files contained in the folder.
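
As a rough illustration of a bot reading the directives that pertain to it, here is a sketch using Python's standard urllib.robotparser, with both rule sets combined into one file and the standard "User-agent" field name. "OtherBot" and www.example.com are placeholder names; Bot_UA is the example name from the post. Nothing here blocks anyone: it only computes what a bot that honours the protocol would decide for itself.

```python
from urllib.robotparser import RobotFileParser

# Both kinds of rule in one robots.txt: one bot is shut out of the
# whole site, every other bot is only kept out of /folder/.
ROBOTS_TXT = """\
User-agent: Bot_UA
Disallow: /

User-agent: *
Disallow: /folder/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return what a protocol-honouring bot with this name would conclude."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "OtherBot" and www.example.com are placeholders for illustration.
SITE = "http://www.example.com"
for agent in ("Bot_UA", "OtherBot"):
    for path in ("/index.html", "/folder/page.html"):
        print(agent, path, allowed(agent, SITE + path))
```

Bot_UA matches its own section and is refused everywhere, while OtherBot falls through to the * section and is refused only under /folder/. A browser, or a bot that ignores the protocol, never runs a check like this at all.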
  #7 (permalink)  
Old 05-21-2008, 12:52 AM
Crafterz Crafterz is offline
WMG Deputy Sheriff
 
Join Date: Apr 2008
Posts: 121

Well, chrishirst said it straight out. You should make that a sticky as a guide.