Will You Add? - The Robots Text File Or How To Get Your Site Properly Spidered, Crawled, Indexed By Bots
So you heard someone stressing the importance of the robots.txt file, or noticed in your website's logs that the robots.txt file is causing an error, or found it somehow sitting at the very top of your most visited pages. Or you read an article about the death of the robots.txt file and about how you should never bother with it again. Or maybe you have never heard of the robots.txt file but are intrigued by all that talk about spiders, robots and crawlers. In this article, I will hopefully make some sense out of all of the above.
There are many folks out there who vehemently insist on the uselessness of the robots.txt file, proclaiming it obsolete, a thing of the past, plain dead. I disagree. The robots.txt file is probably not among the top ten methods to promote your get-rich-fast affiliate website in 24 hours or less, but it still plays a major role in the long run. First of all, the robots.txt file is still a very important factor in promoting and maintaining a site, and I will show you why. Second, the robots.txt file is one of the simple means by which you can protect your privacy and/or intellectual property. I will show you how.

Let's try to figure out some of the lingo.

What is this robots.txt file?
The robots.txt file is just a very plain text file (or an ASCII file, as some like to say) with a very simple set of instructions that we give to a web robot, so the robot knows which pages we want scanned (or crawled, or spidered, or indexed - all terms refer to the same thing in this context) and which pages we would like to keep out of search engines.

What is a WWW robot?
A robot is a computer program that automatically reads web pages and follows every link that it finds. The purpose of robots is to gather information. Some of the most famous robots mentioned in this article work for the search engines, indexing all the information available on the web.
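To make "reads web pages and follows every link that it finds" concrete, here is a minimal Python sketch of a robot's core loop. It assumes a hypothetical `fetch` callable that returns a page's HTML (or None when the page cannot be read); a small dictionary stands in for the web so the example runs offline. The names `extract_links` and `crawl` are illustrative and not taken from any real search engine, and this bare sketch does not yet consult robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page's own URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links such as "about.html" become absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def crawl(start_url: str, fetch, limit: int = 100) -> list[str]:
    """Breadth-first crawl: read a page, queue every link not seen before, repeat.

    `fetch` is any callable returning a page's HTML, or None if it cannot be read.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "web" standing in for real HTTP requests.
    site = {
        "http://www.example.com/": '<a href="about.html">About</a> <a href="/news/">News</a>',
        "http://www.example.com/about.html": '<a href="/">Home</a>',
        "http://www.example.com/news/": '<a href="item1.html">First item</a>',
        "http://www.example.com/news/item1.html": "<p>No links here.</p>",
    }
    for url in crawl("http://www.example.com/", site.get):
        print(url)
```

Run as a script, it prints the four pages in the order the robot reaches them: the home page, about.html, /news/ and /news/item1.html. The `limit` parameter is one simple way a robot can cap how much of a site it reads in one pass.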
The first robot was developed at MIT and launched in 1993. It was named the World Wide Web Wanderer, and its initial purpose was purely scientific: its mission was to measure the growth of the web. The index generated from the experiment's results proved to be an awesome tool and effectively became the first search engine. Most of the stuff we consider today to be indispensable online tools was born as a side effect of some scientific experiment.

What is a search engine?
Generically, a search engine is a program that searches through a database. In the popular sense, as applied to the web, a search engine is a system with a user search form that searches through a repository of web pages gathered by a robot.

What are spiders and crawlers?
Spiders and crawlers are robots; only the names sound cooler in the press and within metro-geek circles.

What are the most popular robots? Is there a list?
Some of the best-known robots are Google's Googlebot, MSN's MSNBot, Ask Jeeves's Teoma and Yahoo!'s Slurp (funny). One of the most popular places to search for active robot info is the list maintained at http://www.robotstxt.org.

Why do I need this robots.txt file anyway?
A great reason to use a robots.txt file is the fact that many search engines, including Google, post suggestions encouraging the public to use this tool. Why is it such a big deal that Google teaches people about the robots.txt file? Well, because nowadays search engines are not a playground for scientists and geeks anymore, but large corporate enterprises. Google is one of the most secretive search engines out there. Very little is known to the public about how it operates, how it indexes, how it searches, how it creates its rankings, etc. In fact, if you do a careful search in specialized forums, or wherever else these issues are discussed, you'll find that nobody really agrees on whether Google puts more emphasis on this or that element to create its rankings.
And when people don't agree on something as precise as a ranking algorithm, it means two things: that Google constantly changes its methods, and that it does not make them very clear or very public. There's only one thing that I believe to be crystal clear: if they recommend that you use a robots.txt file ("Make use of the robots.txt file on your web server" - Google Technical Guidelines), then do it. It might not help your ranking, but it will definitely not hurt you.

There are other reasons to use the robots.txt file. If you use your error logs to tweak your site and keep it free of errors, you will notice that many of the errors come from someone or something not finding the robots.txt file. All you have to do is create a basic blank file (use Notepad in Windows, or the simplest text editor in Linux or on a Mac), name it robots.txt and upload it to the root of your server (that's where your home page is).

On a different note, nowadays all search engines look for the robots.txt file as soon as their robots arrive on your site. There are unconfirmed rumors that some robots might even 'get annoyed' and leave if they don't find it. Not sure how true that is, but hey, why not be on the safe side? Again, even if you don't intend to block anything, or just don't want to bother with this stuff at all, having a blank robots.txt is still a good idea, as it can actually act as an invitation into your site.

Don't I want my site indexed? Why stop robots?
Some robots are well designed, professionally operated, cause no harm and provide a valuable service to mankind (don't we all like to "google"?). Some robots are written by amateurs (remember, a robot is just a program). Poorly written robots can cause network overload, security problems, etc. The bottom line is that robots are devised and operated by humans and are prone to human error. Consequently, robots are neither inherently bad nor inherently brilliant, and they need careful attention.
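Since robots only look for the file at the root of a host, any page address is enough for a robot to work out where the governing robots.txt lives. The Python sketch below, a minimal illustration rather than any search engine's actual code, derives that address and then uses the standard library's `urllib.robotparser` to confirm that a "blank" robots.txt (a `User-Agent: *` line followed by an empty `Disallow:` line) lets every robot in. The helper names and the www.example.com addresses are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url: str) -> str:
    """Return the address of the robots.txt file that governs page_url.

    Robots only look for the file at the root of the host, so the path,
    query string and fragment of page_url are thrown away.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The "blank" robots.txt: one record for all robots that disallows nothing.
BLANK_ROBOTS_TXT = """\
User-Agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check url against the rules in robots_txt using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/articles/seo/page.html?id=7"))
    # -> http://www.example.com/robots.txt
    print(allowed(BLANK_ROBOTS_TXT, "Googlebot", "http://www.example.com/any/page.html"))
    # -> True
    print(allowed("", "Slurp", "http://www.example.com/any/page.html"))
    # -> True (for this parser, an entirely empty file also blocks nothing)
```

The file is looked up per host, so a subdomain or a different port gets its own robots.txt; that is why `netloc` (host plus optional port) is kept while everything after it is replaced.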
This is another case where the robots.txt file comes in handy: robot control.

Now, I'm sure your main goal in life, as a webmaster or site owner, is to get on the first page of Google. Then why in the world would you want to block robots? Here are some scenarios:

1. Unfinished site
You are still building your site, or portions of it, and don't want unfinished pages to appear in search engines. It is said that some search engines even penalize sites with pages that have been "under construction" for a long time.

2. Security
Always block your cgi-bin directory from robots. In most cases, cgi-bin contains applications and configuration files for those applications (which might actually hold sensitive information). Even if you don't currently use any CGI scripts or programs, block it anyway; better safe than sorry.

3. Privacy
You might have some directories on your website where you keep stuff that you don't want the entire galaxy to see, such as pictures of a friend who forgot to put clothes on.

4. Doorway pages
Besides the illicit attempts to increase rankings by blasting doorways all over the internet, doorway pages do have a morally sound use. They are similar pages, each one optimized for a specific search engine. In this case, you must make sure that individual robots do not have access to all of them. This is extremely important in order to avoid being penalized for spamming a search engine with a series of extremely similar pages.

5. Bad bot, bad bot, what’cha gonna do...
You might want to exclude robots whose known purpose is to collect email addresses, or other robots whose activity does not agree with your beliefs about the world.

6. Your site gets overwhelmed
In rare situations, a robot goes through your site too fast, eating your bandwidth or slowing down your server. This is called "rapid-fire", and you'll notice it if you read your access log file. A medium-performance server should not slow down. You may, however, have problems if you have a low-performance site, such as one running off your personal PC or Mac, if you run poor server software, or if you have heavy scripts or huge documents. In these cases, you'll see dropped connections, heavy slowdowns and, in extreme cases, even a complete system crash. If this ever happens to you, read your logs, try to get the robot's IP or name, read the list of active robots, and try to identify and block it.

What's in a robots.txt file anyway?
There are only two lines for each entry in a robots.txt file: the User-Agent line, which holds the name of the robot you want to give orders to (or the '*' wildcard symbol, meaning 'all'), and the Disallow line, which tells a robot all the places it should not touch. The two-line entry can be repeated for every file or directory you don't want indexed, or for each robot you want to exclude. If you leave the Disallow line empty, you are not disallowing anything; in other words, you are allowing that particular robot to index your entire site.

Some examples and a few scenarios should make this clear.

A. Exclude a file from Google's main robot (Googlebot):
User-Agent: Googlebot

B. Exclude a section of the site from all robots:
User-Agent: *

Note that the directory is enclosed between two forward slashes. Although you are probably used to seeing URLs, links and folder references that do not end with a slash, a web server always needs the slash at the end of a directory address. Even when a link on a website does not end with a slash, when that link is clicked the web server has to take an extra step before serving the page: it adds the slash through what we call a redirect. Always use the ending slash.

C. Allow everything (blank robots.txt):
User-Agent: *
Disallow:

Note that when a "blank robots.txt" is mentioned, it is not a completely blank file; it contains the two lines above.

D. Do not allow any robot on your site:
User-Agent: *
Disallow: /

Note that the single forward slash means "root", which is the main entrance to your site.

E. Do not allow Google to index any of your images (Google uses Googlebot-Image for images):
User-Agent: Googlebot-Image
Disallow: /

F. Do not allow Google to index some of your images:
User-Agent: Googlebot-Image

Note the use of multiple disallows. This is allowed, no pun intended.

G. Build a doorway for Google and Lycos (the Lycos robot is called T-Rex). Do not play with this unless you are 100% sure you know what you are doing:
User-Agent: T-Rex

H. Allow only Googlebot:
User-Agent: Googlebot
Disallow:

User-Agent: *
Disallow: /

Note the order of the records. The example above reads in English: let Googlebot through, then stop everyone else. (Under the standard, the '*' record is the default for any robot not named in another record, so a robot that follows the standard reads it the same way; putting Googlebot's record first simply makes the intent easy to see.)

If your file gets really large, or you just feel like writing notes for yourself or for potential viewers (remember, robots.txt is a public file; anyone can see it), you can do so by preceding your comment with a # sign. Although the standard allows a comment on the same line as a command, I recommend that you start every command and every comment on a new line; this way, robots will never be confused by a potential formatting glitch. Examples:

This is correct, as per the standard, but not recommended (a newer robot or a badly written one might read the following as "disallow the # We... directory", not complying with the "disallow all" command):
User-Agent: *
Disallow: / # We decided to stop all robots but we were very silly in typing a long comment which got truncated and made the robots.txt unusable

The way I recommend that you format this is:
# We decided to stop all robots and we made sure
User-Agent: *
Disallow: /

Although in theory each robot should comply with the standards introduced around 1994 and enhanced in 1996, each robot acts a little differently. You are advised to check the documentation provided by the owners of those robots; you'll be surprised to discover a world of useful facts and techniques.
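Python's standard library includes `urllib.robotparser`, a parser for robots.txt files, which makes it easy to see how a standards-following robot reads entries like the examples above. The sketch below writes out examples D, E and H in full, adds the cgi-bin rule from scenario 2 in the style of example B (`/cgi-bin/` is just the illustrative directory), and contrasts the standard-library parser with a deliberately careless, hypothetical parser to show why a comment on the same line as a command can backfire. Only the `RobotFileParser` calls are real; the constant and function names are made up for illustration, and real robots may behave differently, as noted above.

```python
from urllib.robotparser import RobotFileParser

# Example D written out in full: keep every robot off the whole site.
BLOCK_ALL = """\
User-Agent: *
Disallow: /
"""

# Example E: keep Google's image robot away from every image.
NO_GOOGLE_IMAGES = """\
User-Agent: Googlebot-Image
Disallow: /
"""

# Example H: let Googlebot through, stop everyone else.
ONLY_GOOGLEBOT = """\
User-Agent: Googlebot
Disallow:

User-Agent: *
Disallow: /
"""

# Example H with the records swapped; the '*' record is still only the default.
ONLY_GOOGLEBOT_SWAPPED = """\
User-Agent: *
Disallow: /

User-Agent: Googlebot
Disallow:
"""

# Scenario 2 in the style of example B ("/cgi-bin/" is the illustrative directory).
NO_CGI_BIN = """\
User-Agent: *
Disallow: /cgi-bin/
"""

# The "correct but not recommended" layout: a comment on the command line.
INLINE_COMMENT = """\
User-Agent: *
Disallow: / # We decided to stop all robots
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def naive_disallow_paths(robots_txt: str) -> list[str]:
    """Hypothetical careless robot: reads Disallow values but never strips '#' comments."""
    paths = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            paths.append(value.strip())
    return paths


def naive_is_blocked(robots_txt: str, path: str) -> bool:
    """Prefix-match `path` against the careless robot's Disallow values (empty = allow)."""
    return any(rule and path.startswith(rule) for rule in naive_disallow_paths(robots_txt))


if __name__ == "__main__":
    site = "http://www.example.com"
    print(can_fetch(ONLY_GOOGLEBOT, "Googlebot", site + "/index.html"))          # True
    print(can_fetch(ONLY_GOOGLEBOT, "Slurp", site + "/index.html"))              # False
    print(can_fetch(ONLY_GOOGLEBOT_SWAPPED, "Googlebot", site + "/index.html"))  # True
    print(can_fetch(INLINE_COMMENT, "Googlebot", site + "/index.html"))          # False
    print(naive_is_blocked(INLINE_COMMENT, "/index.html"))                       # False
```

With the standard-library parser, the swapped version of example H gives Googlebot the same answer as the original, because the '*' record is treated as the default. The careless parser, on the other hand, takes "/ # We decided to stop all robots" as the path. No real URL starts with that string, so the "disallow all" record silently lets that robot crawl everything.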
On Google's site, for example, we learn that Googlebot completely disregards any URL that contains "&id=". Here are some sites to check:
Google: http://www.google.com/bot.html
Yahoo: http://help.yahoo.com/help/us/ysearch/slurp/
MSN: http://search.msn.com/docs/siteowner.aspx

A database of robots is maintained at http://www.robotstxt.org/wc/active/html/contact.html

A robots.txt validation tool, invaluable for finding potential typos that can completely change the way search engines see your site, can be found at http://searchengineworld.com/cgi-bin/robotcheck.cgi

There are also some extensions to the standard. For example, some robots allow wildcards in the Disallow line, and some even accept additional commands. My advice is: don't bother with anything outside the standard and you will not be unpleasantly surprised.

A final word of caution: in this article I showed you how things should work in a perfect world. Somewhere along the way I mentioned that there are good bots and bad bots. Let's stop for a moment and think from a deranged person's perspective. Is there anything to prevent someone from writing a robot program that reads a robots.txt file and specifically looks at the pages you marked as "disallowed"? Absolutely not. This entire standard is based on the honor system, on the idea that everyone should work hard to make the internet a better place. Basically, do not rely on it for real security or privacy. Use passwords when necessary.

In conclusion, do not forget that indexing robots are your best friends.
While you shouldn't build your site for robots, but for your human visitors, do not underestimate the power of those mindless crawlers - make su
Is there a list? Some of the most well known robots are Google's Googlebot, MSN's MSNBot, Ask Jeeves's Teoma, Yahoo!'s Slurp (funny). One of the most popular places to search for active robot info is the list maintained at http://www.robots.org. Why do I need this robots.txt file anyway? A great reason to use a robots.txt file is actually the fact that many search engines, including Google, post suggestions for the public to make use of this tool. Why is it such a big deal that Google teaches people about the robots.txt? Well, because nowadays, search engines are not a playground for scientists and geeks anymore, but large corporate enterprises. Google is one of the most secretive search engines out there. Very little is known to the public about how it operates, how it indexes, how it searches, how it creates its rankings, etc. In fact, if you do a careful search in specialized forums, or wherever else these issues are discussed, nobody really agrees on whether Google puts more emphasis on this or that element to create its rankings. And when people don't agree on things as precise as a ranking algorithm, it means two things: that Google constantly changes its methods, and that it does not make it very clear or very public. There's only one thing that I believe to be crystal clear. If they recommend that you use a robots.txt ("Make use of the robots.txt file on your web server" - Google Technical Guidelines), then do it. It might not help your ranking, but it will definitely not hurt you. There are other reasons to use the robots.txt file. If you use your error logs to tweak and keep your site free of errors, you will notice that most errors refer to someone or something not finding the robots.txt file. All you have to do is create a basic blank page (use Notepad in Windows, or the most simple text editor in Linux or on a Mac), name it robots.txt and upload it to the root of your server (that's where your home page is). 
On a different note, nowadays, all search engines look for the robots.txt file as soon as their robots arrive on your site. There are unconfirmed rumors that some robots might even 'get annoyed' and leave, if they don't find it. Not sure how true that is, but hey, why not be on the safe side? Again, even if you don't intend to block anything or just don't want to bother with this stuff at all, having a blank robots.txt is still a good idea, as it can actually act as an invitation into your site. Don't I want my site indexed? Why stop robots? Some robots are well designed, professionally operated, cause no harm and provide valuable service to mankind (don't we all like to "google"). Some robots are written by amateurs (remember, a robot is just a program). Poorly written robots can cause network overload, security problems, etc. The bottom line here is that robots are devised and operated by humans and are prone to the human error factor. Consequently, robots are not inherently bad, nor inherently brilliant, and need careful attention. This is another case where the robots.txt file comes in handy - robot control. Now, I'm sure your main goal in life, as a webmaster or site owner is to get on the first page of Google. Then, why in the world would you want to block robots? Here are some scenarios: 1. Unfinished site You are still building your site, or portions of it, and don't want unfinished pages to appear in search engines. It is said that some search engines even penalize sites with pages that have been "under construction" for a long time. 2. Security Always block your cgi-bin directory from robots. In most cases, cgi-bin contains applications, configuration files for those application (that might actually have sensitive information), etc. Even if you don't currently use any CGI scripts or programs, block it anyway, better safe than sorry. 3. 
Privacy You might have some directories on your website where you keep stuff that you don't want the entire Galaxy to see, such as pictures of a friend who forgot to put clothes on, etc. 4. Doorway pages Besides illicit attempts to increase rankings by blasting doorways all over the internet, doorway pages actually do have a very morally sound usage. They are similar pages, but each one is optimized for a specific search engine. In this case, you must make sure that individual robots do not have access to all of them. This is extremely important, in order to avoid being penalized for spamming a search engine with a series of extremely similar pages. 5. Bad bot, bad bot, what’cha gonna do... You might want to exclude robots whose known purpose is to collect email addresses, or other robots whose activity does not agree with your beliefs on the world. 6. Your site gets overwhelmed In rare situations, a robot goes through your site too fast, eating your bandwidth or slowing down your server. This is called "rapid-fire" and you'll notice it if you are reading your access log file. A medium performance server should not slow down. You may however have problems if you have a low performance site, such as one running of your personal PC or Mac, if you run poor server software, or if you have heavy scripts or huge documents. Is these cases, you'll see dropped connections, heavy slowdowns, in extremes, even a complete system crash. If this ever happens to you, read your logs, try to get the robot's IP or name, read the list of active robots and try to identify and block it. What's in a robots.txt file anyway? There are only two lines for each entry in a robots.txt file, the User-Agent, which has the name of the robot you want to give orders or the '*' wildcard symbol meaning 'all', and the Disallow line, which tells a robot all the places it should not touch. 
The two line entry can be repeated for every file or directory you don't want indexed, or for each robot you want to exclude. If you leave the Disallow line empty, this means you are not disallowing anything, in other words, you are allowing the particular robot to index your entire site. Some examples and a few scenarios should make it clear: A. Exclude a file from Google's main robot (Googlebot): User-Agent: Googlebot B. Exclude a section of the site from all robots: User-Agent: * Note that the directory is enclosed between two forward slashes. Although you are probably used to see URLs, links and folder references that do not end with a slash, note that a web server always needs a slash at the end. Even when you see links on websites that do not end with a slash, when that link is clicked, the web server has to do and extra step before serving the page, which is adding the slash through what we call a redirect. Always use the ending slash. C. Allow everything (blank robots.txt): User-Agent: * Note that when a "blank robots.txt" is mentioned, it is not a completely blank file, but it contains the two lines above. D. Do not allow any robot on your site: User-Agent: * Note that the single forward slash means "root", which is the main entrance to your site. E. Do not allow Google to index any of your images (Google uses Googlebot-Image for images): User-Agent: Googlebot-Image F. Do not allow Google to index some of your images: User-Agent: Googlebot-Image Note the use of multiple disallows. This is allowed, no pun intended. G. Build a doorway for Google and Lycos (the Lycos robot is called T-Rex) - do not play with this unless you are 100% sure you know what you are doing: User-Agent: T-Rex H. Allow only Googlebot.. User-Agent: Googlebot Note that the commands are sequential. The example above reads in English: Let Googlebot through, then stop everyone else. 
If your file gets really large, or you just feel like writing notes for yourself or for potential viewers (remember, robots.txt is a public file, anyone can see it), you can do so by preceding your comment with a # sign. Although according to the standard, you can have a comment on the same line with a command, I recommend that you start every command and every comment on a new line, this way, robots will never be confused by a potential formatting glitch. Examples: This is correct, as per the standard, but not recommended (a newer robot or a badly written one might read the following as "disallow the # We... Directory", not complying to the "disallow all" command): User-Agent: * Disallow: / # We decided to stop all robots but we were very silly in typing a long comment which got truncated and made the robots.txt unusable The way I recommend that you format this is: # We decided to stop all robots and we made sure Although theoretically, each robot should comply to the standards introduced around 1994 and enhanced in 1996, each robot acts a little differently. You are advised to check the documentation provided by the owners of those robots, you'll be surprised to discover a world of useful facts and techniques. For instance, from Google's site we learn that Googlebot completely disregards any URL that contains "&id=". Here are some sites to check: Google: http://www.google.com/bot.html Yahoo: http://help.yahoo.com/help/us/ysearch/slurp/ MSN: http://search.msn.com/docs/siteowner.aspx A database of robots is maintained at http://www.robotstxt.org/wc/active/html/contact.html A robots.txt validation tool - invaluable in finding potential typos that can completely change the way search engines see your site, can be found at: http://searchengineworld.com/cgi-bin/robotcheck.cgi There are also some extensions to the standard. For example, some robots allow wildcards in the Disallow line, some even allow different commands. 
My advice is: don't bother with anything outside the standard and you will not be unpleasantly surprised. A final word of caution: In this article I showed you how things should work in a perfect world. Somewhere along this article I mentioned that there are good bots and bad bots. Let's stop for a moment and think from a deranged person's perspective. Is there anything to prevent one from writing a robot program that reads a robots.txt file and specifically look at pages that you marked as "disallowed"? The answer is absolutely not, this entire standard is based on the honor system and is based on the concept that everyone should work hard to make the internet a better place. Basically, do not rely on this for real security or privacy. Use passwords when necessary. In conclusion, do not forget that indexing robots are your best friends. While you shouldn't build your site for robots, but for your human visitors, do not underestimate the power of those mindless crawlers - make su Giving Employee Performance A Boost overload, security problems, etc. The bottom line here is that robots are devised and operated by humans and are prone to the human error factor. Consequently, robots are not inherently bad, nor inherently brilliant, and need careful attention. This is another case where the robots.txt file comes in handy - robot control.We have all experienced being singled out because of a mistake or a misdeed many times throughout lives. But rarely do we get noticed for doing something good. Even if we're all grown up and working, this trend is still widely experienced. In fact, this is a common resentment in the corporate world. Sure, every employee undergoes employee training, but it is inevitable that most still commit mistakes. Sadly, when evaluation time comes, all the good work done are almost always overshadowed by poor employee performance.Employee rights dictate that there should be provisions for coaching or training employees. 
However, this is an additional expense for the employer and another dent in the company's finances. This is not a problem for big multinationals, but for the average company it is a big issue. The common stance is that coaching is only necessary for poorly performing employees. Mary Massad, a human resources expert, begs to differ. According to Massad, coaching and training are essential for every member of a company or organization. It has been observed that singling out an employee for mistakes leads to even poorer performance because of sagging morale. From the lowest-paid to the highest-paid employee, everyone must undergo training to boost the company's morale and performance.

Massad asserts that training employees doesn't have to be a costly, time-consuming enterprise. Training is simply a means of getting employees back on track; after all, they are qualified for their jobs. She asserts that setting examples and giving incentives are great ways to push employees subtly. For example, lateness is proven to be a cause of low productivity. Improving employee scheduling, for instance by introducing shifts, is one solution. An employer, setting an example of coming in early, will inspire his employees to

Now, I'm sure your main goal in life, as a webmaster or site owner, is to get on the first page of Google. So why in the world would you want to block robots? Here are some scenarios:

1. Unfinished site
You are still building your site, or portions of it, and don't want unfinished pages to appear in search engines. It is said that some search engines even penalize sites with pages that have been "under construction" for a long time.

2. Security
Always block your cgi-bin directory from robots. In most cases, cgi-bin contains applications and configuration files for those applications (which might actually contain sensitive information). Even if you don't currently use any CGI scripts or programs, block it anyway; better safe than sorry.

3. Privacy
You might have some directories on your website where you keep stuff that you don't want the entire Galaxy to see, such as pictures of a friend who forgot to put clothes on.

4. Doorway pages
Besides illicit attempts to increase rankings by blasting doorways all over the internet, doorway pages actually do have a morally sound use. They are similar pages, each optimized for a specific search engine. In this case, you must make sure that individual robots do not have access to all of them. This is extremely important in order to avoid being penalized for spamming a search engine with a series of extremely similar pages.

5. Bad bot, bad bot, what’cha gonna do...
You might want to exclude robots whose known purpose is to collect email addresses, or other robots whose activity does not agree with your beliefs on the world.

6. Your site gets overwhelmed
In rare situations, a robot goes through your site too fast, eating your bandwidth or slowing down your server. This is called "rapid-fire", and you'll notice it if you read your access log file. A medium-performance server should not slow down. You may, however, have problems if you have a low-performance site, such as one running off your personal PC or Mac, if you run poor server software, or if you have heavy scripts or huge documents. In these cases, you'll see dropped connections, heavy slowdowns and, in extreme cases, even a complete system crash. If this ever happens to you, read your logs, try to get the robot's IP address or name, read the list of active robots, and try to identify and block it.

What's in a robots.txt file anyway?

There are only two lines for each entry in a robots.txt file: the User-Agent line, which holds the name of the robot you want to give orders to (or the '*' wildcard symbol, meaning 'all'), and the Disallow line, which tells a robot all the places it should not touch.
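To see how robots read these two-line entries, here is a minimal sketch using Python's standard `urllib.robotparser` module, which implements the original exclusion standard. The robots.txt text is hypothetical: "ExampleHarvester" is an invented name standing in for an email-collecting robot (scenario 5), the `*` entry keeps every other robot out of cgi-bin (scenario 2), and example.com and "AnyBot" are placeholders. This shows how a standard-following parser interprets the file, not how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two entries of two lines each.
# "ExampleHarvester" is an invented stand-in for an email-collecting robot;
# every other robot is only kept out of /cgi-bin/.
ROBOTS_TXT = """\
User-Agent: ExampleHarvester
Disallow: /

User-Agent: *
Disallow: /cgi-bin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com and the robot names are placeholders.
checks = [
    ("AnyBot", "/index.html"),
    ("AnyBot", "/cgi-bin/config.pl"),
    ("ExampleHarvester", "/index.html"),
]
for agent, path in checks:
    allowed = parser.can_fetch(agent, "https://example.com" + path)
    print(f"{agent:18} {path:20} {'allowed' if allowed else 'blocked'}")
```

A generic robot may fetch /index.html but not anything under /cgi-bin/, while the named harvester is blocked everywhere because its own entry disallows the root.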
An entry can hold several Disallow lines, one for every file or directory you don't want indexed, and you can add a separate entry for each robot you want to address. If you leave the Disallow line empty, you are not disallowing anything; in other words, you are allowing that particular robot to index your entire site. Some examples and a few scenarios should make it clear:

A. Exclude a file from Google's main robot (Googlebot):
User-Agent: Googlebot

B. Exclude a section of the site from all robots:
User-Agent: *

Note that the directory is enclosed between two forward slashes. Although you are probably used to seeing URLs, links and folder references that do not end with a slash, a web server needs the slash at the end of a directory URL. Even when a link on a website does not end with a slash, clicking it makes the web server do an extra step before serving the page: it adds the slash through what we call a redirect. In robots.txt the ending slash matters for one more reason: a Disallow value blocks every path that begins with it, so "Disallow: /help" blocks both /help.html and /help/index.html, while "Disallow: /help/" blocks only what is inside the directory. Always use the ending slash.

C. Allow everything ("blank" robots.txt):
User-Agent: *
Disallow:

Note that when a "blank robots.txt" is mentioned, it is not a completely empty file; it contains the two lines above.

D. Do not allow any robot on your site:
User-Agent: *
Disallow: /

Note that the single forward slash means "root", which is the main entrance to your site.

E. Do not allow Google to index any of your images (Google uses Googlebot-Image for images):
User-Agent: Googlebot-Image
Disallow: /

F. Do not allow Google to index some of your images:
User-Agent: Googlebot-Image

Note the use of multiple Disallow lines. This is allowed, no pun intended.

G. Build a doorway for Google and Lycos (the Lycos robot is called T-Rex). Do not play with this unless you are 100% sure you know what you are doing:
User-Agent: T-Rex

H. Allow only Googlebot:
User-Agent: Googlebot
Disallow:

User-Agent: *
Disallow: /

Note how the two entries work together: Googlebot obeys the entry that names it, and every other robot falls back to the '*' entry. The example above reads in English: let Googlebot through, then stop everyone else.
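Examples B, C, D and H can be checked with the same standard-library parser. This is a sketch under stated assumptions: "/private" is a hypothetical directory name, "AnyBot" and example.com are placeholders, and `urllib.robotparser` follows the original standard (the first entry naming the robot wins, and the `*` entry is only used when no named entry matches), which is not necessarily how every search engine's crawler resolves conflicts.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """True if `agent` may fetch `path` on a placeholder host under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


# Example B: Disallow values are path prefixes, so the ending slash matters.
# "/private" is a hypothetical directory name.
NO_SLASH = "User-Agent: *\nDisallow: /private\n"
WITH_SLASH = "User-Agent: *\nDisallow: /private/\n"

# Examples C and D: an empty Disallow allows everything; "/" blocks everything.
ALLOW_ALL = "User-Agent: *\nDisallow:\n"
BLOCK_ALL = "User-Agent: *\nDisallow: /\n"

# Example H: let Googlebot through, then stop everyone else.
ONLY_GOOGLEBOT = """\
User-Agent: Googlebot
Disallow:

User-Agent: *
Disallow: /
"""

print(allowed(NO_SLASH, "AnyBot", "/private.html"))          # False
print(allowed(WITH_SLASH, "AnyBot", "/private.html"))        # True
print(allowed(WITH_SLASH, "AnyBot", "/private/page.html"))   # False
print(allowed(ALLOW_ALL, "AnyBot", "/any/page.html"))        # True
print(allowed(BLOCK_ALL, "AnyBot", "/index.html"))           # False
print(allowed(ONLY_GOOGLEBOT, "Googlebot", "/index.html"))   # True
print(allowed(ONLY_GOOGLEBOT, "AnyBot", "/index.html"))      # False
```

The first two lines show why the ending slash is recommended: without it, the rule also catches /private.html, which is a file, not the directory.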
If your file gets really large, or you just feel like writing notes for yourself or for potential viewers (remember, robots.txt is a public file, so anyone can see it), you can add comments by starting them with a # sign. Although the standard allows a comment on the same line as a command, I recommend that you start every command and every comment on a new line; this way, robots will never be confused by a formatting glitch. Examples:

This is correct according to the standard, but not recommended (a newer robot or a badly written one might read it as "disallow the '/ # We...' directory" and fail to comply with the "disallow all" command):

User-Agent: *
Disallow: / # We decided to stop all robots but we were very silly in typing a long comment which got truncated and made the robots.txt unusable

The way I recommend that you format this is:

# We decided to stop all robots and we made sure

Although, in theory, every robot should comply with the standards introduced around 1994 and enhanced in 1996, each robot acts a little differently. You are advised to check the documentation provided by the owners of those robots; you'll be surprised to discover a world of useful facts and techniques. For instance, from Google's site we learn that Googlebot completely disregards any URL that contains "&id=".

Here are some sites to check:
Google: http://www.google.com/bot.html
Yahoo: http://help.yahoo.com/help/us/ysearch/slurp/
MSN: http://search.msn.com/docs/siteowner.aspx

A database of robots is maintained at http://www.robotstxt.org/wc/active/html/contact.html

A robots.txt validation tool, invaluable for finding typos that can completely change the way search engines see your site, can be found at: http://searchengineworld.com/cgi-bin/robotcheck.cgi

There are also some extensions to the standard. For example, some robots allow wildcards in the Disallow line, and some even accept additional commands.
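The risk of same-line comments can be illustrated with two tiny value readers. Both functions are hypothetical: `naive_disallow_value` stands in for a badly written robot that keeps everything after the colon, and `careful_disallow_value` drops the comment first, as the standard intends; each only handles a single well-formed Disallow line. The last part feeds the recommended layout (comment on its own line) to Python's standard `urllib.robotparser`, which strips comments before reading a line.

```python
from urllib.robotparser import RobotFileParser

INLINE_COMMENT = "Disallow: / # We decided to stop all robots"


def naive_disallow_value(line: str) -> str:
    """Hypothetical badly written robot: keeps everything after the colon."""
    return line.split(":", 1)[1].strip()


def careful_disallow_value(line: str) -> str:
    """Drops a trailing '# comment' before reading the value."""
    return line.split("#", 1)[0].split(":", 1)[1].strip()


print(repr(naive_disallow_value(INLINE_COMMENT)))    # '/ # We decided to stop all robots'
print(repr(careful_disallow_value(INLINE_COMMENT)))  # '/'

# The recommended layout: the comment sits on its own line.
RECOMMENDED = """\
# We decided to stop all robots.
User-Agent: *
Disallow: /
"""
parser = RobotFileParser()
parser.parse(RECOMMENDED.splitlines())
print(parser.can_fetch("AnyBot", "https://example.com/index.html"))  # False
```

The naive reader ends up disallowing only paths that literally start with "/ # We...", which in practice blocks nothing, exactly the failure described above. With the comment on its own line, there is nothing to misread.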
My advice is: don't bother with anything outside the standard, and you will not be unpleasantly surprised.

A final word of caution: in this article I showed you how things should work in a perfect world. Earlier in this article I mentioned that there are good bots and bad bots. Let's stop for a moment and think from a deranged person's perspective. Is there anything to prevent someone from writing a robot program that reads a robots.txt file and specifically looks at the pages you marked as "disallowed"? Absolutely not. This entire standard is based on the honor system and on the idea that everyone should work hard to make the internet a better place. In short, do not rely on robots.txt for real security or privacy. Use passwords when necessary.

List Building for Profit

Why are you building a list? For fun or for profit? It is critical that you understand why you are building a list, because how you manage your list will be significantly different depending on its purpose. So why are you building a list? This may seem like a pointless question. You know you have to build a list because that is where the money is. Or at least that is what you have heard.
But you don’t really know it firsthand. You just know you should be building a list, and you figure that once you build your list, you can make it profitable. But you have to take steps to make it profitable if it is going to become profitable. But it doesn’t really work that way. You see, you need to know why you want your list. Who are you going to put on your list? Are they going to be buyers or freebie seekers? Are they going to simply read your emails and yawn, or are they going to click through to sales pages and spend their money? What types of products are they going to want to buy? Why are they going to open your emails?

All of those questions, and especially the answers to them, are critically important. If you build the wrong list for your product or your purpose, you will not be able to make it profitable. So you have to decide why you are building a list, and then build it in accordance with those goals.

The Use of Common Stock in Venture Capital Transactions

When raising capital for a business venture, a company can raise debt capital, equity capital, or a combination of the two.
Debt capital is money loaned to the company at an agreed interest rate for a fixed time period. Conversely, equity capital is money invested by owners (shareholders) for use in business operations, and it need not be repaid. Combinations include convertible securities, such as debt that can be converted into equity at some point in the future.

The simplest form of equity capital is common stock. Common stock has several distinguishing features:
- Common stock is not convertible into another type of security.
- Each share carries one vote.
- Dividends are payable without limit, but only when declared by the board of directors.
- In liquidation, common stockholders are the last in line to receive assets.

In venture capital transactions, two types of common stock may be issued. The first is Class A common stock, which is like preferred stock without the special voting rights that some statutes require for shares labeled "preferred." The second type is junior common stock. While this type of stock is not used very frequently, it allows companies to get cheap stock into the hands of key employees at minimal tax cost.

Determining what type of capital to raise and how to structure the financing transaction is of critical importance to growing ventures. As such, it is crucial to understand the key terms and to consult the appropriate legal and business advisors when embarking on the capital-raising process.

In conclusion, do not forget that indexing robots are your best friends. While you should build your site for your human visitors rather than for robots, do not underestimate the power of those mindless crawlers: make sure the pages you want indexed are clearly visible to robots, and make sure you have regular hyperlinks that robots can follow without roadblocks (robots can't follow Flash-based navigation systems, for instance). To keep your site at tip-top performance, your logs clean, and your applications, scripts and private data safe, always use a robots.txt file and read your logs to monitor all robotic activity.
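Reading your logs for robotic activity, and spotting a "rapid-fire" robot by its IP address and name as described in scenario 6, can start with a simple count. This is a rough sketch, assuming an Apache-style "combined" log format in which every line ends with the quoted Referer and User-Agent fields; the sample lines, the documentation-range IP addresses, the "ExampleBot" name and the threshold value are all made up for illustration. A real check would also look at how many requests arrive per second or minute, not just the total.

```python
import re
from collections import Counter

# Assumes the "combined" log format: each line ends with "referer" "user-agent".
# Group 1 is the client IP, group 2 the last quoted field (the user agent).
COMBINED_LOG = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def hits_per_client(lines):
    """Count requests per (IP, user agent) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def heavy_hitters(lines, threshold):
    """Return ((IP, user agent), count) pairs with at least `threshold` requests."""
    return [(client, n) for client, n in hits_per_client(lines).most_common() if n >= threshold]


# Made-up sample lines (documentation IP addresses, invented robot name).
SAMPLE = [
    '192.0.2.7 - - [10/Oct/2006:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [10/Oct/2006:13:55:37 -0700] "GET /a.html HTTP/1.0" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [10/Oct/2006:13:55:37 -0700] "GET /b.html HTTP/1.0" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.3 - - [10/Oct/2006:14:02:11 -0700] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for (ip, agent), n in heavy_hitters(SAMPLE, threshold=3):
    print(ip, agent, n)  # 192.0.2.7 ExampleBot/1.0 3
```

Once a heavy hitter is identified, its user-agent name is what you would put on a User-Agent line if you decide to block it; a robot that ignores robots.txt would have to be blocked at the server instead.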