'Search Engine Basics' Category Archive

Posted on Mar 7th, 2007

To be successful in the search engines, it’s important to design your web site with the spiders in mind. Using the latest in web page design is not generally the best way to go. Spiders don’t view web pages the way humans do; they must read the HTML in the page to see what it’s about. Below you will find tips on how best to design your web site with search engines in mind.

Do not use frames at all. Some search engines cannot spider framed pages, and even those that can often have trouble and sometimes fail to index the page. Do not rely on images alone to link out. Always use text links to point to important content on your web site. Spiders can follow image links, but they prefer text links.

Use external JavaScript files instead of placing JavaScript code in the HTML document; inline JavaScript makes the page much larger. An external JavaScript file reduces page size and makes the page easier for both spiders and browsers to download. Cascading Style Sheets can likewise reduce page size and, in most cases, make the download much faster. This allows the spider to index your web page faster and can help your ranking.

Avoid using web page creators such as FrontPage, Dreamweaver or other WYSIWYG editors. Such software often adds scripting code that is not needed, making the page larger than it needs to be and harder to crawl. It can also add code that the search engines can’t read, causing the spider to skip the page or index only part of it. It is better to use standard HTML. Code that spiders can’t read, or have a hard time reading, can lead to major problems with your ranking.

Avoid Flash where possible. To date, search engines cannot read Flash, and it slows download time a bit. If you do decide to use Flash anyway, make sure you add text to the web page so the search engines have something to read and can find out what your web page is about. It will also give your visitors something to read while the Flash file loads. Also, don’t use Flash for navigation; as I said before, spiders cannot read Flash.

It’s important to add a site map to your web site. Not only will this make it easier for internet surfers to get around your web site, but it will also allow spiders to find your site’s content more easily and index your web pages sooner. The site map should contain text links and not image links.

I highly suggest that you look at your web page with the Lynx browser, because this is similar to how search engines will view your web page. There are also tools on the internet that show your web page much as Lynx would without installing it, so you may want to check those out as well.
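For readers who cannot install Lynx, the idea of a text-only view can be roughly approximated in a few lines of Python. This is only a sketch under stated assumptions: it uses the standard library's `html.parser`, the sample page and the names `TextOnlyView` and `text_only_view` are invented for illustration, and it does not reproduce what Lynx or any particular search engine actually does.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects readable text and plain <a href> links, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "img":
            # A text browser shows the alt text in place of the image.
            alt = dict(attrs).get("alt")
            if alt:
                self.text_parts.append("[" + alt + "]")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def text_only_view(html):
    """Return (visible_text, links) for an HTML string."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Hypothetical sample page, not taken from any real site.
SAMPLE_PAGE = """
<html><head><title>Baby Supplies</title>
<script>var menu = "products.html";</script></head>
<body><h1>Cheap baby supplies</h1>
<a href="/sitemap.html">Site map</a>
<img src="logo.gif" alt="Shop logo">
</body></html>
"""

if __name__ == "__main__":
    text, links = text_only_view(SAMPLE_PAGE)
    print(text)
    print(links)
```

In this sample, the link hidden inside the script never shows up in the output, while the plain text link to the site map does.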

Matt Colyer is the owner of Superior Webmaster. He is also a PHP, CGI, and ASP developer.

Posted on Mar 5th, 2007

1994 was a big year in the history of Web search. The first hierarchical directory, Galaxy, was launched in January and, in April, Stanford students David Filo and Jerry Yang created Yet Another Hierarchical Officious Oracle, better known as Yahoo!.

During that same month, Brian Pinkerton at the University of Washington released WebCrawler. This, the first true Web search engine, indexed the entire contents of Web pages, where previous crawlers had indexed little more than page titles, headings, and URLs. Lycos was launched a few months later.

By the end of 1995, nearly a dozen major search engines were online. Names like MetaCrawler (the first metasearch engine), Magellan, Infoseek, and Excite (born out of the Architext project) were released into cyberspace throughout the year. AltaVista arrived on the scene in December with a stunningly large database and many advanced features, and Inktomi debuted the following year.

Over the next few years, new search engines would appear every few months, but many of these differed only slightly from their competitors. Yet the occasional handy innovation would find its way into practical use. Here are a few of the most successful ideas from that time:

Go To (now Overture) introduced the concept of pay-per-click (PPC) listings in 1997. Instead of ranking sites based on some arcane formula, Go To allowed open bidding for keywords, with the top position going to the highest bidder. All major search portals now rely on PPC listings for the bulk of their revenues.

Metasearch engines, which combine results from several other search engines, proliferated for a time, driven by the rise of pay-per-click systems and the inconsistency of results among the major search engines. Today, new metasearch engines are rarely if ever seen, but those that remain possess a loyal following. The current crop of metasearch engines displays mostly pay-per-click listings.

The Mining Company (now About) launched in February 1997, using human experts to create a more exclusive directory. Many topic-specific (vertical) directories and resource sites have been created since, but About remains a leading resource.

Direct Hit introduced the concept of user feedback in 1998, allocating a higher ranking to sites whose listings were clicked by users. Direct Hit’s data influenced the search results on many portals for a long time, but, because of the system’s susceptibility to manipulation, none of today’s search portals openly use this form of feedback. Direct Hit was later acquired by Ask Jeeves (now Ask), and user behavior may well be factored into the Ask/Teoma search results we see today.

Pay-to-play was introduced, as search engines and directories sought to capitalize on the value of their editorial listings. The LookSmart and Yahoo! directories began to charge fees for the review and inclusion of business Websites. Inktomi launched “paid inclusion” and “trusted feed,” allowing site owners to ensure their inclusion (subject to editorial standards) in the Inktomi search engine.

The examination of linking relationships between pages began in earnest, with AltaVista and other search engines adding “link popularity” to their ranking algorithms. At Stanford University, a research project created the Backrub search engine, which took a novel approach to ranking Web pages.

Elite Data Solution provides exclusive and highly accurate Data Entry and Data Processing services. We have the aptitude and proficiency to process any type of handwritten or printed data entry from any format.

By: http://www.elitedatasolution.com/

Posted on Feb 28th, 2007

Depending on your site’s subject matter, you might even be able to start earning a significant income purely from the popularity of your site. For example, if you have a site on jokes you might be able to get a substantial amount of traffic, but you’re unlikely to earn a huge income.

On the other hand, if you have a top ranking site on Google for the keyword phrase “web hosting”, you’ll be able to earn at least $1-2 million (probably more like $3-4 million) per year from that site alone. However, getting into the top 10 in Google for the phrase “web hosting” is extremely difficult. You would need to be a search engine pro to do it, and even then it might take you two years to get there with a team of 5 people working around the clock.

A profitable site will be based on a good keyword phrase and a good topic (or subject material), and it will be different enough to stand out from the competition. If you can combine these elements successfully you’ll increase your chances of turning a profit from your site.

Keep in mind that there are a number of niche areas where there is not so much competition but still the opportunity to make a decent profit. There are hundreds of such areas where you have the potential to earn $500-2000 per month. We’ll talk about keyword research and finding profitable niche areas in later issues, but for now, let’s look at the basics of how to get a site to the top of the search engines.

I’m going to focus on SEO for Google because it’s the largest (and the best) search engine which will generate the most traffic, especially if you can master all of the right optimization methods. Google has a very complicated system for determining which sites make it to the top of the search engine. There are many factors involved and there is no precise formula. A further consideration is that Google often changes its formula so you can’t rely on it being the same from month to month. However, there are a few key factors which are unlikely to change dramatically:

Keyword phrases should be one of the most important considerations when you are optimizing your site (remember that for each page of the site you should concentrate on one or two keyword phrases).

Keyword phrases (the ones you want to reach the top of the search engine for) should be well researched. For more information on researching keywords, have a look at http://www.profitpuppy.com/keyword-research.htm where you can find links to useful keyword tools.

Key factors for getting to the top of Google:

1. The keyword phrase in the title of the page - you should have the phrase in the page title.
2. The keyword phrase in the text of the page – you want the keyword phrase near the top of the page, preferably as close to the top of the page’s code as possible. You also want the phrase repeated a few times on the page. Don’t worry about the exact count; aim for at least 3-4 times, and more if it fits in naturally with the content of the page.
3. Include the keyword phrase in a large font heading with H1 tags (this may make a small difference).
4. Links into your site from other sites (covered in more detail below).
5. Links within your site (covered below).
6. Links to other sites (covered below).
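
The first three factors above are easy to check mechanically. The following is a minimal sketch, assuming Python and its standard `html.parser` module; the page, the phrase and the names `KeywordAudit` and `audit_keyword` are made up for illustration, and the check says nothing about how Google actually weighs these factors.

```python
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Separates title text, H1 text and all other readable text of a page."""

    def __init__(self):
        super().__init__()
        self._open = set()  # names of tags we are currently inside
        self.title = []
        self.h1_text = []
        self.all_text = []

    def handle_starttag(self, tag, attrs):
        self._open.add(tag)

    def handle_endtag(self, tag):
        self._open.discard(tag)

    def handle_data(self, data):
        if "script" in self._open or "style" in self._open:
            return
        if "title" in self._open:
            self.title.append(data)
        else:
            self.all_text.append(data)
            if "h1" in self._open:
                self.h1_text.append(data)


def audit_keyword(html, phrase):
    """Report whether `phrase` is in the title and an H1, and count it in the text."""
    parser = KeywordAudit()
    parser.feed(html)
    parser.close()
    phrase = phrase.lower()
    # Collapse whitespace so a phrase split across line breaks is still counted.
    body = " ".join(" ".join(parser.all_text).lower().split())
    return {
        "in_title": phrase in " ".join(parser.title).lower(),
        "in_h1": phrase in " ".join(parser.h1_text).lower(),
        "count_in_text": body.count(phrase),
    }


# Hypothetical page used only to exercise the check.
PAGE = """<html><head><title>Cheap Baby Supplies - Example Shop</title></head>
<body><h1>Cheap baby supplies</h1>
<p>We stock cheap baby supplies for every budget.</p>
<p>Browse our cheap baby supplies by category.</p></body></html>"""

if __name__ == "__main__":
    print(audit_keyword(PAGE, "cheap baby supplies"))
```

For this sample page the phrase is found in the title, in the H1 heading, and three times in the page text.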

Links into your site from other sites

This is probably the most important factor for getting good search engine listings. You need to have good links coming into your site from other sites. You want those links to be from popular sites that also have lots of links coming into them. Google uses a Page Rank system, which is a rough measure of how many links are coming into your site and how well linked the pages behind those links are. You want links from sites that have good Page Rank. Creating 50 of your own sites and interlinking them won’t increase your page ranking.
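
To see why a link from a popular site counts for more than a link from an obscure one, here is a toy calculation. It is a simplified sketch of the publicly described PageRank idea, not Google's actual ranking; the damping value of 0.85, the iteration count and the six-page web are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over `links`, a dict mapping page -> list of pages it links to.

    Every link target must itself be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly among its outgoing links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: "mysite" is linked from a well-linked page,
# "other" is linked from a page that has only one incoming link.
WEB = {
    "popular": ["mysite"],
    "mysite": ["popular"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "obscure": ["other"],
    "other": ["obscure"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(WEB).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

Both "mysite" and "other" have exactly one incoming link, yet in this toy web "mysite" ends up with the higher score because the page linking to it is itself better linked.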

How do you get links from other sites?
The main ways are to: create a great site that people naturally want to link to, set up an affiliate program so people can link to your site and get paid, swap links with other sites, get listed in directories such as dmoz.org, and buy links from people who have sites with a high PR (Page Rank). Again, we’ll be talking about this in more depth in another issue of the newsletter.

The most important consideration with incoming links is that you need to have the keyword phrase in the link that directs traffic to your site. Getting listed in directories such as DMOZ (Open Directory Project) can also help you significantly in getting a good search engine ranking in Google.

Links within your site

It helps once again if you have the keyword phrase in the links to each page of your site. For example, let’s say you have a site on baby products and you have one page where you are trying to optimize for the keyword phrase “cheap baby supplies”. You should make sure that links that come into that page contain the phrase “cheap baby supplies”, so that your links look like this: Click here for cheap baby supplies

Links to other sites

You can improve your page rank if you link to other sites that have high search engine listings for the phrase that you are trying to optimize for. Note: it can also hinder if you link to pages that are off-topic. For example, you don’t want to link to a chemical arms factory from your baby supplies site.

Some other things you don’t want to do (and some things that make no difference at all) are:

1. Repeating the keyword phrase multiple times on the page unless it makes contextual sense.
2. Placing hidden links on your site – for example, links that are the same color as the background of the page won’t be recognized by Google.
3. Optimizing your site too carefully – if it’s obvious that you have optimized your site for Google (for example, if all your incoming links use the same keyword phrase) you may be penalized. Google looks for sites that are more ‘natural’ in their structure, so if all links look the same this won’t look like an organic site to Google robots and you may be penalized for it.
4. Despite popular opinion, Meta tags don’t really help at all. However, the description tag is important because that is the description that will appear in your listing on Google.

Finally, you may have heard of people using automated pages and/or ‘cloaked pages’ to get high search engine rankings in Google. Personally, I don’t believe in taking this approach, as it’s a non-sustainable way of getting high search engine rankings (that’s not to say that some people aren’t very successful at it). I know of at least 3 highly successful companies who generate automated pages using this software: http://www.profitpuppy.com/traffic. This is a sneaky way to make it to the top of the search engine rankings, but it can work; one guy I know is earning several million dollars per year using this technique. As I mentioned before, though, I don’t think it’s sustainable, and the Google engine doesn’t take kindly to it.

How do I submit my site to the search engines?

My personal philosophy is that if your site is not found naturally by following links then it is unlikely to rank in the search engines - I don’t usually bother submitting to search engines.

So that’s the basics of how to get to the top of the search engines. We’ll have a look at the more advanced stuff later on. It takes a lot of time, knowledge and dedication to get there, but once you master it, SEO is a fantastic way to get traffic and make money on the web.

Robert Rawson

AKA: http://www.profitpuppy.com/

Posted on Feb 25th, 2007

Search engines have robots that come to your site and grab everything there is to grab. But because competition is so fierce, there is no way to get in the search engines unless you pay for ads or hire an SEO (Search Engine Optimization) consultant, right? Wrong!

Even if you pay big money, if your site is not properly seen by the robots used by search engines for indexing, chances are many of your pages will never make it.

In this article I will discuss the importance of structuring your website properly and of using old fashioned hyperlinks rather than modern Flash menus, scripts and extensions. I will also provide you with a very simple and free tool that lets you see your site much as most indexing robots do. But first, let’s define some of the concepts.

What is a www robot?

A robot is a computer program that automatically reads web pages and goes through every link that it finds.

The first robot was developed at MIT and launched in 1993. It was named the World Wide Web Wanderer and its initial purpose was purely scientific: its mission was to measure the growth of the web. The index generated from the experiment’s results proved to be an awesome tool and effectively became the first search engine. Most of the online stuff we can’t live without today was born as a side effect of some scientific experiment.

What is a search engine?

Generically, a search engine is a program that searches through a database. In the popular sense, as applied to the web, a search engine is a system with a user search form that searches through a repository of web pages gathered by a robot.

What is a bot? What is a spider? What is a crawler?

Bot is just a shorter, cooler (for some) version of the word robot. Spiders and crawlers are robots, only the names sound more interesting in the press and within metro-geek circles. For reasons of consistency, I will use the term robot throughout this article, when referring to spiders, crawlers and bots.

Are there other… things that crawl out there?

Oh yeah, but these things are way beyond the scope of this article. Well, for the conspiracy theory aficionados, let’s see… we have worms - self-replicating programs, webants (or ants) - distributed cooperating robots, autonomous agents, intelligent agents and many other bots and beasties.

How do robots work?

As with all other things technical, I believe that the only way you will utilize a technology to its full potential and to your best advantage is if and when you understand how that technology works. When I say how it works, I don’t mean intricate technical details, but fundamental processes, big picture stuff.

Generally, robots are nothing but stripped down versions of web browsers, programmed to automatically browse and record information about web pages. There are some very specialized robots out there, some that look only for blogs, some that index nothing but images. Many (such as Google’s Googlebot) see a page much as one of the first popular browsers, called Lynx, does. Lynx is a pure text browser, which is why it remains extremely robust and fast on today’s internet. Basically, if you can program, you can take a text browser like Lynx, modify it and make a robot.

So how do these things actually work? They get a list of websites, and literally start "browsing" them. They come to your site and then start reading the pages and following every link, while storing different information, such as page titles, the actual text of the page, etc.

Based on the above, what would happen if instead of your beloved Internet Explorer, Firefox, Opera or whatever browser you are attached to, you go dig on the internet and download a version of the venerable Lynx browser?

I’ll tell you what would happen, and some will probably accuse me of giving away one of the secrets the SEO corporate community does not want you to know:

You will be able to see your site very close to the way a robot sees it. You will be able to look for errors in your pages and track down navigation errors that might block a robot from seeing portions of your site.

In plain English, let’s say you built a great looking site. There is an index page, the first page one sees when entering your site. On that page you have the most incredible Flash navigation system, with a huge button pointing to your products and services and the rest of the site. If Lynx goes to your index page and does not find a standard link, it will not be able to see the rest of your site. Chances are extremely high that a lot of indexing robots will not see it either.

You will then understand why your very large site, that has one of the most intricate and functional Flash based navigation systems on the planet never makes it high into the search engines, even after all your efforts of manually submitting it everywhere. It’s simply because you forgot to add basic hyperlinks. It’s because when you submit a site - even manually - all that really happens is you telling the search engine "hey, Mr. Search Engine, whenever you think you can find some time, please send your trusty robot to my site".

Folks, robots usually can’t use a navigation menu made in Flash, JavaScript, PHP, etc., and so will not be able to get to your pages. It’s as simple as that.
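
The browsing loop described earlier, and the blind spot it has for script menus, can be sketched in Python. This is a toy model under stated assumptions: the "web" is a small in-memory dictionary invented for the example, the names `LinkCollector` and `crawl` are hypothetical, and real robots are far more elaborate.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records the page title and every plain <a href> link on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(site, start):
    """Follow links breadth-first through `site` (a dict of path -> HTML).

    Returns a dict of every path reached, mapped to its page title.
    """
    index = {}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if path in index or path not in site:
            continue  # already visited, or the link points nowhere
        parser = LinkCollector()
        parser.feed(site[path])
        parser.close()
        index[path] = parser.title.strip()
        queue.extend(parser.links)
    return index


# Hypothetical three-page site: one page is reachable only through a script menu.
SITE = {
    "/": '<title>Home</title><a href="/products.html">Products</a>'
         '<script>menu.add("/hidden.html")</script>',
    "/products.html": '<title>Products</title><a href="/">Home</a>',
    "/hidden.html": "<title>Only linked from a script menu</title>",
}

if __name__ == "__main__":
    print(crawl(SITE, "/"))
```

The crawl reaches the home page and the products page, but never the page that is referenced only from inside the script.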

How do I get Lynx?

Lynx first started life as a UNIX application, written at the University of Kansas as part of its campus-wide information system. It then became a gopher application (a pre-web search tool), then a web browser. The official page for Lynx is http://lynx.isc.org. However, if you are not a Linux geek who is used to playing with binary distribution files and compiling your own apps (don’t worry about what I just said), you might want to find a version that someone else has already made usable for your computer. For example, if you are a PC user running Windows, you might want to check links to “Win32 compiled versions”. At the time of writing, one such site (called a distribution site) is http://csant.info/lynx.htm, where you can download a version that will install onto Windows machines in a fashion that will be familiar to non-geeks. After you install the browser, you might want to read the documentation. To get you going and to alleviate your beginner frustrations, I’ll tell you that you must press the G key (as in “go”), then type the complete URL of the site you want to browse (starting with “http://”), then hit Enter. Use the arrow keys to navigate.

Bottom line, use Lynx to verify that every page of your site is accessible and let the robots do all the work for you. You’ll save yourself a lot of aggravation and maybe some money that you would waste on advertising your otherwise non-indexable site.


Andrei co-owns Bsleek - a company that specializes in web design, hosting, promotional items, printing, tradeshow displays, logos, CD presentations, SEO and more. Andrei has amassed an extensive technical knowledge and experience through his career as the CIO for a major travel management company and through his past careers in military research, data acquisition and airspace engineering. He also consults for Trinity Investigations, a New York based PI firm.



Posted on Feb 21st, 2007

So you heard someone stressing the importance of the robots.txt file, or noticed in your website’s logs that the robots.txt file is causing an error, or that it somehow sits at the very top of your most visited pages. Or you read some article about the death of the robots.txt file and about how you should not bother with it ever again. Or maybe you never heard of the robots.txt file but are intrigued by all that talk about spiders, robots and crawlers. In this article, I will hopefully make some sense out of all of the above.

There are many folks out there who vehemently insist on the uselessness of the robots.txt file, proclaiming it obsolete, a thing of the past, plain dead. I disagree. The robots.txt file is probably not in the top ten methods to promote your get-rich-fast affiliate website in 24 hours or less, but still plays a major role in the long run.

First of all, the robots.txt file is still a very important factor in promoting and maintaining a site, and I will show you why. Second, the robots.txt file is one of the simple means by which you can protect your privacy and/or intellectual property. I will show you how.

Let’s try to figure out some of the lingo.

What is this robots.txt file?

The robots.txt file is just a very plain text file (or an ASCII file, as some like to say), with a very simple set of instructions that we give to a web robot, so the robot knows which pages we need scanned (or crawled, or spidered, or indexed - all terms refer to the same thing in this context) and which pages we would like to keep out of search engines.

What is a www robot?

A robot is a computer program that automatically reads web pages and goes through every link that it finds. The purpose of robots is to gather information. Some of the most famous robots mentioned in this article work for the search engines, indexing all the information available on the web.

The first robot was developed at MIT and launched in 1993. It was named the World Wide Web Wanderer and its initial purpose was purely scientific: its mission was to measure the growth of the web. The index generated from the experiment’s results proved to be an awesome tool and effectively became the first search engine. Most of the stuff we consider today to be indispensable online tools was born as a side effect of some scientific experiment.

What is a search engine?

Generically, a search engine is a program that searches through a database. In the popular sense, as applied to the web, a search engine is a system with a user search form that searches through a repository of web pages gathered by a robot.

What are spiders and crawlers?

Spiders and crawlers are robots, only the names sound cooler in the press and within metro-geek circles.

What are the most popular robots? Is there a list?

Some of the most well known robots are Google’s Googlebot, MSN’s MSNBot, Ask Jeeves’ Teoma and Yahoo!’s Slurp (funny). One of the most popular places to search for active robot info is the list maintained at http://www.robots.org.

Why do I need this robots.txt file anyway?

A great reason to use a robots.txt file is actually the fact that many search engines, including Google, post suggestions for the public to make use of this tool. Why is it such a big deal that Google teaches people about the robots.txt? Well, because nowadays, search engines are not a playground for scientists and geeks anymore, but large corporate enterprises. Google is one of the most secretive search engines out there. Very little is known to the public about how it operates, how it indexes, how it searches, how it creates its rankings, etc. In fact, if you do a careful search in specialized forums, or wherever else these issues are discussed, nobody really agrees on whether Google puts more emphasis on this or that element to create its rankings. And when people don’t agree on things as precise as a ranking algorithm, it means two things: that Google constantly changes its methods, and that it does not make it very clear or very public. There’s only one thing that I believe to be crystal clear. If they recommend that you use a robots.txt ("Make use of the robots.txt file on your web server" - Google Technical Guidelines), then do it. It might not help your ranking, but it will definitely not hurt you.

There are other reasons to use the robots.txt file. If you use your error logs to tweak and keep your site free of errors, you will notice that most errors refer to someone or something not finding the robots.txt file. All you have to do is create a basic blank page (use Notepad in Windows, or the simplest text editor in Linux or on a Mac), name it robots.txt and upload it to the root of your server (that’s where your home page is).

On a different note, nowadays, all search engines look for the robots.txt file as soon as their robots arrive on your site. There are unconfirmed rumors that some robots might even ‘get annoyed’ and leave, if they don’t find it. Not sure how true that is, but hey, why not be on the safe side?

Again, even if you don’t intend to block anything or just don’t want to bother with this stuff at all, having a blank robots.txt is still a good idea, as it can actually act as an invitation into your site.
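
If you would rather script this step than open a text editor, the sketch below writes the two-line "allow everything" file that this article later calls a blank robots.txt. It assumes Python; the function name `write_allow_all_robots` and the folder you pass in are placeholders, and uploading the result to your server is still up to you.

```python
from pathlib import Path

# The two-line "allow everything" record.
ALLOW_ALL = "User-Agent: *\nDisallow:\n"


def write_allow_all_robots(web_root):
    """Create robots.txt inside `web_root` unless one already exists; return its path."""
    target = Path(web_root) / "robots.txt"
    if not target.exists():
        # Never overwrite a robots.txt that someone has already customized.
        target.write_text(ALLOW_ALL, encoding="ascii")
    return target


if __name__ == "__main__":
    # "." stands for the local copy of your site's root folder (an assumption).
    print(write_allow_all_robots("."))
```

The function deliberately leaves an existing robots.txt alone, so running it twice is harmless.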

Don’t I want my site indexed? Why stop robots?

Some robots are well designed, professionally operated, cause no harm and provide valuable service to mankind (don’t we all like to "google"). Some robots are written by amateurs (remember, a robot is just a program). Poorly written robots can cause network overload, security problems, etc. The bottom line here is that robots are devised and operated by humans and are prone to the human error factor. Consequently, robots are not inherently bad, nor inherently brilliant, and need careful attention. This is another case where the robots.txt file comes in handy - robot control.

Now, I’m sure your main goal in life, as a webmaster or site owner is to get on the first page of Google. Then, why in the world would you want to block robots?

Here are some scenarios:

1. Unfinished site

You are still building your site, or portions of it, and don’t want unfinished pages to appear in search engines. It is said that some search engines even penalize sites with pages that have been "under construction" for a long time.

2. Security

Always block your cgi-bin directory from robots. In most cases, cgi-bin contains applications, configuration files for those applications (which might actually hold sensitive information), etc. Even if you don’t currently use any CGI scripts or programs, block it anyway; better safe than sorry.

3. Privacy

You might have some directories on your website where you keep stuff that you don’t want the entire Galaxy to see, such as pictures of a friend who forgot to put clothes on, etc.

4. Doorway pages

Besides illicit attempts to increase rankings by blasting doorways all over the internet, doorway pages actually do have a very morally sound usage. They are similar pages, but each one is optimized for a specific search engine. In this case, you must make sure that individual robots do not have access to all of them. This is extremely important, in order to avoid being penalized for spamming a search engine with a series of extremely similar pages.

5. Bad bot, bad bot, what’cha gonna do…

You might want to exclude robots whose known purpose is to collect email addresses, or other robots whose activity does not agree with your beliefs on the world.

6. Your site gets overwhelmed

In rare situations, a robot goes through your site too fast, eating your bandwidth or slowing down your server. This is called "rapid-fire" and you’ll notice it if you are reading your access log file. A medium performance server should not slow down. You may, however, have problems if you have a low performance site, such as one running off your personal PC or Mac, if you run poor server software, or if you have heavy scripts or huge documents. In these cases, you’ll see dropped connections, heavy slowdowns and, in extreme cases, even a complete system crash. If this ever happens to you, read your logs, try to get the robot’s IP or name, read the list of active robots and try to identify and block it.
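
Reading the logs by eye gets tedious, so here is one possible way to let a short Python script do the counting. It is a sketch that assumes the common "combined" access log layout, with the client IP first and the user agent as the last quoted field; your server's format may differ, and the log lines and the robot name "HungryBot" are invented.

```python
import re
from collections import Counter

# Assumption: "combined" log format. The IP is the first field and the
# user agent is the last double-quoted field on the line.
IP_RE = re.compile(r"^(\S+)")
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def busiest_clients(log_lines, top=3):
    """Count requests per (IP, user agent) pair and return the most active ones."""
    hits = Counter()
    for line in log_lines:
        ip = IP_RE.match(line)
        agent = USER_AGENT_RE.search(line)
        if ip and agent:
            hits[(ip.group(1), agent.group(1))] += 1
    return hits.most_common(top)


# Invented sample lines; the addresses come from ranges reserved for documentation.
LOG = [
    '192.0.2.10 - - [21/Feb/2007:10:00:01 +0000] "GET /a.htm HTTP/1.1" 200 512 "-" "HungryBot/1.0"',
    '192.0.2.10 - - [21/Feb/2007:10:00:01 +0000] "GET /b.htm HTTP/1.1" 200 734 "-" "HungryBot/1.0"',
    '198.51.100.7 - - [21/Feb/2007:10:00:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [21/Feb/2007:10:00:02 +0000] "GET /c.htm HTTP/1.1" 200 655 "-" "HungryBot/1.0"',
]

if __name__ == "__main__":
    for (ip, agent), count in busiest_clients(LOG):
        print(count, ip, agent)
```

The client at the top of the list is the first candidate to look up in the list of active robots before deciding whether to block it.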

What’s in a robots.txt file anyway?

There are only two lines for each entry in a robots.txt file: the User-Agent line, which holds the name of the robot you want to give orders to or the ‘*’ wildcard symbol meaning ‘all’, and the Disallow line, which tells a robot all the places it should not touch. The two line entry can be repeated for every file or directory you don’t want indexed, or for each robot you want to exclude. If you leave the Disallow line empty, you are not disallowing anything; in other words, you are allowing that particular robot to index your entire site. Some examples and a few scenarios should make it clear:

A. Exclude a file from Google’s main robot (Googlebot):

User-Agent: Googlebot
Disallow: /private/privatefile.htm

B. Exclude a section of the site from all robots:

User-Agent: *
Disallow: /underconstruction/

Note that the directory is enclosed between two forward slashes. Although you are probably used to seeing URLs, links and folder references that do not end with a slash, note that a web server needs a trailing slash to address a directory. Even when you see links on websites that do not end with a slash, when such a link is clicked, the web server has to do an extra step before serving the page, which is adding the slash through what we call a redirect. Always use the ending slash.

C. Allow everything (blank robots.txt):

User-Agent: *
Disallow:

Note that when a "blank robots.txt" is mentioned, it is not a completely blank file, but it contains the two lines above.

D. Do not allow any robot on your site:

User-Agent: *
Disallow: /

Note that the single forward slash means "root", which is the main entrance to your site.

E. Do not allow Google to index any of your images (Google uses Googlebot-Image for images):

User-Agent: Googlebot-Image
Disallow: /

F. Do not allow Google to index some of your images:

User-Agent: Googlebot-Image
Disallow: /images_main/
Disallow: /images_girlfriend/
Disallow: /downloaded_pix/

Note the use of multiple disallows. This is allowed, no pun intended.

G. Build a doorway for Google and Lycos (the Lycos robot is called T-Rex) - do not play with this unless you are 100% sure you know what you are doing:

User-Agent: T-Rex
Disallow: /index1.htm

User-Agent: Googlebot
Disallow: /index2.htm

H. Allow only Googlebot..

User-Agent: Googlebot
Disallow:

User-Agent: *
Disallow: /

Note that, according to the standard, a robot obeys the entry that names it and falls back on the ‘*’ entry only when no entry names it. The example above reads in English: let Googlebot through and stop everyone else.
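If you want to see how a parser resolves example H, here is a small sketch using the urllib.robotparser module from Python's standard library. The robot name "SomeOtherBot" and the page /products.htm are invented for illustration, and since this is only one parser, treat the result as indicative rather than as a promise about any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# Example H, with a blank line separating the two entries.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow:

User-Agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "/products.htm" is a hypothetical page on the site.
googlebot_allowed = parser.can_fetch("Googlebot", "/products.htm")
other_allowed = parser.can_fetch("SomeOtherBot", "/products.htm")

print("Googlebot allowed:", googlebot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Googlebot is matched by its own entry, where the empty Disallow lets it through, while every other robot falls under the ‘*’ entry and is stopped.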

If your file gets really large, or you just feel like writing notes for yourself or for potential viewers (remember, robots.txt is a public file that anyone can see), you can do so by preceding your comment with a # sign. Although the standard lets you put a comment on the same line as a command, I recommend that you start every command and every comment on a new line. This way, robots will never be confused by a potential formatting glitch. Examples:

This is correct, as per the standard, but not recommended (a newer robot or a badly written one might read the following as "disallow the # We… Directory" and fail to comply with the "disallow all" command):

User-Agent: *
Disallow: / # We decided to stop all robots but we were very silly in typing a long comment which got truncated and made the robots.txt unusable

The way I recommend that you format this is:

# We decided to stop all robots and we made sure
# that our comments do not get truncated
# in the process
User-Agent: *
Disallow: /
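If you would rather not check this by eye, a few lines of Python can point out the lines where a command and a comment share a line. This is a rough sketch and not part of any standard tool; the function name find_inline_comments is made up for this example.

```python
def find_inline_comments(robots_txt: str) -> list[int]:
    """Return 1-based numbers of lines that mix a command and a comment."""
    flagged = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        before_hash, hash_mark, _comment = line.partition("#")
        # Flag the line only when real content comes before the '#'.
        if hash_mark and before_hash.strip():
            flagged.append(number)
    return flagged

sample = (
    "# We decided to stop all robots\n"
    "User-Agent: *\n"
    "Disallow: / # this comment shares a line with a command\n"
)
print(find_inline_comments(sample))  # prints [3]
```

Lines that hold only a comment are left alone, so a file formatted the recommended way produces an empty list.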

Although, theoretically, each robot should comply with the standards introduced around 1994 and enhanced in 1996, each robot acts a little differently. You are advised to check the documentation provided by the owners of those robots; you’ll be surprised to discover a world of useful facts and techniques. For instance, from Google’s site we learn that Googlebot completely disregards any URL that contains "&id=".

Here are some sites to check:

Google: http://www.google.com/bot.html
Yahoo: http://help.yahoo.com/help/us/ysearch/slurp/
MSN: http://search.msn.com/docs/siteowner.aspx

A database of robots is maintained at http://www.robotstxt.org/wc/active/html/contact.html

A robots.txt validation tool, invaluable in finding potential typos that can completely change the way search engines see your site, can be found at: http://searchengineworld.com/cgi-bin/robotcheck.cgi

There are also some extensions to the standard. For example, some robots allow wildcards in the Disallow line, some even allow different commands. My advice is: don’t bother with anything outside the standard and you will not be unpleasantly surprised.

A final word of caution:

In this article I showed you how things should work in a perfect world. Earlier in this article I mentioned that there are good bots and bad bots. Let’s stop for a moment and think from a deranged person’s perspective. Is there anything to prevent someone from writing a robot program that reads a robots.txt file and specifically looks at the pages that you marked as "disallowed"? Absolutely not. This entire standard is based on the honor system and on the concept that everyone should work hard to make the internet a better place. Basically, do not rely on it for real security or privacy. Use passwords when necessary.

In conclusion, do not forget that indexing robots are your best friends. While you should build your site for your human visitors rather than for robots, do not underestimate the power of those mindless crawlers. Make sure the pages you want indexed are clearly seen by robots, and make sure you have regular hyperlinks that robots can follow without roadblocks (robots can’t follow Flash-based navigation systems, for instance). To keep your site at tip-top performance, and to keep your logs clean and your applications, scripts and private data safe, always use a robots.txt file and make sure you read your logs to monitor all robotic activity.
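As a starting point for reading your logs, the following Python sketch counts which user agents asked for robots.txt. It assumes your server writes the common "combined" access log layout, with the request as the first quoted field and the user agent as the last quoted field. The sample lines are made up, and your own log format may well differ.

```python
import re
from collections import Counter

# Assumed layout: ... "METHOD /path PROTOCOL" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" .* "(?P<agent>[^"]*)"$'
)

def robots_txt_readers(log_lines):
    """Count how often each user agent requested /robots.txt."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if match and match.group("path") == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts

# Invented sample lines, for illustration only.
sample_log = [
    '192.0.2.1 - - [07/Mar/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [07/Mar/2007:10:00:05 +0000] "GET /index.htm HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [07/Mar/2007:11:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"',
]
for agent, hits in robots_txt_readers(sample_log).items():
    print(agent, hits)
```

A robot that shows up in your logs but never appears in this count is one that did not bother to read your robots.txt, which is worth knowing.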


Andrei co-owns Bsleek - a company that specializes in web design, hosting, promotional items, printing, tradeshow displays, logos, CD presentations, SEO and more. Andrei has amassed an extensive technical knowledge and experience through his career as the CIO for a major travel management company and through his past careers in military research, data acquisition and airspace engineering. He also consults for Trinity Investigations, a New York based PI firm.


Bsleek - Redefining cheap web hosting

Posted on Feb 16th, 2007

Every business owner wants to attract more orders and make more sales. You can do this in smart ways. Here are some tips and tactics.

First of all, your sales page should be well-written. Include emotional words in your advertisements. Use ones like love, security, relief, freedom, happy, satisfaction, fun, and more positive words.

Secondly, you can create a free ebook directory on a specific topic at your web site. People will visit your web site to read the free ebooks and may see your product ad. This is viral marketing: when your ebook gets passed around, your website is exposed to many people, so you get free traffic. You can also allow people to download software or e-books from your web site at no cost. Just ask your visitors in return to refer their friends to your web site. It is easy to set that up with a pre-written script that puts a referral form on your website. Another way to get good results is to publish your e-zine only on your web site. Have people subscribe to a "new issue" e-mail reminder. This could really increase your traffic and sales.

Combine a product and service together in a package deal. It could increase your sales. If you’re selling a book, offer an hour of consulting with it.

If you sell any products, you can quickly build up the number of people that join your free affiliate program by temporarily offering your product for free to the people that sign up. If they help to sell your products, you pay them a commission of 20-70%. There is a lot of software that can help you set up an affiliate program automatically. The quickest way is to negotiate with e-zine publishers who own large lists of subscribers: you get free or discounted ads by letting them join your affiliate program and earn commissions on the ads you run.

Add a free classified ad section to your web site. You could then trade banner ads with other web sites that have free classified ad sections. Sell advertising space in your product package. You could sell inserts, flyers, brochures, booklets, and digital ads for electronic products.

There are more ways to jump start your sales:

- Turn part of your web site into a members only site. Instead of charging for access, use it as a free bonus for one of your products.

- Find a strategic business partner. Look for ones that have the same objective. You can trade leads, share marketing info, sell package deals.

- Brand your name and business. You can easily do this by just writing articles and submitting them to e-zines or web sites for republishing.

- Start an auction on your web site. The type of auction could be related to the theme of your site. You’ll draw traffic from auctioneers and bidders.

- Model other successful businesses or people. I’m not saying copy them outright, but practice some of the same habits that have made them succeed.

- Offer daily or weekly visitor bonuses. This will increase your repeat traffic and sales because your visitors will visit regularly to get the visitor bonuses. Ask people online to review your web site. You can use the comments you get to improve your web site or you may turn the reviewer into a customer.

- Outsource part of your workload. You’ll save on most employee costs. You could outsource your secretarial work, accounting, marketing, and more. It can save you a lot of time. You can visit http://www.elance.com to get help at low cost.

Finally, remember to take a little time out of your day or week to brainstorm. New ideas are usually the difference between success and failure.

———————————————————
Julia Tang publishes Smart Online Business Tips, a fresh
and informative newsletter dedicated to supporting people
like you! To find out the best online business opportunities,
and to discover hundreds more proven and practical internet
marketing secrets, plus FREE internet marketing products
worth over $200, visit: http://www.best-internet-businesses.com
———————————————————-

Note: You may use this article in your ezine or on your site as long as the article and resource box remain unchanged.

Posted on Feb 15th, 2007

To understand how to avoid stop words, you first have to understand what stop words are. Search engines have words or phrases that are considered ‘stop words’. When a spider or crawler encounters one of these stop words, it will immediately leave your website, and any information it gathered there will not be saved in the search engine’s database. This means that your website will not be indexed. If your website is already indexed in a search engine, the crawler will come back to see if there are updates, and if it finds stop words when it does this, your site could get banned from the search engine. You will not be allowed to remove the words and re-submit: it’s too late.

Different search engines have different lists, but some are nearly universal – usually words that refer to sites with graphic sexual content, or other ‘adult’ material. We can’t really put a list here, or you’d never find this page! You should be able to tell what they are for the most part, but remember, pornographic web sites will often get indexed as well. The norm for search engines is that they will attempt to avoid content that is illegal. When they encounter adult oriented sites, they will generally only ban sites that contain especially vulgar or illegal materials. The problem is that many of these sorts of sites use “gateway” pages that have no content other than “click here to enter.” These sites are less likely to be caught by the stop word filters.

These aren’t the only stop words, however. Some search engines create different lists of stop words for each different kind of website. What does this mean? Well, the algorithms which rank the pages determine how many times on a page a keyword is listed. Keep in mind that there are people trying to keyword stuff their pages to improve their ranking. If a word is not relevant to your site, don’t list it in your key words. This is one easy way to avoid losing your index privileges. There is no reason to try to lure people into your site on key words that don’t apply as they will not stick around long enough to provide you with revenue anyway.

Now you might be asking what keyword stuffing is. Keyword stuffing is when someone uses the same keywords over and over again in the meta tags and the content of the web page. If you’re searching the web and you come across a website that seems to be ranked far too highly, then keyword stuffing is usually to blame. The search engines work hard to stop people from using these kinds of tactics, and usually de-list sites that they find to be using them. That’s why some sites can be listed near the top for a while, before one day seemingly disappearing altogether.

Some sites, however, get ejected for repeated keywords, even though these sites aren’t trying to keyword stuff. This is where keyword analysis comes into play. To avoid repeating the same word or phrase too many times, you need to analyze your pages before you submit them – you wouldn’t want your efforts at finding good keywords to go to waste.
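A very basic form of this analysis is simply counting how often each word appears on a page. The Python sketch below does that; the sample text is invented, and no threshold is suggested because the search engines do not publish one, so use the numbers only to spot words that clearly dominate a page.

```python
import re
from collections import Counter

def keyword_counts(text: str, top: int = 5):
    """Return (word, count, share of all words) for the most frequent words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    return [(word, n, n / total) for word, n in counts.most_common(top)]

# Hypothetical page copy, invented for this example.
page_text = "Cheap hosting. Our cheap hosting plans make cheap hosting easy."
for word, n, share in keyword_counts(page_text, top=3):
    print(f"{word}: {n} times, {share:.0%} of all words")
```

Here "cheap" and "hosting" each make up 30% of the words, which is the kind of repetition you would want to notice before submitting the page.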

Although search engine optimization is a long-term effort, the processes and rules change frequently, without any warning. There are, however, tools available that will help you to stay on top of what’s going on in SEO. Many tools can be tried out at no cost, letting you try before you buy. There are so many to choose from that you can’t just start downloading them – you will want to read the information on each tool before you make a decision.

By using these tools before you submit your pages to any search engines, you will fully understand "stop words" and how to avoid them! You need to keep in mind, though, that different search engines have different stop word lists, so words that don’t matter to one search engine can stop your site from being listed on another.

There are also some words that aren’t included in searches, such as ‘and’, ‘of’, ‘the’, and other small words. You should keep these words out of your meta tags, as they’re just a waste of space.
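Stripping such words from a keyword phrase is easy to automate. In this Python sketch only ‘and’, ‘of’ and ‘the’ come from the text above; the other entries in the list are assumptions added for illustration, and each search engine keeps its own list.

```python
# A tiny illustrative list; real search engines keep their own, longer lists.
SMALL_WORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to"}

def strip_small_words(phrase: str) -> str:
    """Drop common small words from a keyword phrase, keeping word order."""
    kept = [word for word in phrase.split() if word.lower() not in SMALL_WORDS]
    return " ".join(kept)

print(strip_small_words("the history of jazz and blues"))  # prints: history jazz blues
```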

About The Author:

Lawrence Andrews is an ePublisher, software developer, consultant, and author of numerous books. Visit his Private Label Content and Software site at http://www.lmamedia.com for more information about SEO and PRL.

You may use this article freely on your website as long as this resource box is included, a link point back to my site, and this article remains unchanged! Copyright 2005 Lawrence Andrews

Posted on Feb 13th, 2007

Over the last few years, search engine optimization (SEO) has been needed and used more and more, although it has been around for much longer than most people think. With new development tools being used to create websites that are heavy on Java, Flash and images, it’s important to have something that the search engines can read. If the content can’t be read by search engines then they can’t index it, and if your site doesn’t get indexed then it won’t be found when people search for it on Google, Yahoo, MSN, or anywhere else. This article will outline what SEO is, how it works, and some unethical SEO methods that you should avoid.

What is SEO?

SEO is a way of analyzing your site and modifying it to allow search engines to read and index it more easily. SEO is all about maintaining and building websites that get ranked highly on the major search engines.

You see, when people use a search engine, they generally don’t look beyond the top 20 or so results. If you want to make any money from your website, you need to get ranked in the top 20 out of potentially hundreds of websites.

How Does SEO Work?

Search engines maintain a huge database containing information from individual websites. Most of the information search engines collect isn’t listed on their results pages, but it is taken into consideration when it comes to deciding those results’ rankings.

It is very important that you encourage the search engines to rank your website in a high position, and you can do this through the keywords that you use on your website, as well as when you submit it. If the keywords you use in your submission tool don’t match the ones on your site then you could harm your rankings – be sure to have all the keywords you want to use on the website itself before you submit it.

Most websites don’t focus on their topic well, and so keyword lists containing 50 or more phrases per page are recommended. By focusing some of the pages of your site on keywords, you will score higher with the search engines.

Free Search Engines.

The major search engines on the Internet are still free, and it’s not hard to take advantage of this free advertising – you can do it in as little as an hour.

There are several companies that provide free SEO tools, or you can pay a professional to take care of it for you. Looking around on the web will turn up all sorts of useful resources.

What is Unethical SEO?

Unethical search engine optimization techniques can be unlawful, unscrupulous, or just in bad taste. You’d be surprised how many people use these methods. A lot of what is now called unethical SEO used to be accepted, until people went overboard and it started to have a negative effect on the web as a whole.

Keyword stuffing is when your site consists of long lists of keywords and nothing else. Don’t do it. There are ways to put keywords and phrases on your site without running the risk of getting banned.

You may have seen ‘invisible text’ if you’ve been selecting the text on a page and found words that are the same color as the background. This text is often lists of keywords put there in the hope of fooling search engine spiders while hiding the words from visitors. This is considered unethical, and you shouldn’t do it.

A doorway page is a page that isn’t designed for real people to see – it’s purely for the search engines and spiders, in an attempt to trick them into indexing the website in a higher position. This is a big no-no and should be avoided.

Even though unethical SEO is tempting, and does work, you shouldn’t do it – not only is it annoying to users, but it’s likely to get you banned from the search engines sooner or later. Your site’s search engine rankings just aren’t worth the risk. Use efficient SEO techniques to get your site ranked higher, and stay away from anything that even looks like unethical SEO.

SEO is a set of techniques used in order to attract visitors or prospective customers to your website, and the goal of a search engine is to provide high quality content to the users of the Internet. These two objectives are not in opposition, if you do SEO the way it should be done.

About The Author:

Lawrence Andrews is an ePublisher, software developer, consultant, and author of numerous books. Visit his Private Label Content and Software site at http://www.lmamedia.com for more information about SEO and PRL.

You may use this article freely on your website as long as this resource box is included, a link point back to my site, and this article remains unchanged! Copyright 2005 Lawrence Andrews

Posted on Feb 11th, 2007

Search engines will be a way for you to generate from as little as 20% to as much as 60% of your business online (depending on what other marketing techniques you use).

Since there are over 130,000,000 webpages in existence (yes that is 130 million!), it is extremely important to understand how search engines work and how to increase your chances of being placed in the top 20 of the search results. For example, if you were to type "music" and "CD" into the AltaVista search engine as keywords, the result would be over 1,000,000 related site URLs.

Search engines are a very powerful tool if you are in the top 50 results (preferably the top 20), but are completely useless if you are listed further down. You can bet that if you are further down than the 50th result, the searcher will not even see your site listing, much less be able to visit it.

As we all know, the beauty of search engines is that they can bring you a large amount of targeted traffic and it will not cost you a cent!

It is crucial you understand the basics of how search engines work if you want to get traffic to your site from them. There are three main types of search engines/directories. The first is a directory (sometimes called a category database). This is not a true search engine, but a listing of webpages by category. Many directories allow you to enter in the description and keywords for your site exactly as you would like them to appear. You usually have to select the category you want it cited under, too.

A directory will not list your URL and will never become aware of your site if you do not register with them. They do not make use of "indexing software" (robots that crawl the web looking for new sites and indexing them). An example of a directory is Yahoo.

Search engines (also called crawlers, spiders, robots, and worms) vary to a large degree. They will automatically index your site using "indexing software" or "indexing robots".

Depending on the complexity of the software, here is what different search engines might do:

1. Index the webpage (not the entire "website") you give them.

2. Index every word of every page at that site.

3. Visit external links to crawl through the web looking for any new sites 24 hours a day, 7 days a week, going from URL to URL until they have visited every website that can be found on the Internet.

By simply telling the search engine what your URL is, you let its software robot go there automatically and index everything it needs. Every search engine has different criteria for returning search results, which makes a difference to how you should submit your site, as it can drastically affect your ranking in search engines (we discuss this quite extensively in the course, but it takes up over 30 pages, so we will skip it in this newsletter).

It is important to realize that many search engines change their algorithms on a regular basis (i.e. weekly, monthly, etc.) - if you’re listed prominently today, that may not be true tomorrow.

There are also META search engines. These perform searches on multiple search engines simultaneously. In this instance, your ranking for the keywords entered is calculated from your combined ranking in all the search engines used. The key to getting ranked high is to make sure you’re listed in all the search engines used by the META search engines (they use: OpenText, Lycos, WebCrawler, InfoSeek, Excite, AltaVista, Yahoo, HotBot, and Einet Galaxy).

It is not necessary to submit your site to META search engines since they use the results of the major search engines (not their own).

I hope this helps in your future marketing decisions.

About The Author

David Bell

http://www.wspromotion.com/

Advertising research and development center

Posted on Feb 8th, 2007

First, go to the search engine where you want to check your site’s ranking, and enter the keywords you want to check. The result pages will come up, and you will need to look through them until you find your website. SEO experts recommend that if you aren’t listed in at least the top 20, you should continue to optimize, as most people won’t look any further than that. This is simply common sense. When you are determining whether your rank is high enough, simply ask yourself, “Would I look this long for this page?”
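If you copy the result URLs into a list, in the order they appeared, a few lines of Python can find your position for you. This is only a sketch: the function name find_rank and the sample URLs are made up, and it does not fetch anything from a search engine.

```python
from urllib.parse import urlparse

def find_rank(result_urls, domain):
    """Return the 1-based position of the first result on `domain`, or None."""
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return position
    return None

# Hypothetical results copied by hand from a results page.
results = [
    "http://www.example.org/widgets",
    "http://shop.example.net/",
    "http://www.yourdomain.com/products.html",
]
rank = find_rank(results, "yourdomain.com")
if rank is not None and rank <= 20:
    print(f"Ranked {rank}: inside the top 20")
else:
    print("Not in the top 20: keep optimizing")
```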

You will want to do this with each search engine and directory until you have some idea of where you are. Check your website’s rankings regularly, because changes to algorithms can affect them drastically and quickly. Keep in mind also that thousands of new web pages are added daily, and many of them are actively trying to get ranked ahead of you. That’s right. There are thousands of others in on the same game as you, so you must keep sharp. Your competitors may be reading these same articles and using these same tricks!

If you can’t find your website in a search engine’s results, you should enter “site:” followed by your domain name in the keyword field to see whether you are listed at all. If your URL appears with the name or description of your site then you are somewhere in the search engine’s index. If all you get back is a blank page, then you’re not in the search engine’s database at all – you need to wait longer. This trick of typing “site:www.yourdomain.com” also helps you to determine how many pages you have indexed on each particular search engine. The more pages that are indexed, the more likely somebody is to encounter your site.

If you find that your website is miles away from the top 20, don’t be discouraged – you can change that! You may need to re-evaluate your keywords, and try to find new ones that are more relevant to your site. Many search engines have human-edited rankings for the most commonly searched-for phrases, and it is often difficult to get in that list. Good content is the best way to increase your chances of getting a high ranking for a certain topic. The more popular your page is with the masses, the more popular it will be with search engines.

Search engines are a perfect example of “the chicken or the egg.” In this case, there is an answer! Search engines attempt to deliver sites that the populace has deemed important, not the other way around. This is why it is so important that you have good, relevant content and plenty of it.

If you want to check to see if a single web page on your site has been indexed, visit the search engine and enter the complete URL, like this: http://www.yourpage.com/yourpagename.html.

If the search engine has indexed that particular page then it will come back with a description of it. If it hasn’t then you’ll see a message saying something like “Sorry, no information is available for that URL”.

On Google, if your URL has been indexed, the results page will offer to show you the cached version of the page, to find similar pages, and to find pages that link to your page or that contain your URL.

You could go ahead and use these manual tracking methods, but we would recommend that you consider using online tools or downloadable software that will allow you to check these things more quickly. It can be a very tedious and time-consuming job to do by hand, especially if you have several sites to monitor.

Top25web.com is one such search engine-ranking tool. You can find out where your website ranks in Google, Inktomi, and AltaVista for free. You can also analyze the results of a particular keyword search, to create a plan for improving your site’s ranking.

URL Ranker offers instant, online reports of website rankings in 17 top search engines, including Google, Yahoo, AOL Search, MSN, AllTheWeb and AltaVista, again for free. It will tell you if your site is listed in each engine, and tell you the ranking if it is.

These tools alone offer an excellent way of checking your site’s rankings. Once you know where you stand, you can continue with your SEO plan, and move on to other aspects of marketing too.

About The Author:

Lawrence Andrews is an ePublisher, software developer, consultant, and author of numerous books. Visit his Private Label Content and Software site at http://www.lmamedia.com for more information about SEO and PRL.

You may use this article freely on your website as long as this resource box is included, a link point back to my site, and this article remains unchanged! Copyright 2005 Lawrence Andrews

« Prev - Next »