Archive for the 'Random Thoughts' Category

What Were They Thinking?

Thursday, October 28th, 2010

As my own systems administrator, I like to browse my server logs.  Most of the time I see traces of some bot feebly trying to exploit my server.  But one thing that provides me with mild entertainment is seeing which search terms lead people to my site.

The top search term usually has something to do with the show What’s With Andy, a Canadian cartoon that was on the air from 2001 to 2007.  To this day I have no idea why people click on my page when they’re looking for the show.  You can clearly see from the Google description that this site has nothing to do with some obscure cartoon.  Many of these hits come from Eastern Europe or countries of the former USSR.  My guess is that reruns of the show are airing in these far-off, formerly communist countries, and the viewers don’t know enough English to realize that my page doesn’t have anything to do with what they’re actually looking for.  Because so many people click on my page from this search term, my page is now number two in Google’s results for it.

But today I came across the most interesting search term I have ever seen.  Someone wound up on this page with the search term “full-size sex robot.”  Really?

Two things come to mind:

  1. What the hell?
  2. Why did you click on my page for this search?

Does this site really resemble a sex shop that much?  Is the title of my page “Andy Online, purveyor of fine sexbots”?

Some things I will never understand.

Am I the only one not disappointed by the Lost finale?

Monday, May 24th, 2010

Seriously.  Almost all the comments I’ve seen from friends are about how much they were disappointed with the finale.  I have a guess as to why.  They were expecting too much.

I would say 90% of your enjoyment of a TV show or movie comes from one thing: your expectations.

My expectations for the final season were different when it started than when it ended.  I started season 6 expecting all the mysteries of the Island to be explained.  As the season progressed, I realized that there was no way that the creators could do that.  Answering every single question was simply not possible.

Lost has never been about answering questions.  With every question they answer, they give you two more.  Sometimes I fondly look back to season 1, when the biggest question on everyone’s mind was “who are the Others?”  Frankly, I think it would be an insult if they did answer all our questions.  Some things are better when you can create your own theories.  (Ask me about my dual purgatory theory.)

Lost was never about the Island.  It was about the people on it.

Don’t Use MSN Live Search!

Thursday, May 14th, 2009

I don’t see why anyone would use anything but Google to begin with, but for those of you who use MSN Live as your search engine, you might want to rethink it.  Here’s why:

Search engines use what are called “spiders” or “bots” to index web pages.  These spiders crawl the Internet, downloading the content of web pages to be included in the search database.  They will get the home page of a site, find all the links in the page, and download the content of those links.  This is all fine and dandy…if they follow the rules.
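To make the "find all the links" step concrete, here is a rough Python sketch of what a spider does with a page once it has downloaded it.  This is only an illustration, not how any real search engine's crawler is written; the `LinkCollector` class and `extract_links` function are names I made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs linked from one downloaded page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A spider would then queue each returned URL for download and repeat the process, ideally after checking whether it is allowed to visit that URL at all.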

Since some people don’t want certain parts of their website to be searchable, they can give a spider visiting their site a list of places they are and are not allowed to go.  These rules are put into a file called robots.txt (to view my rules, take a look at http://andyonline.org/robots.txt).  All major search engines claim to follow these rules, including MSN.  However, I have caught MSN breaking the rules on more than one occasion.

Search engines are not the only ones who use spiders.  They are also commonly used by the baddies of the Internet, whose bots scan web pages for email addresses to spam or look for sites that are vulnerable to hacking.  These bots rarely follow your instructions.

I use various techniques to keep bad spiders off this site while still allowing legitimate visitors.  One of those is a “bad bot” trap.  There is a link on this site that is only visible to spiders, and I instruct them not to visit it.  If that link is visited anyway, the site bans the visiting computer from coming back in the future.  This link points to http://andyonline.org/bot-trap/ (for the love of God, don’t go there or you won’t be able to access this site again).

My bot trap has caught the MSN Live spider…twice.
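The idea behind a trap like this can be sketched in a few lines of Python.  This is a minimal illustration and not the actual code running on this site: the `BotTrap` class, the in-memory ban list, and the 403 response for banned visitors are all assumptions made for the example.

```python
class BotTrap:
    """Bans any client that requests a path under the trap prefix."""

    def __init__(self, trap_prefix="/bot-trap/"):
        self.trap_prefix = trap_prefix
        self.banned = set()  # a real site would persist this somewhere

    def handle(self, ip, path):
        """Return the HTTP status code to send for this request."""
        if ip in self.banned:
            return 403  # already banned: refuse everything
        if path.startswith(self.trap_prefix):
            # The trap page itself is served, but the visitor is banned
            # for every request that follows.
            self.banned.add(ip)
        return 200
```

A well-behaved spider never requests anything under the trap prefix, so it is never affected.  Only a client that ignores the rules ends up on the ban list.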

Here is the current content of my robots.txt file:
User-agent: *
Disallow: /wp-admin/
Disallow: /podcast/
Disallow: /ua/
Disallow: /bot-trap/

“User-agent: *” is a blanket statement meaning “any spider visiting this site”. The next four lines detail the folders that I don’t want crawled.
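If you want to check how these rules should be read, Python's standard library ships a robots.txt parser.  The sketch below feeds it the rules listed above and asks whether a given spider may fetch a given URL.  The `is_allowed` helper is a name I made up; `RobotFileParser`, `parse`, and `can_fetch` come from the standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /podcast/
Disallow: /ua/
Disallow: /bot-trap/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The home page is open to everyone; the trap is off limits to everyone.
print(is_allowed("msnbot/2.0b", "http://andyonline.org/"))
print(is_allowed("msnbot/2.0b", "http://andyonline.org/bot-trap/index.php"))
```

Under these rules the first check should come back allowed and the second disallowed, for any spider name.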

Here is an excerpt from my site access log from early this morning:
65.55.106.112 - - [14/May/2009:02:31:33 -0600] "GET /robots.txt HTTP/1.1" 200 91 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.112 - - [14/May/2009:02:32:29 -0600] "GET /bot-trap/index.php HTTP/1.1" 200 1892 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

The funny thing is that the MSN spider grabbed my robots.txt file right before it got itself banned.
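Spotting this pattern by eye gets tedious, so here is a rough Python sketch that scans access log lines for clients that fetched robots.txt and then requested a disallowed path anyway.  It assumes the log is in the common "combined" format shown in the excerpt; the regular expression and the `parse_line` and `find_violations` names are mine, written for this example only.

```python
import re
from datetime import datetime

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE_LOG = [
    '65.55.106.112 - - [14/May/2009:02:31:33 -0600] "GET /robots.txt HTTP/1.1" '
    '200 91 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.112 - - [14/May/2009:02:32:29 -0600] "GET /bot-trap/index.php HTTP/1.1" '
    '200 1892 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]


def parse_line(line):
    """Parse one combined-format log line into a dict, or return None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def find_violations(lines, disallowed_prefixes):
    """Return (ip, path, seconds since that ip last fetched robots.txt)."""
    last_robots_fetch = {}
    violations = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        ip, path = entry["ip"], entry["path"]
        if path == "/robots.txt":
            last_robots_fetch[ip] = entry["time"]
        elif path.startswith(tuple(disallowed_prefixes)) and ip in last_robots_fetch:
            delay = (entry["time"] - last_robots_fetch[ip]).total_seconds()
            violations.append((ip, path, delay))
    return violations


print(find_violations(SAMPLE_LOG, ["/bot-trap/"]))
```

For the two lines in the excerpt, the gap between the robots.txt request and the trap request works out to 56 seconds.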

This exact scenario played itself out last October.  I unbanned their spider and tried to contact MSN tech support to tell them to stop being jerks, but their support system is a tangle of help pages and canned responses.  Their basic response was “Just deal with it”.  They said that I must have recently changed my robots.txt file and that their spider hadn’t caught up with the changes.  I call shenanigans.  I haven’t modified my robots.txt file since August 2008.  At the time of the first incident it hadn’t changed in 2 months; at the time of the second, it hadn’t changed in 9.

The MSN Live spiders blatantly go where they don’t belong.

I’m done cleaning up the mess from the MSN spider plowing through like a bulldozer in a sandbox.  I am no longer unbanning the MSN spider when it goes where it doesn’t belong.  It should know better.  Will it affect this site showing up in MSN Live Search?  Possibly.  I’m not too worried about that though.  MSN has plenty of other unbanned spiders still happily crawling away at this site.  Plus, most of my search engine traffic comes from Google anyway.  Over the last 30 days, traffic to this site originating from Google outnumbered traffic from MSN by 8:1.

I’m not the only one who uses a bot trap.  How many other web sites have blacklisted the MSN spider?  How much information is unavailable through MSN Live Search because of their behavior?

MSN uses unfriendly tactics when building their search database.  You shouldn’t support their behavior.

Don’t use MSN Live Search.