Wednesday, February 10, 2010

All your paywalls are belong to us!

I subscribe to the Twitter feed of Broadcast (our industry's magazine). Most of the tweets point to subscriber-only content, which is annoying. I use a great site, 'Be The Bot':

Have you ever been Googling something, and you see exactly what you need in the preview, but when you click the link it doesn't show you what you want to see? This is because the owners of the site are trying to trick you into buying something, or registering. It's a common tactic on the internet. When Google visits the site, it gives something called a "Header". This header tells the site who the visitor is. Google's header is "Googlebot". The programmers of the site check to see if the header says "Googlebot", and if it does, the site opens up all of its content for Google's eyes only.
Now, all we have to do is trick the site into thinking that we ARE Google. That's what this site does. See the 'How to use' box to the right for instructions on usage.
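
To make that concrete, here is a minimal Python sketch of the kind of server-side check the quote describes. It is only a guess at how such a site might work, not any real site's code, and the function name `shows_full_article` is made up for illustration. The header in question is the HTTP `User-Agent` request header, and Googlebot's value contains the token "Googlebot".

```python
def shows_full_article(request_headers):
    """Hypothetical paywall check: open the article only to Googlebot."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}
    user_agent = headers.get("user-agent", "")
    # Naive substring test: anything that claims to be Googlebot gets through.
    return "googlebot" in user_agent.lower()


# Example header sets: an ordinary browser and Google's crawler.
browser = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"}
crawler = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                  "+http://www.google.com/bot.html)"
}

print(shows_full_article(browser))
print(shows_full_article(crawler))
```

The check trusts a value that the visitor supplies, so any client sending the same string passes it. That is the weakness the quoted site relies on.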

It would appear that Broadcast don't even bother to do that - if you cut'n'paste the headline into Google and click through to the article you can read it all - so it must be the HTTP Referer header (the page you arrived from) that they're checking.
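
A check on where the visitor came from could look something like the sketch below. Again this is an assumption for illustration, not Broadcast's actual code: the function name `arrived_from_google` is invented, and the test for a `www.google.` hostname is deliberately crude. The browser reports the previous page in the HTTP `Referer` request header (the header name really is spelled that way).

```python
from urllib.parse import urlparse


def arrived_from_google(request_headers):
    """Hypothetical check: did the visitor click through from a Google page?"""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}
    referer = headers.get("referer", "")
    # hostname is None when the header is missing or is not a full URL.
    host = urlparse(referer).hostname or ""
    # Crude match that covers www.google.com, www.google.co.uk and so on.
    return host.startswith("www.google.")


from_search = {"Referer": "http://www.google.co.uk/search?q=some+headline"}
from_twitter = {"Referer": "http://twitter.com/"}

print(arrived_from_google(from_search))
print(arrived_from_google(from_twitter))
```

A site doing something like this would let through anyone who clicks a search result and turn away the same reader arriving from a tweet, which matches what I'm seeing.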

