Sunday, May 01, 2011

De-bit.ly-fying a URL

Ever since the URL shortening services began with tinyurl.com, I've been extremely suspicious of them, probably because back in The Old Days it was a popular way to put up a goatse or a tubgirl link (if you don't know, don't ask) for the newbs.  Fortunately, that kind of abuse is A Thing Of The Past now.  But... you never know.

Just today, I got somebody else's SPAM in my mailbox (long story—some guy on my ISP thinks my email address is his wife's email address—this has been going on for years).  Normally I just delete the shit.  Today I was curious, so I dragged the email out of my InBox and onto the desktop and peeked at it with Notepad.

I'm not sure why, but I was quite surprised to find bit.ly links inside the email.  There was no way in Hell I was clicking on any of them, so I wrote a tiny kidscript called "debitly" to check them out. 
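To collect the links from a saved message before feeding them to debitly, something like the sketch below should work. It assumes the links sit in the raw text as plain http://bit.ly/CODE strings. Quoted-printable soft line breaks or HTML entity encoding could split or hide some of them, so eyeball the file anyway. The function name bitly_links is hypothetical and not part of the original script.

```shell
#!/bin/bash
# bitly_links (hypothetical helper): print each distinct bit.ly short link
# found in a saved message, without rendering or fetching anything.
# Assumes links appear as plain text, e.g. http://bit.ly/kQZUAt.
bitly_links() {
    # -o prints only the matching part; LC_ALL=C keeps the sort order stable.
    grep -oE 'https?://bit\.ly/[A-Za-z0-9_-]+' "$1" | LC_ALL=C sort -u
}
```

Run it as `bitly_links message.eml` (a made-up file name), then hand each link to debitly one at a time.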

And before you decide to leave a comment to enlighten me, yes, I know you can hover your mouse pointer over a bit.ly link in a browser and get the full URL. This is different. This is HTML I don't want to render in a browser or in an email or anywhere else. It's plainly in ten-foot-pole territory.

That said, here it is:

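#!/bin/bash
# debitly: ask bit.ly where a short link points, without following it.
# Usage: ./debitly http://bit.ly/kQZUAt
# Split on "/", field 4 of "http://bit.ly/CODE" is the short code.
CODE=$(echo "$1" | cut -d / -f 4)
# Send a bare HTTP/1.0 request (CRLF line endings, blank line at the end)
# and print whatever bit.ly sends back: status line, headers and body.
printf 'GET /%s HTTP/1.0\r\nHost: bit.ly\r\nUser-Agent: Mozilla\r\n\r\n' "$CODE" |
nc bit.ly 80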

"nc" is our old buddy, Netcat.  It might work with the Windows version of Netcat, but that's not how I roll.
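The request is small enough to build and look at before anything touches the network. Here is a rough sketch, assuming bash. The hypothetical function debitly_request prints the same bytes the script pipes into nc. Instead of cut, it uses bash parameter expansion `${1##*/}` (keep everything after the last slash), which also handles a link pasted without the http:// prefix.

```shell
#!/bin/bash
# debitly_request (hypothetical helper): print the exact HTTP/1.0 request
# that debitly sends, so it can be inspected offline before using nc.
debitly_request() {
    # ${1##*/} strips everything up to and including the last "/".
    local code=${1##*/}
    # Each request line ends in CRLF; an empty CRLF line ends the headers.
    printf 'GET /%s HTTP/1.0\r\nHost: bit.ly\r\nUser-Agent: Mozilla\r\n\r\n' "$code"
}
```

`debitly_request http://bit.ly/kQZUAt | od -c` shows the `\r \n` pairs. `debitly_request http://bit.ly/kQZUAt | nc bit.ly 80` sends the request for real.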

Here is sample output from a random bit.ly URL posted on Twitter:

:~# ./debitly http://bit.ly/kQZUAt
HTTP/1.1 301 Moved
Server: nginx
Date: Sun, 01 May 2011 21:05:35 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Set-Cookie: [removed];
Cache-control: private; max-age=90
Location: http://home.comcrud.net/~joe-blow/VA7751.jpg
MIME-Version: 1.0
Content-Length: 137

[followed by some HTML BlogSpot can't render as text for some reason]

You could tack a "| grep Location:" on the end of that code to lose the headers, but they are there for your enlightenment.
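There is one wrinkle with grep. The response lines end in CRLF, so the matched line keeps a trailing carriage return, and grep could also match "Location:" somewhere in the HTML body. A slightly more careful sketch follows (the function name redirect_target is hypothetical). It strips the CRs, stops at the blank line that ends the headers, and prints only the URL. The header name is matched case-insensitively because HTTP header names are case-insensitive.

```shell
#!/bin/bash
# redirect_target (hypothetical helper): read a raw HTTP response on stdin
# and print only the value of the Location header, without the trailing CR.
redirect_target() {
    tr -d '\r' | awk '
        NR == 1 { next }                              # skip the status line
        /^$/    { exit }                              # end of headers: ignore the body
        tolower($1) == "location:" { print $2; exit } # header name is case-insensitive
    '
}
```

With this, `./debitly http://bit.ly/kQZUAt | redirect_target` prints just the target URL and nothing else.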

"Joe Blow" is not his real name.  And comcrud.net is not the domain, but you get the picture.  And if I get into Deep Shit over this, it was Hypponen who Tweeted it in the first place, so don't harass me about it.  Keep your fucking Digital Millennium Copyright Act in your pants, OK?
 
As it turns out, the bit.ly links in the SPAM email were "legitimate".  That is to say they pointed to the opt-in SPAM customer's Web site, which is all well and good, but it was a disappointment to find out bit.ly is in the SPAM business, even if it is opt-in SPAM.

Why was I disappointed?  Well, Scientific American had a write-up on bit.ly's chief scientist, Hilary Mason, last month and I thought she was cute as Hell.  I was smitten, but now I know she's just another Advertising Slut.  sigh

But I was pleased to see they were using nginx!  That makes a lot of sense if you're throwing out a shitload of 301 redirects 24x7.  At least they have good taste in Web servers.
