Saturday, March 08, 2008

FRAMEd

Blogspot FRAMEd

My GoDaddy Web site, www.mrhinkydink.com, was purchased primarily to host the UT mods for the various BOT House servers that have come and gone over the past five years, and in that capacity it has served us well. But as a Web site it left much to be desired. Besides the World Domination Map, there really isn't much else there.

Now, if you go there it brings you here. All of the Blogspot material is wrapped inside a FRAME so that it looks like it's coming from www.mrhinkydink.com. This is a very common thing to do on Blogspot. It can also be a Bad Thing™ in the wrong hands, and it has generated quite a bit of news in the past week.

But relax. I'm one of the Good Guys.

This does not affect the downloads of the UT mods in any way. They're all still there.

You may also notice http://www.mrhinkydink.net/ brings you here as well.

Proxy Fun!

My proxy research marches on. Last year I wrote a few scripts for harvesting and verifying proxies that appear on many of the publicly available proxy list sites. Unfortunately, they all have the same information. Truly, if you've seen one, you've seen them all.

I latched on to one particular site (which will remain nameless) that is updated every hour. Naturally, I harvest it every hour. This has been going on for months.
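The hourly harvest is the kind of thing cron was made for. A minimal sketch of the sort of crontab entry involved (the script name and paths here are hypothetical stand-ins, not my actual setup):

```shell
# Hypothetical crontab entry: run the harvester at the top of every hour
# and append its output to a log. "harvest.sh" stands in for the real script.
0 * * * * /home/dink/bin/harvest.sh >> /home/dink/logs/harvest.log 2>&1
```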

Sometime last week, the site operator apparently noticed and changed the format to make it harder to scrape his pages. My scripts stopped working.

This is not the first time this has happened to me. I used to scrape http://www.proxy4free.com/ back in the day (when they had 10 pages of data instead of the usual 2 these days). One day they changed their format and I had to adjust. Annoying (stripping HTML with grep and sed is no fun... perl tards can STFU, thank you), but not a big deal.

This guy, however, took great pains (for him I guess) to obfuscate the addresses & ports so that you couldn't just simply strip the HTML and run off with the data. This is the code he's using:

function proxy(mode,arg1,arg2,arg3,arg4,port){
var ret;
switch(mode) {
  case 1: ret=arg1+"."+arg2+"."+arg3+"."+arg4+":"+port;
          break;
  case 2: ret=arg4+"."+arg1+"."+arg2+"."+arg3+":"+port;
          break;
  case 3: ret=arg3+"."+arg4+"."+arg1+"."+arg2+":"+port;
          break;
  case 4: ret=arg2+"."+arg3+"."+arg4+"."+arg1+":"+port;
          break;
  }
document.write(ret);
}


See what's going on here? This function is called with six parameters, like this:

proxy(1,'201','91','212','1',3128);

In "mode 1" the 2nd arg is the 1st octet of the IP address, the 3rd is the 2nd, etc. and the last is the port value. In the other modes he simply rotates the octets left.

What's the point? You need a browser to execute the JavaScript and display the results. I use wget or curl for my scripts. Therefore script no workie. Script kiddie he sad. :(

Big deal. So I rewrote the script. It took about an hour.

In the end, he made it easier. And that's the funny part. Like I said, scraping the proxy addresses and ports out of HTML isn't fun. He put everything on one line, making it easier to grep. Plus, he gives you the obfuscation code up front. Whutta guy.
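With everything on one line, the whole rewrite boils down to a grep/sed/awk pipeline along these lines (a sketch of the approach, not my actual script; the sample page content is made up for illustration):

```shell
#!/bin/sh
# Fake one-line page containing an obfuscated proxy() call, for illustration.
printf "<td>proxy(2,'91','212','1','201',8080);</td>" > page.html

# Pull out the proxy() calls, strip the JavaScript wrapper, then let awk
# undo the octet rotation according to the mode in the first field.
result=$(grep -o "proxy([0-9],'[0-9]*','[0-9]*','[0-9]*','[0-9]*',[0-9]*)" page.html \
  | sed "s/proxy(//; s/)//; s/'//g" \
  | awk -F, '{
      if      ($1 == 1) printf "%s.%s.%s.%s:%s\n", $2,$3,$4,$5,$6
      else if ($1 == 2) printf "%s.%s.%s.%s:%s\n", $5,$2,$3,$4,$6
      else if ($1 == 3) printf "%s.%s.%s.%s:%s\n", $4,$5,$2,$3,$6
      else              printf "%s.%s.%s.%s:%s\n", $3,$4,$5,$2,$6
    }')

echo "$result"   # 201.91.212.1:8080
```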

I'd like to think this fellow went through all this trouble just because of Little Old Me, but I'm not that much of a megalomaniac. As I pointed out earlier, if you've seen one proxy list you've seen them all. To his credit, this is a great list. If I were running a proxy list Web site, I'd steal his data. And this is probably happening to him on a large scale.

It's not just the Dinkster.

BTW, look out for my new, up to date free public proxy page! COMING SOON! :P

3 comments:

  1. That's what I'm doing right now, harvesting proxies. I was searching Google for that function to find more proxy lists. How many have you got, approximately?

    ReplyDelete
  2. Ok... 4138608... I saw hahaha. What methods do you use to find them? Harvesting from other databases? Spiders? Testing all IP ranges? (HAHA)

    Just curiosity :)

    I'm harvesting to make random anonymous connections (legal things).

    ReplyDelete
  3. I never scan IP ranges. I just scrape other proxy lists, every day, on the hour. I keep the bad ones in a database so I won't have to test them again and publish the good ones. Then I purge the good ones when they go bad.

    I've been doing this over three years now and it's all automatic (mostly bash kidscripts and some custom C code).

    ReplyDelete