Saturday, March 29, 2008

2 Weeks and 14,925 Proxies Later...

I've been hacking away at proxy lists around the globe for the past two weeks and am nearing the 15,000 mark rapidly.

And out of all that there are only 819 "good" proxies, which comes out to about a 5.5% success rate.

That's not very good, but it gets much, much worse. I did a cursory check on the 819 live proxies and found only about 150 of those were still active. I'm starting to wonder if anyone (besides me) even bothers with proxies anymore. The public lists are nearly worthless, except for a few decent standbys that have a handful of new addresses every day (out of hundreds - the casual proxy hunter would give up after a few pages).

Luckily, the dead proxies are good for one thing and one thing only: if you know an address is dead, you don't waste time checking it over and over. All the lists have the same data. A dead proxy in one list is a dead proxy in a hundred lists.
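The bookkeeping for that is trivial: keep the known-dead addresses in a flat file and filter every fresh scrape against it before opening a single socket. A minimal sketch (the file layout and names are mine, not from any particular setup):

```shell
# filter a fresh scrape against the known-dead list so we never
# re-check an address we already know is down (filenames are made up)
filter_dead() {
  # $1 = known-dead IP:port list, $2 = freshly scraped IP:port list
  sort -u "$2" | grep -vxFf "$1"
}
```

Since all the lists carry the same data, this one grep kills most of each new page before any checking starts.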

And the deadest proxies are PlanetLab proxies.

PlanetLab is "a global research network that supports the development of new network services." One of their research projects, CoDeeN, runs thousands of proxy servers across the globe. They've been online since 2003. And they've been abused ever since. In fact they have published some excellent research on why running a public proxy is bad (duh).

Since they have been hit so hard in the past, they have learned how not to be abused. Primarily, they don't take requests from outside the academic networks they operate in. This is why all the PlanetLab proxies listed in the proxy lists appear to be dead. If you're some non-academic schmuck (like myself) you'll get nowhere. Well, almost nowhere: if you do manage to find an open CoDeeN node, it will handle GET requests but not POST requests, which breaks a lot of Web sites.

However, the network is operational and they have some pretty graphs (like this one) you can watch in Near Real Time™.

Most proxy lists use PlanetLab/CoDeeN proxies as "filler". Almost all of the proxies from .edu domains in my database are PlanetLab proxies. I try to avoid them if I can but they'll all be in there sooner or later.

I am nowhere near exhausting the supply of lists. Today I did a Google search of all occurrences of "proxy list" in the .ru domain (Russia). About 38,000 hits in all. I browse the cached result first (after all, they're Rooskies) and if I find anything promising I make sure my anti-virus is running and dive right in.

Sometimes, if the site is dead, I cut & paste from the cached page and do an ad hoc run. Most are dupes, but there can be surprises.

And I've been getting better at sed/grep/cut/tr-ing the data out of these pages. I've been using html2text and links2 for the most part because they work direct-to-text, but sometimes you have to play games with Javascript and cookies.
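For the plain-HTML lists the pipeline is nothing fancy: render the page to text, then grep out anything shaped like IP:port. A minimal sketch (the regex is mine and deliberately loose):

```shell
# pull IP:port pairs out of rendered page text; loose regex, dupes removed
harvest() {
  grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}' | sort -u
}
# typical use: links2 -dump http://some-list/page.html | harvest
```

The interesting sites are the ones where this doesn't work, which is what the games with Javascript and cookies are about.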

Now that the data is flooding in, I'm almost ready to work on my own list, like I have threatened many times before.

Stay tuned.

Tuesday, March 25, 2008

State of Florida Stops Proxy Abuse - ALMOST!

Florida state employees may have screwed themselves over, according to this story, quoted below.

"Some state employees who used a "proxy server" in Germany to tap into their online payroll data may have exposed their personal information to identity theft, prompting a statewide reset of passwords."

A statewide reset of passwords. Hoo-boy. However, all is not lost, as noted in the last paragraph of the article...

"DFS has broken all links with known proxy services. Cate said each user is responsible for using firewalls and anti-virus software, monitoring system updates and not sharing log-ins or passwords with others."

All links with known proxy services. Hmmm... think so? Let's take a closer look. Below is a screen capture of the proxy the Florida state workers used.

Sharp-eyed readers will notice the little purple logo in the lower left corner of the page.

If you know what this little guy is, you are a rare person. This is a little Web bug brought to you by eXTReMe Tracking, a company that couldn't care less about your privacy. This graphic sits on a Web page and passively collects information about visitors. Clicking on it will take you to a page showing who has accessed your site, in the form of their IP address, the referrer that brought them there, and a whole lot of other handy information.

To be clear there is no PII (Personally Identifiable Information). Well, almost. It depends on how you get to the site, but I won't go into that.

If you pay for this service, it's private. If you use the free version, anybody can view the information of who is hitting your site.

Right after I read the news story - no more than a minute - I went to the proxy site, found the eXTReMe bug and clicked on it. State of Florida workers were still hitting it.

Here's the screen capture (click for a larger view):

I've edited out all the non-State of Florida host names.

Florida's IT department may want to stop patting themselves on the back and take a closer look at this issue.

Sunday, March 23, 2008

Checking Proxies

For a few years now I have been using one of the many online public proxy checkers. These are generally run by people who also publish proxy lists. In a real sense, they are passive collectors of proxy data.

One of the things these sites check for is the anonymity level of the proxy. Generally they do this by simply checking the X-Forwarded-For and Via HTTP headers of the request your browser sends to the Web.

If an X-Forwarded-For header is present, the proxy is marked as transparent, because the value of the header is your Internet Protocol (IP) address. Traffic through the proxy can be traced back to you.

You might not want this.

The Via header is a little more complex. It lists all the proxies your browser request went through. If there is more than one hop, but no X-Forwarded-For header, an X-Forwarded-For header may still have been recorded by a downstream server (downstream from the viewpoint of the last proxy your request went through, upstream from you). A classic Man-in-the-Middle (MITM) information disclosure scenario.

To recap: no X-Forwarded-For is good, no X-Forwarded-For and no Via header (or a Via header with only one hop) is better. This is enough to rate a proxy "Anonymous".
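Those rules boil down to a few greps. Here's a sketch that classifies a proxy from a dump of the headers a checker page echoed back (the labels and thresholds are my own shorthand, not any list site's official ratings):

```shell
# classify a proxy from the request headers a checker page echoed back;
# $1 is a file of "Header: value" lines
classify() {
  if grep -qi '^X-Forwarded-For:' "$1"; then
    echo "transparent"   # your IP rides along in the request
  elif grep -i '^Via:' "$1" | grep -q ','; then
    echo "suspect"       # multi-hop Via: an XFF may exist upstream
  else
    echo "anonymous"     # no XFF, and at most one Via hop
  fi
}
```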

"High Anonymity" usually means a proxy will do SSL. "Elite" means "pay me $20 a month for access".

Rather than do my checking through the public proxy scanners I decided to put a page on my own site to do the checking for me. I found the following ASP code on the Web:

<%
For Each Key In Request.ServerVariables
    Response.Write Key & ": " & Request.ServerVariables(Key) & "<br><br>"
Next
%>

I threw that on the Web server and it worked fine. It was exactly what I needed.

But I had second thoughts. I didn't want to host a public proxy scanning server. That might be a Bad Thing. GoDaddy might frown on that sort of activity. They have been known to boot people for less.

While mulling this predicament over I accidentally discovered, through a Google code search, that this same, exact code is on literally hundreds of servers worldwide.

God bless the Google Bots!

The absolute best part of this fluke is that the requisite HTTP headers aren't buried in HTML crap, which makes it a snap to pick one URL out at random and grep the results.

After all, it wouldn't be right to hammer just one of these sites to death with traffic.

So I made an "A" list of about 25 servers, based on how busy the servers are (and a lot of them are in dark, dusty, unused corners of the Interwebs) and how long they've been around (from Netcraft's point of view). The proxy validation script picks one at random, and if that site is down for some reason, it tries again using the page as a backup.
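The pick-one-at-random-with-a-fallback logic is simple enough. A sketch, with the "A" list as a flat file of URLs (the file names and the curl plumbing are my assumptions, not the real script):

```shell
ALIST=alist.txt   # the ~25 "A" list checker URLs, one per line

# pick a random line; awk+sort because shuf isn't everywhere
pick_checker() {
  awk 'BEGIN{srand()} {printf "%.6f\t%s\n", rand(), $0}' "$ALIST" |
  sort -n | head -1 | cut -f2-
}

# fetch the checker page through the proxy under test;
# on failure or timeout, retry once with another random pick
check_via() {
  proxy=$1
  curl -s -m 15 -x "$proxy" "$(pick_checker)" ||
  curl -s -m 15 -x "$proxy" "$(pick_checker)"
}
```

Randomizing the target also spreads the load, which is the whole point of having an "A" list instead of one favorite.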

The Security Dude inside me says this is an Information Disclosure vulnerability, and it is. Besides the HTTP header details, this code also reveals "too much information" about a Web server's capabilities and architecture, which could be very valuable to an attacker. This blog posting should really be an Advisory (it would make a great Nessus plug-in - give me a byline if you write one), but the... ummm... uh... Script Kiddie inside me needs this service and, after all, there are only hundreds (not thousands or millions) of sites with this code hanging out.

At the end of the day (GAWD I hate that phrase) it's a "blame the programmers" problem (don't get me started on Web programmers). It is "sample code", which should never be placed on a production server.

And yet... I did it myself.


However, you may now accuse me of "security by obscurity" (a lesser offense, IMHO) because I didn't reveal the URL.

Maybe you can Google it.

The Googlebots are EVERYWHERE!

Saturday, March 22, 2008

Obfuscated Proxy Lists

A couple of weeks ago I blogged about a proxy list site that had tried to make it difficult for bots to harvest proxy IP:port information from its pages. It turned out to be easier than scraping HTML. For me, at least.

I also threatened to start my own proxy list. I'm still threatening. I'm basically HTML-impaired when it comes to this Web crap. Making a Web page with a WYSIWYG editor (I've been using Kompozer under Windows - it sucks ass on Linux) is more maddening than learning Word 2.0 was back in the early 90s. After a frustrating afternoon trying to figure out tables and wrapping (I said I was HTML-impaired), I gave up and went back to harvesting more data for the proxy database, so the list is on the back burner for now. But it is coming.

I found some interesting stuff and after a few days of hacking I now have 5000+ entries in the database, with a success rate of a little more than 10%, which is, truthfully, better than I expected.

These proxy list maintainers have gone to great lengths to keep their data "proprietary", but none of their methods are very effective (as illustrated below).

One list displays IP addresses in GIF files to prevent page-scraping. This is not a big deal. GOCR translates them back to ASCII nicely. You have to watch for zeroes that turn into "letter ohs" and ones that show up as "letter els", and 8's as B's, etc., but it's a finite set of substitutions that's taken care of with a short sed script.
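The substitution table, as a sketch (these are the confusions I've described; the exact set depends on the font in the GIFs):

```shell
# undo the usual GOCR misreads in an IP:port dump - O for 0, l/I for 1,
# B for 8, S for 5; crude, but the input is only ever digits, dots, colons
ocr_fix() {
  sed -e 's/[oO]/0/g' -e 's/[lI]/1/g' -e 's/B/8/g' -e 's/S/5/g'
}
```

So `echo "l92.16B.o.S:3128" | ocr_fix` comes back as `192.168.0.5:3128`.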

Another site silently changes the URL you submit, prepending a fixed string to the URL displayed in the browser. This place had thousands of entries, most of them dead (very common with most of these sites), but some good. A couple of other sites used this same trick.

(Of course, I have to track the dead sites to prevent them from being checked again, so they go into the database as well.)

Another common trick is escaping the content and using JavaScript to unescape it. Some sites stop there but others use a trivial XOR method to further obfuscate matters. Rhino is a great tool for this. To send the unobfuscated content to standard output (and from there to wherever you want it) you simply have to replace document.write() with print(). Simplicity is Schweet.
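The whole trick is a one-line stream edit before handing the script to Rhino (file names and the exact Rhino invocation will vary; this is a sketch):

```shell
# turn the page's decoder into a printer: document.write -> print,
# then Rhino's shell dumps the decoded HTML to stdout
deob() {
  sed 's/document\.write/print/g'
}
# typical use: deob < obfuscated.js > deob.js, then run deob.js in Rhino
# and the decoded page lands on stdout
```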

Then there's a site in Belarus that plays cookie games. One cookie is delivered to you where you'd expect it, in a Set-Cookie: header, but another is stashed away as a META tag inside gzip'd HTML content. You need both to start page scraping.
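A sketch of the second half of that dance: once curl has decompressed the page, the stashed cookie is one sed away (the tag layout and names here are illustrative, not the site's actual markup):

```shell
# fish the second cookie out of a META tag in the (already gunzipped) HTML
extract_meta_cookie() {
  sed -n 's/.*<meta name="\([^"]*\)" content="\([^"]*\)".*/\1=\2/p'
}
# then send both cookies back, e.g.:
#   curl -s -b jar.txt -b "$(extract_meta_cookie < page.html)" ...
```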

That took a few hours to figure out. The result was the same: hundreds of dead addresses, but a few good ones. They all went into the database and I scheduled a once-a-day page scrape.

But my favorite so far has been a PHP-based site that tries to limit your query against their database to 100 rows. Apparently these folks never heard of Input Validation: hack the HTTP request packet and you can get every row out of their database (you have to conclude that if they're that stupid, there's probably some good SQL injection mischief to be had with their site, but that's not my goal here - it's more valuable, to me, alive than dead).

That was pretty cool, and I do admit getting a chuckle out of it, but the bad news is that after about two thousand rows, the data is over seven years old (they've been around for a long time regardless of - in spite of? - their insecure programming techniques) and not very useful.

Still, into the database it goes.

There's a few refinements I need to make on this whole process before I put the list up. "Good" proxies have to be re-checked. Timed-out proxies will need to be re-checked until the port is known to be closed. The database needs to get a little fatter.

And I'm going to have to take some remedial HTML classes.

Saturday, March 08, 2008


Blogspot FRAMEd

My GoDaddy Web site was primarily purchased to host the UT mods for the various BOT House servers that have come and gone over the past five years, and in that capacity it has served us well. But as a Web site it left much to be desired. Besides the World Domination Map, there really isn't much else there.

Now, if you go there it brings you here. All of the Blogspot material is wrapped underneath a FRAME so that it looks like it's coming from my domain. This is a very common thing to do on Blogspot. It can also be a Bad Thing™ in the hands of the wrong people and has generated quite a bit of news in the past week.

But relax. I'm one of the Good Guys.

This does not affect the downloads of the UT mods in any way. They're all still there.

You may also notice the old address brings you here as well.

Proxy Fun!

My proxy research marches on. Last year I wrote a few scripts for harvesting and verifying proxies that appear on many of the publicly available proxy list sites. Unfortunately, they all have the same information. Truly, if you've seen one, you've seen them all.

I latched on to one particular site (which will remain nameless) that is updated every hour. Naturally, I harvest it every hour. This has been going on for months.

Sometime last week, the site operator apparently noticed and changed the format to make it harder to scrape his pages. My scripts stopped working.

This is not the first time this has happened to me. I used to scrape another site back in the day (when they had 10 pages of data instead of the usual 2 these days). One day, they changed their format and I had to adjust. Annoying (stripping HTML with grep and sed is no fun... perl tards can STFU, thank you), but not a big deal.

This guy, however, took great pains (for him I guess) to obfuscate the addresses & ports so that you couldn't just simply strip the HTML and run off with the data. This is the code he's using:

function proxy(mode,arg1,arg2,arg3,arg4,port){
var ret;
switch(mode) {
  case 1: ret=arg1+"."+arg2+"."+arg3+"."+arg4+":"+port; break;
  case 2: ret=arg4+"."+arg1+"."+arg2+"."+arg3+":"+port; break;
  case 3: ret=arg3+"."+arg4+"."+arg1+"."+arg2+":"+port; break;
  case 4: ret=arg2+"."+arg3+"."+arg4+"."+arg1+":"+port; break;
  }
return ret; // (closing reconstructed - his page goes on to write ret into the document)
}

See what's going on here? This function is called with six parameters, something like this (illustrative values - the real calls are generated into the page):

proxy(1, 70, 66, 249, 1, 8080)

In "mode 1" the 2nd arg is the 1st octet of the IP address, the 3rd is the 2nd, and so on, with the last being the port value. In the other modes he simply rotates the octets.

What's the point? You need a browser to execute the JavaScript and display the results. I use wget or curl for my scripts. Therefore script no workie. Script kiddie he sad. :(

Big deal. So I rewrote the script. It took about an hour.

In the end, he made it easier. And that's the funny part. Like I said, scraping the proxy addresses and ports out of HTML isn't fun. He put everything on one line, making it easier to grep. Plus, he gives you the obfuscation code up front. Whutta guy.
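Here's roughly what the rewritten scraper boils down to: pull the six arguments out of each proxy() call and undo the rotation in awk (a sketch against the function above, not his page's exact markup):

```shell
# reverse the proxy() obfuscation: in mode m the octets are rotated,
# so octet j of the address is arg number ((j - m) mod 4) + 1
deob_list() {
  sed -n 's/.*proxy(\([0-9,]*\)).*/\1/p' |
  awk -F, '{
    m=$1; a[1]=$2; a[2]=$3; a[3]=$4; a[4]=$5; p=$6
    for (j=1; j<=4; j++) o[j] = a[((j-m+4)%4)+1]
    printf "%s.%s.%s.%s:%s\n", o[1], o[2], o[3], o[4], p
  }'
}
```

Since he put each call on one line, the sed does all the hard work and the awk is just arithmetic.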

I'd like to think this fellow went through all this trouble just because of Little Old Me, but I'm not that much of a megalomaniac. As I pointed out earlier, if you've seen one proxy list you've seen them all. To his credit this is a great list. If I was running a proxy list Web site, I'd steal his data. And this is probably happening to him on a large scale.

It's not just the Dinkster.

BTW, look out for my new, up to date free public proxy page! COMING SOON! :P

Saturday, March 01, 2008

E. coli O157:H7 - The Tinfoil Hat View

For hundreds of thousands of years the tiny little bacterium, E. coli, was our friend. It had carved a little niche for itself in our collective guts. Each human being - and all warm-blooded animals - throughout history was host to millions of generations of the little guys. So long is our history with these critters that we really can't live without them. They are part of us.

At one time E. coli was considered a harmless bacterium.

That is, it was before 1982.

Since 1982 there have been 73,000 cases of infection and 60 deaths per year in the United States.

What was so special about 1982? Why did E. coli turn bad on us?

The particular strain of E. coli (noted in the title of this discussion) that caused the 1982 outbreak had only been seen once before, in a sick patient, in 1975. Before 1982, E. coli O157:H7 was considered "rare".

Where did it come from? Go back to 1972, the year the first successful recombinant DNA experiments were performed.

Back then researchers focused on E. coli because it was a bacterium that was well known and well documented. There were no "rare strains". And it had structures known as plasmids, which contained DNA and which were easier to tinker with than the microbe's main chromosomal DNA. This was a simpler time, and the method used in the experiments was commonly referred to as the "shotgun" approach. Foreign DNA was blasted into the plasmids in a random way, and the microbes were cultured and observed for changes.

And then of course when they were done with that batch they threw them into the trash and did another round of experiments.

The use of plasmids is a smoking gun:

"E. coli O157:H7 serotypes are closely related, descended from a common ancestor, divergent in plasmid content more than chromosomal content..."

Do you see where this is going? I'll spell it out:
  • 1972 - The first Recombinant DNA experiments are performed
  • 1975 - E. coli O157:H7 shows up for the first time
  • 1982 - E. coli O157:H7 starts killing people Big Time
Bringing us to the present with 70,000 cases per year (not to mention contributing to the millions of pounds of beef being recalled in the past year alone - and don't get me started on downers).

Granted, there are other factors involved, such as the Reagan-era slashing of the USDA budget (which drastically reduced the number of USDA meat inspectors in the field), the scourge of factory farming, and the rise of evil corporations (I haven't mentioned ConAgra - until just now) but it's astounding that no one, not even the tinfoil hat crowd, has ever investigated the link to the recombinant DNA experiments of the 1970s.