Classifying the proxies went much faster than I had anticipated, although there are a few details and maybe one bug left to hammer out.
The result, after 90 days of hacking on this thing, was 1010 usable proxies out of the 215,000 stolen from various lists on the Web.
I decided not to list the CoDeeN/PlanetLab proxies, although there were almost 2500 of those in good working order. First, they don't like outsiders. Second, they're more useless than transparent proxies if anonymity is your goal. Third, there's that whole "we cooperate with Law Enforcement Agencies" thing. Ew.
And fourth, they're only used to pad proxy lists anyway.
I upped the output to the 200 most recent (non-CoDeeN) proxies and changed the refresh to once every two hours. You will find a new page after every even hour (except 4AM) EDT. On the odd hours it scans the Web for more. At 4AM it does the "Big Run", which leeches off all the most active proxy lists. As such, the 6AM posting might not make it until 6:15-6:30, since the page-making process synchronizes the "gold" list with the "raw" list every time it runs.
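For the curious, the hourly schedule boils down to something like the sketch below. It only illustrates the logic described above and is not the actual script: the function name `task_for_hour` and the job labels are made up for the example, and hours are assumed to be 0-23 in EDT.

```python
def task_for_hour(hour: int) -> str:
    """Pick the job for a given hour of the day (0-23, assumed EDT).

    The job labels are hypothetical names used only for illustration.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0-23")
    if hour == 4:
        # 4AM is reserved for the "Big Run" over the most active proxy lists.
        return "big_run"
    if hour % 2 == 0:
        # Every other even hour publishes a fresh page.
        return "publish_page"
    # Odd hours scan the Web for more proxies.
    return "scan_web"


if __name__ == "__main__":
    for h in range(24):
        print(f"{h:02d}:00 -> {task_for_hour(h)}")
```

A single function keyed on the hour keeps the 4AM exception in one place, so a cron-style runner can call it once an hour and dispatch on the result.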
I have enough data to make this a ten-page list, with 100 proxies to a page, and I think that will be the next evolution.
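Splitting the list into pages could look roughly like this. It is a minimal sketch that assumes the proxies are already sorted most recent first; the `paginate` helper and its parameter names are hypothetical and not part of the current code.

```python
from typing import List, Sequence


def paginate(proxies: Sequence[str], per_page: int = 100,
             max_pages: int = 10) -> List[List[str]]:
    """Split proxies into pages of per_page entries, keeping at most max_pages.

    Assumes the input is already ordered most recent first, so anything
    past the last page is simply dropped.
    """
    if per_page <= 0 or max_pages <= 0:
        raise ValueError("per_page and max_pages must be positive")
    pages = [list(proxies[i:i + per_page])
             for i in range(0, len(proxies), per_page)]
    return pages[:max_pages]


if __name__ == "__main__":
    # Placeholder entries standing in for real proxy addresses.
    sample = [f"proxy-{n}" for n in range(1010)]
    pages = paginate(sample)
    print(len(pages), "pages,", len(pages[0]), "proxies on the first page")
```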
Stay tuned.
ItsAProxy.info