I've been working on this over the past couple days and have the following.
- Script takes 2 files as input:
- replication.txt, which is a plaintext list of URLs from which to pull data
- blacklist.txt, which is a list of URLs to exclude from the replication list output.
- Script does the following:
- CURLs two files from each URL on the list: /lainrep/replication.txt and /lainrep/banner.png.
- /lainrep/replication.txt should be the remote host's own replication list
- /lainrep/banner.png should be the remote host's desired banner for the web ring
- builds a big-big array of all of the entries in everyone's replication lists.
- builds a hash table of all locally stored banners (currently in ./banners/).
- if the remotely CURL'd banner's hash isn't found in the hash table, saves the banner to ./banners/ in a file named after a sanitized version of the URL + ".png"
- searches the big-big array of everyone's replication lists for entries that occur >= 3 times (this is a trust count to keep soykafheads from muddying up the list just because one guy decides to add them to their site). Builds a "final" array out of only these URLs.
- parses blacklist.txt and removes all of the matching elements from the "final" array.
- rebuilds and overwrites the replication.txt list based on the final array; for now, this also removes dead sites and sites without a replication list.
- outputs formatted HTML based on the final array that should look similar to the webring listings from urof.net and similarly well-made sites.
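For anyone who wants to see the trust count, blacklist and banner check in code form, here's a rough Python sketch of those steps. It is not the actual script. The function names, the use of SHA-256 and the sanitizing rule are all my own assumptions, and it counts each URL once per remote list, which the steps above don't specify.

```python
import hashlib
import re
from collections import Counter


def trusted_urls(replication_lists, blacklist, trust_count=3):
    """Return URLs listed by at least trust_count hosts, minus blacklisted ones."""
    counts = Counter()
    for entries in replication_lists:
        # Assumption: count each URL once per remote list, so a single host
        # cannot reach the threshold by repeating an entry.
        counts.update(set(entries))
    blocked = set(blacklist)
    return sorted(
        url for url, n in counts.items()
        if n >= trust_count and url not in blocked
    )


def banner_filename(url):
    """Hypothetical sanitizer: drop the scheme, replace unsafe characters with '_'."""
    stripped = re.sub(r"^[a-z]+://", "", url.lower())
    return re.sub(r"[^a-z0-9.-]", "_", stripped).strip("_") + ".png"


def is_new_banner(banner_bytes, known_hashes):
    """True when the banner's SHA-256 digest is not among the known hashes."""
    return hashlib.sha256(banner_bytes).hexdigest() not in known_hashes
```

Here known_hashes would be a set of hex digests built from whatever is already in ./banners/, and replication_lists is one list of URLs per remote host.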
To-do:
- Tor and I2P support. I'm pretty sure I can get Tor working, but I have NO idea how I2P works or how to send CURLs to eepsites or whatever. I also have no testing infrastructure for I2P, so if anyone knows how all this works, enlighten me please.
- Up/Down detection. I'm pretty sure I know how to do this in a rudimentary way, and it will probably be the next thing I really work on.
- Make a config file so the user can set the input and output filepaths as well as the trust count for their particular instance of the list.
- Polishing things like readmes and other stuff to make it easy to install and use.
- Maybe some refactoring as things move forward.
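On the Tor/I2P item, one possible approach is to pick a proxy based on the hostname and hand it to curl. This is an untested sketch, not something the script does yet. It assumes a local Tor daemon on its default SocksPort (127.0.0.1:9050) and a local I2P router with its default HTTP proxy (127.0.0.1:4444). The function name and the timeout value are made up for illustration.

```python
from urllib.parse import urlsplit

TOR_SOCKS = "127.0.0.1:9050"        # assumed: Tor daemon's default SocksPort
I2P_HTTP = "http://127.0.0.1:4444"  # assumed: I2P router's default HTTP proxy


def curl_command(url, timeout=30):
    """Build a curl argument list, routing .onion and .i2p hosts via a proxy."""
    host = (urlsplit(url).hostname or "").lower()
    # --fail makes curl exit non-zero on HTTP errors; --max-time caps the wait.
    cmd = ["curl", "--silent", "--fail", "--max-time", str(timeout)]
    if host.endswith(".onion"):
        # socks5-hostname lets the proxy resolve the name, which .onion needs.
        cmd += ["--socks5-hostname", TOR_SOCKS]
    elif host.endswith(".i2p"):
        cmd += ["--proxy", I2P_HTTP]
    return cmd + [url]
```

The list can be passed to subprocess.run(). Since --fail and --max-time make curl exit non-zero on HTTP errors and timeouts, the return code could also serve as a rudimentary up/down signal.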
Since it was noted that I shouldn't crowd the webring thread with this project so that we can see new sites instead, I'll start posting incremental updates on my site's news page (http://oh3curby3abfknsydatt2qc3vgxggfjxd6infybwvlbgaezjhvhzqhad.onion). Once I get a release candidate ready, I'll ping the thread again so that everyone can be aware of it!
Sorry I didn't take anyone's advice re: json or csv or doing something better, but by doing it in a way I'm more familiar with, I've been able to make fast progress. Hopefully I can get a useful tool out to the community either way.