A self-described mother of five posted an article on Linux.com about a new content filtering add-on for Firefox called Glubble. Like NetNanny, Dansguardian, and all the rest, it relies on whitelists and blacklists of keywords. This approach is cumbersome and has obvious limitations. My anonymous suggestion in the comments (which you should recognize after reading this post) is that content filters should tap into the wisdom of crowds.
James Surowiecki wrote an excellent book, The Wisdom of Crowds, in which he described the power of crowdsourcing solutions to difficult problems. This concept goes back to the days of “Wanted” posters in the Wild West. Even if the sheriff couldn’t find a criminal, with enough eyes the culprit could be caught. Today we have Amber Alerts for the same reason.
Eric S. Raymond applied this principle to software development when he famously said, “Given enough eyeballs, all bugs are shallow.”
Likewise, many companies use crowds to build their business models. Dell has IdeaStorm and Ubuntu / Canonical have Brainstorm. Powerset, which was recently bought by Microsoft for a reported US$100 million, is developing its natural language search engine by allowing people to rate its search results.
Gmail’s spam filter went from allowing 20-30% of spam to slip through down to less than 1% because it was “taught” the difference between spam and non-spam by users who click the Spam button.
A good content filtering system could be developed if users (parents, in this case) were allowed to rate web sites by their appropriateness for various age groups: say, 0-3 years, 3-6, 6-10, 10-14, and 14-17. The filter would perform poorly at first, so it would probably need a beta phase in which a few thousand select users contributed to its knowledge, much like the development phase of the Powerset search algorithm. Eventually, though, it would achieve much better results than current content filtering systems.
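To make the idea a little more concrete, here is a rough sketch in Python of how such ratings might be tallied. Only the age bands come from this post; the CrowdFilter class, the min_votes and threshold settings, and the example domain are all made up for illustration, and a real system would need much more (accounts, abuse protection, per-page ratings).

```python
from collections import defaultdict

# Age bands as proposed in the post; everything else is hypothetical.
AGE_BANDS = ["0-3", "3-6", "6-10", "10-14", "14-17"]


class CrowdFilter:
    """Toy filter that allows a site once enough parents approve it."""

    def __init__(self, min_votes=3, threshold=0.75):
        self.min_votes = min_votes    # ratings needed before we trust the crowd
        self.threshold = threshold    # share of "appropriate" votes required
        # votes[(site, band)] = [appropriate_count, total_count]
        self.votes = defaultdict(lambda: [0, 0])

    def rate(self, site, band, appropriate):
        """Record one parent's verdict for a site and an age band."""
        if band not in AGE_BANDS:
            raise ValueError(f"unknown age band: {band}")
        tally = self.votes[(site, band)]
        if appropriate:
            tally[0] += 1
        tally[1] += 1

    def is_allowed(self, site, band):
        """Allow the site only if enough parents rated it appropriate."""
        ok, total = self.votes.get((site, band), (0, 0))
        if total < self.min_votes:
            return False  # too few ratings yet: block by default
        return ok / total >= self.threshold


crowd = CrowdFilter()
for verdict in (True, True, True, True, False):
    crowd.rate("kids-health.example", "10-14", verdict)

print(crowd.is_allowed("kids-health.example", "10-14"))  # rated by five parents
print(crowd.is_allowed("kids-health.example", "3-6"))    # nobody has rated this band
```

Blocking unrated sites by default is the cautious choice, and it is also why the filter would perform poorly until the beta users had rated a good number of sites.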
In the best of all worlds, it would go beyond using average ratings for particular domains or web pages and implement a natural language algorithm that could parse sentences and “understand” context. Thus a search for “urinary tract infection” would not block a bunch of pages because of all the associated words that tend to get flagged.
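The false-positive problem is easy to reproduce. Below is a deliberately naive keyword filter in Python; the three-word blacklist and the sample page text are invented for this example and are not taken from any real product.

```python
import re

# Hypothetical blacklist; real products ship far longer lists.
BLACKLIST = {"urinary", "breast", "sex"}


def keyword_blocked(text):
    """Return True if any blacklisted word appears in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & BLACKLIST)


page = "Symptoms and treatment of a urinary tract infection in children"
print(keyword_blocked(page))
```

The filter blocks this harmless health page because it sees a flagged word and knows nothing about the sentence around it. Crowd ratings, or an algorithm that actually parses the sentence, would have a chance of getting this one right.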
Given that Powerset’s bread and butter is the development of algorithms that understand natural language, this seems like a good business opportunity for Powerset and its new owner.
Maybe they will read my blog. :)