As the Internet continues to grow, it becomes increasingly difficult to sift through all the information that is available. With the overwhelming amount of data and choices that can be made, people need a filter to increase the Internet's usability. A technique used for dealing with this problem is collaborative filtering (also known as social filtering), which reduces the time spent searching and increases the accuracy of retrieval. In Social Information Filtering: Algorithms for Automating "Word of Mouth," Shardanand and Maes write:
Social Information filtering exploits similarities between the tastes of different users to recommend (or advise against) items. It relies on the fact that people's tastes are not randomly distributed: there are general trends and patterns within the taste of a person and as well as between groups of people. Social Information filtering automates a process of "word-of-mouth" recommendations. A significant difference is that instead of having to ask a couple friends about a few items, a social information filtering system can consider thousands of other people, and consider thousands of different items, all happening autonomously and automatically (Shardanand and Maes, 1995).
Collaborative filters predict a person's preferences for information or products by keeping track of that person's likes and dislikes and comparing them against a database of other people's preferences. Where the records match, the system makes predictions based on such things as purchases.
For example, Amazon uses an automated collaborative filtering (ACF) system to make predictions and offer advice. In effect, people help each other find things and choose products. Systems such as the one Amazon uses do not require a loss of privacy or even any input from users. The recommendations are based on monitoring and on the assumption that consumers with tastes in common will be interested in similar items.
ACF systems have been used successfully where a bad choice carries little risk, such as buying a book or a CD. But when the risk is greater, for instance when a choice involves a larger amount of money, what happens to trust in such a system? Whom can you consult about the questions that need answers before a high-risk choice can be made? Can people find explanations?
Although collaborative filtering technologies attempt to solve many problems, they have inherent problems of their own that must be solved before they can become more useful. These include:
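The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by Amazon or by Shardanand and Maes: the ratings, the user names, the 1-7 scale, and the choice of Pearson correlation as the similarity measure are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings on a 1-7 scale: user -> {item: rating}
ratings = {
    "ann": {"A": 7, "B": 6, "C": 2},
    "bob": {"A": 6, "B": 7, "C": 1, "D": 6},
    "cat": {"A": 1, "B": 2, "C": 7, "D": 2},
}

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)
               * sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, ratings):
    """Similarity-weighted average of the ratings given to `item`
    by users whose tastes correlate positively with `user`."""
    target = ratings[user]
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = pearson(target, theirs)
        if weight > 0:
            num += weight * theirs[item]
            den += weight
    return num / den if den else None  # None: no usable neighbours

print(predict("ann", "D", ratings))
```

In this toy data, ann's ratings track bob's and run opposite to cat's, so only bob's opinion of item D contributes to the prediction for ann.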
1. The amount of time a user must spend rating various products (Procter). Cory Doctorow makes the same point in a Feed magazine interview:
In the Ringo world, and I loved Ringo -- I mean half the music in my collection came out of Ringo, but only because I'm a really, really dedicated info whore, an obsessive compulsive manic hand-washer when it comes to finding new music. And, in the Ringo world, what you had to do was remember all the bands that you liked, remember how you felt about a bunch of things. Tell the software about that, wait for the software to bring you back a list of recommendations, go out and buy the music, remember how you feel about the music, and then tell the software about it.
2. The difficulties involved with the development of reputations and trust amongst users (Procter).
3. The 'cold start problem': a system can generate good recommendations only after a certain numerical threshold of ratings has been reached (Guo).
4. The possibility that unreliable reviews may be given by some of the participants.
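The ratings threshold behind the cold start problem (item 3 above) can be illustrated with a minimal Python sketch. The threshold value, the function name, and the fallback to a general popularity list are all assumptions made for illustration; none of them comes from Guo or from any particular system.

```python
MIN_RATINGS = 5  # hypothetical threshold; real systems tune this value

def recommend(user_ratings, personalised, popular, min_ratings=MIN_RATINGS):
    """Return personalised picks only once the user has rated enough items.

    user_ratings: {item: rating} supplied by the user so far
    personalised: items ranked by a collaborative filter for this user
    popular:      items ranked by overall popularity (the assumed fallback)
    """
    source = personalised if len(user_ratings) >= min_ratings else popular
    # Never suggest something the user has already rated.
    return [item for item in source if item not in user_ratings]
```

A new user with one rating receives the popularity list; the same user receives personalised suggestions after rating five items.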
In Augmenting Information Seeking on the World Wide Web Using Collaborative Filtering Techniques, Don Turnbull states that collaborative filtering of information on the Internet has roots that predate the World Wide Web of today. Early filtering technologies were directed at enabling more efficient use of USENET and email.
1. The developers of Tapestry were the first to use the term 'collaborative filtering' as a way to gather qualitative data. Tapestry was developed at Xerox PARC as a way to handle the large volume of email and messages posted to newsgroups. Integral to its design was the idea that involving humans would make the filtering process more effective.
2. Another of the early filtering applications that set the standard for those to come was Lotus Notes. Although used predominantly in corporations as groupware, it also has collaborative filtering mechanisms built in.
3. In 1992, Paul Resnick and associates at the University of Minnesota started the GroupLens project to examine automated collaborative filtering systems, and eventually created a Usenet news client that let readers rate each other's messages through a simple interface and made the ratings available upon request. GroupLens was important because it built upon the Tapestry concept but added a distributed, scalable architecture that allowed new clients and servers to join in and provide additional capabilities. It also had an improved query engine that could compare ratings, make suggestions according to the ratings of others, collect each user's queries, preserve privacy through anonymity, and make predictions, with prediction capacity scaling along with the architecture. Its development continues at the University of Minnesota.
With the advent of the World Wide Web, new collaborative filtering technologies were needed to sift through the mountains of information.
1. Mosaic, the first graphical Web browser developed at the University of Illinois at Urbana-Champaign, facilitated collaboration by allowing users to publish additional information to Web pages as comments and notes.
2. Helpful Online Music Recommendations (HOMR) - "HOMR automates the word of mouth process, learning about the user and his/her opinions, and leveraging that information to best serve the users needs. It is the predecessor of the Firefly technology." In 1995, researchers at MIT developed the AI technology that allowed music lovers to share information with each other about the bands they liked; in the same year, the venture was incorporated as Agents Inc.
3. Ringo was a social information filtering system that allowed users to make recommendations for music. Through computer analysis, a user's information was matched against the recommendations of others, and the user was then presented with a list of music that he or she might like.
4. Firefly - Growing out of the Ringo project at MIT in 1996, the Firefly technology was a collaborative filter that anonymously gathered user preferences, used the information to suggest web sites users might enjoy visiting, and brought together people with similar tastes in music and movies. Firefly was important because its technology had privacy protection built right in and offered users some control over the amount of personal information that was submitted to web sites. Microsoft bought Firefly in 1998, and its technology eventually became Microsoft Passport.
5. Yahoo!, started by Stanford students David Filo and Jerry Yang, was an early Web link directory. Its classification system became the dominant standard and relied on both human experts and computer technology. Yahoo! takes the contributions of its users and filters them through its team of librarians and other experts in order to make better-quality information more easily accessible.
6. Point's Top 5% was a New York City-based company that was the first to qualitatively rate web sites. Reviewers surfed the Net and critiqued links that were sent in by people in an effort to find the top 5% or best web sites.
7. PHOAKS (People Helping One Another Know Stuff) parsed Usenet groups for URLs and added the most frequently posted ones to a web site, building a collection of the most popular sites based on the number of times they appeared in messages.
8. Fab, designed by Marko Balabanovic, was another of the earlier web site recommendation systems. It allowed users to create content-based filters and compared the resulting information to find the best matches for collaborative recommendations.
9. Webdoggie (1994-1995) was an early collaborative filtering system that helped people find web sites, according to their interests.
10. Alexa Internet, started in 1996 by Brewster Kahle and Bruce Gilliat, is a paradigm of Web navigation that pools the choices of the people who have used it, or are using it, and shares those choices with others. When a surfer visits a given web site, Alexa provides a list of web sites that have been visited by others who have gone to the same page.
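The counting approach attributed to PHOAKS in item 7 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the PHOAKS implementation: the regular expression, the function name, and the decision to count a URL at most once per message are all choices made for this example.

```python
import re
from collections import Counter

# Simplified pattern (an assumption): anything from http:// or https://
# up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

def most_posted_urls(messages, top_n=3):
    """Rank URLs by the number of messages that mention them."""
    counts = Counter()
    for body in messages:
        # Strip trailing punctuation, then count each URL once per message
        # so that a single post cannot inflate a site's popularity.
        found = {url.rstrip(".,;:)>") for url in URL_PATTERN.findall(body)}
        counts.update(found)
    return counts.most_common(top_n)
```

Calling `most_posted_urls(list_of_message_bodies)` returns `(url, count)` pairs, most frequently mentioned first.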
The next step up for collaborative filtering may come from OpenCola, with its application called OpenFolders. Its importance lies in the fact that it attempts to discover what users might be interested in knowing about, based on previously disclosed preferences. Specifically, OpenFolders is a method of sharing documents by placing a folder on the user's desktop where they can put items of interest. Based upon those items, the application places items from the network in the folder that might also be of interest to the user. The program shares resources between like-minded individuals not only on the basis of common tastes but also, and perhaps more importantly, by observing what is then done with the delivered data. These reactions are compiled and used to create additional relevance filters, which help find how and where individual users agree and disagree, clarifying each user's profile and thus greatly increasing the precision of the documents delivered.
Its creator, Cory Doctorow, says in The Taste Test, his Feed magazine interview with Steven Johnson, that "the things you don't know you don't know is a much more interesting domain." For example, Sally and Jane both enjoy herbal horticulture and yoga and collect antique teapots, so they have interests in common about which they can share information. Sally is also quite knowledgeable about Feng Shui and is a member of the Feng Shui Society. Jane might also like to learn about Feng Shui, since she and Sally have so many other shared interests.
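The Sally and Jane example can be worked through in a short Python sketch. It is not OpenCola's algorithm: the Jaccard overlap measure, the 0.5 similarity cutoff, the function names, and the third profile ("tom") are assumptions introduced only to make the example concrete.

```python
def jaccard(a, b):
    """Share of interests two people have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest(user, profiles, min_similarity=0.5):
    """Suggest interests held by sufficiently similar people
    that `user` has not listed."""
    mine = profiles[user]
    scores = {}
    for other, theirs in profiles.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity >= min_similarity:
            for interest in theirs - mine:
                scores[interest] = scores.get(interest, 0.0) + similarity
    # Highest score first; alphabetical order breaks ties.
    return sorted(scores, key=lambda i: (-scores[i], i))

profiles = {
    "sally": {"herbal horticulture", "yoga", "antique teapots", "feng shui"},
    "jane": {"herbal horticulture", "yoga", "antique teapots"},
    "tom": {"motor racing", "yoga"},  # hypothetical extra user
}

print(suggest("jane", profiles))
```

Jane shares three of four interests with Sally, so Sally's remaining interest is suggested to her, while Tom's single overlap falls below the cutoff and contributes nothing.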