Deima Elnatour

Sunday, August 20, 2006

Live at KDD 2006!

The 12th International Conference on Knowledge Discovery and Data Mining (KDD 2006) is taking place in Philadelphia from August 20th to 23rd. My advisor, Tony (Xiaohua) Hu, is the chair of the local arrangements committee, so I am part of the student volunteer group, which is made up of PhD students from UPenn, Temple and Drexel. We have about 750 attendees in total, which is a good size for a specialized conference. My own attendance will be light this year, but now that I have the proceedings book I can look for interesting papers and discuss them with colleagues. Today I spent the day in a workshop on knowledge discovery on the web. There were some very interesting papers that I will discuss later on my blog.

Friday, August 18, 2006

Hooked on Sudoku!


Sudoku is a really cool game and I am totally into it now. I have to play it at least once every day, and I am now in the habit of starting a puzzle and solving it before bed every night. I did not know much about it a few months ago, but some friends at school got me started and now I cannot stop. It is exciting and pushes your mind to come up with new strategies for solving the puzzle. Anyhow, I would like to build a Java program that people could invoke to get the next step when they are stuck. I do get stuck sometimes! I will upload the program when it is ready. Schools everywhere in the US use Sudoku as part of the science curriculum to stimulate thinking. I had a good time playing it with my 12-year-old nephew.

I highly recommend giving it a try if you have not done so already. Have fun :)

YouTube is Now Promoting Music Video Clips




YouTube serves videos to 60% of the Internet users who are interested in videos. Now they are in talks with all the major television networks and music artists to serve short music clips as promotional items. Here is a video clip of what CEO and co-founder Chad Hurley had to say about this.

Thursday, August 17, 2006

Google Chinese Style!

Google and China.com have joined forces through a partnership agreement. This partnership will allow the two companies to leverage resources across advertising, search technologies, branding and content. China.com is one of the portals most visited by Chinese professionals. "China.com firmly believes that this partnership between the world's premier search company and one of China's leading portals with over 5 million daily users is a perfect fit," said Xiaowei Chen, Executive Director and CFO of China.com.
Click here to access the full story.

Wednesday, August 09, 2006

When Sharing is Bad!

AOL released the search query data of 500,000 customers from March to May 2006 without the customers’ consent. The purpose was to provide this real data to the Information Retrieval research community for examination and discovery purposes. Screen names were not released; each was replaced by a unique ID. As you can imagine, many people objected to this and considered it a breach of privacy, forcing AOL to take down the data file, which they did. Get more info at digg, where 2,512 people have responded to this story so far. However, some believe that the data is currently in the hands of about 1,000 people. This means that the data cannot be reclaimed and will be further disseminated. This is the magical power of the web!

Tuesday, August 08, 2006

The Future of Indexing

Document indexing is a key to good search. The better the index is, the more relevant the search results are likely to be. However, indices are traditionally viewed as a static vocabulary structure that represents a document collection. Indexing used to involve humans, which did not work and had many limitations for obvious reasons. With massive amounts of data finding their way to the web, the need for automatic indexing became pressing. So nowadays, most indices are generated automatically by machines. However, the IR community still views the indices of a collection as a static set. I believe that indices are properties that evolve and grow over time due to the social construct of collection usage. If you are not sure what I mean, then do this: go to Google, type "miserable failure" and see what comes up. Then ask yourself how this actually happened.

I believe that indices must be obtained in real time and that they need to be built with a dynamic notion that allows them to grow and change over time. This is indeed the reason why my research is focused on discovering human indices, or what you know as tags, which is also called folksonomy (taxonomy by folks or people). I believe that this new construct is the one foundation for quality searches in the future.
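To make the idea of an index that grows over time concrete, here is a minimal sketch in Python. It is only an illustration, not how any real search engine works: the class `DynamicIndex` and its methods are hypothetical names, and the index is a plain in-memory inverted index. The point is that tags contributed by people after indexing change what a query returns.

```python
from collections import defaultdict


class DynamicIndex:
    """Toy inverted index that keeps accepting tags after indexing (hypothetical)."""

    def __init__(self):
        # term -> set of document ids that the term points to
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        """Index the words found in the document itself (the 'static' part)."""
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def add_tag(self, doc_id, tag):
        """Attach a user-supplied tag later on (the part that evolves)."""
        for term in tag.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that match every query word."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = DynamicIndex()
index.add_document("d1", "official biography page")
before = index.search("miserable failure")   # nothing matches yet
index.add_tag("d1", "miserable failure")     # people start labeling the page
after = index.search("miserable failure")    # now the page is returned
```

The page's own text never changes in this sketch; only the labels people attach to it do, and that is enough to change the search result.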

Topical Relevance – Is It a Good Idea?

Topical relevance has increasingly become the focus of researchers over the past 5 years. The idea of topical relevance is quite simple. A search engine would try to match its results to the topic of the query, thus returning better (more relevant) results. Topical relevance is one way to try to capture the context of a search request.

For example, if I am looking for info about "mouse", this could be taken as mouse the animal or mouse the computer peripheral. So in this case the search engine would try to use the other query words to detect the topic (animals vs. computers) and then render results in that domain. If the query is too short and does not provide hints on what the topic is, we could provide a shortcut menu as navigational hints. But let’s assume that this is not the case.
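Here is a minimal sketch of that topic detection step in Python. The hint vocabularies in `TOPIC_HINTS` and the function `detect_topic` are made up for illustration; a real system would learn its topics from data rather than use a hand-written word list. When the query gives no usable hint, the function returns `None`, which is the case where a shortcut menu would be needed.

```python
# Hypothetical hint words per topic, hand-picked for this example only.
TOPIC_HINTS = {
    "animals": {"cat", "trap", "rodent", "cheese", "pet"},
    "computers": {"usb", "wireless", "keyboard", "driver", "optical"},
}


def detect_topic(query):
    """Guess the topic from the other query words, or return None if unclear."""
    words = set(query.lower().split())
    # Score each topic by how many of its hint words appear in the query.
    scores = {topic: len(words & hints) for topic, hints in TOPIC_HINTS.items()}
    best = max(scores.values())
    winners = [topic for topic, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # no hint, or a tie: the query is too ambiguous
    return winners[0]


topic_a = detect_topic("wireless mouse")  # hint word points to computers
topic_b = detect_topic("mouse trap")      # hint word points to animals
topic_c = detect_topic("mouse")           # no hint at all
```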

Topical relevance could be a promising notion if a number of assumptions are met. If such assumptions are not satisfied, search results could be far from good or relevant. One of these assumptions is that words mean what they are supposed to mean. If you type "tomatoes" in Google you will get Rotten Tomatoes as the first link. Rotten Tomatoes is an entertainment site – a well-known one. So one of the researchers was arguing that this was a bad thing and that we should find ways to not match an entertainment site to the word "tomatoes". Should we? In that case, do we not match the word "Amazon" to Amazon.com?

Basically, there is no silver bullet for deciding when topical relevance can be useful in search – I recommend using caution when you are thinking of topical relevance. Topical relevance works in well-defined domains, structured search environments and the like. Topical relevance is not always appropriate in general search settings!

SIGIR Pics

A picture is worth a million words! Click here to see SIGIR 2006 pics on Flickr, including pics of C.J. (Keith) van Rijsbergen, the 2006 Salton Award winner. Keith is considered one of the most influential scholars in information retrieval over the past 30 years.

Monday, August 07, 2006

The New Face of Advertising – Economics of the Web!

Yahoo vs. Google’s ad pricing models.

Google prices ads based on the revenue that each ad is expected to generate – this is called action-based pricing or the transaction-based model. Yahoo, on the other hand, prices ads based on the highest bid for each slot location on the sponsored-link banner. Consequently, Google has been reporting ad revenue approximately 40% higher than Yahoo’s, experts say. Yahoo announced early last week that they will move to Google’s model of ad pricing, which is expected to boost Yahoo’s future ad revenue. Well … shortly after this announcement, Yahoo made another announcement delaying the usage of this new model for three months, causing their stock to drop 22%!
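To see why the two models can order the same ads differently, here is a small Python sketch. It rests on an assumption of mine: that "expected revenue" can be approximated as the bid multiplied by an estimated click-through rate. The ads, bids and rates below are invented numbers, and the function names are hypothetical.

```python
# Invented example ads: bid is dollars per click, ctr is the estimated
# click-through rate (assumption: expected revenue = bid * ctr).
ads = [
    {"name": "A", "bid": 2.00, "ctr": 0.01},
    {"name": "B", "bid": 1.00, "ctr": 0.05},
    {"name": "C", "bid": 1.50, "ctr": 0.02},
]


def rank_by_bid(ads):
    """Highest bid gets the top slot."""
    ordered = sorted(ads, key=lambda ad: ad["bid"], reverse=True)
    return [ad["name"] for ad in ordered]


def rank_by_expected_revenue(ads):
    """Highest bid * ctr gets the top slot."""
    ordered = sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)
    return [ad["name"] for ad in ordered]


bid_order = rank_by_bid(ads)                   # ['A', 'C', 'B']
revenue_order = rank_by_expected_revenue(ads)  # ['B', 'C', 'A']
```

In this made-up case the highest bidder ends up last under the revenue-based ordering, because few people are expected to click on its ad.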

Yahoo researchers are not sure how their users will react to this and are worried that their existing advertisers will go to Google. They are rethinking what to do. I guess it will take some more studies and analysis to figure out Yahoo’s next steps.

Future of Flight and Personalized Search

Boeing sponsored a reception dinner at the Future of Flight Aviation Center in Everett, which is about 35 miles north of the University of Washington district. The event drew a nice-sized crowd and we were surrounded by Boeing planes – next generation of course. I am not sure if you knew that Boeing also makes roller coasters. I was hoping to see something about the future of thrill.

In any event, personalization became the topic of a small group discussion during the reception when someone walked up to me, pointed to a colleague from Carnegie Mellon University and praised the work he has done on personalization of search. So I started inquiring about the work, and here is what I found: researchers are focused on personalizing search and trying to find a way to provide relevant search results to each user in an automated fashion, without having to maintain user profiles! So the current ideas are focused on automated desktop indexing. I mean indexing of email, docs, pictures, visited online sites and so on. Google Desktop does this today, and this free product has been out for at least a couple of years.

But how this local index would be used is where most of the debate is. You can imagine that we need to create a pretty complex multi-dimensional search algorithm that makes good use of the local or “client” index. Some believe that the search results need to be similar or relevant to what you find in the user’s index, while others believe that in some cases we need to do the opposite. I say you can take a guess, but there is no way to tell right from wrong, so you simply do not know what to do with that index.

For example, suppose someone is in the process of building a custom home and has been searching for tile, cabinets, wood and so on. The user has probably visited builder sites and online forums, and has emails about things related to this new home. Let’s say the user one day types “Windows” in the search box. Would we provide house windows, providers, prices and so on? Or would we provide Windows the OS? It depends: you can assume that the user wants windows for the new house, or that the user is having a Windows problem. We really do not know what the user wants. We need to be careful not to hold users captive to their own index.

To solve this problem, researchers are thinking of providing clusters of topics in this case – more like navigational items that will allow the user to decide where he/she wants to go.

This still does not address problems like the size of the index, machines with multiple users, or users with multiple machines. A good use of the user’s local index would be to eliminate duplicates and ensure continuity of search results. Think about it and let me know how you can utilize this index – most importantly, is this the best way to achieve personalized search?
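For what it is worth, here is how I picture the topic-cluster idea for the “Windows” example, as a rough Python sketch. Everything in it is hypothetical: the result tuples, the topic labels, the keywords and the function `cluster_results` are invented for illustration. The local index is used only to decide which cluster is shown first, and no cluster is ever hidden, so the user is not held captive.

```python
from collections import defaultdict


def cluster_results(results, local_index_terms):
    """Group results by topic and order clusters by overlap with the local index.

    results: list of (title, topic, keywords) tuples (a made-up format).
    local_index_terms: set of words found in the user's desktop index.
    """
    titles = defaultdict(list)     # topic -> result titles
    keywords = defaultdict(set)    # topic -> all keywords seen for that topic
    for title, topic, words in results:
        titles[topic].append(title)
        keywords[topic] |= set(words)

    # Topics sharing more words with the local index come first;
    # every topic is still returned.
    ordered = sorted(
        titles,
        key=lambda topic: len(keywords[topic] & local_index_terms),
        reverse=True,
    )
    return [(topic, titles[topic]) for topic in ordered]


results = [
    ("Windows XP troubleshooting", "software", {"os", "driver", "update"}),
    ("Double-pane window prices", "home", {"glass", "builder", "price"}),
    ("Window installers near you", "home", {"builder", "install"}),
]
local_index = {"tile", "cabinets", "builder", "wood"}

clusters = cluster_results(results, local_index)
```

With this local index the "home" cluster is listed first, but the "software" cluster stays one click away.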

Sunday, August 06, 2006

The New Norm

The web has become the new norm in the American way of life. Those who do not go online represent an ever-shrinking population. Studies show that 77 million users in the US surf the web daily. It is fascinating to see how the paradigm of the search box has worked its way into the public consciousness – especially when we compare that to the ever-flashing clock on DVD and VCR machines (users could not easily figure out how to set the time despite the simplicity of such a task).

The Pew Internet & American Life Project studies the social impact of the Internet. Here is a recent report that provides important data.

Why is this important? This emphasizes the importance of online presence. Whether you are a large organization or a small specialized business – online presence is the name of the game for survival and profitability.

Live at SIGIR 2006

I am now in Seattle, Washington, attending the 29th annual SIGIR conference. SIGIR is an important international conference that connects industry practitioners with academic researchers around topics in the broad field of information retrieval, or search. The conference is taking place on the University of Washington campus from August 6th to 11th. I will be spending the week in collaborations and stimulating discussions about search. My goal is to detect the next big thing in search and learn about what is in the innovation pipeline of the field of information retrieval and representation.

The good news is that everyone is here: Yahoo, Google, Microsoft, Ask.com, IBM Research, Amazon.com, AOL, Boeing and others. In addition, most universities that care about IR are also well represented at this conference, for obvious reasons.

Stay tuned to hear more about what I find!