Note: Submit requests for corpus access by Fri April 30, 2010 (midnight PDT)
The aims of the challenge are to encourage researchers and practitioners to build and demonstrate information access systems satisfying at least one of the following:
- Not only deliver relevant documents, but provide facilities for making meaning with those documents.
- Increase user responsibility as well as control; that is, the systems require and reward human effort.
- Offer the flexibility to adapt to user knowledge / sophistication / information need.
- Are engaging and fun to use.
There is no need to build a system for the challenge. Participants may also report on the use of existing systems. All entries must use the challenge data set. Interface sketches, mockups, wireframes, etc. are not permitted.
The challenge data set is The New York Times (NYT) annotated corpus. The corpus is a collection of over 1.8 million articles, published by The NYT between January 1, 1987 and July 19, 2007 and annotated with rich metadata. Use of The NYT corpus for the HCIR challenge is appealing for several reasons:
- The content is broadly accessible without any special domain expertise.
- The annotations are detailed enough to support rich interactive approaches without requiring sophisticated information extraction techniques.
- The size of the collection is large enough to be interesting without being so large as to cause scale challenges.
The focus of the challenge is on the development or use of interactive techniques, not on data wrangling. As such, we will index the collection and provide a baseline retrieval system. The NYT corpus is available to challenge participants free of charge from the Linguistic Data Consortium (LDC) upon receipt of signed forms. Free access to the corpus for challenge participants is being generously supported by The NYT. We are very grateful to the LDC for covering the cost of shipping The NYT corpus to challenge participants.
A baseline search system for The NYT corpus can be built using Solr. Solr scripts for building a searchable index of The NYT corpus are available here.
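As a rough illustration of how an interactive front end might talk to such a baseline index, the sketch below builds a Solr select URL for a keyword query with facet counts on one field. The host, the core name `nyt`, and the field name `year` are assumptions made for this example only; the actual names depend on how the index was built with the provided scripts.

```python
from urllib.parse import urlencode

# Assumed values for illustration only: the real host, core name, and
# field names depend on how the NYT index was actually built.
SOLR_SELECT_URL = "http://localhost:8983/solr/nyt/select"


def build_query_url(text, facet_field="year", rows=10, base_url=SOLR_SELECT_URL):
    """Return a Solr select URL for a keyword query with one facet field."""
    params = [
        ("q", text),                    # user's keyword query
        ("rows", rows),                 # number of documents to return
        ("wt", "json"),                 # ask Solr for a JSON response
        ("facet", "true"),              # enable faceting
        ("facet.field", facet_field),   # hypothetical metadata field to count by
        ("facet.mincount", 1),          # omit facet values with zero matches
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Prints the URL; fetching it requires a running Solr instance.
    print(build_query_url("subway crime"))
```

Facet counts over a date-like field are one way a system could support the "rough chart over time" style of task, though nothing here is prescribed by the challenge.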
A pilot evaluation of the system is optional. To help compare systems, if you perform a pilot evaluation, you are requested to use some or all task scenarios from a set of historical exploration tasks based on the NYT corpus. The task descriptions are:
- Learn about a topic that has a long history:
  - Draw a rough chart of how subway crime in New York has varied over the past two decades.
  - Draw a rough chart of how the price of a slice of pizza in New York has varied over the past two decades.
- Understand the competing perspectives on a controversial topic:
  - Enumerate the main arguments that have been made for and against rent control in New York.
  - Enumerate the main arguments that have been made for and against the impeachment of U.S. president Bill Clinton.
- Answer a question that requires looking at more than one document:
  - Enumerate the major venues in New York City that offer free concerts.
  - Determine if a member of the Communist party has ever held a legislative or executive post in New York State.
Those participating in the HCIR challenge will have an opportunity to write a four-page challenge report describing their work. Reports should be submitted through the EasyChair site by Mon July 12, 2010 (midnight PDT). Papers can report just on the system that was developed/used or can report on the system plus the findings of a pilot evaluation using the task scenarios defined above. All challenge papers will be included in the proceedings. A shortlist of challenge papers will be chosen for presentation based on the HCIR evaluation criteria defined below.
Entries will be judged on the following criteria that are important for HCIR systems:
- Effectiveness: Is a user able to complete the task?
- Efficiency: How efficiently does the user complete the task?
- Control: To what extent does the system give the user control over the information seeking process?
- Transparency: Does the user understand what the system is doing?
- Guidance: How much direction does the system provide to help the user refine their search strategy or reach their search goal?
- Fun: Is the system engaging and fun to use?
Entries will be judged on these criteria by an expert panel, as well as voted on more holistically by the general audience of workshop attendees. The panel will apply the evaluation criteria and select a shortlist for presentation at the workshop. Panelists may also consider the findings of any pilot evaluations in their decisions. A people's choice award will be presented at the workshop to the shortlisted system or interactive technique most favored by workshop attendees.
Obtaining The NYT Corpus
Information on obtaining the corpus is available from Daniel Tunkelang. Note that requests for corpus access should be submitted by Fri April 30, 2010 (midnight PDT). We will also create a discussion group for participants.
The organizers greatly appreciate the help of Evan Sandhaus and Tommy Chheng in volunteering their efforts to make the data available to HCIR Challenge participants.