Global Sources
EE Times-India
 
IIIT proves its mettle in automatic summarization

Posted: 09 Jun 2006

The Search and Information Extraction Lab at the International Institute of Information Technology (IIIT), Hyderabad, topped this year's Document Understanding Conferences (DUC) contest, widely regarded as the most prestigious competition in automatic text summarisation.

About 35 teams from around the world took part in this year's contest, and the IIIT team was the only one from India. The team consisted of J. Jagadeesh, an MS research student; Prasad Pingali, a Ph.D. student; and Dr. Vasudeva Varma, a faculty member.

DUC holds this summarisation contest every year, and leading universities and research organisations take part. The contest is sponsored by the Advanced Research and Development Activity (ARDA), and the conference series is run by the National Institute of Standards and Technology (NIST). The aim is to promote advances in summarisation techniques and to let researchers take part in large-scale experiments. Each year, NIST defines the tasks and evaluation criteria, and teams are invited to participate.

This year's task was to automatically generate a single summary from a set of related documents, addressing a set of questions that express a user's information need.

Dr. Vasudeva Varma explained to EE Times India how IIIT's summarisation system works. "Our system automatically builds a semantic model by analysing the way various words occur in documents. For instance, the model makes it possible to identify related terms such as 'horse' and 'animal.' Such a semantic model is then used to rank sentences from documents, and pick the top ranking sentences to form a summary. Then, there could be further steps such as smoothing the summary and eliminating redundancy," he said.
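The pipeline Dr. Varma describes (learn term relatedness from how words co-occur, rank sentences against the user's question, keep the top ones, drop redundant ones) can be illustrated with a small sketch. IIIT's actual semantic model is not described in detail here, so this example assumes a simple stand-in: the Dice coefficient of sentence-level co-occurrence as the relatedness measure, Jaccard word overlap as the redundancy test, and a hypothetical stopword list. All function names and thresholds are illustrative, not the team's code.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "for", "on"}


def tokenize(text):
    """Lower-case word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_cooccurrence_model(sentences):
    """Count how often each term, and each pair of terms, occurs in a sentence."""
    term_freq, pair_freq = Counter(), Counter()
    for sent in sentences:
        terms = set(tokenize(sent))
        term_freq.update(terms)
        pair_freq.update(combinations(sorted(terms), 2))
    return term_freq, pair_freq


def relatedness(a, b, term_freq, pair_freq):
    """Dice coefficient of sentence co-occurrence (assumed stand-in for the semantic model)."""
    if a == b:
        return 1.0
    joint = pair_freq[(a, b) if a < b else (b, a)]
    denom = term_freq[a] + term_freq[b]
    return 2 * joint / denom if denom else 0.0


def score_sentence(sentence, query_terms, term_freq, pair_freq):
    """Sum, over query terms, of the best relatedness to any term in the sentence."""
    terms = set(tokenize(sentence))
    if not terms:
        return 0.0
    return sum(max(relatedness(q, t, term_freq, pair_freq) for t in terms)
               for q in query_terms)


def jaccard(a, b):
    """Word-set overlap between two sentences, used as a redundancy test."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def summarize(documents, query, max_words=250, redundancy_threshold=0.5):
    sentences = [s for doc in documents for s in split_sentences(doc)]
    term_freq, pair_freq = build_cooccurrence_model(sentences)
    query_terms = set(tokenize(query))
    scored = [(score_sentence(s, query_terms, term_freq, pair_freq), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep document order
    summary, length = [], 0
    for score, sent in scored:
        if score <= 0:
            break  # remaining sentences are unrelated to the query
        n_words = len(sent.split())
        if length + n_words > max_words:
            continue  # would exceed the word budget
        if any(jaccard(sent, chosen) > redundancy_threshold for chosen in summary):
            continue  # too similar to a sentence already chosen
        summary.append(sent)
        length += n_words
    return summary


docs = [
    "Horses are large animals. Many farms keep horses for work.",
    "The stock market fell sharply today. Investors sold shares.",
    "Horses are large animals. A horse is a domesticated animal.",
]
print(summarize(docs, "horses animals", max_words=10))
# ['Horses are large animals.', 'Many farms keep horses for work.']
```

The duplicate "Horses are large animals." is dropped by the redundancy check, and the finance sentences score zero. Note that "A horse is a domesticated animal." also scores zero because this sketch does no stemming, so "horse" and "horses" are unrelated terms to it; a production system would normalise word forms first.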

Summarisation systems are scored with ROUGE, an automatic evaluation technique that compares a machine-generated summary against human-written summaries of the same documents, and which is regarded as the best available automatic method for doing so.

Dr. Varma discussed how the ROUGE method evaluates automatic summarisation techniques. "A set of persons are asked to write summaries for each set of documents. The machine-generated summaries are then automatically compared using the ROUGE framework."

There are various types of measures within ROUGE depending on the way comparisons are made. "ROUGE-1, ROUGE-2, ROUGE-3 and ROUGE-4 measure how many n-grams match between machine-generated summary and human summary, with 'n' ranging from 1 to 4. An n-gram is a sequence of consecutive 'n' words glued together as a single unit. Then, there are other types of ROUGE measures such as ROUGE-SU4, ROUGE-L and ROUGE-W, which are all different ways of comparing the human summaries with the automatically generated summaries," Dr. Varma shared.
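The measures Dr. Varma lists can be sketched in a few lines. This is a simplified illustration, not the official ROUGE toolkit, which also offers options such as stemming and stopword removal and resamples over multiple reference summaries. Assumptions made here: ROUGE-N and ROUGE-SU4 use recall with clipped matches pooled over all reference summaries; ROUGE-L is LCS recall against the best-matching reference; and the "4" in SU4 is read as "at most 4 words between the two words of a skip-bigram". ROUGE-W (weighted LCS) is not shown.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-case alphanumeric tokens; no stemming or stopword removal."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(words, n):
    """Multiset of consecutive n-word sequences."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def skip_bigrams_plus_unigrams(words, max_gap=4):
    """Unigrams plus ordered word pairs with at most max_gap words between them."""
    units = Counter((w,) for w in words)
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_gap + 2, len(words))):
            units[(words[i], words[j])] += 1
    return units


def pooled_recall(cand_units, ref_units_list):
    """Clipped matches summed over all references, divided by total reference units."""
    matched = sum(sum((cand_units & ref).values()) for ref in ref_units_list)
    total = sum(sum(ref.values()) for ref in ref_units_list)
    return matched / total if total else 0.0


def rouge_n(candidate, references, n):
    return pooled_recall(ngrams(tokens(candidate), n),
                         [ngrams(tokens(r), n) for r in references])


def rouge_su(candidate, references, max_gap=4):
    return pooled_recall(skip_bigrams_plus_unigrams(tokens(candidate), max_gap),
                         [skip_bigrams_plus_unigrams(tokens(r), max_gap) for r in references])


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, row by row)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, references):
    """LCS length over reference length, keeping the best-matching reference."""
    cand = tokens(candidate)
    scores = [lcs_length(cand, tokens(r)) / len(tokens(r)) for r in references if tokens(r)]
    return max(scores, default=0.0)


candidate = "The cat sat on the mat."
references = ["The cat was on the mat."]
print("ROUGE-1", round(rouge_n(candidate, references, 1), 3))      # ROUGE-1 0.833
print("ROUGE-2", round(rouge_n(candidate, references, 2), 3))      # ROUGE-2 0.6
print("ROUGE-L", round(rouge_l_recall(candidate, references), 3))  # ROUGE-L 0.833
print("ROUGE-SU4", round(rouge_su(candidate, references), 3))      # ROUGE-SU4 0.714
```

Counter intersection (`&`) keeps the minimum count of each n-gram, so a word repeated in the machine summary cannot earn more credit than it has in the reference.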

The IIIT system ranked first on the ROUGE metrics, finishing well ahead of the runner-up. This was a marked improvement over its performance the previous year, when it first entered the contest and placed 3rd, 4th and 8th in ROUGE-1, ROUGE-2 and ROUGE-SU4, respectively.

"This year, we used a technique called pseudo-relevance feedback, which is basically a mining technique using web search engines," Dr. Varma revealed.
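Pseudo-relevance feedback generally means treating the top results of an initial search as relevant, without any human judgement, and mining them for extra terms to add to the query. The sketch below is a generic illustration under stated assumptions, not IIIT's implementation: a small local collection stands in for web search engine results, retrieval is a simple query-term count, and the stopword list and parameter names (`k`, `n_terms`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list for the example.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "at", "that"}


def terms(text):
    """Lower-case word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def retrieve(query_terms, collection, k):
    """Rank documents by query-term occurrences and keep the top k matching ones.
    Stands in for a web search engine call (assumption)."""
    scored = [(sum(terms(doc).count(q) for q in query_terms), doc) for doc in collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def expand_query(query, collection, k=2, n_terms=3):
    """Add the most frequent new terms from the top-k results to the query."""
    query_terms = terms(query)
    feedback_docs = retrieve(query_terms, collection, k)
    counts = Counter(t for doc in feedback_docs for t in terms(doc) if t not in query_terms)
    return query_terms + [t for t, _ in counts.most_common(n_terms)]


collection = [
    "The jaguar is a big cat that stalks prey in the rainforest.",
    "A jaguar cat hunts prey at night; the cat is silent.",
    "Stock markets fell as investors sold shares.",
]
print(expand_query("jaguar", collection, k=2, n_terms=2))
# ['jaguar', 'cat', 'prey']
```

The expanded query can then feed a sentence ranker in place of the user's original wording, so that sentences mentioning closely associated terms are also rewarded.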

IIIT's winning team now plans to concentrate on improving the linguistic quality of its summaries, which is evaluated manually, and on bringing the system's performance as close as possible to that of human summarisers.

- Prachi Garg
EE Times India



