
John Conroy

Also published as: John M. Conroy


2023

Multi-domain Summarization from Leaderboards to Practice: Re-examining Automatic and Human Evaluation
David Demeter | Oshin Agarwal | Simon Ben Igeri | Marko Sterbentz | Neil Molino | John Conroy | Ani Nenkova
Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

Existing literature offers little guidance on how to build the best possible multi-domain summarization model from existing components. We present an extensive evaluation of popular pre-trained models on a wide range of datasets to inform the selection of both the model and the training data for robust summarization across several domains. We find that fine-tuned BART performs better than T5 and PEGASUS, on both in-domain and out-of-domain data, regardless of the dataset used for fine-tuning. While BART has the best performance overall, its performance varies considerably across domains. A multi-domain summarizer that works well for all domains can be built by simply fine-tuning on diverse domains; it even outperforms an in-domain summarizer while using fewer total training examples. While the success of such a multi-domain summarization model is clear from automatic evaluation, our human evaluation reveals variations that none of the automatic evaluation metrics capture and that are therefore not reflected in standard leaderboards. Furthermore, we find that conducting reliable human evaluation is itself complex: even experienced summarization researchers are inconsistent with one another in their assessment of summary quality, and with themselves when re-annotating the same summary. The findings of our study are two-fold. First, BART fine-tuned on heterogeneous domains is a strong multi-domain summarizer for practical purposes. At the same time, we need to re-examine not just automatic evaluation metrics but also human evaluation methods to responsibly measure progress in summarization.

2019

RANLP 2019 Multilingual Headline Generation Task Overview
Marina Litvak | John M. Conroy | Peter A. Rankel
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources

The objective of the 2019 RANLP Multilingual Headline Generation (HG) Task is to explore challenges highlighted by current state-of-the-art approaches to creating informative headlines for news articles: non-descriptive headlines, out-of-domain training data, generating headlines from long documents that are not well represented by the head heuristic, and dealing with a multilingual domain. The task makes available a large set of training data for headline generation and provides evaluation methods for the task. Our data sets are drawn from Wikinews as well as Wikipedia. Participants were required to generate headlines for at least three languages, which were evaluated via automatic methods. A key aspect of the task is multilinguality: it measures the performance of multilingual headline generation systems on Wikipedia and Wikinews articles in multiple languages. The objective is to assess automatic headline generation techniques on text documents covering a diverse range of languages and topics outside the news domain.

2017

Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres
George Giannakopoulos | Elena Lloret | John M. Conroy | Josef Steinberger | Marina Litvak | Peter Rankel | Benoit Favre
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

MultiLing 2017 Overview
George Giannakopoulos | John Conroy | Jeff Kubina | Peter A. Rankel | Elena Lloret | Josef Steinberger | Marina Litvak | Benoit Favre
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

In this brief report we present an overview of the MultiLing 2017 effort and workshop, held within EACL 2017. MultiLing is a community-driven initiative that pushes the state of the art in automatic summarization by providing data sets and fostering further research and development of summarization systems. This year the scope of the workshop was widened, bringing together researchers who work on summarization across sources, languages, and genres. We summarize the main tasks planned and implemented this year and the contributions received, and provide insights on next steps.

2015

Vector Space Models for Scientific Document Summarization
John Conroy | Sashka Davis
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations
George Giannakopoulos | Jeff Kubina | John Conroy | Josef Steinberger | Benoit Favre | Mijail Kabadjov | Udo Kruschwitz | Massimo Poesio
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization
Kai Hong | John Conroy | Benoit Favre | Alex Kulesza | Hui Lin | Ani Nenkova
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In the period since 2004, many novel, sophisticated approaches to generic multi-document summarization have been developed, and intuitively simple approaches have also been shown to perform unexpectedly well for the task. Yet it is practically impossible to compare the existing approaches directly, because systems have been evaluated on different datasets, with different evaluation measures, against different sets of comparison systems. Here we present a corpus of summaries produced by several state-of-the-art extractive summarization systems and by popular baseline systems. The inputs come from the 2004 DUC evaluation, the latest year in which generic summarization was addressed in a shared task. We use the same settings for ROUGE automatic evaluation to compare the systems directly and analyze the statistical significance of the differences in performance. We show that in terms of average scores the state-of-the-art systems appear similar, but that in fact they produce very different summaries. Our corpus will facilitate future research on generic summarization and motivate the development of more sensitive evaluation measures and of approaches to system combination in summarization.

2013

A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
Peter A. Rankel | John M. Conroy | Hoa Trang Dang | Ani Nenkova
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

ACL 2013 MultiLing Pilot Overview
Jeff Kubina | John Conroy | Judith Schlesinger
Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization

Multilingual Summarization: Dimensionality Reduction and a Step Towards Optimal Term Coverage
John Conroy | Sashka T. Davis | Jeff Kubina | Yi-Kai Liu | Dianne P. O’Leary | Judith D. Schlesinger
Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization

2012

Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
Karolina Owczarzak | Peter A. Rankel | Hoa Trang Dang | John M. Conroy
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization
John M. Conroy | Hoa Trang Dang | Ani Nenkova | Karolina Owczarzak
Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

An Assessment of the Accuracy of Automatic Evaluation in Summarization
Karolina Owczarzak | John M. Conroy | Hoa Trang Dang | Ani Nenkova
Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

2011

Squibs: Nouveau-ROUGE: A Novelty Metric for Update Summarization
John M. Conroy | Judith D. Schlesinger | Dianne P. O’Leary
Computational Linguistics, Volume 37, Issue 1 - March 2011

Ranking Human and Machine Summarization Systems
Peter Rankel | John Conroy | Eric Slud | Dianne O’Leary
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2008

Mind the Gap: Dangers of Divorcing Evaluations of Summary Content from Linguistic Quality
John M. Conroy | Hoa Trang Dang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Measuring Variability in Sentence Ordering for News Summarization
Nitin Madnani | Rebecca Passonneau | Necip Fazil Ayan | John Conroy | Bonnie Dorr | Judith Klavans | Dianne O’Leary | Judith Schlesinger
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2006

Topic-Focused Multi-Document Summarization Using an Approximate Oracle Score
John M. Conroy | Judith D. Schlesinger | Dianne P. O’Leary
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2003

QCS: A Tool for Querying, Clustering, and Summarizing Documents
Daniel M. Dunlavy | John Conroy | Dianne P. O’Leary
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations