Defended publicly on December 18, 2019 before the jury members:
- Mme. Violaine PRINCE. Professor, LIRMM, Montpellier 2 (Rapporteur)
- M. Eric GAUSSIER. Professor, LIG, Grenoble (Rapporteur)
- Mme. Fatiha SADAT. Professor, GDAC, Montréal (Examiner)
- M. Laurent BESACIER. Professor, LIG, Grenoble (Examiner)
- M. Kamel SMAILI. Professor, LORIA, CNRS-Lorraine-Inria (Examiner)
- M. Alfonso MEDINA URREA. Researcher, CELL, COLMEX (Examiner)
- M. Juan-Manuel TORRES-MORENO. Associate Professor HDR, LIA, Avignon (Advisor)
- M. Eric SANJUAN. Associate Professor HDR, LIA, Avignon (Co-Advisor)
As multimedia sources have become massively available online, helping users to understand the large amount of information they generate has become a major issue. One way to approach this is by summarizing multimedia content, thus generating abridged and informative versions of the original sources. This PhD thesis addresses the subject of text and audio-based multimedia summarization in a multilingual context. It has been conducted within the framework of the Access Multilingual Information opinionS (AMIS) CHISTERA-ANR project, whose main objective is to make information easy to understand for everybody.
Text-based multimedia summarization uses transcripts to produce summaries that may be presented either as text or in their original format. The transcription of multimedia sources can be done manually or automatically by an Automatic Speech Recognition (ASR) system. The transcripts produced using either method differ from well-formed written language because their source is mostly spoken language. In addition, ASR transcripts lack syntactic information. For example, capital letters and punctuation marks are unavailable, so sentence boundaries are not marked. To deal with this problem, we propose a Sentence Boundary Detection (SBD) method for ASR transcripts which uses textual features to separate the Semantic Units (SUs) within an automatic transcript in a multilingual context. Our approach, based on subword-level information vectors and Convolutional Neural Networks (CNNs), outperforms baselines by correctly identifying SU borders for French, English and Modern Standard Arabic (MSA). We then study the impact of cross-domain datasets over MSA, showing that fine-tuning a model originally trained on a large out-of-domain dataset with a small in-domain dataset normally improves SBD performance. Finally, we extend ARTEX, a state-of-the-art extractive text summarization method, to process documents in MSA by adapting its preprocessing modules. The resulting summaries can be presented as plain text or in their original multimedia format by aligning the selected SUs.
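To make the SBD task described above more concrete, the following minimal sketch shows one common way to frame it as per-word binary classification over a context window. It is an illustration under stated assumptions, not the thesis implementation: the function name `make_sbd_samples`, the window size, the padding token and the punctuation set are all hypothetical, and the subword-level vectors and the CNN classifier themselves are not shown.

```python
from typing import List, Tuple

# Assumption: these marks are treated as SU boundaries in the training text.
BOUNDARY_MARKS = {".", "?", "!"}


def make_sbd_samples(
    tokens: List[str], half_window: int = 2, pad: str = "<pad>"
) -> List[Tuple[List[str], int]]:
    """Build (context window, label) pairs from punctuated training text.

    Punctuation is removed and words are lowercased to mimic ASR output.
    The label is 1 when a boundary follows the centre word, else 0.
    """
    words: List[str] = []
    labels: List[int] = []
    for tok in tokens:
        if tok in BOUNDARY_MARKS:
            if labels:  # mark the previous word as ending a unit
                labels[-1] = 1
        else:
            words.append(tok.lower())
            labels.append(0)

    # Pad both ends so every word has a full window around it.
    padded = [pad] * half_window + words + [pad] * half_window
    size = 2 * half_window + 1
    return [(padded[i:i + size], label) for i, label in enumerate(labels)]


samples = make_sbd_samples("Hello world . How are you ?".split())
for window, label in samples:
    print(window, label)
```

Each window would then be converted to vectors and passed to a classifier; at inference time the same windows are built directly from the unpunctuated ASR transcript.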
Concerning audio-based summarization, we introduce an extractive method which represents the informativeness of the source based on its audio features to select the segments that are most pertinent to the summary. During the training phase, our method uses available transcripts of the audio documents to create an informativeness model which maps a set of audio features to a divergence value. Subsequently, when summarizing new audio documents, transcripts are no longer needed. Results over a multi-evaluator scheme show that our approach provides understandable and informative summaries.
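As a hedged illustration of how a transcript-derived divergence value could serve as a training target, the sketch below scores each segment by the Jensen-Shannon divergence between its word distribution and that of the whole transcript. The choice of Jensen-Shannon divergence, the unigram representation and all function names are assumptions made for this example; the text above does not specify them. Segments are assumed to be non-empty.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def distribution(words: List[str], vocab: Set[str]) -> Dict[str, float]:
    """Relative frequency of every vocabulary word in a non-empty word list."""
    counts = Counter(words)
    total = len(words)
    return {w: counts[w] / total for w in vocab}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}

    def kl(a: Dict[str, float]) -> float:
        # Terms with zero probability contribute nothing to the sum.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def segment_targets(segments: List[List[str]]) -> List[float]:
    """One divergence value per segment, measured against the full transcript."""
    document = [w for seg in segments for w in seg]
    vocab = set(document)
    p_doc = distribution(document, vocab)
    return [js_divergence(distribution(seg, vocab), p_doc) for seg in segments]


segments = [["the", "vote", "passed"], ["the", "weather", "is", "mild"]]
print(segment_targets(segments))
```

Under this assumption, a regression model would be trained to predict these values from the audio features of each segment, so that at summarization time segments can be ranked from audio alone.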
We also address evaluation measures. We have developed Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric based on multi-reference (dis)agreement, which examines whether evaluating an automatic SBD system against a single reference is enough to assess how well the system is performing. We also explore the possibility of measuring the quality of an automatic transcript based on its informativeness. In addition, we study the extent to which automatic summarization may compensate for the problems that arise during the transcription phase. Lastly, we study how text informativeness evaluation measures may be extended to passage interestingness evaluation.