Towards Accurate Quote-Aware Summarization of News using Generative AI
Attribution is a fundamental principle of journalism. Correctly quoting a news source without distorting the sense of what was said, or, worse, adding information the journalist merely inferred, is an essential skill for any reporter. Even just recognizing a quote can be a struggle for algorithms. Large language models (LLMs) introduce a new challenge: because they work by predicting the most likely next word in a sequence based on the preceding text, they can fabricate quotes or attribute accurate quotes to the wrong sources. Errors of this kind could erode trust in the media and must be avoided.

At IPPEN.MEDIA, we have been experimenting with numerous use cases for LLMs, including suggesting headline and lead variations and summarizing or rewriting an article for different audiences. When it comes to handling quotes, generated summaries and text variations can easily go wrong. During our first round of testing, we discovered that ChatGPT tends to rewrite quotations even when explicitly instructed not to. When we asked it to summarize an article while keeping all quotes unchanged, adding specific constraints to the prompt, ChatGPT simply ignored those constraints and rewrote the quotes. The prompts did occasionally work as expected, with every quote reproduced verbatim, but most of the time they did not.
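One way to catch such failures automatically, sketched here as a minimal illustration rather than a description of our actual pipeline (all function names and the example texts are invented for this post): extract the quoted passages from the source article with a regular expression and check that each one appears verbatim in the generated summary.

```python
import re

# Match text inside straight ("...") or typographic ("...") double quotes.
# Simplified on purpose: German-style „..." quotes, nested quotes, and very
# short quoted fragments (under 10 characters) are not handled here.
QUOTE_RE = re.compile(r'"([^"]{10,})"|\u201c([^\u201d]{10,})\u201d')

def extract_quotes(text):
    """Return all quoted passages of at least 10 characters."""
    return [a or b for a, b in QUOTE_RE.findall(text)]

def missing_quotes(article, summary):
    """Return source quotes that do not appear verbatim in the summary."""
    return [q for q in extract_quotes(article) if q not in summary]

# Invented example texts to show the check in action.
article = ('The minister said, "We will not raise taxes this year." '
           'Critics called the plan "fiscally reckless."')
summary = ('The minister promised: "We will not raise taxes this year." '
           'Critics were unconvinced.')

print(missing_quotes(article, summary))  # → ['fiscally reckless.']
```

A check like this cannot tell whether a surviving quote is attributed to the right person, but it flags the most blatant failure mode, quotes that were dropped or rewritten, before a summary reaches an editor.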