An assessment of Zero-Shot Open Book Question Answering using Large Language Models

In recent years, the popularity of publicly available generative Artificial Intelligence (AI) systems has exploded. A notable development in this area is the unprecedented adoption of chatbots, which have proven their value in both personal and commercial use cases. The main driver behind this growing interest in Natural Language Processing (NLP) systems is the development of Large Language Models (LLMs). These models have demonstrated impressive capabilities in generalization, reasoning, problem solving, abstract thinking, and understanding complex concepts.

Social and economic implications of generative AI

The rise of generative AI systems has profound social and economic implications for everyday life and organizational practice. It is estimated that approximately 19% of jobs in the United States have about 50% of their tasks exposed to LLMs when current model capabilities and anticipated LLM-powered software are taken into account. This growing demand for AI solutions underscores the need for robust, accurate systems that can handle complex questions while providing precise and explainable answers.

Evolution of Question Answering systems

Historically, Question Answering systems were primarily extractive in nature, focused on identifying specific information from supplied texts. In recent years, however, the NLP domain has shifted toward more abstract Natural Language Generation (NLG) approaches. NLG refers to the generation of natural language to achieve specific communicative goals. This output can range from a short answer to extensive explanations covering multiple sentences or even pages.
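The extractive paradigm described above can be illustrated with a deliberately minimal sketch: the system may only return a span copied verbatim from the supplied context, whereas a generative (NLG) system composes new text freely. The overlap-based selector and the example context below are illustrative inventions, not part of the study.

```python
# Toy illustration of extractive QA: the "answer" must be a span
# (here: a whole sentence) taken verbatim from the supplied context.
# A generative system would instead produce newly composed text.

def extractive_answer(question: str, context: str) -> str:
    """Return the context sentence sharing the most content words with the question."""
    stopwords = {"what", "is", "the", "a", "an", "of", "in", "does", "do", "how"}
    q_words = {w.strip("?.,").lower() for w in question.split()} - stopwords
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    # Pick the sentence with the largest overlap with the question's content words.
    return max(
        sentences,
        key=lambda s: len(q_words & {w.strip(",").lower() for w in s.split()}),
    )

context = (
    "Containers package an application with its dependencies. "
    "Kubernetes orchestrates containers across a cluster. "
    "Virtual machines emulate entire operating systems."
)
print(extractive_answer("What orchestrates containers?", context))
# → Kubernetes orchestrates containers across a cluster
```

Note that the returned answer is, by construction, always a substring of the context; a generative model is free to paraphrase, which is precisely what makes its evaluation harder.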

Developments in model architectures

This paradigm shift has been made possible by advances in model architectures, such as the Text-To-Text Transfer Transformer (T5) and the Generative Pre-Trained Transformer (GPT). These models have opened up new possibilities for exploring the capabilities of language models within a wide range of applications.

Evaluation of state-of-the-art language models

This study examined the performance of state-of-the-art language models in a Zero-Shot Open Book QA setting, with a specific focus on technical topics related to cloud computing and containerization. Both extractive and generative approaches were evaluated using a uniform methodology. The goal was to gain insight into the strengths and weaknesses of different models and to assess their suitability for answering technical questions.
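A uniform methodology for comparing extractive and generative answers typically scores both against the same references. The study does not spell out its metrics here; the sketch below assumes the standard SQuAD-style pair, Exact Match and token-level F1, purely as an illustration.

```python
# Sketch of uniform QA scoring with SQuAD-style metrics (an assumption:
# the study's exact metrics are not named in this summary).
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, drop articles, split into tokens."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return [t for t in text.split() if t not in {"a", "an", "the"}]

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Docker daemon", "Docker daemon"))      # → 1.0
print(token_f1("a Docker container image", "Docker image"))   # → 0.8
```

Token-level F1 is the usual choice for generative answers because it rewards paraphrases that share content words with the reference, while Exact Match favors extractive systems that copy spans verbatim.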

Although some LLMs can be used for a wide range of tasks outside of QA, these applications fell outside the scope of this study. In addition, due to resource and practical limitations, the focus was placed on models with fewer than 10 billion parameters. Multi-modal Machine Reading Comprehension (MRC) approaches, such as tri-encoder retrievers and table readers, were also excluded.

Research contribution

The research results provide valuable insights into model performance within technical QA contexts and contribute to a better understanding of Zero-Shot QA architectures. The accompanying code is available on GitHub under the CC-BY 4.0 license.
