An assessment of Zero-Shot Open Book Question Answering using Large Language Models

Recently, there has been an explosive increase in the popularity of publicly available generative Artificial Intelligence (AI) systems. One notable trend has been the unprecedented adoption of “chatbots,” which have proven to be highly useful for both personal and commercial use cases. The primary driver behind the popularity and attention garnered by Natural Language Processing (NLP) systems in recent years is the development of Large Language Models (LLMs). LLMs have demonstrated remarkable capabilities, including generalization, reasoning, problem-solving, abstract thinking, and comprehension of complex ideas.

The rise of these generative AI systems has caused significant disruptions in everyday life and organizational practices. Such advancements carry considerable social and economic implications, as approximately 19% of jobs in the United States have around 50% of their tasks exposed to LLMs when factoring in current model capabilities and anticipated LLM-powered software [3]. The growing demand for AI systems underscores the need for robust and accurate solutions capable of handling complex questions while providing precise and explainable answers.

Historically, Question Answering (QA) systems have primarily been extractive, focusing on identifying specific pieces of information from given texts. However, in recent years, the NLP paradigm has shifted towards the more abstract approach of Natural Language Generation (NLG). NLG refers to the process of generating natural language text to meet specific communicative goals. This output may range from a concise phrase in response to a question to extended explanations spanning multiple sentences or even pages.
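To make the distinction concrete, the snippet below sketches the extractive approach, where a reader model selects an answer span directly from a provided passage. The model checkpoint and the example passage are illustrative choices, not ones evaluated in this research.

```python
from transformers import pipeline

# A small extractive reader fine-tuned on SQuAD; it returns a span copied
# from the context rather than newly generated text.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = (
    "Kubernetes is an open-source container orchestration platform that automates "
    "the deployment, scaling, and management of containerized applications."
)
result = qa(question="What does Kubernetes automate?", context=context)

# The output includes the extracted span and a confidence score.
print(result["answer"], result["score"])
```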

This shift has been driven by advancements in model architectures, such as the Text-To-Text Transfer Transformer (T5) and the Generative Pre-Trained Transformer (GPT). These models have opened new avenues for exploring the capabilities of language models in various applications.
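As a rough illustration of the text-to-text framing popularized by T5, the sketch below packs a question and its supporting context into a single input string and lets the model generate the answer. The checkpoint and prompt wording are assumptions made for demonstration only.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load an instruction-tuned T5 checkpoint; any seq2seq model fits this pattern.
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

prompt = (
    "Answer the question using the context.\n"
    "context: Docker images are built in layers, and unchanged layers are cached "
    "so that repeated builds can reuse them.\n"
    "question: Why are repeated Docker builds faster?"
)

# Text in -> text out: the answer is generated rather than extracted.
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```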

Evaluation of the Performance of State-of-the-Art Language Models

In our research, we evaluated the performance of state-of-the-art language models in a Zero-Shot Open Domain QA setting, focusing in particular on technical topics related to cloud technology and containerization. The study assessed both extractive and generative approaches under a uniform evaluation methodology, examining the strengths and weaknesses of each model to gauge how well Zero-Shot Open Domain QA architectures answer technical questions. While some LLMs are capable of a broad range of tasks beyond QA, those tasks fell outside the scope of this research. Additionally, due to practical and resource limitations, the study was restricted to models with fewer than 10 billion parameters. Multi-modal Machine Reading Comprehension (MRC) approaches, such as tri-encoder retrievers and table readers, were also excluded.
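A uniform methodology implies that extractive and generative outputs are scored against the same reference answers. The sketch below shows one common way to do this, using SQuAD-style exact match and token-level F1; the metric choice, helper names, and example answers are illustrative assumptions rather than the exact evaluation used in this research.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase, drop articles and punctuation, collapse whitespace.
    text = re.sub(r"\b(a|an|the)\b", " ", text.lower())
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    # 1.0 only if the normalized strings are identical.
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    # Token-overlap F1 between the predicted and reference answers.
    pred_tokens, gold_tokens = normalize(prediction).split(), normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example answers, shown only to illustrate the metrics.
print(exact_match("a Pod", "Pod"))                                 # 1.0
print(token_f1("the smallest deployable unit", "smallest unit"))   # 0.8
```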

The research results offer valuable perspectives on model performance in technical QA contexts and contribute to a deeper understanding of Zero-Shot QA architectures. The associated code is available on GitHub under the CC-BY 4.0 license.
