Retrieval-augmented generation (RAG) describes an architecture that combines an information retrieval model with a large language model. When a query is submitted, the retrieval model searches its database for potentially relevant information (which can come from text documents, websites, or other data sources) and passes this information to the language model together with the query. The model can then use the additional information, via in-context learning, to better understand the query and generate a more precise answer.
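To make this flow concrete, here is a minimal sketch in Python. The functions retrieve and generate are hypothetical placeholders (not part of any specific library) standing in for the retrieval model and the language model; the point is how the retrieved passages are placed into the prompt so the model can use them through in-context learning.

```python
from typing import List


def retrieve(query: str, top_k: int = 3) -> List[str]:
    """Hypothetical retrieval step; in practice this queries a vector database."""
    ...


def generate(prompt: str) -> str:
    """Hypothetical LLM call; in practice this calls a model API."""
    ...


def answer(query: str) -> str:
    passages = retrieve(query)          # 1. find potentially relevant context
    context = "\n\n".join(passages)
    prompt = (                          # 2. augment the query with that context
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return generate(prompt)             # 3. generate the grounded answer
```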
Let’s examine a specific use case: a chatbot that enables free-text search of a knowledge base such as Confluence. How does the retrieval model find the right information for a query in this case?
Several preparatory steps are needed. First, the underlying data is exported from Confluence (as HTML documents, for example). Each of these documents is then split into chunks (smaller parts such as sentences or individual words), which in turn are converted to embeddings (vector representations) that are better suited to processing by machine learning algorithms. These vectors are designed to capture the meanings of and relationships between different words and phrases in such a way that words with similar meanings will (ideally) be close together in the vector space. Finally, the generated vectors, along with references to the corresponding documents, are stored in a vector database such as ChromaDB or Pinecone for fast access.

When a question is then put to the chatbot, it is converted into an embedding in the same way as the documents before it and compared with the entries in the vector database. The content that is most similar to the question, and therefore most relevant, is returned by the database and fed into the language model as context. The answer generated by the language model is then displayed as the result in the chatbot.
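The following sketch illustrates both phases with ChromaDB. The collection name, the example chunks, and the query are invented for illustration, and the sketch assumes ChromaDB's default embedding function (a local sentence-transformers model), which the library falls back on when no embedding function is configured explicitly.

```python
import chromadb

# In-memory client for demonstration; a persistent client or a hosted
# service such as Pinecone would be used in production.
client = chromadb.Client()

# Without an explicit embedding function, Chroma embeds documents and
# queries with its built-in default model.
collection = client.create_collection(name="confluence_pages")

# Indexing phase: store chunks exported from Confluence (example data).
chunks = [
    "Vacation requests are submitted through the HR portal.",
    "The VPN must be active before accessing internal dashboards.",
]
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    metadatas=[{"source": "confluence"} for _ in chunks],
)

# Query phase: the question is embedded the same way, and the most
# similar chunks are returned, ranked by vector distance.
results = collection.query(
    query_texts=["How do I request vacation days?"],
    n_results=1,
)
context = results["documents"][0]  # list of the best-matching chunks
```

The returned chunks would then be inserted into the language model's prompt as context, exactly as in the earlier sketch; a database like Pinecone follows the same indexing-and-query pattern through its own client API.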