The remarkable achievements of transformer-based models such as GPT-2 and GPT-3 have prompted the research community to explore large language models (LLMs), and the recent success and popularity of ChatGPT has only increased that interest. In-context learning and chain-of-thought prompting are two further important discoveries that have significantly improved model accuracy. These techniques go beyond simple question answering, where an input prompt containing a question is used to produce a plausible answer.
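As a concrete illustration, a chain-of-thought prompt simply prepends a worked example with explicit reasoning steps before the new question. The wording below is a made-up toy example, not taken from the paper:

```python
# A minimal, hypothetical chain-of-thought prompt: the worked example shows the
# model the step-by-step reasoning format it should imitate for the new question.
cot_prompt = (
    "Q: A shop has 3 boxes with 4 apples each. How many apples in total?\n"
    "A: Each box has 4 apples and there are 3 boxes, so 3 * 4 = 12. The answer is 12.\n"
    "Q: A crate holds 5 rows of 6 bottles. How many bottles in total?\n"
    "A:"  # the LLM is expected to continue with its own reasoning steps
)
```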
While these prompting techniques have been effective at improving performance, current transformer-based LLMs can only condition on an input string of bounded length, which limits the computations they can represent. Put another way, any deterministic language model that conditions on bounded-length strings is computationally limited, since it is equivalent to a finite automaton. To counter this, researchers have explored adding an external feedback loop to LLMs, where model outputs are fed back as inputs after some post-processing. However, whether this approach meaningfully expands the class of computations the model can perform has remained an open question.
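A rough sketch of what such an external feedback loop might look like is shown below. The llm and postprocess callables are hypothetical placeholders, not the paper's actual interface:

```python
# Sketch of an external feedback loop around a fixed LLM (assumes a hypothetical
# llm(prompt) -> str callable). The model's output is post-processed into the
# next prompt, so the computation can continue across many calls instead of
# being limited to a single bounded-length prompt.
def feedback_loop(llm, initial_prompt, postprocess, max_steps=100):
    prompt = initial_prompt
    output = ""
    for _ in range(max_steps):
        output = llm(prompt)
        prompt, done = postprocess(output)  # rewrite the output into the next prompt
        if done:
            break
    return output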
Researchers from Google Brain and the University of Alberta teamed up to address this question. They added external read-write memory to an LLM to verify that it can simulate any algorithm on any input. Their work is summarized in the paper Memory Augmented Large Language Models are Computationally Universal, which shows that an LLM augmented with associative read-write memory is computationally universal.
Flan-U-PaLM 540B was the LLM chosen by the researchers. The central idea is to use a simple stored-instruction computer to link the LLM and the associative memory, so that the output and input prompts passed to the language model interact in a loop. The external associative memory can be thought of as a dictionary, with key-value pairs acting as variable names (address locations) and their stored values. The language model and the memory use regular expression matches to perform each parsing step.
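A minimal sketch of how such a loop could be wired up, assuming a hypothetical llm(prompt) -> str call and an invented "var = value" output syntax; the paper defines its own prompt language and parsing conventions:

```python
import re

# Hypothetical stored-instruction loop: the associative memory is a dict of
# variable names to string values, and each cycle the LLM's output is parsed
# with a regular expression into memory writes. The syntax is illustrative only.
memory = {}

ASSIGN = re.compile(r"^(\w+)\s*=\s*(.*)$")

def step(llm, instruction):
    # Substitute current memory values into the instruction prompt.
    prompt = instruction.format(**memory)
    output = llm(prompt)
    # Parse each output line; matched assignments update the associative memory.
    for line in output.splitlines():
        match = ASSIGN.match(line.strip())
        if match:
            key, value = match.groups()
            memory[key] = value
    return output
```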
After establishing the stored-instruction computer, a specific prompt program is developed that directs the system to simulate the execution of a universal Turing machine. In the end, demonstrating the reliability of the simulation reduces to examining a finite number of prompt-result patterns and confirming that the language model generates the correct output for each finite set of possible input strings. A major strength of this work is that it involves no extra training of the language model and no alteration of its pre-trained weights. Instead, the construction relies solely on building a form of stored-instruction computer that can then be programmed with specific prompts.
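To see why verification reduces to finitely many cases, note that a Turing machine step is just a lookup in a finite transition table plus a local tape update, so each rule only needs to be checked once. The toy machine below is purely illustrative and is not the universal machine used in the paper:

```python
from collections import defaultdict

# Toy Turing machine: transition table maps (state, symbol) -> (new_symbol, move, new_state).
rules = {
    ("q0", "0"): ("1", +1, "q0"),
    ("q0", "1"): ("0", +1, "halt"),
}

def run(rules, tape_input, state="q0", blank="0", max_steps=50):
    # The tape is an unbounded associative store, analogous to the external memory.
    tape = defaultdict(lambda: blank, enumerate(tape_input))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol, move, state = rules[(state, tape[head])]  # finite-table lookup
        tape[head] = symbol                                # local write
        head += move                                       # move the head
    return "".join(tape[i] for i in sorted(tape))

print(run(rules, "01"))  # -> "10"
```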
This study stands apart from previous research exploring the computational universality of models. The main difference is that the researchers show how augmenting a fixed language model, with fixed pre-trained weights, with external memory can elicit universal computational behavior. The results demonstrate that large language models are already computationally universal as they currently exist, provided they have access to unbounded external memory.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page and Discord channel, and subscribe to our email newsletter, where we share the latest news on AI research, cool AI projects, and more.
Khushboo Gupta is a Consulting Intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing, and Web Development. She likes to learn more about the technical field by participating in various challenges.