Patronus AI Finds Alarming Safety Gaps in Leading LLMs


(Jamie-Jin/Shutterstock)

Patronus AI, an automated evaluation and security platform, has released the results of a diagnostic test suite that shows critical safety risks in large language models (LLMs). The announcement sheds light on the limitations of AI models and emphasizes the need for improvement, especially for AI use cases in highly regulated industries, such as finance.  

The findings from Patronus AI come at a time of growing concern about the accuracy of GenAI systems such as ChatGPT and their potential to return harmful responses to queries. There is also a rising need for ethical and legal oversight of AI.

The Patronus AI SimpleSafetyTests results were based on testing some of the most popular open-source LLMs. The suite comprises 100 test prompts designed to probe vulnerabilities in high-priority harm areas such as child abuse, physical harm, and suicide, and some models produced over 20 percent unsafe responses. In a separate Patronus AI evaluation of LLMs on questions about SEC (U.S. Securities and Exchange Commission) filings, the models answered only 79 percent of the questions correctly.
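To make the setup concrete, here is a minimal sketch of what a harness for a suite like this might look like. The query_model and is_unsafe functions are hypothetical placeholders, not the actual SimpleSafetyTests implementation, which scores responses far more carefully than any simple check:

```python
# Minimal sketch of a SimpleSafetyTests-style harness.
# query_model() and is_unsafe() are hypothetical placeholders for
# the model under test and a safety judgment; they are assumptions,
# not the actual SimpleSafetyTests code.

def query_model(prompt: str) -> str:
    """Send one test prompt to the LLM under test."""
    raise NotImplementedError  # wire up a real model client here

def is_unsafe(response: str) -> bool:
    """Judge whether a response is unsafe for its harm area."""
    raise NotImplementedError  # human review or a safety classifier

def unsafe_response_rate(prompts: list[str]) -> float:
    """Return the fraction of prompts that drew an unsafe response."""
    unsafe = sum(1 for p in prompts if is_unsafe(query_model(p)))
    return unsafe / len(prompts)

# Over 100 prompts, a rate of 0.20 means 20 unsafe answers,
# roughly what the worst-scoring models produced.
```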

The alarmingly low scores could be a result of the distribution of the underlying training data. LLMs also have a tendency to “hallucinate,” meaning they generate text that is factually incorrect, irrelevant, or nonsensical. If an LLM is trained on data that is incomplete or contradictory, it can learn faulty associations that lead to faulty output.

The SEC filings evaluation showed that the LLMs would hallucinate figures and facts that weren’t in the filings. The SimpleSafetyTests results also showed that adding “guardrails,” such as a safety-emphasizing system prompt, can reduce unsafe responses by 10 percent overall, but the risks remain.
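In practice, such a guardrail can be as simple as prepending a safety-emphasizing system message to every request. Below is a minimal sketch using the OpenAI Python client purely as a stand-in for whichever model is under test; the prompt wording is illustrative, not the one Patronus AI evaluated:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative safety-emphasis prompt; the exact wording Patronus AI
# tested is not published in this article.
SAFETY_PROMPT = (
    "You are a helpful assistant. If a request could facilitate "
    "physical harm, self-harm, or abuse, refuse it and, where "
    "appropriate, point the user toward professional help."
)

def guarded_completion(user_prompt: str, model: str = "gpt-4") -> str:
    """Answer a user prompt with the safety system message prepended."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SAFETY_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
```

As the test results suggest, a prompt-level guardrail lowers the rate of unsafe outputs but does not eliminate them, so it complements rather than replaces independent evaluation.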

Patronus AI, founded in 2023, has concentrated its testing on highly regulated industries where wrong answers can have serious consequences. The startup’s mission is to be a trusted third party for evaluating the safety risks of AI models. Some early adopters have even described Patronus AI as the “Moody’s of AI.”

Patronus AI co-founders Anand Kannappan (left) and Rebecca Qian (Image courtesy Lightspeed)

Patronus AI founders Rebecca Qian and Anand Kannappan spoke to Datanami earlier this year, sharing their vision for Patronus AI to be “the first automated validation and security platform to help enterprises be able to use language models confidently” and to help “enterprises be able to catch language model mistakes at scale.”

The latest results highlight some of the challenges AI models face as organizations look to incorporate GenAI into their operations. One of the most promising use cases for GenAI is its potential to quickly extract key figures from financial narratives and analyze them. But if a model’s accuracy is in question, so is its application in highly regulated industries.
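One common mitigation for this kind of extraction work is to ground the model in a supplied excerpt and give it an explicit way to decline. A minimal sketch, where the template wording is an assumption rather than anything Patronus AI prescribes:

```python
# Illustrative grounding prompt for pulling figures out of an SEC
# filing excerpt; the wording is hypothetical, not Patronus AI's.
EXTRACTION_TEMPLATE = """Answer using ONLY the filing excerpt below.
If the answer is not stated in the excerpt, reply exactly:
"Not found in the provided filing."

Excerpt:
{excerpt}

Question: {question}
"""

def build_extraction_prompt(excerpt: str, question: str) -> str:
    """Fill the grounding template with a filing excerpt and question."""
    return EXTRACTION_TEMPLATE.format(excerpt=excerpt, question=question)
```

Constraining the model to the provided text makes hallucinated figures easier to catch, though, as the Patronus AI results show, it is no guarantee.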

A recent report by McKinsey estimates that GenAI could add the equivalent of $2.6 trillion to $4.4 trillion annually in value across industries, with banking among the industries best positioned to benefit.

The rates of unsafe and incorrect responses in these tests would be unacceptable in most industries. The Patronus AI founders believe that, with continued improvement, these models can provide valuable support to the financial industry, including analysts and investors. The massive potential of GenAI is undeniable, but truly achieving it requires rigorous testing before deployment.

Related Items 

New Data.World Report Finds a Technique For Making LLMs 3x More Accurate in Answering Business Questions

Immuta Report Shows Companies Are Struggling to Keep Up with Rapid AI Advancement

O’Reilly Releases 2023 Generative AI in the Enterprise Report

 
