AI monitoring agent eyed to prevent harmful output in real-world scenarios

As artificial intelligence (AI) risks attain new heights, a group of researchers has developed a monitoring tool to flag harmful outputs associated with large language models (LLMs).

The monitor, developed by researchers from AutoGPT, Microsoft (NASDAQ: MSFT) and Northeastern University, can reportedly halt the execution of harmful outputs in real time. According to the report, the AI monitoring tool demonstrated impressive results in stifling code attacks during deployment with leading LLMs like OpenAI’s ChatGPT and Google’s (NASDAQ: GOOGL) Bard.

“We design a basic safety monitor that is flexible enough to monitor existing LLM agents, and, using an adversarial simulated agent, we measure its ability to identify and stop unsafe situations,” read the report.

In those tests, the tool showed competence in identifying and stopping threats through multiple protective layers, the last of which is a human review of flagged behavior.

“Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans,” read the report.
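The audit, stop and rank-for-review flow described in that quote can be sketched in a few lines of Python. This is only an illustration, not the researchers' implementation: the report describes a context-sensitive monitor, while the keyword weights, the `stop_threshold` value and the `Monitor` and `run_agent` names below are all assumptions invented for this example.

```python
from dataclasses import dataclass, field

# Hypothetical risk weights. The report's monitor is context-sensitive;
# this keyword lookup is only a stand-in for illustration.
RISKY_PATTERNS = {"rm -rf": 1.0, "sudo ": 0.8, "chmod 777": 0.7, "curl ": 0.6}


@dataclass
class Monitor:
    stop_threshold: float = 0.75  # assumed safety boundary
    log: list = field(default_factory=list)

    def score(self, action: str) -> float:
        """Return the highest weight of any risky pattern found in the action."""
        return max((w for p, w in RISKY_PATTERNS.items() if p in action), default=0.0)

    def audit(self, action: str) -> bool:
        """Log the action with its score; return True if the test may continue."""
        risk = self.score(action)
        self.log.append((risk, action))
        return risk < self.stop_threshold

    def review_queue(self) -> list:
        """Suspect actions ranked from most to least risky, for human review."""
        return sorted((entry for entry in self.log if entry[0] > 0), reverse=True)


def run_agent(actions, monitor):
    """Run actions in order; stop the test at the first action the monitor rejects."""
    executed = []
    for action in actions:
        if not monitor.audit(action):
            break
        executed.append(action)
    return executed
```

In this sketch, a run over `["ls -la", "curl http://example.com", "rm -rf /", "echo done"]` stops before the third action, and `review_queue()` hands a human reviewer the two suspect actions with the riskiest first.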

The “Testing Language Model Agents Safely in the Wild” report noted that existing monitoring tools may have shown proficiency in controlled environments but falter in real-world scenarios. The researchers attributed the lackluster performance to the many different ways AI use can lead to harm.

Researchers noted that even attempts to use AI safely may result in several unintended dangers from seemingly benign prompts. To achieve above-average results, the researchers trained the tool on a cache of 2,000 vetted human interactions across nearly 30 distinct tasks and intentionally added unsafe parameters.

Tested on leading LLMs, the model could differentiate harmful and safe inputs 90% of the time under multiple test environments. According to the report, the researchers suggested deploying the monitoring tool for various applications, such as enhancing the agent’s training signal and determining when to escalate issues for user approvals.
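The article does not say which metric sits behind the 90% figure. As a rough sketch of how a monitor's verdicts could be scored against human labels, the hypothetical `evaluate` function below computes accuracy, precision and recall. The function name and the sample data in the usage note are assumptions made up for illustration, not results from the report.

```python
def evaluate(predicted_unsafe, actually_unsafe):
    """Compare monitor verdicts with human labels (True means unsafe)."""
    pairs = list(zip(predicted_unsafe, actually_unsafe))
    tp = sum(p and a for p, a in pairs)          # unsafe, flagged
    tn = sum(not p and not a for p, a in pairs)  # safe, not flagged
    fp = sum(p and not a for p, a in pairs)      # safe, but flagged
    fn = sum(not p and a for p, a in pairs)      # unsafe, but missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For ten made-up samples in which the monitor flags four actions, three of them correctly, and misses one unsafe action, `evaluate` reports an accuracy of 0.8 with precision and recall both at 0.75. Which of these numbers matters most depends on whether false alarms or missed threats are costlier in a given deployment.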

Grim risks for AI

Back in October, AI researchers from Anthropic revealed in their report that several LLMs favor sycophancy in their responses to user prompts rather than truthful answers. The report forms part of a laundry list of potential pitfalls associated with AI usage as regulators sound alarm bells over adopting emerging technologies.

OpenAI, the maker of ChatGPT, has launched a new Preparedness unit to counter AI risks to cybersecurity and other critical sectors of the global economy. Meta (NASDAQ: META), by contrast, disbanded its Responsible AI (RAI) team following an internal restructuring, denting the company’s ambitious plans to develop new AI tools safely.

“We take seriously the full spectrum of safety risks related to AI, from the systems we have today to the furthest reaches of superintelligence,” said OpenAI. “To support the safety of highly-capable AI systems, we are developing our approach to catastrophic risk preparedness.”

For artificial intelligence (AI) to work within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership, allowing it to keep data safe while also guaranteeing the immutability of data. Check out CoinGeek’s coverage on this emerging tech to learn more about why enterprise blockchain will be the backbone of AI.
