October 2025: AI updates from the past month


OpenAI announces agentic security researcher that can find and fix vulnerabilities

OpenAI has released a private beta for a new AI agent called Aardvark that acts as a security researcher, finding vulnerabilities and applying fixes at scale.

“Software security is one of the most critical—and challenging—frontiers in technology. Each year, tens of thousands of new vulnerabilities are discovered across enterprise and open-source codebases. Defenders face the daunting tasks of finding and patching vulnerabilities before their adversaries do. At OpenAI, we are working to tip that balance in favor of defenders,” OpenAI wrote in a blog post.

The agent continuously analyzes source code repositories to identify vulnerabilities, assess their exploitability, prioritize severity, and propose patches. Instead of using traditional analysis techniques like fuzzing or software composition analysis, Aardvark uses LLM-powered reasoning and tool-use.
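OpenAI has not published Aardvark's internals, but the workflow it describes maps onto a familiar multi-stage triage loop: identify candidate issues, judge exploitability, rank severity, and draft a fix. The sketch below is purely illustrative; `ask_llm` and the other names are hypothetical stand-ins, not OpenAI APIs.

```python
# Illustrative only: a minimal multi-stage triage loop in the shape OpenAI
# describes (identify -> assess exploitability -> prioritize -> propose patch).
# ask_llm is a hypothetical stand-in for a model call, not an OpenAI API.
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    description: str
    exploitable: bool = False
    severity: int = 0          # 1 (low) .. 5 (critical)
    proposed_patch: str = ""

def ask_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned answer here."""
    return "severity=3; exploitable=yes; patch=validate user input before use"

def triage_repository(files: dict[str, str]) -> list[Finding]:
    findings: list[Finding] = []
    for path, source in files.items():
        # Stage 1: identify candidate vulnerabilities in each file.
        answer = ask_llm(f"Review this code for vulnerabilities:\n{source}")
        finding = Finding(file=path, description=answer)
        # Stages 2-3: assess exploitability and assign a severity.
        finding.exploitable = "exploitable=yes" in answer
        finding.severity = 3 if finding.exploitable else 1
        # Stage 4: propose a patch for anything worth fixing.
        if finding.exploitable:
            finding.proposed_patch = ask_llm(
                f"Propose a minimal patch for: {finding.description}")
        findings.append(finding)
    # Highest-severity findings surface first for a human reviewer.
    return sorted(findings, key=lambda f: f.severity, reverse=True)

if __name__ == "__main__":
    repo = {"app/login.py": "password = request.args['pw']  # unvalidated input"}
    for f in triage_repository(repo):
        print(f.file, f.severity, f.proposed_patch)
```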

Cursor 2.0 enables eight agents to work in parallel without interfering with each other

The AI coding editor Cursor announced the launch of Cursor 2.0, the next iteration of the platform, featuring a new interface for working with multiple agents and its first ever coding model.

The new multi-agent interface centers around agents instead of files. With this new interface, up to eight agents can work in parallel, using git worktrees and remote trees to prevent them from interfering with each other. It also allows developers to have multiple models attempt the same problem and see which one produces the best output.
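Cursor has not detailed its setup, but the isolation mechanism itself is plain git. A minimal sketch of the idea, assuming it is run from inside an existing repository (the branch and directory names are made up):

```python
# Illustrative sketch: give each of N parallel agents its own git worktree so
# their edits land on separate branches in separate directories.
# Run from inside an existing git repository; names here are invented.
import subprocess

def create_agent_worktrees(count: int) -> list[str]:
    paths = []
    for i in range(1, count + 1):
        path = f"../agent-{i}"
        branch = f"agent-{i}-attempt"
        # `git worktree add <path> -b <branch>` checks out a new branch into
        # its own working directory, so agents never touch the same files.
        subprocess.run(["git", "worktree", "add", path, "-b", branch], check=True)
        paths.append(path)
    return paths

if __name__ == "__main__":
    for p in create_agent_worktrees(3):
        print("agent workspace ready:", p)
```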

While this new interface is designed for agents, developers will still be able to open files or switch back to the classic IDE as needed.

The new coding model, Composer, is four times faster than similar models, the company claims. It was designed for low-latency agentic coding tasks in Cursor, and it can complete most turns in less than 30 seconds.

Workato launches Enterprise MCP for SaaS platforms

Organizations are spending heavily on AI agents, but are finding that integrating those agents into all the systems a business needs to function is a major hurdle.

To help make SaaS platforms agent-ready, integration orchestration company Workato released Workato Enterprise MCP, which the company said in its announcement can “turn existing workflows, integrations, and APIs into rich, multi-step agent skills that any large-language-model (LLM)-based agent can call, including ChatGPT, Claude, Gemini, and Cursor.”

Adam Seligman, chief technology officer at Workato, told SD Times that “the thing we keep coming back to over and over again is agents show a lot of promise, but to really work for business, they have to get access to business data. And they have to be able to do things inside your business, but do it in a way that you trust. And it’s really hard to get those two things right.”

JetBrains launches open benchmarking platform for measuring AI productivity

JetBrains has released a new tool designed to enable developers to measure their actual productivity gains from AI tools.

The company’s Developer Productivity AI Arena (DPAI Arena) is an open benchmarking platform for measuring how well AI development tools complete real-world software engineering tasks. According to the company, current benchmarks that LLMs are run against rely on outdated datasets, cover a narrow range of technologies, and focus mainly on issue-to-patch workflows.

“As AI coding tools advance rapidly, the industry still lacks a neutral, standards-based framework to measure their real impact on developer productivity,” the company wrote in a blog post.

DPAI Arena uses a flexible, track-based architecture to enable reproducible comparisons across workflows like patching, bug fixes, PR review, test generation, static analysis, and more.

GitHub unveils Agent HQ, the next evolution of its platform that focuses on agent-based development

During its annual conference, GitHub Universe, GitHub shared its plans for Agent HQ, its vision for the future of the platform where AI agents are natively integrated across all of GitHub.

As part of this Agent HQ initiative, over the next several months, paid GitHub Copilot users will gain direct access to popular coding agents from Anthropic, OpenAI, Google, Cognition, xAI, and more.

Agent HQ brings with it several new capabilities to support this next evolution, the first of which is mission control, a central command center for assigning, steering, and tracking the work of multiple agents across GitHub, Copilot CLI, and VS Code.

Mission control’s branch controls give developers granular oversight of the checks run on code created by the agents. Identity features will also be introduced, allowing developers to manage agents like they would other coworkers: controlling which agent is building a task, managing access, and implementing policies.

OpenAI completes restructuring, strikes new deal with Microsoft

OpenAI announced that it has completed the restructuring of its business. When the company was founded in 2015, it was launched as a non-profit organization, and that non-profit has controlled the for-profit arm of the business ever since.

The restructuring turns the for-profit arm into a public benefit corporation called OpenAI PBC. The OpenAI Foundation—the new name for the non-profit—will still control the for-profit and hold a 26% equity stake in OpenAI PBC, a stake currently valued at around $130 billion.

A public benefit corporation differs from traditional corporate structures in that it is “required to advance its stated mission and consider the broader interests of all stakeholders, ensuring the company’s mission and commercial success advance together,” OpenAI’s website explains.

Microsoft announces public preview for planning capability that improves how Copilot in Visual Studio handles complex tasks

Microsoft has announced a public preview for a new feature that aims to enable Copilot in Visual Studio to tackle more complex projects.

With its new planning capability in Agent Mode, Copilot will research the codebase to break down big tasks into smaller and more manageable tasks, while also iterating on its plan as it works through the steps.

“Planning makes Copilot more predictable and consistent by giving it a structured way to reason about your project. It builds on techniques from hierarchical and closed-loop planning research – enabling Copilot to plan at a high level, execute step-by-step, and adjust dynamically as it learns more about your codebase and issues encountered during implementation,” Rhea Patel, product manager at Microsoft, wrote in a blog post.
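Microsoft has not published the feature’s internals, but the hierarchical, closed-loop pattern Patel describes is easy to sketch: plan at a high level, execute one step at a time, and fold what each step reveals back into the remaining plan. The helper names below are hypothetical, not Copilot APIs.

```python
# Illustrative closed-loop planning skeleton: plan, execute step by step,
# and revise the remaining plan when a step fails or reveals new context.
# plan_task / execute / revise_plan are hypothetical stand-ins, not Copilot APIs.

def plan_task(goal: str) -> list[str]:
    return [f"inspect code related to: {goal}",
            f"draft changes for: {goal}",
            f"run tests for: {goal}"]

def execute(step: str) -> tuple[bool, str]:
    # A real agent would call tools here; this stub treats every step as done.
    return True, f"completed: {step}"

def revise_plan(remaining: list[str], observation: str) -> list[str]:
    # Closed loop: fold what was just learned back into the remaining steps.
    return [f"{step} (given {observation})" for step in remaining]

def run(goal: str) -> None:
    plan = plan_task(goal)
    while plan:
        step = plan.pop(0)
        ok, observation = execute(step)
        print(observation)
        if not ok:                      # a failed step triggers re-planning
            plan = revise_plan(plan, observation)

if __name__ == "__main__":
    run("add input validation to the signup form")
```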

GitKraken releases Insights to help companies measure ROI of AI

GitKraken, a software engineering intelligence company that specializes in improving the developer experience, announced the launch of GitKraken Insights to provide companies with better insights into AI’s impact on developer productivity.

Matt Johnston, CEO of GitKraken, told SD Times that despite growing investments in AI and perceived velocity gains, companies struggle to understand its impact. “I was talking to a VP of developer experience at a large Silicon Valley company, and he was basically saying, ‘We’ve made investments of thousands of seats in Cursor and Copilot and Claude, and we can’t really tell what’s being used… and how the heck do I measure this in a way that’s compelling to my business leaders.”

GitKraken Insights brings together several different metrics—DORA metrics, code quality analysis, technical debt tracking, AI impact measurement, and developer experience indicators—to paint a picture of what’s happening within the development lifecycle.

Mabl announces updates to Agentic Testing Teammate

The Agentic Testing Teammate works alongside human testers to make the process more efficient. New updates include AI vectorizations and test semantic search, improvements to test coverage, and enhancements to the MCP Server that enable testers to do a number of tasks directly within their IDE, including Test Impact Analysis, intelligent test creation, and failure recommendations.

“This new work is built on the idea that an agent can become an integral part of your testing team,” said Dan Belcher, co-founder of mabl. “Unlike scripting frameworks and general-purpose large language models, mabl builds deep knowledge about your application over time and uses that knowledge to make it–and your team–more effective.”

Couchbase 8.0 adds three new vector indexing and retrieval capabilities

These new capabilities are designed to support diverse vector workloads that facilitate real-time AI applications.

Hyperscale Vector Index is based on the DiskANN nearest-neighbor search algorithm and enables operation across partitioned disks for distributed processing. Composite Vector Index supports pre-filtered queries that can scope the specific vector being sought. Search Vector Index supports hybrid searches containing vectors, lexical search, and structured query criteria in a single SQL++ request.
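Couchbase’s SQL++ syntax is not reproduced here, but the pre-filtering idea behind the Composite Vector Index can be illustrated generically: apply a structured predicate first, then run nearest-neighbor search only over the surviving candidates. The snippet below is a plain NumPy sketch, not Couchbase code.

```python
# Generic illustration of pre-filtered vector search (not Couchbase SQL++):
# apply a structured filter first, then run nearest-neighbor search on the
# surviving candidates only.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((1000, 8))                        # document embeddings
categories = rng.choice(["blog", "doc", "faq"], 1000)  # structured metadata

def prefiltered_search(query: np.ndarray, category: str, k: int = 5):
    # Pre-filter: keep only rows matching the structured predicate.
    mask = categories == category
    candidates = vectors[mask]
    ids = np.flatnonzero(mask)
    # Nearest-neighbor search restricted to the filtered candidates.
    distances = np.linalg.norm(candidates - query, axis=1)
    order = np.argsort(distances)[:k]
    return list(zip(ids[order], distances[order]))

if __name__ == "__main__":
    print(prefiltered_search(rng.random(8), "faq"))
```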

Anthropic expands memory to all paid Claude users

Anthropic announced that Claude’s recently introduced memory feature is being rolled out to Pro and Max plan users, making it available to all paid users.

Memory was announced in early September, but was initially only available to Team and Enterprise users.

Memory allows Claude to remember your projects and preferences so that you don’t need to re-explain important context across sessions. “Great work builds over time. With memory, each conversation with Claude improves the next,” Anthropic wrote in its initial announcement.

Harness brings vibe coding to database migration with new AI-Powered Database Migration Authoring feature

Harness is on a mission to make it easier for developers to do database migrations with its new AI-Powered Database Migration Authoring feature. This new capability allows users to describe schema changes in natural language to receive a production-ready migration.

For example, a developer could ask “Create a table named animals with columns for genus_species and common_name. Then add a related table named birds that tracks unladen airspeed and proper name. Add rows for Captain Canary, African swallow, and European swallow.”

Harness’ platform would then analyze the current schema and policies, generate a backward-compatible migration, validate the change for safety and compliance, commit it to Git for testing, and create rollback migrations.
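Harness has not published the exact output format, but the forward-plus-rollback pair it describes would look roughly like the sketch below, which borrows the table and column names from the example above and uses sqlite3 purely to show the pair running; the data values and SQL are illustrative guesses, not Harness output.

```python
# Illustrative only: the kind of forward/rollback migration pair such a tool
# might generate for the request above. Table and column names come from the
# example; the inserted values are made up, and sqlite3 just shows it running.
import sqlite3

FORWARD = """
CREATE TABLE animals (
    id INTEGER PRIMARY KEY,
    genus_species TEXT NOT NULL,
    common_name TEXT NOT NULL
);
CREATE TABLE birds (
    id INTEGER PRIMARY KEY,
    animal_id INTEGER REFERENCES animals(id),
    unladen_airspeed_kmh REAL,
    proper_name TEXT
);
INSERT INTO animals (genus_species, common_name) VALUES
    ('unknown', 'Captain Canary'),
    ('unknown', 'African swallow'),
    ('unknown', 'European swallow');
"""

ROLLBACK = """
DROP TABLE birds;
DROP TABLE animals;
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(FORWARD)      # apply the generated migration
    print(conn.execute("SELECT common_name FROM animals").fetchall())
    conn.executescript(ROLLBACK)     # roll it back if validation fails
```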

Red Hat Developer Lightspeed brings AI assistance to Red Hat’s Developer Hub and migration toolkit

Red Hat Developer Lightspeed has been integrated into both the Red Hat Developer Hub and the migration toolkit for applications (MTA).

In the Red Hat Developer Hub, it acts as an assistant to speed up non-coding tasks, like exploring application design approaches, writing documentation, generating test plans, and troubleshooting applications.

In the migration toolkit, Red Hat Developer Lightspeed automates source code refactoring within the IDE. It leverages MTA’s static code analysis to understand migration issues and how to fix them, and also improves over time by learning what made past changes successful.

MariaDB unifies transactional, analytical, and vector databases in MariaDB Enterprise Platform 2026 release

MariaDB’s Enterprise Platform 2026 release was announced this week, with the promise that it will act as “the definitive database platform for building next-generation intelligent applications.”

To support agentic AI, the company added native RAG for grounding LLMs with context from MariaDB without needing embeddings, vector stores, or retrieval pipelines. The company also added ready-to-use agents within the platform, including a developer copilot that connects to the database and can respond to natural language queries, and a DBA copilot that can manage tasks like performance tuning and debugging.

Additionally, the company added an integrated MCP server so that agents can interact with MariaDB databases. The MCP interface in MariaDB allows users to integrate vector search, LLMs, and standard SQL operations, and allows agents to launch serverless databases in the cloud.

Spotify Portal now generally available and packed with features for improving dev experience

Spotify Portal for Backstage provides developers with a ready-to-use version of Backstage, its open source solution for building internal developer portals (IDPs).

AiKA, which is an AI assistant for Portal, can now connect to third-party MCP servers and trigger actions in Portal. AiKA itself also functions as an MCP server, allowing developers to connect it to tools like Cursor or Copilot and access Portal data.

“The general availability of Spotify Portal marks a pivotal moment in how organizations build, measure, and optimize developer experience. What began as an internal tool for Spotify engineers is now a fully-fledged platform for enterprises, combining the reliability of Backstage, the insight of Confidence, and the speed of AI-driven workflows,” Spotify wrote.

Sonar announces new solution to optimize training datasets for coding LLMs

Sonar, a company that specializes in code quality, announced a new solution that will improve how LLMs are trained for coding purposes.

According to the company, LLMs that are used to help with software development are often trained on publicly available, open source code containing security issues and bugs, which become amplified throughout the training process. “Even a small amount of flawed data can degrade models of any size, disproportionately degrading their output,” Sonar wrote in an announcement.

SonarSweep (now in early access) aims to mitigate those issues by ensuring that models are learning from high-quality, secure examples.

It works by identifying and fixing code quality and security issues in the training data itself. After analyzing the dataset, it applies a strict filtering process to remove low-quality code while also balancing the updated dataset to ensure it will still offer diverse and representative learning.
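Sonar has not detailed SonarSweep’s pipeline, but the filter-then-rebalance step it describes can be sketched generically: scan each training sample with a quality check, drop what fails, and cap each language’s share so the remaining data stays diverse. The checks below are toy heuristics, not SonarSweep’s rules.

```python
# Generic sketch of a filter-then-rebalance pass over a code training set.
# The quality checks are toy heuristics, not SonarSweep's actual analysis.
from collections import defaultdict
import random

def passes_quality_check(sample: dict) -> bool:
    code = sample["code"]
    # Toy stand-ins for real static analysis: flag obvious red flags.
    banned = ("eval(", "password =", "TODO: fix")
    return not any(marker in code for marker in banned)

def rebalance(samples: list[dict], per_language: int) -> list[dict]:
    by_lang = defaultdict(list)
    for s in samples:
        by_lang[s["language"]].append(s)
    balanced = []
    for lang, group in by_lang.items():
        random.shuffle(group)
        balanced.extend(group[:per_language])   # cap each language's share
    return balanced

def sweep(dataset: list[dict], per_language: int = 2) -> list[dict]:
    clean = [s for s in dataset if passes_quality_check(s)]
    return rebalance(clean, per_language)

if __name__ == "__main__":
    data = [
        {"language": "python", "code": "def add(a, b):\n    return a + b"},
        {"language": "python", "code": "eval(user_input)"},
        {"language": "go", "code": "func Add(a, b int) int { return a + b }"},
    ]
    print(len(sweep(data)), "samples kept")
```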

Amazon launches Quick Suite to provide agentic AI across applications and AWS services

Amazon Quick Suite allows users to ask questions, conduct deep research, analyze and visualize data, and create automations.

It can connect to internal repositories, like wikis or an intranet, and AWS services. Amazon also offers 50+ built-in connectors to applications like Adobe Analytics, SharePoint, Snowflake, Google Drive, OneDrive, Outlook, ServiceNow, and Databricks, as well as support for more than 1,000 apps by connecting to their MCP servers.

This deep connection across the enterprise enables Quick Suite to analyze data across all of a company’s systems and create complex business workflows across multiple applications and departments.

“Unlike traditional business intelligence tools that work only with databases and data warehouses, Quick Sight’s agentic experience analyzes all forms of data across all your systems and apps, including your documents,” Amazon wrote in a blog post.

Google unveils Gemini Enterprise to offer companies a more unified platform for AI innovation

Google is announcing a new offering built around Gemini, designed specifically with large enterprise use in mind.

Gemini Enterprise consolidates six core components:

  • Advanced Gemini models
  • A no-code workbench for analyzing information and orchestrating agents
  • Pre-built Google agents for tasks like deep research or data insights
  • The ability to connect to company data
  • A central governance framework for visualizing and securing all agents
  • Access to an ecosystem of over 100,000 industry partners

“By bringing all of these components together through a single interface, Gemini Enterprise transforms how teams work. It moves beyond simple tasks to automate entire workflows and drive smarter business outcomes — all on Google’s secure, enterprise-grade architecture,” Thomas Kurian, CEO of Google Cloud, wrote in a blog post.

Atlassian shares major updates to its genAI assistant Rovo at Team ‘25 Europe

Atlassian is hosting its annual user conference Team ‘25 Europe this week in Barcelona, and during the event, the company shared several new and upcoming updates to its generative AI assistant Rovo.

Atlassian announced the general availability of its AI coding agent Rovo Dev. Rovo Dev can help with code reviews, documentation, dependency cleanups, and more, and it leverages context from tickets, docs, incidents, and business goals to provide developers with information that will help them make more informed decisions.

Additionally, starting early next year, Rovo Search will become the default search in Jira, which will allow Jira’s search to suggest relevant issues and projects.

Rovo Chat will also be getting over 100 out-of-the-box modular capabilities from Atlassian and its partners that can be used in chat, agents, and workflows. Other new Chat capabilities include the ability to remember past conversations and preferences and a new collaborative workspace called Canvas.

Google launches ecosystem of extensions for Gemini CLI

Google is launching Gemini CLI extensions to allow different development tools to connect to the Gemini CLI.

Each extension includes a playbook that teaches the CLI how to effectively use that tool, eliminating the need for developers to configure them. “If you want to look under the hood, Gemini CLI extensions package instructions, MCP servers and custom commands into a familiar and user-friendly format,” Google wrote in a blog post.

Twenty-two extensions are available at launch from Google partners Atlassian, Canva, Confluent, Dynatrace, Elastic, Figma, GitLab, Grafana Labs, Harness, HashiCorp, MongoDB, Neo4j, Pinecone, Postman, Qodo, Shopify, Snyk, Sonar, Stripe, ThoughtSpot, Weights & Biases by CoreWeave, and WIX.

IBM adds new capabilities to watsonx Orchestrate to facilitate agentic AI at scale

As IBM kicked off its annual developer event TechXchange 2025, it announced several new capabilities to enable organizations to unlock value from agentic AI.

“There’s certainly been a lot of buzz in the industry,” said Bruno Aziza, vice president of Data, AI, and Analytics Strategy at IBM Software. “I think if you look at the context of everything that’s going on, customers are struggling. They’re struggling to get value from their investment.”

IBM announced many updates to its AI agent orchestration platform, watsonx Orchestrate. The platform now includes AgentOps, an observability and governance layer for AI agents; Agentic Workflows, standardized and reusable flows that can be used to build and sequence multi-agent systems; and Langflow integration to reduce agent setup time.

OpenAI DevDay: ChatGPT Apps, AgentKit, and GA release of Codex

OpenAI held its annual Developer Day event this week where it announced several updates to its products.

The company unveiled apps in ChatGPT as well as an SDK for developers to build them. Companies that have created apps that are already available include Booking.com, Canva, Coursera, Figma, Expedia, Spotify, and Zillow.

When a user says the name of an available app in a prompt, ChatGPT will automatically surface that app in the chat. For example, saying “Spotify, make a playlist for my party this Friday” will bring in the Spotify app. ChatGPT will also be able to suggest apps when it thinks they’re relevant to the conversation, such as suggesting Zillow’s app in a conversation about buying a house.

Google’s coding agent Jules now works in the command line

Google’s coding agent Jules can now be used directly in developers’ command lines, allowing it to act as more of a coding companion.

According to Google, it created this new command line interface—called Jules Tools—out of a recognition that the terminal is where developers spend most of their time.

Jules Tools allows developers to spin up tasks, inspect what Jules is doing, and integrate Jules into automation. “Think of Jules Tools as both a dashboard and a command surface for your coding agent,” Google wrote in a blog post.

Amazon Bedrock AgentCore MCP server now available

The AgentCore MCP server offers built-in support for runtime, gateway integration, identity management, and agent memory. It was created to speed up the process of creating components that are compatible with Bedrock AgentCore.

“What typically takes significant time and effort, for example learning about Bedrock AgentCore services, integrating Runtime and Tools Gateway, managing security configurations, and deploying to production can now be completed in minutes through conversational commands with your coding assistant,” AWS wrote in a blog post.

DigitalOcean updates Gradient AI Platform

The Gradient AI Platform is a platform for building AI agents without needing to manage the underlying infrastructure. New features that have been added include support for image generation, auto-indexing of knowledge bases, and VPC integration.

Additionally, DigitalOcean revealed that it will be expanding the platform further in the next few weeks with new offerings like the Gradient AI Agent Development Kit and Gradient AI Genie, which integrates into IDEs and can be used to manage multi-agent systems using natural language.

Microsoft announces preview of its new Agent Framework

Microsoft has announced a preview of the Microsoft Agent Framework, an open-source development kit for .NET and Python for creating AI agents and multi-agent workflows.

It supports creating individual agents as well as graph-based workflows that connect multiple agents.

According to Microsoft, the Agent Framework is a direct successor to its other projects Semantic Kernel and AutoGen, utilizing foundations from both. It brings together Semantic Kernel’s enterprise-grade features like thread-based state management, type safety, filters, telemetry, and model and embedding support, with AutoGen’s abstractions for single- and multi-agent patterns.
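The snippet below is not the Agent Framework’s API; it is a generic sketch of the graph-based pattern the framework implements, with agents as nodes and edges deciding which agent runs next.

```python
# Generic sketch of a graph-based multi-agent workflow (not the Microsoft
# Agent Framework API): agents are nodes, and edges route each agent's output
# to the next agent until the graph reaches a terminal node.
from typing import Callable

Agent = Callable[[str], str]

def writer(task: str) -> str:
    return f"draft for: {task}"

def reviewer(draft: str) -> str:
    return f"approved: {draft}"

# Node name -> (agent function, name of the next node or None to stop).
workflow: dict[str, tuple[Agent, str | None]] = {
    "write":  (writer, "review"),
    "review": (reviewer, None),
}

def run(start: str, task: str) -> str:
    node, payload = start, task
    while node is not None:
        agent, next_node = workflow[node]
        payload = agent(payload)        # each node transforms the payload
        node = next_node
    return payload

if __name__ == "__main__":
    print(run("write", "release notes for the October update"))
```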

Mendix updates its low-code platform with agentic AI features

New agent and genAI features include an agent builder, the ability to create project plans using generative AI, the ability to create microflows and workflows with AI, and support for MCP.

Another focus area of the release is business process automation, and new features related to that include the ability for Mendix Workflows to call AI agents, dynamic case management, and Global Inbox, a single view for all tasks from multiple distributed workflows.

California passes law to ensure safe innovation of frontier AI models

Earlier this week, California Governor Gavin Newsom signed a new law designed to ensure the safe development and deployment of frontier AI models.

“California has proven that we can establish regulations to protect our communities while also ensuring that the growing AI industry continues to thrive,” Newsom said. “This legislation strikes that balance. AI is the new frontier in innovation, and California is not only here for it – but stands strong as a national leader by enacting the first-in-the-nation frontier AI safety legislation that builds public trust as this emerging technology rapidly evolves.”

The law, SB 53, establishes requirements for companies developing frontier AI models, spanning five categories: transparency, innovation, safety, accountability, and responsiveness.

Slack evolves to support agentic capabilities built on conversation data

Salesforce is announcing several major updates to Slack that will enable customers to leverage their conversation history for AI apps and agents.

The company is announcing a real-time search (RTS) API, which surfaces up-to-date discussions, files, and channels to provide agents with context-aware information. To ensure secure use of information, data remains in Slack, and the API adheres to existing user access permissions and only retrieves data relevant to the query.

“It unlocks your organization’s collective intelligence, securely connecting agents to conversations and decisions that were once trapped in silos,” Salesforce wrote in a blog post.
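Slack’s actual API surface is not shown here, but permission-aware retrieval of the kind described reduces to a simple rule: filter results down to channels the requesting user can already see before anything reaches an agent. The sketch below is a generic illustration with made-up data, not the RTS API.

```python
# Generic sketch of permission-aware retrieval (not Slack's RTS API): results
# are filtered to channels the requesting user can already see before any of
# them reach an agent.
MESSAGES = [
    {"channel": "eng-private", "text": "incident postmortem draft"},
    {"channel": "general",     "text": "launch announcement"},
]
MEMBERSHIP = {"dana": {"general"}}      # user -> channels they belong to

def search_for_user(user: str, query: str) -> list[str]:
    allowed = MEMBERSHIP.get(user, set())
    return [
        m["text"] for m in MESSAGES
        if m["channel"] in allowed and query in m["text"]   # permission check first
    ]

if __name__ == "__main__":
    print(search_for_user("dana", "launch"))    # sees general, not eng-private
```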

Anthropic claims its newly released Claude Sonnet 4.5 is the “best coding model in the world”

Claude Sonnet 4.5 achieves 77.2% on the SWE-bench software engineering benchmark, compared to 74.5% for Claude Opus 4.1 and 72.7% for Claude Sonnet 4. For external comparison, GPT-5 Codex scored 74.5%, GPT-5 scored 72.8%, and Gemini 2.5 Pro scored 67.2%.

Additionally, it leads in the OSWorld benchmark, which tests AI models on real-world computer tasks. It scored 61.4% on that benchmark, beating out Claude Sonnet 4, which scored 42.2%.

“Sonnet 4.5 can produce near-instant responses or extended, step-by-step thinking that is made visible to the user,” Anthropic says.

According to Anthropic, Claude Sonnet 4.5 also shows better domain-specific knowledge and reasoning in the fields of finance, law, and medicine.

Workato announces MCP platform

Workato Enterprise MCP provides customers with access to over 100 fully managed MCP servers that can connect with different LLMs and agents, including ChatGPT, Claude.AI, Amazon Q, Cursor, and Google Gemini. Some of the MCP servers available in the platform include ones from Atlassian, Box, Reddit, Salesforce, Okta, and Shopify.

“At Workato, we hear every day that while MCP is exciting, enterprises still face challenges making MCP work securely, effectively, and reliably at scale,” said Adam Seligman, Chief Technology Officer at Workato. “Workato Enterprise MCP changes that by bringing the full spectrum of business processes, from the front office to the back office and everything in between, to AI agents through MCP. With pre-built, enterprise-grade servers and skills, we’re giving global enterprises a first-of-its-kind solution that unlocks AI agents to safely execute real business processes at scale, delivering measurable business value.”

VibeSec embeds security analysis into AI coding models to prevent generation of insecure code

OX Security is shifting security as far left as it can go with the launch of VibeSec, which it says can stop insecure AI-generated code before the code even gets generated.

It does this by embedding dynamic security context into the coding model so that it doesn’t suggest code that contains security issues.

“VibeSec doesn’t just accelerate security – it fundamentally changes how security operates. For the first time, security moves faster than vulnerabilities,” said Neatsun Ziv, co-founder and CEO of OX Security.

OutSystems launches Agent Workbench

Agent Workbench allows users to create and orchestrate AI agents that leverage their company’s data sets and workflows. For example, in early access, Axos Bank built a log analysis agent to interpret error logs, and Thermo Fisher Scientific used it to build a Customer Escalation Agent that interprets unstructured data from customer interactions.

“Agent Workbench was created to give our customers the tools they need to build the agentic future with OutSystems. Our Early Access Program participants have realized impressive results with Agent Workbench, positioning them as industry leaders in agentic AI,” said Woodson Martin, CEO of OutSystems.
