January 2026: AI updates from the past month

Coder creates AI Maturity Self-Assessment and AI Maturity Curve

These new tools will enable software development teams to assess how effectively they have adopted AI. The assessment asks teams questions such as how standardized their developer environments are, what their governance approach to AI is, how they handle risks like sensitive data exposure, and more.

“As AI agents take on more responsibility inside engineering workflows, organizations need a clearer, more tangible way to understand maturity and governance readiness,” said Eric Paulsen, field CTO at Coder. “Without that baseline, it becomes difficult to scale agentic AI safely or predictably. Our self-assessment gives teams a concrete view of where they stand, so they can plan adoption intentionally, manage risk and scale with confidence.”

Anthropic makes tools within Claude interactive

Anthropic has announced that users will now be able to directly interact with certain tools within Claude.

Claude could already connect to tools and take actions in them on a user's behalf. What's new is that users can now interact with those tools directly in the Claude window.

The tools that currently support interactivity include Amplitude, Asana, Box, Canva, Clay, Figma, Hex, monday.com, and Slack, and there are plans to soon add support for Salesforce as well.

OpenAI will retire GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT

The company initially deprecated GPT-4o when it released GPT-5, but brought it back after users said they needed more time to transition and preferred the older model's conversational style and warmth.

OpenAI has since incorporated that feedback into GPT-5.1 and GPT-5.2 by making personality improvements, offering greater support for creative ideation, and adding more ways to customize responses.

“We’re announcing the upcoming retirement of GPT‑4o today because these improvements are now in place, and because the vast majority of usage has shifted to GPT‑5.2, with only 0.1% of users still choosing GPT‑4o each day,” OpenAI wrote in a post.

Teleport tackles agentic trust with new Agentic Identity Framework

Teleport has announced the launch of its new Agentic Identity Framework that defines policies, practices, developer tools, and a reference architecture for securely deploying agents in production.

According to the company, agentic AI introduces new security challenges because agents invoke tools, access sensitive data, delegate tasks, and operate across environments at scale, all without human involvement.

Teleport says current identity, access, and security models weren't designed for non-deterministic systems, and that attempts to deploy agentic systems so far have led to identity fragmentation, secrets sprawl, limited visibility, and systemic risk.

The Agentic Identity Framework attempts to solve these issues by establishing an identity layer that is secured cryptographically with a hardware root of trust. It enables zero trust authentication, zero standing privileges, and real-time visibility into identity behavior.
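To make "zero standing privileges" concrete: the idea is that an agent holds no long-lived credentials, and instead receives a narrowly scoped grant for a single task that expires on its own. The sketch below illustrates only that general pattern. The `GrantBroker` and `Grant` names, the policy format, and the default TTL are hypothetical and are not Teleport's API, and the sketch does not attempt the cryptographic identity or hardware root of trust the framework describes.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical short-lived, narrowly scoped credential for one task."""
    token: str
    agent_id: str
    scope: frozenset
    expires_at: float

    def allows(self, action: str, now: float | None = None) -> bool:
        # Valid only for actions in scope and only until the grant expires.
        now = time.time() if now is None else now
        return action in self.scope and now < self.expires_at


class GrantBroker:
    """Hypothetical broker that issues just-in-time grants, so agents
    keep no standing credentials between tasks."""

    def __init__(self, policy: dict[str, set[str]], ttl_seconds: float = 300.0):
        # policy maps an agent ID to the maximum set of actions it may request.
        self.policy = policy
        self.ttl = ttl_seconds

    def issue(self, agent_id: str, requested: set[str], now: float | None = None) -> Grant:
        now = time.time() if now is None else now
        allowed = self.policy.get(agent_id, set())
        wanted = frozenset(requested)
        # Refuse anything outside the agent's policy rather than trimming it.
        if not wanted <= allowed:
            denied = sorted(wanted - allowed)
            raise PermissionError(f"{agent_id} may not request {denied}")
        return Grant(
            token=secrets.token_urlsafe(16),
            agent_id=agent_id,
            scope=wanted,
            expires_at=now + self.ttl,
        )
```

A build agent allowed `{"repo:read", "ci:run"}` that requests only `{"ci:run"}` gets a grant that cannot read the repository and stops working once its TTL elapses. Scoping each grant to what the task asked for, rather than to everything the policy permits, is what keeps privileges from accumulating.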

Apiiro announces Guardian Agent

Guardian Agent rewrites developer prompts to make them more secure and to ensure they align with the current needs of the software architecture, runtime environments, organizational policies, and regulatory requirements.

According to the company, AI is adding security debt faster than it can be fixed, so asking developers to fix vulnerabilities after code is written is no longer sufficient. "The reality is clear: Detection will never scale at the speed of AI. Only prevention will," the company wrote in a blog post.

Ai2 releases Open Coding Agents

Open Coding Agents is a family of open agents built with a training method that makes it easier for developers to build their own coding agents trained on their internal codebases.

The first release is SERA (Soft-verified Efficient Repository Agent), which uses a fine-tuning method that can be specialized to any codebase. The company is also releasing SERA’s training data to help researchers study what worked and improve on it.

“Accessible open models can now inherit strong agentic behavior through a simple, reproducible pipeline—no large-scale RL infrastructure or engineering team required. Case in point, SERA was built largely by a single Ai2 researcher,” Ai2 wrote in a blog post.

Rocket Software launches AI assistant for operational diagnostics

Rocket EVA allows teams to ask questions about their core systems and trace issues from initial symptoms to system interactions to the responsible code. It also provides recommendations to help teams resolve issues more quickly.

“By tracing issues from the first symptom to the exact line of code, EVA provides a unified path to insight without the multi-product complexity other vendors require,” said Michael Curry, president of data modernization at Rocket Software. “Its ability to extend diagnostics across platforms and integrate third-party MCP tools reduces the time to resolve issues, setting a new benchmark for how enterprises maintain resilient, high‑performing systems.”

Report: AI hallucinates 27% of upgrade recommendations for open source projects

Open-source adoption is being accelerated by AI and automation, but developers need to proceed with caution to ensure they’re not introducing extra risk into their software supply chain.

Brian Fox, co-founder and CTO of Sonatype, explained that AI can accelerate good engineering, but it can also scale mistakes faster, especially when it lacks real-world data to draw on. For example, if a model doesn't know which versions exist or which ones have vulnerabilities, it guesses to fill in the blank, which can lead to upgrades to versions that don't exist or to recommendations that break builds.

In its 2026 State of Software Supply Chain report, Sonatype analyzed over 1.2 million malicious packages, 1,700 vulnerability records, and 37,000 AI-driven upgrade recommendations. It found that AI models recommended over 10,000 non-existent versions, a hallucination rate of 27.75%.
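The failure mode Fox describes can be caught with a simple guard: before applying an AI-suggested upgrade, check that the suggested version actually appears in the registry's list of published versions. The sketch below assumes that list has already been fetched into a dictionary; the `Recommendation` type, the helper names, and the package names `libfoo` and `libbar` are hypothetical, and this is not Sonatype's methodology. The rate it computes is the same ratio the report uses: non-existent versions divided by total recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """A hypothetical AI-suggested upgrade for one dependency."""
    package: str
    suggested_version: str


def split_recommendations(recs, published):
    """Partition recommendations into (valid, hallucinated).

    `published` maps each package name to the set of versions the registry
    actually lists; how that data is fetched is outside this sketch.
    """
    valid, hallucinated = [], []
    for rec in recs:
        known = published.get(rec.package, set())
        if rec.suggested_version in known:
            valid.append(rec)
        else:
            hallucinated.append(rec)
    return valid, hallucinated


def hallucination_rate(recs, published):
    """Share of recommendations that point at versions that do not exist."""
    if not recs:
        return 0.0
    _, hallucinated = split_recommendations(recs, published)
    return len(hallucinated) / len(recs)


# Hypothetical packages and versions, for illustration only.
published = {
    "libfoo": {"1.4.0", "1.5.0", "1.5.1"},
    "libbar": {"2.0.0", "2.1.0"},
}
recs = [
    Recommendation("libfoo", "1.5.1"),  # published
    Recommendation("libfoo", "1.6.0"),  # never published
    Recommendation("libbar", "2.1.0"),  # published
    Recommendation("libbar", "3.0.0"),  # never published
]
valid, hallucinated = split_recommendations(recs, published)
print([r.suggested_version for r in hallucinated])  # ['1.6.0', '3.0.0']
print(hallucination_rate(recs, published))          # 0.5
```

An existence check like this only filters out versions that were never published; it says nothing about whether a real version is vulnerable or compatible, which would require the kind of vulnerability and build data the report says models often lack.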

GitHub Copilot SDK now in technical preview

The SDK allows developers to embed agentic capabilities into their applications using the same execution loop that powers the GitHub Copilot CLI. The SDK repository includes setup instructions, starter examples, and SDK references for all of the supported languages.

GitHub recommends starting by defining a single task, such as updating files or running a command, and letting Copilot plan and execute steps while the application supplies domain-specific tools and constraints.

Anthropic drafts new constitution for Claude models

The constitution is Anthropic's vision for Claude's values and behavior. The main sections in this updated version cover helpfulness, ethics, safety, Claude's nature, and guidelines for handling specific issues, such as medical advice or cybersecurity requests.

“The constitution is a crucial part of our model training process, and its content directly shapes Claude’s behavior. Training models is a difficult task, and Claude’s outputs might not always adhere to the constitution’s ideals. But we think that the way the new constitution is written—with a thorough explanation of our intentions and the reasons behind them—makes it more likely to cultivate good values during training,” Anthropic wrote.

OpenAI adds age prediction to ChatGPT

OpenAI announced that it will use age prediction technology on ChatGPT consumer plans to predict whether a user is under 18.

“Age prediction builds on protections already in place. Teens who tell us they are under 18 when they sign up automatically receive additional safeguards to reduce exposure to sensitive or potentially harmful content. This also enables us to treat adults like adults and use our tools in the way that they want, within the bounds of safety,” OpenAI wrote in a post.

GitLab’s Duo Agent Platform is now generally available

GitLab has made its Duo Agent Platform generally available, providing development teams with agentic AI automation that has access to an organization’s full context, standards, and guardrails.

The GA release includes Agentic Chat, which provides context-aware assistance throughout the GitLab platform. Agentic Chat builds on the previously released Duo Chat, brings in context from issues, merge requests, pipelines, security findings, and more, and can perform actions on a developer's behalf.

For example, in the Web UI, Agentic Chat can create issues, epics, and merge requests, highlight key findings, and create actionable guidance based on organizational context. In the IDE, it can generate code, configurations, and infrastructure-as-code, fix bugs, generate text, and produce documentation.

Agentic Chat can also help developers understand, configure, troubleshoot, or create CI/CD pipelines. On the security front, it can explain vulnerabilities, help prioritize issues, and recommend fixes.

Codenotary updates its free SBOM scanning tool with capabilities that better support AI apps

Codenotary is adding new capabilities to its SBOM.sh service, which provides free analysis of software bills of materials (SBOMs).

According to the company, the updates were designed with AI applications in mind, and the tool now treats datasets as software supply chain artifacts.

“Traditional SBOM tools were built for an earlier era – focusing primarily on source code to improve visibility into the software supply chain,” said Moshe Bar, CEO and co-founder of Codenotary. “Security teams are swimming in SBOMs, but they’re not getting the actionable clarity they need — especially as AI transforms software with AI applications are built on datasets which are entirely ignored by traditional SBOMs.”

Testlio launches new AI-powered QA analysis solution

Testlio has announced the release of a new AI-driven QA analysis solution called LeoInsights.

The new platform is powered by the company's intelligence layer, LeoAI Engine, which was trained on 13 years of testing data, more than 2.6 million test cases, and more than 600,000 devices.

It can provide executive summaries of key changes, emerging risks, and critical issues, condensing multiple QA reports into a single report that can be shared with leadership.

LeoInsights also offers a value calculator that quantifies efficiency gains, cost savings, and quality impact, helping QA teams better demonstrate their value to leadership. The calculator can aggregate data across workspaces, do scenario modeling with adjustable inputs, and generate PDFs that can be shared with executives for budgeting and investment discussions.

New Relic adds monitoring for ChatGPT apps

New Relic customers will now be able to monitor their custom ChatGPT apps to ensure they’re delivering the intended performance, reliability, and user experience.

“Bringing business services into the natural flow of a ChatGPT conversation is a powerful, intuitive, and revenue-generating strategy,” said Brian Emerson, chief product officer of New Relic. “But once your carefully crafted application instantiates inside ChatGPT, it traditionally enters a black box where standard browser monitoring tools can fail.”

The company went on to explain that when an app is rendered in a conversation, developers can't see problems like layout shifts or broken buttons. Additionally, security headers, content security policies, iframe sandbox rules, and limits on client-side storage can hide important performance and user experience data.

Google unveils new open-source standard for agentic commerce

Google has announced a new open-source standard for agentic commerce called the Universal Commerce Protocol (UCP).

Developed in collaboration with a number of commerce companies, including Shopify, Etsy, Wayfair, Target, and Walmart, UCP establishes a common language and primitives for the commerce journey between consumer surfaces, businesses, and payment providers.

“As consumers embrace conversational experiences, they expect seamless transitions from brainstorming and research to final purchase. That means it’s critical to support real-time inventory checks, dynamic pricing, and instant transactions, all within the user’s current conversational context,” Google wrote in a blog post.

Newly redesigned Slackbot is now generally available

Salesforce announced that the newly redesigned Slackbot is now generally available, offering users an out-of-the-box AI agent that lives within Slack.

“By bringing the full power of the Agentic Enterprise where billions of workplace conversations already happen every week, working with enterprise-grade AI becomes as natural as talking to a coworker,” Salesforce wrote in an announcement.

According to Salesforce, Slackbot leverages context within Slack and connected tools to help find answers, organize work, create content, schedule meetings, and take action.

Kaggle introduces Community Benchmarks to allow for custom evaluations of AI models

Kaggle has announced that it now offers Community Benchmarks, enabling AI practitioners to design, run, and share their own benchmarks for evaluating AI models.

Kaggle is a community platform run by Google that offers models and resources for data scientists and machine learning practitioners. Last year, it introduced Kaggle Benchmarks to provide evaluations from research groups, such as Meta's MultiLoKo and Google's FACTS benchmark suite.

The latest announcement extends this capability to the broader community, allowing practitioners to create benchmarks specific to their own use cases. According to Google, AI capabilities are evolving so quickly that existing approaches to benchmarking and evaluation can't keep up. With Community Benchmarks, the company hopes to bridge this gap and provide a more flexible and transparent framework for evaluation.

Copilot Studio Extension now available in VS Code

Microsoft has announced the general availability of its Copilot Studio Extension for Visual Studio Code.

The extension allows developers to build and manage Copilot Studio agents directly from within their IDE.

According to Microsoft, the extension is useful because developers need the same controls and processes when building agents as they have for other applications: source control, pull requests, change history, and repeatable deployments.

Box Extract intelligently pulls information from unstructured content to help with workflow automation

Box announced the launch of Box Extract, which intelligently pulls information from content and saves it as metadata, helping organizations automate workflows and accelerate decision-making by making information more easily accessible.

According to the company, much organizational knowledge lives in contracts, product specifications, policy documents, charts, and other types of unstructured content. Box Extract uses agentic capabilities and AI models from Google, Anthropic, and OpenAI to accurately extract this information.

Box explained that legacy tools often focus only on extracting text, whereas Box Extract understands document structure and meaning. It breaks the document down into components like paragraphs, tables, and charts, and then pulls out important information from those components.

Google releases TranslateGemma

TranslateGemma is a suite of open translation models built on Gemma 3. The models were trained and evaluated on 55 language pairs, and were also trained on nearly 500 additional language pairs that have not yet been evaluated but are intended as a starting point for researchers.

According to Google, TranslateGemma significantly reduces error rates in translation compared to baseline Gemma models alone.

The 4B model is optimized for mobile and edge deployment, the 12B model is optimized for consumer laptops, and the 27B model is designed for maximum fidelity and can run on a single H100 GPU or a TPU in the cloud.