




Over the past month, there were multiple reports that Claude Code responses had worsened for some users. These reports were taken seriously and investigated in detail. The initial assumption was that the model itself had not changed, since the API and inference layer remained unaffected.
After investigation, the issue was traced to three separate changes. These changes affected different parts of the system, including Claude Code, the Claude Agent SDK, and Claude Cowork. Each change was introduced at a different time and impacted different sets of users.
Because of this, the overall experience did not degrade in a uniform way. Some users noticed weaker responses, while others saw issues with memory or consistency. Early on, these signals were difficult to separate from normal variation in feedback, and internal evaluations did not immediately reproduce the same problems.
All three issues have now been identified and fixed as of April 20. The purpose of this breakdown is to clearly explain what changed, how those changes affected behavior, and what has been done to address them.
The issue did not come from a single change. It emerged from a series of updates introduced over time, each affecting a different part of the system. These changes did not roll out together, and they did not impact all users in the same way.
Because of this, the feedback was inconsistent. Some users reported shorter or less detailed responses. Others noticed issues with memory, repetition, or continuity across sessions. In many cases, the problem only appeared under specific conditions, which made it harder to trace.
This is also why early discussions around the issue, including in recent Claude Code news, did not point to a clear root cause. The signals looked scattered and did not immediately indicate a system-level problem.
Internal evaluations added to the challenge. The issues were not easy to reproduce in controlled testing, and early signals appeared similar to normal variation in user feedback. It took time to connect these reports and identify that multiple changes were interacting in ways that were not obvious during rollout.

In early March, the default reasoning effort in Claude Code was changed from high to medium. This setting controls how much time the model spends thinking through a task before responding. In general, more reasoning leads to better outputs, but it also increases latency and token usage.
The change was introduced to address a real issue. Some users experienced long response times in high-effort mode, and in certain cases the interface appeared unresponsive while the model was still processing. Moving to medium effort reduced these delays and improved overall responsiveness.
Internal testing supported this decision. Medium effort showed only a slight drop in intelligence for most tasks, while significantly improving speed and reducing the likelihood of long delays. It also helped users stay within their usage limits more efficiently.
However, user feedback after the rollout told a different story. Many users felt that responses were less thoughtful and less reliable, especially for complex tasks. While the option to switch effort levels was available, most users continued using the default setting.
In response to this feedback, the decision was reversed. As of April 7, higher reasoning effort was restored as the default, with even higher settings available for more demanding use cases.
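The effort setting described above can be thought of as a preset that maps to a thinking-token budget. A minimal sketch of that idea follows; the preset names and numbers are assumptions for illustration, not Claude Code's actual configuration.

```python
# Illustrative mapping from effort level to a thinking-token budget.
# These values are hypothetical, not Claude Code's real internals.

EFFORT_PRESETS = {
    "low": 1_000,     # fastest responses, least deliberation
    "medium": 4_000,  # the default introduced in early March
    "high": 16_000,   # the default restored on April 7
}

def thinking_budget(effort: str = "high") -> int:
    """Tokens the model may spend reasoning before it starts answering."""
    try:
        return EFFORT_PRESETS[effort]
    except KeyError:
        raise ValueError(f"unknown effort level: {effort!r}") from None
```

The tradeoff the article describes lives in this one number: a larger budget tends to improve answers on complex tasks but increases latency and token consumption, which is why the default matters so much for users who never change it.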
Claude Code relies on prior reasoning to stay consistent across a session. Each step builds on earlier decisions, which allows the system to maintain continuity in longer tasks.
On March 26, a change was introduced to improve efficiency when users returned to sessions after a period of inactivity. The idea was to clear older reasoning from sessions that had been idle for over an hour. This would reduce the number of tokens sent in the next request and improve performance when resuming work.
The approach itself was valid, but the implementation had a flaw. Instead of clearing older reasoning once, the system continued to remove past reasoning on every turn after the session crossed the idle threshold. This meant that Claude gradually lost access to its own prior decisions.
In practice, this showed up as forgetfulness and repetition. The system would continue responding, but without a clear understanding of what it had already done. In some cases, even ongoing tasks lost consistency because earlier reasoning was no longer available.
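The flaw can be reproduced with a toy session model. In this sketch (all names are illustrative, not the actual implementation), the buggy path prunes reasoning on every turn once the idle flag is set, while the fixed path prunes exactly once:

```python
IDLE_THRESHOLD_S = 3600  # sessions idle for over an hour trigger the cleanup

class Session:
    """Toy model of a session history; names and structure are hypothetical."""

    def __init__(self, fixed: bool):
        self.history = []         # interleaved reasoning and answer blocks
        self.crossed_idle = False
        self.cleared = False
        self.fixed = fixed        # True models the April 10 fix

    def resume(self, idle_seconds: int):
        if idle_seconds > IDLE_THRESHOLD_S:
            self.crossed_idle = True  # flag is never reset: the root of the bug

    def turn(self, reasoning: str, answer: str):
        # Buggy behavior: once the session crossed the idle threshold, this
        # prune ran on *every* turn, discarding the previous turn's reasoning.
        # Fixed behavior: prune only once, on the first turn after resuming.
        if self.crossed_idle and (not self.fixed or not self.cleared):
            self.history = [m for m in self.history if m["type"] != "reasoning"]
            self.cleared = True
        self.history.append({"type": "reasoning", "text": reasoning})
        self.history.append({"type": "answer", "text": answer})
```

Replaying a few turns around a resume, the buggy session ends each turn holding only that turn's own reasoning, while the fixed session keeps everything produced after the one-time cleanup, which matches the forgetfulness and repetition users reported.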
This issue also had a secondary impact. Since reasoning history was repeatedly dropped, more requests resulted in cache misses. This likely contributed to reports of usage limits being consumed faster than expected.
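The cache effect follows from how prompt caching generally works: a hit requires the new request to begin with an exact, previously seen token prefix. A minimal sketch of that prefix-matching rule (a toy model, not Anthropic's actual cache):

```python
# Toy prefix cache: a lookup hits only when the request replays a cached
# prefix verbatim. Removing reasoning blocks rewrites the prefix, so
# subsequent requests miss and are recomputed in full.

def cache_lookup(cache, prompt_tokens):
    """Return the length of the longest cached prefix matched; 0 is a full miss."""
    best = 0
    for prefix in cache:
        if prompt_tokens[:len(prefix)] == list(prefix) and len(prefix) > best:
            best = len(prefix)
    return best
```

With a cached conversation of system prompt, user turn, reasoning, and answer, a follow-up that keeps the reasoning matches the whole prefix, while the same follow-up with the reasoning stripped matches nothing and pays full price again.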
The problem was difficult to detect because it appeared only in specific scenarios, such as sessions that had been idle and then resumed. It also interacted with other internal changes, which made reproduction harder during testing.
The issue was fixed on April 10. This update addressed the repeated clearing of reasoning and restored normal context handling.
Ahead of the release of a newer model version, the team worked on reducing response verbosity. The newer model tended to produce longer outputs, which increased token usage and affected response time in some cases.
To manage this, a system-level instruction was introduced. It set limits on how long responses should be, both during intermediate steps and in final outputs. The intention was to keep responses concise while maintaining quality.
Initial testing did not show major issues. The change appeared to control output length without affecting performance in the evaluation set that was used at the time.
However, broader testing during the investigation revealed a different outcome. The constraint limited how much the model could explain its reasoning, especially in coding tasks. Responses became shorter but also less complete and less useful for complex problems.
This impact was measurable. Further evaluation showed a drop in performance, which led to a decision to revert the change.
The prompt constraint was removed on April 20. This restored the model’s ability to provide more detailed and complete responses where needed.
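The exact wording of the removed instruction has not been published, but a system-level length constraint of this kind would look roughly like the following hypothetical reconstruction (N and M are placeholders, not known values):

```text
Keep your responses concise. Intermediate explanations should stay
under N tokens and final answers under M tokens, regardless of task.
```

A blanket cap of this shape matches the observed failure mode: it binds hardest exactly where extended explanation is most valuable, such as multi-step coding tasks.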

Each of these changes was introduced at a different time and affected different parts of the system. They did not impact all users in the same way, and they did not appear together in a single, clear pattern. This made the overall experience difficult to interpret.
Some users experienced reduced reasoning depth due to the change in effort settings. Others ran into context loss in longer or resumed sessions. In parallel, response length constraints affected how clearly the system could explain its outputs. Depending on how and when someone used the product, they might encounter one, two, or all of these issues.
This overlap created the impression that the system as a whole had become less reliable. The behavior felt inconsistent because the underlying issues were not uniform. One session could work as expected, while another could show noticeable gaps in reasoning or continuity.
This is why much of the discussion around this period, including broader Claude AI news, pointed to a general decline in quality. In reality, the model itself had not changed. The perception of degradation came from how these system-level changes interacted in real usage.
The team has outlined a set of changes to reduce the chances of similar issues in the future. These changes focus on how updates are tested, validated, and rolled out across the system.
A larger share of internal teams will now use the same production build that users interact with. This helps surface issues earlier, especially those that may not appear in isolated testing environments. The code review process is also being improved, with better support for evaluating changes across full repositories and not just limited contexts.
There will be stricter controls around system prompt updates. Each change will go through broader evaluation, including targeted testing to understand how individual prompt instructions affect performance. This is important because small prompt changes can have wider impact than expected.
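One way to implement that kind of targeted testing is per-instruction ablation: score the full system prompt, then re-score it with each instruction removed. The sketch below assumes a stand-in scorer; `run_eval` and all other names here are hypothetical, not a description of Anthropic's tooling.

```python
def ablate(instructions, run_eval):
    """Score the full prompt, then measure each instruction's marginal effect.

    run_eval stands in for a real evaluation harness: it takes an
    assembled system prompt and returns a benchmark score.
    """
    baseline = run_eval("\n".join(instructions))
    deltas = {}
    for i, inst in enumerate(instructions):
        trimmed = instructions[:i] + instructions[i + 1:]
        # A positive delta means the eval improves when this instruction is
        # removed, flagging the instruction for closer review.
        deltas[inst] = run_eval("\n".join(trimmed)) - baseline
    return baseline, deltas
```

Run against a real evaluation set, a loop like this would have surfaced the verbosity constraint's cost before rollout rather than after.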
Rollouts will be more gradual, with additional monitoring before changes are fully deployed. This includes longer validation periods and expanded evaluation coverage to catch issues that may not appear immediately.
These adjustments focus on reducing blind spots in testing and ensuring that system-level changes are better aligned with real-world usage.
This situation highlights a pattern that shows up in many real-world AI systems. The model may remain unchanged, but the system around it can still affect how it performs in practice.
Changes to reasoning depth, context handling, or response structure can influence output quality in ways that are not always visible during testing. These effects often appear only when the system is used across longer sessions, varied tasks, or real user workflows.
If you are building AI-driven products, this raises a practical question: are you evaluating only the model, or the full system that shapes its behavior?
In production environments, decisions around efficiency, cost, and responsiveness need to be balanced carefully. Reducing context, limiting reasoning, or constraining outputs may solve one problem but can introduce others that affect reliability.
This is where a data-first approach becomes important. Complete and consistent context allows the system to maintain continuity. Clear reasoning pathways help preserve decision quality. Without these, even a strong model can appear inconsistent.
The key takeaway is simple. Model capability alone does not define system performance. How the system manages context, reasoning, and constraints plays an equally important role.

