Why I Paid for Copilot Premium — And It Was Confidently Bad

Microsoft is pouring an immense amount of resources into its artificial intelligence features, establishing vast data centers and securing licenses for large language models from industry leaders like OpenAI and Anthropic. The ultimate goal, as dictated from the highest echelons of Redmond, is to transform Windows and Microsoft 365 into an “agentic OS.” This sophisticated system aims to automate the tedious tasks that often plague corporate life, from drafting memos and building presentations to organizing meetings and streamlining routine operations.

While developers seem to be enjoying significant productivity boosts from tools such as Claude Code and GitHub Copilot, the AI agents designed for business applications appear to be lagging in competence. Over the past few weeks, I’ve put the AI features in Microsoft 365 and Windows to the test with various everyday work tasks. My experience revealed that while Copilot occasionally exhibits flashes of brilliance, it more frequently delivers a chaotic mix of misinformation, outright hallucinations, and frustrating dead ends.

Putting Premium Copilot Agents to the Test

For months, Microsoft had been nudging me to upgrade to its new Microsoft 365 Premium plan, which promises higher AI usage limits and exclusive agents. In the spirit of journalistic inquiry, I invested $10 to upgrade an unused account for a month to experience these features firsthand. My first trial involved the Analyst agent, to which I provided a spreadsheet tracking my household income and expenses, requesting advice on design improvements.

After a bit of back-and-forth clarifying my objectives, the agent offered several valuable suggestions. These included tightening up formulas, consolidating duplicate tables, and eliminating redundant pages, culminating in a bold offer to construct a dashboard using only formulas and pivot tables. Copilot declared, “If you want, I can sketch a clean dashboard layout (exact cells and sections) tailored to your data so you can build it in ~15 minutes.”

I was under the impression that these agents were supposed to perform the work for me, not just sketch it out. So, I explicitly asked, “Can you build the actual Excel file for me or do I have to do that myself?” Copilot confidently responded in the affirmative, noting only one minor limitation: I’d have to create one pivot myself, a task it assured me would take “less than 10 seconds.” A minute or two later, Copilot announced, “I’ve created your modified workbook. Download it here,” followed by a cryptic and utterly unusable link: [sandbox:/mnt/data/Personal_accounts_modified.xlsm].

When I pointed out the non-clickable “sandbox path,” Copilot admitted its error: “Yeah — that’s on me… I’m going to regenerate the file and make sure it comes through as a real downloadable attachment.” After several more failed attempts, Copilot conceded defeat, stating, “The file is ready. However, I need to be transparent: your chat interface is currently not rendering downloadable file attachments correctly… I did generate the file successfully, but it isn’t appearing as a clickable download in your UI.” The chatbot even suggested that the link might have worked in ChatGPT or, astonishingly, that I could create the file in Google Sheets and send myself a link, a workaround that surely isn’t part of Microsoft’s grand vision.

The Research Agent’s Confusion

Next, I tasked the Microsoft 365 Premium Researcher agent with providing a concise explanation of the pros and cons of Microsoft 365 Premium. I was completely unprepared for its response: “To make sure I cover exactly what you need: Which specific plan do you mean by ‘Microsoft 365 Premium’? A) Microsoft 365 Personal, B) Microsoft 365 Family, C) Microsoft 365 Business Premium, or D) compare the consumer plans (Personal/Family) and briefly note Business Premium only if relevant.”

It was baffling. I was engaging with a premium feature of a product Microsoft is actively promoting, yet the agent had no clue what I was referring to. It felt akin to a job candidate responding to “Tell me about yourself” with “I don’t know her.” Only after I provided a link to the product page did Copilot deliver a rather generic summary of the new subscription’s features, compiled from a handful of third-party sources. This hardly qualified as “research,” and it certainly lacked any depth.

A Vibe-Sysadmin Fails

One of the most amusing, and often frustrating, aspects of AI chatbots is their unwavering confidence, even when they’re demonstrably wrong. Whenever I challenged Copilot’s non-working instructions, it simply moved on to new, equally confident (and often equally flawed) suggestions. This morning, I hit a “server name on the certificate is incorrect” error while trying to connect to a computer on my office network using Remote Desktop. After some initial troubleshooting on my own, I decided to enlist Copilot’s help to “vibe-sysadmin” my way through it.

Copilot confidently declared, “The fix is straightforward,” explaining I merely needed to force Windows inside the VM to generate a new Remote Desktop certificate. It then provided “clean, reliable ways to do it,” which, predictably, did not work. Undaunted, Copilot interpreted this failure as “meaningful,” rattling off three likely reasons and concluding with, “Let’s fix it cleanly and surgically.” After a series of PowerShell commands and a reboot, I was still unable to connect, albeit now with a different certificate error. “Ah — that tells me exactly what’s happening now,” Copilot proclaimed, following another lengthy explanation with, “Let’s fix that cleanly.”

This cycle continued for about 20 minutes and half a dozen reboots of the virtual machine. Each failure sparked a new “AI epiphany” from Copilot, invariably accompanied by bold headings such as “Why I’m confident this is the right path,” “Why this is the correct fix,” and “Why this is the only explanation left.” For all that confidence, Copilot remained stubbornly wrong: none of its suggested fixes worked. I finally told it to stop, re-examined the connection settings myself, and simply cleared one checkbox. That was it.

While I did pick up a few PowerShell commands for managing certificates and refreshed my understanding of how Windows handles them, the primary lesson learned was a stern warning against seeking Copilot’s assistance for this level of troubleshooting. Perhaps someday Copilot will achieve artificial general intelligence. For now, I’d happily settle for artificial general common sense, a destination that currently seems many stops away.

Source: ZDNet – AI

Kristine Vior

With a deep passion for the intersection of technology and digital media, Kristine leads the editorial vision of HubNextera News. Her expertise lies in deciphering technical roadmaps and translating them into comprehensive news reports for a global audience. Every article is reviewed by Kristine to ensure it meets our standards for original perspective and technical depth.
