My AI Pair Programming Experiment: A Reflection on Efficiency and Creativity
by Teobler on 03/09/2025
The Beginning
I first saw code and AI come together around 2019, when a teammate started using Tabnine. I was amazed by how magical it felt. We'd be pairing and I'd watch him use it: sometimes it was dumb, sure, but a few times it completed an entire function or a whole test in one go. That might seem like nothing now, but back then it was my first real wow moment with AI programming.
Fast forward to the end of 2023, when AI programming was heating up fast. I got GitHub Copilot for free and really started using an AI coding tool for the first time. Copilot looks pretty dated by today's standards, but its completion ability blew me away at the time. Using comments to generate methods or tests became normal, and so did asking it for help with refactoring. And since it's free for me, it's still my backup tool today.
I fell out of touch with AI tools for a while after that. I was lazy, and my company didn't support them for security reasons. So I had no chance to use them in my daily work. But I did try Cursor in my free time and got a taste of being a "Tab Engineer."
Now it’s 2025. Models have gotten much better, and agent capabilities are growing. The competition between AI programming tools is now about user experience and engineering. And the engineering part is mostly about how you manage context for the agent.
Things move so fast. Luckily, I browse Twitter a lot, so I haven't fallen too far behind. But watching from the sidelines isn't the same as getting your hands dirty. Since I'd spent the last year learning about investing, I decided to jump in and build an investment system that fits my habits, all with the help of AI tools.
Finding the Right Tool
I’m calling this comparison of different tools "finding the pig." It’s about the hands-on experience. My takes here might not be perfectly accurate, so feel free to correct me.
Cursor
It's been a while since I last opened Cursor. And wow, everything related to AI moves ridiculously fast. Cursor has added a lot of new features since I last used it. But its core feeling hasn't changed. Cursor feels more like a code modification machine with no emotions.
First off, Cursor completely crushed GitHub Copilot when it appeared. Let's go back to my two points: user experience and engineering. From a user experience standpoint, both live in VSCode (Copilot as a plugin, Cursor as a full fork), but Cursor's customizations push the whole editing flow towards automation. That's a big step up for the user.
And for engineering, Cursor massively improved the context fed to the model. Copilot mostly looks at the current file and a few recently opened ones, while Cursor builds a vector index of the entire codebase. That helps the model locate the right code for a question far more accurately, instead of being limited to whatever file you happen to have open.
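To make the idea concrete, here is a minimal sketch of what repo-wide vector indexing looks like in principle. This is not Cursor's actual implementation: the toy `embed` function, the chunk size, and the file extensions are all placeholders for whatever embedding model and heuristics the real tool uses.

```python
# Minimal sketch of repo-wide semantic indexing (NOT Cursor's implementation):
# chunk every source file, embed each chunk, and answer questions by cosine
# similarity instead of only looking at the currently open file.
from pathlib import Path
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a real tool would call an embedding model here."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

def chunk_file(path: Path, lines_per_chunk: int = 40):
    lines = path.read_text(errors="ignore").splitlines()
    for start in range(0, len(lines), lines_per_chunk):
        yield f"{path}:{start + 1}", "\n".join(lines[start:start + lines_per_chunk])

def build_index(repo: str, exts=(".py", ".ts", ".java")):
    index = []  # (location, chunk_text, embedding)
    for path in Path(repo).rglob("*"):
        if path.suffix in exts:
            for location, text in chunk_file(path):
                index.append((location, text, embed(text)))
    return index

def retrieve(index, question: str, k: int = 5):
    q = embed(question)
    scored = sorted(index, key=lambda item: -float(item[2] @ q))
    return [(loc, text) for loc, text, _ in scored[:k]]

if __name__ == "__main__":
    idx = build_index(".")
    for loc, _ in retrieve(idx, "where do we validate user portfolios?"):
        print(loc)
```

The point is simply that the retrieval step, not the model, decides what the model gets to see.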
As a bonus, users can easily add their own context. Just drag and drop or use the @ symbol. It feels very intuitive.
Now for my subjective feelings. After picking up a few tips, I started using it, and it handled the coding tasks I gave it really well. But verifying the work and following conventions still depend heavily on the user. It just wants to finish the task.
Even though it has project tools like PR reviews and rules, its main focus is still on completing the code change the user asked for. Most of the time, it gives me a "Just tell me if the code is written yet or not" kind of vibe.
Cursor's reputation has been slipping lately. So, even though I haven't used every single feature in-depth, I decided to look for other tools.
Augment
I'm not sure how much Augment borrowed from its predecessors. But the final result is really great. On the engineering side, Augment uploads your entire codebase to the cloud for real-time indexing. They claim it can update in seconds. This gives the model the very latest version of your code.
It takes context a step further than Cursor. They use a purpose-built context model rather than a generic embedding model, and they claim it finds the context most useful for solving the problem at hand, not just the results of a plain semantic search.
Also, Augment's Context Lineage feature folds the whole Git history into the context, giving the model knowledge that has settled over time yet keeps evolving with the code.
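Conceptually, feeding history to the model is not magic. Here is a rough sketch of the idea, assuming nothing about Augment's real pipeline: the helper just shells out to `git log` and staples the recent commits for a file onto the prompt alongside its current contents.

```python
# Rough sketch of the idea behind history-aware context (not Augment's actual
# pipeline): alongside the current code, hand the model the recent commit
# history of the file it is about to touch, so it sees how the code evolved.
import subprocess

def file_history(path: str, max_commits: int = 5) -> str:
    """Collect the last few commits that touched `path`, with their diffs."""
    log = subprocess.run(
        ["git", "log", f"-{max_commits}", "--patch", "--", path],
        capture_output=True, text=True, check=True,
    )
    return log.stdout

def build_prompt(path: str, task: str) -> str:
    current = open(path, encoding="utf-8").read()
    return (
        f"Task: {task}\n\n"
        f"Current contents of {path}:\n{current}\n\n"
        f"How this file got here (recent commits):\n{file_history(path)}"
    )

# Example: the model now sees both the code and the decisions that shaped it.
# print(build_prompt("portfolio.py", "add a rebalancing threshold"))
```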
After all that, my very personal feeling is that Augment "gets me" better than Cursor, even with the same underlying model.
Let's talk about user experience. Augment has many features that solve real pain points. They are impressive.
- Tasklist nudges you to break complex requests down, and the agent itself decomposes big problems into tasks, so you can see what it's working on.
- Next Edit was my next wow moment. It predicts where you'll edit next based on the current context. You hit Tab, and your cursor moves to the right spot with a suggestion. The first time I used it, I thought it was magic.
- Easy MCP lets you integrate popular and useful MCPs and platforms with just a few clicks. It treats your daily workflow as part of the problem to solve.
That's the objective stuff. Subjectively, Augment feels more like an excellent intern. It thinks about the bigger picture of software engineering, not just the code itself.
It integrates easily with different platforms, connects quickly to common MCPs, and actively verifies that its changes are correct. With these pieces, Augment doesn't just help you write code; it builds a workflow: "task -> modify -> verify."
Combined with its CLI/TUI, it doesn't just change code as you ask. It actively uses the terminal to run tests, execute commands, and verify results. It closes the engineering loop within the AI process.
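As a thought experiment, the loop it closes can be sketched in a few lines. This is not Augment's internals; `pytest` and the `propose_change` callback are stand-ins for whatever test runner and editing agent are actually in play.

```python
# Toy version of the "task -> modify -> verify" loop (not Augment's internals):
# after each change the agent proposes, run the test suite and feed any
# failures back until the suite is green or we give up.
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the project's tests; pytest is just a hypothetical choice here."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task: str, propose_change, max_rounds: int = 3) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        propose_change(task, feedback)      # the LLM edits files here
        ok, output = run_tests()            # the tool verifies, not the user
        if ok:
            return True
        feedback = output                   # failures become fresh context
    return False
```

The difference from a pure "code modification machine" is exactly that verification step in the middle.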
Another interesting thing is that every time it adds a feature or fixes a bug, it summarizes the change in a doc in the repo, recording the state at that moment. These docs can get outdated as the code changes, but they do help with the perennial problem of developers not wanting to write documentation.
From all this, you can see that Augment Code is essentially an infrastructure and data company disguised as an AI coding tool. Its core strength comes from its backend system, which is invisible to the user.
This system solves a very hard problem: providing real-time, personalized, and historically-aware context at scale. Their blog posts repeatedly downplay the specific LLM they use. But they spend a lot of time explaining their "Context Engine," "Context Lineage," and custom inference stack.
It's easy to see their strategy. They see the LLM as a replaceable part. The infrastructure that provides context is their real, defensible asset. The value is shifting from the model to the application. And better models will only make their product stronger.
Claude Code
After using Augment for a while, I almost decided to stick with it long-term. But I couldn't resist trying the highly-praised Claude Code. Claude Code's style is completely different from the other two. This is true for both its user experience and its engineering.
From a user experience perspective, it's not an IDE or a plugin. It's a CLI tool. It can directly edit files, run commands, and create commits. It puts the "AI Agent" right into the developer's terminal workflow.
This means the user experience is a step down from an IDE, for sure, but the approach buys amazing compatibility. I've used IntelliJ for a long time, for example, so I can't really use the VSCode-based tools; a CLI doesn't have that problem. Front-end or back-end, whichever editor you prefer, every developer already lives in the terminal, so the tool slots into any workflow.
And speaking of terminals, I have to mention Warp. It provides a user experience that crushes other terminal apps. Its AI features made it even better. But sadly, Warp is totally separate from editors, which makes the development experience feel too disconnected.
Anyway, back to the tool. Unlike Cursor and Augment, which index the whole repository, Claude Code doesn't build an index at all. All context gathering is done by the agent through "Agentic Search," which is basically the agent running simple Unix command-line tools like grep.
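In spirit, that looks something like the sketch below. It's an illustration of the idea rather than Claude Code's source: `ask_model` is a hypothetical stand-in for the LLM deciding which search to run next.

```python
# What "Agentic Search" boils down to in spirit (not Claude Code's source):
# instead of consulting a prebuilt index, the agent keeps issuing plain Unix
# searches and reads the results until it has enough context to act.
import subprocess

def grep(pattern: str, repo: str = ".") -> str:
    """Run a recursive grep with line numbers, like the agent would."""
    result = subprocess.run(
        ["grep", "-rn", pattern, repo, "--include=*.py"],
        capture_output=True, text=True,
    )
    return result.stdout

def agentic_search(question: str, ask_model) -> str:
    """`ask_model` is a stand-in for the LLM choosing the next search step."""
    context = ""
    for _ in range(5):  # a few rounds of searching is usually enough
        next_step = ask_model(question, context)   # e.g. {"grep": "rebalance"}
        if "grep" in next_step:
            context += grep(next_step["grep"])
        else:                                       # model says it has enough
            return next_step["answer"]
    return context
```

No index to build or keep fresh, at the cost of spending tokens on every search round.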
The community has questioned this no-index approach. People worry about high token costs, inaccurate searches, and long wait times. I haven't analyzed the difference in detail. But when I asked Claude Code to write missing tests and fix existing failing ones, it burned through 7 million GLM-4.5 tokens in an hour and a half. I can't say if the problem is Claude Code's strategy or if GLM is just too dumb. :)
Since it's a CLI tool, it works across the entire development pipeline. It can do so much in the terminal. Running commands and tests with bash is easy, of course. It can write a quick script to verify a new feature is complete or a bug is fixed. It can start a service in the background and use various MCPs to check if the code is correct and feasible.
After using it for a while, my feeling is that Claude's own models might be better suited for this "brute-force" solution. Using Sonnet and GLM definitely gave me different coding experiences. But I haven't tested GLM in the other tools, so that's just a guess.
Its only real problem right now might be payment, which can be a hassle. You can pay per token through an API, official or unofficial, but that can easily run to dozens of dollars a day. Or you can subscribe to a plan, though I've seen plenty of people get their accounts suspended seemingly at random. So you'd better have a stable IP and payment method.
Final Thoughts
The explosion of AI programming tools this year is another landmark moment. It's forcing us to change how we've always written code, and with these tools at our side we need to find a new paradigm for the work itself.
I didn't mention this earlier, but these tools are a cheat code in some ways. Still, if you think you can just let the AI handle everything, you're wrong: all those memes about AI-generated "mountains of crap code" will become your reality.
I tried my best to explore each tool's capabilities while building my personal project. But I have to admit I didn't spend much time digging into Cursor. Subjectively, I think the three tools have different focuses: Cursor is more like a "code modification machine with no emotions." Its value is in quickly making the changes a developer asks for. The in-IDE experience is smooth, but it lacks a broader engineering mindset. It's perfect for quickly adding new features to an existing architecture.
Augment tries to look at the whole software engineering process. It uses full-repo indexing, a task-driven approach, and a CLI/TUI to connect the "modify—verify—commit" loop. It's like an energetic and capable intern helping you move the project forward.
And Claude Code completely breaks free from the editor. It's centered around the CLI, offering total flexibility and transparency across the whole pipeline. It’s more like a junior engineer you can delegate tasks to.
The development of AI programming has moved from "code completion" to "agents." Now that everyone can use the same models, the competition is no longer just about model capability. It’s about how to integrate intelligence into a developer's real workflow.
Just like that feeling of amazement when I first saw Tabnine's completions. Maybe we are at the beginning of another wow moment right now. Perhaps in a few years, we'll look back and see Cursor, Augment, and Claude Code as just transitional tools. But for us today, they are already reshaping the act of writing code.
Reference
- https://www.augmentcode.com/blog
- https://www.anthropic.com/engineering/claude-code-best-practices