A skeptic's review of Cursor

I recently wrote a blog post where I criticized the use of AI coding assistants on the premise that coding is mostly about thinking and not actually much about writing. Most of the criticisms in response to the post were some variation on this idea: "If you would just use an AI coding assistant then their value would be obvious."

While I have used coding assistants in the past, I understand that it's a rapidly evolving area with a lot of money being poured into it. A lot of proponents seem to like Cursor, so in this post I'm going to use it to explore a few typical scenarios where a coding assistant might be useful.

Basic Editing

The first thing that I tried with Cursor was basic editing operations like "Remove the line following the third line." I know that AI basically works based on probabilistic pattern matching, and I wanted to set a baseline for whether Cursor can do simple things reliably without introducing extra noise or doing something that sounds like the right thing but isn't. There were also a few people who argued that when they use coding assistants, they do their own thinking and only use the assistant to speed up editing code.

Obviously, this kind of editing isn't particularly useful, but it's worth noting that Cursor did well on these tests. While I don't really know too much about it, I suspect that Cursor has some correction and/or post-processing layers to make simple tasks have more deterministic outputs (we'll see later how Cursor loses determinism for some other kinds of simple tasks that require a little bit of thought).
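For comparison, an edit like "Remove the line following the third line" is also easy to pin down deterministically. Here is a minimal sketch in TypeScript; the `removeLineAfter` helper is a hypothetical name used only for illustration and says nothing about how Cursor implements edits:

```typescript
// Removes the line that follows the given 1-based line number.
// If there is no following line, the text is returned unchanged.
function removeLineAfter(text: string, lineNumber: number): string {
    const lines = text.split("\n");
    // lineNumber is 1-based, so the following line sits at index lineNumber.
    if (lineNumber >= 1 && lineNumber < lines.length) {
        lines.splice(lineNumber, 1);
    }
    return lines.join("\n");
}

// "Remove the line following the third line."
const edited = removeLineAfter("one\ntwo\nthree\nfour\nfive", 3);
console.log(edited.split("\n").length); // prints 4
```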

Basic Refactoring

The next thing that I tried with Cursor was doing simple refactorings. This time, I was still checking for reliability in things that would be simple for a human to do, but also thinking about utility: big refactorings that require making many edits can be pretty time-consuming.

I wrote a couple of toy TypeScript files and asked Cursor to do some basic conversions, like changing arrow functions to regular functions and back. Cursor also did really well with this, even converting functions in interfaces and classes. These were functions like:

export const sayHappyBirthday = () => {
    console.log("Happy birthday!");
};

export const square = (n: number) => n * n;

Now, does this have value? Obviously, there's some value in being able to automate simple refactorings, but there are many other tools for doing the same thing. For example, changing arrow functions to regular functions and back is a pretty easy thing to make a macro for in Vim. So, it feels like maybe Cursor isn't all that useful here, but obviously there are limits to what we can do with preexisting macros and automated refactorings. Let's up the difficulty a bit more and see if we can get a clear win for Cursor.
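For reference, the regular-function forms of the two examples above would look roughly like this. This is a hand-written reconstruction of the target shape, not Cursor's literal output, and the explicit return types are an addition for clarity:

```typescript
// Regular-function equivalents of the two arrow functions shown above.
export function sayHappyBirthday(): void {
    console.log("Happy birthday!");
}

export function square(n: number): number {
    return n * n;
}
```

Neither function uses `this` or `arguments`, which is what makes the two forms interchangeable here and the conversion mechanical.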

Simple refactorings with a little nuance

Given the example above, what about refactorings that other tools can't do very well? Like, what if we tried something that looks like a rename but put a little spin on it such that we don't want to rename every occurrence but only occurrences in a specific context?

I asked Cursor to "Rename the prefix foo_bar_baz to foo_baz_boo except in expressions." Here's the resulting diff:

@@ -2,16 +2,16 @@
 import "foo.bar.baz/bacon.ts";
 
 export const somethingInFooBar = () => {
   if (process.env.FOO_BAR_BAZ_BACON)
-    console.log("Some unrelated FOO_BAR_BAZ_BACON");
+    console.log("Some unrelated FOO_BAZ_BOO_BACON");
 };
 
 export const somethingSomewhereElse = () => {
   if (!process.env.FOO_BAR_BAZ_BACON)
-    console.log("This is something somewhere else FOO_BAR_BAZ_BO_BACON");
+    console.log("This is something somewhere else FOO_BAZ_BOO_BO_BACON");
 };
 
-export const fooBarBazSquare = (x: number) => x * x;
+export const fooBazBooSquare = (x: number) => x * x;
 
-export const fooBarBazBoCube = (y: number) => y * y * y;
+export const fooBazBooBoCube = (y: number) => y * y * y;
 
-export const fooBarBazBoMaxBacons = 12;
+export const fooBazBooBoMaxBacons = 12;

So, Cursor manages to avoid renaming the environment variables, but still renames the console.log arguments. It also renames the fooBarBaz prefix inside fooBarBazBo, despite the fact that fooBarBazBo is a different, longer prefix, and it misses the import (although it picks up the import with variations on the prompt).

These aren't crazy mistakes to make — you could argue that because of the ambiguity of the prompt, they're not even mistakes. But they're the same kinds of issues that I have to think about when I blast my code with sed substitutions. In these cases, it feels like I'm essentially using Cursor to define macros and substitutions, and the actual work of writing the prompts is not much different from the work of writing the macros and substitutions myself. But, trying to get Cursor to do exactly what I want is definitely harder than just doing what I want. Furthermore, I'd prefer the substitutions and macros because I can save them and know that they'll work deterministically later, whereas Cursor might change its behaviors or models, which means I'll have to reauthor my prompts.
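To make the comparison concrete, here is a minimal sketch of what saved, deterministic substitutions might look like for one reading of the task above. It assumes the goal is to rename camelCase identifiers and the import path, leave the longer fooBarBazBo prefix alone, and leave the upper-snake-case strings and environment variables untouched. The `rules` table, the `renamePrefix` function, and the dotted target name foo.baz.boo are assumptions made for illustration:

```typescript
// Each rule handles one spelling of the prefix (hypothetical rule set).
const rules: Array<[RegExp, string]> = [
    // camelCase identifiers; the lookahead skips the longer fooBarBazBo prefix,
    // i.e. "Bo" that is not followed by a lowercase letter.
    [/\bfooBarBaz(?!Bo(?![a-z]))/g, "fooBazBoo"],
    // Dotted form at the start of a quoted import path (assumed target name).
    [/(["'])foo\.bar\.baz\//g, "$1foo.baz.boo/"],
];

// Applies every rule in order and returns the rewritten source text.
function renamePrefix(source: string): string {
    return rules.reduce(
        (text, [pattern, replacement]) => text.replace(pattern, replacement),
        source,
    );
}
```

Because the rules are plain data, they can be saved alongside the code and rerun later with the same result.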

Asking Cursor to reason about refactorings instead

Alright, so when I tried to micromanage Cursor, I didn't give it enough details about what I wanted to make it work, but spelling out all those details as prose is actually just coding but slowly and indirectly. Well, if me telling Cursor exactly how to make changes is too slow, then let's try the alternative: let's describe the situation and see if Cursor can work out the details itself. Here's the prompt that I used:

"bar.ts is part of an Express.js application containing multiple vhosts. Each vhost contains its own files and uses the vhost's name to namespace different the aspects of that specific vhost. This vhost's name is "foo.bar.baz", but it may contain code that references other vhosts. Rename this vhost "foo.bar.bo"

(there's a little typo in there)

This is probably pretty close to how I'd describe this to another developer. Unfortunately, Cursor falls pretty flat on its face here: It makes the change to the import, and then starts asking me if it should change some other occurrences of the "foo bar baz" prefix seemingly at random. Asking clarifying questions makes sense, but it's asking me exactly what to do, so we're back to coding slowly and indirectly.

It's almost certain that given enough context, Cursor could make this work, and there's probably a way for me to store that context somewhere in Cursor so that it can be reused. However, I suspect that Cursor's inability to understand nuances will generally carry forward — even if I gave it enough context to understand this, I would probably have to do a lot of explaining on a case by case basis to get things right in general. Again, that just sounds like slowly, indirectly coding.

A lot of people described coding assistants as "converting my thoughts directly to code." My experience has been the opposite: "converting my code thoughts into prose and then trying to get Cursor to reproduce them faster than I can."

Code insights

What about using Cursor to get insights into code?

for (let i = 0; i < 10; i++) {
    console.log(i);
}

I first tried this or something very similar (I lost the original example), and asked Cursor if there were any bugs in the code. It told me:

The code would be better if I dropped the curly braces around the for loop body.
The variable i has the type any and needs an explicit type annotation.
The for loop is not valid TypeScript syntax at the top of the file and needs to be wrapped in a top-level function of some kind.

Which is mostly wrong:

Obviously lots of people use curly braces around loop bodies, even when they're only a single statement (in fact this is what Cursor autocompleted to).
i's type is inferred to be number there (leaving off type annotations when assigning literals is fairly idiomatic TypeScript).
for loops are perfectly valid at the top of a TypeScript file.
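The points about the type of i and the top-level loop can be checked mechanically. The sketch below is illustrative rather than the lost original example: the `IsAny` helper type and the `seen` array are names added here, and the check relies on a common type-level idiom for detecting any.

```typescript
// Evaluates to true only when T is any (a common type-level idiom).
type IsAny<T> = 0 extends (1 & T) ? true : false;

// Top-level loop: no wrapping function is required.
const seen: number[] = [];
for (let i = 0; i < 10; i++) {
    // This line compiles only if i is not any; for the inferred type number,
    // IsAny<typeof i> is the literal type false.
    const iIsAny: IsAny<typeof i> = false;
    if (!iIsAny) {
        seen.push(i);
    }
}
```

If i really had the type any, the annotated assignment inside the loop would be rejected by the compiler.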

Worth noting that when I reran this today, it did better and told me there were no issues.

But this stuff is trivial for professionals, and when I tried to get Cursor to recognize more nuanced things, like an element referenced before the DOM has loaded (a la the previous article), the problem was nowhere on its radar.

Big stuff

What about really big stuff where there's value in getting it mostly right or where Cursor can iterate on corrections on its own? Maybe for those uses it could be worth the extra time spent prompting? Or maybe if we ask Cursor to try to verify its own work, it'll be able to fix its own mistakes and produce a better result?

Yesterday, I spent some time converting my blog from Next.js for use with my Express.js VHost setup. Here's a prompt that I issued to see if I could get Cursor to do something similar:

This project is a blogging application built for Next.js. Static assets are under the public/ directory and the src/images/ directory. Articles are in the src/app/articles directory and are mdx files (which are markdown files with React capabilities). Other subdirectories under src/ are TypeScript React files. This was originally created from the Tailwind Spotlight personal website template. The scripts in the package.json file show how the website is built and how the development browser is run.

Convert this application to an Express.js application.

The resulting application should:

Be able to bundle and serve the frontend code and rebundle when the code changes

Serve articles from the Express.js server

Serve an RSS feed like the original does

Serve Open Graph meta tags like the original does

Be performant

Articles should be listed on the home page like the original does

Articles should each be viewable on their own page like the original

Have test coverage that verifies all functionality

Run the TypeScript compiler once the application has been converted to verify that the resulting code typechecks. After that, generate tests and run the generated tests, and if they fail then fix the errors.

Time passes...

Alright, here's some stuff that immediately pops out at me:

Cursor decided to build this on an old version of React (18). This is an okay choice, but in my opinion the resulting code for something like this would be simpler and require less maintenance if you dropped the frontend JS. If we're going to use React then we should probably be on the most recent version.
Cursor used an old version of Express (4).
Cursor prompted me to install a bunch of NPM packages that I don't normally use and I'm unfamiliar with. I then had to go manually verify that all of those packages were legit. This is probably something that I could mitigate by running Cursor with a preexisting package.json file, but that's also more context that I need to feed it. It also added @types/gray-matter to the list of dependencies, which apparently doesn't exist. So, that doesn't give me a ton of confidence that it's only going to install well-known, uncompromised packages.

Specifics aside, Cursor didn't generate anything that will run or compile. When I attempted to run the project, I got an error about the package file specifying the wrong module type. When I reviewed the code, I could see that there are a bunch of other issues here, too:

It's reloading articles from the filesystem on each request, including for the RSS feed.
It's referencing server-side endpoints that it never wrote.
None of the styling from the original blog seems to have been preserved, which was really the point of the task.

Basically, this is a big pile of shit, and it's far worse than what I could easily find on GitHub.
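On the first review point, the usual remedy is to load the articles once and reuse the result across requests. Here is a minimal sketch, assuming the article list only changes on deploy. The names `once` and `loadArticles` are hypothetical, and the loader returns placeholder data instead of reading the filesystem:

```typescript
// Wraps a loader so that it runs at most once; later calls reuse the first result.
function once<T>(load: () => T): () => T {
    let cached: { value: T } | undefined;
    return () => {
        if (cached === undefined) {
            cached = { value: load() };
        }
        return cached.value;
    };
}

// Placeholder loader standing in for the filesystem read.
let loadCount = 0;
const loadArticles = once(() => {
    loadCount++;
    return ["first-post", "second-post"];
});

loadArticles(); // e.g. a request for the home page
loadArticles(); // e.g. a request for the RSS feed
console.log(loadCount); // prints 1
```

In development, a file watcher would need to reset the cache when an article changes, which this sketch leaves out.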

Conclusion

I still feel mostly the same about coding assistants after playing with Cursor. The problem isn't Cursor itself, which is an undeniably powerful tool. The problem is that it's just not useful to me. In the best case scenario, Cursor generates code from detailed descriptions of code, and those descriptions take longer to write and verify than the code would. In the worst case scenario, Cursor takes simple problems and makes them complicated by introducing bugs, maintenance issues, code bloat, performance issues, feeding the programmer incorrect information, and so on. The root issue, as we saw above, is that it doesn't take much complexity before code edits start requiring thought. And since coding assistants don't think, in order to get them to do what we want we have to tell them everything.

However, I think they could also be useful for certain conversions or translations. Taking something that's already fully specified in code and converting it to some other language with more or less equivalent language features seems like a scenario where we could sidestep the need to fully specify the code when prompting. Certain kinds of refactorings where we're only changing between more or less equivalent code constructs probably also fall into this category (similar to the arrow function example but with something that's harder to target with other automation).