Two kinds of AI users are emerging - Comments

simmerup

Terrifying that people are creating financial models with AI when they don’t have the skills to verify the model does what they expect

martinald

They have an Excel sheet next to it - they can test it against that. Plus they can ask questions if something seems off and have it explain the code.

nebula8804

All we need is one major crash caused by AI to scare the capital owners. Then maybe we white-collar workers can breathe a bit for at least another few years (maybe a decade+).

mkoubaa

It's not terrifying at all. Some shops will fail and some will succeed, and in the aggregate it'll be no different for the rest of us.

myfakebadcode

I’m trying to learn Rust coming from Python (for fun). I use various LLMs for Python and see them stumble.

It is a beautiful experience to realize wtf you don’t know, and how far over their skis so many will get trusting AI. The idea of deploying a Rust project at my level of ability with an AI at the helm is terrifying.

derrida

Business as usual.

taneq

If they have the skills to verify the Excel model then they can apply the same approach to the numbers produced by the AI-generated model, even if they can’t inspect it directly.

In my experience a lot of Excel models aren’t really tested, just checked a bit and then deemed correct.

superkuh

The argument seems to be that it's more productive to have a corporation restrict your ability to present arbitrary text directly to the model, forcing you through their abstract interface, which will (hopefully) integrate your text into theirs, than to fully control the input text yourself. I don't think that's true generally. I think it can be true for non-technical users like the ones the article describes.

majormajor

The usefulness of specialized interfaces is apparent if you compare Photoshop with Gemini Pro/Nano Banana for targeted image editing.

I can select exactly where I want changes and have targeted element removal in Photoshop. If I submit the image and try to describe my desired changes textually, I get less easily-controllable output. (And I might still get scrambled text, for instance, in parts of the image that it didn't even need to touch.)

I think this sort of task-specific specialization will have a long future. It's hard to imagine pure text once again becoming the dominant information-transfer method for 90% of the things we do with computers, after 40 years of building specialized non-text interfaces.

Havoc

The Copilot button in Excel at my work can’t access the Excel file in the window it’s in. As in, I ask “what’s in cell A1” and it says it can’t read the file. Not even sure what the point is then, frankly.

I’m happily vibe coding at work, but yeah, the article is right. MS has enterprise market share by default, not by merit. Stunning contrast between what’s possible and what’s happening in big corp.

cmrdporcupine

Meanwhile the people I know who work at Microsoft say there's a constant whip-cracking to connect everything they're doing to "AI" and prove that's what they're doing.

bwat49

Yeah, I actually use AI a lot, but Copilot is... useless. When Microsoft adds Copilot to their various apps they don't seem to put any thought or effort into it beyond sticking a Copilot button somewhere.

And if the Copilot button does nothing but open a chat window without any real integration with the app, what the hell is the point, when there's already a Copilot button in the Windows taskbar?

s-lambert

I don't see a divergence. From what I can tell, a lot of people only started using agents in the past 3-4 months, when they got good enough that it was hard to argue otherwise. Then there's stuff like MCP, which never seemed good and was driven entirely by people who talked about it more than they used it. There also used to be stuff like LangChain and vector databases that nobody talks about anymore; maybe they're still used, but they're no longer trendy.

It seems way too soon to really pin down any kind of trend after a few months. Most people aren't breathlessly following the next Twitter trend; give it at least a year. Nobody is really going to be left behind by picking up agents now instead of 3 months ago.

neom

Not sure how much falling behind there is even going to be. I'm an old-school Linux type with D- programming skills, yet getting going building things has been ridiculously easy. The swarms thing makes it so fast. I've churned out 2 small but tested apps in 2 weekends just by chatting with Claude Code; the only thing I had to do was configure the servers.

Gigachad

The only people I see talking about MCP are managers who don't do anything but read LinkedIn posts and haven't touched a text editor in years, if ever.

_1tan

What's used instead of MCP in reality? Just REST or other existing API things?

NitpickLawyer

While I agree that the MCP craze was a bit off-putting, I think that came mostly from people thinking they could sell stuff in that space. If you view it as a protocol and not much else, things change.

I've seen great improvements with just two MCP servers: context7 and playwright. The first is great in planning sessions and leads to better usage of new-ish libraries, and the second gives the model a feedback loop. The advantage is that they work with pretty much any coding-agent harness you use, so whatever worked with Cursor will work with cc or opencode or whatever else.

defrost

The "upside" description:

  On the other you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.


I helped one recently almost one-shot converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.
Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull external data sources as inputs, build web dashboards and have Claude Code work with you to really integrate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.
almost makes me physically sick.

I've a reasonably intense math background, corrupted by application to geophysics and implementing real-world numerical applications.

To be fair, this statement alone:

* 30 sheet mind numbingly complicated Excel financial model

makes my skin crawl and invokes a flight reflex.

Still, I'll concede that a Claude Code conversion to Python of a 30 sheet Excel financial model is unlikely to be significantly worse than the original.
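
And, credit where due, the "data science team in your pocket" bit is mechanically true: once the model is a Python function, a Monte Carlo run is only a few lines. A minimal sketch, assuming a hypothetical `model(growth, margin)` standing in for the converted spreadsheet:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Hypothetical uncertain inputs, drawn from distributions you'd have to justify.
growth = rng.normal(loc=0.05, scale=0.02, size=N)  # annual revenue growth
margin = rng.uniform(0.10, 0.25, size=N)           # operating margin

def model(growth, margin, revenue=1_000_000.0):
    # Stand-in for the converted Excel model: five years of compounded
    # revenue at the sampled growth rate, times the sampled margin.
    return revenue * (1 + growth) ** 5 * margin

profits = model(growth, margin)
print(np.percentile(profits, [5, 50, 95]))  # downside / median / upside
```

Whether those percentiles mean anything depends entirely on whether the conversion was faithful, which is the part that makes me queasy.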

decimalenough

I'm almost certain it will be significantly worse.

The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.

The Claude Code copy will be a simulacrum that may behave the same way for some inputs, but it's likely to get many of the edge cases wrong, and when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.

majormajor

One of the dirty secrets of a lot of these "code adjacent" areas is that they have very little testing.

If a data science team models something incorrectly in their simulation, who's gonna catch it? Usually nobody, at least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for not being "data driven" enough.

If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing the process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant Excel doc? I'm assuming no, but maybe I'm wrong.

It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability, like that sort of financial modeling.

ChrisMarshallNY

Obligatory xkcd: https://xkcd.com/1667/

bitwize

The thing is, when you use AI, you're not really doing things, you're having things done. AI isn't a tool, it's a service.

Now, back in the day, IBM designed and built an "executive data terminal". It wasn't really a computer terminal in the sense that you and I understand it. Rather, it was a video and two-way-audio feed to a room with a team of underlings, whom an executive could ask for business data and analyses, which could be called up on a computer display (also routed to the executive's office). This allowed the executive to ask questions so he (it was the 1960s; it was almost invariably a he) could make informed decisions, and allowed the team of underlings to call up data or crunch numbers on the computer and show the results on the display.

So because executives are used to having things done for them, I can totally see AI being used by executives to replace the "team of underlings" in this setup—in principle. The fact is that were I in that CEO's chair, I'd be thinking twice before trusting anything an LLM tells me, and double-checking those results—perhaps with my team of underlings.

Discussed on Hacker News: https://news.ycombinator.com/item?id=42405462

IEEE article: https://spectrum.ieee.org/ibm-demo

PunchyHamster

we're going from "bad excel sheet caused recession" to "bad vibe-coded financial thing caused recession"

ed_mercer

> Microsoft itself is rolling out Claude Code to internal teams

Seems like Nadella is having his Ballmer moment

fdsf2

Nothing but ego, frankly. Apple had no problem settling for a small market share back in the day... look where they are now. It didn't come from make-believe and fantasy scenarios of the future based on an unpredictable technology.

running101

Code red moment

decimalenough

> I helped one recently almost one-shot[3] converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.

I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.

linsomniac

It depends on how easily testable the Excel is. If Claude has the ability to run both the Excel and the Python with different inputs, and check the outputs, it's stunningly likely to be able to one-shot it.
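
One caveat: openpyxl can read the values Excel last calculated, but it can't recalculate formulas itself, so driving both sides with genuinely fresh inputs needs Excel in the loop (xlwings or similar). Even so, spot-checking cached outputs against the port catches a lot. A minimal sketch, assuming hypothetical `python_model` and `default_inputs` helpers from the conversion:

```python
from openpyxl import load_workbook

from my_port import python_model, default_inputs  # hypothetical conversion

# data_only=True returns the value Excel last calculated for each formula,
# so this compares the port against the workbook's own cached results.
wb = load_workbook("financial_model.xlsx", data_only=True)
summary = wb["Summary"]  # hypothetical output sheet

expected = summary["B42"].value  # e.g. the headline profit cell
actual = python_model(default_inputs())

assert abs(actual - expected) < 1e-6, f"port disagrees: {actual} vs {expected}"
```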

Spivak

Doesn't it help you sleep at night that your 401k might be managed by analysts #yoloing their financial modeling tools with an LLM?

danpalmer

I've noticed a huge gap between AI use on greenfield projects and brownfield projects. On the first day of a greenfield project I can accomplish a week's worth of work. By the second day I can accomplish only a few days' worth. By the end of the first week I'm down to a 20% productivity gain.

I think AI is just letting everyone speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs struggle to move as quickly as before.

The interesting bit is going to be whether we see AI being used to mature those small systems into big complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, particularly while still moving. I've not seen any of this from AI yet outside of either a) very directed small changes to large complex systems, or b) plugins/extensions/etc. along a well-defined set of rails.

data-ottawa

It’s fantastic to be able to prototype small-to-medium-complexity projects, figure out what architectures work and don’t, then build on a stable foundation.

That’s what I’ve been doing lately, and it really helps get a clean architecture at the end.

EnPissant

I have experienced much the opposite. With an established code base to copy patterns from, AI can generate code that needs a lot less cleanup iteration than on greenfield projects.

orwin

Yeah, my observation is that for my usual work I maybe get a 20% productivity boost, probably closer to 10% tbh, and for the whole team overall it feels like it has done nothing, as seniors use their small productivity gains to fix the tons of issues in PRs (or in prod when we miss something).

But last week I had two days with no real work to do, so I created CLI tools to help with organisation and cleaning up. I think AI boosted my productivity at least 200%, if not 500%.

Gigachad

Similar experience. I love using Gemini to set up my home server, it can debug issues and generate simple docker compose files faster than I could have done myself. But at work on the 10 year old Rails app, I find it so much easier to just write all the code myself than to work out what prompt would work and then review/modify the results.

tonfreed

My observations match this. I can get fresh things done very quickly, but when I start getting into the weeds I eventually get too frustrated with babysitting the LLM to keep using it.

stego-tech

Enterprise IT dinosaur here, seconding this perspective and the author’s.

When I needed to bash out a quick HashiCorp Packer buildfile with no prior experience beyond a bit of Vault and Terraform, local AI was a godsend, getting me 80% of the way there in seconds. I could read it, edit it, test it, and move much faster than Packer’s own thin “getting started” guide allowed. The net result: from zero prior knowledge to a hardened OS image and a repeatable pipeline in under a week.

On the flip side, asking a chatbot about my GPOs? Or trusting it to change network firewalls and segmentation rules? Letting it run wild in the existing house of cards at the core of most enterprises? Absolutely hell no the fuck not. The longer something exists, the more likely a chatbot is to fuck it up by simple virtue of how they’re trained (pattern matching and prediction) versus how infrastructure ages (the older it is or the more often it changes, the less likely it is to be predictable), and I don’t see that changing with LLMs.

LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.

Fr0styMatt88

I find AI great for just greasing the wheels, like if I’m overthinking on a problem or just feel too tired to start on something I know needs doing.

The solutions also help me combat my natural tendency to over-engineer.

It’s also fun getting ChatGPT to quiz me on topics.

K0balt

It seems to be fantastic up to about 5k LOC, and then it starts to need a lot more guidance, careful supervision, skepticism, and aggressive context management. If you’re careful, it only goes completely off the rails once in a while, and the damage is only a lost hour or two.

Still a 4x production gain overall though, so I’m not complaining for $20 a month. It’s especially good at managing the complicated aspects of C, so I can focus on the bigger picture rather than the symbol contortions.

somat

Isn't this true of any greenfield project, with or without generative models? The first few days are amazingly productive, and then features and fixes get slower and slower. And you get to see how good an engineer you really are, as your initial architecture starts straining under changing real-world requirements and you hope it holds together long enough to ship something.

"I could make that in a weekend"

"The first 80% of a project takes 80% of the time, the remaining 20% takes the other 80% of the time"

Aeolun

I find that setting up a proper structure while everything still fits in a single context window of Claude Code, as well as splitting as much as possible into libraries, works pretty well for staving off that moment.

sevenzero

> Anyone can create a small version of anything

Yup. My biggest issue with designing software is usually designing the system architecture/infra. I'm very opposed to just shoving everything into AWS and calling it a day: you don't learn anything from that, cloud performance stinks for many things, and I don't want random $30k bills because I accidentally left some instance running.

AI sucks at determining what kind of infrastructure would be great for scenario X, because cloud is the go-to solution for the lazy dev. I tried to get it to recommend a way to self-host stuff, but that's just a general security hazard.

smuhakg

> On one hand, you have Microsoft's (awful) Copilot integration for Excel (in fairness, the Gemini integration in Google Sheets is also bad). So you can imagine financial directors trying to use it and it making a complete mess of the most simple tasks and never touching it again.

Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.

Now, it's impossible to export any of those documents into plain text that an LLM can understand, and Microsoft Copilot literally doesn't work no matter how much money they throw at it. My company is now migrating Word documents to Markdown because they're seeing how powerful AI is.

This is karmic justice imo.
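
(The migration itself is mostly mechanical. A sketch of the kind of script involved, using the pypandoc wrapper and assuming pandoc is installed; the filenames are placeholders:)

```python
import pypandoc

# Convert a Word document to GitHub-flavored Markdown via pandoc.
# "doc.docx" is a placeholder; pandoc itself must be installed separately.
pypandoc.convert_file("doc.docx", "gfm", outputfile="doc.md")
```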

martinald

Totally agree, though ironically Claude Code works way better with Excel than I expected.

On one attempt I even tried telling Copilot to convert each sheet to a CSV first, THEN do the calculations. It just ignored that and failed miserably, ironically outputting a list of the files it should have made, along with the broken Python script. I found this very amusing.
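
For reference, the thing I was asking for is a couple of lines of pandas (a sketch; the file name is a placeholder):

```python
import pandas as pd

# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}.
# (pandas reads .xlsx through openpyxl under the hood.)
sheets = pd.read_excel("model.xlsx", sheet_name=None)

for name, df in sheets.items():
    df.to_csv(f"{name}.csv", index=False)
```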

QuantumGood

Tim Berners-Lee thought pages would become machine-readable long ago, with "obvious" benefits, and that idea partly drove XML, RDF and HTML5. Now the benefits of doing so seem even bigger (but are they?), and the time spent making existing documents AI-readable seems to keep growing.

irishcoffee

> Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.

I had interns use C++ to unzip, parse, and repackage to JSON a standardized Visio doc. I had no say in the standard, but specific blocks meant specific things, etc. The project was successful: the XML was parseable, at least for our needs. The overall project died a swift death, and this tidbit will probably be forgotten forever in the depths of the repo hierarchy.

drsalt

What is the source data? The author says they've seen "far more non-technical people than I'd expect using Claude Code in terminal", so, like, 3 people? Who are these people?

wrs

Some minor editing to how this would have been written in the mid-1980s:

“The real leaps are being made organically by employees, not from a top down [desktop PC] strategy. Where I see the real productivity gains are small teams deciding to try and build a [Lotus 123] assisted workflow for a process, and as they are the ones that know that process inside out they can get very good results - unlike a [mainframe] software engineering team who have absolutely zero experience doing the process that they are helping automate.”

The embedded “power users” show the way, then the CIO-friendly packaged software follows much later.

SubiculumCode

The power is in the tails

with

> The bifurcation is real and seems to be, if anything, speeding up dramatically. I don't think there's ever been a time in history where a tiny team can outcompete a company one thousand times its size so easily.

Slightly overstated. Tiny teams aren't outcompeting because of AI, they're outcompeting because they aren't bogged down by decades of technical debt and bureaucracy. At Amazon, it will take you months of design, approvals, and implementation to ship a small feature. A one-man startup can just ship it. There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

Gigachad

I swear in a month at a startup I used to build what takes a year at my current large corp job. AI agents don't seem to have sped up the corporate process at all.

mhink

> how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

Ultimately, it's the same way you ship human-generated code at scale without causing catastrophic failure: by only investing trust in critical systems to people who are trustworthy and have skin in the game.

There are two possibilities right now: either AI continues to get better, to the point where AI tools become so capable that completely non-technical stakeholders can trust them with truly business-critical decision making, or the industry develops a full understanding of their capabilities and is able to dial in a correct amount of responsibility to engineers (accounting for whatever additional capability AI can provide). Personally, I think (hope?) we're going to land in the latter situation, where individual engineers can comfortably ship and maintain about as much as an entire team could in years past.

As you said, part of the difficulty is years of technical debt and bureaucracy. At larger companies, there is a lot of knowledge about how and why things work that doesn't get explicitly encoded anywhere. There could be a service processing batch jobs against a database whose URL is only accessible via service discovery, and the service's runtime config lives in a database somewhere, and the only person who knows about it left the company five years ago, and their former manager knows about it but transferred to a different team in the meantime, but if it falls over, it's going to cause a high-severity issue affecting seven teams, and the new manager barely knows it exists. This is a contrived example, but it goes to what you're saying: just being able to write code faster doesn't solve these kinds of problems.

PunchyHamster

> There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

It's very simple. You treat AI as junior and review its code.

But that awesomely complex method has one disadvantage: having to do so means you can't brag about the 300% performance improvement your team got from just committing AI code to the master branch without looking.

DavidPiper

> To really underline this, Microsoft itself is rolling out Claude Code to internal teams, despite (obviously) having access to Copilot at near zero cost, and significant ownership of OpenAI. I think this sums up quite how far behind they are

I think it sums up how thoroughly they've been disrupted, at least for coding AIs (independent of like-for-like quality concerns rightly mentioned elsewhere in this thread re: Excel/Python).

I understand ChatGPT can do like a million other things, but so can Claude. Microsoft deliberately using competitors internally is the thing their customers should pay attention to. Time to turn "Nobody gets fired for buying Microsoft" into "Nobody gets fired for buying what Microsoft buys", for those so inclined.

doom2

I guess this is as good a thread as any to ask what the current meta is for agentic programming (in my case, as applied to data engineering). There are all these posts that make it to the front page talking about productivity gains, but very few of them actually detail the setup that's working for the author, just which model is best.

I guess it's like asking for people's vim configs, but hey, there are at least a few popular posts mainly around git/vim/terminal configs.

energy123

I push most work into the chat interface (attach the full codebase as a single file, paste in specs, describe what I want), then copy the task list from chat into Codex. This is to reduce Codex token usage and avoid breaching weekly limits. I'd use a more agent-heavy process if I didn't care about cost.
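
The "single file" step is a tiny script. A minimal sketch, assuming a plain source tree and a hypothetical extension whitelist:

```python
from pathlib import Path

EXTENSIONS = {".py", ".md", ".toml"}  # hypothetical whitelist; adjust per project

def pack_codebase(root: str, out: str = "codebase.txt") -> None:
    """Concatenate every matching file into one paste-able text file,
    with a header line marking where each file starts."""
    with open(out, "w", encoding="utf-8") as f:
        for path in sorted(Path(root).rglob("*")):
            if path.suffix in EXTENSIONS and path.is_file():
                f.write(f"\n===== {path} =====\n")
                f.write(path.read_text(encoding="utf-8", errors="replace"))

pack_codebase("src")
```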

fragmede

There's more stuff in mine, but at the top of my ~/.claude/CLAUDE.md file, I have:

    ## Important Instructions

    - update todo.md as items are completed
    - **Commit to git after making code changes.** Check `git status` first - only commit if there are actual changes:

    ```bash
    # If not in a git repository, initialize it first:
    git init
    # Then commit changes:
    git add <FILES_UPDATED>  # Be surgical - add only the changes you just made.
    git commit -m "Description of changes"
    ```

This lets me have bite-sized git commits that I can marshal later, rather than having to wrangle git myself.

datsci_est_2015

Thought this was going to be more about programmers, but it was actually about non-technical users and Microsoft’s product development failure.

One tidbit I’d disagree with is that only those using bleeding-edge AI tools are reaping the benefits. There seem to be a lot of highly specialized tools and a lot of specific configurations (and mystical incantations) needed to get them to work, and those are constantly changing and being updated. The bleeding edge is a dangerous place to be if you value your time (and sanity).

Personally, as someone working on moderate-to-highly-complex software (live inference on industrial IoT data), I can’t really open a merge/pull request for my colleagues to review unless I 100% understand what I’ve pushed and can explain it to them as well.

My killer app for AI would just be a CLI that gets me to a commit based on moderately technical input:

“Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”

But all the hype on the bleeding edge is around abstracting away the entire coding process, until you don’t even understand what code is being generated? Hard to see that as anything but a pipe dream. AI is useful, but it’s not a panacea - you can’t fire it and replace it when it fucks up.

georgeburdell

> “Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”

Granted, I'm way behind the curve, but is this not how actual engineers (as opposed to influencers) are using it? I heavily micromanage the implementation because my manager still expects me to know the code.

tiangewu

Microsoft's failure with Copilot in Excel gave my partner a very poor impression of AI's ability to help with financial tasks.

It took a lot of convincing, but I finally got her to start using ChatGPT to help her write SQL and walk her through setting up some SaaS accounting software formulas.

It worked so well that now she's trying to find more applications at work. Claude Code is too scary for her, though. It will need to be in some web UI before she feels comfortable giving it a try.

protocolture

tl;dr: If you are trying to protect your IP from AI you probably use Copilot or nothing. If you have no IP to protect you are free to mess about.

fortran77

I know it's fun to bash Microsoft, but while Claude is better, Microsoft's Copilot is far from "awful". I've used it productively with the VS Code integration for some esoteric projects: PIC PIO programming and Verilog.

FilosofumRex

Generally speaking, if you're using your coding agent as your assistant inside your IDE, you're missing out on 80% of its benefits... If anything, you should ask it how to do something and then act as its assistant in implementing it.

PunchyHamster

also missing out on 80% of bugs

nickphx

Three kinds: those who do not use it.

athrowaway3z

> sandboxing agents is difficult

I use this amazingly niche and hipster approach of giving the agent its own account, which, through inconceivably complex arcane tweaking and configuration, can lock down what it can and can't do.
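
For anyone who wants to replicate the arcane configuration, the whole trick is roughly this (a sketch; the `agent` account is created once with `useradd`, and the Claude Code flag shown is an assumption about how you'd run it unattended, adjust to taste):

```python
import subprocess

# Run the coding agent as a dedicated low-privilege user, so ordinary Unix
# file permissions, not the agent harness, decide what it can touch.
# Prerequisite (once, outside this script): sudo useradd --create-home agent
subprocess.run(
    ["sudo", "-u", "agent", "claude", "--dangerously-skip-permissions"],
    cwd="/home/agent/project",  # hypothetical checkout owned by `agent`
    check=True,
)
```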

---

Can somebody, for the love of god, tell me why articles keep claiming this is so difficult?

fragmede

It's a bunch of work, that takes a bunch of time, and I want it nowwwww-owwwww!

...is how I imagine that conversation goes.

NitpickLawyer

I have Antigravity in its own account and that has worked pretty well so far. I also use devcontainers for the CLI agents, and that has also worked out well. It's one click away in my normal dev flow (I was already using this for Python projects).

viccis

>You can easily run Monte Carlo simulations

Ah yes, Monte Carlo simulations, regular part of a finance team's objectives.

nnevatie

I'd be very interested in seeing some statistics on how much confidential material gets pasted into ChatGPT's chat interface.

I think the results would be pretty shocking, mostly because the integrations with source services are abject messes.

Antibabelic

https://www.theregister.com/2025/10/07/gen_ai_shadow_it_secr...

"With 45 percent of enterprise employees now using generative AI tools, 77 percent of these AI users have been copying and pasting data into their chatbot queries, the LayerX study says. A bit more than a fifth (22 percent) of these copy and paste operations include PII/PCI."

hereme888

I'm still trying to wrap my head around the past decade: useful AI, self-operating vehicles, real AI robots, immersive VR, catching reusable rockets with chopsticks, and of course the flying cars.

What will be the expected work output for the average future worker?

jsattler

Some years ago, I was at a conference and attended a very interesting talk. I don't remember its title, but what stuck with me was: "It's no longer the big beating the small, but the fast beating the slow". That talk was before all the AI hype. Working at a big company myself, I think this has never been more true. The question is how to stay fast.

josters

And, to add to that, how to know when to slow down. Having also worked at a big company, I think the question shifts towards "how to get fast" without compromising security, compliance, etc.

swyx

this is generic startup advice (doesn't mean it's not true). you level up a bit when you find instances where slow beat fast (see: Teams vs Slack)

crystal_revenge

One of the most reliable BS detectors I've found is when you have to try to convince other people of your edge.

If you have found a model that accurately predicts the stock market, you don't write a blog post about how brilliant you are, you keep it quiet and hope no one finds out while you rake in profits.

I still can't quite figure out what motivates these "AI evangelist" types (unlike crypto evangelists, who clearly create value for themselves when they create credibility), but if you really have a dramatically better way to solve problems, you don't need to waste your breath trying to convince people. The validity of your method will be obvious over time.

I was just interviewing with a company building a foundation model for supposedly world-changing coding assistants... but they still can't ship their product or find enough devs willing to relocate to SF. You would think that if you actually had a game-changing coding assistant, your number one advantage would be that you don't need to spend anything on devs and can ship 10x as fast as your competition.

> First, you have the "power users", who are all in on adopting new AI technology - Claude Code, MCPs, skills, etc. Surprisingly, these people are often not very technical.

It's not surprising to me at all that these people aren't very technical. For technical people code has never been the bottleneck. AI does reduce my time writing code but as a senior dev, writing code is a very small part of the problems I'm solving.

I've never had to argue with anyone that a calculator is a superior way of solving simple computational math problems than doing them by hand, or that a stand mixer is more efficient than a wooden spoon. If a competing bakery argued that the wooden spoon was better, I wouldn't waste my time arguing about the stand mixer; I would just sell more pastries than them and worry about counting my money.

camgunz

I think this article is generally insightful, but I don't think the author really knows whether they one-shotted the Excel-to-Python transformation or not. Maybe they elided an extensive testing phase, but otherwise big bugs could be lurking.

Maybe it's not a big deal, or maybe it's a compliance model with severe financial penalties for non-compliance. I just personally don't like these tradeoffs going implicit.

PunchyHamster

I'd argue the 2 types of users are:

People using it as a tool, aware of its limitations, treating it basically as an intern/boring-task executor (whether it's some code boilerplate, or pooping out/shortening some corporate email), or as a tool to give themselves a summary of a topic they can then bite into deeper.

People outsourcing their thinking and entire skillset to it. They usually have very little clue about the topic, are interested only in results, and are not interested in knowing more about the topic or honing their skills in it.

The second group is the one that thinks talking to a chatbot will replace a senior developer.

Aardwolf

The same person might be both kinds of users, depending on the topic or just the time of the day

sevenzero

I started to outsource thinking at my job once my company made it very clear that they do not want (or can't afford) thinking engineers. Thinking requires time, and they want to deliver quickly. So they cater to the very realistic deadlines our PMs set for features (/s). Funnily enough, the features have to be implemented ASAP according to the customers, but customer feedback takes like 6 months, because they use a new feature for the first time 6 months after delivery. I just don't care anymore. I'm going to leave the learning part for my time off, but I'm getting generally tired of the industry as a whole, so I'm just putting in minimal effort to pay my bills until things explode or get better. So for me it's definitely outsourcing thinking at work.

3D30497420

> People outsourcing thinking and entire skillset to it - they usually have very little clue in the topic, are interested only in results, and are not interested in knowing more about the topic or honing their skills in the topic

And this may be fine in certain cases.

I'm learning German and my listening comprehension is marginal. I took a practice test, and one of the exercises was listening to 15-30 seconds of audio followed by questions. I did terribly, but it seemed like a good way to practice. So I used Claude Code to create a small app that generates short audio dialogs (via ElevenLabs) and sets of questions. I ran the results by my German teacher and he was impressed.

I'm aware of the limitations: sometimes the audio isn't great (it tends to mess up phone numbers), it can only be a small part of my work learning German, etc.

The key part: I could have coded it, but I have other more important projects. I don't care that I didn't learn about the code. What I care about is I'm improving my German.
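
(For the curious, the audio-generation core of the app is small. A sketch of the kind of call it makes, using ElevenLabs' documented text-to-speech REST endpoint; the key, voice ID, and dialog line are placeholders:)

```python
import requests

API_KEY = "..."   # placeholder ElevenLabs API key
VOICE_ID = "..."  # placeholder voice chosen in the ElevenLabs dashboard

# POST a dialog line to ElevenLabs' text-to-speech endpoint, save the audio.
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={"text": "Guten Tag! Wie kann ich Ihnen helfen?"},  # placeholder line
)
resp.raise_for_status()
with open("dialog.mp3", "wb") as f:
    f.write(resp.content)
```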

anal_reactor

> This effectively leads to a situation where smaller company employees are able to be so much more productive than the equivalent at an enterprise. It often used to be that people at small companies really envied the resources & teams that their larger competitors had access to - but increasingly I think the pendulum is swinging the other way.

Small companies are more agile and innovative while corporations often just shuffle papers around. Wow, what a bold claim, never seen before in the entire history of economics.

deafpolygon

There’s also an emerging group of users (such as myself) who essentially use it primarily as an “on-demand” teacher and not as a productivity tool.

I am learning software development without having it generate code for me, preferring instead to have it explain each thing line by line. But it's not only for learning development; I can also query it for historical information and have it point me to the source of the information (so I can read the primary sources as much as possible).

It allows me to customize the things I want to learn at my own pace, while also allowing me to diverge for a moment from the learning material. I have found it invaluable… and so far, Gemini has been pretty good at this (probably owing to the integration of Google search into Gemini).

It lets me cut through the SEO crap that has plagued search engines in recent years.

simmerup

Terrifying that people are creating financial models with AI when they don’t have the skills to verify the model does what they expect

martinald

They have an excel sheet next to it - they can test it against that. Plus they can ask questions if something seems off and have it explain the code.

fatheranton

[dead]

nebula8804

All we need is one major crash caused by AI to scare the capital owners. Then maybe us white collar workers can breath a bit for at least another few more years(maybe a decade+).

mkoubaa

It's not terrifying at all, some shops will fail and some will succeed and in the aggregate it'll be no different for the rest of us

myfakebadcode

I’m trying to learn rust coming from python (for fun). I use various LLM for python and see it stumble.

It is a beautiful experience to realize wtf you don’t know and how far over their skis so many will get trusting AI. The idea of deploying a rust project at my level of ability with an AI at the helm is is terrifying.

derrida

Business as usual.

taneq

If they have the skills to verify the Excel model then they can apply the same approach to the numbers produced by the AI-generated model, even if they can’t inspect it directly.

In my experience a lot of Excel models aren’t really tested, just checked a bit and them deemed correct.

superkuh

The argument seems to be that having a corporation restrict your ability to present arbitrary text directly to the model and only being able to go through their abstract interface which will integrate your text into theirs (hopefully) is more productive than fully controlling the input text to a model. I don't think that's true generally. I think it can be true when you're talking about non-technical users like the article is.

majormajor

The use of specialization of interfaces is apparent if you compare Photoshop with Gemini Pro/Nano Banana for targeted image editing.

I can select exactly where I want changes and have targeted element removal in Photoshop. If I submit the image and try to describe my desired changes textually, I get less easily-controllable output. (And I might still get scrambled text, for instance, in parts of the image that it didn't even need to touch.)

I think this sort of task-specific specialization will have a long future, hard to imagine pure-text once again being the dominant information transfer method for 90% of the things we do with computers after 40 years of building specialized non-text interfaces.

Havoc

The copilot button in excel at my work can’t access the excel file of the window it’s in. As in “what’s in cell A1” and it says I can’t read this file. Not even sure what the point is then frankly.

I’m happily vibe coding at work but yeah article is right. MS has enterprise market share by default not by merit. Stunning contrast between what’s possible and what’s happening in big corp

cmrdporcupine

Meanwhile the people I know who work at Microsoft say there's a constant whip-cracking to connect everything they're doing to "AI" and prove that's what they're doing.

bwat49

yeah I actually use AI a lot, but copilot is... useless. When microsoft adds copilot to their various apps they don't seem to put any thought/effort behind it beyond sticking a copilot button somewhere.

And if the copilot button does nothing but open a chat window without any real integration with the app, what the hell is the point of that when there's already a copilot button in the windows taskbar?

s-lambert

I don't see a divergence, from what I can tell a lot of people have only just started using agents in the past 3-4 months when they got good enough that it was hard to say otherwise. Then there's stuff like MCP, which never seemed good and was entirely driven by people who talked more about it than used it. There also used to be stuff like langchain or vector databases that nobody talks about anymore, maybe they're still used but they're not trendy anymore.

It seems way too soon to really narrow down any kind of trends after a few months. Most people aren't breathlessly following the next twitter trend, give it at least a year. Nobody is really going to be left behind if they pick up agents now instead of 3 months ago.

neom

Not sure how much falling behind there is even going to be, I'm an old school linux type with D- programming skills, yet getting going building things has been ridiculously easy. The swarms thing makes is so fast. I've churned 2 small but tested apps out in 2 weekends just chatting with claude code, the only thing I had to do was configure the servers.

Gigachad

The only people I see talking about MCP are managers who don't do anything but read linked in posts and haven't touched a text editor in years if ever.

_1tan

What‘s used instead of MCP in reality? Just REST or other existing API things?

NitpickLawyer

While I agree that the MCP craze was a bit off-putting, I think that came mostly from people thinking they can sell stuff in that space. If you view it as a protocol and not much else, things change.

I've seen great improvements with just two MCP servers: context7 and playwright. The first is great on planning sessions and leads to better usage of new-ish libraries, and the second is giving the model a feedback loop. The advantage is that they work with pretty much any coding agent harness you use. So whatever worked with cursor will work with cc or opencode or whatever else.

defrost

The "upside" description:

  On the other you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.


I helped one recently almost one-shot converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.
Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull external data sources as inputs, build web dashboards and have Claude Code work with you to really integrate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.
almost makes me physically sick.

I've a reasonably intense math background corrupted by application to geophysics and implementing real world numerical applications.

To be fair, this statement alone:

* 30 sheet mind numbingly complicated Excel financial model

makes my skin crawl and invokes a flight reflex.

Still, I'll concede that a Claude Code conversion to Python of a 30 sheet Excel financial model is unlikely to be significantly worse than the original.

decimalenough

I'm almost certain it will be significantly worse.

The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.

The Claude Code copy will be a simulacrum that may behave the same way with some inputs, but is likely to get many of edge cases wrong, and, when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.

majormajor

One of the dirty secrets of a lot of these "code adjacent" areas is that they have very little testing.

If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for "not being data driven" enough.

If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant excel doc? I'm assuming no, maybe I'm wrong.

It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability or correctness like that sort of financial modeling.

ChrisMarshallNY

Obligatory xkcd: https://xkcd.com/1667/

bitwize

The thing is, when you use AI, you're not really doing things, you're having things done. AI isn't a tool, it's a service.

Now, back in the day, IBM designed and built an "executive data terminal". It wasn't really a computer terminal in the sense that you and I understand it. Rather, it was a video and two-way-audio feed to a room with a team of underlings, which an executive could ask for business data and analyses, which could be called up on a computer display (also routed to the executive's office). This allowed the executive to ask questions so he (it was the 1960s, it was almost invariably a he) could make informed decisions, and the team of underlings to call up data or crunch numbers on the computer and show the results on the display.

So because executives are used to having things done for them, I can totally see AI being used by executives to replace the "team of underlings" in this setup—in principle. The fact is that were I in that CEO's chair, I'd be thinking twice before trusting anything an LLM tells me, and double-checking those results—perhaps with my team of underlings.

Discussed on Hackernews: https://news.ycombinator.com/item?id=42405462 IEEE article: https://spectrum.ieee.org/ibm-demo

PunchyHamster

we're going from "bad excel sheet caused recession" to "bad vibe-coded financial thing caused recession"

ed_mercer

> Microsoft itself is rolling out Claude Code to internal teams

Seems like Nadella is having his Baller moment

fdsf2

Nothing but ego frankly. Apple had no problem settling for a small market share back in the day... look where they are now. It didnt come from make-believe and fantasy scenarios of the future based on an unpredictable technology.

running101

Code red moment

decimalenough

> I helped one recently almost one-shot[3] converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.

I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.

linsomniac

It depends on how easily testable the Excel is. If Claude has the ability to run both the Excel and the Python with different inputs, and check the outputs, it's stunningly likely to be able to one-shot it.

Spivak

Doesn't it help you sleep at night that your 401k might be managed by analysts #yoloing their financial modeling tools with an LLM?

danpalmer

I've noticed a huge gap between AI use on greenfield projects and brownfield projects. The first day of working on a greenfield project I can accomplish a week of work. But the second day I can accomplish a few days of work. By the end of the first week I'm getting a 20% productivity gain.

I think AI is just allowing everyone to speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs will struggle to move quickly as before.

The interesting bit is going to be whether we see AI being used in maturing those small systems into big complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, and particularly while still moving. I've not see any of this from AI yet outside of either a) very directed small changes to large complex systems, or b) plugins/extensions/etc along a well define set of rails.

data-ottawa

It’s fantastic to be able to prototype small to medium complexity projects, figure what architects work and don’t, then build on a stable foundation.

That’s what I’ve been doing lately, and it really helps get a clean architecture at the end.

EnPissant

I have experienced much of the opposite. With an established code base to copy patterns from, AI can generate code that needs a lot less iteration to clean up than on green fields projects.

orwin

Yeah, my observation is that for my usual work, I can maybe get a 20% productivity boot, probably closer to 10% tbh, and for the whole team overall productivity it feels like it has done nothing, as senior use their small productivity gains to fix the tons of issues in PR (or in prod when we miss something).

But last week I had two days where I had no real work to do, so I created cli tools to help with organisation, and cleaning up, I think AI boosted my productivity at least 200%, if not 500.

Gigachad

Similar experience. I love using Gemini to set up my home server, it can debug issues and generate simple docker compose files faster than I could have done myself. But at work on the 10 year old Rails app, I find it so much easier to just write all the code myself than to work out what prompt would work and then review/modify the results.

tonfreed

My observations match this. I can get fresh things done very quickly, but when I start getting into the weeds I eventually get too frustrated with babysitting the LLM to keep using it.

stego-tech

Enterprise IT dinosaur here, seconding this perspective and the author’s.

When I needed to bash out a quick Hashicorp Packer buildfile without prior experience beyond a bit of Vault and Terraform, local AI was a godsend at getting me 80% of the way there in seconds. I could read it, edit it, test it, and move much faster than Packer’s own thin “getting started” guide offered. The net result was zero prior knowledge to a hardened OS image and repeatable pipeline in under a week.

On the flip side, asking a chatbot about my GPOs? Or trusting it to change network firewalls and segmentation rules? Letting it run wild in the existing house of cards at the core of most enterprises? Absolutely hell no the fuck not. The longer something exists, the more likely a chatbot is to fuck it up by simple virtue of how they’re trained (pattern matching and prediction) versus how infrastructure ages (the older it is or the more often it changes, the less likely it is to be predictable), and I don’t see that changing with LLMs.

LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.

Fr0styMatt88

I find AI great for just greasing the wheels, like if I’m overthinking on a problem or just feel too tired to start on something I know needs doing.

The solutions also help me combat my natural tendency to over-engineer.

It’s also fun getting ChatGPT to quiz me on topics.

K0balt

It seems to be fantastic up to about 5k loc and then it starts to need a lot more guidance, careful supervision, skepticism, and aggressive context management. If you’re careful, it only goes completely off the rails once in a while and the damage is only a lost hour or two.

Overall, still a 4x production gain overall though, so I’m not complaining for $20 a month. It’s especially good at managing complicated aspects of c so I can focus on the bigger picture rather than the symbol contortions.

somat

Isn't this true of any greenfield project? with or without generative models. The first few days are amazingly productive. and then features and fixes get slower and slower. And you get to see how good an engineer you really are, as your initial architecture starts straining under the demands of changing real world requirements and you hope it holds together long enough to ship something.

"I could make that in a weekend"

"The first 80% of a project takes 80% of the time, the remaining 20% takes the other 80% of the time"

Aeolun

I find that setting up proper structure while everything still fits in a single context window of Claude code, as well as splittjng as much as possible into libraries works pretty well for staving off that moment.

sevenzero

> Anyone can create a small version of anything

Yup. My biggest issue with designing software is usually designing the system architecture/infra. I am very opposed to just shove everything to AWS and call it a day, you dont learn anything from that, cloud performance stinks for many things and I dont want to get random 30k bills because I let some instance of something run accidentally.

AI sucks at determining what kinda infrastructure would be great for scenario x due to Cloud being to go to solution for the lazy dev. Tried to get it to recommend a way to self host stuff, but thats just a general security hazard.

smuhakg

> On one hand, you have Microsoft's (awful) Copilot integration for Excel (in fairness, the Gemini integration in Google Sheets is also bad). So you can imagine financial directors trying to use it and it making a complete mess of the most simple tasks and never touching it again.

Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.

Now, it's impossible to export any of those documents into plain text that an LLM can understand, and Microsoft Copilot literally doesn't work no matter how much money they throw at it. My company is now migrating Word documents to Markdown because they're seeing how powerful AI is.

This is karmic justice imo.

martinald

Totally agree, though ironically Claude code works way better with Excel than I expected.

I even tried telling Copilot to convert each sheet to a CSV on one attempt THEN do calculations. It just ignored it and failed miserably, ironically outputting me a list of files that it should have made, along with the broken python script. I found this very amusing.

QuantumGood

Tim Berners-Lee thought pages would become machine-readable long ago, with "obvious" benefits, and that idea partly drove XML, RDF and HTML 5. Now the benefit of doing so seems even bigger (but are they?), and the time spent making existing documents AI readable seems to keep growing.

irishcoffee

> Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.

I had interns use c++ to unzip, parse, and repackage to json a standardized visio doc. I had no say in the standard, but specific blocks meant specific things, etc. The project was successful. The xml was parse-able... at least for our needs. The overall project died a swift death and this tidbit will probably be forgotten forever in the depths of repo heirarchy.

drsalt

what is the source data? the author says they've seen "far more non-technical people than I'd expect using Claude Code in terminal" so like, 3 people? who are these people?

wrs

Some minor editing to how this would have been written in the mid-1980s:

“The real leaps are being made organically by employees, not from a top down [desktop PC] strategy. Where I see the real productivity gains are small teams deciding to try and build a [Lotus 123] assisted workflow for a process, and as they are the ones that know that process inside out they can get very good results - unlike a [mainframe] software engineering team who have absolutely zero experience doing the process that they are helping automate.”

The embedded “power users” show the way, then the CIO-friendly packaged software follows much later.

SubiculumCode

The power is in the tails

with

> The bifurcation is real and seems to be, if anything, speeding up dramatically. I don't think there's ever been a time in history where a tiny team can outcompete a company one thousand times its size so easily.

Slightly overstated. Tiny teams aren't outcompeting because of AI, they're outcompeting because they aren't bogged down by decades of technical debt and bureaucracy. At Amazon, it will take you months of design, approvals, and implementation to ship a small feature. A one-man startup can just ship it. There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

Gigachad

I swear in a month at a startup I used to build what takes a year at my current large corp job. AI agents don't seem to have sped up the corporate process at all.

mhink

> how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

Ultimately, it's the same way you ship human-generated code at scale without causing catastrophic failure: by entrusting critical systems only to people who are trustworthy and have skin in the game.

There are two possibilities right now: either AI continues to get better, to the point where AI tools become so capable that completely non-technical stakeholders can trust them with truly business-critical decision making, or the industry develops a full understanding of their capabilities and is able to dial in a correct amount of responsibility to engineers (accounting for whatever additional capability AI can provide). Personally, I think (hope?) we're going to land in the latter situation, where individual engineers can comfortably ship and maintain about as much as an entire team could in years past.

As you said, part of the difficulty is years of technical debt and bureaucracy. At larger companies, there is a lot of knowledge about how and why things work that doesn't get explicitly encoded anywhere. There could be a service processing batch jobs against a database whose URL is only accessible via service discovery, and the service's runtime config lives in a database somewhere, and the only person who knows about it left the company five years ago, and their former manager knows about it but transferred to a different team in the meantime, but if it falls over, it's going to cause a high-severity issue affecting seven teams, and the new manager barely knows it exists. This is a contrived example, but it goes to what you're saying: just being able to write code faster doesn't solve these kinds of problems.

PunchyHamster

> There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

It's very simple. You treat the AI as a junior and review its code.

But that awesomely complex method has one disadvantage: having to do so means you can't brag about the 300% performance improvement your team got from just committing AI code to the master branch without looking.

DavidPiper

> To really underline this, Microsoft itself is rolling out Claude Code to internal teams, despite (obviously) having access to Copilot at near zero cost, and significant ownership of OpenAI. I think this sums up quite how far behind they are

I think it sums up how thoroughly they've been disrupted, at least for coding AIs (independent of like-for-like quality concerns rightly mentioned elsewhere in this thread re: Excel/Python).

I understand ChatGPT can do like a million other things, but so can Claude. Microsoft deliberately using competitors internally is the thing their customers should pay attention to. Time to transform "Nobody gets fired for buying Microsoft" into "Nobody gets fired for buying what Microsoft buys", for those inclined.

doom2

I guess this is as good a thread as any to ask what the current meta is for agentic programming (in my case, as applied to data engineering). There are all these posts that make it to the front page talking about productivity gains but very few of them actually detail the setup that's working for the author, just which model is best.

I guess it's like asking for people's vim configs, but hey, there are at least a few popular posts mainly around git/vim/terminal configs.

energy123

I push most work into the chat interface (attach the full codebase as a single file, paste in specs, describe what I want), then copy the task list from chat into Codex. This is to reduce Codex token usage and avoid breaching weekly limits. I'd use a more agent-heavy process if I didn't care about cost.
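
The "full codebase as a single file" part is just a concatenation script - a rough sketch of the kind of thing I mean, with an arbitrary extension list and output name:

    # Concatenate source files into one text file for pasting into a chat.
    # Which extensions and directories to include is project-specific.
    from pathlib import Path

    EXTS = {".py", ".md", ".toml"}            # adjust per project
    SKIP = {".git", "node_modules", ".venv"}  # directories to ignore

    with open("codebase.txt", "w", encoding="utf-8") as out:
        for path in sorted(Path(".").rglob("*")):
            if not path.is_file() or path.suffix not in EXTS:
                continue
            if set(path.parts) & SKIP:
                continue
            out.write(f"\n===== {path} =====\n")
            out.write(path.read_text(encoding="utf-8", errors="replace"))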

fragmede

There's more stuff in mine, but at the top of my ~/.claude/CLAUDE.md file, I have:

    ## Important Instructions

    - update todo.md as items are completed

    **Commit to git after making code changes.** Check `git status` first -
    only commit if there are actual changes:

    ```bash
    # If not in a git repository, initialize it first:
    git init
    # Then commit changes:
    git add <FILES_UPDATED>  # Be surgical - add only the changes you just made.
    git commit -m "Description of changes"
    ```
This lets me have bite-sized git commits that I can marshal later, rather than having to wrangle git myself.

datsci_est_2015

Thought this was going to be more about programmers, but it was actually about non-technical users and Microsoft's product development failure.

One tidbit I'd disagree with is that only those using the bleeding-edge AI tools are reaping the benefits. There seem to be a lot of highly specialized tools and a lot of specific configurations (and mystical incantations) to get them to work, and those are constantly changing and being updated. The bleeding edge is a dangerous place to be if you value your time (and sanity).

Personally, as someone working on moderate-to-highly complex software (live inference of industrial IoT data), I can't really open a merge/pull request for my colleagues to review unless I 100% understand what I've pushed and can explain it to them as well.

My killer app for AI would just be a CLI that gets me to a commit based on moderately technical input:

“Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”

But, all the hype of the bleeding edge is around abstracting away the entire coding process until you don’t even understand what code is being generated? Hard to see it as anything but a pipe dream. AI is useful, but it’s not a panacea - you can’t fire it and replace it when it fucks up.

georgeburdell

“Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”

Granted I'm way behind the curve, but is this not how actual engineers (and not influencers) are using it? I heavily micro-manage the implementation because my manager still expects me to know the code.

tiangewu

Microsoft's failure around Copilot in Excel gave my partner a very poor impression of AI's ability to help with financial tasks.

It took a lot of convincing, but I finally got her to start using ChatGPT to help her write SQL and walk her through setting up some SaaS accounting software formulas.

It worked so well that now she's trying to find more applications at work. Claude Code is too scary for her, though. It will need to be in some web UI before she feels comfortable giving it a try.

protocolture

tl;dr: If you are trying to protect your IP from AI, you probably use Copilot or nothing. If you have no IP to protect, you are free to mess about.

fortran77

I know it's fun to bash Microsoft, but while Claude is better, Microsoft's Copilot is far from "awful". I've used it productively with the VS Code integration for some esoteric projects: PIC PIO programming and Verilog.

FilosofumRex

Generally speaking, if you're using your coding agent as your assistant inside your IDE, you're missing out on 80% of its benefits... If anything, you should ask it how to do something and then act as its assistant in implementing it.

PunchyHamster

also missing out on 80% of bugs

nickphx

Three kinds - the third being those who do not use it.

athrowaway3z

> sandboxing agents is difficult

I use this amazingly niche and hipster approach of giving the agent its own account, which through inconceivably highly complex arcane tweaking and configurations can lock down what they can and can't do.

---

Can somebody for the love of god tell me why articles keep claiming this is so difficult?

fragmede

It's a bunch of work that takes a bunch of time, and I want it nowwwww-owwwww!

...is how I imagine that conversation goes.

NitpickLawyer

I have Antigravity in its own account and that has worked pretty well so far. I also use devcontainers for the CLI agents and that has also worked out well. It's one click away in my normal dev flow (I was using this anyway before for Python projects).

viccis

>You can easily run Monte Carlo simulations

Ah yes, Monte Carlo simulations, regular part of a finance team's objectives.
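
To be fair, the mechanics were never the hard part - a toy sketch in numpy with entirely invented parameters, simulating a year of monthly revenue:

    # Toy Monte Carlo: 10,000 paths of 12 months of revenue growth.
    # The starting value and growth distribution are made up.
    import numpy as np

    rng = np.random.default_rng(0)
    growth = rng.normal(loc=0.01, scale=0.05, size=(10_000, 12))
    revenue = 100.0 * np.cumprod(1 + growth, axis=1)  # start at 100

    final = revenue[:, -1]
    print(f"median: {np.median(final):.1f}")
    print(f"5th-95th percentile: {np.percentile(final, [5, 95]).round(1)}")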

nnevatie

I'd be very interested in seeing some statistics on what could be considered confidential material pasted into ChatGPT's chat interface.

I think the results would be pretty shocking, mostly because the integrations to source services are abject messes.

Antibabelic

https://www.theregister.com/2025/10/07/gen_ai_shadow_it_secr...

"With 45 percent of enterprise employees now using generative AI tools, 77 percent of these AI users have been copying and pasting data into their chatbot queries, the LayerX study says. A bit more than a fifth (22 percent) of these copy and paste operations include PII/PCI."

hereme888

I'm still trying to wrap my head around the past decade: useful AI, self-operating vehicles, real AI robots, immersive VR, catching reusable rockets with chopsticks, and of course the flying cars.

What will be the expected work output for the average future worker?

jsattler

Some years ago, I was at a conference and attended a very interesting talk. I don't remember the title of the talk, but what stuck with me was: "It's no longer the big beating the small, but the fast beating the slow". This talk was before all the AI hype. Working at a big company myself, I think this has never been more true. I think the question is, how to stay fast.

josters

And, to add to that, how to know when to slow down. Also, having worked at a big company myself, I think the question shifts towards "how to get fast" without compromising security, compliance etc.

swyx

this is generic startup advice (doesn't mean it's not true). you level up a bit when you find instances where slow beat fast (see: Teams vs Slack)

crystal_revenge

One of the most reliable BS detectors I've found is when you have to try to convince other people of your edge.

If you have found a model that accurately predicts the stock market, you don't write a blog post about how brilliant you are, you keep it quiet and hope no one finds out while you rake in profits.

I still can't figure out quite what motivates these "AI evangelist" types (unlike crypto evangelists who clearly create value for themselves when they create credibility), but if you really have a dramatically better way to solve problems, you don't need to waste your breath trying to convince people. The validity of your method will be obvious over time.

I was just interviewing with a company building a foundation model for supposedly world-changing coding assistants... but they still can't ship their product or find enough devs willing to relocate to SF. You would think that if you actually had a game-changing coding assistant, your number one advantage would be that you don't need to spend anything on devs and can ship 10x as fast as your competition.

> First, you have the "power users", who are all in on adopting new AI technology - Claude Code, MCPs, skills, etc. Surprisingly, these people are often not very technical.

It's not surprising to me at all that these people aren't very technical. For technical people, code has never been the bottleneck. AI does reduce my time writing code, but as a senior dev, writing code is a very small part of the problems I'm solving.

I've never had to argue with anyone that a calculator is a better way to solve simple computational math problems than doing it by hand, or that using a stand mixer is more efficient than using a wooden spoon. If there were a competing bakery arguing that the wooden spoon was better, I wouldn't waste my time arguing about the stand mixer; I would just sell more pastry than them and worry about counting my money.

camgunz

I think this article is generally insightful, but I don't think the author really knows whether they one-shotted the Excel-to-Python transformation or not. Maybe they elided an extensive testing phase, but otherwise big bugs could be lurking.

Maybe it's not a big deal, or maybe it's a compliance model with severe financial penalties for non-compliance. I just personally don't like these tradeoffs being left implicit.

PunchyHamster

I'd argue the 2 types of users are:

People using it as a tool, aware of its limitations, and treating it basically as an intern/boring-task executor (whether it's some code boilerplate, or pooping out/shortening some corporate email), or as a tool to give themselves a summary of a topic they can then bite into more deeply.

People outsourcing their thinking and entire skill set to it - they usually have very little clue about the topic, are interested only in results, and are not interested in knowing more about it or honing their skills.

The second group is the one that thinks talking to a chatbot will replace a senior developer.

Aardwolf

The same person might be both kinds of user, depending on the topic or just the time of day.

sevenzero

I started to outsource thinking at my job as my company made it very clear that they do not want/can't afford thinking engineers. Thinking requires time, and they want to deliver quickly. So they cater to the very realistic deadlines our PMs set for features (/s). Funnily enough, the features have to be implemented ASAP according to the customers, but the customer feedback takes like 6 months, because they use the new feature for the first time 6 months after delivery. I just don't care anymore. Gonna leave the learning part to my time off, but I'm getting generally tired of the industry as a whole, so I'm just putting in minimal effort to pay my bills until things explode or get better. So for me it's definitely outsourcing thinking at work.

3D30497420

> People outsourcing their thinking and entire skill set to it - they usually have very little clue about the topic, are interested only in results, and are not interested in knowing more about it or honing their skills.

And this may be fine in certain cases.

I'm learning German and my listening comprehension is marginal. I took a practice test, and one of the exercises was listening to 15-30 seconds of audio followed by questions. I did terribly, but it seemed like a good way to practice. I used Claude Code to create a small app that generates short audio dialogs (via ElevenLabs) and sets of questions. I ran the results by my German teacher and he was impressed.

I'm aware of the limitations: sometimes the audio isn't great (it tends to mess up phone numbers), it can only be a small part of my work learning German, etc.

The key part: I could have coded it, but I have other more important projects. I don't care that I didn't learn about the code. What I care about is I'm improving my German.

anal_reactor

> This effectively leads to a situation where smaller company employees are able to be so much more productive than the equivalent at an enterprise. It often used to be that people at small companies really envied the resources & teams that their larger competitors had access to - but increasingly I think the pendulum is swinging the other way.

Small companies are more agile and innovative while corporations often just shuffle papers around. Wow, what a bold claim, never seen before in the entire history of economics.

deafpolygon

There's also an emerging group of users (such as myself) who use it primarily as an "on-demand" teacher rather than as a productivity tool.

I am learning software development without having it generate code for me, preferring to have it explain each thing line by line. But it's not only for learning development: I can also query it for historical information and have it point me to the source of that information (so I can read the primary sources as much as possible).

It allows me to customize the things I want to learn at my own pace, while also allowing me to diverge for a moment from the learning material. I have found it invaluable… and so far, Gemini has been pretty good at this (probably owing to the integration of Google search into Gemini).

It lets me cut through the SEO crap that has plagued search engines in recent years.