olayiwoladekoya
I really enjoyed reading this. Much like Instagram, which had thousands of users sign up on the first day, if you aren't able to scale because of your skill level, wouldn't that affect usage and lead to comments like "the app/site is so slow"?
lesuorac
Aren't comments like "the site is too slow" similar to "the city is too crowded"?
Twitter famously had a "fail whale" but it didn't stop the company from growing. If you have market demand (and I guess advertising) then you can get away with a sub-optimal product for a long time.
arter45
It depends on the adoption model.
If it’s just “sign up any time you want and go”, yes, it can go that way.
If it’s “join that waiting list” or “book a call” (for KYC purposes or whatever), you have a buffer.
If user count is more or less constant (most internal websites, for example), it’s probably not an issue.
And so on.
littlestymaar
Not criticizing the core idea, which is sound (don't waste resources overengineering at the beginning; evolve your architecture to match your actual scale as you grow), but the “number of users” figures in this post are completely nonsensical. You ought to multiply them by 100 (if you're being conservative) or even 1000 (depending on the users' consumption pattern).
Modern hardware is fast; if you cannot fit more than 100 users (not even 100 concurrent users) on a single $50/month server, you're doing something very, very wrong.
Even a repurposed 10-year-old Fairphone[1] can handle more than that.
[1]: https://far.computer
Agreed, the numbers were shockingly low.
Amazing to see my little phone pop up randomly on hacker news :D
Thank you stranger.
maccard
You and another person made this point _but_ I’d encourage you to look at what $50/mo gets you on AWS all in. In reality it will get you a t4g.small plus 200GB of (very slow) storage. Honestly they start to chug at 500 or so users in my experience.
the8472
Counting in users is just nonsensical. Is it total registered users? Users per <time interval>? Sessions that need to go in the session store? Concurrent requests?
Then there's the implementation language: interpreted vs. JITed vs. AOT-compiled.
And of course the workload matters a lot: a simple CRUD application vs. compute-heavy work vs. serving lots of media...
Together those factors can make a difference of 6+ orders of magnitude.
jbrooks84
Nice read
Nextgrid
Good post in general but some caveats:
1) His user numbers are off by an order of magnitude at least, as other comments have mentioned. Even a VM/VPS should handle more, and a modern bare-metal server will do way more than the quoted numbers.
2) Autoscaling is a solution to the self-inflicted problem of insanely high cloud prices, and cloud providers love it because implementing it means more reliance on proprietary, vendor-specific APIs. The actual solution is a handful of modern bare-metal servers in strategic locations that cover your worst-case expected load while costing less than the cloud would at your lowest expected load. Upside: lower prices and lower complexity. Downside: say goodbye to your AWS re:Invent invite.
3) Microservices. Apparently redeploying stateless appservers is a problem (even though the autoscaling he's fine with does exactly that in response to load spikes), and his solution is to introduce 100x the management overhead and points of failure? The argument about scaling separate features differently doesn't make sense either: unless your codebase is literally too big to fit on one server, there is no problem with every server being able to serve all types of requests, and as a bonus you no longer have to predict the expected load on a per-feature basis. A monolith's individual features can still talk to separate databases just fine.
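To make that last point concrete, here is a minimal sketch (invented table/file names, Python only for brevity) of one deployable whose features talk to separate databases:

    # Hypothetical sketch: one deployable app, two features, two databases.
    import sqlite3

    # Each feature owns its own database; both live in the same process/binary.
    billing_db = sqlite3.connect("billing.db")
    feed_db = sqlite3.connect("feed.db")
    billing_db.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, cents INTEGER)")
    feed_db.execute("CREATE TABLE IF NOT EXISTS posts (user_id INTEGER, text TEXT)")

    def record_payment(user_id: int, cents: int) -> None:
        billing_db.execute("INSERT INTO payments VALUES (?, ?)", (user_id, cents))
        billing_db.commit()

    def post_update(user_id: int, text: str) -> None:
        feed_db.execute("INSERT INTO posts VALUES (?, ?)", (user_id, text))
        feed_db.commit()

    # Any server in the fleet can handle either request type; only the data is split.
    record_payment(1, 999)
    post_update(1, "hello")

Swap sqlite3 for two Postgres DSNs and the shape is the same: the process boundary and the data boundary don't have to coincide.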
withinboredom
And to add to this: virtually every programming language allows you to define multiple entry points. So you can have your workers in the exact same codebase as your API, and even multiple API services. They can share code and data structures or whatever you need. So if you do need this kind of complexity with multiple services, you don't need separate repos, elaborate build systems, and dependency hell.
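A rough sketch of what that looks like (hypothetical names, Python just for illustration): one file, one shared data type, and the entry point picked at launch time.

    # Hypothetical sketch: one codebase, two entry points, shared data structures.
    # Run as `python app.py api` or `python app.py worker`.
    import sys
    from dataclasses import dataclass

    @dataclass
    class Job:  # shared by the API and the worker; no separate client library needed
        id: int
        payload: str

    def run_api() -> None:
        # A real app would start an HTTP server here and enqueue Job objects.
        print("api: would accept requests and enqueue", Job(1, "hello"))

    def run_worker() -> None:
        # Same repo, same Job type, different process.
        print("worker: would dequeue and process Job objects")

    if __name__ == "__main__":
        {"api": run_api, "worker": run_worker}[sys.argv[1]]()

The same idea shows up as Go's cmd/ directory layout or multiple console_scripts in a Python package: deploy the same artifact and flip a flag.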
mbb70
As is often stated, microservices are a solution for scaling an engineering org to hundreds of developers, not for scaling a product to millions of users.
yomismoaqui
The best description of microservices comes from "The Grug Brained Developer" (https://grugbrain.dev/):
"grug wonder why big brain take hardest problem, factoring system correctly, and introduce network call too
seem very confusing to grug"
techpression
I was reading it and got seriously confused by a separate database and server for a measly 1000 users. With the two separated, you can scale vertically to handle a million users (probably more) if all you're doing is basic web/REST-type stuff.
I feel a bit of sadness for people who have never used a bare-metal server and seen how insanely capable hardware is today.
kylecazar
Nice for traditional apps. I'm currently working with a client on an Elixir backend. Some aspects of the tier progressions transfer, but the BEAM diverges a bit (no external queues/redis, scaling direction). I am enjoying it.
efilife
This post shows some signs of having parts of it written by an LLM, in my opinion. Or am I crazy? Please tell me that I am.
The author having this on his GitHub makes me even more suspicious: https://github.com/ashishps1/learn-ai-engineering
It's entirely written by an LLM.
Echoing what others have said about the numbers being off.
I ran a 10k-user classic ASP service on a VPS from Fasthosts, with MySQL 5.6 and Redis, and it was awesome.
gethly
Just skimming the website, I call BS.
swiftcoder
I'm going to be charitable and assume he means "concurrent users" (i.e. something like 100 concurrent users would typically imply 2 orders of magnitude more total users...)
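A back-of-envelope check of that ratio, with assumed usage numbers (15 active minutes per user per day, spread evenly over the day):

    # Hypothetical numbers: why ~100 concurrent can mean ~10,000 registered users.
    total_users = 10_000
    active_minutes_per_day = 15           # assumed average time on site per user
    minutes_per_day = 24 * 60

    avg_concurrent = total_users * active_minutes_per_day / minutes_per_day
    print(round(avg_concurrent))          # ~104 users on the site at any moment

Peak concurrency is obviously a few times the average, but a roughly two-orders-of-magnitude gap between concurrent and registered users is about right for a typical consumer app.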
poisonborz
I increasingly believe that scaling to hundreds of millions of users is just a failure mode. There is a tipping point past which you only serve profits and shareholders/funders. Communities die by becoming too big.
srinath693
I think a lot of these debates miss the core point, which is stage and context. Yes, a single modern server can handle far more than most people think, and yes, microservices are massively overused. But early teams usually optimize for speed, safety, and predictability rather than perfect efficiency.
Cloud + autoscaling is expensive, but it reduces operational risk when traffic is unpredictable and the team is small. Bare metal is great once you understand your workload and failure modes, but it requires real ops discipline that many startups don't have early on. Same with microservices: a modular monolith with good boundaries gets you very far with far less complexity, and most products never reach the scale where microservices are truly necessary.
In practice, the winning approach tends to be: start simple, scale vertically, keep the architecture boring, and only add complexity when real bottlenecks force your hand - not because Twitter or Netflix did it.