The Dark Side of AWS

I use AWS to host this blog. I also use it at work. Using AWS for real work has exposed some rather annoying aspects of the service, with one standing head and shoulders above the rest: service limits.

Service limits certainly play an important role, both in helping Amazon plan capacity increases and in helping limit the damage of accidental or malicious provisioning. However, Amazon does not manage limits well at all. How so? Let me explain:

For some resources in some services, you can query your current limits via the API. For some other resources in some other services, you can query your current limits via Trusted Advisor, which costs $100/month and presents the limits in a somewhat awkward manner. In either case, it is not overly onerous to set up some kind of automated monitoring (e.g. Nagios) to alert you before you hit the limit.
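To show how little is needed on the monitoring side once you have the numbers, here is a minimal sketch of a Nagios-style check. It assumes you have already obtained the current usage and the limit for a resource by whatever means that service offers; fetching them is deliberately left out. The function names and the 80%/90% thresholds are my own illustrative choices, not anything Amazon or Nagios prescribes. The exit codes (0, 1, 2, 3) are the standard Nagios plugin convention.

```python
"""Nagios-style check: warn when usage approaches a service limit (sketch)."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_limit(name, used, limit, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for one resource.

    The thresholds are assumptions; tune them to how long a limit
    increase usually takes to come through.
    """
    if limit <= 0 or used < 0:
        return UNKNOWN, f"UNKNOWN - {name}: bad values used={used} limit={limit}"
    pct = 100.0 * used / limit
    detail = f"{name}: {used}/{limit} ({pct:.0f}% of limit)"
    if pct >= crit_pct:
        return CRITICAL, f"CRITICAL - {detail}"
    if pct >= warn_pct:
        return WARNING, f"WARNING - {detail}"
    return OK, f"OK - {detail}"


def main(argv):
    """Hypothetical command line: check_limit.py NAME USED LIMIT"""
    try:
        name, used, limit = argv[1], int(argv[2]), int(argv[3])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_limit.py NAME USED LIMIT")
        return UNKNOWN
    code, line = check_limit(name, used, limit)
    print(line)
    return code
```

To use it as an actual plugin, you would end the script with `sys.exit(main(sys.argv))` so that Nagios sees the exit code. The hard part is not this script; it is getting trustworthy `used` and `limit` values in the first place.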

For all other resources, the only way to query your current limits is to slam into them full force (i.e. you try to create something and fail because you've reached the limit). At this point, you contact Amazon Support and request a limit increase. To their credit, this generally goes pretty quickly, but when you need a new whatever right now, it's not quick enough.

(Aside: yes, you can put together a list of all default service limits (by hand, since there's no single comprehensive list) and then diligently maintain the list (again, by hand) whenever you get a limit increase. That is not a solution.)

Even this, on its own, would not piss me off enough to write publicly about it. No, the thing that really PISSES ME RIGHT THE FUCK OFF is Amazon's attitude about it. Simply put, they don't care. At all.

I have on more than one occasion asked Amazon for guidance on monitoring our limits. In short, I want our Nagios instance to alert us when we get close to a limit. This is a reasonable thing to want, right? Isn't it better if I can make a low-priority request for additional whatevers before I run into the limit instead of a panicked request after I've hit the limit?

During my most recent interaction with Amazon Support on this topic, they suggested the following:

This page and all related pages are INCREDIBLY light on details. Basically, it amounts to "Pay us a bunch of money for a product that might help you." The demo linked from that page makes ZERO mention of AWS, so I can't evaluate the functionality from there.

"I believe that should address your concern with Nagios Monitoring"

What is the basis for that belief?

This search returns two results. Two. Both of them are security bulletins.

...and here it is: the reason why they are so reluctant to provide ANY means to proactively deal with AWS service limits! They want to upsell you on a support plan! In our case, the only support plan above the one we currently have is $15,000/month. Assuming that this plan would, in fact, get us a person at Amazon to watch our limits for us (the description of the plan does not explicitly say that), this would be a great plan...except for the fact that I would probably be put on some kind of "Idiot List" by the accountants if I even suggested it. And for good reason, too: $15,000/month is a significant fraction of our total AWS monthly bill, which is higher than we'd like as it is.

(Oh, and before you tell me to just use Trusted Advisor: I'm not going to manually check Trusted Advisor every so often and hope that it's telling me about all of the limits I might need to know about. I want something AUTOMATED.)

This is, frankly, shameful. AWS is an incredibly useful and comprehensive service, but once you start growing, you repeatedly bump into limits with little to no advance warning. That is, unless you are willing and able to pay $15K per month. I guess if you're running a VC-backed startup with regular infusions of cash, then it's not a big deal. However, if you're running a more modest enterprise or a bootstrapped startup, service limits will be an ever-present thorn in your side.

Caveat emptor.