It's hard to believe that it has already been over eight years since I migrated all of my goodies from a dedicated server to Azure. For the most part, I'm thankful for this arrangement, because maintaining your own hardware isn't something I ever enjoyed or wanted to do. Having a single box was always asking for failure, too, even though I had an extra drive in the thing (which did fail once) as a backup. I couldn't respond to scale needs if I had to, either, and there was one point where that would have helped. Meanwhile, Azure has improved in a lot of ways since then as far as pricing structures go. The SQL database elastic pools were a real game changer for me, because they work with the same flexibility as app services, which all live on the same "plan" with whatever memory and CPU constraints you're paying for. The databases are metered in some goofy units (DTUs), but whatever they are, I rarely average more than 5% of them.
While Azure Functions and Redis and some other bits are awesome to have at your disposal, the App Service is still the most important thing, since it hosts most of the running code that makes an app or site. It became easier and cheaper to run multiple instances once the Linux flavor became available and all of the dotnet hotness could run on it. I don't have anything running on Windows anymore. For the most part, this has been great, except when things break and you can't explain why. I'm in the middle of one of those situations now.
The first thing I noticed about running these on Linux is that there's an abstraction underneath that containerizes your app, even though I'm using the old school zip deploy out of Azure DevOps. You can see it when you watch the log stream. At first, the diagnostic tools were terrible, which is to say there weren't any that were useful, but the log streaming and the various graphs and charts available in the troubleshooting section of the portal definitely help. Where things have not worked before, it had something to do with the load balancing bits, which are totally abstract, and you can't do much to figure out what's going on beyond some docs that walk you through troubleshooting 502's (bad gateway errors), none of which worked.
On previous occasions where I had the 502's, some combination of scale up or scale out, or redeployment, cleared the problem. A few weeks ago, I encountered the problem again, but couldn't resolve it. In fact, the scaling and deployments probably just created more noise. The app in question uses subdomains to set customer tenant context (there's a wildcard DNS record, with a wildcard cert), or alternatively, it can use a totally custom domain. In this case, the custom domain, the one people knew, returned 502's, but using the subdomain worked fine. I didn't even realize it until I checked that other tenants were working. Knowing that one possible cause of 502's is not having a proper certificate bound, I removed and then rebound the cert, and soon after, it started working. After some digging, that turned out to be just a coincidence, and support still doesn't know what caused the problem.
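If you're wondering how the subdomain-versus-custom-domain part works, it's roughly a little piece of ASP.NET Core middleware that picks the tenant off of the host header. This is a simplified sketch with placeholder names (`ITenantStore`, `example.com`, and so on), not the actual code from the app:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Placeholder abstractions for the sketch; the real app's types differ.
public record Tenant(string Key, string Name);

public interface ITenantStore
{
    Task<Tenant?> FindBySubdomainAsync(string subdomain);
    Task<Tenant?> FindByCustomDomainAsync(string host);
}

public class TenantResolutionMiddleware
{
    private readonly RequestDelegate _next;

    public TenantResolutionMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context, ITenantStore tenants)
    {
        var host = context.Request.Host.Host; // "acme.example.com" or "clientsite.com"

        // Wildcard subdomain: the first label identifies the tenant.
        // Custom domain: look up the whole host instead.
        var tenant = host.EndsWith(".example.com", StringComparison.OrdinalIgnoreCase)
            ? await tenants.FindBySubdomainAsync(host.Split('.')[0])
            : await tenants.FindByCustomDomainAsync(host);

        if (tenant is null)
        {
            context.Response.StatusCode = StatusCodes.Status404NotFound;
            return;
        }

        context.Items["Tenant"] = tenant;
        await _next(context);
    }
}
```

Either way, both hostnames end up at the same app and the same instances, which is part of what made it so strange that only the custom domain was returning 502's.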
One of the things support did was point me at an aging blog post that's hosted on GitHub, which was troubling for a few reasons. One, it's hard or impossible to find, and two, it shows what a leaky abstraction App Service really is. Health check, self-healing, and local cache are all things that feel like they should be automatic. And even then, there are undocumented things, like the fact that health check counts 301's as failures, so if you point it at the root path and you use a redirect to force the canonical domain name, every probe fails. Self-healing seems like it should be automatic, since that's the point of having multiple instances. And local cache is such an in-the-weeds implementation detail that App Service starts to feel less like a PaaS offering and throws you back to running IIS yourself. The average support rep will also throw graphs at you of various things that may not be available to you otherwise. And you can bet I'm not going to use Application Insights, which would easily cost more than the stuff already in use to keep the site on the air.
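One way around the health check gotcha is to give the probe its own path and leave that path out of the canonical-domain redirect, then point the health check setting in the portal at it. Here's a simplified sketch with a single hard-coded placeholder domain, not the exact setup:

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.CreateBuilder(args).Build();

// A dedicated probe endpoint that always returns 200. In the portal, point
// the App Service health check at /healthz instead of the root path.
app.MapGet("/healthz", () => Results.Text("OK"));

// Canonical-domain redirect for everything else. Skipping the probe path
// means the platform sees a 200 instead of counting the 301 as a failure.
app.Use(async (context, next) =>
{
    var host = context.Request.Host.Host;
    if (!context.Request.Path.StartsWithSegments("/healthz") &&
        !host.Equals("www.example.com", StringComparison.OrdinalIgnoreCase))
    {
        context.Response.Redirect(
            $"https://www.example.com{context.Request.Path}{context.Request.QueryString}",
            permanent: true);
        return;
    }
    await next();
});

app.MapGet("/", () => "Hello from the canonical host.");
app.Run();
```

That's the kind of thing I mean by undocumented, though. Nothing tells you to do this; you find out when the probes start failing.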
App Service has grown in a very organic way over the years, and it's central to the usefulness of Azure. To their credit, they basically rebuilt all of the plumbing without anyone noticing, which is the mark of a solid abstraction. But the bits above lean toward a leaky abstraction, and I think they can and will do better.