How SNAT port allocation impacts your Azure services and what to do about it

SNAT (Source Network Address Translation) is the process by which the private IP addresses of outbound connections from your Azure services are translated into a public IP address. This enables access to the public internet while keeping the internal IP addresses in your Azure environment fully private.

An app service in Azure has a limited allocation of SNAT ports. If your app requires more SNAT ports than the default allocation, outbound connections may be dropped, resulting in various 5xx errors. By default, the number of pre-allocated SNAT ports on an app service is 128 per instance.

It’s important to note that SNAT port consumption is counted per destination: ports are used up by connections to the same address and port. Making outbound connections to lots of different addresses is not an issue.
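To make the per-destination point concrete, here is a minimal sketch that counts concurrent connections per destination and flags the ones that would need more than the default allocation. The host names and the function name are made up for illustration, and the model is deliberately simplified: it only groups connections by (address, port).

```python
from collections import Counter

# Default pre-allocation per instance, as quoted in this post.
SNAT_PORTS_PER_INSTANCE = 128


def destinations_at_risk(connections, port_limit=SNAT_PORTS_PER_INSTANCE):
    """Return destinations whose concurrent connection count exceeds the limit.

    `connections` is an iterable of (address, port) tuples, one per open
    outbound connection. This is a simplified model, not Azure's allocator.
    """
    per_destination = Counter(connections)
    return {dest: n for dest, n in per_destination.items() if n > port_limit}


# 200 connections spread over 200 hypothetical hosts: nothing is flagged.
spread_out = [(f"host{i}.example.com", 443) for i in range(200)]

# 200 connections to one hypothetical host: that destination is flagged.
concentrated = [("api.example.com", 443)] * 200

print(destinations_at_risk(spread_out))
print(destinations_at_risk(concentrated))
```

The total number of connections is the same in both cases. Only the second one puts pressure on the SNAT ports.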

Within an App Service, an internal load balancer manages SNAT port allocation and reclaims ports from closed connections after a four-minute timeout. However, a sudden surge in outbound connections within a short timeframe can lead to port exhaustion, as the available ports may be insufficient to handle the increased demand.
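A rough back-of-the-envelope calculation shows how quickly this can happen. The sketch below assumes the worst case: every outbound call opens a new connection to the same destination, and each port stays unavailable for the four-minute timeout after the connection closes. The function names are my own, and the time a connection stays open is ignored, so treat the numbers as an indication only.

```python
DEFAULT_PORTS = 128            # pre-allocated SNAT ports per instance
RECLAIM_SECONDS = 4 * 60       # four-minute timeout before a port is reclaimed


def max_sustained_rate(ports=DEFAULT_PORTS, reclaim_seconds=RECLAIM_SECONDS):
    """New connections per second that can be sustained without pooling."""
    return ports / reclaim_seconds


def seconds_until_exhaustion(rate_per_second, ports=DEFAULT_PORTS,
                             reclaim_seconds=RECLAIM_SECONDS):
    """Seconds until all ports are in use, or None if the rate is sustainable.

    Above the sustainable rate, the ports run out before the first one is
    reclaimed, so the time to exhaustion is simply ports / rate.
    """
    if rate_per_second <= ports / reclaim_seconds:
        return None
    return ports / rate_per_second


print(max_sustained_rate())           # about 0.53 new connections per second
print(seconds_until_exhaustion(10))   # 12.8
print(seconds_until_exhaustion(0.5))  # None
```

In this simplified model, anything above roughly 32 new connections per minute to the same destination will eventually exhaust the ports, and a burst of 10 new connections per second does it in under 13 seconds.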

When planning your solutions, you have to consider scenarios where many users access your application simultaneously. If this is a common usage pattern for your app, you need to plan for the possibility of encountering this issue. I’ve experienced this issue first hand, and when it occurs it’s often hard to identify the cause, or even to recognize that port exhaustion is the underlying problem. Luckily, Microsoft has created several dashboards to help with this under the Diagnose and solve problems tools in Azure. In the screenshot below you will see charts for available and used ports, which give a good indication of your current situation.

Measures to prevent SNAT issues

There are several measures you can take to avoid SNAT issues:

Review your application code

Make sure that the outgoing calls from your applications follow best practices for connection pooling. For HTTP calls it’s important to implement the HttpClientFactory pattern (see Use IHttpClientFactory to implement resilient HTTP requests). For database connections, make sure you are also using connection pooling and that connections are opened and closed according to best practices.
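The effect of connection reuse is easy to see in a small experiment. The sketch below is not .NET and not HttpClientFactory. It is a Python illustration using only the standard library, with a throwaway local server standing in for the remote service, and the handler and function names are my own. It counts how many distinct client source ports the server sees when each request opens a new connection, compared to when all requests share one persistent connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortRecordingHandler(BaseHTTPRequestHandler):
    """Local test handler that records the source port of every request."""
    protocol_version = "HTTP/1.1"   # allows keep-alive connections
    seen_ports = set()

    def do_GET(self):
        type(self).seen_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def count_source_ports(reuse_connection, requests=5):
    """Send `requests` GETs to a local server; return distinct source ports."""
    PortRecordingHandler.seen_ports = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortRecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse_connection else None
        for _ in range(requests):
            conn = shared if reuse_connection else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()   # drain the body so the socket can be reused
            if not reuse_connection:
                conn.close()            # a new connection (and port) next time
        if shared is not None:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(PortRecordingHandler.seen_ports)


if __name__ == "__main__":
    print("new connection per request:", count_source_ports(reuse_connection=False))
    print("one reused connection:     ", count_source_ports(reuse_connection=True))
```

With a reused connection the server only ever sees one source port, no matter how many requests are sent. Behind SNAT, each of those source ports corresponds to a SNAT port, which is why pooling matters so much.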

Introduce caching mechanisms

Another easy-to-implement measure is to introduce more caching in your application, which reduces the need for outbound calls. Be aware that a caching provider like Redis also introduces outgoing calls, and a lot of calls to this service can make your application vulnerable to port exhaustion if you don’t configure it correctly.
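As an illustration, here is a minimal in-process cache with a time-to-live, written in Python. It keeps values in the memory of the app instance, so cache hits cause no outbound calls at all. The class name, the `fake_fetch` function and the key are hypothetical, and a real application would also need to think about memory limits and invalidation.

```python
import time


class TtlCache:
    """Minimal in-process cache: calls `fetch` only when an entry is missing or expired."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # the function that makes the outbound call
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no outbound call
        value = self._fetch(key)     # cache miss: one outbound call
        self._entries[key] = (now + self._ttl, value)
        return value


# Stand-in for a real outbound call, so the sketch runs without a network.
calls = []

def fake_fetch(key):
    calls.append(key)
    return f"value-for-{key}"


cache = TtlCache(fake_fetch, ttl_seconds=30)
for _ in range(100):
    cache.get("exchange-rates")

print(len(calls))  # 1
```

One hundred lookups result in a single outbound call, as long as they all happen within the time-to-live.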

Optimize your infrastructure

If your application follows best practices for outgoing connections and you still face issues with exhaustion, you can take additional measures on your infrastructure. First, try to keep your calls off the public internet by using service endpoints and private endpoints. To read more about Virtual Network integration, take a look at my previous blog post 👉 Security hardening of your Azure PaaS services

Finally, if you have a large number of outgoing calls and have already taken the other measures, take a look at the Azure NAT Gateway. The NAT gateway gives you around 64 000 available SNAT ports per public IP address. It dynamically allocates ports to the different resources in your subnets and effectively removes your exhaustion issues.

You will come a long way by optimizing your application code to use resilient, best-practice frameworks and patterns. However, if you have an application with a lot of outbound connections and a large user base, additional infrastructure measures, such as using private endpoints or the Azure NAT Gateway, can ensure stable performance under heavy load.

