How to Fix Upstream Connect Error or Disconnect/Reset Before Headers in Envoy Proxy

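If you are using Envoy proxy as a service mesh or a load balancer, you might encounter the following error message in your logs or as the body of an HTTP 503 response returned to clients:

upstream connect error or disconnect/reset before headers

The message is often followed by a reset reason, for example "reset reason: connection failure", which can hint at which of the scenarios below applies.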

This error means that Envoy was unable to establish a connection with the upstream service or the connection was terminated before Envoy received any response headers. There are several possible causes and solutions for this error, depending on your configuration and environment. In this post, we will explore some of the common scenarios and how to troubleshoot them.

Scenario 1: Invalid or Unreachable Upstream Address

One of the most obvious reasons for this error is that the upstream address that Envoy is trying to connect to is invalid or unreachable. This could happen if you have a typo in your configuration, if the upstream service is down or misconfigured, or if there is a network issue preventing the connection.

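To verify that the upstream address is valid and reachable, test connectivity from the Envoy host itself (or from inside the Envoy container or pod), since that is where the connection originates. For example, if your upstream service is running on port 8080, you can try "curl -v http://UPSTREAM_HOST:8080/" or "telnet UPSTREAM_HOST 8080", where UPSTREAM_HOST is the address configured in the Envoy cluster. A ping only tests ICMP reachability, which firewalls often block, so a failed ping alone does not prove that the service is unreachable.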

If curl or telnet cannot connect, there is a problem with the upstream service or the network. You should check the following:

  • The upstream service is running and listening on the correct port.
  • The upstream service has a valid DNS entry or IP address that Envoy can resolve.
  • The network firewall or security group rules allow traffic from Envoy to the upstream service on the required port.
  • The network routing or load balancing configuration is correct and does not drop or redirect packets.
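If telnet is not installed on the Envoy host, a short script can run the same two checks. The following is a minimal Python sketch; the function name check_upstream is hypothetical and not part of any Envoy tooling. It resolves the upstream name from the local host and then attempts a plain TCP connection to each resolved address, without sending any application data:

```python
import socket


def check_upstream(host: str, port: int, timeout: float = 3.0) -> str:
    """Hypothetical helper: resolve `host`, then try a TCP connect like `telnet host port`.

    Returns a one-line verdict. No application data is sent.
    """
    try:
        # Step 1: DNS resolution from this host.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS resolution failed for {host!r}: {exc}"

    errors = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        # Step 2: TCP handshake against each resolved address.
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return f"TCP connect to {sockaddr[0]} port {sockaddr[1]} succeeded"
        except ConnectionRefusedError:
            # Usually means the host is reachable but nothing listens on the port.
            errors.append(f"{sockaddr[0]}: connection refused")
        except socket.timeout:
            # Often a firewall or security group silently dropping packets.
            errors.append(f"{sockaddr[0]}: timed out after {timeout}s")
        except OSError as exc:
            errors.append(f"{sockaddr[0]}: {exc}")
    return "TCP connect failed: " + "; ".join(errors)
```

For example, check_upstream("UPSTREAM_HOST", 8080) should report success before you look further at Envoy itself. A refusal usually points at the service or port, while a timeout more often points at firewall or routing rules.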

If you can confirm that the upstream address is valid and reachable, you can move on to the next scenario.

Scenario 2: TLS Mismatch Between Envoy and Upstream Service

Another common reason for this error is a mismatch between the TLS settings of Envoy and the upstream service. This can happen if TLS is enabled on only one side, or if the two sides do not share any TLS version or cipher suite.

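To verify that the TLS settings are compatible, you can use tools like openssl or nmap to test the TLS handshake from the Envoy host. For example, if your upstream service is using TLS on port 8443, "openssl s_client -connect UPSTREAM_HOST:8443 -servername UPSTREAM_HOST" performs a handshake and prints the negotiated protocol version, cipher, and certificate chain, while "nmap --script ssl-enum-ciphers -p 8443 UPSTREAM_HOST" lists the TLS versions and cipher suites the service accepts (UPSTREAM_HOST being the upstream address).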

If any of these commands fail, it means that there is a problem with the TLS configuration. You should check the following:

  • The Envoy cluster has a transport_socket named envoy.transport_sockets.tls with an UpstreamTlsContext if the upstream service requires TLS, or no TLS transport_socket if the upstream service expects plaintext. (The tls_context cluster field used in older examples belongs to the deprecated v2 API.)
  • The tls_params in the UpstreamTlsContext's common_tls_context (tls_minimum_protocol_version, tls_maximum_protocol_version, and cipher_suites) overlap with the TLS versions and cipher suites the upstream service accepts. transport_socket_matches is only needed when endpoints in the same cluster need different transport sockets, for example some with TLS and some without.
  • The UpstreamTlsContext has sni set to the server name the upstream service expects if it relies on SNI, for example to choose which certificate to present.
  • If Envoy should verify the upstream certificate, the common_tls_context has a validation_context whose trusted_ca points to the CA certificate that issued it and whose match_subject_alt_names (match_typed_subject_alt_names in newer Envoy versions) lists the expected names. Without a validation_context, Envoy does not verify the upstream certificate at all.
  • If the upstream service requires client authentication (mutual TLS), the common_tls_context has a tls_certificates entry whose certificate_chain and private_key point to the client certificate and key, for example through their filename fields.
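To script the handshake check, here is a minimal Python sketch using the standard ssl module; the function name probe_tls is hypothetical. It only approximates what a TLS-enabled cluster does: Envoy uses BoringSSL and its own configured tls_params, so a success here does not guarantee that Envoy's version and cipher settings overlap with the upstream's. It does, however, quickly separate "no TLS on the other side", "certificate rejected", and "handshake works".

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int, server_name: Optional[str] = None,
              cafile: Optional[str] = None, verify: bool = True,
              timeout: float = 5.0) -> dict:
    """Hypothetical probe: attempt a TLS handshake and report the result."""
    # With cafile set, only that CA is trusted (similar in spirit to trusted_ca);
    # otherwise the system CA store is used.
    ctx = ssl.create_default_context(cafile=cafile)
    if not verify:
        # Roughly like a cluster without a validation_context: no certificate checks.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    # Offer h2 and http/1.1 so the result also shows which HTTP version the upstream picks.
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    sni = server_name or host  # plays the role of the cluster's sni setting
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=sni) as tls:
                return {
                    "ok": True,
                    "version": tls.version(),
                    "cipher": tls.cipher()[0],
                    "alpn": tls.selected_alpn_protocol(),
                }
    except ssl.SSLCertVerificationError as exc:
        return {"ok": False, "error": f"certificate verification failed: {exc.verify_message}"}
    except ssl.SSLError as exc:
        # Typical when the port speaks plaintext or no version/cipher is shared.
        return {"ok": False, "error": f"TLS handshake failed: {exc.reason}"}
    except OSError as exc:
        return {"ok": False, "error": f"connection failed: {exc}"}
```

A "TLS handshake failed" result against a port that actually speaks plaintext is the typical signature of TLS being enabled on only one side, while a "certificate verification failed" result points at trusted_ca or the subject alternative names.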

If you can confirm that the TLS settings are compatible, you can move on to the next scenario.

Scenario 3: Timeout or Circuit Breaking Between Envoy and Upstream Service

Another possible reason for this error is that there is a timeout or circuit breaking between Envoy and the upstream service. This could happen if you have configured timeouts or circuit breakers on either side that are too low or too strict for your traffic pattern.

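To check for timeouts or circuit breaking between Envoy and the upstream service, inspect the cluster statistics exposed by the Envoy admin interface. For example, if the admin listener is bound to localhost:9901 (a common choice in examples, but it depends on your bootstrap configuration), "curl -s http://localhost:9901/stats | grep -E 'upstream_|circuit_breakers'" prints the upstream connection, request, and circuit breaker statistics for every cluster.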
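The same check can be scripted. The following minimal Python sketch fetches the plain-text /stats output and keeps only the non-zero values of a few connection, timeout, and circuit breaker counters, grouped by cluster. The helper names fetch_stats and suspicious_counters and the default http://localhost:9901 admin address are assumptions; use whatever address your admin listener is bound to.

```python
import re
import urllib.request

# Cluster counters that usually indicate connect failures, timeouts, or circuit breaking.
SUSPECT_STATS = {
    "upstream_cx_connect_fail",
    "upstream_cx_connect_timeout",
    "upstream_cx_overflow",
    "upstream_rq_pending_overflow",
    "upstream_rq_retry_overflow",
    "upstream_rq_timeout",
    "upstream_rq_per_try_timeout",
}

# Lines look like "cluster.<cluster_name>.<stat_name>: <integer>"; cluster names may contain dots.
STAT_LINE = re.compile(r"^cluster\.(?P<cluster>.+)\.(?P<stat>[a-z_]+): (?P<value>\d+)$")


def fetch_stats(admin_url: str = "http://localhost:9901") -> str:
    """Download the plain-text /stats page (the admin address is an assumption)."""
    with urllib.request.urlopen(f"{admin_url}/stats", timeout=5) as resp:
        return resp.read().decode("utf-8")


def suspicious_counters(stats_text: str) -> dict:
    """Return {cluster: {stat: value}} for every non-zero counter in SUSPECT_STATS."""
    found: dict = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None or match["stat"] not in SUSPECT_STATS:
            continue  # histograms and unrelated statistics
        value = int(match["value"])
        if value:
            found.setdefault(match["cluster"], {})[match["stat"]] = value
    return found
```

For example, print(suspicious_counters(fetch_stats())). Because these are cumulative counters, comparing two snapshots taken a few seconds apart shows which ones are still increasing.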

If any counters that indicate timeouts or circuit breaking are non-zero and still increasing, such as upstream_rq_timeout, upstream_rq_per_try_timeout, upstream_cx_connect_fail, upstream_cx_connect_timeout, upstream_cx_overflow, or upstream_rq_pending_overflow, or if a circuit_breakers.default.cx_open gauge reads 1, the timeout or circuit breaker configuration is a likely culprit. You should check the following:

  • The Envoy cluster has connect_timeout configured with a reasonable value that allows enough time for the connection to be established.
  • The Envoy cluster has per_connection_buffer_limit_bytes configured with a reasonable value that allows enough buffer space for the connection data.
  • The Envoy cluster has max_requests_per_connection configured with a reasonable value that allows enough requests to be sent over the same connection.
  • The Envoy cluster has circuit_breakers configured with reasonable values that allow enough concurrent connections, requests, and retries for your traffic pattern.
  • The Envoy route has timeout configured with a reasonable value (15 seconds if unset) that allows enough time for the request to be processed.
  • The Envoy route has retry_policy configured with reasonable values that allow enough retries for your traffic pattern.
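The last two settings interact: the route timeout bounds the whole request, retries included, so a retry policy whose attempts cannot fit inside it will be cut short. As a rough, hypothetical sanity check, the following Python sketch compares the two. It ignores retry back-off, assumes each attempt may run up to its per_try_timeout, and uses Envoy's documented defaults of a 15-second route timeout and one retry; the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteTimeouts:
    """Hypothetical mirror of a route's timeout settings, in seconds."""
    timeout: float = 15.0                    # route timeout (default 15 s; 0 disables it)
    num_retries: int = 1                     # retry_policy.num_retries (default 1)
    per_try_timeout: Optional[float] = None  # retry_policy.per_try_timeout (unset by default)


def worst_case_attempt_time(cfg: RouteTimeouts) -> Optional[float]:
    """Time if the first try and every retry all hit per_try_timeout (back-off ignored)."""
    if cfg.per_try_timeout is None:
        return None  # each attempt is bounded only by the route timeout
    return (1 + cfg.num_retries) * cfg.per_try_timeout


def retries_fit(cfg: RouteTimeouts) -> bool:
    """True if all attempts could time out individually before the route timeout fires."""
    if cfg.timeout == 0:
        return True  # route timeout disabled, so it never cuts retries short
    worst = worst_case_attempt_time(cfg)
    if worst is None:
        return False  # a single slow attempt can consume the whole route timeout
    return worst <= cfg.timeout
```

For example, num_retries=3 with per_try_timeout=5.0 allows four attempts and needs up to 20 seconds, so retries_fit(RouteTimeouts(timeout=15.0, num_retries=3, per_try_timeout=5.0)) returns False; either the route timeout or per_try_timeout would need adjusting.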

If you can confirm that there is no timeout or circuit breaking between Envoy and the upstream service, you can move on to the next scenario.

Scenario 4: Protocol Mismatch Between Envoy and Upstream Service

Another potential reason for this error is that there is a mismatch between the protocol used by Envoy and the upstream service. This could happen if you have configured Envoy to use HTTP/2 or gRPC but the upstream service only supports HTTP/1.1, or vice versa.

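To verify that the protocol used by Envoy and the upstream service is compatible, you can use tools like curl or grpcurl to test the protocol from the Envoy host. For example, if your upstream service is using HTTP/2 over TLS on port 8443, "curl -v --http2 https://UPSTREAM_HOST:8443/" shows in its verbose output whether the server accepted h2 via ALPN, and "grpcurl UPSTREAM_HOST:8443 list" lists the services of a gRPC server that has server reflection enabled (UPSTREAM_HOST being the upstream address). For a cleartext port, use "curl -v --http2-prior-knowledge" or "grpcurl -plaintext" instead.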

If any of these commands fail, it means that there is a problem with the protocol configuration. You should check the following:

  • The Envoy cluster enables HTTP/2 (with http2_protocol_options, or with explicit_http_config in the envoy.extensions.upstreams.http.v3.HttpProtocolOptions typed_extension_protocol_options in newer Envoy versions) only if the upstream service supports HTTP/2. gRPC upstreams require HTTP/2.
  • The Envoy cluster stays on HTTP/1.1, which Envoy uses by default when no HTTP/2 options are set, if the upstream service supports only HTTP/1.1. http_protocol_options only tunes HTTP/1.1 behavior.
  • The Envoy cluster has protocol_selection set to USE_CONFIGURED_PROTOCOL (the default) or USE_DOWNSTREAM_PROTOCOL, whichever matches how the downstream clients and the upstream service communicate. protocol_selection has no AUTO value; to negotiate HTTP/2 or HTTP/1.1 per connection via ALPN on a TLS upstream, use auto_config in HttpProtocolOptions instead.
  • The Envoy route has upgrade_configs configured if the upstream service supports protocol upgrade, or has no upgrade_configs configured if the upstream service does not support protocol upgrade.
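For a cleartext HTTP/2 (h2c) upstream, such as a gRPC service without TLS, protocol support can be checked directly. The following minimal Python sketch, with the hypothetical function name speaks_h2c, sends the HTTP/2 client connection preface defined in RFC 9113 followed by an empty SETTINGS frame, which is roughly how Envoy opens a plaintext HTTP/2 connection to an upstream, and checks whether the server answers with a SETTINGS frame. It assumes the port does not use TLS; for TLS upstreams, "openssl s_client -alpn h2" shows whether h2 is negotiated.

```python
import socket

# HTTP/2 client connection preface (RFC 9113), then an empty SETTINGS frame:
# 3-byte length 0, type 0x4 (SETTINGS), flags 0, stream identifier 0.
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
EMPTY_SETTINGS = bytes([0, 0, 0, 0x4, 0, 0, 0, 0, 0])
FRAME_HEADER_LEN = 9
SETTINGS_TYPE = 0x4


def speaks_h2c(host: str, port: int, timeout: float = 3.0) -> bool:
    """Hypothetical probe: does a cleartext upstream speak HTTP/2 with prior knowledge?

    An HTTP/2 server must send a SETTINGS frame on stream 0 first. An HTTP/1.1-only
    server typically answers with an HTTP/1.1 error status line or closes the connection.
    Raises OSError if the connection cannot be made or the read times out.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(CLIENT_PREFACE + EMPTY_SETTINGS)
        header = b""
        while len(header) < FRAME_HEADER_LEN:
            chunk = sock.recv(FRAME_HEADER_LEN - len(header))
            if not chunk:
                return False  # peer closed before sending a full frame header
            header += chunk
    frame_type = header[3]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return frame_type == SETTINGS_TYPE and stream_id == 0
```

If speaks_h2c returns False for an upstream that an Envoy cluster reaches with HTTP/2 enabled, that cluster is sending HTTP/2 to an HTTP/1.1-only service, which is the mismatch this scenario describes.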

If you can confirm that the protocol used by Envoy and the upstream service is compatible, you have successfully eliminated some of the common causes of this error.

Conclusion

In this post, we have covered some of the common scenarios and solutions for fixing the "upstream connect error or disconnect/reset before headers" error message in Envoy proxy. We hope this post helps you troubleshoot and resolve this error in your own environment. If you have any questions or feedback, please leave a comment below.