We had an issue that took many hours to diagnose. I'm hoping this will help others, so I'll keep it simple.
If you are using Kong and ECS Service Connect and see strange issues with request headers not being what you expected, read on.
TL;DR
If you are using Service Connect in ECS, an ALB that routes traffic to your service WILL USE SERVICE CONNECT to deliver that traffic to your service (there is no mention of this in the ALB, and the ALB is not registered in any Service Connect namespace). You need to handle traffic that now arrives from localhost and not from the ALB as you'd expect.
Background (details have been simplified)
- ECS Service running on port 8000 (a reverse proxy): Kong API Gateway
- An ALB routing traffic to a Ruby on Rails app on port 8000
- Kong "trusts" requests from its own VPC (containing ECS and ALB), i.e. trusted_ips=10.0.0.0/16,<IPs of trusted infrastructure, Cloudflare et al>
- Kong proxies requests to Rails via an internal ALB
- Kong "sees" requests coming from a private VPC IP address of the load balancer (e.g. 10.0.54.147)
All works fine.
Then you want to use Service Connect...
If you enable Service Connect so that Kong sends traffic to Rails (behind the scenes, traffic is routed into Kong's Envoy sidecar container and comes out of Rails's Envoy sidecar container), everything works and the Rails ALB is no longer required. Awesome.
BUT...
We began to see ActionController::InvalidAuthenticityToken errors from the Rails app, because Kong was no longer sending the correct X-Forwarded-* headers (X-Forwarded-Proto was incorrect in our case).
We trust the CIDR of our ALB (and the rest of the VPC), which allows Kong to correctly set all the required X-Forwarded-* headers, but this RELIES on traffic coming from this trusted CIDR.
https://docs.konghq.com/gateway/latest/how-kong-works/routing-traffic/#proxying-and-upstream-timeouts
"In the case where $realip_remote_addr is one of the trusted addresses, the request header with the same name gets forwarded if provided. Otherwise, the value of the $scheme variable provided by ngx_http_core_module will be used."
Suddenly incorrect header values were reaching Rails and causing the above error.
Kong also started seeing requests from the ALB as coming from 127.0.0.1 (localhost). The only way requests can be coming OUT of the Envoy sidecar at Kong is if the ALB traffic is going through Envoy too!? That isn't documented.
Because we only trusted the VPC CIDR, 127.0.0.1 was not trusted and requests began to fail in very unexpected ways. As soon as we started trusting 127.0.0.1, everything was good again.
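To make the failure easier to picture, here is a minimal Python sketch of the rule quoted from the Kong docs. It is a simplified model, not Kong's actual code: the function names is_trusted and forwarded_proto are made up for illustration, and it assumes TLS is terminated before Kong, so the incoming header says "https" while Kong's own $scheme is "http".

```python
import ipaddress


def is_trusted(remote_addr, trusted_ips):
    """Return True if remote_addr falls inside any trusted IP or CIDR block."""
    addr = ipaddress.ip_address(remote_addr)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in trusted_ips
    )


def forwarded_proto(remote_addr, trusted_ips, incoming_header, scheme):
    """Simplified model of the quoted rule (hypothetical helper, not Kong code).

    Trusted sender with a header present: forward the header unchanged.
    Otherwise: fall back to the scheme Kong itself received the request on.
    """
    if incoming_header and is_trusted(remote_addr, trusted_ips):
        return incoming_header
    return scheme


vpc_only = ["10.0.0.0/16"]
vpc_and_localhost = ["10.0.0.0/16", "127.0.0.1"]

# Before Service Connect: the request arrives from the ALB's private VPC IP.
before = forwarded_proto("10.0.54.147", vpc_only, "https", "http")

# With Service Connect: the same request now appears to come from localhost.
broken = forwarded_proto("127.0.0.1", vpc_only, "https", "http")

# After adding 127.0.0.1 to the trusted IPs.
fixed = forwarded_proto("127.0.0.1", vpc_and_localhost, "https", "http")

print(before, broken, fixed)
```

In this model the middle case is the one that bit us: the header is overwritten with Kong's own scheme, so Rails receives a different X-Forwarded-Proto than the client actually used.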
Clues
After adding 127.0.0.1 to the trusted IPs, I wanted to find out WHY this fixed it.
This comment from AWS gave a huge clue...
"you are able to have the ALB traffic 'bypass' the Service connect agent by setting a different port for the Service Connect Agent (ingressPortOverride defined here) which will allow non-Service Connect traffic (like Load balancer traffic) to not be intercepted by the Service Connect Agent.." https://github.com/aws/containers-roadmap/issues/1958#issuecomment-1654650077
Service Connect "intercepts" traffic, even if the sender (in our case an ALB) is not configured in any way to use Service Connect (because this just isn't an option in the AWS console).
ingressPortOverride is really well documented here, with no mention of ALBs using the Envoy tunnels: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ServiceConnectService.html
"The port number for the Service Connect proxy to listen on.
Use the value of this field to bypass the proxy for traffic on the port number specified in the named portMapping in the task definition of this application, and then use it in your VPC security groups to allow traffic into the proxy for this Amazon ECS service."
No mention of ALBs inviting themselves to the Service Connect party.
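For anyone who wants to try the bypass route from the AWS comment instead of trusting 127.0.0.1, here is a rough Python sketch of what the Service Connect configuration might look like with ingressPortOverride set. The field names follow the ECS API reference linked above; the helper function, the namespace "internal", the port name "kong-proxy", the DNS name "kong.internal" and the override port 15000 are all hypothetical placeholders. We fixed our setup by trusting 127.0.0.1, so treat this as an untested starting point.

```python
def service_connect_config(namespace, port_name, dns_name, client_port,
                           ingress_port_override=None):
    """Build a serviceConnectConfiguration dict (hypothetical helper).

    If ingress_port_override is given, the Service Connect proxy listens on
    that port instead of the container port from the named portMapping, so
    traffic sent straight to the container port (e.g. from an ALB) should no
    longer pass through the proxy.
    """
    service = {
        "portName": port_name,
        "clientAliases": [{"port": client_port, "dnsName": dns_name}],
    }
    if ingress_port_override is not None:
        service["ingressPortOverride"] = ingress_port_override
    return {"enabled": True, "namespace": namespace, "services": [service]}


# All values below are placeholders, not taken from our real setup.
config = service_connect_config(
    namespace="internal",
    port_name="kong-proxy",
    dns_name="kong.internal",
    client_port=8000,
    ingress_port_override=15000,
)
print(config)
```

A dict shaped like this can be passed as the serviceConnectConfiguration argument when creating or updating the ECS service. As the quoted docs say, the override port also has to be allowed in the VPC security groups so that Service Connect traffic can still reach the proxy.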