I started working on Jive support a week ago and was told that clients sometimes get 502 (Bad Gateway) errors in their browsers. The same happens with other web calls, such as web services.
So I took a look at the Jive infrastructure and found the reason this is happening. I would like feedback from Jive on whether these configuration changes will be supported, so I can do a final push to all production servers.
The issue turned out to be a connection race condition between httpd and Tomcat. The proxy (httpd's mod_proxy) checks the state of a pooled connection to the backend (Tomcat) and then sends the actual request. Sometimes the connection expires on the Tomcat side between those two steps, so the proxy sees a healthy connection when it checks the backend, but the connection is already closed by the time the actual request is sent.
I've worked around this problem by changing a couple of settings in the httpd config, although there are usually other ways to fix this.
The solution I've implemented in httpd's config disables HTTP 1.1 and keepalive on the proxied connection and unsets an HTTP 1.1-specific header (Expect), since Tomcat is trying to force this version of the protocol, as per its config:
SetEnv force-proxy-request-1.0 1
SetEnv proxy-nokeepalive 1
RequestHeader unset Expect early
I can't see any performance impact on the production node where this is implemented, but I would have to run a load generator to actually measure whether there is any. Most of the bandwidth optimizations offered by HTTP 1.1 are irrelevant here, since this connection is only used on the local machine, through the loopback interface.
I also tried another solution, which would keep the HTTP 1.1 features between Tomcat and httpd, but unfortunately it wasn't 100% successful, probably because httpd is using the prefork MPM, which doesn't seem to honor the TTL on the server processes.
In theory, if the TTL on the httpd side is just a few seconds lower than Tomcat's connection/keepalive timeout, Tomcat should never close a connection between mod_proxy's connection check and the request being sent.
For this test I looked up the TTL configured in httpd's config files (it is set to 120 seconds) and increased the timeout on the Tomcat server from 20 seconds to 130 seconds in the server.xml file.
The result was only a reduction in 502 errors by a factor of about 6.5, which is the ratio of the timeout increase (130 s / 20 s). This is because the httpd processes aren't being recycled after the 120 seconds.
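For reference, here is a minimal sketch of what the httpd side of that TTL pairing could look like. It assumes the TTL in question is the ttl parameter of ProxyPass and that Tomcat listens on a local HTTP connector. The backend URL and port are hypothetical placeholders, not values from our setup.

```apache
# Hypothetical backend address; substitute the real local Tomcat connector.
# ttl=120: mod_proxy should stop reusing a pooled backend connection that has
# been idle for more than 120 seconds. This only helps if Tomcat keeps idle
# connections open longer than that (130 seconds in the test described above).
ProxyPass / http://127.0.0.1:8080/ ttl=120
ProxyPassReverse / http://127.0.0.1:8080/
```

The idea is simply that the proxy gives up on an idle connection before the backend does, so the backend never closes a connection the proxy still considers usable.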
Another option that would likely fix this is setting the proxy-initial-not-pooled environment variable in the httpd config file, but per Apache's documentation, this hurts performance somewhat.
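I have not tested this one on our nodes, so the following is only a sketch of how it would be set, in the same style as the settings above:

```apache
# Untested here: tells mod_proxy not to reuse a pooled backend connection for
# the initial request on a client connection.
SetEnv proxy-initial-not-pooled 1
```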
I'm sure everyone is getting a few 502s every once in a while in their environments or load tests, and would like, as we would, to see this issue resolved in a way that is supported or shipped by default by Jive.
I've been asked to get Jive's formal opinion on this, so could you please advise whether I can push this to all nodes without running into future issues with Jive support for using an unsupported config?