nginx upstream errors - code-craft - SegmentFault 思否
Errors
upstream server temporarily disabled while connecting to upstream
no live upstreams while connecting to upstream
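max_fails and fail_timeout
max_fails defaults to 1, and fail_timeout defaults to 10 seconds.
With max_fails (the maximum number of failed attempts) and fail_timeout, nginx controls how many failed attempts a server may have and how long it then stays disabled. fail_timeout serves two purposes: failed attempts are counted within this window, and once their number reaches max_fails the server is marked unavailable for the same period. While a server is unavailable, nginx does not connect to it. The server is marked available again and probed anew when fail_timeout has elapsed, or earlier if all servers have become unavailable.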
upstream backend {
    server backend1.example.com weight=5;
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server unix:/tmp/backend3;
    server backup1.example.com backup;
}
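To make the interplay of max_fails and fail_timeout concrete, here is a minimal Python sketch of the per-server bookkeeping. It is a simplified model under our own assumptions, not nginx's source: the Peer class and its methods are hypothetical, the failure counter is assumed to reset on a successful response, and the penalty window is measured from the most recent failure.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical, simplified failure bookkeeping for one upstream server."""
    name: str
    max_fails: int = 1          # nginx default
    fail_timeout: float = 10.0  # nginx default, in seconds
    fails: int = 0
    checked: float = 0.0        # time of the most recent failure (assumption)

    def available(self, now: float) -> bool:
        # max_fails=0 disables the accounting of failed attempts
        if self.max_fails == 0 or self.fails < self.max_fails:
            return True
        # still inside the fail_timeout window -> the server is skipped
        return now - self.checked > self.fail_timeout

    def record_failure(self, now: float) -> bool:
        """Count one failed attempt; return True if the server is now disabled."""
        self.fails += 1
        self.checked = now
        return self.max_fails > 0 and self.fails >= self.max_fails

    def record_success(self) -> None:
        # assumption of this sketch: a good response clears the counter
        self.fails = 0


if __name__ == "__main__":
    peer = Peer("127.0.0.1:8080", max_fails=3, fail_timeout=30)
    for t in (0, 1, 2):
        disabled = peer.record_failure(now=t)
    print(disabled, peer.available(now=10), peer.available(now=40))
```

With max_fails=3 fail_timeout=30s, as on the 127.0.0.1:8080 line above, the third failure disables the server, and in this model it becomes eligible again once more than 30 seconds have passed since that failure.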
What counts as a failure
For example:
connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: , request: "POST /demo HTTP/1.1", subrequest: "/capture/getstatus", upstream: "http://192.168.99.100:8080/api/demo/
Or:
upstream timed out (110: Connection timed out) while reading response header from upstream
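By default, nginx decides that a server has failed based on connection errors (such as connection refused) and timeouts, not on HTTP error statuses: a server that returns any HTTP status is still reachable, so nginx considers it alive. The exception is when the proxy_next_upstream directive is configured to pass requests that end in 404, 500, 502, 503, 504, timeout, and similar errors on to the next server. During this next-upstream process, fails is incremented; if the next server also fails, the error is returned to the client directly. A 404 is never counted toward fails, and an HTTP status that is not listed in proxy_next_upstream is not counted either.
In short, the failures nginx counts are connection errors (such as connection refused), timeouts, and the HTTP statuses 500, 502, 503, and 504. Connection errors and timeouts are always counted, while 500, 502, 503, and 504 are counted only when they are listed in proxy_next_upstream. (The nginx documentation additionally lists invalid_header as always counted and 403, like 404, as never counted.) When fails reaches max_fails, the server is marked unavailable.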
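The counting rules above can be condensed into a small Python predicate. This is an illustrative sketch: counts_as_failure and the outcome labels are hypothetical, and the sets encode the rules described in this section together with what the nginx proxy_next_upstream documentation states (invalid_header always counted, 403 and 404 never counted, http_429 opt-in on nginx 1.11.13 and later).

```python
# Conditions the nginx docs describe as always counted toward max_fails.
ALWAYS_COUNTED = frozenset({"error", "timeout", "invalid_header"})
# Statuses counted only when listed in proxy_next_upstream (429 needs 1.11.13+).
OPT_IN_STATUSES = frozenset({500, 502, 503, 504, 429})
# Statuses that are never counted, even if listed.
NEVER_COUNTED = frozenset({403, 404})
# Default value of proxy_next_upstream.
DEFAULT_NEXT_UPSTREAM = frozenset({"error", "timeout"})


def counts_as_failure(outcome, next_upstream=DEFAULT_NEXT_UPSTREAM):
    """Return True if `outcome` would increment a server's fail counter.

    outcome: "error", "timeout", "invalid_header", or an int HTTP status.
    next_upstream: proxy_next_upstream parameters, e.g. {"error", "http_502"}.
    """
    if outcome in ALWAYS_COUNTED:
        return True
    if isinstance(outcome, int) and outcome not in NEVER_COUNTED:
        # an HTTP error status only counts if proxy_next_upstream lists it
        return outcome in OPT_IN_STATUSES and f"http_{outcome}" in next_upstream
    return False


if __name__ == "__main__":
    print(counts_as_failure(502))
    print(counts_as_failure(502, {"error", "timeout", "http_502"}))
    print(counts_as_failure(404, {"error", "timeout", "http_404"}))
```

The first call returns False because the default proxy_next_upstream does not list http_502; listing it makes a 502 count, while a 404 never counts.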
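Probing mechanism
If every server, including the backup servers, has been marked unavailable, nginx marks all of them available again and probes for a working one. If it finds one, it returns that server's response; if they all still fail, it keeps probing. As long as no server responds correctly, the default status returned is 502, but the next request probes again, and so on until a working server is found.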
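The fallback order described above (primary servers first, backup servers only when no primary succeeds, 502 when nothing is left) can be sketched in Python as follows. This is a simplified model, assuming max_fails=1 so that a single failure disables a server; proxy_request and its parameters are hypothetical, and the sketch leaves out time-based recovery and the reset step.

```python
def proxy_request(primaries, backups, reachable, down):
    """Simulate one proxied request under a simplified max_fails=1 model.

    primaries, backups: ordered lists of server names.
    reachable: servers that would answer successfully.
    down: servers currently marked unavailable; updated in place.
    Returns (status, tried): 200 on success, 502 when every candidate failed.
    """
    tried = []
    # backup servers are only reached if no primary server succeeded
    for group in (primaries, backups):
        for server in group:
            if server in down:
                continue  # skipped while marked unavailable
            tried.append(server)
            if server in reachable:
                return 200, tried
            down.add(server)  # one failure disables it when max_fails=1
    return 502, tried


if __name__ == "__main__":
    down = set()
    print(proxy_request(["a", "b"], ["backup1"], {"backup1"}, down))
    print(proxy_request(["a", "b"], ["backup1"], {"backup1"}, down))
```

In the first call both primaries fail and the backup answers; in the second, the disabled primaries are skipped and the request goes straight to the backup.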
Experiment log
upstream test_server {
    server 192.168.99.100:80801;
    server 192.168.99.100:80802;
    server 192.168.99.100:80803;
}
## for capture
location /api/test/demo {
    proxy_pass http://test_server/api/demo;
}
location /api/demo {
    default_type application/json;
    content_by_lua_file conf/lua/demo.lua;
}
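Lua script (conf/lua/demo.lua):
local cjson = require "cjson.safe"

-- POST subrequest to /api/test/demo, which is proxied to the test_server upstream
local testres = ngx.location.capture("/api/test/demo", {
    method = ngx.HTTP_POST,
    body = "arg1=xxxx&arg2=xxxxx"
})
-- log the subrequest status (502 when every upstream server failed)
ngx.log(ngx.ERR, "status" .. testres.status)
-- cjson.safe returns nil instead of raising an error on invalid JSON
local testbody = cjson.decode(testres.body)
ngx.log(ngx.ERR, testbody == nil)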
Requesting 192.168.99.100:8080/api/demo runs the Lua script, which issues a capture subrequest to /api/test/demo.
First request:
2017/02/09 14:48:57 [error] 5#5: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80801/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [warn] 5#5: *1 upstream server temporarily disabled while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80801/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [error] 5#5: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80802/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [warn] 5#5: *1 upstream server temporarily disabled while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80802/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [error] 5#5: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80803/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [warn] 5#5: *1 upstream server temporarily disabled while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80803/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [error] 5#5: *1 [lua] demo.lua:44: status502 while sending to client, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", host: "192.168.99.100:8080"
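nginx tries the upstream servers one by one; when all of them fail, the capture subrequest gets a 502. The status code returned to the client depends on the Lua script.
Second request: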
2017/02/09 15:09:34 [error] 6#6: *11 no live upstreams while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://test_server/api/demo", host: "192.168.99.100:8080"
When every server in the upstream is down (marked unavailable), nginx logs no live upstreams while connecting to upstream.
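The experiment can be replayed with a small Python model: three servers that all refuse connections, with the default max_fails=1 and fail_timeout=10s. The first request tries every server and disables each one; a request that arrives while all of them are still disabled gets "no live upstreams" without any connection attempt. run_request, the timestamps, and the shortened log messages are hypothetical simplifications, not real nginx output.

```python
FAIL_TIMEOUT = 10.0  # nginx default fail_timeout in seconds; max_fails=1 (default)

SERVERS = ["192.168.99.100:80801", "192.168.99.100:80802", "192.168.99.100:80803"]


def run_request(servers, disabled_at, now, log):
    """Simulate one request against servers that all refuse connections.

    disabled_at maps server -> time it was disabled and is updated in place;
    log collects shortened messages modelled on the nginx error log.
    Always returns 502 because no server in this experiment ever answers.
    """
    # a server is eligible if it was never disabled or its fail_timeout expired
    live = [s for s in servers
            if s not in disabled_at or now - disabled_at[s] > FAIL_TIMEOUT]
    if not live:
        log.append("no live upstreams while connecting to upstream")
        return 502
    for server in live:
        # each refused connection is a failure; with max_fails=1 it disables the server
        log.append(f"connect() failed (111: Connection refused): {server}")
        log.append(f"upstream server temporarily disabled: {server}")
        disabled_at[server] = now
    return 502


if __name__ == "__main__":
    disabled, log = {}, []
    print(run_request(SERVERS, disabled, 0.0, log), len(log))   # first request
    print(run_request(SERVERS, disabled, 1.0, log), log[-1])    # all still disabled
    print(run_request(SERVERS, disabled, 12.0, log), len(log))  # after fail_timeout
```

The model reproduces the two log patterns above: a connect failure plus a "temporarily disabled" warning per server on the first request, then a single "no live upstreams" line while every server is still inside its fail_timeout window.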