June 19, 2019•232 words
A peer showing no symptoms will be in the UP state this is the peak of health
If a peer shows concerning but tolerable symptoms it will be put in the UNSTABLE state.
In this state the peer is still searched but we emit warnings about our symptoms on the bulletin board.
Preempts all previous states. Currently symptoms that fall into this are:
- Clock skew between search head and peer. We get the peer's time from the timestamp on the Http Response headers during the heartbeat. If this exceeds a configurable in limits.conf we consider clocks to be skewed.
- Over subscribed peers. If an indexer is streaming back search results at a much slower rate than others then it can hold up the completion of the whole search. We currently have logic to detect such slow peers in the search process. Currently we use this logic to kill the peer before we get all the data. (Feature is off by default)
For all other symptoms we move the peer to the DOWN state. In this state the peer is not searched but we still heartbeat to monitor it. Preempts all previous states.
There should never be a situation where this state is reached. However, if this status code shows up in your indexing cluster, welp, there you are.