| rest splunk_server_group=dmc_group_indexer /servicesNS/-/-/data/indexes
| fields splunk_server title repFactor homePath homePath_expanded coldPath coldPath_expanded thawedPath thawedPath_expanded summaryHomePath_expanded tstatsHomePath tstatsHomePath_expanded
| eval Index = title, hot = mvappend(homePath, homePath_expanded), cold = mvappend(coldPath, coldPath_expanded), thawed = mvappend(thawedPath, thawedPath_expanded), summaries = summaryHomePath_expanded, dma = mvappend(tstatsHomePath, tstatsHomePath_expanded)
| stats values(splunk_server) AS "Indexers" values(repFactor) AS "Replication Factor" values(hot) AS "Hot/Warm" values(cold) AS "Cold" values(thawed) AS "Thawed" values(summaries) AS "Summaries" values(dma) AS "Data Model Accelerations" by Index
| rest splunk_server=local /servicesNS/-/-/data/transforms/lookups | fields title eai:appName type filename collection
Courtesy of David Paper
index=_internal earliest=-1h group=search_concurrency host=<search head glob> ("system total") | rex field=_raw mode=sed "s/system total/user=system/g" |eval user=coalesce(user,"system") | timechart max(active_hist_searches) by user
A peer showing no symptoms will be in the UP state; this is the peak of health.
If a peer shows concerning but tolerable symptoms, it will be put in the UNSTABLE state.
In this state the peer is still searched, but we emit warnings about its symptoms on the bulletin board.
This state preempts all previous states. Symptoms that currently fall into this category are:
- Clock skew between the search head and the peer. We get the peer's time from the timestamp on the HTTP response headers during the heartbeat. If the difference exceeds a configurable threshold in limits.conf, we consider the clocks to be skewed.
- Oversubscribed peers. If an indexer is streaming back search results at a much slower rate than the others, it can hold up the completion of the whole search. We have logic in the search process to detect such slow peers, and we currently use it to kill the peer before all of its data has been received. (This feature is off by default.)
For all other symptoms we move the peer to the DOWN state. In this state the peer is not searched, but we still heartbeat it to monitor it. This state preempts all previous states.
There should never be a situation where this state is reached. However, if this status code shows up in your indexing cluster, welp, there you are.
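To see each peer's current state from the search head, a minimal sketch (the /services/search/distributed/peers endpoint and its status field are assumptions based on common usage; verify against your version's REST API reference):

| rest splunk_server=local /services/search/distributed/peers | fields title status disabled | rename title AS peer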
index=_internal host=indexer* OR host=cm* ((source=*splunkd.log* <my guid>) OR (source=*health* due_to_stanza="feature:data_searchable" color=red)) | eval type=case(match(source,"health"),"not searchable",match(source,"splunkd\.log"),"start-up") | timechart span=1m dc(sourcetype) by type
Thanks to JonRust on Slack
Use _bump for "content files" (CSS/JS/appserver), debug/refresh for "config changes" (XML/.conf), and a splunkweb restart for persistent handlers. Modular input and custom command .py files are executed fresh on each invocation after the initial "pick up new things" splunkd restart. Changes to .conf.spec files require a restart.
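For reference, the two reload endpoints above live on Splunk Web (default port 8000 and en-US locale assumed; your port and locale path may differ, and recent versions require a logged-in session):

http://<splunkweb host>:8000/en-US/_bump           (bump the version string on static content)
http://<splunkweb host>:8000/en-US/debug/refresh   (reload most conf/XML changes without a restart)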
|tstats summariesonly=true allow_old_summaries=true count from datamodel=Authentication where Authentication.action="failure" by _time Authentication.dest span=1s | rename Authentication.* AS * | streamstats time_window=1m sum(count) AS dest_failures by dest
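To turn the rolling failure count above into a simple brute-force detection, one hedged follow-on for the same pipeline (the threshold of 20 failures per minute is an arbitrary assumption; tune it for your environment):

| where dest_failures >= 20 | stats max(dest_failures) AS peak_failures_per_min by dest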