@automine

Search to see number of concurrent searches

Courtesy of David Paper

index=_internal earliest=-1h group=search_concurrency host=<search head glob> ("system total")
| rex field=_raw mode=sed "s/system total/user=system/g"
| eval user=coalesce(user,"system")
| timechart max(active_hist_searches) by user

Splunk clustering status


A peer showing no symptoms will be in the UP state. This is the peak of health.


If a peer shows concerning but tolerable symptoms, it will be put in the UNSTABLE state.
In this state the peer is still searched, but we emit warnings about its symptoms on the bulletin board.
This state preempts all previous states. Symptoms that currently fall into this category are:

  • Clock skew between search head and peer. We get the peer's time from the timestamp on the HTTP response headers during the heartbeat. If the difference exceeds a configurable threshold in limits.conf, we consider the clocks to be skewed.
  • Oversubscribed peers. If an indexer is streaming back search results at a much slower rate than the others, it can hold up the completion of the whole search. We currently have logic in the search process to detect such slow peers, and we use it to kill the peer before we get all the data. (This feature is off by default.)
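The clock skew check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Splunk's implementation: the function names and the 60 second threshold are made up for the example, and the real threshold is whatever is configured in limits.conf.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical threshold for illustration only; the real value comes from limits.conf.
MAX_SKEW_SECONDS = 60.0


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Absolute difference between the peer's HTTP Date header and local time."""
    peer_time = parsedate_to_datetime(date_header)  # timezone-aware for "GMT" dates
    return abs((local_now - peer_time).total_seconds())


def is_skewed(date_header: str, local_now: datetime,
              max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """True when the skew exceeds the (assumed) configured threshold."""
    return clock_skew_seconds(date_header, local_now) > max_skew


if __name__ == "__main__":
    header = "Tue, 15 Nov 1994 08:12:31 GMT"
    now = datetime(1994, 11, 15, 8, 15, 31, tzinfo=timezone.utc)
    print(clock_skew_seconds(header, now), is_skewed(header, now))
```

Pass a timezone-aware `local_now` (for example `datetime.now(timezone.utc)`), otherwise the subtraction raises a `TypeError`.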


For all other symptoms we move the peer to the DOWN state. In this state the peer is not searched, but we still heartbeat to monitor it. This state preempts all previous states.
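Put together, the three states read like a small decision function. Here is a minimal Python sketch of that reading. The names `PeerState`, `classify_peer`, and `is_searched`, and the boolean inputs, are hypothetical stand-ins for whatever the search head tracks internally.

```python
from enum import Enum


class PeerState(Enum):
    UP = "UP"
    UNSTABLE = "UNSTABLE"
    DOWN = "DOWN"


def classify_peer(clock_skewed: bool = False,
                  slow_peer: bool = False,
                  other_symptoms: bool = False) -> PeerState:
    """Map observed symptoms to a state; later states preempt earlier ones."""
    if other_symptoms:
        # Anything outside the tolerable set takes the peer out of search.
        return PeerState.DOWN
    if clock_skewed or slow_peer:
        # Concerning but tolerable: still searched, with warnings.
        return PeerState.UNSTABLE
    return PeerState.UP


def is_searched(state: PeerState) -> bool:
    """UP and UNSTABLE peers are searched; DOWN peers are only heartbeated."""
    return state is not PeerState.DOWN


if __name__ == "__main__":
    print(classify_peer())                   # no symptoms
    print(classify_peer(clock_skewed=True))  # tolerable symptom
    print(classify_peer(slow_peer=True, other_symptoms=True))
```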


There should never be a situation where this state is reached. However, if this status code shows up in your indexing cluster, welp, there you are.

Data Durability Status and History

index=_internal host=indexer* OR host=cm* ((source=*splunkd.log* my guid) OR (source=*health* due_to_stanza="feature:data_searchable" color=red))
| eval type=case(match(source,"health"),"not searchable",match(source,"splunkd\.log"),"start-up")
| timechart span=1m dc(sourcetype) by type

Thanks to JonRust on Slack

Splunk dev with bump, refresh, restarts

Use _bump for “content files” (css/js/appserver), debug/refresh for “config changes/xml/conf”, and “splunkweb restart” for persistent handlers. Mod input and custom command py files are executed fresh on each instantiation after the initial “pick up new things splunkd restart”. Changes to conf.spec require a restart.

Thanks, alacercogitatus

Rolling authentication failures by device over 1 minute windows

| tstats summariesonly=true allow_old_summaries=true count from datamodel=Authentication where Authentication.action="failure" by _time Authentication.dest span=1s
| rename Authentication.* AS * 
| streamstats time_window=1m sum(count) AS dest_failures by dest
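For anyone who wants to sanity-check what the streamstats step is doing, here is a rough Python equivalent of a per-dest rolling sum over a one minute window. It is a sketch, not a reimplementation of streamstats: the function name `rolling_failures` is made up, rows are assumed to arrive in ascending time order, and the window is assumed to be the half-open interval (t - 60, t], which may not match Splunk's exact boundary handling.

```python
from collections import defaultdict, deque


def rolling_failures(rows, window_seconds=60):
    """Yield (time, dest, count, dest_failures) for rows sorted by ascending time.

    rows: iterable of (epoch_seconds, dest, count), like the per-second
    tstats output. dest_failures is the sum of count for the same dest
    within the trailing window.
    """
    windows = defaultdict(deque)  # dest -> (time, count) pairs still in the window
    totals = defaultdict(int)     # dest -> running sum over the window
    for t, dest, count in rows:
        win = windows[dest]
        win.append((t, count))
        totals[dest] += count
        # Evict entries at or before t - window_seconds (assumed boundary).
        while win and win[0][0] <= t - window_seconds:
            _, old_count = win.popleft()
            totals[dest] -= old_count
        yield t, dest, count, totals[dest]


if __name__ == "__main__":
    sample = [(0, "a", 2), (30, "a", 3), (30, "b", 1), (59, "a", 1), (61, "a", 4)]
    for row in rolling_failures(sample):
        print(row)
```

Each dest keeps its own deque, so the cost stays proportional to the number of per-second rows rather than to the window size.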