NOT dest_ip IN ("10.0.0.0/8","172.16.0.0/12","192.168.0.0/16")
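An equivalent way to exclude the RFC 1918 ranges is the eval function cidrmatch (a sketch; it assumes dest_ip is already an extracted field and sits after the base search as a where clause):

| where NOT (cidrmatch("10.0.0.0/8", dest_ip) OR cidrmatch("172.16.0.0/12", dest_ip) OR cidrmatch("192.168.0.0/16", dest_ip))

The IN form above runs as part of the initial search and can use the index, while the where/cidrmatch form runs at search time, so prefer the first when the field is searchable.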
I'm going to say there likely won't be a huge material difference in any of these. Yes, leading wildcards usually matter, because they force reading the lexicon of each tsidx file from beginning to end. But index is a special case, because all active index names are already known, in memory, and the set is around O(1000).

The second one, the one with stats, does not actually need the fields call, because stats knows which fields it needs, and because of "smart mode" Splunk will only extract the needed fields when running a transforming (reporting) command like stats. If you're running in "verbose mode" then the added call to fields may help some, but you should not be performance testing things run in verbose mode anyway!

Which leaves us with: "is dedup more or less efficient than stats?" stats is really powerful from a performance point of view because it does map-reduce very well. Each indexer is able to independently compute its own "local stats" for your function and fields, and then pass back a much smaller result to the search head. The search head can then take the much smaller result set and compute the "global stats" from the local ones. For things like max() this is obvious.

But dedup can do the exact same thing. dedup is also easily map-reduced, because each indexer can compute its own local set of dedup'ed results and pass that back to the search head to run one more dedup cycle over everything. So far there's nothing here that suggests there should be a large performance difference between these two.

There is one semantic difference, though: the stats returns only the three fields, f1, f2, and max(_time). The dedup returns the newest whole raw event for each combination of f1 and f2 (which is why you need the table). From a semantic point of view, if all you needed was the max(_time) for values of f1 and f2, then the stats is more correct.

But I think that if you do a really careful, objective performance test (accounting for other load on the system, kernel data caching, etc.), you'll find the differences are so tiny as to be like 5% either way.
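For concreteness, the two forms being compared look roughly like this (a sketch; the index name is a placeholder and f1/f2 are the field names from the discussion):

index=my_index f1=* f2=* | stats max(_time) AS latest BY f1 f2

index=my_index f1=* f2=* | dedup f1 f2 | table f1 f2 _time

The dedup version keeps the newest whole event per f1/f2 pair because events stream in reverse time order, which is why its output carries every field of that event unless you table it down.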
| rest splunk_server_group=dmc_group_indexer /servicesNS/-/-/data/indexes
| fields splunk_server title repFactor homePath homePath_expanded coldPath coldPath_expanded thawedPath thawedPath_expanded summaryHomePath_expanded tstatsHomePath tstatsHomePath_expanded
| eval Index = title, hot = mvappend(homePath, homePath_expanded), cold = mvappend(coldPath, coldPath_expanded), thawed = mvappend(thawedPath, thawedPath_expanded), summaries = summaryHomePath_expanded, dma = mvappend(tstatsHomePath, tstatsHomePath_expanded)
| stats values(splunk_server) AS "Indexers" values(repFactor) AS "Replication Factor" values(hot) AS "Hot/Warm" values(cold) AS "Cold" values(thawed) AS "Thawed" values(summaries) AS "Summaries" values(dma) AS "Data Model Accelerations" by Index
| rest splunk_server=local /servicesNS/-/-/data/transforms/lookups | fields title eai:appName type filename collection
Courtesy of David Paper
index=_internal earliest=-1h group=search_concurrency host=<search head glob> ("system total") | rex field=_raw mode=sed "s/system total/user=system/g" | eval user=coalesce(user,"system") | timechart max(active_hist_searches) by user
A peer showing no symptoms will be in the UP state. This is the peak of health.
If a peer shows concerning but tolerable symptoms, it will be put in the UNSTABLE state.
In this state the peer is still searched, but we emit warnings about its symptoms on the bulletin board.
This state preempts all previous states. Currently, the symptoms that fall into this category are:
- Clock skew between search head and peer. We get the peer's time from the timestamp on the HTTP response headers during the heartbeat. If the difference exceeds a configurable threshold in limits.conf, we consider the clocks to be skewed.
- Oversubscribed peers. If an indexer is streaming back search results at a much slower rate than the others, it can hold up the completion of the whole search. The search process has logic to detect such slow peers, and can use it to kill the peer before we get all the data. (This feature is off by default.)
For all other symptoms we move the peer to the DOWN state. In this state the peer is not searched, but we still heartbeat to monitor it. This state preempts all previous states.
There should never be a situation where this state is reached. However, if this status code shows up in your indexing cluster, welp, there you are.
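To see which state your peers are in from the search head, the distributed-peers REST endpoint can be queried (a sketch; exact field names may vary by Splunk version):

| rest /services/search/distributed/peers | table title status version

The status column reflects the health state described above as the search head currently sees it.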
index=_internal host=indexer* OR host=cm* ((source=*splunkd.log* my guid) OR (source=*health* due_to_stanza="feature:data_searchable" color=red)) | eval type=case(match(source,"health"),"not searchable",match(source,"splunkd\.log"),"start-up") | timechart span=1m dc(sourcetype) by type
Thanks to JonRust on Slack
_bump for "content files" (css/js/appserver), debug/refresh for "config changes/xml/conf", and a splunkweb restart for persistent handlers. Modular input and custom command .py files are executed fresh on each instantiation, after the initial "pick up new things" splunkd restart. Changes to .conf.spec files require a restart.
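For reference, those two refresh endpoints are reachable in a browser or with curl (host and port are placeholders for your search head's Splunk Web address):

https://<your-splunk-host>:8000/en-US/_bump
https://<your-splunk-host>:8000/en-US/debug/refresh

_bump increments the static-asset cache-buster for css/js/appserver files; debug/refresh asks splunkd to reload registered configuration endpoints without a restart.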
| tstats summariesonly=true allow_old_summaries=true count from datamodel=Authentication where Authentication.action="failure" by _time Authentication.dest span=1s | rename Authentication.* AS * | streamstats time_window=1m sum(count) AS dest_failures by dest