I explain Splunk to my team using a layered approach.
- 1. Full-Text Search… of the world
Start with a full-text search engine, but put _everything_ we have into it:
- logs (host and application)
- configuration files (change management)
- stats (system performance, monitoring)
- alerts (snmp, emails, anything automated)
- reports
- emails
- bugs, tickets, issue and tracker systems
- documentation, wikis, and maybe even source code
This is immediately useful in small ways, and is already common practice at large companies, at least for documentation, wikis, and tickets: when working a problem ticket or fixing a bug, you can search for every detail related to a host.
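Getting data in at this layer is mostly a matter of pointing Splunk at files and directories. A minimal sketch of an inputs.conf, where the paths, index names, and sourcetypes are made-up examples:

```
# inputs.conf -- hypothetical paths and index names, for illustration only
[monitor:///var/log]
index = os_logs
sourcetype = syslog

[monitor:///opt/myapp/logs/*.log]
index = app_logs

# configuration files too, so changes show up as searchable events
[monitor:///etc]
index = config_tracking
```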
- 2. Records
Give that full-text search engine knowledge of the 'record' structure within all those files. When you search, it can then return just the relevant records from a file, but it can also display records in context, or the entire file if desired.
This also lets you restrict search by records, or by transactions made of groups of records, making more powerful searches possible.
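As a sketch of what that record awareness looks like in practice, props.conf is where you tell Splunk where one record ends and the next begins, so that, say, a multi-line stack trace stays a single event; the sourcetype name and regex here are hypothetical:

```
# props.conf -- event (record) breaking for a hypothetical application log
[myapp_log]
SHOULD_LINEMERGE = true
# a new record starts at a leading ISO date, so stack traces stay attached
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}
```

At search time, the `transaction` command can then group related records (for example, everything sharing a session id within a 30-minute window) into a single searchable unit.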
- 3. Fields
Make it cognizant of the ‘fields’ present in those records. This is obviously useful for narrowing queries, as well as for sorting and displaying only the relevant information. But these fields are flexible, and resolved in a lazy fashion: they can differ per file, per record, and per search. Furthermore, they don’t require the heavy project overhead of schema design and data-mining planning. The lazy resolution of fields becomes a powerful tool when combined with step #4.
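For example, a field can be pulled out of the raw record with an inline regex at search time, with no schema declared anywhere up front; the sourcetype and field names below are invented for the illustration:

```
sourcetype=myapp_log ERROR
| rex field=_raw "took (?<elapsed_ms>\d+) ms"
| where tonumber(elapsed_ms) > 500
| table _time host elapsed_ms
```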
- 4. Powerful Expression language
Build on the structure of records and fields, and on the lazy resolution of that structure, to allow all sorts of complicated processing and data manipulation: splitting, mutating, and joining datasets. Not just searching through records, but data munging that generates new data, on which you can then perform further searches.
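A small sketch of that kind of pipeline, assuming a web access sourcetype and made-up field names: derive a new field with `eval`, aggregate with `stats`, and sort the result.

```
sourcetype=access_combined
| eval latency_class=if(response_ms > 500, "slow", "fast")
| stats count AS requests, avg(response_ms) AS avg_ms BY host, latency_class
| sort - avg_ms
```

Generated results can also be written back, for instance with `collect` (summary indexing) or `outputlookup`, and then searched again like any other data.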
- 5. Powerful User Interface
- a CLI with autocompletion
- intuitive record/field browsing
- automatically populated one-click drilldowns
- graphing/visualisation
- default dashboards automatically populated with commonly used searches and keywords
- custom dashboards (see the sketch after this list)
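The dashboards themselves are essentially saved searches plus layout. A minimal Simple XML sketch, where the panel title and query are hypothetical and the exact schema varies a little between Splunk versions:

```
<dashboard>
  <label>Application overview (example)</label>
  <row>
    <panel>
      <title>Errors by host, last 24 hours</title>
      <chart>
        <search>
          <query>sourcetype=myapp_log ERROR | timechart count BY host</query>
          <earliest>-24h</earliest>
          <latest>now</latest>
        </search>
      </chart>
    </panel>
  </row>
</dashboard>
```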
- 6. Zeroconf
Splunk detects most of the details and configuration itself, with heavy heuristics that do the right thing most of the time, but which can be overridden in the cases where they don’t, or just for further control.
The only place you need some forethought is capacity planning.
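When the heuristics guess wrong, the overrides live in the same small config files. For instance (hypothetical sourcetype and timestamp format), pinning a sourcetype and its timestamp parsing rather than letting Splunk autodetect them:

```
# inputs.conf -- stop relying on sourcetype autodetection for this input
[monitor:///opt/myapp/logs/*.log]
sourcetype = myapp_log

# props.conf -- override the guessed timestamp handling
[myapp_log]
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
```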
- 7. Scalability
And finally, wrap this all up in a componentized architecture so that it scales well and you can scale just the components you need to, whether for capacity or for performance.
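In deployment terms that usually means lightweight forwarders on the hosts, a pool of indexers, and one or more search heads, each scaled independently. A sketch of the forwarder side, with placeholder hostnames:

```
# outputs.conf on a universal forwarder -- send events to an indexer pool
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997
```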
Eventually we found that a lot of the data we had been generating and feeding into Splunk could be generated more conveniently by Splunk itself, which also let us replace a lot of homebrew code with more robust, flexible, and easier-to-maintain Splunk applications.
I'm the dev lead on the EM4J project that was spoken about at VMWorld. You're right to surmise that the JVM has a problem with regular ballooning. At its simplest, the problem is that the JVM doesn't give memory back to the operating system and thus always ends up consuming its high-watermark memory, even if there's plenty of free space in the heap. Since the balloon driver can only reclaim memory from the OS, these two models are basically incompatible. The result is that it's fiendishly difficult to predict when regular ballooning is safe, and the consequences of getting it wrong are severe (due to GC characteristics, as you state). Hence the current best practice of using memory reservations.
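To make the high-watermark point concrete, here's a small standalone illustration (not EM4J code, just the standard JMX memory bean): after a burst of allocation and a GC, used heap drops, but committed heap, which is what the OS and therefore the balloon driver sees, typically stays near its peak with default collector settings.

```
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapWatermark {
    public static void main(String[] args) {
        // Push the heap up to a high watermark with short-lived garbage.
        for (int i = 0; i < 1000; i++) {
            byte[] chunk = new byte[1024 * 1024]; // 1 MB, dead immediately
        }
        System.gc();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // "used" falls after GC, but "committed" (the memory the JVM has actually
        // taken from the OS) usually stays near the peak -- and committed memory
        // is all the hypervisor's balloon driver can see from outside the JVM.
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}
```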
I'm putting together a series of YouTube clips to explain this whole area in more detail. Hopefully these will be helpful. First one is up now: http://www.youtube.com/watch?v=kyz7J-FQUSM