How to do capacity planning for Splunk Enterprise

Splunk Enterprise is a popular operational intelligence solution for the data center. The name Splunk comes from ‘spelunking’: the founders felt that making sense of machine data is like spelunking in a cold cave.

Splunk Enterprise works by collecting syslog and event log data from network devices, Windows and Linux machines, and other sources, then building time-series-based index files that serve as the search source. Splunk instance types include the Search Head, the Search Peer (Indexer), and the Forwarder.

In a data center, log files grow all the time. For Splunk to index these files efficiently, proper capacity planning is essential. Below is the formula for Splunk capacity planning:

(Daily average indexing rate) * (Index replication count) * (Retention period in days) * 50%
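
As a rough sketch, the formula translates directly into Python; the function name and the 50% default below are illustrative, not part of any Splunk tool:

def splunk_storage_gb(daily_rate_gb, replication_count, retention_days,
                      compression_ratio=0.5):
    """Estimate Splunk storage needs in GB.

    The default compression_ratio of 0.5 reflects compressed raw data
    (about 15% of the original volume) plus index files (about 35%).
    """
    return daily_rate_gb * replication_count * retention_days * compression_ratio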

For instance, consider an organization with two data centers, each indexing an average of 20-25G of data per day. A typical retention policy is 3-6 months, but this organization keeps data for one year. Compressed raw data accounts for roughly 15% of the original volume and the index files for another 35%, so total storage comes to about 50% of the raw data volume. Let’s calculate the capacity using the formula above, taking the upper bound of 25G per day per data center and an index replication count of 1:

Storage needs for both data centers: 25G * 2 data centers * 365 days * 50% = 9125G ≈ 8.91T

Storage needs for one data center: 8.91T / 2 ≈ 4.5T
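
Plugging the example’s numbers into the sketch function above reproduces these figures:

# 25G/day in each of two data centers, replication count of 1, 1-year retention.
total_gb = splunk_storage_gb(daily_rate_gb=25 * 2, replication_count=1,
                             retention_days=365)
print(f"Both data centers: {total_gb:,.0f}G = {total_gb / 1024:.2f}T")
print(f"One data center:   {total_gb / 2 / 1024:.2f}T")
# Both data centers: 9,125G = 8.91T
# One data center:   4.46T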

There’s also a Splunk sizing tool available at: https://splunk-sizing.appspot.com/

Should you have any questions regarding Splunk architectural design or sizing, please let us know.