Splunk HF - Advanced Data Routing & Cloning

Background

As organizations integrate more systems to collect data for analytics, Splunk plays a major role in gathering that machine data and making it available for a wide range of analytical capabilities on top of the platform.

To shape the data into a form that eases analysis and reduces the load on Splunk, Cribl processes and transforms the data on the wire in near real time.

Given these needs, we must be able to route data flexibly and quickly between different streaming agents within the approved network paths of our environments.

Problem

In distributed environments we therefore need the capability to physically route or clone datasets across different streams, which gives us far more flexibility when designing data pipelines.

One of these integration challenges is physically routing or cloning data from an upstream HF to downstream HFs or Cribl based on a defined routing/cloning rule. This enables passing data from the upstream HF to both a downstream HF and Cribl for data manipulation use cases.

The POC steps that follow show how to achieve this physical routing/cloning.

POC Definition

The POC goal is to enable streaming data routing and cloning from a Splunk HF agent to MidTier layers, either another Splunk HF or Cribl.

Within the POC we will run through the following scenarios:

  1. Simple data routing use-case
  2. Simple data cloning use-case
  3. Advanced data routing/cloning use-case

The environment components used to perform the POC are as follows: 

  1. Server-1 → Routing HF: monitors data across different indexes and sourcetypes and applies the routing/cloning logic.
  2. Server-2 → MidTier HF: forwards the data received from the Routing HF to the EndTier Splunk layer.
  3. Server-3 → MidTier Cribl: forwards the data received from the Routing HF to the EndTier Splunk layer.
  4. Server-4 → EndTier Splunk: a standalone Splunk instance that receives the data sent by the MidTier layers, indexes it, and makes it available for searching.

The diagram below shows the networking architecture of the environment:
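In case the original image does not render here, the following is a simplified text sketch of the same flow; the server roles and ports are taken from the configuration sections later in this post:

Routing HF (Server-1)
  ├── TCP 8887 ──> MidTier HF (Server-2)    ── TCP 9997 ──> EndTier Splunk (Server-4)
  └── TCP 8888 ──> MidTier Cribl (Server-3) ── TCP 9998 ──> EndTier Splunk (Server-4)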

Scenarios

Scenario-1:

Referencing the diagram below, Scenario-1 (Simple Routing) requires routing ( cutting ) index [ index_3 ] from MidTier HF to MidTier Cribl using the Routing HF.

Scenario-2:

Referencing the diagram below, Scenario-2 (Simple Cloning) requires cloning index [ index_3 ] so that it reaches both MidTier HF and MidTier Cribl, using the Routing HF.

Scenario-3:

Referencing the diagram below, Scenario-3 (Advanced Routing/Cloning) requires either routing or cloning index [ index_1 ] from MidTier HF to MidTier Cribl using the Routing HF, taking into consideration that index_1 data is sent with sourcetype [ st_idx_a ], which is also used by index [ index_2 ]. Because indexes [ index_1 & index_2 ] share the same sourcetype [ st_idx_a ], an additional level of filtering is required while routing the data.

Extreme Cases:

  1. The index name is a substring of other index names, so a simple match catches multiple indexes.
  2. Routing multiple indexes, not only one.

Configuration

Routing Server

The Routing HF layer is the initial data-in layer and is responsible for routing/cloning data to the MidTier Cribl/HF layer.

Based on the scenario objectives, index_3 or index_1 data is routed or cloned to MidTier HF and/or MidTier Cribl.

To do this, routing rules are specified in props.conf and transforms.conf, based on the configured inputs and outputs.

Simple routing rules are mainly identified by <sourcetype>, <source>, or <host> within props.conf; in the example below, data arriving with sourcetype [ st_idx_a ] is selected. The props.conf stanza contains:

  • A TRANSFORMS-routing property that maps to the routing rule in transforms.conf. The transforms.conf stanza must carry the same name as the value of the props.conf TRANSFORMS-routing property.

The transforms.conf stanza contains several properties:

  • SOURCE_KEY specifies the field that the REGEX filter is applied to.
  • REGEX specifies an extra layer of filtering on the data arriving with the specified sourcetype (in our case).
  • DEST_KEY specifies whether the data is routed over the TCP or syslog protocol; in our case we require TCP, so the value is _TCP_ROUTING.
  • FORMAT specifies the output group(s) configured in outputs.conf. Since we require cloning the data to both MidTier HF and Cribl, the FORMAT value lists both output groups separated by a comma, i.e. [ routing-cribl,default-autolb-group ]. A combined sketch of how these properties map to each other follows this list.
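Putting these pieces together, the minimal end-to-end mapping looks like the sketch below. The stanza and group names mirror the POC configuration shown in the following sections; the unquoted value style follows Splunk's route-and-filter documentation:

# props.conf -- select events arriving with the sourcetype
[st_idx_b]
TRANSFORMS-routing = TRANSFORMS-routing3

# transforms.conf -- stanza name matches the TRANSFORMS-routing value above
[TRANSFORMS-routing3]
REGEX = .
DEST_KEY = _TCP_ROUTING
FORMAT = routing-cribl,default-autolb-group

# outputs.conf -- FORMAT values must match these output group names
[tcpout:routing-cribl]
server = 192.168.226.167:8888
[tcpout:default-autolb-group]
server = 192.168.226.180:8887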

inputs.conf

Since the Routing HF is the first layer, it monitors a few files to drive the data streaming, as if the data were coming from customer sources.

Below, three sources are configured to push data into three different indexes with shared and distinct sourcetype names: index_1 and index_2 share the sourcetype [ st_idx_a ], while index_3 uses a separate sourcetype [ st_idx_b ].

Configuration
[monitor://$SPLUNK_HOME/var/log/splunk/splunkd.log]
index = index_1
sourcetype = st_idx_a
 
[monitor://$SPLUNK_HOME/var/log/splunk]
index = index_2
sourcetype = st_idx_a
 
[monitor://$SPLUNK_HOME/var/log/splunk/web_*]
index = index_3
sourcetype = st_idx_b

props.conf

As described, props.conf filters the data by sourcetype, source, or host and specifies the TRANSFORMS-routing property.

Scenario < 1 > & < 2 >

Configuration
[st_idx_b]
TRANSFORMS-routing="TRANSFORMS-routing3"

Scenario < 3 >

Configuration
[st_idx_a]
TRANSFORMS-routing="TRANSFORMS-routing1"

transforms.conf

To implement Scenarios < 1 >, < 2 >, and < 3 > we manipulate the SOURCE_KEY and FORMAT property values.

Scenario < 1 >

Since the scenario requires routing the index_3 data to MidTier Cribl, we set FORMAT to the outputs.conf output group [ routing-cribl ], which points to the Cribl destination.

Configuration
[TRANSFORMS-routing3]
REGEX=.
DEST_KEY=_TCP_ROUTING
FORMAT=routing-cribl

Scenario < 2 >

Since the scenario requires cloning the index_3 data to MidTier Cribl while still routing it to MidTier HF, we set FORMAT to both outputs.conf output groups [ routing-cribl,default-autolb-group ], which point to the MidTier Cribl and MidTier HF destinations.

Configuration
[TRANSFORMS-routing3]
REGEX=.
DEST_KEY=_TCP_ROUTING
FORMAT=routing-cribl,default-autolb-group

Scenario < 3 >

Scenario-3 requires either routing or cloning [ index_1 ], taking into consideration that the sourcetype specified in props.conf [ st_idx_a ] covers both [ index_1 & index_2 ]. We therefore set SOURCE_KEY to [ _MetaData:Index ] so that an extra routing/cloning condition can be applied on the index itself through the REGEX property, which is set to [ index_1 ].

With this, transforms.conf effectively receives the data selected by sourcetype [ st_idx_a ] in props.conf. Since two indexes [ index_1 & index_2 ] send data with that same sourcetype, specifying SOURCE_KEY and REGEX makes transforms.conf apply the routing/cloning rule only to events whose index matches the REGEX value [ index_1 ].

Finally, as applied in Scenarios < 1 > and < 2 >, the FORMAT property takes either a single output group for routing or multiple output groups for cloning. In this example we route the data only through the Cribl output group.

Configuration
[TRANSFORMS-routing1]
SOURCE_KEY = _MetaData:Index
REGEX=index_1
DEST_KEY=_TCP_ROUTING
## For Cloning Data ##
#FORMAT=routing-cribl,default-autolb-group
## For Routing Data ##
FORMAT=routing-cribl

Extreme Cases Handling:

  1. The index name is a substring of other index names, so a simple match catches multiple indexes
    • If we want to route an index called [ idx_1 ] and there is another index called [ idx_1_1 ], a REGEX value of [ idx_1 ] will match and route both indexes, while we need [ idx_1 ] only. The right pattern for the REGEX field is therefore [ idx_1$ ], which matches only the required index.
  2. Routing multiple indexes, not only one
    • To match multiple indexes for routing/cloning, combine patterns in the REGEX parameter value, e.g. [ idx_1$|idx_2$ ] (a combined sketch follows this list).
    • [ idx_1$ ] uses [ $ ] to anchor the match at the end of the index name, and likewise for [ idx_2$ ].
    • The pipe [ | ] acts as an OR condition, matching events whose index matches either [ idx_1$ ] or [ idx_2$ ].
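Putting both cases together, a transforms.conf stanza could look like the sketch below. The stanza name [ TRANSFORMS-routing-multi ] and the idx_1/idx_2 index names are illustrative only (taken from the list above, not from the POC configuration). Note that anchoring with [ $ ] alone still allows matches such as my_idx_1; adding [ ^ ] as well (e.g. ^idx_1$) enforces a strict full-name match.

# transforms.conf -- illustrative multi-index routing rule (hypothetical stanza name)
[TRANSFORMS-routing-multi]
SOURCE_KEY = _MetaData:Index
# ^ and $ anchor the whole index name; | acts as an OR between the two patterns
REGEX = ^idx_1$|^idx_2$
DEST_KEY = _TCP_ROUTING
# a single output group routes; a comma-separated list clones
FORMAT = routing-cribl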

outputs.conf

As mentioned in the scenarios, there are two destinations: MidTier HF and MidTier Cribl. Under the [tcpout] stanza we set defaultGroup to [default-autolb-group], and then defined the [tcpout:default-autolb-group] stanza, which refers to the MidTier HF IP:PORT.

Another group, [tcpout:routing-cribl], is defined for MidTier Cribl, likewise holding its IP:PORT.

Configuration
[tcpout]
defaultGroup = default-autolb-group
 
[tcpout:routing-cribl]
server = 192.168.226.167:8888
 
[tcpout:default-autolb-group]
server = 192.168.226.180:8887
 
[tcpout-server://192.168.226.180:8887]


MidTier HF

The MidTier HF forwards incoming data from the Routing HF to the EndTier Splunk, using the configuration below:

inputs.conf

inputs.conf is configured to receive data through SplunkTCP port 8887, the port enabled to accept data from the Routing HF.

Configuration
[splunktcp://8887]
connection_host = ip

outputs.conf

As mentioned, the MidTier forwards the incoming traffic to the EndTier; outputs.conf reflects this by configuring tcpout to the EndTier Splunk on server 192.168.226.178, port 9997.

Configuration
[tcpout]
defaultGroup = default-autolb-group
 
[tcpout:default-autolb-group]
server = 192.168.226.178:9997
 
[tcpout-server://192.168.226.178:9997]

MidTier Cribl

Source Configuration:

A SplunkTCP source has been configured to receive data on port 8888, listening on any address (0.0.0.0).

Route Configuration:

A route has then been configured to filter data arriving from the source splunk:SPLUNK_72_HF and pass it to a pipeline called Splunk-Route, then on to a destination called Splunk:Splunk.

Pipeline Configuration:

The pipeline simply passes the data through while adding a new field called cribl_flag with the value FromCribl, tagging the events that come out of Cribl so they are easier to find in Splunk at the EndTier.

Destination Configuration:

Finally, the destination has been configured to send data to the EndTier Splunk on 192.168.226.178, port 9998.

EndTier Splunk

The EndTier is where the data is indexed. In the current POC it is a standalone server; in other deployments it may be the indexer layer. Below is the configuration used for the standalone server.

inputs.conf

Configuration
[splunktcp://9997]
connection_host = ip
 
[splunktcp://9998]
connection_host = ip

The above configuration enables receiving data on the mentioned ports, reflecting the connections from the MidTier HF and MidTier Cribl layers.
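To sanity-check the scenarios at the EndTier, a couple of simple searches can help. These searches are suggestions rather than part of the original POC; they rely on the cribl_flag field added by the Cribl pipeline described above:

index=index_3 | eval path=coalesce(cribl_flag, "via MidTier HF") | stats count by path

For Scenario-1 (routing), all index_3 events should show path=FromCribl; for Scenario-2 (cloning), counts should appear for both paths.

(index=index_1 OR index=index_2) | eval path=coalesce(cribl_flag, "via MidTier HF") | stats count by index, path

For Scenario-3, only index_1 events should show path=FromCribl, while index_2 stays on the default MidTier HF path.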


Please comment below with any further details, discussions, or recommendations.


I would also like to hear which topics you would recommend for future posts.

  

References:

  1. https://community.splunk.com/t5/Getting-Data-In/Cloning-set-of-data-to-specified-Splunk-indexer/m-p/110405
  2. https://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Routeandfilterdatad
