Data Mining with Splunk

discovering interesting things in and about your data
by

David Carasso

on 15 August 2012

Transcript of Data Mining with Splunk

Data Mining with Splunk>

David Carasso
Chief Mind
Splunk>

You've been thrown some events from data you aren't intimately familiar with...

What type of events do I have?
What are their fields?
How do events relate to each other?
How do I detect anomalous events?

CLUSTER EVENTS

Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user 'root'
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration ...
Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
....
Mar 7 14:20:02 willLaptop crond(pam_unix)[12161]: session closed for user root
Mar 7 14:22:44 willLaptop ntpd[2654]: synchronized to LOCAL(0), stratum 10
Mar 7 14:23:49 willLaptop ntpd[2654]: synchronized to 194.109.223.180, stratum 3
Mar 7 14:23:49 willLaptop ntpd[2654]: synchronized to 82.148.138.26, stratum 2
Mar 7 14:25:01 willLaptop crond(pam_unix)[12206]: session opened for user root by (uid=0)
...
Mar 7 16:20:03 willLaptop crond(pam_unix)[13439]: session closed for user root
Mar 7 16:21:19 willLaptop ntpd[2654]: synchronized to LOCAL(0), stratum 10
Mar 7 16:22:21 willLaptop ntpd[2654]: synchronized to 194.109.223.180, stratum 3
Mar 7 16:22:22 willLaptop ntpd[2654]: synchronized to 82.148.138.26, stratum 2
Mar 7 16:25:01 willLaptop crond(pam_unix)[13488]: session opened for user root by (uid=0)
Mar 7 16:25:02 willLaptop crond(pam_unix)[13488]: session closed for user root
Mar 7 16:27:41 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
Mar 7 16:27:41 willLaptop dhclient: DHCPACK from 10.1.1.50
Mar 7 16:27:41 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 6953 seconds.
...
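
One quick way to see what types of events you have is to reduce each event to its punctuation pattern, which is what the punct field used in the second search below captures. Here is a rough Python sketch of that idea. It assumes punct can be approximated by dropping letters and digits and turning whitespace into underscores; Splunk's actual rules may differ. The function names are made up for illustration.

```python
from collections import Counter


def punct(raw: str) -> str:
    """Approximate a punctuation signature: drop letters and digits,
    turn whitespace into '_', keep every other character.
    (An assumption about how punct behaves, not Splunk's implementation.)"""
    return "".join("_" if ch.isspace() else ch for ch in raw if not ch.isalnum())


def group_by_small_punct(events, width=7):
    """Count events by the first `width` characters of their signature."""
    return Counter("*" + punct(event)[:width] for event in events)


# Syslog lines of the kind shown in this transcript.
events = [
    "Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root",
    "Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root",
    "Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.",
    "Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2",
]

counts = group_by_small_punct(events)
for key, count in counts.most_common():
    print(key, count)
```

Events from the same program tend to share a prefix, so the two crond lines fall into one group while dhclient and ntpd each get their own.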
Show 3 examples from each cluster, from most common cluster to least:

…| cluster labelonly=t showcount=t
| sort -cluster_count, cluster_label, _time
| dedup 3 cluster_label

cluster_count cluster_label raw
------------- ------------- ----------------------------------------------------------------------------------------------------------------------
1339 3 Mar 7 11:05:01 willLaptop crond(pam_unix)[6785]: session opened for user root by (uid=0)
1339 3 Mar 7 11:10:01 willLaptop crond(pam_unix)[17659]: session opened for user root by (uid=0)
1339 3 Mar 7 11:10:01 willLaptop crond(pam_unix)[17656]: session opened for user root by (uid=0)
1324 2 Mar 7 11:05:02 willLaptop crond(pam_unix)[6785]: session closed for user root
1324 2 Mar 7 11:10:01 willLaptop crond(pam_unix)[17656]: session closed for user root
1324 2 Mar 7 11:10:02 willLaptop crond(pam_unix)[17659]: session closed for user root
136 13 Mar 7 20:05:08 willLaptop kernel: SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts
136 13 Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
136 13 Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
117 23 Mar 7 20:05:08 willLaptop kernel: ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
117 23 Mar 7 20:05:08 willLaptop kernel: ACPI: PCI Interrupt 0000:00:1f.6[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
117 23 Mar 7 20:05:09 willLaptop kernel: ACPI: PCI Interrupt 0000:02:02.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
99 314 Mar 7 20:05:11 willLaptop kernel: ide: failed opcode was: unknown
99 314 Mar 7 20:05:11 willLaptop kernel: ide: failed opcode was: unknown
99 314 Mar 7 22:39:08 willLaptop kernel: ide: failed opcode was: unknown
72 221 Mar 7 20:05:07 willLaptop kernel: ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 10 *11)
72 221 Mar 7 20:05:07 willLaptop kernel: ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11) *0, disabled.
72 221 Mar 7 20:05:07 willLaptop kernel: ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11) *0, disabled.
72 319 Mar 7 22:39:08 willLaptop kernel: hdc: status error: status=0x00 { }
72 319 Mar 7 22:39:08 willLaptop kernel: hdc: status error: status=0x00 { }
...

Group events by most common first 7 punctuation characters:

…| rex field=punct "(?P<smallpunct>.{7})"
| eval smallpunct= "*" + smallpunct
| stats first(_raw) as example count by smallpunct
| sort -count

smallpunct example
---------- ----------------------------------------------------------------------------------------------------------------------------
*__::__( Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
*__::__: Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
*__::__[ Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2
*__::___ Mar 10 10:47:59 willLaptop gconfd (will-3271): Resolved address "xml:readwrite:/home/will/.gconf" to ...
*____::_ Wed Nov 16 2011 18:51:14 UTC LcumN=031439422 lineN=000210534 45.2.0.1 NOC-FWa: NetScreen ...
*__::__- Mar 10 10:47:41 willLaptop gdm-binary[3130]: Couldn't authenticate user
*__::__. Mar 10 10:45:43 willLaptop rpc.statd[2123]: Version 1.0.7 Starting

Fields Correlation

Given a new, relatively unknown source, you can discover patterns in extracted fields by looking at the correlation table, which shows patterns of co-occurring fields. A value of 1.0 means the two fields always co-occur. For example, Component and Log_Level always co-occur in splunkd.log. You can filter out fields to make this table more manageable.
...| fields - date* source* time* | correlate


RowField C CN Component Context L ...
------------------------ ---- ---- --------- ------- ----
C 1.00 1.00 0.00 0.00 1.00
CN 1.00 1.00 0.00 0.00 1.00
Component 0.00 0.00 1.00 0.06 0.00
Context 0.00 0.00 0.06 1.00 0.00
L 1.00 1.00 0.00 0.00 1.00
Log_Level 0.00 0.00 1.00 0.06 0.00
Message 0.00 0.00 1.00 0.06 0.00
ST 1.00 1.00 0.00 0.00 1.00
Text 0.00 0.00 0.05 0.80 0.00
bytes 0.00 0.00 0.00 0.00 0.00
clientName 0.00 0.00 0.00 0.00 0.00
dir 0.00 0.00 0.00 0.00 0.00
disabled 0.00 0.00 0.00 0.00 0.00
endpoint 0.00 0.00 0.00 0.00 0.00
et 0.00 0.00 0.01 0.00 0.00
et_lt_span_flush_lru 0.00 0.00 0.00 0.00 0.00
fn 0.00 0.00 0.00 0.00 0.00
host 0.00 0.00 1.00 0.06 0.00
id 0.00 0.00 0.02 0.00 0.00
lableonly 0.00 0.00 0.00 0.00 0.00
linecount 0.00 0.00 1.00 0.06 0.00
lt 0.00 0.00 0.01 0.00 0.00
n 0.00 0.00 0.00 0.00 0.00
nextId 0.00 0.00 0.00 0.00 0.00
openDatabases 0.00 0.00 0.01 0.00 0.00
phoneHomeIntervalInSecs 0.00 0.00 0.00 0.00 0.00
reloadDSOnAppInstall 0.00 0.00 0.00 0.00 0.00
repositoryLocation 0.00 0.00 0.00 0.00 0.00
ts 0.00 0.00 0.01 0.00 0.00
waitInSecsBetweenRetries 0.00 0.00 0.00 0.00 0.00
workingDir 0.00 0.00 0.00 0.00 0.00
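
A table like the one above can be approximated outside Splunk by measuring how often two fields appear in the same events. Here is a minimal Python sketch. It assumes events are already parsed into dicts and scores each pair as "events with both fields" divided by "events with either field"; that is one plausible co-occurrence measure, not necessarily the formula correlate uses. The sample events and their values are made up.

```python
def cooccurrence(events):
    """events: list of dicts mapping field name -> value.
    Returns {(field_a, field_b): score}, where score is the share of
    events containing either field that contain both (assumed measure)."""
    fields = sorted({name for event in events for name in event})
    # For each field, the set of event indexes in which it appears.
    present = {
        name: {i for i, event in enumerate(events) if name in event}
        for name in fields
    }
    table = {}
    for a in fields:
        for b in fields:
            both = present[a] & present[b]
            either = present[a] | present[b]
            table[(a, b)] = len(both) / len(either)
    return table


# Hypothetical parsed events, for illustration only.
events = [
    {"Component": "loader", "Log_Level": "INFO"},
    {"Component": "SavedSplunker", "Log_Level": "WARN", "Context": "example"},
    {"Component": "DeploymentClient", "Log_Level": "DEBUG"},
    {"C": "example", "CN": "example", "L": "example"},
]

table = cooccurrence(events)
print(table[("Component", "Log_Level")])  # fields that always appear together
print(table[("C", "Component")])          # fields that never appear together
```

Under this measure the score is symmetric, so each pair gets the same value in both directions.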

Fields

We can use 'top' to look at the most common Log_Level by Component...

... | top Log_Level by Component

Component Log_Level count percent
---------------------------------- --------- ----- ----------
AdminManager WARN 1 100.000000
DatabaseDirectoryManager WARN 153 100.000000
DateParserVerbose WARN 262 100.000000
DedupProcessor ERROR 1 100.000000
DeploymentClient DEBUG 60 85.714286
DeploymentClient WARN 5 7.142857
DeploymentClient INFO 5 7.142857
DispatchCommand ERROR 2 100.000000
DispatchSearch WARN 1 100.000000
FileClassifierManager WARN 210 100.000000

Contingency

Alternatively, we can build a contingency table of Log_Level and Component values to more conveniently see the patterns.

... | contingency Log_Level Component maxcols=5

Log_Level SavedSplunker databasePartitionPolicy loader DateParserVerbose FileClassifierManager timeinvertedIndex TOTAL
--------- ------------- ----------------------- ------ ----------------- --------------------- ----------------- -----
WARN 1763 36 2 262 210 0 2504
INFO 0 391 352 0 0 198 1334
DEBUG 0 0 0 0 0 0 163
ERROR 0 0 0 0 0 0 16
TOTAL 1763 427 354 262 210 198 4017
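
The cross-tabulation that contingency produces can be approximated by counting value pairs. Here is a minimal Python sketch, assuming each event has already been reduced to a (Log_Level, Component) pair. The sample pairs are made up for illustration and are not the counts shown above.

```python
from collections import Counter


def contingency(pairs):
    """pairs: iterable of (row_value, col_value).
    Returns (cell counts, row totals, column totals)."""
    cells = Counter(pairs)
    row_totals = Counter()
    col_totals = Counter()
    for (row, col), n in cells.items():
        row_totals[row] += n
        col_totals[col] += n
    return cells, row_totals, col_totals


# Hypothetical (Log_Level, Component) pairs.
pairs = (
    [("WARN", "SavedSplunker")] * 3
    + [("INFO", "loader")] * 2
    + [("WARN", "loader")]
)

cells, row_totals, col_totals = contingency(pairs)
cols = sorted(col_totals)
print("Log_Level", *cols, "TOTAL")
for row in sorted(row_totals):
    # Missing combinations read as 0 because cells is a Counter.
    print(row, *(cells[(row, col)] for col in cols), row_totals[row])
print("TOTAL", *(col_totals[col] for col in cols), sum(row_totals.values()))
```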

From the above table, we can see that the SavedSplunker component only outputs WARN Log_Levels, while the loader component almost always outputs INFO Log_Levels.

field value associations

The associate command can be used to automatically deduce the sort of conclusions we reached above; namely, that SavedSplunker only outputs WARN, while loader almost always outputs INFO.

... | associate Log_Level Component

Reference_Key Reference_Value Target_Key Support Unconditional_Entropy Conditional_Entropy Entropy_Improvement Top_Conditional_Value
------------- ------------------------ ---------- ------- --------------------- ------------------- ------------------- ---------------------------------
Component DatabaseDirectoryManager Log_Level 34.67% 1.182 0.000 1.182201 WARN (62.25% -> 100.00%)
Component DateParserVerbose Log_Level 58.60% 1.182 0.000 1.182201 WARN (62.25% -> 100.00%)
Component FileClassifierManager Log_Level 46.97% 1.182 0.000 1.182201 WARN (62.25% -> 100.00%)
Component HotDBManager Log_Level 38.25% 1.182 0.000 1.182201 INFO (33.15% -> 100.00%)
Component SavedSplunker Log_Level 394.31% 1.182 0.000 1.182201 WARN (62.25% -> 100.00%)
Component databasePartitionPolicy Log_Level 95.50% 1.182 0.417 0.765017 INFO (33.15% -> 91.57%)
Component loader Log_Level 79.17% 1.182 0.050 1.131883 INFO (33.15% -> 99.44%)
Component timeinvertedIndex Log_Level 44.28% 1.182 0.000 1.182201 INFO (33.15% -> 100.00%)
Log_Level DEBUG Component 12.15% 2.993 1.126 1.866713 ShutdownHandler (4.77% -> 60.12%)
Log_Level WARN Component 186.75% 2.993 1.574 1.419071 SavedSplunker (43.81% -> 70.38%)

This shows that before we know the component is SavedSplunker, the probability of a WARN Log_Level is 62.25%; afterwards, it is 100%. Before we know the component is loader, the probability of an INFO Log_Level is 33.15%; afterwards, it is 99.44%.
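
The entropy columns can be read the same way: in the table, Entropy_Improvement equals Unconditional_Entropy minus Conditional_Entropy (for example, 1.182 - 0.417 is about 0.765). Here is a minimal Python sketch of that calculation. It assumes Shannon entropy with base-2 logarithms, which is an assumption about the units rather than something the table states. The function names and sample events are made up for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the value distribution (base 2 assumed)."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_improvement(events, ref_key, ref_value, target_key):
    """How much knowing ref_key == ref_value reduces uncertainty about
    target_key. Returns (unconditional, conditional, improvement)."""
    all_targets = [e[target_key] for e in events if target_key in e]
    cond_targets = [
        e[target_key]
        for e in events
        if e.get(ref_key) == ref_value and target_key in e
    ]
    h_uncond = entropy(all_targets)
    h_cond = entropy(cond_targets)
    return h_uncond, h_cond, h_uncond - h_cond


# Hypothetical events: SavedSplunker only logs WARN, loader only logs INFO.
events = [
    {"Component": "SavedSplunker", "Log_Level": "WARN"},
    {"Component": "SavedSplunker", "Log_Level": "WARN"},
    {"Component": "loader", "Log_Level": "INFO"},
    {"Component": "loader", "Log_Level": "INFO"},
]

h_uncond, h_cond, gain = entropy_improvement(
    events, "Component", "SavedSplunker", "Log_Level"
)
print(h_uncond, h_cond, gain)
```

A conditional entropy of zero means the reference value fully determines the target field, which matches the rows above where the conditional probability reaches 100%.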