Loading presentation...

Present Remotely

Send the link below via email or IM

Copy

Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

DeleteCancel

Make your likes visible on Facebook?

Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.

No, thanks

SemanticLogging

No description
by

Rob Das

on 29 September 2011

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of SemanticLogging

Event Logs Suck They have no real structure 050818 16:19:31 2 Query UPDATE xar_session_info SET xar_vars = 'XARSVuid|i:2;XARSVrand|i:343223999;XARSVuaid|s:2:\"29\";XARSVbrowsername|s:9:\"Netscape6\";XARSVbrowserversion|s:3:\"5.0\";XARSVosname|s:7:\"Unknown\";XARSVosversion|s:7:\"Unknown\";XARSVnavigationLocale|s:11:\"en_US.utf-8\";SPLUNKAPP_IP|N;', xar_lastused = 1124407171 WHERE xar_sessid ='ll7joq442223fl6h07v3f3vpd2’

10 Query UPDATE xar_session_info SET xar_vars = 'XARSVuid|i:2;XARSVrand|i:89426315;XARSVuaid|s

:2:\"29\";XARSVbrowsername|s:9:\"Netscape6\";XARSVbrowserversion|s:3:\"5.0\";XARSVosname|s:7:\"Unknown\";XARSVosversion|s:7:\"Unknown\";XARSVnavigationLocale|s:11:\"en_US.utf-8\";SPLUNKAPP_IP|N;', xar_lastused = 1124407193 WHERE xar_sessid = 't2idg584t1co0scgj40qnnm’ 31 Connect caveuser@web2.int.splunk.com on cave

Jun 2 13:36:50 DEBUG[1826]: Setting NAT on RTP to 0

Jun 2 13:36:50 DEBUG[1826]: Check for res for 5008office

Jun 2 13:36:50 DEBUG[1826]: Call from user '5008office' is 1 out of 0

Jun 2 13:36:50 DEBUG[1826]: build_route: Contact hop: <sip:5008office@10.1.1.132:5060>

Jun 2 13:36:50 VERBOSE[10887]: -- Executing Macro("SIP/5008office-dfbd", ”

They can be overwhelming Registry
Event logs
File system
sysinternals Configurations
Audit/Query logs
Tables
Schemas Configurations
syslog
SNMP
Netflow Configurations
syslog
/var/log
ps, iostat, top Web logs
Log4j, JMS, JMX
.NET events
Code & Scripts Hypervisor
Guest OS
Guest Apps
Performance They're hard to access They take up tons of space They can clog up the network Operations guys don't want developers logging in to their production systems You need rotation strategies
Compliance regulations may require years of archival Event Logs are Great Apr 29 19:13:01 45.2.98.7 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow Scan",Type="NOTICE",Priority="High",Body="Appliance=roach_motel.enet.interop.net,Reporting Segment=ENET network,Action=Response disabled,Response=Slow Scan,Duration=90 seconds,Source Segment=Unprotected,Source IP=88.73.39.200,Source MAC=00:01:30:BC:93:90,Current Target Count=0"
Apr 29 19:13:01 45.2.98.7 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow Scan",Type="NOTICE",Priority="High",Body="Appliance=roach_motel.enet.interop.net,Reporting Segment=ENET network,Action=Response disabled,Response=Slow Scan,Duration=69 seconds,Source Segment=Unprotected,Source IP=68.163.20.95,Source MAC=00:01:30:BC:93:90,Current Target Count=0"
Apr 29 19:13:01 45.2.98.7 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow
They help you find problems "It took too long!" They contain a gold mine of information A definitive record of activity and behavior 10.2.1.44 - [25/Sep/2009:09:52:30 --0700] type=USER_LOGIN msg=audit (12588394.056: auid=429487984 msg='acct="TAYLOR": addr=10.2.1.48, terminal=sshd res=failed User IP Action Login Result Ensure system security
Meet compliance mandates Customer behavior and experience
Product and service usage
End-to-end transaction visibility Important insight for IT and the Business 10.2.1.80 - - [25/Jan/2010:09:52:30 -0700] "GET /petstore/product.screen?product_id=AV-CB-01 HTTP/1.1" 200 9967" http://petstore/category.screen?category_id=BIRD" "Mozilla. User IP Product Category Machine generated events are everywhere When interpreted, they
create real business value Event Generalities You have no control over other systems' events
You have full control over events that YOU write
Most events are written by developers to help them debug
Some events are written to form an audit trail "Semantic Logging" Events which are written explicitly for the
gathering of analytics * Designed to be consumed and processed by software Software for Analyzing
"Machine Data" would have the following characteristics Provides a "fabric" for acquiring, transporting and securing data Knows next to nothing about the data as it's consumed Allows the user to find the needle in the haystack Derives structure out of unstructured data as it's consumed Provides Aggregation and Visualization Tools Can operate in real-time Early vs. Late Structure Binding Early Structure Binding Late Structure Binding Schema - created at design time
Queries - understood at design time Data - must fit into tables (be converted)
Must match constraints Schema-less
Created at search time
Ad hoc Data - heterogeneous
Constantly changing
No conversion required, no constraints Analytics Early Structure Binding Late Structure Binding Decide the question(s) you
want to ask Design the Schema Normalize the data and write
DB insertion code Create SQL & Feed into
Analytics Tool Write Semantic Events Collect Create Searches, Reports
and Graphs Create Human Readable Events* *For the most part Binary sounds good because it's compressed, but it requires decoding and will not segment Try not to use complex encoding
Use INFO, WARN, ERROR, DEBUG, etc.
Don't use XML or JSON - unless you really need multi-depth nesting
Keep multi-line events to a minimum Log in Text Make it easy for humans Categorize The farther you place the timestamp from the beginning,
the more difficult it is to tell it's a timestamp and not
other data
Do not use time offsets Use human readable timestamps Favor the beginning of the line Log anything that can add value
when aggregated and/or
visualized* * Log more than just debugging events! Example Bogus Pseudo-Code void submitPurchase(purchaseId)
{
log.info("action=submitPurchaseStart, purchaseId=%d", purchaseId)

//these calls throw an exception on error
submitToCreditCard(...)
generateInvoice(...)
generateFullfillmentOrder(...)

log.info("action=submitPurchaseCompleted, purchaseId=%d", purchaseId)
}

Logging Best Practices Graph purchase volume by hour, by day, by month

How long are purchases taking during different times of the day and different days of the week?

Are my purchases taking longer than they did last month?
Are my systems getting slower as they process more transactions?

How many purchases are failing? Graph these over time.

Which specific purchases are failing? Given the previous pseudo-code example: Use clear key/value pairs* "<timestamp> <level> key1=value1,key2=value2,key3=value3" Use clear key/value pairs Example (Good):
Log.debug(“orderstatus=error,errorcode=454, user=%d,transactionid=%s”, userId, transId)
Example (Bad):
Log.debug(“error %d 454 - %s ”, userId, transId)

Searching for "error" is way to vague
Software requires regular expressions to parse vs consistent single rule
Keys have to be assigned (not provided in the event) Break multi-value information into separate events Example (Good):
<TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx
<TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy
Example (Bad):
<TS> phonenumber=333-444-4444, apps=angrybirds,facebook

More difficult to add data for each multivalue
More difficult for software to parse Using header lines for keys is OK <TS>
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo
rdas 790 4.5 0.4 4924432 32324 ?? S 8Jul11 9:00.57 /System/Library/baz
Create periodic events that issue 'ps', 'iostat', etc.
Allows for adding performance metrics to your analytics
Software can interpret the column headers as keys and each line as values Log unique identifiers so you can follow transactions Carry unique identifiers through mutliple touchpoints if possible
Use transitive closure if you need to: Transaction transid=abcdef
transid=abcdef, otherid=qstuvw
otherid=qstuvw Quick Semantic Logging Demo Operational Best Practices Log locally to files
Use rotation policies - destroy or back up
Run agents provided by your software - provides elastic buffering - or else production applications can block! Event
Producing
Application Agent Machine
Data
Engine Network Local log file Syslog is great for large volumes of low value data Obviously lossy
But has existing services on most systems
Won't block production applications Syslog NG and other TCP-based services are better But watch your configuration so events are dropped/buffered if the receiver is down Syslog can't handle multi-line events. Packet sizes are too small. The more you capture, the more visibility you have! Application logs
Database logs
Network logs
Configuration file changes
Performance data (iostat, vmstat, ps, etc)
Anything that has a time component Thank You!
Questions? Developer Best Practices Treat your Machine Data Engine as part of your software stack Use your Machine Data Engine as your Analytics System Your code isn't considered "delivered" until you have built analytics that support it!
Full transcript