Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
Do you really want to delete this prezi?
Neither you, nor the coeditors you shared it with will be able to recover it again.
Make your likes visible on Facebook?
You can change this under Settings & Account at any time.
Transcript of SemanticLogging
10 Query UPDATE xar_session_info SET xar_vars = 'XARSVuid|i:2;XARSVrand|i:89426315;XARSVuaid|s
:2:\"29\";XARSVbrowsername|s:9:\"Netscape6\";XARSVbrowserversion|s:3:\"5.0\";XARSVosname|s:7:\"Unknown\";XARSVosversion|s:7:\"Unknown\";XARSVnavigationLocale|s:11:\"en_US.utf-8\";SPLUNKAPP_IP|N;', xar_lastused = 1124407193 WHERE xar_sessid = 't2idg584t1co0scgj40qnnm’ 31 Connect firstname.lastname@example.org on cave
Jun 2 13:36:50 DEBUG: Setting NAT on RTP to 0
Jun 2 13:36:50 DEBUG: Check for res for 5008office
Jun 2 13:36:50 DEBUG: Call from user '5008office' is 1 out of 0
Jun 2 13:36:50 DEBUG: build_route: Contact hop: <sip:email@example.com:5060>
Jun 2 13:36:50 VERBOSE: -- Executing Macro("SIP/5008office-dfbd", ”
They can be overwhelming Registry
ps, iostat, top Web logs
Log4j, JMS, JMX
Code & Scripts Hypervisor
Performance They're hard to access They take up tons of space They can clog up the network Operations guys don't want developers logging in to their production systems You need rotation strategies
Compliance regulations may require years of archival Event Logs are Great Apr 29 19:13:01 188.8.131.52 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow Scan",Type="NOTICE",Priority="High",Body="Appliance=roach_motel.enet.interop.net,Reporting Segment=ENET network,Action=Response disabled,Response=Slow Scan,Duration=90 seconds,Source Segment=Unprotected,Source IP=184.108.40.206,Source MAC=00:01:30:BC:93:90,Current Target Count=0"
Apr 29 19:13:01 220.127.116.11 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow Scan",Type="NOTICE",Priority="High",Body="Appliance=roach_motel.enet.interop.net,Reporting Segment=ENET network,Action=Response disabled,Response=Slow Scan,Duration=69 seconds,Source Segment=Unprotected,Source IP=18.104.22.168,Source MAC=00:01:30:BC:93:90,Current Target Count=0"
Apr 29 19:13:01 22.214.171.124 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow
They help you find problems "It took too long!" They contain a gold mine of information A definitive record of activity and behavior 10.2.1.44 - [25/Sep/2009:09:52:30 --0700] type=USER_LOGIN msg=audit (12588394.056: auid=429487984 msg='acct="TAYLOR": addr=10.2.1.48, terminal=sshd res=failed User IP Action Login Result Ensure system security
Meet compliance mandates Customer behavior and experience
Product and service usage
End-to-end transaction visibility Important insight for IT and the Business 10.2.1.80 - - [25/Jan/2010:09:52:30 -0700] "GET /petstore/product.screen?product_id=AV-CB-01 HTTP/1.1" 200 9967" http://petstore/category.screen?category_id=BIRD" "Mozilla. User IP Product Category Machine generated events are everywhere When interpreted, they
create real business value Event Generalities You have no control over other systems' events
You have full control over events that YOU write
Most events are written by developers to help them debug
Some events are written to form an audit trail "Semantic Logging" Events which are written explicitly for the
gathering of analytics * Designed to be consumed and processed by software Software for Analyzing
"Machine Data" would have the following characteristics Provides a "fabric" for acquiring, transporting and securing data Knows next to nothing about the data as it's consumed Allows the user to find the needle in the haystack Derives structure out of unstructured data as it's consumed Provides Aggregation and Visualization Tools Can operate in real-time Early vs. Late Structure Binding Early Structure Binding Late Structure Binding Schema - created at design time
Queries - understood at design time Data - must fit into tables (be converted)
Must match constraints Schema-less
Created at search time
Ad hoc Data - heterogeneous
No conversion required, no constraints Analytics Early Structure Binding Late Structure Binding Decide the question(s) you
want to ask Design the Schema Normalize the data and write
DB insertion code Create SQL & Feed into
Analytics Tool Write Semantic Events Collect Create Searches, Reports
and Graphs Create Human Readable Events* *For the most part Binary sounds good because it's compressed, but it requires decoding and will not segment Try not to use complex encoding
Use INFO, WARN, ERROR, DEBUG, etc.
Don't use XML or JSON - unless you really need multi-depth nesting
Keep multi-line events to a minimum Log in Text Make it easy for humans Categorize The farther you place the timestamp from the beginning,
the more difficult it is to tell it's a timestamp and not
Do not use time offsets Use human readable timestamps Favor the beginning of the line Log anything that can add value
when aggregated and/or
visualized* * Log more than just debugging events! Example Bogus Pseudo-Code void submitPurchase(purchaseId)
log.info("action=submitPurchaseStart, purchaseId=%d", purchaseId)
//these calls throw an exception on error
log.info("action=submitPurchaseCompleted, purchaseId=%d", purchaseId)
Logging Best Practices Graph purchase volume by hour, by day, by month
How long are purchases taking during different times of the day and different days of the week?
Are my purchases taking longer than they did last month?
Are my systems getting slower as they process more transactions?
How many purchases are failing? Graph these over time.
Which specific purchases are failing? Given the previous pseudo-code example: Use clear key/value pairs* "<timestamp> <level> key1=value1,key2=value2,key3=value3" Use clear key/value pairs Example (Good):
Log.debug(“orderstatus=error,errorcode=454, user=%d,transactionid=%s”, userId, transId)
Log.debug(“error %d 454 - %s ”, userId, transId)
Searching for "error" is way to vague
Software requires regular expressions to parse vs consistent single rule
Keys have to be assigned (not provided in the event) Break multi-value information into separate events Example (Good):
<TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx
<TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy
<TS> phonenumber=333-444-4444, apps=angrybirds,facebook
More difficult to add data for each multivalue
More difficult for software to parse Using header lines for keys is OK <TS>
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo
rdas 790 4.5 0.4 4924432 32324 ?? S 8Jul11 9:00.57 /System/Library/baz
Create periodic events that issue 'ps', 'iostat', etc.
Allows for adding performance metrics to your analytics
Software can interpret the column headers as keys and each line as values Log unique identifiers so you can follow transactions Carry unique identifiers through mutliple touchpoints if possible
Use transitive closure if you need to: Transaction transid=abcdef
otherid=qstuvw Quick Semantic Logging Demo Operational Best Practices Log locally to files
Use rotation policies - destroy or back up
Run agents provided by your software - provides elastic buffering - or else production applications can block! Event
Application Agent Machine
Engine Network Local log file Syslog is great for large volumes of low value data Obviously lossy
But has existing services on most systems
Won't block production applications Syslog NG and other TCP-based services are better But watch your configuration so events are dropped/buffered if the receiver is down Syslog can't handle multi-line events. Packet sizes are too small. The more you capture, the more visibility you have! Application logs
Configuration file changes
Performance data (iostat, vmstat, ps, etc)
Anything that has a time component Thank You!
Questions? Developer Best Practices Treat your Machine Data Engine as part of your software stack Use your Machine Data Engine as your Analytics System Your code isn't considered "delivered" until you have built analytics that support it!