Loading presentation...

Present Remotely

Send the link below via email or IM

Copy

Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

DeleteCancel

Make your likes visible on Facebook?

Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.

No, thanks

MySQL Schema and Query Optimization

No description
by

Or Wolfstein

on 29 December 2013

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of MySQL Schema and Query Optimization

Choosing Optimal Data Types
Architecture
Indexing
Indexing is very important to DB optimization and its better to have no Index than a bad on
Auto Generated Schema
The foundation of the Database.
Choosing Optimal Data Types
Schema Optimization and correct indexing can improve performance by orders of magnitude

Understanding the product demands can improve scaling and performance
Schema Optimization
MySQL Schema and Query Optimization
The information discussed here is taken from the book "High performance MySQL" by Oreilly


There is no one way to do things, the aim of this presentation is to bring you new information that you can later use how ever you see fit in each specific situation and is not in any way "The only way"
Before we start
The fine print!
Connection Handler

Each connection get a unique thread
and is being authorized for privileges
Cache
The cache saves queries and their result sets.

If a query was detected, no other action is required than returning the result set will happen

Parsing and Optimization
Create internal structure (parse tree)
Optimizations: Rewrite, read order, index pick…

myISAM
InnoDB
Storage engine
Each database engine is different, different tables might perform better under different engines
Every DB change has a trade off
Thinking out of the box like OutBrain and Wix
Smaller and Simpler is better:
Less space on HDD,RAM,CPU means fewer CPU cycles
Comparing Int < comparing String
Increasing the size of a column is easier than lowering it
Null able data types
Avoid using null when possible. e.g. define fields as not null when ever possible
If a field has a default value that is applicable. USE IT.
Example: An age cannot be smaller than 0 (Or 1 how ever you want)
It is harder for MYSQL to optimize queries on column that allow NULL values because it will need to apply more rules to the query (More if statements...)
Performance impact: usually small. Do not update existing schema
Integers
They are the most lightweight and querying by Integer will have the highest performance.
It is a good practice to have an auto incremented integer field in every column (except for join tables)

Interesting fact:
Setting the width of the integer column is
useless
. It does not restrict the range of the value and only used by tools.
Meaning: Int (1) and Int (10) is the same
Strings
Varchar
Uses only as much space as it needs
Adds 1,2 bytes of space to the row to calculate space
Can cause performance problems on columns that are modified often due to fragmentation
MySQL often allocates full size memory blocks which takes memory and is bad for sorting operations
Char
Fixed size
Best used with the same size inputs (Like: MD5)
Does not need to be fragmented
VS
Enums
Enums are stored as Integers in the database and are very compact.
There is also a lookup table for converting from Int-to-String
Join Benchmark table


Test


Queries per second
VARCHAR joined to VARCHAR 2.6
VARCHAR joined to ENUM 1.7
ENUM joined to VARCHAR 1.8
ENUM joined to ENUM 3.5
Can cause severe performance problems
Takes the worst possible scenarios like:
Large var char fields
Messy column order
ORM frameworks encapsulate the DB in code but are very poor in using the Databases strengths
Query Optimization
Analyze poorly performing queries in 2 steps
Step 1: Are you getting more data than needed
Are you using the "Limit" clause when applicable (Pagination)
Fetching all columns from multiple join:

Consider the following:
SELECT * FROM a INNER JOIN b ... INNER JOIN c

will return all columns from all 3 tables

Instead of:
SELECT a.* FROM ....

will return only columns from table "a"
Step 2: Is MYSQL examining to much data
Query cost metrics
Execution time
# Rows examined
# Rows returned
slow_query_log
which logs query information on queries that takes more time than
long_query_time
(10 by default)
The Explain command
EXPLAIN SELECT * FROM film_actor WHERE film_id = 1
When film_id is indexed
******************* 1. row *******************
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: const
rows: 10
Extra:
When film_id is not indexed
******************* 1. row *******************
id: 1
select_type: SIMPLE
table: film_actor
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5073
Extra: Using where
Restructuring queries
Is it better to have one complex query over many small ones?
The traditional approach:
Historicity true
Prefers complex queries
High cost of network
High cost of connecting to the database
Many queries approach:
Network connections are cheap
DB connection is efficient
Caching can save queries!
Allows for code reuse
Complex queries are still better over the network because it is faster to read from memory
Chopping up a query
In big and intense queries it might be a good idea to chop them up into smaller pieces
Example:
In a delete operation of 10,000 rows it will probably be better to LIMIT the query to 10 queries of 1000 each to not lock the database/tables for to long
Join decomposition
For example, instead of this single query:
mysql> SELECT * FROM tag
-> JOIN tag_post ON tag_post.tag_id=tag.id
-> JOIN post ON tag_post.post_id=post.id
-> WHERE tag.tag='mysql';

You might run these queries:
mysql> SELECT * FROM tag WHERE tag='mysql';
mysql> SELECT * FROM tag_post WHERE tag_id=1234;
mysql> SELECT * FROM post WHERE post.id in (123,456,567,9098,8904);
Doing joins in the application may be more efficient when:
• You cache and reuse a lot of data from earlier queries
• You use multiple MyISAM tables
• You distribute data across multiple servers
• You replace joins with IN( ) lists on large tables
• A join refers to the same table multiple times
De-Fragmentation is only done manually

My ISAM


InnoDB
Required full text Search Yes Yes

Require Transactions Yes

frequent select queries Yes

frequent insert,update,delete Yes

Row Locking (multi processing on single table) Yes

Relational base design Yes


My ISAM VS innoDB
Types of indexes
So what is it?!
Indexes in MYSQL and like indexes in a book, you search in a compact set of data the subject the you want, and all the pages that relate to that index are directly connected to it
For example:
consider the table "
Actors
" with an index on actor_name. a search query will need to find the key "
Leo
" once and all the rows will have
direct access
from there
B-TREE
The default index
Uses a B-TREE structure
Nodes are keys and each leaf is the same distance from the root
Does not have to scan the whole table to find the data
Useful only if the look up uses the leftmost prefix of the index key(
last_name
, first_name) or the entire index
Hash
Built on Hash tables
Can build your own hash index (if engine doesn't support)
Useful only for exact lookups that use every column in the key
Full text
Find words in text instead of comparing values to the value of the index
This index has many types of subtleties and is mainly for
MATCH-AGAINST queries
Indexing: Advantages and Disadvantages
Advantages
Disadvantages
Indexes make search queries much faster
Reduces the amount of data the server needs to examine
Indexes like primary key index and unique index help to avoid duplicate row data
Helps the server avoid sorting and temporary tables
Indexes turn random I/O to Sequential I/O.
A Dictionary table with index by name will sort the values on the I/O
Text searching: Full-text indexes have the opportunity to optimize searching against even large amounts of text
Creating an index is followed by creating a sorted table containing the index fields
Takes up disk space and if every column is indexed will grow faster than then table
slow down the speed of writing queries, such as INSERT, UPDATE and DELETE,
because every time a record is changed, the indexes must be updated
Optimizing specific types of queries
Optimizing count( )
How to count
Counting values:
Counting rows:
count (field_id) will only count values that are NOT null
count (*) will count rows instead of values.
Example of simple optimization
Consider the following:
SELECT COUNT(*) FROM world.City WHERE ID
>
5;
If you examine this query you will see it scans 4,079 rows
Optimized query:
SELECT (SELECT COUNT(*) FROM world.City) - COUNT(*) FROM world.City where ID
<=
5
If you examine the optimized query you will reduce it to 5 rows
MyISAM if very fast for the count(*) because it stores the number of rows in each table
Join optimization
To build a good join query, you need to build the query tree in such a way that it will eliminate as many rows as possible early on due to its DFS nature
Use inner joins before outer joins because MYSQL cannot optimize inner joins and there is relevance to the order of the joins
Make sure there are indexes on the columns in the
ON
or
USING
clause
Try to make
GROUP BY
and
ORDER BY
to refer to columns from a single table so that indexes can be used
Optimizing LIMIT
A common problem:
In pagination you will often see queries that end with LIMIT 10000 , 20. Which means its generating 10020 rows and discarding 10000 of them.
Consider the following query:
SELECT film_id, description FROM sakila.film ORDER BY title LIMIT 50 , 5
A better query will be written as follows:
SELECT film.film_id, film.description
FROM sakila.film INNER JOIN ( SELECT film_id FROM sakila.film ORDER BY title LIMIT 50, 5) AS film USING (film_id)
This will work because the server will examine the data in the index with out accessing the rows
Thank you!
Full transcript