Magento Performance: Know Your Database

by

Ivan Chepurnyi

on 12 February 2016


Transcript of Magento Performance: Know Your Database

M* Performance: Know Your Database
* Applicable to M2, M1, and MySQL
Questions? @IvanChepurnyi, ivan@ecomdev.org
MySQL, like Magento, needs a gentle approach to its features.
It requires the following:
Analyze slow queries:
Can we kill the MySQL server?
Can we execute the query as a non-privileged user?
Can we do it with a single query?
The "Winners"
INSERT INTO ... SELECT ... FROM ...
UPDATE ... JOIN ... JOIN ... SET ...
Why?
MySQL creates a temporary table that may spill into swap
If your swap is small (as on the majority of hosts), it gets used up in a matter of seconds
Your server goes down, and SSH becomes unavailable
1. Ranged Queries
2. Raise Memory / Swap Size
Ranged Queries
1. Find the min/max pair and the total number of primary-key matches
2. Find your perfect batch size (~20k)
3. Execute the same query repeatedly, moving forward by the batch size
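The three steps above can be sketched in Python. This is a minimal sketch: the table and column names follow the deck's flat-indexer example, and the 20k default is the deck's suggested batch size.

```python
def ranged_batches(min_id, max_id, batch_size=20000):
    """Yield [start, end) primary-key ranges covering min_id..max_id."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size, max_id + 1)
        yield start, end
        start = end


def batched_insert_statements(min_id, max_id, batch_size=20000):
    """Build one small INSERT ... SELECT per range instead of a single
    huge query that could exhaust MySQL's memory/swap."""
    template = (
        "INSERT INTO entity_flat (entity_id, scope_id) "
        "SELECT entity_id, 4 AS scope_id FROM entity "
        "WHERE entity_id >= {start} AND entity_id < {end}"
    )
    return [
        template.format(start=s, end=e)
        for s, e in ranged_batches(min_id, max_id, batch_size)
    ]
```

In practice, step 1 (a `SELECT MIN(entity_id), MAX(entity_id) FROM entity` query) supplies `min_id` and `max_id`.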
Solution?
Insert Query
INSERT INTO `entity_flat` (`entity_id`, `scope_id`)
SELECT `main`.`entity_id`, '4' AS `scope_id`
FROM `entity` AS `main`
WHERE (main.entity_id >= 1) AND (main.entity_id < 20001)
Update Query
UPDATE entity_flat AS main
INNER JOIN entity_varchar AS `attribute`
  ON attribute.entity_id = main.entity_id
  AND attribute.attribute_id = 1
  AND attribute.scope_id = 0
SET `main`.`firstname` = `attribute`.`value`
WHERE (main.scope_id = 4)
  AND (main.entity_id >= 1) AND (main.entity_id < 20001)
Why is it Better?
1. It works faster on huge data-sets
2. It consumes less memory on the MySQL side
3. It has a predictable timeframe based on the data-set size
Some boring numbers...
My Data-set
Generator based on the M2 framework base
Database of 1k, 10k, 100k and 1m EAV-like records
Flat indexer based on ranged queries
Use MySQL memory carefully
Know its tricks
Trick #1
Range Query
SELECT MIN(entity_id), MAX(entity_id)
FROM entity
Flat Data Generation
Flat Index Data Purpose
Filter Data
Sort Data
Fast selection with an indexed WHERE condition
Fast ordering by an indexed field
Flat data's main purpose is not plain retrieval
Trick #2
Simplified Queries
1. Split JOINs into separate selects based on the main select's primary key
2. Try to minimize the JOIN conditions within the simplified selects
3. Find an easy override priority for your case
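The override step can be sketched in Python. This is an assumption-based illustration: the deck's data query sorts by `scope_id`, which suggests merging rows so that a scope-specific value (e.g. scope 4) overrides the default (scope 0).

```python
def merge_scoped_values(rows):
    """Merge attribute rows so a higher scope_id overrides a lower one.

    rows: iterable of (entity_id, attribute_id, scope_id, value) tuples,
    as returned by the separate data query. Sorting by scope_id means
    the scope-specific value overwrites the default (scope 0) value,
    mirroring the IF(notes_scope.value_id IS NOT NULL, ...) logic of
    the JOIN-based query, but in application code.
    """
    merged = {}
    for entity_id, attribute_id, scope_id, value in sorted(
            rows, key=lambda r: r[2]):
        merged[(entity_id, attribute_id)] = value
    return merged
```

With this split, MySQL executes two trivially indexed queries instead of one JOIN-heavy one, and the override priority lives in a few lines of application code.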
Main Query
SELECT entity_id
FROM entity
LIMIT 0, 100
Data Query
SELECT entity_id, attribute_id, scope_id
FROM entity_varchar
WHERE entity_id IN (1, 2, 3, ...)
ORDER BY scope_id
Bad Query
SELECT `main`.`entity_id`,
  IF(notes_scope.value_id IS NOT NULL, notes_scope.value, notes_default.value) AS `notes`,
  ...
FROM `entity` AS `main`
LEFT JOIN `entity_text` AS `notes_default` ON ...
LEFT JOIN `entity_text` AS `notes_scope` ON ...
LIMIT 0, 50
Why is it Better?
1. It doesn't create a temporary table
2. It uses an index instead of a table scan
3. It works with only a small subset of data instead of joining tables with millions of records
Join vs Separate Query
Trick #3
Range Limit
1. Separate the LIMIT from the main select query
2. Apply the sort order of the main query to the limit query and remove it from the main one
3. Retrieve data in the main query in the LIMIT query's primary-key order
Limit Query
SELECT entity_id
FROM entity_flat
WHERE is_active = 1
ORDER BY firstname
LIMIT 0, 100
Main Query
SELECT .... FROM `entity` AS `main`
LEFT JOIN ...
WHERE main.entity_id IN (1, 2, 3...)
Why is it Better?
1. It allows access to the last pages of huge data-sets
2. It scans the result-set on only one field, the primary key
Regular vs Range Limit
Trick #4
Memory Table
1. In case of a huge result-set, create a MEMORY table with the primary-key values from the main select
2. Join that memory table into the data select
3. In __destruct(), drop all created memory tables
Create a Table
$table = $this->getConnection()->newTable(
    uniqid('primary_key')
);
$table->addColumn(
    'id',
    Table::TYPE_INTEGER,
    null,
    ['unsigned' => true, 'primary' => true]
);
$table->setOption('type', Mysql::ENGINE_MEMORY);

$this->getConnection()->createTable($table);

Populate Table
$this->getConnection()->insertArray(
    $table->getName(),
    ['id'],
    $ids
);
Join Table
$select->join(
    ['pk' => $table->getName()],
    'main.entity_id = pk.id', // the memory table's only column is `id`
    []
);
When is it Better?
When the IN condition may contain more than 100 items
IN vs JOIN (10k)
IN vs JOIN (100k)
IN vs JOIN (1m)
But the gaps between identifier values should be as small as possible
IN vs JOIN (100k rand)
IN vs JOIN (1m rand)
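The two observations above can be combined into a rough selection heuristic. This is a sketch under stated assumptions: the 100-item cutoff comes from the deck, while the density threshold is a hypothetical value standing in for the "small gaps" caveat that the rand benchmarks illustrate.

```python
def choose_lookup_strategy(ids, max_in_items=100, min_density=0.5):
    """Pick IN(...) vs. memory-table JOIN for a primary-key lookup.

    Small lists always use IN. For larger lists, prefer the memory-table
    JOIN only when the ids are dense (small gaps between values); with
    sparse, random ids the JOIN loses its advantage again.
    The 0.5 density threshold is an illustrative assumption.
    """
    if len(ids) <= max_in_items:
        return "IN"
    density = len(ids) / (max(ids) - min(ids) + 1)
    return "JOIN" if density >= min_density else "IN"
```

A real implementation would tune both thresholds against measurements like the benchmark repository linked below.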
Benchmark Code
https://github.com/EcomDev/mysql-performance-benchmark