
I have this table:

id, heartbeat_at, data_json, ...

The column heartbeat_at is frequently updated with this query:

UPDATE table SET heartbeat_at = NOW() WHERE id IN (...)

The data_json column is updated much less frequently, but it holds a lot of data (5-50 KB per row).

I am wondering if this combination of a small, frequently updated column with a large, rarely updated column is responsible for the high CPU usage of writes.

Does MySQL need to rewrite the entire row, even if I only update heartbeat_at? If so, would it be better if data_json lives in a separate table?

  • Please edit your question to show (as text, not image) output of show create table yourtablename Commented Jun 29 at 15:08
  • There are many factors that determine the CPU usage, like the exact query, the available indexes, the storage engine and file system used, and many many more. The best thing is to see what happens when you put data_json in a separate table. Commented Jun 29 at 15:49

2 Answers


Assuming MySQL 8.0 or later with InnoDB, the currently supported default storage engine.

Creating a new record version does not necessarily create a new copy of a LOB.

Refer to this worklog for MySQL 8.0: https://dev.mysql.com/worklog/task/?id=8960

After this worklog, multiple versions of a clustered index record can point to the same LOB.

I experimented to see if updating a record with a large JSON LOB used more space.

I created a table with a 1MB JSON LOB.

CREATE TABLE mytable (
  id SERIAL,
  counter INT DEFAULT 0,
  data JSON
);

INSERT INTO mytable (data) VALUES (CONCAT('"', REPEAT('A', 1024*1024), '"'));

I used Jeremy Cole's innodb_ruby tool to inspect the tablespace for a test table.

innodb_space -f mytable.ibd space-page-type-regions
start       end         count       type                
0           0           1           FSP_HDR             
1           1           1           IBUF_BITMAP         
2           2           1           INODE               
3           3           1           SDI                 
4           4           1           INDEX               
5           5           1           LOB_FIRST           
6           15          10          LOB_DATA            
16          16          1           LOB_INDEX           
17          36          20          LOB_DATA            
37          63          27          FREE (ALLOCATED)    
64          97          34          LOB_DATA            
98          127         30          FREE (ALLOCATED)    

I updated the counter column in mytable repeatedly, but the number of pages allocated for LOB_DATA did not change.

mysql> update mytable set counter = counter + 1;
...repeat many times...
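The repeated non-LOB update can be scripted instead of typed by hand, for example with a stored procedure (a sketch; the procedure name `bump_counter` is illustrative, not part of the original experiment):

```sql
-- Sketch: run the counter update n times in a loop.
DELIMITER //
CREATE PROCEDURE bump_counter(IN n INT)
BEGIN
  WHILE n > 0 DO
    UPDATE mytable SET counter = counter + 1;
    SET n = n - 1;
  END WHILE;
END //
DELIMITER ;

CALL bump_counter(100);
```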

But as soon as I changed the JSON LOB:

mysql> update mytable set data = concat('"', repeat('B', 1024*1024), '"');

Then new space was allocated for LOB data, and the original LOB data pages were now free.

innodb_space -f mytable.ibd space-page-type-regions
start       end         count       type                
0           0           1           FSP_HDR             
1           1           1           IBUF_BITMAP         
2           2           1           INODE               
3           3           1           SDI                 
4           4           1           INDEX               
5           5           1           FREE (LOB_FIRST)    
6           15          10          FREE (LOB_DATA)     
16          16          1           FREE (LOB_INDEX)    
17          36          20          FREE (LOB_DATA)     
37          63          27          FREE (ALLOCATED)    
64          97          34          FREE (LOB_DATA)     
98          98          1           LOB_FIRST           
99          108         10          LOB_DATA            
109         109         1           LOB_INDEX           
110         163         54          LOB_DATA            
164         383         220         FREE (ALLOCATED)    

This shows that LOB pages are not copied if you update some other column in the row, but they are copied (and old row versions are delete-marked) when you update the LOB.
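If innodb_ruby is not at hand, a rough proxy for the same observation is to watch the tablespace's on-disk allocation from information_schema before and after the updates (a sketch for MySQL 8.0; `test/mytable` assumes the table lives in a schema named `test`):

```sql
-- Sketch: FILE_SIZE and ALLOCATED_SIZE are in bytes.
-- Growth after a LOB update, but not after counter updates,
-- is consistent with LOB pages being copied only when the LOB changes.
SELECT NAME, FILE_SIZE, ALLOCATED_SIZE
FROM information_schema.INNODB_TABLESPACES
WHERE NAME = 'test/mytable';
```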


1 Comment

Since this feature was introduced by a MySQL 8.0 worklog, I infer that MySQL 5.x did require a whole-row copy (including the LOB) when you do an update.

Based on my analysis of the MySQL codebase, I can answer your question about row update behavior in MySQL 8.0.41.

Does MySQL rewrite the entire row when updating only heartbeat_at?

No, MySQL does not necessarily rewrite the entire row. InnoDB can perform an in-place update when the sizes of the modified fields do not change (see the update code in storage/innobase/row/row0upd.cc).

Since heartbeat_at is a DATETIME column, it has a fixed storage size regardless of the timestamp value. When you update it with NOW(), the field size stays constant, which lets InnoDB take its optimistic update path and modify the field in place without rewriting the entire row.

Impact of large JSON columns

However, your large data_json column (5-50 KB) can still affect update performance even when it is not being updated itself. JSON values above a size threshold are stored off-page using InnoDB's external storage mechanism.

The presence of externally stored columns can complicate the update process because:

  1. InnoDB needs to check external field references during updates

  2. Row structure becomes more complex with external pointers

  3. Additional overhead exists for managing external storage pages

Recommendation: Table splitting

Yes, separating the data_json column into a separate table would likely improve performance for your frequent heartbeat updates. This is because:

  1. The heartbeat table would have a simpler row structure with only fixed-size columns

  2. Updates would be faster due to smaller row size and no external storage overhead

  3. Less buffer pool pollution from infrequently accessed large JSON data

  4. Better cache locality for the frequently updated heartbeat data

You could restructure as:

  • Main table: id, heartbeat_at, ... (frequently updated)

  • Data table: id, data_json (rarely updated, joined when needed)

This separation would allow InnoDB's in-place update optimizations to work most efficiently for your heartbeat updates while keeping the large JSON data isolated.
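A minimal sketch of such a split might look like the following (heartbeat_at and data_json come from the question; the table names, key types, and foreign key are illustrative assumptions):

```sql
-- Sketch: hot, fixed-size row for frequent heartbeat updates.
CREATE TABLE heartbeat (
  id BIGINT UNSIGNED PRIMARY KEY,
  heartbeat_at DATETIME NOT NULL
);

-- Sketch: cold, large payload kept out of the hot row.
CREATE TABLE payload (
  id BIGINT UNSIGNED PRIMARY KEY,
  data_json JSON,
  CONSTRAINT fk_payload_heartbeat
    FOREIGN KEY (id) REFERENCES heartbeat (id)
);

-- The frequent update now touches only the small table:
UPDATE heartbeat SET heartbeat_at = NOW() WHERE id IN (1, 2, 3);
```

When the JSON is needed, it can be fetched with a join on id.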

Notes

The update behavior depends on your table's row format (DYNAMIC, COMPRESSED, etc.) and on whether InnoDB can determine that only fixed-size fields are being modified. DATETIME columns are always fixed-size, making them ideal for in-place updates, while large JSON columns add complexity that can affect update performance even when they are not directly modified.

Practical Recommendations

  • Assess Update Patterns: First, use monitoring tools (such as Performance Schema, slow query logs) to observe lock contention and CPU usage during update operations to confirm whether high-frequency updates are causing a performance bottleneck.

  • Review Index Design: The UPDATE filters on id (the primary key), so the target rows are already located efficiently without a full table scan. Avoid adding a secondary index on heartbeat_at unless you actually query by it; such an index would have to be maintained on every heartbeat update, increasing write cost.

  • Consider Splitting Strategy: If testing shows that splitting the table can significantly reduce lock contention and the overhead of row rewriting, consider moving data_json to a separate table. However, be sure to maintain proper data association and consistency.

  • Hardware and Configuration Tuning: Beyond SQL logic adjustments, also evaluate server resources such as CPU, memory, and disk I/O to ensure sufficient system resources are available and to avoid abnormal CPU usage due to hardware bottlenecks.
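To confirm whether the heartbeat updates dominate, the Performance Schema's statement digests can be inspected (a sketch; requires the Performance Schema to be enabled, and the LIMIT and time conversion are illustrative):

```sql
-- Sketch: top UPDATE digests by total time.
-- SUM_TIMER_WAIT is in picoseconds; divide by 1e12 for seconds.
SELECT DIGEST_TEXT,
       COUNT_STAR AS executions,
       SUM_TIMER_WAIT / 1e12 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE 'UPDATE%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5;
```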
