When working on a project I want my local environment to closely match the production systems, which means I prefer to download the production databases regularly. But sometimes they are huge and filled with data I don't need locally, like sessions and caches. Time to tackle this.
In this project the upstream database is multiple GBs, and I can only access it through a slow connection, so downloading the whole thing takes hours!
My goal: sync databases from upstream to my local environment, without irrelevant and huge tables.
Step 1: Show me the big tables
Run this SQL query against information_schema to list the tables by size:

SELECT
    TABLE_NAME AS `Table`,
    ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = "database_name"
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
The result leaves no doubt: there are just a few tables I have to consider.
Step 2: Create a script that downloads the full database, without specific tables
I use Sequel Pro on macOS. In the SQL export interface I can choose, per table, whether to:
- Include Schema (column S)
- Include Content (column C)
- Include DROP-TABLE (column D)
What is the right approach here? I select all tables, but uncheck 'Content' in the export interface for every table that is too big to download or simply irrelevant. In this case that means 'watchdog' and all cache tables.
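For anyone without Sequel Pro, the same export can be sketched on the command line with mysqldump. This is a dry run: it prints the commands instead of executing them, since the database name (`database_name`) and the list of excluded tables below are placeholders for this project's real values.

```shell
#!/bin/sh
# Dry-run sketch of the export: schema for all tables, data only for
# the tables we actually want. DB name and exclude list are placeholders.
DB=database_name
EXCLUDE="watchdog cache cache_bootstrap cache_menu"

# Build one --ignore-table flag per excluded table.
IGNORE=""
for t in $EXCLUDE; do
  IGNORE="$IGNORE --ignore-table=$DB.$t"
done

# Step 1: schema for every table, so the excluded ones exist locally (empty).
echo "mysqldump --no-data $DB > dump.sql"
# Step 2: data for everything except the excluded tables.
echo "mysqldump --no-create-info$IGNORE $DB >> dump.sql"
# Step 3: compress for the slow connection.
echo "gzip dump.sql"
```

Running the two mysqldump commands in that order gives one file with every table's CREATE statement but data only for the small tables, which matches what the Sequel Pro checkboxes produce.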
Save the export settings so you can re-use them
Sequel Pro supports this: at the top right of the export dialog there is a 'Save' option.
Step 3: Import into my local database
Yes, I am a happy man now. The export went well. It took just a few minutes (it goes through a bunch of SSH tunnels).
The export file (.sql.gz) was just 1.6 MB instead of multiple GB.
The import was extremely fast (seconds). It also created the cache tables and watchdog table, but left them empty, which is perfectly fine.
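The import itself boils down to one streaming command; assuming the export was saved as dump.sql.gz and the local database is also called database_name (both placeholders), something like:

```shell
# Stream the decompressed SQL straight into the local server, without
# unpacking the file to disk first (credentials are placeholders):
#   gunzip -c dump.sql.gz | mysql -u root -p database_name
# The streaming part can be checked locally with a tiny stand-in dump:
printf 'CREATE TABLE watchdog (wid INT);\n' | gzip > dump.sql.gz
gunzip -c dump.sql.gz
```

Piping the archive keeps the workflow to a single step and avoids ever storing the uncompressed multi-table dump locally.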