Struggles with syncing big databases and caches

Home Archive 2020 Struggles with syncing big databases and caches

When working on a project I want my local environment to be very similar to production systems. This means that I prefer to download these databases regularly. But sometimes they are sooo huge and filled with stuff like sessions, caching I don't need. Now I must tackle this.

In this project the upstream database is multiple GBs. And I can only access it through a slow connection so it takes hours to download the thing!

My goal: sync databases from upstream to my local environment, without irrelevant and huge tables.

Step 1: Show me the big tables

Run SQL query for analysis

Thanks to:

  TABLE_NAME AS `Table`,
  ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
  TABLE_SCHEMA = "database_name"


No doubts here: just a few tables I have to consider.

Step 2: Create a script that downloads the full database, without specific tables

So I use SequelPro on MacOs. In the sql-export interface I can choose per table to:

  • Include Schema (column S)
  • Include Content (column C)
  • Include DROP-TABLE (column D)

What is appropriate to do? I am gonna select all tables, but uncheck 'content' in the export-interface.

I am doing this for all tables that are too big for me to download or irrelevant. In this case 'watchdog' and all cache tables.

Save export-settings so you can re-use them again

This is a feature in SequelPro. On the export-dialog right-top there is a 'save' functionality.

Step 3: import on my local database

Yes I am a happy man now. The export went well. I took just a few minutes (goes through a bunch of ssh-tunnels).
The export-file (.sql.gz) was just 1.6 MB in size and not multiple GB. 
The import went extremly fast (seconds). And it also created the cache tables and watchdog tables, but empty (perfectly fine).