Set up large imports with Drupal Feeds

Published on 2013-08-13
When I import from a feed using a form (/import/node_importer), I get a nice little progress bar and can import roughly 4k rows in about a minute and a half. How can I get the same performance using cron/periodic imports/process in background? I currently have it configured to run every 15 minutes, but it seems to make only about 1% progress each time it runs; in other words, about 4% every hour. Is there a way to make it import the whole lot (100%) every 15 minutes? Am I missing something?

Cron processes feeds in chunks. The default chunk size of 50 is extremely small. Just add this line to your settings.php file to change the chunk size.

$conf['feeds_process_limit'] = 2000;

I believe the only limit is that the number must be small enough not to cause a PHP script timeout. 2000 works fine for me.
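
To see where the "about 1% per run" in the question comes from, here is a rough back-of-the-envelope sketch in shell. It assumes the figures from the question (about 4000 rows, cron every 15 minutes) and that each cron run processes exactly one chunk of feeds_process_limit items, which is a simplification. The function names runs_needed and minutes_needed are made up for illustration.

```shell
#!/bin/sh
# Rough estimate only; assumes one chunk of feeds_process_limit items per cron run.

# runs_needed ROWS LIMIT: cron runs needed to import ROWS items
# at LIMIT items per run (integer ceiling division).
runs_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# minutes_needed ROWS LIMIT INTERVAL: total minutes when cron
# runs once every INTERVAL minutes.
minutes_needed() {
  echo $(( $(runs_needed "$1" "$2") * $3 ))
}

runs_needed 4000 50          # default limit of 50
minutes_needed 4000 50 15    # same, expressed in minutes
runs_needed 4000 2000        # limit raised to 2000
```

Under these assumptions the default limit needs 80 runs (1200 minutes, or 20 hours) for 4000 rows, which matches the slow progress described in the question. With a limit of 2000 the same feed would still need two runs, so a limit at or above the number of rows is what finishes the import in a single run.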

 

The default chunk size is set low because a high value can really kill your website's performance.

The default php.ini settings (used on the front end) give you 30 seconds to finish a PHP script (max_execution_time). On high-performance sites this setting may be even lower.

If you set feeds_process_limit high, then the import process can easily take more than 30 seconds. At 30 seconds you get a timeout error.

Feeds has multiple ways to run the import process:

  1. Front end (with the progress bar)
  2. Cron
  3. Some sandbox modules provide a third option: Drush (https://drupal.org/sandbox/enzo/1865202).

 

In most setups cron runs with different PHP settings (via php-cli) than the normal front-end php.ini. This is handy: on the front end you want speed (many concurrent users, many short-lived PHP processes), and on the back end you want power (one PHP process that imports all your nodes).
Drush also uses these php-cli.ini settings.

The default max_execution_time for php-cli is 0, which means there is no time limit and the script can run as long as it needs. That should be enough for 2000 nodes.
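
A quick way to check this on your own server is to ask the command-line PHP binary directly. This is a minimal sketch; the helper name cli_time_limit is made up for illustration, and it assumes the php found on your PATH is the CLI binary that cron or Drush would use.

```shell
# Print the execution time limit seen by command-line PHP
# (0 means no limit). Prints a notice if no php binary is on the PATH.
cli_time_limit() {
  if command -v php >/dev/null 2>&1; then
    php -r 'echo ini_get("max_execution_time"), PHP_EOL;'
  else
    echo "php not found"
  fi
}

cli_time_limit
```

Running php --ini in the same shell lists which configuration files that binary loads, which helps confirm it is not reading the front-end php.ini.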

So if you want to import all your nodes every 15 minutes on a live site, I recommend importing via php-cli. That way you can set feeds_process_limit high without hurting your front-end performance.

How to do it:

  • In your feed settings, choose 'import every 15 minutes'.
  • Run your cron every 15 minutes.
    • Be sure to run cron via php-cli (ask your hoster). You could use Drush to run cron, or use Drush to run feeds_import (via the sandbox module above).
  • Set your feeds_process_limit high.
    • Be sure 15 minutes is enough to import all nodes. If not, import less often: raise the feed's 'import every xx' interval *and* lower your cron frequency to match, until all nodes are imported within one run.

 

Drupal variables like "feeds_process_limit" can be changed in several places. The easiest ways are:

  • settings.php --> $conf['feeds_process_limit'] = 2000;
  • drush --> drush vset feeds_process_limit 2000

 

Via https://drupal.org/node/1551246