So when I import from a feed using a form (/import/node_importer), I get a nice little progress bar and I can import roughly 4k rows in about a minute and a half. How can I get the same performance using cron (periodic import / process in background)? I currently have it configured to run every 15 minutes, but it seems to make about 1% progress every time it runs; in other words, about 4% every hour. Is there a way to make it import the whole lot (100%) every 15 minutes? Am I missing something?
Cron processes feeds in chunks. The default chunk size of 50 is extremely small. Just add this line to your settings.php file to change the chunk size.
$conf['feeds_process_limit'] = 2000;
I believe the only limit is that the number must be low enough not to cause a PHP script timeout. 2000 works fine for me.
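To see why the default makes so little progress, here is a rough back-of-the-envelope sketch in shell. It assumes cron handles one chunk per run, which matches the roughly 1% per run reported in the question. estimate_import is a made-up helper name for illustration, not part of Feeds or Drush.

```shell
# Estimate how many cron runs a full import needs.
# Args: total rows, feeds_process_limit, cron interval in minutes.
# Assumption: each cron run processes exactly one chunk.
estimate_import() {
  rows=$1
  limit=$2
  interval=$3
  runs=$(( (rows + limit - 1) / limit ))   # ceiling division
  minutes=$(( runs * interval ))
  echo "$runs runs, $minutes minutes"
}

estimate_import 4000 50 15     # prints: 80 runs, 1200 minutes
estimate_import 4000 2000 15   # prints: 2 runs, 30 minutes
```

Under that assumption, the limit would have to be at least the number of rows for everything to finish in a single 15-minute run.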
The default chunk size is set low because a high value can really kill your website's performance.
The default php.ini settings (used on the frontend) give you 30 seconds to finish a PHP script (max_execution_time). On high-performance sites, this setting could be lower.
If you set your feeds_process_limit high, then the import process can easily take more than 30 seconds. At 30 seconds you get your timeout error.
Feeds has multiple ways to run the import process:
- Frontend (with the progress bar)
- Cron
- Some sandbox-modules provide a third option: Drush (https://drupal.org/sandbox/enzo/1865202).
In most setups cron runs with different PHP settings (via php-cli) than the normal frontend php.ini. This is cool, because on the frontend you want speed (a lot of concurrent users / PHP processes running for a short time) and on the backend you want power (a PHP process that imports all your nodes).
Drush also uses these php-cli.ini settings.
The default max_execution_time for php-cli is 0, which means that it could run forever. That should be enough for 2000 nodes.
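If you are not sure which limit your command-line PHP actually uses, a quick check along these lines may help. It only assumes a php binary on the PATH; cli_time_limit is a hypothetical helper name.

```shell
# Print the max_execution_time seen by the PHP CLI (0 means no limit).
# Falls back to a message when no php binary is on the PATH.
cli_time_limit() {
  if command -v php >/dev/null 2>&1; then
    php -r 'echo ini_get("max_execution_time"), PHP_EOL;'
  else
    echo "php-not-found"
  fi
}

cli_time_limit
```

Running php --ini as well shows which configuration files the CLI loads, which is useful if the value is not what you expect.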
So if you want to import all your nodes every 15 minutes on a live site, I would recommend importing via php-cli. That way you can set your feeds_process_limit high without hurting your frontend performance.
How to do it:
- In your feed-settings, choose 'import every 15 minutes'.
- Run your cron every 15 minutes.
- Be sure to run cron via php-cli (ask your hosting provider). You could make use of Drush to run cron. Or you could use Drush to run feeds_import (via the sandbox module above).
- Set your feeds_process_limit high.
- Be sure 15 minutes is enough to import all nodes. If not, lower both the feed's 'import every xx' frequency *and* your cron frequency (that is, use a longer interval) until all nodes are imported within one interval.
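The steps above could be wired together with a small wrapper script called from the system crontab. This is only a sketch: the Drupal root, the script path in the comment, and the run_feeds_cron name are assumptions for illustration, so adjust them to your own server.

```shell
#!/bin/sh
# Hypothetical wrapper script; all paths are assumptions.
# Example crontab line (assumed install path):
#   */15 * * * * /usr/local/bin/feeds-cron.sh run
DRUSH="${DRUSH:-drush}"
DRUPAL_ROOT="${DRUPAL_ROOT:-/var/www/drupal}"

# Run Drupal cron through Drush, so it uses the php-cli settings.
run_feeds_cron() {
  "$DRUSH" --root="$DRUPAL_ROOT" --quiet cron
}

# Only run cron when the script is called with "run".
if [ "${1:-}" = "run" ]; then
  run_feeds_cron
fi
```

Because this goes through Drush, the import runs with the CLI's time limit rather than the frontend's 30 seconds.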
Drupal variables like "feeds_process_limit" can be changed in multiple places. The easiest ways are:
- settings.php --> $conf['feeds_process_limit'] = 2000;
- drush --> drush vset feeds_process_limit 2000