Speeding up Datafeedr Bulk Image Processing
I’m in the process of building a massive affiliate store. This affiliate store will have 500,000+ products in it and I’m going to use it to:
- Make money (of course!)
- Demo my Fast WooCommerce Filters (every WooCommerce category/attribute/price filter I’ve seen has awful performance on large stores and is the primary culprit for WooCommerce being slow at anything above 20,000 products) Update: This performance plugin is now available.
- Demo Datafeedr (these guys make it really easy to build affiliate stores)
- Demo my fast theme
- Demo my Fast Ajax Product Search
So far I’ve loaded about 30,000 products (Update: now 820,000 products!). That’s large enough to see performance issues but small enough to not get annoyed while fixing them.
The first issue I ran into was that my shop category pages were slow (they shouldn’t be! Not with my plugins!). I installed the SQL Query Monitor plugin and found that there were hundreds of update_meta calls. Digging a little deeper, I found that Datafeedr is updating the images and downloading them on the fly. They do this to speed up the Datafeedr imports and effectively they’re loading the images just in time.
That’s not good for testing my filter speed, however, so I ran the bulk image import. It started off processing about 1 item every 3 to 6 seconds.
The server for this affiliate store is on Digital Ocean. It has a gigabit connection to the outside world and uses SSD drives for storage, so fetching and saving an image should take a few milliseconds, not seconds. I dug into the code and this is the query I found, at line 130-ish in plugins/datafeedr-product-sets/functions/ajax.php:
$id = $wpdb->get_var( "
SELECT pm1.post_id AS post_id
FROM $wpdb->postmeta AS pm1
JOIN $wpdb->posts AS p ON p.ID = pm1.post_id
LEFT JOIN $wpdb->postmeta AS pm2 ON p.ID = pm2.post_id AND pm2.meta_key = '_thumbnail_id'
LEFT JOIN $wpdb->postmeta AS pm3 ON p.ID = pm3.post_id
WHERE pm1.meta_key = '_dfrps_product_check_image'
AND pm1.meta_value = '1'
AND pm2.post_id IS NULL
AND pm3.meta_key = '_dfrps_product_set_id'
AND p.post_status = 'publish'
ORDER BY post_id ASC
" );
That query looks for published products that are flagged for an image check but don’t yet have a thumbnail saved against them.
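Digging further, I found that the _dfrps_product_check_image value is actually set at the start of the job to process imports. That means the flag on its own tells me which products still need processing, so I can drop the extra joins and the sort and change the query to this: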
$id = $wpdb->get_var( "
SELECT pm1.post_id
FROM $wpdb->postmeta AS pm1
WHERE pm1.meta_key = '_dfrps_product_check_image'
AND pm1.meta_value = '1'
AND EXISTS (SELECT * FROM $wpdb->posts p WHERE p.ID = pm1.post_id AND p.post_status = 'publish')
LIMIT 1;
" );
You can see the results and the bug this caused in this video:
The reason for the bug, I found, is that some rows fail to process for some reason, yet their _dfrps_product_check_image value never gets changed to 0. Because the new query relies on that flag alone, those rows get picked up and reprocessed over and over.
So, I added this line to plugins/datafeedr-product-sets/classes/class-dfrps-image-importer:
update_post_meta( $this->post->ID, '_dfrps_product_check_image', 0 ); // this needs to go here so we don't reprocess the same post over and over
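To see why that one line matters, here is a minimal sketch of the "flag as a work queue" pattern. It is a standalone model, not Datafeedr's code: it uses Python with an in-memory SQLite database instead of WordPress and MySQL, the table layout is cut down to the columns the query touches, and `run_importer`, `failing_ids` and `always_clear_flag` are hypothetical names made up for the illustration.

```python
import sqlite3

def make_db():
    """Build a tiny stand-in for wp_posts / wp_postmeta with three flagged products."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_status TEXT);
        CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
    """)
    for post_id in (1, 2, 3):
        db.execute("INSERT INTO posts VALUES (?, 'publish')", (post_id,))
        db.execute(
            "INSERT INTO postmeta VALUES (?, '_dfrps_product_check_image', '1')",
            (post_id,),
        )
    return db

# Same shape as the simplified query: the flag alone decides what is picked next.
NEXT_POST = """
    SELECT pm1.post_id
    FROM postmeta AS pm1
    WHERE pm1.meta_key = '_dfrps_product_check_image'
      AND pm1.meta_value = '1'
      AND EXISTS (SELECT * FROM posts p
                  WHERE p.ID = pm1.post_id AND p.post_status = 'publish')
    LIMIT 1
"""

CLEAR_FLAG = """
    UPDATE postmeta SET meta_value = '0'
    WHERE post_id = ? AND meta_key = '_dfrps_product_check_image'
"""

def run_importer(db, failing_ids, always_clear_flag, max_passes=10):
    """Return the post IDs picked on each pass (hypothetical importer loop)."""
    picked = []
    for _ in range(max_passes):
        row = db.execute(NEXT_POST).fetchone()
        if row is None:
            break  # nothing left to process
        post_id = row[0]
        picked.append(post_id)
        succeeded = post_id not in failing_ids
        # Without the fix, only successful imports clear the flag.
        if succeeded or always_clear_flag:
            db.execute(CLEAR_FLAG, (post_id,))
    return picked

stuck = run_importer(make_db(), failing_ids={2}, always_clear_flag=False)
fixed = run_importer(make_db(), failing_ids={2}, always_clear_flag=True)
print("without the fix:", stuck)
print("with the fix:   ", fixed)
```

In the first run the failing product keeps being selected until the pass limit is hit. In the second run every product is visited once and the loop ends on its own. The trade-off is that a product whose image failed is not retried until the flag is set again, which per the post happens at the start of the import job.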
Now, the Bulk Image Importer processes at least a dozen products per second as opposed to about a dozen per minute. That’s a 60-fold performance increase.
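As a rough sanity check on what that means for a full catalogue, the sketch below turns the two rates quoted above into processing times for the 30,000 and 820,000 product counts mentioned earlier. The rates are the approximate figures from this post, not fresh measurements, and `hours_to_process` is a helper name invented for the example.

```python
def hours_to_process(products, per_minute):
    """Hours needed to process `products` items at a steady rate."""
    return products / per_minute / 60

BEFORE_PER_MINUTE = 12        # "about a dozen per minute"
AFTER_PER_MINUTE = 12 * 60    # "at least a dozen products per second"

speedup = AFTER_PER_MINUTE / BEFORE_PER_MINUTE
print(f"speedup: {speedup:.0f}x")

for products in (30_000, 820_000):
    before = hours_to_process(products, BEFORE_PER_MINUTE)
    after = hours_to_process(products, AFTER_PER_MINUTE)
    print(f"{products:>7} products: {before:8.1f} h before, {after:6.1f} h after")
```

Since the new rate is a lower bound ("at least a dozen per second"), the "after" times are upper estimates.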
I’ve sent these edits over to Eric @ Datafeedr so they will be added to the next release of the plugin. In the meantime, if you want the speed boost right away, go edit these two plugin files yourself and see the difference on your own site.
Here is the speed boost when I implemented this final solution:
Note: I also added these indexes to wp_posts and wp_postmeta (you can try with or without them – I prefer with, but the query is faster anyway and you might not like to edit the indexes on your underlying WordPress tables):
create index awdspeed1 on wp_posts(post_status, id);
create index awdspeed2 on wp_postmeta(meta_key,meta_value(1),post_id);
create index awdspeed3 on wp_postmeta(meta_key,post_id);
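If you want a feel for what an index like awdspeed2 does for the flag lookup, here is a small sketch using SQLite's EXPLAIN QUERY PLAN. Treat it as an analogy only: SQLite is not MySQL, it has no prefix-length indexes, so `meta_value(1)` is modelled here as the whole column, and MySQL's optimizer may well make different choices. On a real site you would run MySQL's own EXPLAIN against the actual query instead.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Cut-down stand-in for wp_postmeta (assumption: only the columns the query needs).
db.execute(
    "CREATE TABLE postmeta ("
    "meta_id INTEGER PRIMARY KEY, post_id INTEGER, meta_key TEXT, meta_value TEXT)"
)

QUERY = """
    SELECT post_id FROM postmeta
    WHERE meta_key = '_dfrps_product_check_image' AND meta_value = '1'
    LIMIT 1
"""

def plan(conn):
    """Return SQLite's query plan for QUERY as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the plan detail

plan_before = plan(db)
# SQLite analogue of awdspeed2, with the full meta_value column instead of a prefix.
db.execute("CREATE INDEX awdspeed2 ON postmeta (meta_key, meta_value, post_id)")
plan_after = plan(db)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan has to walk the table; afterwards it names awdspeed2, and because post_id is the last column of the index the lookup can be answered from the index alone.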
April 23, 2016 @ 7:55 pm
Hi Dave,
Thanks for all the great info on this blog.
Were you able to make Datafeedr work with Variable Products? When we tried it would only do simple.
Thanks,
N
May 27, 2016 @ 1:55 pm
It currently only does simple – I have a plugin in plans, partly done, called ‘auto attributes’ which will dedup the multiple products datafeedr imports and change them into different options of a variable product. I don’t know when it’ll be ready though as I have a lot of other stuff to catch up on first.
June 18, 2016 @ 7:35 am
Any expectations as to when this will be ready?
July 4, 2016 @ 4:13 pm
Yes – it’s ready – I’m using it plus one client is using it. The reason I haven’t released it to the public yet is because I’d rather Eric put it into the core code for Datafeedr. I know they have switched over to focusing on the API rather than the factory so this should be soon. If you want a copy of what I’m using, let me know.
July 29, 2016 @ 4:58 pm
FYI – a couple of the plugins have been released – the Fast Filters plugin got renamed to the WPI Performance Plugin and I’ve created a new site for this new bundle of plugins – any existing customers will find they already have an account on there – http://www.wpintense.com – if you already have an Affiliate Web Designers account, you can claim your WPI account by visiting http://www.wpintense.com/my-account/ and hit ‘forgotten password’ and you’ll get a link to generate a new password.