Debugging an infinite loop in Magento 1

The collectTotals() vortex of doom

Tony Brown
tonegolf71

--

Recently, after a release we started to see a problem on a Magento 1 site in New Relic. There were sporadic slow transaction traces which were running for up to 30 seconds being reported.

This turned out to be one of those issues that we love to hate, an issue that is hard to replicate.

What followed was a journey to uncover the cause of these slow transactions and ultimately resolve it.

Where to start?

How do you debug an issue that you cannot replicate? Well, unfortunately there is no simple answer to that. But ultimately you need to get to the point of being able to replicate the issue in order to be confident with any fix.

In this case we have New Relic to come to the rescue, not only is New Relic bringing to our attention that there are slow transactions occurring, it will also record a trace of the transaction to aid in debugging. This is a great feature that we have used to identify the source of problems in the past.

For this issue, the slow traces were being recorded for various different pages, the first we looked at was for the cms/index/index controller (all other related traces were very similar).

What you can see above is the summary for the transaction trace, this helps to give an overview of the most time intensive elements of the trace, it also includes counts of the number of times certain aspects have been called.

Upon investigating the traces for this issue, we found that there were a significant number (in the hundreds) of calls to the _assignProducts method of the Mage_Sales_Model_Resource_Quote_Item_Collection class.

The presence of this class indicated that there must be a quote present (cart/basket) for this request, maybe this issue is effecting everyone who has a quote? Upon testing that… No joy.

Looking at the number of times that _assignProducts is being called (383 times in this case), maybe it is just people who have a large amount of items in their quote? Upon testing that… No joy.

Ok, maybe the Trace Details can save the day? This section provides a breakdown of the transaction along with the time each part has taken.

Unfortunately, in this case the Trace Details breakdown was not providing helpful information. You could only see that it was executing a significant amount of calls to the _assignProducts method.

At this point it made sense to look at the code in question to try and understand how this could happen. Using Xdebug, we were able to step through the code, but nothing jumped out as to what could cause this. Even with many items in the quote, the method was only called once!?

How could we capture more information about the state of the application at the point the issue is occurring?

Logging to the rescue!

We knew that under certain circumstances, the _assignProducts method could get called a significant number of times, so if we added some logic to try and capture when this occurred, we could then understand more about those circumstances and hopefully replicate the issue.

We could achieve this by maintaining a count of calls to the method and then when the count reached a certain threshold, we could log out some information which would aid in our investigation.

So we added the following logic to the core code (temporarily I might add) that would use the Magento Registry to keep a count of the _assignProducts calls, when it reached 10, we would log out the users quote ID and also a backtrace of the last 50 lines of execution to better understand the path that Magento had taken.

$debugCounter = Mage::registry("quote_debug");if (! $debugCounter) {
Mage::register("quote_debug", 1);
} else {
$debugCounter++;
Mage::unregister("quote_debug");
Mage::register("quote_debug", $debugCounter);
}
if ($debugCounter == 10) {
Mage::log($this->_quote->getEntityId(), null, "quoteDebug.log", true);
Mage::log(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 50), null, "quoteDebug.log", true);
}

Once we had this code live, we had some information logged within a few minutes.

Firstly we looked at the trace and it was clear to see we were caught in a loop between observers, but why was this only happening to certain people?

The clue was seeing the collectTotals method being called on Mage_Sales_Model_Quote in the loop. I’m not going to go in to great detail about this method, you can read more in to that via Classy Llama’s blog. However, we have seen an issue before around this method that has caused an infinite loop. Due to our logging, we now have the quote ID of the person effected, when we looked at that quote entry in the database, we could see that it had the field trigger_recollect set to “1”.

On our local environment, we set up our own quote with the same quote items and switched trigger_recollect to “1” and BOOM, issue replicated!

Now for the easier bit… we needed to solve the issue.

We could now use XDebug to step through the code whilst the problem was occurring and we could see the issue very plainly.

In summary, having trigger_recollect set to “1” on a quote will cause the collectTotals method to be called on the next request where a quote is displayed. On this site there was an Amasty Promo module installed which was observing the sales_quote_collect_totals_before event, as part of this code it was removing items from the quote and re-adding them. This site also had a custom module that was observing the sales_quote_remove_item event, this code was loading the quote via a model load which was then triggering a collectTotals call and so the vortex of doom continued!

There were 2 things that were fixed off the back of this, firstly the Amasty module was upgraded, this consequently no longer took the approach to remove and re-add items as part of its logic. But we also changed the custom code to no longer load the whole quote, this was because it was an unnecessary overhead as the data required was already loaded and available via the observer Varien_Event_Observer object that was passed in.

Thankfully, we managed to navigate our way through this particular problem, but issues that occur that cannot be replicated can be a nightmare!

It is important to try and adopt different strategies and techniques to help focus in on such issues and ultimately replicate them. If you get stuck with such a problem, try to collaborate with someone to get a fresh way of thinking about the situation.

--

--

Tech consultant and developer. Loves eCommerce, Shopify and Magento. Speaker, Mage Titans organiser, former Tech director at Space 48. Magento Master 2017.