Handling High-Volume Attribution with Magento 2
Extending Magento 2 capabilities to support a large number of total attributes (more than 10,000) and a large number of attributes per attribute set (more than 1,000 per set)
Online marketplaces accounted for 67% of global e-commerce in 2021. $3.23 trillion was spent globally on the top 100 online marketplaces. Walmart’s marketplace has over 100,000 sellers, while Amazon sells close to 350 million products on its marketplace. Traditional eCommerce platforms must scale up very fast to process the tremendous volumes of data they now need to handle.
As more and more sellers are onboarded onto the marketplace, the assortment of products being offered increases. This causes unique challenges in categorizing these products without impacting the overall performance and ease of use. One such challenge the McFadyen Digital team faced during a recent marketplace implementation was handling the numerous product attributes that came with seller products.
We implemented a B2B marketplace on Magento and Webkul for one of our clients in the US selling industrial fastening products. As more sellers were onboarded, the product assortment increased, and so did the number of unique product attributes each seller brought. The number of attributes exceeded what Magento recommends, and we faced significant performance issues. Our technical experts carried out a detailed analysis of the problem and put some workarounds in place to handle the situation. We will discuss a couple of these in this blog.
Magento recommends dividing a large number of attributes across multiple attribute sets. However, if the number of attribute sets also becomes large, the performance of product and attribute imports suffers. This happens because Magento preloads all attributes and attribute sets into memory. For example, if there are 1,000 attribute sets and we import only a subset of products, Magento still loads every attribute and attribute set available, which increases memory consumption and, in turn, the import time. When products are imported, the init method of AbstractType is called, which loads all the attributes available in the product database. First all attributes are loaded, then the attribute sets, and these are then arranged by attribute ID.
The easy workaround for this problem is to load only the attribute sets and products required by the default CSV file instead of attempting a full import. For example, if we are importing a single category of products, we load only the attributes from that category into memory and not the rest of the attributes and attribute sets.
For this, we override the init method of the Simple import type so that it loads only what the import file requires.
This helped us improve the performance remarkably and the time taken for the import was reduced by almost 30%.
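The idea behind the import workaround can be sketched outside Magento. The snippet below is a minimal, language-agnostic illustration in Python, not the actual PHP override: the `ALL_ATTRIBUTE_SETS` dictionary, the function names, and the sample data are all hypothetical, and the `attribute_set_code` column name is assumed to match the import file.

```python
import csv
import io

# Hypothetical in-memory stand-in for the attribute tables:
# attribute set name -> attribute codes that belong to that set.
ALL_ATTRIBUTE_SETS = {
    "Bolts": ["thread_size", "length", "head_type"],
    "Nuts": ["thread_size", "finish"],
    "Washers": ["inner_diameter", "outer_diameter"],
}


def attribute_sets_in_csv(csv_text, column="attribute_set_code"):
    """Return the attribute set names referenced by the import rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row.get(column)}


def load_required_attributes(csv_text, all_sets=ALL_ATTRIBUTE_SETS):
    """Keep only the attribute sets the import file actually needs."""
    needed = attribute_sets_in_csv(csv_text)
    return {name: attrs for name, attrs in all_sets.items() if name in needed}


# Assumed sample import file: only two of the three sets are referenced.
SAMPLE_CSV = """sku,attribute_set_code,name
B-100,Bolts,Hex bolt M8
B-101,Bolts,Hex bolt M10
N-200,Nuts,Lock nut M8
"""

loaded = load_required_attributes(SAMPLE_CSV)
```

With this sample file, only the Bolts and Nuts sets end up in `loaded`; the Washers set is never brought into memory.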
When a category page is visited, Magento passes all attribute data to Elasticsearch to retrieve category and aggregation details. A large number of attributes leads to high memory consumption, which degrades performance.
The FilterableAttributeList class loads all the attributes and attribute sets in a category, even if only a subset of those attributes is needed to filter that specific category.
When we query Elasticsearch, the request must be built with two parts, much as we would build a query in MySQL:
- Match Parameter
- Aggregation, which brings all data that we require
Magento uses aggregations for layered navigation and returns all the data, so the number of aggregations grows as the amount of attribute data grows. This puts a strain on Elasticsearch when a single query tries to fetch more and more aggregation data.
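To make the two parts of the request concrete, here is a minimal Python sketch of an Elasticsearch-style request body. It is an assumption-laden illustration rather than the request Magento actually generates: the function name, the `category_ids` field, the `_bucket` naming, and the attribute codes are all hypothetical.

```python
def build_search_request(category_id, filterable_attributes):
    """Build a request body with a match part and an aggregation part."""
    return {
        # Match part: selects the products that belong to the category.
        "query": {"bool": {"must": [{"term": {"category_ids": category_id}}]}},
        # Aggregation part: one terms aggregation per filterable attribute.
        "aggs": {
            f"{code}_bucket": {"terms": {"field": code}}
            for code in filterable_attributes
        },
    }


# A category with three filterable attributes yields three aggregations.
small = build_search_request(12, ["color", "thread_size", "finish"])

# With 1,000 filterable attributes the same query carries 1,000 aggregations.
large = build_search_request(12, [f"attr_{i}" for i in range(1000)])
```

The match part stays the same size in both requests; only the aggregation part grows, and it grows linearly with the number of filterable attributes.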
This can also cause problems with indexing in the flat entity tables. Flat entity tables are not built per attribute set; they aggregate all attributes, so they can run into MySQL's maximum column limit.
As of MySQL 5.6.9, an InnoDB table can have at most 1017 columns. A very large number of attributes also slows the admin product edit page to an unusable crawl and severely impacts the rendering of products on the front end.
How Magento works:
Magento loads the attribute metadata and then the value for each attribute. Entities have to hold this information in memory. However, because attribute metadata is stored in the Magento cache, the worst effects are seen on the first page load.
In addition, KnockoutJS, which Magento 2 uses for its UI components, is not designed for high performance.
Since Magento tries to pass all the attributes for a category, the workaround is to override the request builder, narrow the attributes down to those related to the category, and provide a category-to-attribute mapping. Then, whenever a category page loads, only the attributes of that category are loaded.
The create() function must be overridden to build a dynamic request based on the category search page, passing the category ID, brand ID, or seller ID.
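The narrowing step described above can be sketched as follows. This is a minimal Python illustration of the idea, not the overridden Magento request builder: `ATTRIBUTE_MAP`, `ID_FIELDS`, `create_request`, and every field name and ID in it are hypothetical.

```python
# Hypothetical mapping from a page context to the attributes worth filtering on.
ATTRIBUTE_MAP = {
    ("category", 12): ["thread_size", "length"],
    ("brand", 7): ["finish"],
}

# Hypothetical index field used to match each kind of context ID.
ID_FIELDS = {
    "category": "category_ids",
    "brand": "brand_id",
    "seller": "seller_id",
}


def create_request(context_type, context_id, attribute_map=ATTRIBUTE_MAP):
    """Build a request that aggregates only the attributes mapped to this page."""
    if context_type not in ID_FIELDS:
        raise ValueError(f"unsupported context type: {context_type}")
    # Contexts with no mapping get no aggregations instead of all of them.
    attributes = attribute_map.get((context_type, context_id), [])
    return {
        "query": {"term": {ID_FIELDS[context_type]: context_id}},
        "aggs": {
            f"{code}_bucket": {"terms": {"field": code}} for code in attributes
        },
    }


category_request = create_request("category", 12)
seller_request = create_request("seller", 99)
```

Here the category request carries two aggregations regardless of how many attributes exist in the catalog, and a seller page with no mapping carries none. Whether an unmapped page should fall back to an empty list or to a default set of attributes is a design choice this sketch leaves open.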
As per the Magento documentation, the limit for product attributes is 2,000, of which 500 can be filterable. More than 500 filterable attributes, or more than 10k product attributes in total, introduces several performance degradations in the storefront and the Magento admin (including the issue reported here).
To sum up, the following best practices should be kept in mind while handling a large number of product attributes:
- Use different product templates (attribute sets) for different products
- In the Magento admin, there is a field called “Use in Product Listing.” Enable it only for the attributes we plan to show on the category page
- Leverage custom options and complex products for variations management
- Minimize the number of searchable attributes
- Remove unused product properties
- Store and manage non-commerce-related attributes in external PMS systems
While these solutions were introduced as workarounds to the actual problem, in the long term, eCommerce platforms will have to scale up to address these challenges posed by the ever-increasing volumes of product data that come with online marketplaces.
At McFadyen Digital, we’ve been developing online marketplaces for over 15 years and are constantly exposed to available platforms. Our technology services for marketplace operators include architecture review, vendor analysis, implementation, and more. To learn more, visit our marketplace technology solutions. For more information on everything you need to build and manage a customized Commerce store, visit https://devdocs.magento.com/
About the Author
Sharada Rao is a Senior Software Engineer at McFadyen Digital. She is fond of learning and implementing new technologies. An avid traveler and gastronome, Sharada likes to explore new places. She lives with her husband and 11-month-old son in Bengaluru. During her stint with McFadyen Digital, she has been part of some big-ticket eCommerce and marketplace implementation projects.