Making nopCommerce faster and scalable with improved caching: FamilyWallpapers

FamilyWallpapers is one of the largest online stores for wallpaper and paint in the Nordic region. Majako works in a close partnership with the store owners to provide development and hosting, and FamilyWallpapers has seen incredible growth over the past two years. With a constantly growing product catalogue and five stores serving as many countries, the database now contains:

  • 30000+ published products,
  • 5 languages,
  • 400000+ localised properties,
  • 57000 locale string resources,
  • 3600 published categories, and
  • 350000+ active URL records.

At the time of writing (2023-01-30), monitoring reports an average of 1390 requests per minute over the past month, typically ranging between a few hundred requests per minute during the night and above 3500 during peak hours. Google Analytics shows 368 active users in the last 30 minutes: a healthy but representative traffic level for a Monday afternoon.

In order to provide stable service for that many users without letting hosting costs soar needlessly, an automatically scaling hosting environment is a necessity. Somewhat complicating the matter, however, the store admins perform daily product imports, price updates and other administrative work, which requires all instances to stay in sync so as to display the same information to all customers. Unfortunately, the preexisting implementation of distributed caching in nopCommerce was, to put it mildly, very slow for the volumes of data involved. The store owners, meanwhile, rightfully priding themselves on running one of the fastest shopping sites in Sweden, were not willing to compromise on response times.

The solution we ended up developing for FamilyWallpapers has, along with a number of other performance optimisations, recently been submitted by us to the core nopCommerce project and is slated to be included in the 4.70 release later this year. We are proud to say that FamilyWallpapers has been running as a distributed, scalable nopCommerce app for over two months with no loss in performance, and it is our hope that our contribution will enable many more nopCommerce stores to continue growing and truly move onto the cloud with all its benefits.

Summary

Our changes to the caching strategy can broadly be summarised in three points:

  • the new distributed cache option RedisSynchronizedMemory, a regular fast memory cache that uses Redis events to keep instances in sync,
  • lazily acquiring data on cache misses to avoid duplicating work, and
  • storing large collections in specialised data structures for faster retrieval.
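The first point, the Redis-synchronised memory cache, can be sketched as follows. This is a minimal illustration, not the actual nopCommerce implementation (which is written in C#): the `FakeChannel` class is a hypothetical in-process stand-in for a Redis pub/sub channel, and all names are our own. The key idea is that each instance keeps its own fast in-memory cache and only broadcasts cache removals over the channel; cached data itself never travels through Redis.

```python
import threading
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a Redis pub/sub channel. In production, each
# application instance would subscribe to a shared invalidation channel
# via a Redis client instead.
class FakeChannel:
    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, key: str) -> None:
        for handler in self._subscribers:
            handler(key)

class SynchronizedMemoryCache:
    """A local in-memory cache whose removals are broadcast to peers."""

    def __init__(self, channel: FakeChannel) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()
        self._channel = channel
        channel.subscribe(self._on_invalidate)

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value

    def remove(self, key: str) -> None:
        with self._lock:
            self._data.pop(key, None)
        # Tell every other instance to drop its local copy too.
        self._channel.publish(key)

    def _on_invalidate(self, key: str) -> None:
        with self._lock:
            self._data.pop(key, None)
```

Since only invalidation messages cross the network, reads stay as fast as a plain memory cache; the data itself is rebuilt locally from the database on the next miss.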

For FamilyWallpapers, these combined improvements yielded average reductions of

  • 98 % in startup time,
  • 71 % in maximum memory usage, and
  • 96 % in response times[1]

on a benchmark battery compared to using the distributed cache included in nopCommerce 4.60. Compared to the old memory cache, we observed average reductions of

  • 86 % in startup time and
  • 27 % in maximum memory usage.

Lower and more predictable memory usage allows us to host the site on a cheaper server without risking performance dips or downtime, while faster startup and warmup lets us quickly scale to meet increased traffic during peak hours. Faster response times, of course, are crucial to a smooth user experience for the customers.

Method

In the comparisons below, we use the latest 4.60 release of nopCommerce as a baseline, compared against the development branch incorporating our optimisations. The database is a copy of the FamilyWallpapers database, and we use a C1 Basic Azure Redis server (1 GB cache size, 1000 client connections, low network performance). "Warmup" refers to making one request each (in sequence) to the main page, two category pages with different category structures, and the cart, in order to populate the cache. Our regular list of warmup URLs includes a product page as well, but we decided not to include it here, as product pages require extra plugins to function.

Startup tests are measured from and including application launch to the end of the warmup phase. For the load tests, we make 10 parallel requests at a time, for a total of 200 requests, to the already warmed-up index page.

We measure the execution time from beginning to end as defined for each benchmark above, together with memory and CPU usage during the process. Each experiment is repeated three times for each setting, and all three runs are plotted together.
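For illustration, the load-test loop described above can be approximated with a short script. This is a simplified sketch, not our actual benchmark tooling: the URL is a placeholder and `fetch` simulates a request instead of performing real HTTP, so the structure (batches of 10, 200 requests in total) is the point, not the numbers.

```python
import concurrent.futures
import time

# Placeholder target; the real benchmark hit the warmed-up index page.
INDEX_URL = "https://example.com/"

def fetch(url: str) -> float:
    """Stand-in for an HTTP GET; returns the elapsed time in seconds.
    A real benchmark would use an HTTP client here."""
    start = time.perf_counter()
    time.sleep(0.001)  # simulate network latency
    return time.perf_counter() - start

def load_test(url: str, total: int = 200, parallel: int = 10) -> list:
    """Issue `total` requests, `parallel` at a time, collecting timings."""
    timings = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallel) as pool:
        # Fire the requests in batches of `parallel`, as in the benchmark.
        for _ in range(total // parallel):
            batch = [pool.submit(fetch, url) for _ in range(parallel)]
            timings.extend(f.result() for f in batch)
    return timings

durations = load_test(INDEX_URL)
```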

Distributed cache comparison

Here we compare the old implementation of the distributed cache using Redis with our new Redis-synchronised memory cache.

Cache implementation               Startup time (avg.)   Max. memory (avg.)
Unoptimised Redis, first load      1568.4 s              3240.6 MB[2]
Unoptimised Redis, second load     157.2 s               773.3 MB
Redis-synchronised memory cache    23.3 s                945.3 MB

[Chart: memory usage during the first startup (distributed cache comparison)]

[Chart: CPU usage during the first startup (distributed cache comparison)]

During the second startup, Redis is already populated, leading to shorter startup times and less memory usage for the pure distributed cache. The Redis-synchronised memory cache is not affected, as it does not store data on Redis in the first place.

[Chart: memory usage during the second startup (distributed cache comparison)]

[Chart: CPU usage during the second startup (distributed cache comparison)]

Load test

During the load test, although the application is already warmed up, the pure Redis cache still has to retrieve data from Redis for each new request. As this is a rather time-consuming operation, the CPU spends most of its time idly waiting for data before it can process each request.

Cache implementation               Response time (avg.)
Unoptimised Redis                  20549 ms
Redis-synchronised memory cache    867 ms

[Chart: memory usage during the load test (distributed cache comparison)]

[Chart: CPU usage during the load test (distributed cache comparison)]

Memory cache comparison

The cache optimisations are not limited to distributed caching: we have also improved the regular memory cache in several ways. Owing to enhancements both in how data is stored in the cache and in what kind of data structures are used to cache large collections, startup time has been greatly reduced.

Cache implementation       Startup time (avg.)   Max. memory (avg.)
Unoptimised memory cache   175.5 s               1302.2 MB
Optimised memory cache     24.0 s                953.8 MB
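As a hypothetical illustration of the "specialised data structures" point: a large cached collection, such as the store's 350,000+ URL records, is far cheaper to query when cached as a dictionary keyed by the lookup field than as a flat list. The actual nopCommerce data structures are C# and differ in detail; this Python sketch with made-up record shapes only shows the complexity argument (O(n) scan versus O(1) lookup).

```python
# Hypothetical records, loosely modelled on URL records (slug -> entity).
url_records = [
    {"slug": f"product-{i}", "entity_id": i} for i in range(100_000)
]

# Naive cached value: the whole list, scanned linearly on every lookup.
def find_by_slug_list(slug):
    return next((r for r in url_records if r["slug"] == slug), None)

# Specialised cached value: a dictionary built once, then reused for
# constant-time lookups.
by_slug = {r["slug"]: r for r in url_records}

def find_by_slug_dict(slug):
    return by_slug.get(slug)
```

Building the dictionary costs one pass over the collection at cache-fill time, which is quickly amortised when the same cached value serves thousands of lookups per minute.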

[Chart: memory usage during startup (memory cache comparison)]

[Chart: CPU usage during startup (memory cache comparison)]

Cold-start load testing

The main difference between the old and the new memory-cache implementations is that we now avoid duplicating work when a cache miss occurs. On a warmed-up page, this makes no difference, but the effect on performance is particularly visible when multiple requests arrive for the same page at the same time and that page has not been warmed up. The same situation arises, even on a single request, when a page loads its components in parallel during warmup. Note that this effect can appear whenever part of the cache is cleared for whatever reason, such as when a product or category is updated during heavy traffic.
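This deduplication can be sketched as a "single-flight" cache. The following is an illustrative Python sketch under our own naming, not the actual C# implementation: the first caller for a missing key runs the factory, and any concurrent callers for the same key simply wait on the same in-flight result instead of starting their own.

```python
import threading
from concurrent.futures import Future

class LazyCache:
    """On a cache miss, only the first caller runs the factory;
    concurrent callers for the same key await the same in-flight load."""

    def __init__(self):
        self._values = {}
        self._pending = {}       # key -> Future for an in-flight load
        self._lock = threading.Lock()

    def get_or_add(self, key, factory):
        with self._lock:
            if key in self._values:
                return self._values[key]
            future = self._pending.get(key)
            owner = future is None
            if owner:
                future = Future()
                self._pending[key] = future
        if not owner:
            # Another caller is already loading this key: just wait.
            return future.result()
        try:
            value = factory()
        except Exception as exc:
            future.set_exception(exc)
            with self._lock:
                del self._pending[key]
            raise
        with self._lock:
            self._values[key] = value
            del self._pending[key]
        future.set_result(value)
        return value
```

The waiting callers consume no extra CPU or memory while the first caller's load completes, and only one copy of the data is ever built.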

Below, we make a single batch of 10 simultaneous requests to a cold page. Four requests are processed simultaneously; with the optimised cache, one of them acquires the data while the other three wait for that task to finish without using additional resources. The unoptimised cache, on the other hand, starts four identical acquisition tasks that all compete for the same resources, requiring a large amount of extra memory for data that is later discarded.

Cache implementation       Total response time (avg.)   Max. memory (avg.)
Unoptimised memory cache   547.7 s                      2067.6 MB
Optimised memory cache     18.1 s                       1108.8 MB

[Chart: memory usage during the cold-start load test]

[Chart: CPU usage during the cold-start load test]

Conclusion

Thanks to several major improvements in the caching strategy, FamilyWallpapers has been able to continue growing with nopCommerce and move from a single virtual machine onto a distributed and scalable modern hosting solution on the cloud, without sacrificing response times. The site can reliably handle increased traffic during peak hours while scaling down to save costs during the night, all while serving content from a huge database at lightning speed.

These improvements have been accepted into the core nopCommerce project and will be available in the next release. Until then, if you are looking to speed up your nopCommerce store or move to distributed hosting, do not hesitate to reach out to Majako via the links below. We also offer plugins for HTML output caching, which further reduces response times and server load, and Elasticsearch integration for fast, powerful search and filtering.

Let's keep growing with nopCommerce and continue giving our customers a smooth and enjoyable shopping experience!

majako.net

majako.se (Swedish site)

majako logo

The benchmark scripts we used for this article

The pull request for rewriting the cache

The pull request for optimising large collections

FamilyWallpapers

Footnotes

  1. It should be noted that the actual response times on the live site were already much lower on most pages (around 500 ms), thanks to Majako's HTML cache plugin. Please contact us at support@majako.se for more information!

  2. We are not entirely certain why the unoptimised Redis cache uses as much memory as it does, but it could be that it first stores the actual objects in memory as they are created, then the JSON-serialised strings that are cached on Redis, and then again the same JSON strings as they are retrieved from Redis when they are next needed, and again deserialised back into C# objects. In that case, the garbage collector will eventually take care of most of them, but the application still requires an enormous amount of extra memory during startup. Even so, the Redis server reported a maximum of 225 MB, so it is likely that the same data is also being created multiple times on simultaneous cache misses.

Comments

Guest

Great work! thanks for sharing...

Guest

great

Guest

Hey, awesome feature, work, and research.

Quick question. Are eventual consistency / dirty reads a problem with this approach, or did you work around that issue (or accept the trade-off)? It's not clear to me if the application's in-memory model can be behind (out of sync with) Redis while waiting for an event, or if there is some sort of synchronization logic that blocks requests. Or some other mechanism that prevents dirty reads. Not even sure if that would be possible, but curious if I am asking the right question and if you considered this at all.
