Sunday, October 04, 2015

The AEM Dispatcher...

Dispatcher is an important component in AEM. At its core it is a simple module that is deployed on a web server and provides caching and load balancing services. (Note: the AEM dispatcher module is available only for a limited set of web servers; check the official documentation for the list of supported servers.)
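Just to show how it sits inside the web server, here is a minimal sketch of wiring the dispatcher into Apache httpd. The module filename and the paths are placeholders that differ by dispatcher version and platform, so treat this as an illustration rather than a working configuration.

    # Load the dispatcher module (the actual .so filename varies by version/platform)
    LoadModule dispatcher_module modules/mod_dispatcher.so

    <IfModule disp_apache2.c>
        # Points to the farm configuration file (dispatcher.any)
        DispatcherConfig    conf/dispatcher.any
        DispatcherLog       logs/dispatcher.log
        DispatcherLogLevel  3
    </IfModule>

    <Directory />
        <IfModule disp_apache2.c>
            # Hand requests to the dispatcher, which serves from cache or forwards to AEM
            SetHandler dispatcher-handler
        </IfModule>
    </Directory>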

Caching Service
Dispatcher is primarily used for its caching capabilities. AEM is not great at raw performance (I personally am yet to take our system through performance testing and see how AEM behaves under stress, but forum discussions make it clear that this is a well-known limitation). It is imperative that we deploy the dispatcher in front of the publish instance to reduce the traffic that ends up on the AEM instance.

The regular behavior of caching and delivering cached content is well documented. I will talk about a few unconventional scenarios that we had to deal with.

Caching cross-referenced content: Assume a page (p1) that includes fragments from another page (p2). When p1 is cached, the cached copy includes the content of p2. When p2 changes, the cached p1 becomes stale, but there is no direct way of telling the dispatcher that p1 has to be invalidated when p2 changes.

The content structure in AEM plays a key role in cache invalidation. When a page at a specific node is modified, all cached content under that node is invalidated in the dispatcher cache (for example, modifying a page under /content/corp/site invalidates the cached pages below that node).

Another key feature in AEM is that almost all referenced content is lazy loaded. In the scenario explained above, the content of p1 typically should not contain p2; it should only contain a reference to p2. The content of p2 itself should be lazy loaded into the appropriate position within p1. This way, on the server side, the requests for p1 and p2 are two independent requests, each of which gets cached and invalidated independently of the other.
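To make that concrete, here is roughly what the dispatcher cache would look like in that setup. The paths and the docroot are hypothetical, just to illustrate the idea:

    /tmp/cache/content/site/p1.html   <- cached page, contains only a reference to p2
    /tmp/cache/content/site/p2.html   <- cached fragment, fetched by a separate request

    Activating p2 flushes only p2.html; the cached p1.html keeps serving and
    picks up the fresh p2 the next time the fragment is requested.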

There could be scenarios where referenced content cannot be lazy loaded. Configuration gets tricky in such scenarios. The only parameters we have to control the behavior of cache invalidation are
1. An invalidation glob pattern, which filters the files that get invalidated
2. statfileslevel, which sets a level for invalidation. Setting this to 0 (the default) invalidates all cached content matching the invalidation glob pattern on any activation. Setting it to a higher level invalidates only the content above that level along the path at which the activation happens.

We have to combine these two parameters to achieve the desired caching behavior.
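As an illustration, the relevant part of a dispatcher.any farm could look something like the sketch below. The globs and the level are made-up values, not our actual configuration; the point is simply where the two knobs live.

    /cache
      {
      /docroot "/tmp/cache"

      # Level at which .stat files are maintained; 0 (the default) means any
      # activation invalidates everything matching the invalidation globs
      /statfileslevel "2"

      /rules
        {
        # cache everything that is cacheable
        /0000 { /glob "*" /type "allow" }
        }

      /invalidate
        {
        # invalidation glob pattern: only cached .html files are auto-invalidated
        /0000 { /glob "*" /type "deny" }
        /0001 { /glob "*.html" /type "allow" }
        }
      }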

A scenario we encountered where we could not achieve the desired cache behavior with the above parameters was when we had
1. Reusable content: authors can create reusable content under some base paths (say /content/persons, /content/things, ...)
2. Pages under a different path that include this reusable content (say the /content/corp/site/interests page can have persons and things included in it)

The tricky part is that this reusable content cannot be lazy loaded. In fact, we have a rendering engine outside of AEM that renders the page, and it fetches all the content needed to render a page from AEM in one go.

We got it to work reasonably well by configuring the cache such that reusable content gets cached one folder above the pages cache: reusable content is cached under, say, /tmp/cache while pages are cached under /tmp/cache/content. In this configuration, updating a piece of reusable content invalidates all the pages as well, whereas updating a page just flushes that single page.
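Roughly, the resulting cache layout and behavior look like this; the exact directory layout depends on the rewrite and is shown here only as a hypothetical illustration of the idea:

    /tmp/cache                       <- reusable content cached here (one level above the pages)
    /tmp/cache/content/corp/site/... <- pages cached one level deeper

    activate reusable content  -> everything under /tmp/cache, pages included, is flushed
    activate a page            -> only that page's cached file is flushed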

We were able to achieve this by having multiple farms; we had 3 farms and a small URL rewrite in the web server configuration. More about this in the next post... The AEM Dispatcher Farms
