Sunday, November 29, 2015

Mongo vs Tar MK... When is it appropriate to use Mongo MK for AEM


The general rule of thumb is that MongoMK provides better scalability and TarMK provides better performance. But most applications require both scalability and performance.

A detailed analysis for the Author and Publish instances, based on the metrics Adobe recommends, follows.

Author Environment

The table below compares the application's metrics against the criteria that would make MongoMK the suitable option.

| Metric | Criteria | Dotcom Requirement | In favor of MongoMK |
| --- | --- | --- | --- |
| Number of named users connected in a day | In the thousands or more | Less than 1,000 | No |
| Number of concurrent users | In the hundreds or more | 100 concurrent users | Maybe |
| Volume of asset ingestions per day | In the hundreds of thousands or more | A few thousand assets ingested per day during the peak | No |
| Volume of page edits per day | In the hundreds of thousands or more (including automated updates, for example via Multi Site Manager or news feed ingestions) | No automated updates; a few thousand page edits during the peak | No |
| Volume of searches per day | In the tens of thousands or more | May have tens of thousands of searches during the peak | Maybe |

Other points on MongoMK for Authoring Environment

·         Scalability – MongoMK allows multiple AEM instances to share the same MongoDB deployment. This shared-everything clustering approach enables horizontal scalability of AEM instances, adding AEM nodes as the load increases
·         Replication – MongoMK reduces the overhead of replication; it effectively delegates replication to MongoDB, which has a mature model for maintaining replica sets. The number of MongoDB replicas and the number of AEM instances can also be scaled independently
·         Distributed authoring teams – MongoMK helps when authoring teams are spread across geographies. Content changes made by all authors are persisted to the same primary MongoDB instance and replicated to the secondary replica-set members
·         Automated recovery – MongoDB replica sets with automated failover enable automated system recovery when the primary MongoDB instance goes down, or even when a whole data center fails; nevertheless, TarMK also provides an option for setting up automated recovery

Recommended Option for Authoring Environment

In line with Adobe's recommendation, the choice of MongoMK or TarMK for the authoring environment can be based on the number of active AEM instances required:
·         If more than one active AEM instance is needed in the authoring environment – MongoMK
·         If the volume can be handled with one active AEM instance – TarMK

Publish Instance

The rule of thumb for considering MongoMK for the publish tier is whether the publish instance is required to support user-generated content.

Other points on MongoMK for Publish Environment

Code releases – When releasing code changes to production, we have to release to one data center, validate the release, and then release to the other data center. This is sometimes a mandatory requirement to ensure 24/7 availability. Such rolling releases are well supported by TarMK. With MongoMK, a code release gets replicated to all MongoDB replicas, making an isolated release to one data center difficult.
AEM upgrades – Applying patches and release upgrades to AEM also requires rolling releases, which TarMK supports well because of its shared-nothing clustering approach.
Horizontal scalability – Both TarMK and MongoMK support horizontal scalability for the publish tier. With TarMK, we can run a farm of independent, shared-nothing AEM instances that are kept in sync by replication.

Recommended Option for Publish Environment

The choice of MongoMK or TarMK for the publish instance can be based on whether the site supports user-generated content:
·         Site supports user-generated content – MongoMK
·         Site does not support user-generated content – TarMK

Sunday, October 04, 2015

The AEM Dispatcher...

Dispatcher is an important component in AEM. At its core it is a simple module that can be deployed on a web server to provide caching and load-balancing services. (Note: the AEM dispatcher module is available only for a limited set of web servers; check the official documentation for the list of supported servers.)

Caching Service
Dispatcher is primarily used for its caching capabilities. AEM is not great at raw performance (I personally am yet to take our system through performance testing and see AEM's behavior under stress, but forum discussions make it clear this is well known). It is imperative that we deploy the dispatcher in front of the publish instance to reduce the traffic that ends up on the AEM instance.

The regular behavior of caching and delivering cached content is well documented. I will talk about a few unconventional scenarios we had to deal with.

Caching cross-referenced content: Assume a page (p1) that includes fragments from another page (p2). When p1 is cached, the cached copy includes the content of p2. When p2 changes, the cached p1 becomes stale, but there is no way to tell the dispatcher that p1 must be invalidated when p2 changes.

The content structure in AEM plays a key role in cache invalidation. When a page at a specific node is modified, all nodes under it are invalidated in the dispatcher cache.

Another key feature in AEM is that almost all referenced content is lazy loaded. In the scenario above, the content of p1 typically should not contain p2; it should only contain a reference to p2. The content of p2 itself should be lazy loaded into the appropriate position within p1. That way, on the server side, requests for p1 and p2 are two independent requests, each of which gets cached and invalidated independently of the other.
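The independence of the two cache entries can be sketched with a toy cache (the paths and render functions here are hypothetical, not the dispatcher's actual implementation):

```python
# Toy cache sketch (hypothetical paths and renderers, not the real dispatcher):
# p1's cached response holds only a reference to p2, so invalidating p2
# after it changes does not make the cached copy of p1 stale.
cache = {}

def render_p1():
    # placeholder that the client resolves with a second, independent request
    return '<div data-include="/content/p2.html"></div>'

def render_p2():
    return '<p>fragment content</p>'

def get(path, renderer):
    # serve from cache, rendering on a miss
    if path not in cache:
        cache[path] = renderer()
    return cache[path]

def invalidate(path):
    cache.pop(path, None)

get('/content/p1.html', render_p1)
get('/content/p2.html', render_p2)
invalidate('/content/p2.html')        # p2 was edited and re-activated
assert '/content/p1.html' in cache    # p1 stays valid: it holds only a reference
assert '/content/p2.html' not in cache
```

The next request for p2 re-renders just the fragment, while p1 continues to be served from cache.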

There can be scenarios where referenced content cannot be lazy loaded. Configuration gets tricky in such cases. The only parameters we have to control the cache invalidation behavior are:
1. An invalidation globbing pattern, which filters the files that get invalidated
2. statfileslevel, which sets the level at which invalidation happens. Setting it to 0 (the default) invalidates all cached content matching the invalidation globbing pattern on any activation. Setting it to a higher level only invalidates cached content within the subtree, at that level, in which the activation happens.

We can combine these two parameters to achieve the desired caching behavior.
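As a rough illustration, the two parameters sit in the /cache section of a dispatcher farm configuration; the values below are illustrative, not a recommendation:

```
# Sketch of the relevant part of a dispatcher farm (dispatcher.any)
/cache
  {
  /docroot "/tmp/cache"
  # .stat files are maintained down to level 2 (e.g. /content/corp), so an
  # activation only invalidates cached content under that subtree
  /statfileslevel "2"
  /invalidate
    {
    # globbing pattern: only .html files are auto-invalidated
    /0000 { /glob "*" /type "deny" }
    /0001 { /glob "*.html" /type "allow" }
    }
  }
```

See the official dispatcher documentation for the full farm syntax and the exact invalidation semantics.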

A scenario we encountered where we could not achieve the desired cache behavior with these parameters alone had:
1. Reusable content: authors can create reusable content under some base paths (say /content/persons, /content/things, ...)
2. Pages under a different path that can include this reusable content (say, the /content/corp/site/interests page can have persons and things included in it)

And the tricky part is that this reusable content cannot be lazy loaded. In fact, we have a rendering engine outside of AEM that renders the page, and it fetches all the content needed for a page from AEM in one go.

We got it to work reasonably well by configuring the cache such that reusable content gets cached one directory level higher than the pages cache: reusable content is cached at, say, /tmp/cache, while pages are cached at /tmp/cache/content. In this configuration, updating a reusable content item invalidates all the pages as well, whereas updating a page flushes just that single page.

We were able to achieve this by having multiple farms; we had 3 farms and a small URL rewrite in the web server configuration. More about this in the next post... The AEM Dispatcher Farms

Sunday, September 27, 2015

Adobe Experience Manager - An Intro

{
Recently I had an opportunity to work on AEM, and we had to use it in a very unconventional way... This led us to explore the inner workings of AEM, to understand its behavior in detail and hook it into the overall system architecture. What made it even more interesting is that this is my first experience with any CMS system :)

Summarizing my learnings in a series of blogs... starting with a simple introduction to AEM (from my pov);
}


Formerly, and still popularly (at least in my view), known as CQ5, Adobe Experience Manager (AEM) is a one-of-its-kind content management system.

The fundamental architecture of AEM is very different from that of many other CMS systems.

At its core is the JCR repository. This simple, node-based storage structure forms the backbone of the AEM system. Almost everything in AEM (code, content, binaries, resources) is stored in this JCR tree.

And Sling, with its resource-oriented architecture, forms the UI control layer in AEM. This is very different from the standard MVC paradigm we have been so used to in web solution architectures. Only two factors matter in the Sling architecture:
- The resource: an unambiguous path to the source of information in the JCR tree
- The script that renders the resource: determines what is executed against the resource to deliver the response
Both the resource and the script are determined by the URL used to access the web page.
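A rough sketch of how a Sling-style URL breaks down into a resource path, selectors, extension and suffix (simplified; the real Sling resolver handles vanity paths, resource types and many more cases):

```python
# Simplified sketch of Sling-style URL decomposition (not the actual Sling
# resolver). Everything before the first '.' is the resource path in the
# JCR tree; selectors and extension follow, then an optional /suffix.
def decompose(url):
    path, _, rest = url.partition('.')
    parts = rest.split('/') if rest else ['']
    dotted = parts[0].split('.')
    selectors, extension = dotted[:-1], dotted[-1]
    suffix = '/' + '/'.join(parts[1:]) if len(parts) > 1 else ''
    return {'resource': path, 'selectors': selectors,
            'extension': extension, 'suffix': suffix}

# e.g. /content/corp/site/page.print.html resolves to the resource
# /content/corp/site/page with selector 'print' and extension 'html';
# the script is then chosen from the resource's type plus the selectors.
```

Calling `decompose("/content/corp/site/page.print.html")` returns `/content/corp/site/page` as the resource, `['print']` as the selectors and `html` as the extension.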

And finally OSGi, which forms the back-end component framework and provides the heavy-lifting business logic layer. All the Java logic we write goes into this layer.

And the interesting thing is that all three are open-source Apache projects.

The integrating factor that binds these three frameworks into cohesive functionality is Adobe's Granite framework.

There are multiple other pieces that add to the capabilities AEM provides. One important piece is the dispatcher module, which the next few articles focus on in detail.
