Thursday, June 27, 2019

Replication under the hood – What happens when you click activate?



When we click the activate button, we know that the item activated gets replicated to all the publish instances that have an active replication agent configured. 

The status of the replication is shown as a yellow icon while the replication is in progress. The icon subsequently turns green or red depending on the final replication status.

But what happens internally during this process?

A sequence of steps runs on the sender side before an item is placed in the replication queue of each applicable replication agent. The content is then transferred to the receiver, where it is processed to complete the replication.

At a high level, the following steps are performed:
  1. Version creation
  2. Activation process on sender and placement of queue item
  3. Transfer of replicated content to receiver
  4. Processing replication on receiver
  5. Status update and retry if applicable


The high-level flow is depicted in the diagram below.





Version creation

The first step performed in activation is the creation of a frozen version of the content being replicated. This ensures that subsequent edits to the content while the replication is in progress do not impact the content being replicated. 

The frozen version thus created is attached to the replication process, and it is this frozen content that gets replicated.
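The idea behind the frozen version can be illustrated with a small standalone sketch. It does not use the JCR versioning API; FrozenVersionSketch and its methods are hypothetical names made up for this illustration, and the "node" is just a map of properties.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a content node; this is not an AEM or JCR class.
final class FrozenVersionSketch {
    private final Map<String, String> live = new HashMap<>();

    // Edit the live content.
    void set(String property, String value) {
        live.put(property, value);
    }

    String get(String property) {
        return live.get(property);
    }

    // Take a read-only copy of the current state, analogous to a frozen version.
    // Later edits to the live content do not show up in the returned map.
    Map<String, String> freeze() {
        return Collections.unmodifiableMap(new HashMap<>(live));
    }
}
```

If the page title is edited after freeze() is called, the frozen copy still holds the old title, which is the property the replication relies on.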

Activation process on sender

When the activation process kicks in, a sequence of steps runs that ends with an item being placed in the replication queue of each associated agent. These steps are configuration collation, permission check, preprocessing, replication package creation, and queuing.







Configuration Collation
One or more replication agents could be configured as active, depending on the number of publish instances to replicate to. The configuration of these replication agents could differ to the extent that the content to be replicated for a given activation is different for each of them. For this reason, the replication package is created separately for each configured replication agent.

The first step in the activation process on the sender is the identification of all the active replication agents that are configured. For each agent identified, its configuration is collated and kept as a ReplicationOptions object.


Permission check
Then a check is performed to validate that the user performing the activate action has the required permission to replicate the content selected for activation. Only the nodes on which the activating user has replication permission are included for replication.
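The filtering effect of the permission check can be sketched as follows. This is a minimal illustration, not AEM code: PermissionFilterSketch is a hypothetical class, and the set of replicable paths stands in for whatever access control lookup the real implementation performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PermissionFilterSketch {
    // Keep only the selected paths the user is allowed to replicate.
    // 'replicablePaths' is a hypothetical stand-in for an access control lookup.
    static List<String> filter(List<String> selected, Set<String> replicablePaths) {
        List<String> allowed = new ArrayList<>();
        for (String path : selected) {
            if (replicablePaths.contains(path)) {
                allowed.add(path);
            }
        }
        return allowed;
    }
}
```

Paths the user cannot replicate are simply left out of the result; the remaining paths continue to the next step.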


Preprocessing
After the replication agent configuration is collated and the permission check is performed, the preprocessors, if any are registered, are applied. This step is optional.

A custom preprocessor can be implemented by creating a class that implements the Preprocessor interface. Providing an implementation of its preprocess method in this class and configuring the class as a service registers it as a preprocessor.

All the preprocessors configured this way are called at this stage before the process moves on to the next step.
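The way a chain of preprocessors is applied can be sketched with standalone Java. The types below (SketchAction, SketchPreprocessor, PreprocessorChainSketch) are simplified stand-ins invented for this illustration, not the AEM Preprocessor, ReplicationAction, or ReplicationOptions types, and service registration is replaced by a plain register call. The sketch also assumes that an exception thrown by a preprocessor stops the activation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the action being replicated; NOT an AEM type.
final class SketchAction {
    final String type;
    final String path;

    SketchAction(String type, String path) {
        this.type = type;
        this.path = path;
    }
}

// Simplified stand-in for a preprocessor contract; NOT the AEM interface.
interface SketchPreprocessor {
    void preprocess(SketchAction action) throws Exception;
}

final class PreprocessorChainSketch {
    private final List<SketchPreprocessor> preprocessors = new ArrayList<>();

    // Stands in for registering the preprocessor as a service.
    void register(SketchPreprocessor preprocessor) {
        preprocessors.add(preprocessor);
    }

    // Call every registered preprocessor in order.
    // Returns false as soon as one of them throws (assumed to veto the activation).
    boolean run(SketchAction action) {
        for (SketchPreprocessor preprocessor : preprocessors) {
            try {
                preprocessor.preprocess(action);
            } catch (Exception e) {
                return false;
            }
        }
        return true;
    }
}
```

A preprocessor that rejects a path, for example one under a hypothetical /content/restricted tree, would make run return false, and the activation would not proceed to package creation.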


Replication package creation
After applying all the preprocessors, the next step is the creation of the replication package for the activated item. 

A separate replication package is created for each replication agent. Its contents depend on the agent's configuration (the ReplicationOptions object created in the earlier step), and it is built with the serializer that matches the agent's configured serialization type.

The replication package contains all the information needed for the replication process to complete on the receiver end.


Queuing (Persist in JCR)
The created replication package is persisted in JCR along with other metadata such as the item on which the activation is performed, the type of action, the user id, and so on. It is stored in JCR under the node /var/eventing/jobs.

Once the replication package is persisted in JCR, the activation steps on the sender side are complete and the package is ready for transport over the network to the receiver.

At this stage, the item is also visible in the queue associated with the replication agent. The items in the queue are shown by querying for pending items under /var/eventing/jobs for the queue id associated with that replication agent.

Transfer to receiver

The responsibility of transferring the replication package over the network lies with the sender AEM instance. The Sling job watches for items that become available for replication and kicks off the process to transfer them to the receiver.

The Sling job for a queue processes items one at a time, in order. It processes the first item in the queue, and only when that item is complete does it pick up the next item for processing.

Processing on receiver

When the listener on the receiving end receives a replication package, it deserializes the package and installs it, which replicates the content onto the receiver.

A success status is sent back if all the steps are successful on the receiver side. If any of the steps fails, a failure status is returned, prompting the sender to retry.

Status update and retry

After transferring the replication package, the sling job on the sender waits for the response from the receiver indicating the status of the replication at the receiving end. 

If the response is successful, it removes the item from the queue (deletes the item from JCR), marking the status of replication as successful.

If the replication is not successful on the receiving end, the item is retained in the queue for reprocessing. The Sling job then waits for the ‘Retry Delay’ duration to elapse before trying to send the item to the receiver again.
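The queue behaviour described above (in-order processing, removal on success, retention on failure) can be sketched as follows. ReplicationQueueSketch is a hypothetical in-memory model, not the Sling job implementation: the transport is passed in as a predicate that returns true on success, and the retry delay is left to the caller rather than implemented with a timer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

final class ReplicationQueueSketch {
    private final Deque<String> queue = new ArrayDeque<>();

    void enqueue(String item) {
        queue.addLast(item);
    }

    int size() {
        return queue.size();
    }

    String head() {
        return queue.peekFirst();
    }

    // One processing pass: deliver items from the head of the queue, in order.
    // A delivered item is removed. A failed item stays at the head and the pass
    // stops, so later items wait until the caller retries after the retry delay.
    // Returns the number of items delivered in this pass.
    int processOnce(Predicate<String> transport) {
        int delivered = 0;
        while (!queue.isEmpty()) {
            if (!transport.test(queue.peekFirst())) {
                break;
            }
            queue.removeFirst();
            delivered++;
        }
        return delivered;
    }
}
```

In this model, a failing item at the head of the queue holds back every item behind it until a later pass succeeds, which matches the one-at-a-time processing described for the sender's queue.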
