Friday, June 21, 2019

Applying “Binary less” Replication


When configuring replication agents, one of the options for the Serialization Type setting is “Binary less”.

In this blog, we will see what “Binary less” replication means, how it behaves, the use cases where it is applicable, and an approach that can be used for configuring it.

What is Binary less replication?

Binary less replication means that the binaries are left out of the content being replicated. When replicating an asset, for example, only the metadata of the asset gets replicated. The binaries of the asset, which comprise the original asset and all its renditions, are not included in the replicated content.

When can it be used?

Binary less replication is useful when multiple AEM instances are sharing a common datastore. The binaries are shared through the common datastore and hence there is no need to replicate them to all the instances.
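For context, a shared datastore is typically set up by pointing every instance at the same datastore location through an OSGi configuration. The fragment below is a sketch of a shared FileDataStore configuration placed in each instance's `crx-quickstart/install` folder; the mount path is an example, not a required value:

```
# crx-quickstart/install/org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.config
# (same configuration on every instance that shares the datastore)
path="/mnt/shared/datastore"
minRecordLength="4096"
```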

How does it work?

It’s important to understand the way binary less replication works to properly configure it for your scenario. 

While creating a replication package for the content being replicated, the binaries are replaced with their hash code reference. The package with hash code references for the binaries is then sent to the receiver. When the receiver installs the received replication package, it tries to resolve this hash code reference to the binary in its datastore.

Since the receiver shares the same datastore as the sender, it resolves each hash code reference to the actual binary in the datastore and links it at the locations where the references appear in the replicated content.
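The mechanism can be illustrated with a small sketch. This is a conceptual model, not AEM's actual implementation: binaries live in a content-addressed store keyed by their hash, the replication package carries only the hash, and the receiver resolves the hash against the shared store, falling back when it cannot.

```python
import hashlib

# Conceptual sketch (not AEM's actual code): a shared, content-addressed
# datastore mapping hash -> binary, shared by sender and receiver.
datastore = {}

def store(binary: bytes) -> str:
    """Persist a binary in the shared datastore; return its hash reference."""
    ref = hashlib.sha256(binary).hexdigest()
    datastore[ref] = binary
    return ref

def build_package(metadata: dict, binary: bytes) -> dict:
    """Sender side: the package carries metadata plus only the hash reference."""
    return {"metadata": metadata, "binaryRef": store(binary)}

def install_package(package: dict) -> bytes:
    """Receiver side: resolve the hash reference against the shared datastore."""
    ref = package["binaryRef"]
    if ref not in datastore:
        # In AEM this situation triggers a fallback: the replication is
        # redone in default mode with the binary included in the package.
        raise LookupError("binary not resolvable; fall back to default replication")
    return datastore[ref]

package = build_package({"path": "/content/dam/logo.png"}, b"\x89PNG...")
assert install_package(package) == b"\x89PNG..."
```

Because sender and receiver resolve against the same store, the package stays small; the fallback path only matters when the datastore is not truly shared.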

If the datastore is not configured properly, or the receiver is for some reason unable to resolve the binary from its hash code reference, it falls back to the default replication mode and redoes the replication with the binaries included in the replication package.

The overall replication does not fail in this case. Whether the binary less replication itself succeeded can be checked in the logs.

Check for log statements containing the text patterns “FAILED PATHS START” and “FAILED PATHS END” for details on failed binary less replications, and the pattern “set using a reference” for successful binary less replications.
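A quick way to tally these outcomes is to scan the log for those patterns. The sketch below uses illustrative sample lines, not verbatim AEM log output:

```python
# Hedged sketch: count successful binary less replications and collect the
# paths listed between the FAILED PATHS markers. Sample lines are made up.
log_lines = [
    "com.day.cq.replication.Agent.publish set using a reference /content/dam/a.png",
    "FAILED PATHS START",
    "/content/dam/b.png",
    "FAILED PATHS END",
]

successes = sum(1 for line in log_lines if "set using a reference" in line)

failed_paths = []
in_failed_block = False
for line in log_lines:
    if "FAILED PATHS START" in line:
        in_failed_block = True
    elif "FAILED PATHS END" in line:
        in_failed_block = False
    elif in_failed_block:
        failed_paths.append(line)

assert successes == 1
assert failed_paths == ["/content/dam/b.png"]
```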

Use cases of Binary less replication

Binary less replication is useful in setups with a datastore shared across instances. One common case is when all AEM instances (the author and all publishers) use a common datastore. Here the replication agents from the author to all the publishers can be configured with the Binary less serialization type, since uploading an asset to the author already places its binaries in the datastore shared by all publish instances.
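On the author, such an agent would look roughly like the following. This is a sketch: host names and the agent path are placeholders, and the `binaryless=true` transport parameter accompanies the Binary less serialization type to tell the receiver to expect hash references:

```
# One agent per publish instance, configured on the author
Agent path         : /etc/replication/agents.author/publish
Serialization Type : Binary less
Transport URI      : http://publish-host:4503/bin/receive?sling:authRequestLogin=1&binaryless=true
```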

Special cases - Approach for configuring Binary less replication

Replication configuration should be carefully thought out in setups where not all instances share the same datastore, but distinct groups of servers each have their own shared datastore.

Some common setup configurations where this applies are:
  1. A shared datastore for all publishers, but a separate datastore for the author
  2. Separate shared datastores for the Primary and DR environments


Let us take case 1, where the author has a separate datastore and all publish instances share a common datastore.

This configuration is illustrated below:

[Diagram: an author instance with its own datastore, replicating to a set of publish instances that share a common datastore]
In this case, configuring binary less replication from the author to all publish instances would cause replications to fail and fall back to default replication in an ad hoc manner: the author's binaries sit in its own datastore, so a publisher can resolve a hash code reference only after some instance sharing its datastore has already received the binary. Because replication to all publishers happens asynchronously and in parallel, the number of failed binary less replications tends to be high.

To overcome this situation, it is ideal to designate one publish instance as a gateway instance in such scenarios. Configure the author to perform default replication to this gateway publish instance. This ensures that the binaries get replicated to that single publish instance and are persisted in its datastore (which is also shared by the other publish instances).


Now configure the gateway instance to chain replicate the content to the other publisher instances in binary less mode. Chain replication starts after the successful install of the content on the gateway instance. This ensures that the binaries are replicated and persisted in the shared datastore through the gateway instance before the binary less replication kicks in for replicating content to other instances in the cluster.
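Concretely, the two layers of agents would look roughly like this. Host names and agent paths are placeholders; the “On Receive” trigger is what makes the gateway's agents fire as part of the chain:

```
# On the author: one default-replication agent targeting the gateway
Serialization Type : Default
Transport URI      : http://gateway-host:4503/bin/receive?sling:authRequestLogin=1

# On the gateway publisher: one binary less agent per remaining publisher
Serialization Type : Binary less
Transport URI      : http://publish-N-host:4503/bin/receive?sling:authRequestLogin=1&binaryless=true
Trigger            : On Receive  (replicate when this instance receives content)
```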

This configuration is illustrated in the diagram below:

[Diagram: the author performs default replication to the gateway publish instance, which chain replicates in binary less mode to the other publish instances sharing its datastore]
In cases where the setup includes a DR environment with a separate shared datastore for the publish instances within the DR, the same configuration can be replicated by designating one instance in the DR as a gateway.

The author in this case should be configured to perform default replication to the gateway instances of both the Primary and DR clusters. The gateway instances can then chain replicate the content to the other instances within their respective clusters.


Limitations with this approach:

Using gateway instances to leverage binary less replication, as discussed in the section above, does come with a few limitations that we need to be aware of and plan for.

Introduces delay in replication completion
It introduces a delay in replication completion. During the interval between replication to the gateway and the completion of chain replication, the content on the gateway instance and on the other instances within the cluster will be out of sync.

Usually chain replication completes in a few seconds, but it could take longer depending on the load on the system and the number of concurrent replications being performed.

In cases where a content mismatch even for such a short duration is not acceptable, the gateway instance can be taken out of the pool of instances serving content to end users. This way, only the other publish instances, which keep their content in near real-time sync with each other, serve content to the end users.


Replication status reflected on author:
Note that with this approach, the replication status on the author turns green (indicating success) as soon as replication to the gateway instance succeeds. This could send a wrong signal to content publishers, especially when chain replication is delayed or runs into issues.


Gateway instance failure
Another aspect that must be planned for with this configuration is gateway instance failure.

When a gateway instance fails for some reason, another instance within the cluster should be promoted as the new gateway, with the replication agents on the author and on the new gateway reconfigured to restore a working setup. Keep scripts ready that can reconfigure the instances in the event of a gateway failure.
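Such a failover script essentially repoints two sets of agent properties. The fragment below is a sketch of what needs to change; agent names and hosts are placeholders, and the property names (`transportUri`, `enabled`, `triggerReceive`) are the JCR properties behind the agent dialog fields:

```
# On the author: repoint the agent that targeted the failed gateway
/etc/replication/agents.author/<agent>/jcr:content
  transportUri   = http://new-gateway-host:4503/bin/receive?sling:authRequestLogin=1

# On the promoted gateway: enable chain replication to the rest of the cluster
/etc/replication/agents.publish/<agent>/jcr:content
  enabled        = true
  triggerReceive = true
```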




