Wednesday, August 07, 2019

Connected Assets


Connected Assets is a feature introduced in the AEM 6.5 release.

To understand the concept of connected assets clearly, it is essential to understand two basic terms that have been used increasingly as AEM evolved: AEM Assets and AEM Sites.
  • AEM Assets – This refers to the DAM part of AEM.
  • AEM Sites – This refers to the Pages (web pages created, authored and maintained) part of AEM.

We know that site pages authored on an AEM instance can use the assets from the DAM on that same instance. Connected assets extends this concept to allow site pages authored on one AEM instance to use DAM assets from another AEM instance.

AEM instances in this architecture are designated as Assets instance and Sites instance. 

Essentially they are (at least at this stage) regular AEM instances, which are just designated as Assets and Sites instances based on the purpose they serve.

With connected assets, the sites instance can search for assets on the remote AEM assets instance and use them on its pages, similar to how it works with assets in its local DAM. Assets in its local DAM can continue to be used on pages.

In fact, the assets from the remote assets instance that are used on site pages get downloaded to a designated folder (provided in the configuration) in the DAM of the local sites instance (after which they remain out of sync with their master copies in the assets instance). 

Since the downloaded assets get created as new assets in the sites instance DAM, we will have to customize the workflow launcher configuration to skip this download folder; otherwise the DAM Update Asset workflow would regenerate the renditions when the asset gets downloaded.
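
As a rough illustration (the download folder name below is a placeholder, and the exact default expression varies by AEM version), the path regex of the DAM Update Asset workflow launcher could be extended with an additional negative lookahead so that it skips the download folder:

    /content/dam(/((?!/subassets)(?!/connected-assets).)*/)renditions/original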

The configurations to be made on the assets and sites instances, and the users and groups to be defined and linked for managing permissions across instances, are all detailed in the reference documentation on connected assets, available at
https://helpx.adobe.com/experience-manager/6-5/assets/using/use-assets-across-connected-assets-instances.html

After going through this feature and the reference documentation, we found that we had a case to leverage this feature. But there were catches that would prevent us from using it. The first of them is the requirement that the assets instance must be hosted on AMS (Adobe Managed Services).


Connecting the sites instance to an on-premise assets instance


The reference documentation specifically mentions that the assets instance in the connected assets architecture must be an instance hosted on AMS. This is one of the prerequisites for using connected assets.

But all our AEM implementations were on-premise installations. Not only did none of our clients have an AMS-hosted AEM setup, none had any plan of migrating to AMS in their roadmap.

So we got curious. Here is a feature that we potentially had an immediate need for, but which could not be leveraged as it is not available for on-premise AEM installations.

But what stops us from trying the feature out on locally hosted instances? Below is the configuration we tried.

[Diagram: a locally hosted sites instance connected to a locally hosted (on-premise) assets instance]

We made the required configurations and found that the two instances got connected; we could search for assets in the (locally hosted) remote assets instance from the sites instance. But soon we realized the catch…

The connected assets configuration includes providing the credentials of the service user used to connect the sites instance to the assets instance. But when we tried to search for remote assets, we were prompted with the login screen of the assets instance. After authenticating manually, connected assets works perfectly fine.

But why does it prompt for a user login when we have already provided the credentials in the configuration? Shouldn’t it use those credentials and connect automatically without prompting for login again? (You would not even notice this if you try the configuration with admin credentials everywhere.)

Digging further, the reason became obvious. 

Sites instance to assets instance authentication depends on the Adobe Identity Management Service, hence the requirement to host the assets instance on AMS.

We further found that there is an ACS Commons tool – Remote Assets (https://adobe-consulting-services.github.io/acs-aem-commons/features/remote-assets/index.html) – which uses a different approach for syncing assets, but would allow us to connect a sites instance to an on-premise assets instance.



Connecting sites instance to assets publish instance


Another use case we wanted to try out relates to the type of the assets AEM instance. The reference documentation talks only of connecting a sites author instance to an assets author instance. 

A sites author instance makes perfect sense, as we will be searching for and pulling in assets from the remote assets instance only when authoring.

But for the assets instance, we had scenarios where assets would be in a draft state and run through an approval workflow before they get published. And we want sites to use only approved (and hence published) assets. 

So we need a way to connect a sites instance to an assets publish instance. Below is the configuration we tried for this.

[Diagram: a sites author instance connected to an assets publish instance]

As it turned out, this failed. The components and services needed for connected assets functionality are enabled only on author instances. So there is no way (at least without customization) to use the connected assets feature to connect to an assets instance running in publish mode.

And this is understandable. We do not want multiple site author instances to connect to a publish instance and perform heavy operations on it like firing search queries and downloading asset binaries.

Also, we could see that a pure assets instance (an AEM instance which hosts only assets and serves no other functionality) may not need a publisher pair.

But the use case we had for letting sites use only published assets from the assets master instance is also valid. Interestingly, we found that the ACS Commons tool – Remote assets supports connecting to and downloading assets from publish instances.



Connecting sites instance to multiple assets instances


This is another use case that we felt is commonplace. We wanted to try out a configuration like the one shown below:

[Diagram: a single sites instance connected to multiple assets instances]

But connected assets provides for connecting to only a single assets instance.

The only workaround we found for this is to change the connected assets configuration (an admin function) to point to a different assets instance whenever assets need to be downloaded from it – hardly a useful option.

The ACS Commons tool – Remote assets – doesn’t help here either. It also provides for connecting to only a single remote instance.


Modifying the master copy


After an asset gets downloaded to the sites instance, changes to the original asset do not get synced to its downloaded copy - one of the key limitations to be aware of when using the connected assets feature. 

The downloaded copy can be edited on the local sites instance (though it is recommended to keep it read-only), and these edits stay in the sites instance. 

So what happens when you use the same remote asset again (one that you had earlier downloaded, edited and used on a few pages)?

We tried it and found that the previously downloaded and edited asset gets wiped out locally and a fresh copy gets downloaded. So be cautious when using the same remote asset in multiple places. 

If the asset and/or its metadata has changed on the remote instance in the meantime, or you have made changes to the already downloaded copy, you might be in for a surprise.



Conclusion


We found connected assets to be a very compelling feature, but the restrictions with which it has been released limit the use cases to which it can be applied. We are hopeful that some of these limitations will be addressed in future releases.

Thursday, June 27, 2019

Why does an item fail to get replicated?



There are many reasons why replication of an item fails and the replication queue builds up. Some of the common causes to check for are:
  1. Agent configuration
  2. Network issue
  3. Permissions issue
  4. Missing namespaces / node types
  5. Oak conflicts


Agent configuration

The reason for replication not working could simply be that the agent configuration is wrong. 

If you are configuring the agent for the first time or making any changes to the configuration, perform a ‘Test connection’ check to ensure that there is no inadvertent error in the configuration. 

Network issue

Replication failures are often caused by network issues. It could be a temporary glitch, in which case the queue items get cleared once the network is restored. 

A java.net.ConnectException in the logs of the sender AEM instance is an indication of a network issue preventing replication from succeeding. 

You could see messages like the following in the sender-side error.log file:

Error while sending request: java.net.ConnectException: Connection refused: connect
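
As a quick way to rule out pure connectivity problems, a minimal probe like the one below can be run from the author host; the host, port and path here are assumptions, so substitute the Transport URI from your agent configuration.

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal connectivity probe for a replication transport endpoint.
    public class ReplicationConnectivityCheck {
        public static void main(String[] args) throws Exception {
            // Assumed values; use the Transport URI configured on your replication agent.
            URL url = new URL("http://publish-host:4503/bin/receive?sling:authRequestLogin=1");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            // A java.net.ConnectException here mirrors the error seen in the sender's error.log;
            // any HTTP status code (even 401) means the network path itself is open.
            System.out.println("HTTP status: " + conn.getResponseCode());
        }
    }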

 

Permissions issue

Replication could fail if the transport user does not have write permission for the replicated content on the target instance. 

The issue occurs when the replication process installs the replicated content on the receiver end. You would see error messages indicating ‘Access denied’ in the logs. 

The error message in the sender-side and receiver-side error.log files would read:

com.day.cq.replication.ReplicationException: Repository error during node import: Access denied.

Missing namespaces / node types

Custom namespaces and node types that are created must also be created on the publish side. 

Typically, the namespace and node type definitions should be included in the code package so that they get installed on all instances. 

If this gets missed and the replicated item uses custom node types or namespaces, replication could fail. 

This can be detected by looking for “Invalid namespace prefix” and “Invalid node type” messages in the error.log.
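
As a minimal sketch (the namespace and node type below are hypothetical), custom definitions can be registered programmatically from a CND string using Jackrabbit's CndImporter, so the same code can run on both author and publish:

    import java.io.StringReader;
    import javax.jcr.Session;
    import org.apache.jackrabbit.commons.cnd.CndImporter;

    public class CustomTypeRegistrar {

        // Registers a hypothetical namespace and node type on the repository this session belongs to.
        public static void register(Session session) throws Exception {
            String cnd =
                "<myco='http://www.myco.com/jcr/1.0'>\n" +
                "[myco:article] > nt:unstructured\n" +
                "  - myco:author (string)\n";
            CndImporter.registerNodeTypes(new StringReader(cnd), session);
        }
    }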

Oak conflicts

Replication might fail due to conflicts when writing content to the Oak repository. 

This is more likely to happen on instances configured with a MongoDB repository, where more than one instance shares a common repository. 

Look for messages with “Unresolved conflicts” to identify issues due to Oak conflicts.


Closing Note

The above are only some of the main reasons for replication failure; it is not an exhaustive list. 

For any replication issue, perform these three basic checks
  1. Perform test connection to make sure connectivity is not an issue
  2. Check the error.log on the sender side. This would almost always give you the cause of the issue
  3. Check the error.log on the receiver side for more details on the cause of the issue.



Poison messages on Replication Queues



So what does it take for a replication queue to get blocked? Well… just one bad item on the queue. 

Yes, a single item with an issue would completely block the queue.

This is because of the way the items in the queue get processed. The items in the queue are ordered and are processed strictly in first-in-first-out (FIFO) order. 

This order needs to be maintained to make sure that there are no overlapping writes of the content and that data integrity is preserved.

So what happens when a single item cannot be processed?

Simply put, it will not get removed from the replication queue and will remain at the head of the queue, meaning it continues to be the next item to be processed. When the retry happens, this item fails again, and it thus remains at the head of the queue forever. 

Unless this item gets processed and removed from the queue, the next items will not get a chance to be processed.

This can be checked by looking at the queue item details in JCR. Look under the path /var/eventing/jobs/assigned in JCR to locate the queue item.
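
A rough sketch (service wiring and error handling omitted) of how the job nodes under this path could be walked with the Sling resource API to dump each queue item's properties, making a stuck first item and its retry-related properties visible:

    import org.apache.sling.api.resource.Resource;
    import org.apache.sling.api.resource.ResourceResolver;
    import org.apache.sling.api.resource.ValueMap;

    public class QueueItemInspector {

        // Walks the Sling eventing job tree and prints the properties of every job node.
        public static void dump(ResourceResolver resolver) {
            Resource assigned = resolver.getResource("/var/eventing/jobs/assigned");
            if (assigned != null) {
                printTree(assigned);
            }
        }

        private static void printTree(Resource resource) {
            ValueMap props = resource.getValueMap();
            // Job nodes carry the job topic property; intermediate folders do not.
            if (props.containsKey("event.job.topic")) {
                System.out.println(resource.getPath());
                props.forEach((key, value) -> System.out.println("  " + key + " = " + value));
                return;
            }
            for (Resource child : resource.getChildren()) {
                printTree(child);
            }
        }
    }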


The first item in a blocked queue would have undergone multiple retries, while all the subsequent items would have a retry count of 0, indicating that processing has not happened for them.

Replication under the hood – What happens when you click activate?



When we click the activate button, we know that the item activated gets replicated to all the publish instances that have an active replication agent configured. 

The status of the replication is reflected as a yellow icon while the replication is in progress, which subsequently turns green or red depending on the final replication status.

But what happens internally during this process?

A sequence of steps happens on the sender side before an item gets placed in the replication queue of each applicable replication agent, after which the content gets transferred to the receiver, where it gets processed to complete the replication.

At a high level, the following are the steps that are performed
  1. Version creation
  2. Activation process on sender and placement of queue item
  3. Transfer of replicated content to receiver
  4. Processing replication on receiver
  5. Status update and retry if applicable


The high-level flow is depicted in the below diagram:

[Diagram: high-level replication flow from sender to receiver]

Version creation

The first step performed in activation is the creation of a frozen version of the content being replicated. This ensures that subsequent edits to the content while the replication is in progress do not impact the content being replicated. 

The frozen version thus created gets attached to the replication process and it’s this frozen content that would get replicated.

Activation process on sender

When the activation process kicks in, a sequence of steps happens, resulting in an item getting placed in the replication queue of each associated agent. These steps are:

Configuration Collation
One or more replication agents could be configured as active, depending on the number of publish instances to replicate to. The configuration of these replication agents could differ to the extent that the content to be replicated for a given activation could be different. For this reason, the replication package is created separately for each of the replication agents configured.  

The first step in the activation process on the sender is the identification of all the active replication agents that are configured. For each of the agents identified, its configuration is collated and kept as a ReplicationOptions object.


Permission check
Then a check is performed to validate whether the user performing the activate action has the required permission to replicate the content selected for activation. Only the nodes on which the activating user has replication permission get included for replication.
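
As a small sketch, the replicate permission on a path can be verified with the standard JCR access control API ("crx:replicate" is the privilege AEM associates with the replication right):

    import javax.jcr.Session;
    import javax.jcr.security.AccessControlManager;
    import javax.jcr.security.Privilege;

    public class ReplicatePermissionCheck {

        // Returns true if the session's user holds the crx:replicate privilege on the given path.
        public static boolean canReplicate(Session session, String path) throws Exception {
            AccessControlManager acm = session.getAccessControlManager();
            Privilege replicate = acm.privilegeFromName("crx:replicate");
            return acm.hasPrivileges(path, new Privilege[] { replicate });
        }
    }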


Preprocessing
After collating the replication agent configuration and performing the permission check, all the preprocessors (optional, if any are configured) get applied.

A custom preprocessor can be implemented by creating a class that implements the Preprocessor interface. Providing an implementation for its preprocess method and registering the class as an OSGi service will register it as a preprocessor.

All the preprocessors thus configured get called at this stage, before proceeding to the next step.
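
A minimal sketch of such a preprocessor, registered as an OSGi service (the path check inside it is purely illustrative):

    import org.osgi.service.component.annotations.Component;

    import com.day.cq.replication.Preprocessor;
    import com.day.cq.replication.ReplicationAction;
    import com.day.cq.replication.ReplicationActionType;
    import com.day.cq.replication.ReplicationException;
    import com.day.cq.replication.ReplicationOptions;

    @Component(service = Preprocessor.class)
    public class SamplePreprocessor implements Preprocessor {

        @Override
        public void preprocess(ReplicationAction action, ReplicationOptions options)
                throws ReplicationException {
            // Illustrative rule: veto activation of a hypothetical work-in-progress tree.
            // Throwing a ReplicationException here aborts the replication.
            if (action.getType() == ReplicationActionType.ACTIVATE
                    && action.getPath().startsWith("/content/myproject/wip")) {
                throw new ReplicationException("Activation of WIP content is not allowed: "
                        + action.getPath());
            }
        }
    }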


Replication package creation
After applying all the preprocessors, the next step is the creation of the replication package for the activated item. 

A different replication package gets created for each replication agent depending on its configuration (based on the ReplicationOptions object created in the previous steps) and using the serializer as per the serialization type configured.

The replication package contains all the information needed for replication process to complete on the receiver end.


Queuing (Persist in JCR)
The created replication package gets persisted in JCR along with other metadata information like the item on which the activation is performed, type of action, user id, and so on. It gets stored in JCR under the node /var/eventing/jobs.

Once the replication package gets persisted in JCR, the activation process on the sender side is complete and the package is ready for transport over the network to the receiver.

Also at this stage, the item is visible in the queue associated with the replication agent. The items in the queue are shown by querying for pending items under /var/eventing/jobs for the queue id associated with that replication agent.

Transfer to receiver

The responsibility of transferring the replication package over the network lies with the sender AEM instance. The Sling job monitors for items that become available for replication and kicks off the process to transfer them to the receiver.

The Sling job for a queue is synchronous; it processes the first item in the queue, and only when that completes does it pick up the next item for processing.

Processing on receiver

On receiving a replication package, the listener on the receiving end deserializes the received package and installs it, replicating the content onto itself. 

A success status is sent back if all the steps are successful on the receiver side. If any of the steps fails, a failure status is returned, prompting the sender to retry.

Status update and retry

After transferring the replication package, the Sling job on the sender waits for the response from the receiver indicating the status of the replication at the receiving end. 

If the response is successful, it removes the item from the queue (deletes the item from JCR), marking the status of replication as successful.

In case the replication is not successful on the receiving end, the item on the queue is retained for reprocessing. The Sling job then waits for the ‘Retry Delay’ duration to elapse before sending the item to the receiver again.

Friday, June 21, 2019

Applying “Binary less” Replication


When configuring replication agents, one of the options for the serialization type setting is “Binary less”. 

In this post, we will see what binary less replication means, how it behaves, the use cases where it is applicable and an approach that can be used for configuring it.

What is Binary less replication?

Binary less replication means that the binaries are left out of the content being replicated. When replicating an asset, for example, only the metadata of the asset gets replicated. The binaries of the asset, which comprise the original asset and all its renditions, are not included in the replicated content.

When can it be used?

Binary less replication is useful when multiple AEM instances are sharing a common datastore. The binaries are shared through the common datastore and hence there is no need to replicate them to all the instances.

How does it work?

It’s important to understand the way binary less replication works to properly configure it for your scenario. 

While creating a replication package for the content being replicated, the binaries are replaced with their hash code reference. The package with hash code references for the binaries is then sent to the receiver. When the receiver installs the received replication package, it tries to resolve this hash code reference to the binary in its datastore.

Since the receiver shares the same datastore as the sender, it resolves the hash code reference to the actual binary in the datastore and links it wherever the hash code references appear in the replicated content.

If the datastore is not configured properly, or for some reason the receiver is not able to resolve the binary from the hash code reference, it falls back to the default replication mode and redoes the replication with the binaries included in the replication package.

The overall replication does not fail in this case. Whether the binary less replication was successful can be checked in the logs. 

Check for log statements with text patterns “FAILED PATHS START”, “FAILED PATHS END” for details on failed binary less replications and text pattern “set using a reference” for successful binary less replication.

Use cases of Binary less replication

Binary less replication is useful in setups using a shared datastore across instances. One common use case is when all AEM instances (author and all publishers) use a common datastore; the replication agents from the author to all the publishers can be configured with the binary less serialization type, as an asset upload to the author would place the binaries in the datastore, which is shared by all the publish instances as well. 

Special cases - Approach for configuring Binary less replication

Replication configuration should be carefully thought out when we have setups where all instances do not share the same datastore, but instead distinct groups of servers share a datastore. 

Some common setup configurations where this applies are
  1. A shared datastore for all publishers, but a separate data store for author
  2. Separate shared datastore for Primary and DR environments


Let us take case 1, where the author has a separate datastore and all publish instances share a common datastore. 

This configuration is illustrated below:

[Diagram: author with its own datastore; all publish instances sharing a common datastore]

In this case, configuring binary less replication from the author to all publish instances would cause the binary less replication to fail and fall back to default replication in an ad-hoc manner. The asynchronous nature of replicating to all publishers simultaneously causes the number of failed binary less replications to be high.

To overcome this, it is ideal to designate one publish instance as a gateway instance in such scenarios. Configure the author to perform default replication to this gateway publish instance. This makes sure that the binaries get replicated to that single publish instance and get persisted in its datastore (which is also shared by the other publish instances).


Now configure the gateway instance to chain-replicate the content to the other publish instances in binary less mode. Chain replication starts after the successful installation of the content on the gateway instance. This ensures that the binaries are replicated and persisted in the shared datastore through the gateway instance before the binary less replication kicks in for replicating content to the other instances in the cluster.

This configuration is illustrated in the below diagram:

[Diagram: author replicating with binaries to the gateway publish instance, which chain-replicates in binary less mode to the other publish instances]

In cases where the setup includes a DR environment with a separate shared datastore for the publish instances within the DR, the same configuration can be replicated by designating one instance in the DR as a gateway. 

The author in this case should be configured to perform default replication to the gateway instances of both the primary and DR clusters. The gateway instances can then chain-replicate the content to the other instances within their respective clusters.
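
As a rough sketch of the resulting agent setup (the exact field names and the way binary less mode is enabled should be verified against the replication documentation for your AEM version):

    Author -> gateway publish:
        a standard replication agent (default serialization, binaries included)
    Gateway publish -> every other publish instance:
        a replication agent with the "On Receive" trigger enabled (so chain replication
        starts once content is installed on the gateway) and binary less replication
        turned on (for example, via the Binary less serialization type, or by appending
        binaryless=true to the transport URI, depending on the version)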


Limitations with this approach:

Using the approach discussed above to leverage binary less replication through gateway instances does come with a few limitations that we need to be aware of and plan for.

Introduces delay in replication completion
It introduces a delay in the completion of replication. Also, during the interval between replication to the gateway and the completion of chain replication, the content on the gateway instance and on the other instances within a cluster would be out of sync. 

Usually this completes in a few seconds, but it could take longer depending on the load on the system and the number of concurrent replications performed. 

In cases where a content mismatch for even such a short duration is not acceptable, the gateway could be removed from the set of instances serving content to end users. This way, only the other publish instances, which have their content in near real-time sync, would serve content to the end users.


Replication status reflected on author:
Note that with this approach, the status turns green (stating replication successful) as soon as the replication to the gateway instance is successful. This could send a wrong signal to content publishers, especially when there are delays or issues in chain replication.


Gateway instance failure
Another aspect that must be planned for with the above configuration is gateway instance failure. 

When a gateway instance fails for some reason, another instance within the cluster should be promoted as the new gateway, with the replication agents on the author and the new gateway reconfigured to restore a working setup. Be ready with scripts to reconfigure the instances, which can be run in the event of a gateway failure.




Wednesday, June 12, 2019

Steps to plan and perform AEM upgrade – Part 5



Go to: Part 1 | Part 2 | Part 3 | Part 4

Step 14: Perform the upgrade

Finally, here you go… Go ahead and execute the migration as defined in the migration document in step 7 and validated on a test environment in steps 8 and 9.

Do not experiment or deviate when running the migration in production, especially when you are handling a complex installation with a large repository and voluminous content.
Stick to the tested process, making sure you are not skipping or missing anything.

If something goes wrong or you get into a scenario which did not occur in the test configuration, assess the situation and kick off the rollback process if needed.

And if everything goes fine, your migration is complete and you are ready to expose it to the world!!!

Just hold on... not before it’s tested and certified.

Step 15: Test and certify

Make sure your application is sanity tested after the migration. Certify that the application is working fine and open it up to the world.

Step 16: The new normal

Embrace the new normal. Adopt the changes in development, operations and dev ops processes. And somewhere down the line, you will be in for another upgrade!!!

Disclaimer

This is a generalization based on my experience with handling AEM upgrades. Please do check the official upgrade documentation (available at https://helpx.adobe.com/experience-manager/6-5/sites/deploying/using/upgrade.html for AEM 6.5), which contains version-specific instructions and points to check when performing the upgrade.


Go to: Part 1 | Part 2 | Part 3 | Part 4


Steps to plan and perform AEM upgrade – Part 4





Go to: Part 1 | Part 2 | Part 3 | Part 5

Step 11: Train all stakeholders to get them ready

Training is an often overlooked aspect of migration. Make sure you do not miss this important step and cover the training for all the stakeholders, including
  • Business team who use AEM day to day for authoring and publishing
  • Developers and the support team who customize, fix issues that arise and build new features
  • Administrators and the operations team who maintain and manage the environments

The training could even be a self-training where the stakeholders explore the newer version on their own with the aid of documentation or video training materials available. Or it could be a classroom training conducted by a professional services team.

Whatever the mode of training, make sure to cover this important step. It helps bring everyone on board and gets them excited about the new version you will soon be switching to.

Step 12: Finalize the date(s) for performing the migration

You might have had a target period for completing the migration even before the start of this migration exercise. 

One of your stated objectives could be, say, “the migration has to be completed before the end of the mentioned year / quarter” or “before the beginning of a peak season”.

But as you complete your preparation and testing and are nearing the final migration, it is very critical to fix specific dates during which you will carry out the migration.

If your scenario involves putting an authoring freeze during the period of migration, finalize the authoring freeze period. You may want to bring down the web server or disable the gateway to the authoring instance during this period to avoid accidental changes to the content.

Take care to choose a non-peak day and off-peak hours to run the final migration steps that impact end customers. Inform all stakeholders, and declare your downtime if needed. Plan to bring up your application maintenance page during this period.

Step 13: Socialize your fallback options

Things do go wrong and the unexpected does happen at times. Keep all stakeholders apprised of the various fallback options you have in place.

Especially keep all the key decision makers informed so that an informed decision can be taken quickly, if something goes wrong during the execution.

By now you have covered all bases and are prepared and fully ready for the final migration. The final part of this series covers the steps for the final migration run.


Go to: Part 1 | Part 2 | Part 3 | Part 5


Steps to plan and perform AEM upgrade – Part 3


Go to: Part 1 | Part 2 | Part 4 | Part 5

Step 8: Perform test migration and validate

With the preparation done, it’s time to perform a test migration. 

For direct instance migration, cloning the production instance and performing the test migration on the cloned instance is preferred. 

For a fresh install, plan to sync content as closely with production as possible when performing the test migration.

Execute the process as documented in the previous preparation step. Follow the document religiously, updating the document if required to reflect actual execution. 

Cover all aspects of the migration, starting with the pre-migration steps and following through to simulate the cutover. Do not forget to simulate the rollback scenario so that you are prepared for the worst case if the need arises when migrating production.

Make sure to measure and validate the following in the test migration
  • All the objectives set for the migration are achieved
  • The application functions and performs as expected in the target environment
  • You would be able to meet the downtime limits agreed for the migration


Step 9: Tweak the process and repeat test migration (Optional)

This optional step can be applied if required. If you are not able to meet any of your set objectives or if any change is needed in the migration execution, make sure to update the migration document and repeat the test migration starting from the beginning.

You may end up in a situation where you have to go back to the preparation step (step 7) to cover any gaps identified and come back to step 8 to redo the test migration.

It’s essential to make sure a full end-to-end cycle of migration is performed in the test environment to be absolutely sure of repeating the same in production.

By the end of this exercise, you should have performed an end-to-end test migration and have a proven migration document that can be followed for doing the migration in production.


Step 10: Redefine the dev, ops and devops processes

After the successful test migration, you might want to start aligning the development, operations and dev ops processes for the target AEM version. 

Also start planning the migration for your lower environments (Dev, stage, …) to the new version. Update the maintenance procedures for these lower environments for the new AEM version. 

Development can also shift to the new version, and subsequent releases can be planned for it. The DevOps process can now be updated for the new AEM version.

With the test migration successfully completed, we can jump into preparing for the final migration run detailed in the next part


Go to: Part 1 | Part 2 | Part 4 | Part 5

Steps to plan and perform AEM upgrade – Part 2



Go to: Part 1 | Part 3 | Part 4 | Part 5

Step 4: Define the target topology

Based on the objectives for the upgrade and the current state assessment, define the target topology to migrate to the target version. 

This topology defines the deployment architecture of the target environment with all components included (author, publisher, dispatcher, integration systems and third party systems). 

Validate the infrastructure, OS and JRE requirements for the target version to migrate to and arrive at the hardware, OS and software update requirements for the target environment

Step 5: Agree on the migration and cutover downtime

It is extremely critical to agree with all stakeholders on the downtime that the upgrade would need. The approach taken for migration would be influenced by the downtime that can be afforded. 

Also check whether an authoring freeze can be put in place during the migration window so that new content is not created while the migration is happening. 

Applications handling UGC content bring in additional complexity, which has to be factored into the migration.

Step 6: Finalize the approach for upgrade

Based on the objectives, current state assessment and the target topology, finalize the approach for upgrade. 

Basically there are two approaches to choose from for migration
  1. Direct instance upgrade or In-place upgrade where the current instance with all the contents in it is migrated to the target version
  2. New installation in which a fresh install of the target version is done with content extracted from the current instance and migrated over to the target instance

Choose which approach to adopt for your upgrade. 

If the upgrade involves cleaning up a significant part of the content or a change in deployment architecture (like moving from TarMK to MongoMK or vice versa), the new installation approach would be appropriate. 

On the other hand, if you need to retain all revision history and audit logs, and no major cleanup is required, a direct instance upgrade would be appropriate.

Step 7: Prepare for the migration

With the approach for migration decided, it is now time to build the pieces needed to do the migration. This would involve:
  1. Getting the infrastructure for the target state ready, with OS and JRE matching the requirements for the target state. Some spare capacity is required to set up the environments for testing the migration
  2. Updating the application code if required, and making sure it builds and deploys to the new environment and preferably does not use any deprecated APIs
  3. Keeping ready all the maintenance processes (cleanups, compaction…) and backup scripts, to be used before the migration
  4. In cases where content cleanup is required, devising an approach to identify active content and eliminate stale content
  5. Creating scripts and commands to extract and sync content to the target environment
  6. Making ready the scripts for creating the replication agents and dispatcher flush agents for the target environment
  7. Validating the authentication configuration and authorization mechanism in the new version
  8. Creating the test plans to validate the migration; the test plans should cover all the functional and non-functional aspects of the application

The following points apply specifically for the fresh install approach:
  9. Checking the customizations done directly on AEM and moving those to code and scripts where possible. Creating documents & checklists for performing custom configuration in the target environment for items that cannot be moved to code / scripts
  10. Checking the indexes in use in the current application and having the configuration ready to create them in the target environment
  11. In cases where users need to be migrated, creating sync scripts to migrate users and their ACLs
  12. Working out the details for performing the migration on the different components (author, publish and dispatcher)
By the end of this exercise, create a process document and checklists for executing the migration. 

Make it comprehensive and cover all aspects of migration, including the pre-migration steps (running the maintenance procedures, taking backups), migration activities (executing the actual migration), post-migration activities (validation, cutover steps, and covering changes in ops and DevOps processes post migration) and fallback procedures (rollback option).

This aids the execution process greatly and ensures none of the steps gets missed out. Also these documents help bring clarity to all stakeholders during execution.


With the preparation complete, we can jump in to do a test migration on a lower environment, as detailed in the next part.


Go to: Part 1 | Part 3 | Part 4 | Part 5

Steps to plan and perform AEM upgrade – Part 1



In this 5-part series, we will see the steps that are required to plan and perform an AEM version upgrade.

Step 1: Establish the objectives

The first step in the migration process is to clearly lay out the objectives for the migration.

There could be many reasons why you might want to upgrade your AEM instance. Some of the common reasons to upgrade are
  1. The AEM version in use goes out of support and you want to be safe and remain on a supported version
  2. You want to use the new features in a later release and hence need the upgrade
  3. You want to leverage the enhancements and performance improvements in the later releases
  4. You have a bloated CMS that has accumulated a lot of junk content over time, and want to use the upgrade to clean up and trim down your CMS
  5. You may even want to rewrite your application for a newer version and migrate the content over to it
  6. Or it could simply be that you are tech savvy and you don’t want to be on a stale environment or be left behind

The reasons for upgrade could be many. But the important thing to start the upgrade is to clearly establish your reasons for upgrading and their priority. 

These reasons typically form the primary objectives for migration and would significantly influence the decisions taken in the subsequent steps

Secondary objectives

You might be upgrading for one or more primary reasons, but you would also want to leverage the exercise to achieve a few other objectives. List all the other objectives that you want to achieve along with the upgrade. 

Some examples of such objectives are:
  1. Migrating the infrastructure to the cloud
  2. Moving the datastore to S3
  3. Migrating to MongoMK from the current TarMK setup, or vice versa
  4. And so on…

By the end of this exercise, have a listing of all objectives to be achieved with the upgrade, classifying them into primary and secondary objectives. 

This would greatly help in making the right choices in the subsequent steps of the migration

Step 2: Finalize the target version to migrate to

This might seem trivial but it’s prudent to give some consideration before finalizing the target version to migrate to. 

The default choice is to migrate to the latest available version, but consider AEM’s release timeline to decide if you can wait for the next release before performing the migration. 

Or, to be safe, you might not want to move to the version that just got released, but instead move to the latest minus one version so that stability is better assured.

Step 3: Assess the current state

Once you have established the objectives for the upgrade, the next step is to analyze the current state and document it (or update your documentation if there is already one present). 

The current state document should include all the details relevant for migration, including

  1. The topology of the deployment – including all components (Author, Publish, Dispatcher, integrations, third party systems, …) and their specifications
  2. Current infrastructure details
  3. Operating systems and JRE versions the current environment is on
  4. The size of your current repository, and how recently / frequently the compaction and maintenance jobs are run
  5. The sanity and validity of the content accumulated – How much of your current content is junk?
  6. A listing of OOB components and services used by the application
  7. Details of all the custom components and services deployed. How frequently do they change? Is there any development work in progress? What is the release cycle followed? When is the next release planned?
  8. Document all the customizations done on the AEM directly (OSGi configurations, Workflow customizations, Overlays, User/User Groups, ACLs,…)
  9. Document all the indexes used (OOB or custom indexes created). Newer versions do not come with indexes pre-defined and we will have to plan creating all missing indexes when migrating
  10. Document the users and usage volume – No. of users, frequency of access and authentication, authorization mechanisms
  11. Does the application generate UGC content? How much is the volume of content created each day?
  12. Details of integrating systems and the integration approach followed by each. Are there regular feeds into AEM or going out of AEM?
  13. Constraints and limitations in the current setup that you would like to be addressed in the target environment

With the objectives established and detailed analysis done on the current state, we can jump in to define the target state on the decided target AEM version. The next part covers this aspect


Go to: Part 2 | Part 3 | Part 4 | Part 5

Tuesday, June 11, 2019

AEM Upgrade – Pattern Detector



The pattern detector tool in AEM helps in the upfront identification of compatibility issues when upgrading from lower 6.x versions to 6.4 or higher. 

It is available as a separate package and can be installed on the source AEM instance to detect violations in upgrading to the target version.


Download and install this tool in the source AEM instance that needs to be upgraded.

To run this tool, simply access the Pattern Detector option under Status in the AEM system console, or directly navigate to the URL http://localhost:4502/system/console/status-pattern-detector

Accessing this URL executes the tool and returns the result highlighting the suspected violations and compatibility issues in migrating to the target version.

The text and JSON formats of the output can be downloaded by accessing the corresponding URLs.

This tool helps with:
  • Performing a pre-assessment of the source repository for upgrade / migration, which provides insights for evolving the right migration strategy
  • Quantifying and estimating the effort involved in the migration

 



Understanding the AEM release types, release cycle, support period and defining your upgrade cycle



Understanding the different release types of AEM, their release cycle and their support period is important for planning upfront how to upgrade your AEM instance to higher versions (major, minor and patches). You should ideally define an upgrade cycle suitable for your needs that matches the AEM release cycle.

Release types

Full release 

A full release is done every year. This release is often referred to as the main release and could be a major release (version change from 5.x.y to 6.x.y, and so on) or a minor release (version change from 6.1.x to 6.2.x, and so on). 

A full release includes many new features and improvements and is delivered as an installable jar file. The migration path from lower versions to this version is defined and recommended for each full release as part of the release notes.

Service pack

For each full release, a service pack is released every quarter. Service packs are released for a full release version as long as it remains in the core support period (more on support periods below). After this period, no fixes or enhancements are released for that version. Service packs get released as AEM packages on Package Share. 

A service pack may depend on other feature packs or service packs that need to be installed before applying it. Please read the release notes and installation instructions carefully before applying it.

Feature pack

Feature packs typically include enhancements and new features (functionality that is typically meant for the next full release but is made available earlier). There are no defined timelines for feature pack releases. 

A feature pack may have dependencies on other feature packs or service packs that must be installed before applying it.
   

Hot fix

Hot fixes are released on a need basis to address critical issues in the product. Each hot fix package typically addresses a specific issue. 

Hot fixes are typically quickly produced fixes based on customer complaints and are not as thoroughly QA tested. Be cognizant of this when applying a hot fix package. 

If the issue addressed by the hot fix is not pressing in your case, you could choose to wait until the cumulative fix pack (which is better QA tested) that includes the fix becomes available, instead of applying the hot fix immediately.

Cumulative fix pack

Cumulative fix packs, or CFPs, are released every month and include all applicable hot fixes. They may also include some feature packs. They are independent and do not depend on other hot fixes, feature packs or previous CFPs for their installation. 

However, a CFP is released for the latest service pack, and it is essential to have that service pack installed before installing a cumulative fix pack.

Release Cycle

It is advisable to keep track of the timelines for the following AEM releases from Adobe:
  • Full Release – Released annually, typically in April
  • Service Pack – Released quarterly, in the last month of every quarter (Jun, Sep, Dec, Mar)
  • Cumulative Fix Pack – Released monthly for the latest service pack

You could choose to watch for feature packs and hot fixes based on your needs, but keeping track of full release, service pack and CFP releases, and updating your AEM instance with these periodically, is essential for the proper maintenance of an AEM instance.

Support Period

AEM support is defined based on the full release version and the support for all service packs, CFPs and feature packs of this version ceases to apply when the support for the full release version ends.

Support for a full release version falls into 3 support periods

  • Core support for a period of 3 years from the date of release
  • An extended support for an additional 2 years
  • An official self-service support for 1 year beyond this (through online self-help mechanism)


An AEM full release goes out of support 6 years from the date of release of the version (note: not from the date of your purchase, a reason why you should always start with the latest version), of which only the first 5 years come with Adobe professional support. 

Given that the initial implementation or migration to an AEM release takes anywhere from a month to a year (for a reasonably sized project), and providing for some safe zone before the AEM version goes out of support, we would need to plan for an upgrade at least every 3 to 4 years, if not more frequently.

Upgrade strategy

Devising an upgrade strategy and putting in place a process for upgrade is essential for maintaining an AEM installation over a long run. 

The upgrade strategy for AEM falls under two categories
  • Regular upkeep with service packs, CFPs, feature packs and hot fixes
  • Repository migration for main release change


Regular upkeep

This does not involve migration of the repository. Devise a strategy for dealing with each type of release pack.

Hot fixes
The criticality of applying a hot fix depends on the specific scenario. If a product bug surfaces in your environment, you would typically already be in touch with the Adobe support team; plan to install the required hot fixes based on their advice.

Feature pack
This again depends on how urgently the feature or enhancement delivered in the feature pack is needed for your application. Plan to install feature packs based on your needs.

Service Pack
It is desirable to keep your AEM instance updated with the latest service pack. Apart from bringing in new features, enhancements and bug fixes, it makes sure that your environment is up to date and ready for applying future feature packs, CFPs and hot fixes when needed.

Cumulative fix pack
Weigh the need to apply a CFP based on the issues it addresses vis-a-vis the overhead of applying it. Remember that applying the latest CFP brings in the fixes from all the previous applicable CFPs.

Devise a regular upkeep strategy based on your needs. Define the process and put it in place. 

A general best practice is to keep the AEM environment updated with the latest service pack so that it periodically gains the benefit of the fixes, features and enhancements made available.

Repository Migration

Apart from the many other reasons to migrate to higher versions, the AEM support model forces periodic migration of the repository to newer AEM versions. Depending on your application scenario, you could choose to
  • Migrate to every new full release
  • Skip ‘x’ number of full releases before migrating

With every full release, AEM defines the migration path from all the previous versions within the core support period into the current release. So it becomes extra safe to migrate while your AEM version is in the core support period. 

Be cautious of the complexities involved in repository migration as applicable to your case. Repository migration most often involves code updates and content migration and is not in any way a trivial exercise.

Best Practices

Some of the best practices to keep in mind when performing upgrades
  1. Never perform an upgrade directly in production without testing it in lower environments
  2. When testing in lower environment, sync its content as closely as possible with production
  3. Always take a backup of the AEM instance before applying the upgrade
  4. Perform the upgrade both on Author and Publish instances in lower environment and test them
  5. Rebuild the application with the new uber.jar version and other dependencies as applicable and deploy it to the upgraded environment for testing
  6. Finally, thoroughly test the application to make sure that the application meets the functional and non-functional criteria with the upgrades applied

