Having the dispatcher flush agents on the publisher instead of
on the author has its benefits, as detailed here. But configuring the dispatcher
flush agents on the publisher has to be carefully thought through and made
suitable for your environment.
The mapping between the dispatcher and the publisher plays a
crucial role in this design. The dispatcher to publisher relationship could be:
- One to one
- One to many
The one to one configuration is straightforward. All that needs
to be done in this case is to configure the flush agent for the dispatcher on
the one publish instance it is mapped to. This makes sure that for any
resource that gets replicated to that publish instance, a flush request is sent
to its mapped dispatcher and the resource gets flushed from the dispatcher cache.
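As a rough illustration of what such a flush request looks like on the wire, the sketch below builds (but does not send) an invalidation request for a single resource. It assumes the standard Dispatcher invalidation endpoint `/dispatcher/invalidate.cache` with the `CQ-Action` and `CQ-Handle` headers; the host name and content path are hypothetical, and `build_flush_request` is an illustrative helper, not part of any AEM API.

```python
from urllib.request import Request


def build_flush_request(dispatcher_url: str, resource_path: str) -> Request:
    """Build (but do not send) a cache invalidation request for one resource."""
    return Request(
        url=dispatcher_url.rstrip("/") + "/dispatcher/invalidate.cache",
        method="POST",
        headers={
            "CQ-Action": "Activate",     # the resource was activated (replicated)
            "CQ-Handle": resource_path,  # path of the resource to flush
            "Content-Length": "0",       # the request carries no body
        },
    )


# Hypothetical dispatcher host and content path, for illustration only.
req = build_flush_request("http://d1.example.com", "/content/site/en/page")
print(req.get_method(), req.full_url)
```

In a one to one setup, the flush agent on the publish instance sends a request of this kind to its single mapped dispatcher after each replication.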
But the one to many mapping configuration poses issues that
need to be resolved as per the application requirements.
The key question to answer is: when one dispatcher is mapped to
multiple publishers, how many and which publisher(s) should invalidate that dispatcher's
cache? Can many publishers flush the cache of a single dispatcher? Would this configuration lead to some form of race condition due to the asynchronous nature of replication
and cache invalidation? These questions need to be carefully analyzed based on
the application scenarios.
Typically, the single
dispatcher to multiple publisher mapping configuration falls into one of the two
categories below:
- Aggregation configuration, where a single dispatcher aggregates the request flow to two or more publishers
- Composition configuration, where each dispatcher in the environment is connected to all publishers in the environment
Aggregation Configuration
In this configuration, the dispatcher acts as an aggregator for
two or more publish instances. A simple form of aggregation configuration is
depicted in the diagram below.
In this configuration, an optimal solution would be to
configure the flush agents for a dispatcher on all the publish instances that
the dispatcher aggregates. For the above configuration, configure flush agents for D1 on
both P1 and P2, and configure the flush agents for D2 on both P3 and
P4.
This would result in duplicate flushing of cached content on
the dispatcher, but it would keep a race condition from leaving stale content in the cache.
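The rule "every publisher behind a dispatcher gets a flush agent for that dispatcher" can be sketched as a simple inversion of the topology. This is only an illustration: the D1/D2 and P1 to P4 names come from the example above, and `flush_agents_per_publisher` is a hypothetical helper, not an AEM facility.

```python
def flush_agents_per_publisher(aggregation: dict) -> dict:
    """Invert dispatcher -> publishers into publisher -> dispatchers to flush."""
    agents = {}
    for dispatcher, publishers in aggregation.items():
        for publisher in publishers:
            # Each aggregated publisher needs a flush agent for this dispatcher.
            agents.setdefault(publisher, []).append(dispatcher)
    return agents


# Topology from the example: D1 aggregates P1 and P2, D2 aggregates P3 and P4.
topology = {"D1": ["P1", "P2"], "D2": ["P3", "P4"]}
agents = flush_agents_per_publisher(topology)
for publisher, dispatchers in sorted(agents.items()):
    print(publisher, "flushes", dispatchers)
```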
A worst case scenario
that could occur in this configuration on activation of a resource R1 would
be the following sequence:
- Replication requests for R1 get placed for P1 and P2
- R1 gets replicated to P1; replication to P2 gets delayed
- P1 flushes R1 from D1
- A user requests R1 from D1
- D1 does not have R1 in cache and goes to P2 to fetch R1
- P2 serves the older version of R1, as replication has not happened on P2 yet
- D1 caches the older version of R1 again and serves it as the response to the user request
- Now replication of R1 to P2 happens
- At this stage P2 flushes R1 from D1, so the older version that got cached gets flushed
- A subsequent user request for R1 will now cache the new version of R1, as both P1 and P2 have the newer version of R1 at this stage
Though this configuration results in duplicate cache
flushing from multiple publishers on a single dispatcher, it makes sure that stale content does not live long in the dispatcher
cache due to race conditions.
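The worst case sequence can be replayed with a toy in-memory model to see why the second flush matters. This is a simplified sketch, not a model of real Dispatcher internals: the `Publisher` and `Dispatcher` classes, the version labels `v1`/`v2`, and the `p2_flushes` switch are all assumptions made for illustration.

```python
class Publisher:
    """Toy publish instance: holds one version per resource path."""
    def __init__(self):
        self.content = {}


class Dispatcher:
    """Toy dispatcher: caches whatever the chosen publisher renders."""
    def __init__(self):
        self.cache = {}

    def flush(self, path):
        self.cache.pop(path, None)

    def get(self, path, publisher):
        if path not in self.cache:  # cache miss: fetch and cache
            self.cache[path] = publisher.content[path]
        return self.cache[path]


def run_scenario(p2_flushes: bool) -> str:
    """Replay the sequence above; return what D1 finally serves for R1."""
    p1, p2, d1 = Publisher(), Publisher(), Dispatcher()
    p1.content["R1"] = p2.content["R1"] = "v1"
    d1.get("R1", p1)            # D1 starts with v1 cached

    p1.content["R1"] = "v2"     # R1 replicated to P1, P2 is delayed
    d1.flush("R1")              # P1 flushes R1 from D1
    d1.get("R1", p2)            # user request: D1 re-caches stale v1 from P2

    p2.content["R1"] = "v2"     # replication to P2 finally happens
    if p2_flushes:
        d1.flush("R1")          # P2 flushes the stale copy from D1
    return d1.get("R1", p2)     # subsequent user request


print("flush agents on P1 and P2:", run_scenario(p2_flushes=True))
print("flush agent on P1 only:  ", run_scenario(p2_flushes=False))
```

With a flush agent on both publishers the final request serves the new version; with a flush agent on P1 only, the stale copy stays in the cache until something else invalidates it.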
Composition configuration
In this configuration, dispatchers and publishers are mapped
in a many to many configuration. Its advantages are that a dispatcher can load
balance across multiple publishers and the same publisher can be the renderer
for multiple dispatchers, thus providing maximum fault tolerance.
A simple form
of this configuration is depicted below.
In this configuration, each dispatcher is connected to all
the publishers. Another simple and very common configuration, depicted below,
in which the dispatchers are connected to an external load balancer that
distributes the requests across the publish instances, also
results in the same composition mapping scenario.
While it provides maximum fault tolerance, this
configuration is not optimal for handling dispatcher cache flushing, especially
when it comes to applications with frequently changing content.
Options that can
be considered for dispatcher cache flush configuration are:
- All publishers flush the cache of all dispatchers. This causes network overhead and too many redundant cache flushes
- A minimal subset of publishers acts as flushing publishers for each dispatcher. This reduces the chance of a race condition, though it does not completely avoid it
- Have one flushing publisher for each dispatcher. This could lead to a race condition
- Have flushing of the dispatcher cache done from the author. This is simple to configure and maintain but could lead to a race condition
- Purge the dispatcher cache periodically through an external mechanism (say, curl fired periodically through cron)
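To get a feel for the overhead of the first few options, the following sketch counts the flush requests triggered by activating a single resource. It assumes one flush request per (flushing instance, dispatcher) pair; the 4 by 4 topology and the function name are hypothetical.

```python
def flush_requests_per_activation(dispatchers: int, flushers_per_dispatcher: int) -> int:
    """One flush request per (flushing instance, dispatcher) pair."""
    return dispatchers * flushers_per_dispatcher


# Hypothetical composition topology: 4 publishers, 4 dispatchers.
publishers, dispatchers = 4, 4
options = {
    "all publishers flush all dispatchers": publishers,
    "subset of 2 flushing publishers per dispatcher": 2,
    "one flushing publisher per dispatcher": 1,
    "flush from author": 1,
}
counts = {
    name: flush_requests_per_activation(dispatchers, flushers)
    for name, flushers in options.items()
}
for name, count in counts.items():
    print(f"{name}: {count} flush requests")
```

Only one flush per dispatcher is strictly needed once all publishers hold the new version, so anything above that is redundancy traded for a smaller stale-content window.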
As a best practice, avoid the composition mapping
configuration. Consider the dispatcher and publisher combined (through one to
one mapping or aggregation configuration) as a single unit for scaling the publish
side capacity.
In cases where the composition configuration is unavoidable,
supplement it with periodic purging of the dispatcher cache through an external
mechanism, so that stale content that gets cached in rare cases does not live long on the dispatcher.