DuplicateContentCheck
The DuplicateContentCheck microservice classifies input messages as duplicate or unique, based on the content found in the input XML at the XPath provided using CPS. Depending on the Cache Duration Time and Sleep Interval, input messages are retained in the cache and sent out at each sleep interval.
Configuration
Figure 1: Component Property Sheet
Validate Input
- yes: The microservice tries to validate the input received.
- no: The microservice does not validate the input, which improves performance.
Setting this to "no" may cause undesired results if the input XML is not valid.
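As a rough illustration of what skipping validation gives up, the sketch below checks only that an input string is well-formed XML. The function name `is_well_formed` is hypothetical, and the microservice itself may validate more strictly (for example, against the configured schema); this is an assumption-laden sketch, not the component's implementation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise.

    Hypothetical helper: checks well-formedness only, not schema validity.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<Book><Author>Jane</Author></Book>"))
    print(is_well_formed("<Book><Author>Jane</Book>"))
```

With Validate Input set to "no", an input like the second one would reach the duplicate check unchecked, which is where undesired results could come from.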
Error handling configuration
Configures the actions to be taken when an exception occurs during execution. The following error categories can be configured:
- JMS Error
- Response Generation Error
- Request Processing Error
- Invalid Request Error
For descriptions, refer to the Error Handling section in the Common Configurations page.
Input Source
The type of source to be taken as input. Choose one of the following:
- XML
- Text
Configure schema of input
Use the schema editor to load or enter the XML schema for the input message. This property is visible when the Input Source is set to XML.
Figure 2: Configure input schema
Sleep Interval
The interval (in milliseconds) at which duplicate messages are throttled through the Duplicate port.
Cache Duration Time
The duration for which a message is retained in the cache. If the duration elapses while the cache is not empty, the cache is not cleared until all messages have been throttled to the Duplicate port.
Cache Duration Time Unit
The unit of the cache duration time can be set here. While the default unit is milliseconds, other options are:
- Seconds
- Minutes
- Hours
- Days
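To make the relationship between the duration value and its unit concrete, the following sketch normalizes a configured duration to milliseconds. The table `MILLIS_PER_UNIT` and the function `cache_duration_millis` are illustrative names, not part of the product.

```python
# Hypothetical lookup table: milliseconds per configurable unit.
MILLIS_PER_UNIT = {
    "Milliseconds": 1,
    "Seconds": 1_000,
    "Minutes": 60_000,
    "Hours": 3_600_000,
    "Days": 86_400_000,
}


def cache_duration_millis(value: int, unit: str = "Milliseconds") -> int:
    """Convert a cache duration in the given unit to milliseconds."""
    if value < 0:
        # Negative durations are rejected, mirroring the CPS restriction.
        raise ValueError("cache duration must be non-negative")
    return value * MILLIS_PER_UNIT[unit]


if __name__ == "__main__":
    print(cache_duration_millis(2, "Minutes"))  # prints 120000
```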
Duplicate Identifier Source
A message is checked against earlier messages for duplication based on one of the following:
- Body (Text or XML)
- Header
XPath
Choose the XPath from which the content should be extracted. This property is visible when the Duplicate Identifier Source is set to BODY.
Figure 3: Configure XPath
Property Name
Specify the name of the property whose value is used to check whether the message is a duplicate of an earlier message. This property is visible when the Duplicate Identifier Source is set to HEADER.
For more information on selecting an XPath, refer to the XPath Editor section in the Message Body XPath selector page.
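The two identifier sources can be pictured with the sketch below, which pulls the comparison value either from the XML body or from a message header. Everything here is an assumption for illustration: the function `duplicate_identifier`, the sample message, the header name `OrderId`, and the use of Python's `ElementTree`, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical sample message; the real input schema is configured in the CPS.
SAMPLE_BODY = "<Book><Author>Jane</Author><Title>Notes</Title></Book>"


def duplicate_identifier(source: str, body: str,
                         headers: Dict[str, str], selector: str) -> Optional[str]:
    """Return the value used to decide whether a message is a duplicate.

    source   -- "BODY" (selector is an XPath) or "HEADER" (selector is a property name)
    Returns None when the XPath or property matches nothing.
    """
    if source == "BODY":
        root = ET.fromstring(body)
        # The path is evaluated relative to the root element.
        return root.findtext(selector)
    if source == "HEADER":
        return headers.get(selector)
    raise ValueError(f"unknown Duplicate Identifier Source: {source}")


if __name__ == "__main__":
    print(duplicate_identifier("BODY", SAMPLE_BODY, {}, "./Author"))
    print(duplicate_identifier("HEADER", SAMPLE_BODY, {"OrderId": "42"}, "OrderId"))
```

Two messages whose extracted values are equal would be treated as duplicates of each other under this model.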
Functional Demonstration
Scenario 1
Configure the component as shown in Figure 1 and send input messages using Feeder. Messages with the same "Author" are classified as duplicate or unique and sent to the display microservices connected to the respective ports.
Figure 4: Sample Event Process
Sample Input
Send 10 messages with the same "Author" from Feeder.
Figure 5: Sending messages with same "Author" using Feeder
Sample Output
Unique port
The display window for the Unique port shows only 1 message. It is unique because the cache holds no messages when it arrives.
Figure 6: Unique Port display window showing the unique message
Duplicate port
The display window for the Duplicate port shows 9 messages, since they have the same Author as the first message.
Figure 7: Duplicate Port display window showing messages having the same Author as the first message
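The 1 unique / 9 duplicate split in this scenario can be reproduced with a small simulation. It is only a sketch: the message layout, the `./Author` path, and the function `classify` are assumptions, and the cache is modeled as a plain set with no Cache Duration Time or Sleep Interval.

```python
import xml.etree.ElementTree as ET


def classify(messages, xpath="./Author"):
    """Split messages into (unique, duplicate) lists by the value at xpath."""
    seen = set()  # stands in for the component's cache
    unique, duplicate = [], []
    for msg in messages:
        key = ET.fromstring(msg).findtext(xpath)
        if key in seen:
            duplicate.append(msg)
        else:
            seen.add(key)
            unique.append(msg)
    return unique, duplicate


# Ten hypothetical messages that share one Author but differ in Title.
messages = [
    f"<Book><Author>Jane</Author><Title>T{i}</Title></Book>" for i in range(10)
]
unique, duplicate = classify(messages)
print(len(unique), len(duplicate))  # prints: 1 9
```

The first message finds the cache empty and goes to the unique list; the remaining nine match its Author and go to the duplicate list.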
When the component is restarted, some acknowledgments may not have been received, so messages are redelivered. Such messages are sent to the Duplicate port.
To reduce the number of duplicate messages, set the Acknowledge Mode to "Auto" on the input port of the microservice: click the DuplicateContentCheck input port and select "Auto" for the Acknowledge Mode property under Properties > Messaging.
Figure 8: Setting the Acknowledge mode to "Auto" to reduce the number of duplicate messages
Useful Tips
- CPS does not accept negative values for Sleep Interval and Cache Duration Time. Pressing the Validate button shows a Success message only if the values entered are non-negative.
- If the Cache Duration Time is zero, the component throttles messages to the Unique port with no gap, even if the Sleep Interval is non-zero.
- If the Sleep Interval is zero, the cache is cleared as per the Cache Duration Time. No messages are lost while the cache is cleared.
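The non-negativity rule in the first tip amounts to a simple check, sketched below. The function `validate_timing` and its messages are hypothetical and do not reflect the CPS's actual validation code or wording.

```python
from typing import List


def validate_timing(sleep_interval_ms: int, cache_duration: int) -> List[str]:
    """Return a list of validation errors; an empty list means success."""
    errors = []
    if sleep_interval_ms < 0:
        errors.append("Sleep Interval must be non-negative")
    if cache_duration < 0:
        errors.append("Cache Duration Time must be non-negative")
    return errors


if __name__ == "__main__":
    print(validate_timing(1000, 5))   # prints []
    print(validate_timing(-1, 5))
```

Zero is accepted for both values, which is what makes the two zero-value cases described in the tips reachable.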