Robert A Decker Programming Repository


Notes and articles that will reduce the pain

Simple Communication With React Components via Simple Javascript Events

Robert Decker - Monday, January 29, 2018

Here is a simple example of communicating with React components through plain JavaScript events. I'm presenting this here because of the difficulty I had finding a good example online. Most of the online examples had you firing the event on the component itself, or they were using eventing libraries or state libraries like Redux. Sometimes we just want something simple.

I'm using code like this in a Ruby on Rails 5.1 library with the react-rails gem. I have data coming in to the client from an ActiveMQ server that the client communicates with via Websockets and JavaScript. Some of this data is then sent to my React components to update the page.

First, here is a button on the page that fires our custom event. I fire the event on the window element, but it could be fired on a lower element in the DOM and will then bubble up to the window.
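The button markup itself is not reproduced in this listing. As a minimal sketch of the idea, and not necessarily the original markup, firing the event could look something like the following. The event name and the detail.message field match what the component below reads; the helper name fireSmsCustomEvent, the element id, and the message text are assumptions made for illustration.

```javascript
// Hypothetical helper: dispatches "sms-custom-event" on the given target.
// The payload travels in event.detail, which the component reads as e.detail.message.
function fireSmsCustomEvent(target, message) {
  var event = new CustomEvent("sms-custom-event", {
    detail: { message: message },
    bubbles: true // lets the event bubble up to window when fired on a lower element
  });
  return target.dispatchEvent(event);
}

// In a browser, wire the helper to a button click (the element id is an assumption).
if (typeof document !== "undefined") {
  var button = document.getElementById("fire-event-button");
  if (button) {
    button.addEventListener("click", function () {
      fireSmsCustomEvent(window, "title from button");
    });
  }
}
```

Passing the target in as a parameter keeps the helper usable on window or on any lower element in the DOM.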




And here is our React component. It registers itself to listen for our custom event on the window element. When the event comes in it extracts the message and places it on the page.
window.ExternalEventExample = createReactClass({

    getDefaultProps: function() {
        return {title: "title from component"};
    },

    getInitialState: function() {
        return {title: this.props.title};
    },

    componentDidMount: function() {
        console.log('componentDidMount');
        window.addEventListener("sms-custom-event", this.handleSMSCustomEvent, true);
    },

    componentWillUnmount: function() {
        console.log('componentWillUnmount');
        window.removeEventListener("sms-custom-event", this.handleSMSCustomEvent, true);
    },

    handleSMSCustomEvent: function(e) {
        console.log("sms-custom-event triggered message:" + e.detail.message);
        this.setState({
            title: e.detail.message
        });
    },

    render: function() {
        // the wrapper element was lost from this listing; a plain div is assumed here
        return (
            <div>{this.state.title}</div>
        );
    }
});
Nice and simple!

The important methods are componentDidMount and componentWillUnmount where we register and deregister our component for the sms-custom-event Event.

Escaping emojis

Robert Decker - Thursday, November 16, 2017

While working in Java with emojis you have to deal with surrogate pair characters - these appear as one character in a UI, for example, but in the background they're actually two characters.

Java's String class lets you pull out code points, which can be single chars or surrogate pairs. However, in Java 7 there's no convenient way to iterate through these code points (Java 8 adds a codePoints method that returns an IntStream you can iterate through).

With Java 7 you can use a Character BreakIterator to iterate over each character, letting you extract plain characters and surrogate pairs. In the following code I escape the surrogate pairs into html entities.
		StringBuffer message = new StringBuffer();
		String str = "🤯😂ab春♞aáéí";
		BreakIterator ci = BreakIterator.getCharacterInstance(java.util.Locale.ENGLISH);
		ci.setText(str);
		int start = ci.first();
		for (int end = ci.next(); end != BreakIterator.DONE; start = end, end = ci.next()) {
			message.append(end - start >= 2 ? "&#" + str.codePointAt(start) + ";" : str.charAt(start));
		}
		_log.debug(message.toString());
Output:
&#129327;&#128514;ab春♞aáéí
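JavaScript strings are also UTF-16, so the same escaping can be sketched on the client side. This is not the Java approach above: instead of a BreakIterator it relies on for...of, which walks a string by code point rather than by user-perceived character. The function name escapeSurrogatePairs is an assumption for illustration.

```javascript
// Escapes every code point above U+FFFF (stored as a surrogate pair in UTF-16)
// as a decimal HTML entity and leaves all other characters untouched.
function escapeSurrogatePairs(str) {
  var out = "";
  // for...of yields whole code points, so a surrogate pair arrives as one item of length 2
  for (var ch of str) {
    out += ch.length === 2 ? "&#" + ch.codePointAt(0) + ";" : ch;
  }
  return out;
}

console.log(escapeSurrogatePairs("🤯😂ab春♞aáéí"));
```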


On StackOverflow you'll see solutions that use the InEmoticons character set, or that search a range of characters. However, I couldn't find a good combination of regexes to get all of these surrogate pairs. For example, the following misses the first emoticon above:
Pattern emoticons = Pattern.compile("\\p{InEmoticons}");
Pattern emoticons = Pattern.compile("([\\x{1F601}-\\x{1F64F}])");
You'll also see suggestions to use the EmojiParser library but again, that misses the first emoticon above.

iOS Swift Array Bogosort

Robert Decker - Saturday, November 04, 2017

Sadly iOS's Swift Array class doesn't have a bogosort function. Here is my implementation.
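import Foundation

extension Array where Element: Comparable {
    // first, add a function to Array that returns a boolean for whether the array is sorted or not, > or <
    func isSorted(_ isOrderedBefore: (Element, Element) -> Bool) -> Bool {
        // empty and single-element arrays are already sorted (and 1..<0 would crash)
        guard count > 1 else { return true }
        for i in 1..<count {
            if !isOrderedBefore(self[i-1], self[i]) {
                return false
            }
        }
        return true
    }

    // we also need a function that randomizes the array
    // (on Swift 4.2 or later, shuffle() is the simpler choice)
    mutating func randomize() {
        self.sort(by: {_, _ in arc4random() % 2 == 0})
    }

    // and finally we add the bogosort function
    // <= rather than < so that arrays with duplicate values can terminate
    mutating func bogoSort() {
        while !isSorted(<=) {
            randomize()
        }
    }
}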
Output:
var arr: [Int] = [0, 4, 2, 3, 5]
[0, 4, 2, 3, 5]

arr.bogoSort()
[0, 2, 3, 4, 5]

It only had to randomize the array 456 times.

Counting Bytes and Chars for sending SMS

Robert Decker - Tuesday, October 31, 2017

Here I present code snippets and information on sending SMS messages through a tier-1 SMS provider, particularly information on char/byte counting in Java and JavaScript.

1. Introduction

SMS (Short Message Service) is the most common means of communication on the planet and is available on every mobile phone. By using a tier-1 SMS provider that has made agreements with mobile telephony companies in every part of the world you can communicate with every cellphone on the planet, which means 80% of people in Africa, and nearly 100% of all people everywhere else, all without developing an app that users must download.

2. SMS Message Sizes and Content

When sending SMS messages you are limited to 1120 bits per message. If you send a multi-part SMS you must also include header information that tells the handset how to stitch the parts together, which reduces the space left for the message body. There are three encodings you can use: GSM, which is a limited set of 7-bit characters; Binary, which is 8-bit; and Unicode, which is 16-bit.

(It's actually a bit more complicated than this - for example, there are different character sets in the GSM standard for different parts of the world, so you're not working with the same character set if you send to Portugal vs sending to the USA - but that's a different topic...)


Mode     Message type  Bits  Character size  Header size*  Characters
GSM      single SMS    1120  7-bit           0 bits        1120/7 = 160
GSM      multi-part    1120  7-bit           48 bits       (1120-48)/7 = 153
Binary   single SMS    1120  8-bit           0 bits        1120/8 = 140
Binary   multi-part    1120  8-bit           48 bits       (1120-48)/8 = 134
Unicode  single SMS    1120  16-bit          0 bits        1120/16 = 70
Unicode  multi-part    1120  16-bit          48 bits       (1120-48)/16 = 67
* header size can vary but 48 bits is the minimum
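The arithmetic in the table can be reproduced with a few lines of JavaScript. This is only a sketch of the calculation, assuming the minimum 48-bit header; the constant and function names are made up for illustration.

```javascript
// Total payload of one SMS in bits, taken from the table above.
var SMS_BITS = 1120;
// Minimum header size for a multi-part SMS, in bits (it can be larger).
var MULTIPART_HEADER_BITS = 48;

// Hypothetical helper: characters that fit in one message for a given
// character size, with or without the multi-part header.
function charsPerMessage(bitsPerChar, isMultiPart) {
  var headerBits = isMultiPart ? MULTIPART_HEADER_BITS : 0;
  // partial characters cannot be sent, so round down
  return Math.floor((SMS_BITS - headerBits) / bitsPerChar);
}

[7, 8, 16].forEach(function (bits) {
  console.log(bits + "-bit: single " + charsPerMessage(bits, false) +
              ", multi-part " + charsPerMessage(bits, true));
});
```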

The GSM 7-bit character set is a subset of the ISO-8859-1 (latin1) character set, which goes from 0x00 to 0xFF in hex. In the following table the cells marked -- are not available in this subset. By using this subset we are able to assign 7 bits per character rather than 8 bits.
 ISO-8859-1 Hex Codes With Valid GSM Characters
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
-- -- -- -- -- -- -- -- -- -- LF -- -- CR -- --
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
SP !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
@  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
P  Q  R  S  T  U  V  W  X  Y  Z  -- -- -- -- _
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
-- a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
p  q  r  s  t  u  v  w  x  y  z  -- -- -- -- --
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
-- ¡  -- £  ¤  ¥  -- §  -- -- -- -- -- -- -- --
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- ¿
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
-- -- -- -- Ä  Å  Æ  Ç  -- É  -- -- -- -- -- --
D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
-- Ñ  -- -- -- -- Ö  -- Ø  -- -- -- Ü  -- -- ß
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
à  -- -- -- ä  å  æ  -- è  é  -- -- ì  -- -- --
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
-- ñ  ò  -- -- -- ö  -- ø  ù  -- -- ü  -- -- --


3. Building SMS Messages


First, we have to define some constants and enums.

ISO88591_SUBSET is a byte array of each character in the 7-bit GSM subset.

The enum SmsFormat defines the three modes of sending an SMS and their sizes, both full message size and the reduced size when sending a multi-part SMS.

java:
// these are the valid ISO-8859-1 subset characters that can be sent as 7-bit characters in a GSM SMS
public static byte[] ISO88591_SUBSET = new byte[] { 
	0x0A, 0x0D,
	0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
	0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
	0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F,
	0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5A, 0x5F,
	0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F,
	0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7A,
	(byte) 0xA1, (byte) 0xA3, (byte) 0xA4, (byte) 0xA5, (byte) 0xA7,
	(byte) 0xBF,
	(byte) 0xC4, (byte) 0xC5, (byte) 0xC6, (byte) 0xC7, (byte) 0xC9,
	(byte) 0xD1, (byte) 0xD6, (byte) 0xD8, (byte) 0xDC, (byte) 0xDF,
	(byte) 0xE0, (byte) 0xE4, (byte) 0xE5, (byte) 0xE6, (byte) 0xE8, (byte) 0xE9, (byte) 0xEC,
	(byte) 0xF1, (byte) 0xF2, (byte) 0xF6, (byte) 0xF8, (byte) 0xF9, (byte) 0xFC
};

// SMS Message sizes
public static final int SMS_7BIT_SIZE = 160;
public static final int SMS_8BIT_SIZE = 140;
public static final int SMS_16BIT_SIZE = 70;
public static final int SMS_7BIT_SIZE_SPLIT = 153;
public static final int SMS_8BIT_SIZE_SPLIT = 134;
public static final int SMS_16BIT_SIZE_SPLIT = 67;

// an enum for sms message formats
public static enum SmsFormat {
	// create the enums
	TEXT ("Text", SMS_7BIT_SIZE, SMS_7BIT_SIZE_SPLIT),
	BINARY ("Binary", SMS_8BIT_SIZE, SMS_8BIT_SIZE_SPLIT),
	UNICODE ("Unicode", SMS_16BIT_SIZE, SMS_16BIT_SIZE_SPLIT)
	;
    private final String formatName;
    private final int messageSize;
    private final int splitMessageSize;

    SmsFormat(String formatName, int messageSize, int splitMessageSize) {
        this.formatName = formatName;
        this.messageSize = messageSize;
        this.splitMessageSize = splitMessageSize;
    }
    public String formatName() {
        return this.formatName;
    }	    
    public int messageSize() {
        return this.messageSize;
    }	    
    public int splitMessageSize() {
        return this.splitMessageSize;
    }
}
javascript:

In javascript, we define the subset characters. There is no need for the enum.

// these are the valid ISO-8859-1 subset characters that can be sent as 7-bit characters in a GSM SMS
var basic_chars_hex = [0x0A, 0x0D,
                  0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
                  0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
                  0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F,
                  0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5A, 0x5F,
                  0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F,
                  0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7A,
                  0xA1, 0xA3, 0xA4, 0xA5, 0xA7,
                  0xBF,
                  0xC4, 0xC5, 0xC6, 0xC7, 0xC9,
                  0xD1, 0xD6, 0xD8, 0xDC, 0xDF,
                  0xE0, 0xE4, 0xE5, 0xE6, 0xE8, 0xE9, 0xEC,
                  0xF1, 0xF2, 0xF6, 0xF8, 0xF9, 0xFC];

When we have a string to send as SMS we need to determine the mode in which we send it. If possible, we try to send it in the GSM character set because we're able to fit more characters. If not, we have to send it as Unicode, which has a limit of 70 characters per message (67 per part once the message is split). A 160-character message would then need three parts (67+67+26=160), potentially tripling your SMS charges.

We need two methods for this. First, given a byte, is it in the character subset? Second, examine each character (byte) in the string using the first method and return the correct format based on the characters in the string.

java:

	public static boolean isValidSMSSubsetByte(byte b) {
		for (byte aByte : ISO88591_SUBSET) {
			if (b == aByte) {
				return true;
			}
		}
		return false;
	}

	public static SmsFormat smsCharacterSetSmsFormat(String str) {
		// examine all characters, but if we hit a unicode character then return right away
		for (char c : str.toCharArray()) {
			if (c > 255 || (! SMSUtilities.isValidSMSSubsetByte((byte)c))) {
				// character is outside of the ISO-8859-1 character set or it is in the character set but not the subset
				return SmsFormat.UNICODE;
			}
		}
		return SmsFormat.TEXT;
	}

javascript:

We have the same two methods in javascript, one method to check if a character is in the GSM subset, and the next to examine a string to see what length message we're allowed.

This is where I stop with the javascript code. I'm only using javascript to give feedback to the users while they build the SMS message, to let them know if the message can be sent in a single SMS or if it has to be split.
function isBasicChar(code) {
	for (var j = 0; j < basic_chars_hex.length; j++) {
		if (basic_chars_hex[j] == code) {
			return true;
		}
	}
	return false;
}

function smsCharacterSetMaxSize(str) {
	var SMS_ISO8859_SUBSET_SIZE = 160;
	var SMS_UNICODE_SIZE = 70;

	// examine all characters, but if we hit a unicode character then return right away
	var i = str.length;
	while (i--) {
		var code = str.charCodeAt(i); // uses javascript charCodeAt string method
		if (code > 255 || !isBasicChar(code)) {
			return SMS_UNICODE_SIZE;
		}
	}
	return SMS_ISO8859_SUBSET_SIZE;
}
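Once the maximum size is known, the user feedback can go one step further and report how many SMS parts a message would need. The following is a sketch under the assumption that the limits from the table in section 2 apply (160/153 for GSM, 70/67 for Unicode); the function name smsPartCount is made up for illustration. For Unicode messages the length passed in should be str.length, which counts 16-bit code units.

```javascript
// Hypothetical helper: how many SMS parts a message of `length` characters needs.
// singleSize is the limit for one message (160 or 70) and splitSize is the
// reduced per-part limit once a multi-part header is required (153 or 67).
function smsPartCount(length, singleSize, splitSize) {
  if (length <= singleSize) {
    return 1; // fits in one message (an empty message is counted as one part here)
  }
  return Math.ceil(length / splitSize);
}

console.log(smsPartCount(160, 160, 153)); // 160 GSM characters: 1 part
console.log(smsPartCount(160, 70, 67));   // 160 Unicode characters: 3 parts (67 + 67 + 26)
```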

3.1 Counting Unicode Bytes

When we send a Unicode SMS we're limited to 70 characters because we're sending 2-byte characters. SMS uses the UCS-2 encoding which encodes 65,536 characters (up to FFFFh). UTF-16 encodes up to 1,114,112 characters (up to 10FFFFh) and so converting between the two isn't completely straightforward, but we can ignore the extra characters encoded in UTF-16 for now.

UCS-2 doesn't have a byte order mark (BOM) and so is always big endian, and UCS-2 does not support surrogate pairs.

Not all characters have a length of 1 when you query a Java string - some single characters are represented by surrogate pairs. For example, if you had a Java string with a single emoji, FACE WITH TEARS OF JOY (😂), the string will have a length() of 2, not 1. The character is composed of a high surrogate and a low surrogate. This single character ends up taking 4 bytes, while 春 is a single char, not a surrogate pair, yet takes 3 bytes in a UTF-8 encoded string.

The UCS-2 character set doesn't exist in Java but by using the character set UTF-16BE (16-bit big endian byte order) we get pretty close to the same thing, except that surrogate pair characters are two separate characters and no longer linked, and we don't have access to the entire UTF-16 character set.

While we're counting characters to keep under the Unicode limit, we should actually be counting the bytes of each character in the string. And if we have to split the string we should not split in the middle of a surrogate pair (although this probably doesn't matter when sending an SMS).

To count characters, instead of using some of the obvious methods on Java's String class we use a Character BreakIterator to iterate over what people would normally consider the characters in the String. With the Character BreakIterator we can extract both halves of a surrogate pair, such as 😂, together.

java:

The following code will run through the String mixStr and split it into segments of at most 5 bytes.
		Charset UTF16BE = Charset.forName("UTF-16BE");
		String mixStr = "a😂b春♞aáéí";
		System.out.println("\"" + mixStr + "\"" + " java String.length:" +  mixStr.length() + " #bytes:" + mixStr.getBytes(UTF16BE).length);
		// create the BreakIterator and set the text we want to examine
		BreakIterator ci = BreakIterator.getCharacterInstance(java.util.Locale.ENGLISH);
		ci.setText(mixStr);

		int bytesLimit = 5; // limit of each string we're creating
		int byteCount = 0;
		StringBuffer currentPiece = new StringBuffer();
		Vector<String> strings = new Vector<String>(); // substrings split into bytesLimit or less
		int start = ci.first();
		for (int end = ci.next(); end != BreakIterator.DONE; start = end, end = ci.next()) {
			System.out.println("start:" + start + " end:" + end + " str:" + mixStr.substring(start,end) + " length:" + mixStr.substring(start,end).length() + " #bytes:" + mixStr.substring(start,end).getBytes(UTF16BE).length);
			char[] chars = new char[(end - start)]; // size of char array is based on number characters from the iterator
			mixStr.getChars(start, end, chars, 0); // fill the char array
			byte[] bytes = new String(chars).getBytes(UTF16BE); // get the number of bytes that are in the char array
			if (byteCount + bytes.length > bytesLimit) {
				// we are beyond our limit of bytes so we save the current stringbuffer as a string and start a new stringbuffer
				strings.add(currentPiece.toString()); 
				currentPiece = new StringBuffer();
				byteCount = 0;
			}
			// append the chars to the stringbuffer that we're working with
			currentPiece.append(chars);
			// byte count of the current string is increased
			byteCount = byteCount + bytes.length;
		}
		// get any stragglers
		if (currentPiece.length() > 0) {
			strings.add(currentPiece.toString());
		}
		
		// debugging:
		for (String aString : strings) {
			System.out.println(aString.length() + ":" + aString.getBytes(UTF16BE).length + ":"+ aString);
		}
Output:
"a😂b春♞aáéí" java String.length:10 #bytes:20
start:0 end:1 str:a length:1 #bytes:2
start:1 end:3 str:😂 length:2 #bytes:4
start:3 end:4 str:b length:1 #bytes:2
start:4 end:5 str:春 length:1 #bytes:2
start:5 end:6 str:♞ length:1 #bytes:2
start:6 end:7 str:a length:1 #bytes:2
start:7 end:8 str:á length:1 #bytes:2
start:8 end:9 str:é length:1 #bytes:2
start:9 end:10 str:í length:1 #bytes:2
1:2:a
2:4:😂
2:4:b春
2:4:♞a
2:4:áé
1:2:í

Looking at the output in more detail:

1) "a😂b春♞aáéí" java String.length:10 #bytes:20
This shows that the initial string, which looks like it's 9 characters, is actually 10 characters and 20 bytes.


2) We then iterate through each character in the Character BreakIterator
start:0 end:1 str:a length:1 #bytes:2
start:1 end:3 str:😂 length:2 #bytes:4
start:3 end:4 str:b length:1 #bytes:2
start:4 end:5 str:春 length:1 #bytes:2
start:5 end:6 str:♞ length:1 #bytes:2
start:6 end:7 str:a length:1 #bytes:2
start:7 end:8 str:á length:1 #bytes:2
start:8 end:9 str:é length:1 #bytes:2
start:9 end:10 str:í length:1 #bytes:2

This shows in the second line that the emoji is actually 2 characters (a high and low surrogate pair) and 4 bytes. The rest are 2 bytes each.

3) And finally, these are the strings that we built that are 5 or fewer bytes:
1:2:a
2:4:😂
2:4:b春
2:4:♞a
2:4:áé
1:2:í
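The same splitting idea can be sketched in JavaScript for the client-side feedback, since JavaScript strings are UTF-16 as well. This is not a port of the BreakIterator code: it walks the string with for...of, which keeps surrogate pairs together but does not group combining sequences the way a BreakIterator can. The function name splitByUtf16Bytes is an assumption for illustration.

```javascript
// Hypothetical helper: splits str into pieces of at most maxBytes bytes when
// encoded as UTF-16 (2 bytes per code unit), never cutting a surrogate pair.
function splitByUtf16Bytes(str, maxBytes) {
  var pieces = [];
  var current = "";
  var byteCount = 0;
  // for...of yields whole code points, so a surrogate pair stays together
  for (var ch of str) {
    var bytes = ch.length * 2; // 2 bytes for a BMP character, 4 for a surrogate pair
    if (byteCount + bytes > maxBytes && current.length > 0) {
      // the next character would push us over the limit, so start a new piece
      pieces.push(current);
      current = "";
      byteCount = 0;
    }
    current += ch;
    byteCount += bytes;
  }
  if (current.length > 0) {
    pieces.push(current); // get any stragglers
  }
  return pieces;
}

console.log(splitByUtf16Bytes("a😂b春♞aáéí", 5));
```

For the sample string and a 5-byte limit this should give the same six pieces as the Java output above.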


4. Conclusion

When sending SMS messages programmatically you first try to send the message as a GSM character message, which uses a subset of the ISO-8859-1 character set composed of 7-bit characters, giving you a limit of 160 characters in your message.

If you must send the message as a Unicode SMS you need to do more than just count characters to stay under the 70 character limit. You also need to count bytes, or at least surrogate pairs, and you should avoid splitting your message in the middle of a Java surrogate pair.

Message Queues Over Websockets With JavaScript

Robert Decker - Saturday, June 18, 2016

Here I present a brief introduction to message queues and how they can be used over Websockets with JavaScript and the Message Queuing Telemetry Transport (MQTT) protocol to provide notifications between multiple JavaScript clients.

1. Introduction

While developing web applications, developers will often need actions or underlying changes made by multiple clients and server applications to be communicated to all clients live and without user interaction.

One simple way of doing this is by having clients poll the system for changes. However, this quickly leads to performance issues due to every client hitting the server periodically whether or not there are changes that need to be communicated.

A better solution is to use a messaging system to push action/change notifications between the clients. With messaging, a client initiating a change pushes a notification to all other clients through a message broker. When a client receives a notification from another client it can then decide if it needs to access the server to retrieve updated information.

Here I provide a simple introduction to message queuing and provide a simple example of using JavaScript to send and receive messages over Websockets using the Message Queuing Telemetry Transport (MQTT) protocol. 

1.1 Broker

A message broker is simply a middleware that validates, transforms, and routes messages between clients. Each client is decoupled from the other clients and is only aware of the broker, simplifying the overall system. 

fig. 1, broker with clients

Apache ActiveMQ is a powerful open source message broker that supports multiple protocols and communication channels. It has libraries in Java, JavaScript, .NET, Ruby on Rails, and many other languages. ActiveMQ is an ideal broker when you have multiple software modules written in multiple languages communicating over multiple protocols and communication channels.

1.2 Point-to-Point and Publish-Subscribe Messaging

There are two approaches to messaging: point-to-point and publish-subscribe. In point-to-point messaging a message is sent from one application (producer) to one other application (consumer) via a queue. Point-to-point messaging is best for passing a unit of work between applications.

Usually in point-to-point messaging your queues will have names that include their function, following a pattern such as application_name/version_number/function, and clients will subscribe to these queues to pick up the next unit of work. For example:
text_processor/1/pre_processed_a_text
text_processor/1/post_processed_a_text 

Because point-to-point messaging is queue based you are able to run any number of clients at each step in order to keep the system running at full capacity.

fig. 2, point-to-point messaging



In publish-subscribe messaging a message is sent from one application (producer) to multiple applications (subscribers) through a topic. All subscribers to a topic will receive any messages published to that topic. Publish-subscribe messaging is best used for sending logging type information and notifications of actions/changes.

fig. 3, publish-subscribe messaging



In publish-subscribe messaging your topics will have more generic names and the message itself will contain the type of action or change.
topic: /web_app/1/general
payload: {message_type=10, client_id="web_app-13422"}
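To make the difference between the two models concrete, here is a toy in-memory broker. It is only an illustration of the delivery semantics described above, not ActiveMQ or MQTT, and every name in it (ToyBroker, consume, sendToQueue, subscribe, publish) is an assumption made for this sketch. The payload is sent as a JSON string with the fields from the example above.

```javascript
// A toy in-memory broker, only to illustrate the two delivery models.
function ToyBroker() {
  this.queues = {}; // queue name -> { consumers: [], next: 0 }
  this.topics = {}; // topic name -> [subscriber callbacks]
}

// point-to-point: register a consumer on a queue
ToyBroker.prototype.consume = function (queue, callback) {
  if (!this.queues[queue]) {
    this.queues[queue] = { consumers: [], next: 0 };
  }
  this.queues[queue].consumers.push(callback);
};

// point-to-point: each message goes to exactly one consumer (round-robin)
ToyBroker.prototype.sendToQueue = function (queue, message) {
  var q = this.queues[queue];
  if (!q || q.consumers.length === 0) {
    return false; // a real broker would hold the message until a consumer appears
  }
  q.consumers[q.next % q.consumers.length](message);
  q.next++;
  return true;
};

// publish-subscribe: register a subscriber on a topic
ToyBroker.prototype.subscribe = function (topic, callback) {
  if (!this.topics[topic]) {
    this.topics[topic] = [];
  }
  this.topics[topic].push(callback);
};

// publish-subscribe: every subscriber receives every message
ToyBroker.prototype.publish = function (topic, message) {
  (this.topics[topic] || []).forEach(function (callback) {
    callback(message);
  });
};

var broker = new ToyBroker();
broker.subscribe("/web_app/1/general", function (payload) {
  var data = JSON.parse(payload);
  console.log("subscriber got type " + data.message_type + " from " + data.client_id);
});
broker.publish("/web_app/1/general",
  JSON.stringify({ message_type: 10, client_id: "web_app-13422" }));
```

With a queue, adding more consumers spreads the work; with a topic, adding more subscribers means more recipients of the same message.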
 

1.3 Message Queuing Telemetry Transport (MQTT)

Message Queuing Telemetry Transport (MQTT) is a well-established (ISO/IEC PRF 20922) publish-subscribe messaging protocol designed to be extremely simple and lightweight and to be used when a small code footprint is required. Therefore it is an ideal protocol for JavaScript clients and for Machine-to-Machine (M2M) / Internet of Things (IoT) devices. The Paho project at the Eclipse Foundation has released a JavaScript client library that uses the MQTT protocol over the WebSocket communication channel and supports all modern browsers.

Using the Paho JavaScript library you are able to connect your JavaScript clients to a message broker using a publish-subscribe messaging model.  

2. Simple JavaScript Example

 

  1. Download the latest Apache ActiveMQ here.
  2. Within the ActiveMQ directory review the file conf/activemq.xml to confirm that the ws (websocket) protocol is active and on port 61614. For now, leave the other protocols active as well, as I will be showing examples in a future post for connecting from Java and Ruby. The transportConnectors section should look like:
            <transportConnectors>
                <!-- DOS protection, limit concurrent connections to 1000 and frame size to 100MB -->
                <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
                <transportConnector name="amqp" uri="amqp://0.0.0.0:5672?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
                <transportConnector name="stomp" uri="stomp://0.0.0.0:61613?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
                <transportConnector name="mqtt" uri="mqtt://0.0.0.0:1883?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
                <transportConnector name="ws" uri="ws://0.0.0.0:61614?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
            </transportConnectors>
    
  3. Launch ActiveMQ. You can find the appropriate launcher for your operating system in the bin/ directory within the ActiveMQ folder you downloaded in step 1.

    There is no need to do any further configuration of ActiveMQ. You can log into the admin console of your local ActiveMQ here:
    http://127.0.0.1:8161/admin/

    Default login is admin/admin but you can confirm by looking in conf/jetty-realm.properties
  4. Download the Eclipse project's Paho MQTT JavaScript client mqttws31.js from here. (Scroll down to the download section and get only the js file - not all of the Paho files). I also include this file in my example files in step #5.
  5. Download my example files here and unzip the directory.
  6. Open the mqtt_example/example_mq.html webpage in two or more browser windows.

    In each window you should see a text area and some debug information like the following:

  7. Now type text into each window and view the output in the other window. Here is an example - window 1 is first and window 2 is second and I am typing in window 2.

    window 1:

    window 2:
     

    Window 2 shows the payload before being sent to the broker. Window 1 shows the 'message' field of the payload after it is received from the broker and decoded from the json payload.

 
Review the full code below and the comments to get a better understanding of what is happening. 

See http://127.0.0.1:8161/admin/topics.jsp to see your broker topic and the number of messages that have been transferred.

 

3. Conclusion

Here I presented an introduction to message queues and brokers and provided an example of using a publish-subscribe messaging system to send information between multiple JavaScript clients with the MQTT protocol over Websockets through a broker.

I chose to use ActiveMQ as the broker in my example because next I will provide additional examples in Ruby and Java which will require communication with the broker over protocols other than MQTT over Websockets, specifically STOMP and OpenWire.

Later I will also provide more detail on how this system would be configured and used in a secure production environment.

4. Full Code



HTML:
<html>
	<head>
		<script type="text/javascript">
			//<![CDATA[
			// our broker runs on our local machine and without ssl
			var mqHost = 'ws://localhost:61614/'
			var mqUseSSL = false;
			var isWebsocketsSupported = false;
			//]]>
		</script>
	</head>
	<body>
		<!-- load html content first so no appearance of delay to user -->
		
		<!-- messages from other clients appear here -->
		<div id="client_messages" border="1">
			
		</div>
		
		<!-- messages we want to send to other clients are typed here -->
		<textarea id="textarea"></textarea>
		
		<!-- a div to present debugging information -->
		<div>
			debug:
			<pre id="mq_debug"></pre>
		</div>
		
		<script type='text/javascript'>
		//<![CDATA[
			// turn on the javascript for the textarea so that it sends any changes in the textarea through our MYMQ library
			var textarea = document.getElementById("textarea");
			textarea.oninput = function() {
				if (isWebsocketsSupported) {
					MYMQ.send(textarea.value);
				}
			}
			
			// make our websocket connection
			window.onload = function() {
				if( !window.WebSocket) {
					// websockets are not supported by this client
					isWebsocketsSupported = false;
				} else {
					isWebsocketsSupported = true;
					// connect
					MYMQ.connect(mqHost, mqUseSSL);
					// when disconnecting, close the connection first
					window.onbeforeunload = function() {
					    MYMQ.disconnect();
					};
				}
			}
		//]]>
		</script>
		<script type="text/javascript" src="mqttws31.js"></script>			
		<script type="text/javascript" src="example_mq.js"></script>			
	</body>
</html>

        


JavaScript:
/**
* MYMQ encapsulates communication with message queues over a Paho MQTT client. 
* Connects to a general queue for the application
* Automatically reconnects the client if disconnected
*
* public functions
*    connect: connect to the message broker
*      mqhost - the message broker host
*      useSSL - boolean for if ssl enabled
*      returns: a PAHO mqtt client
*
*    disconnect: disconnect the client from the broker
*
*    send: send a message to the broker. wraps the message into a json object before sending
*      txt - the text message to send to the other clients
*/
var MYMQ = (function() {
	
	// private vars
	var appName = "mqexample"; // to separate different application topics
	var mq_v = 1; // message queue version. should only use whole numbers because '.' can be escaped by different libraries
	var client; // the mqttws client that does all of the work
	
	// we hold these vars so that we can reconnect automatically when a connection is lost
	var host; // the message broker location
	var uSSL;
	var destination; // the topic being subscribed to
	
	// the message types
	var TYPE_USER_SENT_MESSAGE = 10;
	
	// the json keys in the payload
	var PAYLOAD_KEY_TYPE = "type";
	var PAYLOAD_KEY_MESSAGE = "message";
	var PAYLOAD_KEY_CLIENTID = "clientid";
	
	// private functions. scroll to bottom to see the public functions
	
	// we have a private connect that is called from the public connect
	// method and is also used to reconnect automatically when 
	// connection is lost
	function internalConnect(mqHost, useSSL) {
		host = mqHost;
		uSSL = useSSL;
		// set up our topic/destination that we subscribe to
		destination = 'jms/astra/mq/' + mq_v + '/general'; 
		
		// need to create a unique clientId for each client
		var clientId = appName + "-" + (Math.floor(Math.random() * 100000)); 
		
		debug("opening " + (uSSL ? "ssl" : "non-ssl") + " websocket to host:" + host + " as clientId:" + clientId);

		client = new Paho.MQTT.Client(host, clientId);
		
		// set our callbacks
		client.onConnect = onConnect;
		client.onMessageArrived = onMessageArrived;
		client.onConnectionLost = onConnectionLost;
		
		// now do our connection to the broker
		client.connect({
			useSSL:uSSL,
			onSuccess:onConnect, 
			onFailure:onFailure
		}); 
		return client;
	}
	
	// the client is notified when it is connected to the server. 
	// Once connected we subscribe to our topic
	function onConnect(frame) {
		debug("connected to activemq. client:" + client + " subscribing to " + destination);
		client.subscribe(destination);
	}
	
	// just print the error message
	function onFailure(failure) {
	  debug("failure: " + failure.errorMessage);
	}
	
	// the following connect codes come from mqttws.js
	//		OK: {code:0, text:"AMQJSC0000I OK."},
	//		CONNECT_TIMEOUT: {code:1, text:"AMQJSC0001E Connect timed out."},
	//		SUBSCRIBE_TIMEOUT: {code:2, text:"AMQJS0002E Subscribe timed out."}, 
	//		UNSUBSCRIBE_TIMEOUT: {code:3, text:"AMQJS0003E Unsubscribe timed out."},
	//		PING_TIMEOUT: {code:4, text:"AMQJS0004E Ping timed out."},
	//		INTERNAL_ERROR: {code:5, text:"AMQJS0005E Internal error. Error Message: {0}, Stack trace: {1}"},
	//		CONNACK_RETURNCODE: {code:6, text:"AMQJS0006E Bad Connack return code:{0} {1}."},
	//		SOCKET_ERROR: {code:7, text:"AMQJS0007E Socket error:{0}."},
	//		SOCKET_CLOSE: {code:8, text:"AMQJS0008I Socket closed."},
	//		MALFORMED_UTF: {code:9, text:"AMQJS0009E Malformed UTF data:{0} {1} {2}."},
	//		UNSUPPORTED: {code:10, text:"AMQJS0010E {0} is not supported by this browser."},
	//		INVALID_STATE: {code:11, text:"AMQJS0011E Invalid state {0}."},
	//		INVALID_TYPE: {code:12, text:"AMQJS0012E Invalid type {0} for {1}."},
	//		INVALID_ARGUMENT: {code:13, text:"AMQJS0013E Invalid argument {0} for {1}."},
	//		UNSUPPORTED_OPERATION: {code:14, text:"AMQJS0014E Unsupported operation."},
	//		INVALID_STORED_DATA: {code:15, text:"AMQJS0015E Invalid data in local storage key={0} value={1}."},
	//		INVALID_MQTT_MESSAGE_TYPE: {code:16, text:"AMQJS0016E Invalid MQTT message type {0}."},
	//		MALFORMED_UNICODE: {code:17, text:"AMQJS0017E Malformed Unicode string:{0} {1}."},
	function onConnectionLost(responseObject) {
		var errorCode = responseObject.errorCode;
		if (errorCode !== 0) {
			debug("connection lost: " + client.clientId + " errorCode:" + errorCode + " msg:" + responseObject.errorMessage);
			switch(errorCode) {
				case 5:
					/* This is probably an unsupported browser version. Don't try to reconnect.
					Provide a message to your user like:
						var msg = "Unable to connect to message queue server. You will not receive real-time automatic UI updates.";
						msg = msg + "<br/>Minimum browser versions to use this application are:";
						msg = msg + "<br/>	Internet Explorer 10";
						msg = msg + "<br/>	Firefox 21";
						msg = msg + "<br/>	Chrome 21";
						msg = msg + "<br/>	Safari 6";
						msg = msg + "<br/>	Opera 12.1";
						msg = msg + "<br/>	iOS (Safari) 6";
						msg = msg + "<br/>	Android OS 4.4";
						msg = msg + "<br/>The exact error is:" + responseObject.errorMessage;
					*/
					break;
				default:
					// reconnect to broker
					internalConnect(host, uSSL);
					break;
			}
		}
	}
	
	// this allows us to display debug logs directly on the web page
	var debug = function(str) {
		var mq_debug = document.getElementById("mq_debug");
		if (mq_debug) {
			mq_debug.insertAdjacentHTML("afterbegin", Math.floor(Date.now() / 1000) + " " + str + "\n");
		}
	};
	
	function onMessageArrived(jsonMessage) {
		debug("jsonMessage arrived:" + jsonMessage);
		try {
			// the payload comes in json format. See public send method.
			var json = JSON.parse(jsonMessage.payloadString);
			
			// ignore messages coming from our own client
			var clientid = json[PAYLOAD_KEY_CLIENTID]
			if (clientid == client.clientId) {
				debug(".onMessageArrived ignoring message from self clientId:" + clientid);
				return;
			}
			
			// find the message type and do the appropriate action based on the type
			var type = json[PAYLOAD_KEY_TYPE];
			switch(type) {
				case TYPE_USER_SENT_MESSAGE:
					// a user sent a message that we've received
					var message = json[PAYLOAD_KEY_MESSAGE];
					messageReceived(message, clientid);
					break;
				default:
					debug(".onMessageArrived cannot understand type:" + type);
					break;
			}
		} catch(err) {
		  	var str = "error processing jms message err:" + err;
			str = str + "<br/>destination/source:" + jsonMessage.destinationName;
			str = str + "<br/>payload:" + jsonMessage.payloadString;
		  	debug(str);
		}
	}
	
	// called from onMessageArrived callback based on the message type TYPE_USER_SENT_MESSAGE
	function messageReceived(message, clientid) {
		debug(".messageReceived message:" + message);
		var message_box = document.getElementById(clientid);
		if (message_box == null) {
			// insert the message box for this from client because it doesn't exist on the page yet
			var clients_box = document.getElementById("client_messages");
			clients_box.insertAdjacentHTML("beforeend", clientid + ':' + '<pre id=' + '"' + clientid + '"' + '></pre><br/>'); // ' here to fix bug appearance of code
			message_box = document.getElementById(clientid);
		}
		if (message_box) {
			// replace the contents of the message box of this client with the message
			message_box.innerHTML = message;
		}
	}

	// public methods and vars go here
	return {
		connect: function(mqHost, useSSL) {
			// we have a private internalConnect method that we call from here.
			return internalConnect(mqHost, useSSL);
		},
		disconnect: function() {
			client.disconnect();
		},
		send: function(txt) { 
			// build the json string for the payload. JSON.stringify is built into the
			// browser and escapes quotes and backslashes in txt, which manual
			// string concatenation would not
			var payload = {};
			payload[PAYLOAD_KEY_CLIENTID] = client.clientId;
			payload[PAYLOAD_KEY_TYPE] = TYPE_USER_SENT_MESSAGE;
			payload[PAYLOAD_KEY_MESSAGE] = txt;
			var jsonPayload = JSON.stringify(payload);
			// {"clientid":"mqexample-72178","type":10,"message":"hi there"}
			
			debug("sending payload:" + jsonPayload);
			var message = new Paho.MQTT.Message(jsonPayload);
			message.destinationName = destination; // the topic/queue
			client.send(message); // sends to the destination/topic on the broker
		}
	};
	
	
})();
        

Basic Apache Sling Development Patterns: Configurations

Robert Decker - Wednesday, July 01, 2015

Apache Sling Configurations

Here's a real-world example of using Sling's OSGi configurations to control the garbage collection schedule that I presented in the last entry.

I had a project where we were processing millions of short pieces of text per day. The documents were generated in another system and saved to a folder served over WebDAV by Apache Sling where we processed the text with various engines, saved the results, and then deleted the document and its intermediary documents. It was a bit of a hack but surprisingly stable and speedy and worked well enough for what we needed.

One problem we found is that in Sling / Jackrabbit (and Adobe CQ5), when a node is deleted the data is not removed from the hard drive. This was compounded by the fact that, to modify a document over WebDAV, we had to delete the original and write a new document containing the modifications. Every change to a document therefore generated new nodes in Jackrabbit while unlinking the original node.

This caused our system to quickly build up gigabytes of unused data on the filesystem, so we had to set up a Jackrabbit repository garbage collection to run periodically. Normally you would run the garbage collection as infrequently as once per week, even on a large CQ5 installation with hundreds of authors. However, because we were creating, modifying, and deleting millions of documents a day, we found we had to run garbage collection several times a day.

 

 

pom.xml maven-bundle-plugin settings:

sling-initial-content: text here

 


Basic Apache Sling Development Patterns: Handlers, Services, Servlets, Schedulers

Robert Decker - Thursday, April 24, 2014

Basic Apache Sling Development Patterns

After about a one-year hiatus I’m starting a new Apache Sling project. In order to prepare I’m reviewing some basic patterns that we found in the past to work well for Apache Sling development and OSGi development in general.

These patterns fall into the area of implementation strategy patterns, since they are more focused on program organization and parallel execution. By following common patterns your project becomes more predictable and easier for other developers to understand.

A caveat - here I am describing very basic patterns. I won’t be talking about things like the whiteboard pattern even though it’s something that is used frequently in Sling. What I’m writing about here are the most basic bits of code that we found ourselves repeating many times in our projects. This is code that any Sling/CQ5 developer is probably already familiar with, but for someone new to Sling/CQ5 this will hopefully make your initial study much shorter.

Also, I will write only briefly about sling events in this entry because I plan on covering this topic in more detail later. And in future entries I will describe other patterns we commonly use in Apache Sling development.

Handlers, Services, Servlets, Schedulers


The most basic components that we write over and over again in Apache Sling are Handlers, Services, Servlets, and Schedulers. The more that you can spread functionality between these and the more you take advantage of the Sling eventing system to communicate between these components the better off you will be in the long run.

Handlers: Handlers are called through the Sling eventing system. Handlers may interact directly with Sling Services. By using the Sling eventing system we are able to take advantage of threading, thread pools, distributed processing, and other features that come from an event-driven architecture.
Services: Standard OSGi services that do most of our work and will usually be called from the Handlers.
Servlets: A Servlet is a component that can be interacted with through the http protocol, usually REST but not always. Your servlet code can do work directly but it’s best to create an event and have the work handled somewhere else. However, because a Servlet often has to respond immediately, this isn’t always possible.
Schedulers: Schedulers are a Job type that is automatically executed periodically. They are part of the Apache Sling event system.

When adding a piece of new functionality to a Sling/CQ5 project you must take some time to plan how you can divide its functionality among these component types.
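
The division of labour described above can be sketched in plain Java without any Sling dependencies. This is only an illustration of the idea, not Sling's API: MiniJobManager, PatternSketch, and the topic string are hypothetical stand-ins invented for this sketch. Two triggers (one playing the Scheduler, one playing the Servlet) queue a job on a topic and return immediately, and a handler registered for that topic calls the service on a worker thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for a job manager: routes queued jobs to a consumer by topic.
class MiniJobManager {
    private final Map<String, Consumer<Map<String, Object>>> consumers = new HashMap<>();
    // A single worker thread keeps jobs in the order they were added.
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    void registerConsumer(String topic, Consumer<Map<String, Object>> consumer) {
        consumers.put(topic, consumer);
    }

    // Queues the job and returns at once; the caller never runs the work itself.
    boolean addJob(String topic, Map<String, Object> props) {
        Consumer<Map<String, Object>> consumer = consumers.get(topic);
        if (consumer == null) {
            return false; // nobody handles this topic
        }
        pool.submit(() -> consumer.accept(props));
        return true;
    }

    // Waits for queued jobs to finish so the sketch can inspect the result.
    void shutdownAndWait() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class PatternSketch {
    static final String TOPIC = "example/gc/requested"; // hypothetical topic name

    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        MiniJobManager jobs = new MiniJobManager();

        // The "Service" does the real work.
        Runnable service = () -> log.add("service: collecting garbage");

        // The "Handler" is registered for the topic and delegates to the service.
        jobs.registerConsumer(TOPIC, props -> {
            log.add("handler: job from " + props.get("source"));
            service.run();
        });

        // The "Scheduler" and the "Servlet" only add jobs.
        jobs.addJob(TOPIC, Map.<String, Object>of("source", "scheduler"));
        jobs.addJob(TOPIC, Map.<String, Object>of("source", "servlet"));

        jobs.shutdownAndWait();
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The point of the indirection is that neither trigger blocks on the work or needs to know which component performs it.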

Example - Apache Sling Repository Garbage Collection

For this example we will create a repository garbage collection system for Sling which will remove Jackrabbit nodes that are no longer in use. This is a feature in Sling that doesn't come out-of-the-box unless you buy Adobe's CQ5 product.

We will have the garbage collection run automatically on a schedule using a Scheduler, and we will also allow it to be run manually through a Servlet URL. Neither the Scheduler nor the Servlet runs the garbage collection directly. Instead, each adds a job that is picked up by an event Handler, which then calls the garbage collection Service.

The DatastoreGCServiceImpl does the actual garbage collection work and its java interface is DatastoreGCService. The DatastoreGCService is called directly from the DatastoreGCHandler. The DatastorePeriodicGC periodically fires off the <<gc>> event which is handled by the DatastoreGCHandler. The DatastoreGCServlet is a servlet that can either interact directly with the DatastoreGCService or indirectly through the DatastoreGCHandler, depending on how responsive the servlet must be. In my code examples below I went with the indirect method.

You will never call a handler directly from your own code.

 Also, the Apache Sling event management system recently diverged from the base OSGi event system. My examples use the new Apache Sling system.



Java Interface DatastoreGCService:
package com.astracorp.examples.patterns.services;

import java.util.HashMap;
import java.util.Map;

/*
 * Interface class to the DatastoreGCService 
*/
public interface DatastoreGCService {

    // The event topic that is used to request a garbage collection
    public static final String TOPIC_DATASTORE_GC_REQUESTED = "com/astracorp/core/datastore/gc/requested";

    // Properties of the datastore garbage collection job. For now this is empty but would normally contain
    // fields describing the thread priority, job queue name, etc
    public static final Map<String, Object> DATASTORE_GC_REQUESTED_JOB_PROPERTIES = JobConstants.datastoreGCJobProperties();

    // the method that does the garbage collection work
    public void runDatastoreGarbageCollection();

    // Inner class for providing job properties
    public class JobConstants {
        public static Map<String, Object> datastoreGCJobProperties() {
            Map<String, Object> props = new HashMap<String,Object>();
            return props;
        }
    }
}


Implementation of DatastoreGCService:
package com.astracorp.examples.patterns.services.impl;

import com.astracorp.examples.patterns.services.DatastoreGCService;
import org.apache.felix.scr.annotations.*;
import org.apache.jackrabbit.api.management.DataStoreGarbageCollector;
import org.apache.jackrabbit.api.management.RepositoryManager;
import org.osgi.service.component.ComponentContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.jcr.RepositoryException;

/*
 * Implementation of the DatastoreGCService
*/
@Component(immediate = true, metatype = false, label = "Astracorp astra-example Datastore Garbage Collector Service", description = "Provides methods for datastore garbage collection and other repository cleanups")
@Service(value = DatastoreGCService.class)
public class DatastoreGCServiceImpl implements DatastoreGCService {
    private static final Logger LOGGER = LoggerFactory.getLogger(DatastoreGCServiceImpl.class);

    @Reference
    private RepositoryManager repositoryManager = null;

    //Runs a datastore garbage collection to clean up old files in the repository. Should be run periodically or more frequently if you are doing a lot of WebDAV operations
    @Override
    public void runDatastoreGarbageCollection() {
        LOGGER.debug("DatastoreGCService gc called. repositoryManager:" + repositoryManager);
        long time = System.currentTimeMillis();
        DataStoreGarbageCollector gc = null;
        try {
            gc = repositoryManager.createDataStoreGarbageCollector();
            LOGGER.debug("gc:" + gc);
            gc.mark();
            gc.sweep();
        } catch (RepositoryException e) {
            LOGGER.error("Error running the garbage collection", e);
        } finally {
            if (gc != null) {
                gc.close();
            }
        }
        LOGGER.debug("DatastoreGCService ran gc in " + ((System.currentTimeMillis() - time)/1000) + " seconds");
    }
}
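
The mark() and sweep() calls above follow the usual two-phase idea: mark touches every data record that is still referenced, and sweep deletes whatever was not touched since the mark began. The sketch below illustrates that idea with an in-memory map. It is not Jackrabbit's implementation, and DataStoreGcSketch with all of its methods is hypothetical; a logical clock stands in for the record timestamps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a datastore; not a Jackrabbit class.
class DataStoreGcSketch {
    // record id -> logical time at which the record was last touched
    private final Map<String, Long> lastTouched = new HashMap<>();
    private long clock = 0;
    private long markStart = -1;

    void addRecord(String id) {
        lastTouched.put(id, ++clock);
    }

    // Mark phase: remember when marking started, then touch every record
    // that is still referenced from the repository.
    void mark(Iterable<String> referencedIds) {
        markStart = ++clock;
        for (String id : referencedIds) {
            if (lastTouched.containsKey(id)) {
                lastTouched.put(id, ++clock);
            }
        }
    }

    // Sweep phase: delete records not touched since the mark started.
    // Records added after the mark began are newer and therefore survive.
    int sweep() {
        if (markStart < 0) {
            throw new IllegalStateException("mark() must run before sweep()");
        }
        int before = lastTouched.size();
        lastTouched.values().removeIf(touched -> touched < markStart);
        markStart = -1;
        return before - lastTouched.size();
    }

    Set<String> recordIds() {
        return new TreeSet<>(lastTouched.keySet());
    }

    public static void main(String[] args) {
        DataStoreGcSketch store = new DataStoreGcSketch();
        store.addRecord("a");
        store.addRecord("b");
        store.addRecord("c");
        store.mark(java.util.Arrays.asList("a", "c")); // "b" is no longer referenced
        System.out.println("deleted " + store.sweep() + ", left " + store.recordIds());
    }
}
```

This is why deleting a node does not free disk space by itself: the unreferenced record stays in the datastore until a sweep removes it.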


The DatastoreGCHandler:
package com.astracorp.examples.patterns.handlers;

import com.astracorp.examples.patterns.services.DatastoreGCService;
import org.apache.felix.scr.annotations.*;
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.consumer.JobConsumer;
import org.osgi.service.component.ComponentContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/*
 * Handles datastore garbage collection-related events
*/
@Component(enabled = true, immediate = true, metatype = false, label = "Astracorp astra-example Datastore GC Handler", description = "Datastore garbage collector for the Jackrabbit repository")
@Service(value = JobConsumer.class)
@Property(name = JobConsumer.PROPERTY_TOPICS, value = DatastoreGCService.TOPIC_DATASTORE_GC_REQUESTED)
public class DatastoreGCHandler implements JobConsumer {
    public static final Logger LOGGER = LoggerFactory.getLogger(DatastoreGCHandler.class);

    @Reference
    private DatastoreGCService datastoreGCService = null;

    @Override
    public JobResult process(final Job job) {
        LOGGER.debug("process job called. about to call the gcservice:" + datastoreGCService);
        datastoreGCService.runDatastoreGarbageCollection();
        LOGGER.debug("finished calling the gcservice.");
        return JobResult.OK;
    }
}


The DatastorePeriodicGC:
The schedule is currently set to every 25 seconds for debugging purposes. You would normally run this only once per week, or more frequently if you use WebDAV in your system.
package com.astracorp.examples.patterns.schedulers;

import com.astracorp.examples.patterns.services.DatastoreGCService;
import org.apache.felix.scr.annotations.*;
import org.apache.sling.commons.scheduler.Job;
import org.apache.sling.commons.scheduler.JobContext;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.ComponentContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/*
 * Periodically sends the garbage collection event
*/
@Component(enabled = true, immediate = true, metatype = false, label = "Astracorp Datastore GC", description = "Creates a datastore GC event periodically")
@Service(value = Job.class)
@Properties({@Property(name="scheduler.expression", value="0/25 * * * * ?"), @Property(name="scheduler.concurrent", boolValue=false)})
public class DatastorePeriodicGC implements Job { // this is a scheduler.job
    public static final Logger LOGGER = LoggerFactory.getLogger(DatastorePeriodicGC.class);

    @Reference
    private JobManager jobManager = null;

    @Override
    public void execute(JobContext jobContext) {
        this.sendGarbageCollectEvent();
    }

    private void sendGarbageCollectEvent() {
        LOGGER.debug("sendGarbageCollectEvent called. sending " + DatastoreGCService.TOPIC_DATASTORE_GC_REQUESTED + ":" + DatastoreGCService.DATASTORE_GC_REQUESTED_JOB_PROPERTIES);
        org.apache.sling.event.jobs.Job job = jobManager.addJob(DatastoreGCService.TOPIC_DATASTORE_GC_REQUESTED, DatastoreGCService.DATASTORE_GC_REQUESTED_JOB_PROPERTIES);
        //LOGGER.debug("job:" + job);
    }
}


The DatastoreGCServlet:
package com.astracorp.examples.patterns.servlets;

import com.astracorp.examples.patterns.services.DatastoreGCService;
import org.apache.felix.scr.annotations.Reference;
import org.apache.felix.scr.annotations.sling.SlingServlet;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.apache.sling.event.jobs.JobManager;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/*
 * Servlet for starting the datastore garbage collection through a url
*/
@SlingServlet(paths={DatastoreGCServlet.DATASTORE_GC_URL_PATH}, methods = {"GET"})
public class DatastoreGCServlet extends SlingSafeMethodsServlet {
    public static final Logger LOGGER = LoggerFactory.getLogger(DatastoreGCServlet.class);

    public static final String DATASTORE_GC_URL_PATH = "/bin/util/gc";

    @Reference
    private JobManager jobManager = null;

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response) {
        // fire off event that a datastore garbage collection was requested
        LOGGER.debug("doGet called. sending " + DatastoreGCService.TOPIC_DATASTORE_GC_REQUESTED + ":" + DatastoreGCService.DATASTORE_GC_REQUESTED_JOB_PROPERTIES);
        org.apache.sling.event.jobs.Job job = jobManager.addJob(DatastoreGCService.TOPIC_DATASTORE_GC_REQUESTED, DatastoreGCService.DATASTORE_GC_REQUESTED_JOB_PROPERTIES);
    }
}

You can trigger the garbage collection through the servlet at http://localhost:8080/bin/util/gc

 

Conclusion

This example provides you with the ability to do datastore garbage collections, something normally not automatically handled by Apache Sling. You are able to run the garbage collection on a schedule and you are also able to manually launch the garbage collection through a servlet action.

As a side note, I found that if you use Apache Sling's WebDAV feature you can potentially end up with a large number of unused nodes in your repository. It looks like every save to a file opened through WebDAV produces new nodes (as seen by some of the events being sent in Apache Sling), and so if you're doing a lot of automated WebDAV actions you will probably need to run this garbage collection more than once a week. When we were doing automated file processing over WebDAV we ended up having to run the garbage collection every hour.
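
The scheduler.expression value in DatastorePeriodicGC uses Quartz-style cron syntax (seconds, minutes, hours, day of month, month, day of week). As a rough sketch, here are expressions for the debugging, hourly, and weekly schedules mentioned in this entry. The GcSchedules class, the map keys, and the particular times are assumptions made for illustration, and looksLikeQuartzCron is only a shape check, not a real validator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for candidate scheduler.expression values.
class GcSchedules {
    static final Map<String, String> EXPRESSIONS = new LinkedHashMap<>();
    static {
        // The debugging value used in DatastorePeriodicGC: every 25 seconds.
        EXPRESSIONS.put("debug-every-25-seconds", "0/25 * * * * ?");
        // At the top of every hour, for heavy automated WebDAV use.
        EXPRESSIONS.put("hourly", "0 0 * * * ?");
        // Once a week, Sunday at 03:00 (the day and time are arbitrary choices).
        EXPRESSIONS.put("weekly-sunday-3am", "0 0 3 ? * SUN");
    }

    // Rough shape check only: 6 or 7 whitespace-separated fields, with '?'
    // in exactly one of day-of-month (field 4) and day-of-week (field 6).
    static boolean looksLikeQuartzCron(String expression) {
        String[] fields = expression.trim().split("\\s+");
        if (fields.length < 6 || fields.length > 7) {
            return false;
        }
        return fields[3].equals("?") ^ fields[5].equals("?");
    }

    public static void main(String[] args) {
        EXPRESSIONS.forEach((name, expr) ->
            System.out.println(name + " -> " + expr + " ok=" + looksLikeQuartzCron(expr)));
    }
}
```

Swapping the value in the @Property annotation is enough to change the schedule, though the bundle has to be rebuilt and redeployed unless the property is made configurable.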

Further reading:
https://sling.apache.org/documentation/bundles/apache-sling-eventing-and-job-handling.html
http://en.wikipedia.org/wiki/Event-driven_architecture