Robert A Decker Programming Repository

Notes and articles that will reduce the pain

Basic Apache Sling Development Patterns: Configurations

Robert Decker - Wednesday, July 01, 2015

Apache Sling Configurations

Here's a real-world example of using Sling's OSGi configurations to control the garbage collection schedule that I presented in the last entry.

I had a project where we were processing millions of short pieces of text per day. The documents were generated in another system and saved to a folder served over WebDAV by Apache Sling. There we processed the text with various engines, saved the results, and then deleted the document and its intermediary documents. It was a bit of a hack, but it was surprisingly stable and speedy and worked well enough for what we needed.

One problem we found is that in Sling / Jackrabbit (and Adobe CQ5), when a node is deleted, its data is not removed from the hard drive. This was compounded by the fact that, to modify a document over WebDAV, we had to delete the original and write a new document with the modifications, so every change to a document generated new nodes in Jackrabbit while unlinking the original node.

This caused our system to quickly build up gigabytes of unused data on the filesystem, so we had to set up a Jackrabbit repository garbage collection to run periodically. Normally you would run garbage collection as infrequently as once per week, even on a large CQ5 installation with hundreds of authors. However, because we were creating, modifying, and deleting millions of documents a day, we found we had to run garbage collection several times a day.
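
To make "several times a day" concrete, here is a minimal sketch in plain Java, with no Sling, OSGi, or Jackrabbit dependencies, of how a configurable list of run times could be turned into the next garbage collection run. The class name GcSchedule, the comma-separated HH:mm format, and the example times are all assumptions made for illustration; they are not the code or the configuration format from the project described above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Sling or Jackrabbit.
final class GcSchedule {

    private final List<LocalTime> runTimes;

    // Parses a comma-separated list of HH:mm times, e.g. "02:00,10:00,18:00",
    // as it might arrive from a configuration property (assumed format).
    GcSchedule(String configValue) {
        List<LocalTime> parsed = new ArrayList<>();
        for (String part : configValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parsed.add(LocalTime.parse(trimmed));
            }
        }
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("no run times configured");
        }
        Collections.sort(parsed);
        this.runTimes = Collections.unmodifiableList(parsed);
    }

    // The configured run times, sorted from earliest to latest.
    List<LocalTime> runTimes() {
        return runTimes;
    }

    // Returns the first configured run strictly after 'now',
    // wrapping to the earliest run of the next day if none remain today.
    LocalDateTime nextRun(LocalDateTime now) {
        for (LocalTime time : runTimes) {
            LocalDateTime candidate = now.toLocalDate().atTime(time);
            if (candidate.isAfter(now)) {
                return candidate;
            }
        }
        return now.toLocalDate().plusDays(1).atTime(runTimes.get(0));
    }

    // How long a scheduler would wait before starting the next run.
    Duration delayUntilNextRun(LocalDateTime now) {
        return Duration.between(now, nextRun(now));
    }
}
```

With a value such as "02:00,10:00,18:00", changing how often garbage collection runs only means editing that one string, which is the kind of setting that fits naturally in a configuration rather than in code.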



pom.xml maven-bundle-plugin settings:

sling-initial-content: text here