We are currently trying out the Jive ESF solution and have a few questions regarding performance:
As established in our support case with Jive, search.binaryContentByteLimit does not limit the size of the extracted text to the configured value. We received the following explanation:
There are two very different things that happen during a search reindex:
1) we loop over all of the content in the system
2) for each piece of content, we check to see if there's an attachment or binary document associated w/ said content
3) if so, we'll grab that binary object and send it over to the text extraction server
4) the text extraction server attempts to grab the plain text from the binary content and writes the resulting text to disk, no matter its size.
5) the web node gets the extracted text back from the text extraction server and then only grabs the first N bytes of the text and puts that into the search index.
So the problem is that the property applies to the search index (step 5) but doesn't affect the text extraction process (step 4); as a result, the text extraction directory can balloon in size very quickly.
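To make the explanation above concrete, here is a minimal Python sketch of the reindex flow as we understand it. All names (Content, extract_text, reindex) are hypothetical and are not Jive APIs; "extraction" is simulated by decoding the bytes. The point it illustrates is that the full extracted text is written to disk (step 4), while the byte limit is applied only when indexing (step 5).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the reindex pipeline described by Jive support.
# None of these names are real Jive APIs.


@dataclass
class Content:
    content_id: int
    binary: Optional[bytes]  # attachment or binary document, if any


def extract_text(binary: bytes) -> str:
    # Stand-in for the text extraction server: simply decode the bytes.
    return binary.decode("utf-8", errors="ignore")


def reindex(contents, byte_limit):
    """Return (bytes written to the extraction dir, bytes put into the index)."""
    written_to_disk = 0
    indexed = 0
    for item in contents:                      # step 1: loop over all content
        if item.binary is None:                # step 2: any binary attached?
            continue
        text = extract_text(item.binary)       # steps 3-4: full extraction
        encoded = text.encode("utf-8")
        written_to_disk += len(encoded)        # step 4: full text hits disk
        indexed += len(encoded[:byte_limit])   # step 5: only first N bytes indexed
    return written_to_disk, indexed


contents = [Content(1, b"a" * 10_000), Content(2, None), Content(3, b"b" * 500)]
print(reindex(contents, byte_limit=1000))  # (10500, 1500)
```

With a limit of 1000 bytes, 10,500 bytes of text are written to disk but only 1,500 bytes reach the index, which is the "wasted" extraction work we are concerned about.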
As seen above, text extraction is carried out on the whole file regardless of its size. With ESF, we intend to give a select number of users the ability to upload large files (larger than the normal size cap). Currently, ESF content also undergoes text extraction and index rebuilding. We have the following questions about ESF:
1. Is there any way to bypass attachments.maxAttachmentSize and similar limits for all content types? The upload size limit should apply only to users for whom ESF is not enabled.
2. Similarly, I suspect the allowed file extension property also applies to ESF. Is there any way around this, so that users can upload any type of file in ESF?
3. Since our attachments would exceed the normal file size limit, text extraction and similar processing would take significant time. As per the Jive response above, most of the extraction work during an index rebuild would be wasted when the extracted text exceeds search.binaryContentByteLimit. Is there a way to improve the performance of the Jive document conversion service (which generates image previews) and the text extraction service when huge files are uploaded?
4. Is there a way to improve the performance of the Jive core application when huge files are uploaded?
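For questions 1 and 2, the behaviour we are after can be summarised in a short Python sketch. It is purely illustrative of the requested policy, not of how Jive currently works: UploadPolicy, is_upload_allowed and allowed_extensions are hypothetical names, and max_attachment_size merely stands in for attachments.maxAttachmentSize.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of the desired upload policy; not a Jive API.


@dataclass
class UploadPolicy:
    max_attachment_size: int  # stand-in for attachments.maxAttachmentSize (bytes)
    allowed_extensions: frozenset = field(default_factory=frozenset)


def is_upload_allowed(policy, filename, size_bytes, esf_enabled):
    # Requested behaviour: ESF-enabled users skip both the size cap and
    # the extension allow-list; everyone else is checked as usual.
    if esf_enabled:
        return True
    if size_bytes > policy.max_attachment_size:
        return False
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in policy.allowed_extensions


policy = UploadPolicy(50 * 1024 * 1024, frozenset({"pdf", "docx"}))
print(is_upload_allowed(policy, "big.iso", 5 * 1024**3, esf_enabled=True))   # True
print(is_upload_allowed(policy, "big.iso", 5 * 1024**3, esf_enabled=False))  # False
```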