Latest reply on Jun 25, 2015 10:16 AM by jesse.fuller

    Text Extraction and Doc Conversion Limits


      Hi everyone,

      I am currently trying to find the file size limits for text extraction and document conversion.


      I have the following questions:


      1. What is the relationship between document conversion and text extraction?

           a.     In my view, text extraction pulls plain text from the entire file, whatever its size (there is no extraction limit).
           b.     This extracted plain text is then used for two purposes: rebuilding the search index and document conversion.
           c.     Rebuilding the search index makes content searchable, including uploaded files and attachments.
           d.     Document conversion is the process of creating preview images from the plain text obtained above.


      2. Is there any way to bypass text extraction and document conversion for files larger than a predetermined, configurable limit?

           a.     I have come across the following system properties for limiting file size for various purposes:

                     attachments.maxAttachmentSize             :     Max size for attachments

                     search.binaryContentByteLimit             :     Search index size limit per file

                     officeintegration.conversion.filelimit    :     Document conversion limit

                     docbody.maxBodySize                       :     Max uploaded file size

           b.     Still, if I am right that document conversion follows text extraction, and if I can somehow limit text extraction for files above a threshold, then docbody.maxBodySize would no longer serve any purpose, right?

           c.     Does the officeintegration.conversion.filelimit system property also apply to non-MS Office file types such as PDF?
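      To make the reasoning behind questions 1 and 2b concrete, here is a minimal Python sketch of the pipeline exactly as I described it above: one extraction pass whose plain text feeds both search indexing and document conversion. It is a model of my assumptions, not Jive's actual implementation. The names SizeLimits, StagePlan, and plan_stages are hypothetical, the "extraction" threshold is the hypothetical setting I am asking about (I have not found a property for it), and the mapping of the other fields to the properties listed above is only my reading of them. All numbers are placeholders, not Jive defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeLimits:
    """Hypothetical per-file limits in bytes (placeholders, not Jive defaults)."""
    max_upload: int    # my reading of docbody.maxBodySize
    extraction: int    # hypothetical extraction threshold (no known property)
    search_index: int  # my reading of search.binaryContentByteLimit
    conversion: int    # my reading of officeintegration.conversion.filelimit


@dataclass(frozen=True)
class StagePlan:
    accept_upload: bool
    extract_text: bool
    index_for_search: bool
    convert_for_preview: bool


def plan_stages(size: int, limits: SizeLimits) -> StagePlan:
    """Return which stages would run for a file of `size` bytes.

    Assumes (as in question 1) that a single extraction pass produces the
    plain text consumed by both the search index and document conversion,
    so skipping extraction also skips both downstream stages.
    """
    accept = size <= limits.max_upload
    extract = accept and size <= limits.extraction
    return StagePlan(
        accept_upload=accept,
        extract_text=extract,
        # Both downstream stages depend on the extracted text.
        index_for_search=extract and size <= limits.search_index,
        convert_for_preview=extract and size <= limits.conversion,
    )


if __name__ == "__main__":
    MB = 1024 * 1024
    # Placeholder values chosen only to make the example concrete.
    limits = SizeLimits(max_upload=100 * MB, extraction=25 * MB,
                        search_index=50 * MB, conversion=40 * MB)
    for size_mb in (10, 30, 150):
        print(size_mb, plan_stages(size_mb * MB, limits))
```

      With these placeholder numbers, a 30 MB file is accepted but never extracted, so neither indexing nor conversion runs even though their own limits would allow it. That dependency is what question 2b relies on; whether Jive really behaves this way is what I am trying to confirm.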




      Jeph Yang (similar thread: Can Document Conversion limits be changed?)


      Doug Woolley (https://community.jivesoftware.com/casethread/371381#comment-2596840)


      Similar Case: Performance Enhancement During Large File Uploads