We are currently experiencing a disk storage problem related to the file-system-based Storage Provider on our Clearspace installation.
Our setup is as follows:
- We have a cluster with 2 nodes
- We have a shared 100 GB file system mounted on both nodes; it physically resides on a third machine.
- We have configured Clearspace to use this mount point with a file-system-based Storage Provider and a single namespace, "jiveSBS".
- We have a 4th machine hosting document conversion and cache.
Recently, while monitoring our systems, we noticed that our 100 GB storage was almost full (94%) and contained more than 900,000 files.
Our problem is that the "real" number of files, as calculated from the database (tables: jiveAttachment, jiveDocumentBody, jiveImage), is closer to 51,000, with a total size of around 30 GB.
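For reference, the file-system side of these figures can be gathered with a minimal sketch along the following lines. The function name `summarize_storage` is made up for illustration, and the storage root you pass in is whatever directory the Storage Provider is configured to use; nothing here relies on Clearspace itself.

```python
import os
from collections import defaultdict


def summarize_storage(root):
    """Walk `root` and return {extension: [file_count, total_bytes]}."""
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Group by lower-cased extension; files without one go under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file vanished or is unreadable (the mount is shared
                # by both nodes), so skip it rather than abort the scan.
                continue
            stats[ext][0] += 1
            stats[ext][1] += size
    return dict(stats)
```

Calling `summarize_storage("/path/to/storage")` (a placeholder path) returns the file count and byte total per extension, which can then be compared with the row counts of the three tables.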
We understand from the related documentation that, for each document uploaded and for each version of a document, Clearspace generates thumbnail images, previews (one SWF file per page of the document), etc. However, the documentation also states that this should increase the final space required by 30%.
We also understand that for every single item stored by the file system provider, there are actually 2 files: XXX.key and XXX.bin.
However, we still find the difference between the numbers derived from the database and those found on the file system far too large.
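To make the gap concrete, here is a rough sketch of the arithmetic using only the figures quoted above. It assumes that the 30% overhead applies to size only, and that the expected file count is simply two files per database item; files generated for thumbnails and previews are not included in that count, because we do not know how many to expect.

```python
# Figures reported in this post.
db_items = 51_000          # rows in jiveAttachment, jiveDocumentBody, jiveImage
db_size_gb = 30            # approximate size of those items
fs_files = 900_000         # files found on the shared file system
fs_capacity_gb = 100
fs_used_fraction = 0.94

# Assumptions, as explained in the lead-in.
files_per_item = 2         # one .key plus one .bin per stored item
overhead = 0.30            # documented increase for thumbnails, previews, etc.

expected_files = db_items * files_per_item
expected_size_gb = db_size_gb * (1 + overhead)
used_gb = fs_capacity_gb * fs_used_fraction

print(f"expected files: {expected_files:,}  observed: {fs_files:,}")
print(f"expected size:  {expected_size_gb:.0f} GB  observed: {used_gb:.0f} GB")
print(f"file count ratio: {fs_files / expected_files:.1f}x")
print(f"size ratio:       {used_gb / expected_size_gb:.1f}x")
```

Under these assumptions we would expect about 102,000 files and 39 GB, against more than 900,000 files and about 94 GB actually on disk.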
What we would like to know is the following:
- Does this sound normal to you or do you think there might be a problem on our system?
- If everything is normal, is there a way to do some cleanup and save some space? We were thinking that maybe there are some unused files on the file system that we could somehow remove ...
- If everything is normal and there are no unused files, is there an easy way to bulk-delete versions? We have some documents with a very large number of versions (>100).
Any technical information on how the Storage Provider works would also be welcome, as it would help us understand how best to deal with the storage issue:
- Is there any technical documentation on the subject (apart from the document "Configuring a Binary Storage Provider")?
- What are the .key and .bin files and what do they contain?
- How are these files related to the actual records in the database (jiveDocument, jiveAttachment, jiveImage ...)?
- What exactly is stored by the Storage Provider and what is not?
- How does document conversion relate to the Storage Provider?
Your help on this subject would be highly appreciated,