The purpose of this document is to describe how the main features of the File System Content Provider and the IMAP Content Provider are implemented and which configuration parameters the system requires.

    ...
    <plugins>
      ...
      <plugin>
        <id>net.sf.iqser.plugin.filesystem</id>
        <type>Document</type>
        <name>Filesystem Content Provider</name>
        <vendor>IQser Technologies</vendor>
        <provider-class>net.sf.iqser.plugin.filesystem.FilesystemContentProvider</provider-class>
        <!-- Use a Cron formatted string to define the synchronisation schedule. -->
        <scheduler>
          <syncjob>0 5 * * * ?</syncjob>
          <housekeeperjob>0 0 23 * * ?</housekeeperjob>
        </scheduler>
        <init-param>
          <!-- folders from which the files are read -->
          <param-name>folder</param-name>
          <param-value>[D:/folder1/][D:/folder2/]</param-value>
        </init-param>
        <init-param>
          <!-- file filter pattern -->
          <param-name>filter-pattern</param-name>
          <param-value>[txt][pdf][zip]</param-value>
        </init-param>
        <init-param>
          <!-- folders that are taken into consideration -->
          <param-name>filter-folder-include</param-name>
          <param-value>[D:/folder1/include]</param-value>
        </init-param>
        <init-param>
          <!-- folders that are not taken into consideration -->
          <param-name>filter-folder-exclude</param-name>
          <param-value>[D:/folder2/exclude]</param-value>
        </init-param>
        <init-param>
          <!-- the attributes of a content that are keys -->
          <param-name>key-attributes</param-name>
          <param-value>[Title][Author][attr3][attr4]</param-value>
        </init-param>
        <init-param>
          <!-- attribute mappings -->
          <param-name>attribute.mappings</param-name>
          <param-value>{AUTHOR:Autor, TITLE:Bezeichnung}</param-value>
        </init-param>
      </plugin>
      ...
    </plugins>
    ...

The URL of a file system content object is the location of the file on the hard disk, including the file name.
The system synchronizes the object graph with one or more folders of the file system, as specified in the iQser configuration file. The configuration file also specifies which file types are taken into consideration and which sub-folders are included in or excluded from the synchronization.

    ...
    <init-param>
      <!-- folders from which the files are read -->
      <param-name>folder</param-name>
      <param-value>[D:/folder1/][D:/folder2/]</param-value>
    </init-param>
    <init-param>
      <!-- file filter pattern -->
      <param-name>filter-pattern</param-name>
      <param-value>[txt][zip]</param-value>
    </init-param>
    <init-param>
      <!-- folders that are taken into consideration -->
      <param-name>filter-folder-include</param-name>
      <param-value>[D:/folder1/included1/][D:/folder2/included2/]</param-value>
    </init-param>
    <init-param>
      <!-- folders that are not taken into consideration -->
      <param-name>filter-folder-exclude</param-name>
      <param-value>[D:/folder1/excluded1/][D:/folder2/excluded2/]</param-value>
    </init-param>
    ...

doHousekeeping
This action deletes the content object if the corresponding file no longer exists on the file system.
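A minimal Java sketch of the housekeeping check, assuming the object graph can be viewed as a map from content URL to content object. The map and the class name HousekeepingSketch are hypothetical and not the provider's API. URLs of the form zip://... are skipped, since checking whether an archive entry still exists would require opening the archive.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of doHousekeeping. The object graph is modelled as a
 * map from content URL to content object (an assumption, not the provider's API).
 */
class HousekeepingSketch {

    /** Removes contents whose file no longer exists; returns the removed URLs. */
    static <C> List<String> doHousekeeping(Map<String, C> objectGraph) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, C>> it = objectGraph.entrySet().iterator();
        while (it.hasNext()) {
            String url = it.next().getKey();
            if (url.startsWith("zip://")) {
                continue; // archive entries would need the zip opened; skipped here
            }
            if (!new File(url).exists()) {
                it.remove(); // file is gone: drop the content object
                removed.add(url);
            }
        }
        return removed;
    }
}
```

Removing through the iterator keeps the loop safe while the map is being modified.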
doSynchronization
This action adds a content object for each new file and updates the content object of each changed file on the local disk.
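The synchronization step has to tell new and changed files apart from unchanged ones. The sketch below assumes the provider remembers the last-modified timestamp of every file it has already imported; the knownTimestamps map and the class name are hypothetical. It walks a folder recursively and returns the files that are unknown or whose timestamp differs. Filtering by file type and folder is deliberately left out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical change detection for doSynchronization. Assumes a map from
 * absolute file path to the last-modified time seen at the previous run.
 */
class SynchronizationSketch {

    /** Returns files under folder that are new or modified since the last run. */
    static List<File> findNewOrChanged(File folder, Map<String, Long> knownTimestamps) {
        List<File> result = new ArrayList<>();
        collect(folder, knownTimestamps, result);
        return result;
    }

    private static void collect(File dir, Map<String, Long> known, List<File> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, known, out); // recurse into sub-folders
            } else {
                Long previous = known.get(child.getAbsolutePath());
                // Unknown file -> add; different timestamp -> update.
                if (previous == null || previous != child.lastModified()) {
                    out.add(child);
                }
            }
        }
    }
}
```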
Explained
The folders taken into consideration are folder1 and folder2 on the D: drive. The files taken into consideration are plain text files and zip archives (including the plain text files inside the zip archives). Within these folders, the sub-folders included1 and included2 are included in the process, while the sub-folders excluded1 and excluded2 are excluded. Any number of file and folder filters can be specified.
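The filter parameters can be sketched in Java as follows. Bracketed values such as [txt][zip] are split into lists with a regular expression. Two rules are assumptions of this sketch rather than documented behaviour: a file type is matched by its extension (case-insensitively), and a non-empty include list restricts the files to the included folders. Paths are compared as strings with "/" separators, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the folder/file filters. Extension matching and the
 * "non-empty include list restricts" rule are assumptions, not the provider's code.
 */
class FilterSketch {

    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]*)\\]");

    /** Splits "[a][b][c]" into ["a", "b", "c"]. */
    static List<String> parseList(String value) {
        List<String> items = new ArrayList<>();
        Matcher m = BRACKETED.matcher(value);
        while (m.find()) {
            items.add(m.group(1).trim());
        }
        return items;
    }

    /** True if path lies inside folder ("/" separators assumed). */
    static boolean isInside(String path, String folder) {
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        return path.startsWith(prefix);
    }

    /** Applies folder, filter-pattern, include and exclude values to one path. */
    static boolean isConsidered(String path, String folders, String pattern,
                                String include, String exclude) {
        boolean inRoot = parseList(folders).stream().anyMatch(f -> isInside(path, f));
        String name = path.substring(path.lastIndexOf('/') + 1);
        String ext = name.contains(".") ? name.substring(name.lastIndexOf('.') + 1) : "";
        boolean typeOk = parseList(pattern).stream().anyMatch(ext::equalsIgnoreCase);
        boolean excluded = parseList(exclude).stream().anyMatch(f -> isInside(path, f));
        List<String> includes = parseList(include);
        boolean includedOk = includes.isEmpty()
                || includes.stream().anyMatch(f -> isInside(path, f));
        return inRoot && typeOk && !excluded && includedOk;
    }
}
```

With the configuration above, D:/folder1/included1/a.txt passes, while D:/folder2/excluded2/b.zip and any .pdf file are rejected.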
The content objects that can be built are those specified in the requirements document. The attributes extracted for each document type that has been tested are listed below. They are extracted by Apache Tika or by the PDF and RTF libraries.
EXCEL Document Attributes
ODF Document Attributes
PPT Document Attributes
DOC Document Attributes
RTF Document Attributes
Text Document Attributes
PDF Document Attributes
TXT Document Attributes
There are two different ways of extracting the binary content of a content object: one for files that are not archived and one for files inside a zip archive. The URL of a file inside a zip archive has the form:

    zip://location_of_zip/zip_file.zip!/location_of_file_entry/file_entry

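Both extraction paths can be sketched with the JDK alone: Files.readAllBytes for plain files and java.util.zip.ZipFile for archive entries, splitting the URL at the first "!/". The class and method names are hypothetical, and the sketch assumes that the part between zip:// and !/ is a path ZipFile can open directly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical sketch: reads binary content from a plain file path or from
 * a URL of the form zip://path/to/archive.zip!/path/in/archive.
 */
class BinaryContentSketch {

    static byte[] readBinaryContent(String url) throws IOException {
        if (!url.startsWith("zip://")) {
            return Files.readAllBytes(Paths.get(url)); // plain, non-archived file
        }
        String rest = url.substring("zip://".length());
        int sep = rest.indexOf("!/");
        if (sep < 0) {
            throw new IllegalArgumentException("Missing '!/' in zip URL: " + url);
        }
        String archive = rest.substring(0, sep);    // location of the zip file
        String entryName = rest.substring(sep + 2); // entry path inside the archive
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry " + entryName + " in " + archive);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }
}
```

For example, readBinaryContent("zip://D:/folder1/docs.zip!/notes/a.txt") would open D:/folder1/docs.zip and return the bytes of the entry notes/a.txt (the archive name here is illustrative).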
This method is implemented only for files that are not archived. It creates a FileContent object from an input stream. It then sets the key attributes of the initial content according to the iQser configuration file and renames the initial attributes according to the mappings specified there.
The attributes listed in the key-attributes parameter become the new key attributes of the content; all other attributes are marked as non-key attributes.

    ...
    <init-param>
      <!-- the attributes of a content that are keys -->
      <param-name>key-attributes</param-name>
      <param-value>[Title][Author]</param-value>
    </init-param>
    ...

This example configuration specifies that the key attributes are Title and Author; all other attributes are non-key attributes.
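A sketch of how the key-attributes value could be applied, assuming attribute names are matched exactly (case-sensitively). The nested Attribute class is a hypothetical stand-in for the provider's attribute type, reduced to a name and a key flag.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of applying key-attributes. Attribute is a stand-in
 * type; exact, case-sensitive name matching is an assumption.
 */
class KeyAttributesSketch {

    static final class Attribute {
        final String name;
        boolean key;

        Attribute(String name) {
            this.name = name;
        }
    }

    /** Parses "[Title][Author]" into {"Title", "Author"}. */
    static Set<String> parseKeyAttributes(String value) {
        Set<String> names = new HashSet<>();
        Matcher m = Pattern.compile("\\[([^\\]]*)\\]").matcher(value);
        while (m.find()) {
            names.add(m.group(1).trim());
        }
        return names;
    }

    /** Listed attributes become keys; every other attribute is set to non-key. */
    static void applyKeyAttributes(List<Attribute> attributes, String value) {
        Set<String> keys = parseKeyAttributes(value);
        for (Attribute a : attributes) {
            a.key = keys.contains(a.name);
        }
    }
}
```

Setting the flag on every attribute, rather than only on the listed ones, also clears key flags that the extractor may have set earlier.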

    ...
    <init-param>
      <!-- attribute mappings for certain parameters extracted by the mail api in json format -->
      <param-name>attribute.mappings</param-name>
      <param-value>{AUTHOR:Autor, Last-Author:Autor, Author:Autor, title:Bezeichnung,TITLE:Bezeichnung}</param-value>
    </init-param>
    ...

With this mapping, the attributes AUTHOR, Last-Author and Author of the content are renamed to Autor, and the attributes title and TITLE are renamed to Bezeichnung.
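A sketch of parsing and applying the mapping. Because the example value is not strict JSON (keys and values are unquoted), it is split on commas and colons instead of being passed to a JSON parser. Several source attributes map to Autor, so a collision policy is needed; this sketch keeps the first value it encounters, which is an assumption. Class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of attribute.mappings. Parses "{src:target, ...}"
 * without a JSON library; on name collisions the first value wins (assumption).
 */
class AttributeMappingSketch {

    static Map<String, String> parseMappings(String value) {
        Map<String, String> mappings = new LinkedHashMap<>();
        String body = value.trim();
        if (body.startsWith("{")) {
            body = body.substring(1);
        }
        if (body.endsWith("}")) {
            body = body.substring(0, body.length() - 1);
        }
        for (String pair : body.split(",")) {
            int colon = pair.indexOf(':');
            if (colon > 0) {
                mappings.put(pair.substring(0, colon).trim(), pair.substring(colon + 1).trim());
            }
        }
        return mappings;
    }

    /** Returns a copy of the attributes with mapped names replaced. */
    static Map<String, String> applyMappings(Map<String, String> attributes,
                                             Map<String, String> mappings) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            String name = mappings.getOrDefault(e.getKey(), e.getKey());
            result.putIfAbsent(name, e.getValue()); // first value wins on collisions
        }
        return result;
    }
}
```

Unmapped attributes keep their original names, and names are matched case-sensitively, which is why the configuration lists both title and TITLE.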
This operation deletes the content from the object graph and also deletes the corresponding file from the file system. It is also available for files contained in a zip archive.
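For files inside a zip archive, java.util.zip offers no way to remove a single entry in place. One possible approach, sketched below with hypothetical names, rewrites the archive into a temporary file without the deleted entry and then replaces the original. It covers only the archive part of the operation, not the removal from the object graph, and it does not preserve entry metadata such as timestamps.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch: deletes one entry by copying every other entry into a
 * temporary archive and moving it over the original. Entry metadata is not kept.
 */
class ZipDeleteSketch {

    /** Returns true if the entry existed and was removed. */
    static boolean deleteEntry(Path archive, String entryName) throws IOException {
        Path tmp = Files.createTempFile(archive.toAbsolutePath().getParent(), "zipdel", ".tmp");
        boolean found = false;
        try (ZipFile zip = new ZipFile(archive.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.getName().equals(entryName)) {
                    found = true; // skip the entry being deleted
                    continue;
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                try (InputStream in = zip.getInputStream(entry)) {
                    in.transferTo(out); // copy entry data unchanged
                }
                out.closeEntry();
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // leave the original archive untouched
            throw e;
        }
        if (found) {
            // Both streams are closed here, so the move also works on Windows.
            Files.move(tmp, archive, StandardCopyOption.REPLACE_EXISTING);
        } else {
            Files.deleteIfExists(tmp);
        }
        return found;
    }
}
```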
This operation creates or updates a file and the corresponding file content object. It is also available for files contained in a zip archive. The content is added to the object graph, or updated if it already exists. If the file cannot be saved, an exception is thrown and the content object is not saved.
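The ordering described above (write the file first, then store the content object) can be sketched as follows, assuming a hypothetical map from content URL to content data in place of the object graph. Only plain files are handled. If the write fails, an UncheckedIOException is thrown before the map is touched, so no content object is saved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/**
 * Hypothetical sketch of the save operation for non-archived files. The
 * object graph is modelled as a map from absolute path to content bytes.
 */
class SaveSketch {

    static void save(Path file, byte[] data, Map<String, byte[]> objectGraph) {
        try {
            Path parent = file.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent); // create missing folders
            }
            Files.write(file, data); // creates the file or overwrites it
        } catch (IOException e) {
            // The file could not be saved: fail before touching the graph.
            throw new UncheckedIOException("Could not save " + file, e);
        }
        // Reached only after a successful write: add or replace the content.
        objectGraph.put(file.toAbsolutePath().toString(), data);
    }
}
```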