Java PDFBox – Crawling Function Design
– Create a Database Table with File Path, File Content and File Hash Columns.
– Develop a Scheduled Job to Perform the Crawling Step (Reading the PDF Files under a Specific Folder Path).
  – Iterate over the Files in the Specific Folder Path.
    – If the File does not exist in the Database Table:
      – Run PDFBox to retrieve the File Content.
      – Insert a Record with the File Content and the PDF File Hash Value into the Database Table.
    – If the File exists in the Database Table and its Hash differs from the stored Hash:
      – Run PDFBox to retrieve the File Content.
      – Update the Record with the new File Content and PDF File Hash Value in the Database Table.
    – If the File exists in the Database Table and its Hash matches the stored Hash:
      – Nothing to do.
– Develop a Web Function and UI to Search for Files by File Name or File Content Keyword.
  – Prepare a SQL Statement to search for the Keyword in the File Name / File Content Fields …
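
The crawling step (insert when the file is new, update when the hash differs, skip when the hash matches) could be sketched roughly as follows. This is only an illustration under several assumptions: the class and method names are hypothetical, SHA-256 is an assumed choice of hash, an in-memory map stands in for the database table, and text extraction is hidden behind a hypothetical TextExtractor hook where the PDFBox call would go.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the crawl decision; not a definitive implementation.
final class CrawlSketch {
    enum Action { INSERT, UPDATE, SKIP }

    // Hypothetical hook for text extraction. With PDFBox 2.x this would
    // typically wrap PDDocument.load(file) and new PDFTextStripper().getText(doc).
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // One row of the assumed table: file hash plus extracted content.
    static final class Row {
        final String hash;
        final String content;
        Row(String hash, String content) { this.hash = hash; this.content = content; }
    }

    // In-memory stand-in for the database table, keyed by absolute file path.
    final Map<String, Row> table = new HashMap<>();

    // Hex-encoded SHA-256 of the file bytes (the hash algorithm is an assumption).
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // The three cases from the outline: no record, changed hash, same hash.
    static Action decide(String storedHash, String currentHash) {
        if (storedHash == null) return Action.INSERT;
        return storedHash.equals(currentHash) ? Action.SKIP : Action.UPDATE;
    }

    // Hash one file and run the extractor only when the file is new or changed.
    Action process(Path file, TextExtractor extractor) throws IOException {
        String key = file.toAbsolutePath().toString();
        String hash = sha256Hex(Files.readAllBytes(file));
        Row existing = table.get(key);
        Action action = decide(existing == null ? null : existing.hash, hash);
        if (action != Action.SKIP) {
            table.put(key, new Row(hash, extractor.extract(file)));
        }
        return action;
    }

    // Iterate over the *.pdf files directly under the folder (not recursive).
    void crawl(Path folder, TextExtractor extractor) throws IOException {
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(folder, "*.pdf")) {
            for (Path pdf : pdfs) {
                process(pdf, extractor);
            }
        }
    }
}
```

Hashing the file before extracting means the comparatively expensive text extraction runs only for new or modified files. In a real job the map would be replaced by SELECT / INSERT / UPDATE statements against the table, and crawl would be invoked by whatever scheduler the application uses.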
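
For the search step, one possible shape of the SQL statement is a parameterized LIKE query over both fields. The sketch below is an assumption-heavy illustration: the table name pdf_file and the columns file_path and file_content are hypothetical, as are the class and method names, and a plain LIKE is only one option (a full-text index may suit large content fields better).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the keyword search; table and column names are assumed.
final class SearchSketch {
    // Match the keyword against either the file path (name) or the content.
    static final String SEARCH_SQL =
        "SELECT file_path FROM pdf_file "
      + "WHERE file_path LIKE ? ESCAPE '!' OR file_content LIKE ? ESCAPE '!'";

    // Escape LIKE wildcards in user input, then wrap it for a substring match.
    static String toLikePattern(String keyword) {
        String escaped = keyword
            .replace("!", "!!")   // escape the escape character first
            .replace("%", "!%")
            .replace("_", "!_");
        return "%" + escaped + "%";
    }

    // Run the query with bound parameters so the keyword is never concatenated into SQL.
    static List<String> search(Connection conn, String keyword) throws SQLException {
        String pattern = toLikePattern(keyword);
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_SQL)) {
            ps.setString(1, pattern);
            ps.setString(2, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> paths = new ArrayList<>();
                while (rs.next()) {
                    paths.add(rs.getString("file_path"));
                }
                return paths;
            }
        }
    }
}
```

Binding the keyword through PreparedStatement keeps user input out of the SQL text, and escaping % and _ prevents a keyword containing those characters from acting as a wildcard.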