PDFBox | 馬仔驚自己唔記得要留既notes

Java PDFBox – Crawling Function Design

– Create a Database Table with File Path / File Content / File Hash Columns.
– Develop a Scheduled Job to Perform the Crawling Step (Read the PDF Files under a Specific Folder Path).
   – Iterate over the Files in the Specific Folder Path
      – If the File Item does not exist in the Database Table,
         – Execute PDFBox to retrieve the File Content
         – Insert a Record with the PDF File Hash Value into the Database Table
      – If the File Item exists in the Database Table and its Hash differs from the stored Hash,
         – Execute PDFBox to retrieve the File Content
         – Update the Record with the new PDF File Hash Value in the Database Table
      – If the File Item exists in the Database Table and its Hash is the same as the stored Hash,
         – Nothing to do
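
The crawling step above could be sketched roughly as follows. This is only a minimal illustration under several assumptions that are not in the notes: SHA-256 is used as the file hash, an in-memory Map stands in for the Database Table, and the names PdfCrawlSketch, FileRecord, decide and crawl are hypothetical. The PDFBox call is passed in as a function so the sketch compiles without the library; in a real job it would wrap something like PDFTextStripper.getText on a loaded PDDocument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

class PdfCrawlSketch {
    enum Action { INSERT, UPDATE, SKIP }

    /** One row of the (hypothetical) table: file path, file content, file hash. */
    static final class FileRecord {
        final String path, content, hash;
        FileRecord(String path, String content, String hash) {
            this.path = path; this.content = content; this.hash = hash;
        }
    }

    /** Hex-encoded SHA-256 of the file bytes (the hash algorithm is an assumption). */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The three cases from the notes: not in table, hash changed, hash unchanged. */
    static Action decide(String storedHash, String currentHash) {
        if (storedHash == null) return Action.INSERT;
        return storedHash.equals(currentHash) ? Action.SKIP : Action.UPDATE;
    }

    /**
     * Iterates the PDF files directly under the folder and inserts or updates rows.
     * The table Map stands in for the Database Table; extractText stands in for PDFBox.
     */
    static void crawl(Path folder, Map<String, FileRecord> table,
                      Function<Path, String> extractText) {
        try (Stream<Path> files = Files.list(folder)) {
            files.filter(p -> p.toString().toLowerCase().endsWith(".pdf")).forEach(p -> {
                String key = p.toString();
                String hash = sha256Hex(readAll(p));
                FileRecord existing = table.get(key);
                Action action = decide(existing == null ? null : existing.hash, hash);
                if (action != Action.SKIP) {
                    // Text extraction only runs for new or changed files.
                    table.put(key, new FileRecord(key, extractText.apply(p), hash));
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] readAll(Path p) {
        try {
            return Files.readAllBytes(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing hashes before calling the extractor means unchanged files cost one read and one hash per run, and the comparatively expensive text extraction is skipped for them.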
– Develop a Web Function and UI to search for Files by File Name or File Content Keyword.
   – Prepare a SQL Statement to search for the Keyword in the File Name / File Content Fields …
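
One possible shape for the search statement, shown only as a sketch. The table name pdf_file, the column names file_path and file_content, and the class PdfSearchSketch are all hypothetical; the notes do not specify them. It assumes a plain LIKE match through JDBC with a bound parameter, and uses '!' as the LIKE escape character so that a keyword containing % or _ is matched literally.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PdfSearchSketch {
    // Hypothetical table and column names; the notes only name the fields.
    static final String SEARCH_SQL =
        "SELECT file_path FROM pdf_file "
      + "WHERE file_path LIKE ? ESCAPE '!' OR file_content LIKE ? ESCAPE '!'";

    /** Wraps the keyword in % wildcards, escaping LIKE metacharacters with '!'. */
    static String likePattern(String keyword) {
        String escaped = keyword.replace("!", "!!")
                                .replace("%", "!%")
                                .replace("_", "!_");
        return "%" + escaped + "%";
    }

    /** Returns the paths of files whose path or content contains the keyword. */
    static List<String> search(Connection conn, String keyword) throws SQLException {
        String pattern = likePattern(keyword);
        List<String> paths = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_SQL)) {
            // The keyword is bound as a parameter, never concatenated into the SQL.
            ps.setString(1, pattern);
            ps.setString(2, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    paths.add(rs.getString("file_path"));
                }
            }
        }
        return paths;
    }
}
```

Whether LIKE is case-sensitive depends on the database and its collation, so the Web Function may need to normalize case on both sides.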