maeser.admin_portal.extract_text module

This module extracts text from PDF files as Markdown. The resulting Markdown is saved to the same directory as the source PDFs. The module can be run from the terminal or imported into another script.

maeser.admin_portal.extract_text.extract_all_pdf_texts(dir: str)

Extracts text from all PDFs located in dir.

Parameters:

dir (str) – The directory where the PDF files are located.

maeser.admin_portal.extract_text.extract_pdf_text(dir: str, pdf: str)

Extracts the text of a PDF in Markdown format and saves the output to the PDF’s directory.

Parameters:
  • dir (str) – The directory where the PDF file is located.

  • pdf (str) – The name of the PDF, including the extension.
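
Because the function takes the directory and the file name as separate arguments, a full path has to be split before the call. The sketch below shows one way to do that, assuming maeser is installed; `split_pdf_path` and `convert_one` are hypothetical helpers, not part of maeser.

```python
import os


def split_pdf_path(path: str) -> tuple[str, str]:
    """Split a path to a PDF into the (dir, pdf) pair the function expects."""
    directory, name = os.path.split(os.path.abspath(path))
    # The `pdf` argument is documented as including the extension.
    if not name.lower().endswith(".pdf"):
        raise ValueError(f"not a PDF file name: {name!r}")
    return directory, name


def convert_one(path: str) -> None:
    """Convert a single PDF, given its full or relative path."""
    # Imported lazily so that split_pdf_path() works without maeser installed.
    from maeser.admin_portal.extract_text import extract_pdf_text

    directory, name = split_pdf_path(path)
    # Argument order taken from the documented signature: dir first, then pdf.
    extract_pdf_text(directory, name)
```

For example, `convert_one("docs/report.pdf")` would pass the absolute path of `docs` as `dir` and `"report.pdf"` as `pdf`.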