Blog Archive, January 2019

Featured Image

Extracting Text from a PDF Using Python

 Jan. 6, 2019    0 comments 

Recently I needed to extract text from a PDF file using Python. Quick googling led me to PyPDF2 package, however I wasn't able to extract any text from my test PDF with it. The test PDF was created with Google Docs (a very common scenario) and did not have any fancy formatting, so PyPDF2 was disqualified for my purposes. After further googling I found pdfminer package and its Python 3 compatible version — pdfminer.six. (...)

Read post