Blog Archive, January 2019

Extracting Text from a PDF Using Python

Jan. 6, 2019

Recently I needed to extract text from a PDF file using Python. Quick googling led me to the PyPDF2 package, but I wasn't able to extract any text from my test PDF with it. The test PDF was created with Google Docs (a very common scenario) and did not have any fancy formatting, so PyPDF2 was disqualified for my purposes. After further googling I found the pdfminer package and its Python 3 compatible version, pdfminer.six. (...)
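For a rough idea of what text extraction with pdfminer.six can look like, here is a minimal sketch. It assumes a reasonably recent pdfminer.six release that provides `extract_text` in `pdfminer.high_level`; the full post may well use a different part of the library. The names `extract_pdf_text` and `tidy` are hypothetical helpers made up for this illustration.

```python
import re


def extract_pdf_text(path):
    """Return the text of the PDF at `path` using pdfminer.six."""
    # Assumption: pdfminer.six is installed (pip install pdfminer.six) and is
    # recent enough to ship pdfminer.high_level.extract_text.
    # Imported lazily so tidy() below stays usable without the package.
    from pdfminer.high_level import extract_text

    return extract_text(path)


def tidy(text):
    """Collapse runs of spaces/tabs on each line and trim the result."""
    # Hypothetical clean-up step: extracted text often has uneven spacing.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip()
```

Calling `tidy(extract_pdf_text("test.pdf"))` would then return the cleaned-up text of a file named `test.pdf` (a placeholder file name), provided the PDF actually contains a text layer.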


Python
