Description
Lazy is too hard. When I had 40,000 PDFs and needed to extract their data, I knew the "lazy" approach wouldn't be enough. This talk reviews tools for taming PDFs with confidence, using my open-data project's workflow as an example (ETL, anyone?). It is also a follow-up and response to the PyOhio 2016 talk "We Don’t Need No Stinkin’ PDF Library: Build PDFs with Python the Lazy Way".