Recently, I came across a very rare out-of-print book and decided to preserve it for the world with modern technology.
Because I don’t have a real scanner, I tested all sorts of open-source scanner apps for Android.
- OpenNoteScanner: does not work on every phone and auto cropping is unreliable, but capture is fast (with postprocessing) and the result is high quality.
- OpenScan: no auto cropping.
- PDF-Doc-Scan: auto cropping does not work with books at all.
- docus: mediocre auto cropping, no retouching.
- CleanSCAN: no auto cropping.
- OSS-DocumentScanner: auto cropping works most of the time; it has filters and postprocessing, at the cost of slow capture, and the result is very high quality.
In the end, OSS-DocumentScanner turned out to be the best of all. To make things easier, I set the gamma, contrast and filter values that worked best for my lighting and the book's condition as the defaults.
After capturing all pages of the book into a single PDF file, I also tested the OCR function in the app. It works, but the results are poor.
So now it is time to transfer the PDF to a computer.
Optional but strongly recommended: use pdfc or pdfmini to compress the images and reduce the file size. This can prevent out-of-memory (OOM) errors and speed up the OCR process.
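For anyone who wants to script the compression step, here is a minimal sketch. It does not use the pdfc or pdfmini interfaces; as an assumption, it calls Ghostscript (`gs`) directly with its standard `-dPDFSETTINGS` presets. The function names and the file names are made up for illustration.

```python
import subprocess

# Standard Ghostscript quality presets, from smallest to largest output.
PRESETS = ("screen", "ebook", "printer", "prepress")


def build_compress_command(src, dst, preset="ebook"):
    """Build a Ghostscript command that rewrites src with downsampled images."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",          # write a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",   # controls image downsampling
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress(src, dst, preset="ebook"):
    """Run the compression; requires Ghostscript to be installed."""
    subprocess.run(build_compress_command(src, dst, preset), check=True)
```

Calling `compress("scan.pdf", "scan-small.pdf")` would write a smaller copy and leave the original untouched. The `ebook` preset is only a starting point; for small print, `printer` may keep the text sharp enough for OCR.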
There are different tools for different languages, such as Umi-OCR for Asian languages and OCRmyPDF for European languages, but they all support English very well.
To use OCRmyPDF from the command line, simply run: ocrmypdf input.pdf output.pdf
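If you have several files or want extra options, the same call can be wrapped in a small script. The sketch below is just one way to do it: the function names and defaults are my own, while `-l`, `--deskew` and `--optimize` are regular OCRmyPDF options. Language codes are Tesseract codes (for example `eng`, `deu`), and the matching language packs must be installed.

```python
import subprocess


def build_ocr_command(src, dst, languages=("eng",), deskew=True, optimize=1):
    """Build an ocrmypdf command line for one input/output pair."""
    if optimize not in (0, 1, 2, 3):
        raise ValueError("optimize must be between 0 and 3")
    cmd = [
        "ocrmypdf",
        "-l", "+".join(languages),    # several languages are joined with '+'
        "--optimize", str(optimize),  # 0 disables optimization, 3 is the most aggressive
    ]
    if deskew:
        cmd.append("--deskew")        # straighten slightly rotated pages
    cmd += [str(src), str(dst)]
    return cmd


def run_ocr(src, dst, **options):
    """Run OCRmyPDF; requires ocrmypdf and Tesseract to be installed."""
    subprocess.run(build_ocr_command(src, dst, **options), check=True)
```

For a book photographed by hand, `--deskew` is probably worth keeping on. `run_ocr("input.pdf", "output.pdf", languages=("eng", "deu"))` would OCR a mixed English and German book.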
After the OCR finished, it was very weird, but also very pleasant, that the file size had decreased.
The final step before sharing publicly is to use exifcleaner to remove the metadata. Use signaturepdf if you need further modifications.
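exifcleaner is a GUI app, so for a scripted version here is a rough sketch that calls ExifTool directly instead. That is a swap on my part, not how exifcleaner is meant to be used. The optional second step with `qpdf --linearize` is also an assumption: as far as I know, ExifTool edits PDFs through an incremental update, so the old metadata can still be recovered until the file is rewritten. Function and file names are hypothetical.

```python
import subprocess


def build_strip_command(path):
    """Build an ExifTool command that clears all writable metadata in place."""
    return ["exiftool", "-all=", "-overwrite_original", str(path)]


def build_rewrite_command(src, dst):
    """Build a qpdf command that rewrites the PDF so old metadata is dropped."""
    return ["qpdf", "--linearize", str(src), str(dst)]


def strip_metadata(src, dst):
    """Strip metadata from src, then write a rewritten copy to dst.

    Requires exiftool and qpdf to be installed. Note that src is modified.
    """
    subprocess.run(build_strip_command(src), check=True)
    subprocess.run(build_rewrite_command(src, dst), check=True)
```

After running `strip_metadata("output.pdf", "share.pdf")`, it is worth checking the result with `exiftool share.pdf` before uploading.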
When sharing it to shadow libraries such as archive.org, zlib and libgen, use Proton VPN and Tor Browser to reduce the risk from copyright surveillance.
In some cases like this, the law protects only property, not morality. That is why we need anonymity.