OCyara
OCyara performs OCR on images and PDF files to extract text content and scan it against Yara rules for malware detection.

OCyara Description
OCyara is a Python module that performs Optical Character Recognition (OCR) on image files and scans the extracted text for matches against Yara rules. It can process a variety of image formats, including GIF and TIFF, and also handles images embedded within PDF files. OCR is provided by tesserocr, a Python wrapper around the Tesseract OCR API.

The module requires Python 3.5+ and targets Debian-based Linux distributions; it has been tested on Kali Rolling and Ubuntu 16.10. Its dependencies must be installed manually before OCyara itself: python3-dev, tesseract-ocr, libtesseract-dev, libleptonica-dev, and the libraries for the image formats you need. On some Ubuntu LTS releases, Tesseract and Leptonica may have to be compiled from source to get full format support. Once the system requirements are met, OCyara is installed with pip. Cython must be installed separately beforehand because tesserocr depends on it.
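OCyara's own API is not reproduced here. The sketch below illustrates only the second stage described above, checking OCR-extracted text against rule strings, and it uses the standard library in place of yara-python and tesserocr so it runs anywhere. `TextRule`, `normalize`, and `scan_ocr_text` are hypothetical names. The any/all condition loosely mirrors Yara's `any of them` / `all of them` with the `nocase` string modifier; this is not a Yara engine.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextRule:
    """Hypothetical stand-in for a Yara rule made only of text strings."""
    name: str
    strings: List[str]
    require_all: bool = False  # False ~ "any of them", True ~ "all of them"


def normalize(text: str) -> str:
    # Casefolding approximates Yara's `nocase` modifier for ASCII text.
    # Collapsing whitespace runs to single spaces has no Yara equivalent;
    # it is an assumption made here to tolerate OCR line breaks and spacing.
    return " ".join(text.split()).casefold()


def scan_ocr_text(text: str, rules: List[TextRule]) -> Dict[str, List[str]]:
    """Return {rule name: matched strings} for every rule whose condition holds."""
    haystack = normalize(text)
    matches: Dict[str, List[str]] = {}
    for rule in rules:
        hits = [s for s in rule.strings if normalize(s) in haystack]
        satisfied = len(hits) == len(rule.strings) if rule.require_all else bool(hits)
        if satisfied:
            matches[rule.name] = hits
    return matches


if __name__ == "__main__":
    # Pretend this string came back from the OCR step.
    ocr_text = "Please ENABLE\nCONTENT to view this invoice"
    rules = [
        TextRule("enable_content_lure", ["enable content", "enable editing"]),
        TextRule("invoice_and_payment", ["invoice", "payment"], require_all=True),
    ]
    print(scan_ocr_text(ocr_text, rules))
    # {'enable_content_lure': ['enable content']}
```

In a real deployment the rules would be compiled Yara rules and the text would come from Tesseract; the point of the sketch is that the OCR output is treated as plain data to be matched, which is why lure phrases that exist only as pixels in an image become detectable.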
OCyara FAQ
OCyara is a tool that performs OCR on images and PDF files to extract text content and scan it against Yara rules for malware detection. It is a Security Operations tool aimed at security teams working with Linux, YARA, and PDF analysis.