Ever wanted to grep a pattern in a PDF document? How about a database or the web? crgrep is a grep-like utility written in Java that can do much more than search for patterns in text files. crgrep stands for Common Resource grep.

Resources crgrep supports:

text documents, PDFs

database tables

ZIP, TAR, WAR, EAR and JAR archive formats

image metadata (jpeg, gif etc.)

text in scanned documents (jpeg/gif/tiff/bmp/png), extracted using OCR

Maven POM files, following dependency trees of resource artifacts

web resources

combinations of supported resources

DOWNLOAD

crgrep is distributed as a binary archive from its SourceForge project page. After extracting the archive, the crgrep binary can be found in the bin directory.

USAGE

Normal calling convention:

$ crgrep <pattern> <resource path(s)>

Wildcards such as * and ? are supported in the pattern and in the resource path(s). Output is displayed in the format:

<resource>[[:pagenum]:linenum:matching_content]

For example:

Output                                    Match
------------------------------------------------------------------
src/foo.java                              File listing match
src/bar.txt:25:some text                  File content match (+lineno)
lib/all.zip[image.gif]                    Archive file listing match
lib/app.war[WEB-INF/web.xml]:6:<d..>      Archive file content match
pom.xml->stuff.zip[doc.txt]               File listing match
mypic.jpg: @{Size=25,Com=Scene}           File meta-data match
TAB: [COL1,COL2,COL3]                     Table column name match
TAB: data1,data2,data3                    Table data match
Node[1]:{name:"John"}                     Graph database node match
sample.pdf:1:1:Sample PDF Document        Text extracted from a PDF (+pageno and +linenum)
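Because wildcards are accepted in both the pattern and the resource path, it helps to remember that the shell expands unquoted wildcards before any command sees them. The sketch below illustrates this with show_args, a hypothetical stand-in function that only prints its arguments; it is not part of crgrep, and the scratch directory and file names are made up for the demonstration.

```shell
# show_args is a hypothetical stand-in for crgrep:
# it prints each argument it receives on its own line.
show_args() {
    printf '%s\n' "$@"
}

# Create a scratch directory holding two sample files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

# Unquoted wildcard: the shell expands it, so the command receives two file paths.
show_args "$demo_dir"/*.txt

# Quoted wildcard: the command receives the literal pattern as a single argument.
show_args "$demo_dir/*.txt"
```

Quoting, as in the '*' argument used in the database examples, is the way to hand a wildcard to crgrep itself rather than to the shell.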

Find files and data matching key under target directory. Include archives.

$ crgrep -r key target
target/simple_file.txt: a key moment
target/misc.zip[misc/nested_monkey.txt]
target/monkey-pics.txt:1:A file about happy monkeys.
target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey
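Since each output line starts with the resource name, output like the listing above can be post-processed with standard tools. The following is a minimal sketch, assuming the output layout shown in this article; matched_resources is a hypothetical helper, not something shipped with crgrep, and the sample input is copied from the listing above.

```shell
# matched_resources is a hypothetical helper, not part of crgrep.
# It reads crgrep-style output on stdin and prints each top-level resource once.
matched_resources() {
    # First expression: drop everything from the first ':' (line number and matched content).
    # Second expression: drop everything from the first '[' (entry inside an archive).
    sed -e 's/:.*$//' -e 's/\[.*$//' | LC_ALL=C sort -u
}

# Sample input copied from the recursive search listing.
matched_resources <<'EOF'
target/simple_file.txt: a key moment
target/misc.zip[misc/nested_monkey.txt]
target/monkey-pics.txt:1:A file about happy monkeys.
target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey
EOF
```

With crgrep installed, the same helper could be fed from a pipe, for example by piping the recursive search into matched_resources. Resource names that themselves contain a colon, such as URLs, would be cut short by this simple approach.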

What column data in my database matches ‘handle’?

(database username and password should be in ~/.crgrep)

For a relational DB:

$ crgrep -d -U "jdbc:sqlite:/databases/db.sqlite3" handle '*'

For a Neo4j graph DB:

$ crgrep -d -U "http://localhost:7474/" handle '*'

-d stands for database and -U for URI.

Search for a pattern in an image using OCR:

$ crgrep --ocr report report_scan.png

Search in image metadata:

$ crgrep --ocr report report_scan.png

Does the Google home page contain a ‘favicon’ reference?

$ crgrep google_favicon http://www.google.com

Find Maven (POM) dependencies in my project with content matching ‘RunWith’:

$ crgrep -m RunWith pom.xml

Webpage: Common Resource grep
