Frodo, the LOTR
====================

Frodo is a Line-Oriented Text Rewriter, a tool for scripting bulk rewrites of line-oriented text data. It can be used for reformatting data, or for other editing tasks more challenging than global-search-and-replace.

The easiest way to explain is just to give some examples. Suppose you have a data file that looks like this:

    last="Ambrose" first="Alice" bdate="1885-04-22"
    last="Baldwin" first="Bernard" bdate="1887-09-13"
    last="Corbin" first="Charles" bdate="1879-11-03"
    last="Dalton" first="Dennis" bdate="1881-01-07"

That's nice and all, but it would be more convenient if it were in CSV format, with maybe a header that gave names to the columns. This is a perfect job for Frodo. Try this script on it.

    insert "LAST,FIRST,BDATE"
    while (
        replace /last="(.*)" first="(.*)" bdate="(.*)"/ |{1},{2},{3}|
        next
    )

You'll get something that looks like this.

    LAST,FIRST,BDATE
    Ambrose,Alice,1885-04-22
    Baldwin,Bernard,1887-09-13
    Corbin,Charles,1879-11-03
    Dalton,Dennis,1881-01-07

This is probably pretty obvious, except for the "replace" directive. The first argument - the part between the slashes (/) - is a regular expression. Because Frodo is a Java application, these are java.util.regex.Pattern regular expressions. The parentheses mark capture groups. When the regex matches the current line, the parts of the string inside the parentheses are captured and saved to be used later - for example, by the second argument.

The second argument of the replace directive - the part between the vertical bars (|) - is a string format as defined by the java.text.MessageFormat class. The numbers inside the braces indicate arguments, which are replaced by the corresponding capture groups from the previous regex match. The entire line is then replaced by this newly formatted string.

If we wanted, we could make the script slightly more readable by giving names to all the strings.


    define CsvHeader "LAST,FIRST,BDATE"
    define OldFormat /last="(.*)" first="(.*)" bdate="(.*)"/
    define NewFormat |{1},{2},{3}|

    insert CsvHeader
    while (
        replace OldFormat NewFormat
        next
    )

Let's amp it up to 0.5. Suppose we want the output in XML format, rather than CSV. This script will do it.

    define InputFormat /last="(.*)" first="(.*)" bdate="(.*)"/

    insert "<records>"
    while (
        match InputFormat
        insert "  <record>"
        insert |    <last>{1}</last>|
        insert |    <first>{2}</first>|
        insert |    <bdate>{3}</bdate>|
        insert "  </record>"
        remove
    )
    append "</records>"

The "match" directive captures the parts of the line for use by the inserts. Note that all lines are inserted before the current line being edited. When the current line is finally removed, all lines after it move up in the list, and so the next line after it automatically becomes the new current line. Once we've processed all lines, the match directive will fail because there is no current line, and the while loop will exit, allowing the append directive to be executed.

Here's the output.

    <records>
      <record>
        <last>Ambrose</last>
        <first>Alice</first>
        <bdate>1885-04-22</bdate>
      </record>
      <record>
        <last>Baldwin</last>
        <first>Bernard</first>
        <bdate>1887-09-13</bdate>
      </record>
      <record>
        <last>Corbin</last>
        <first>Charles</first>
        <bdate>1879-11-03</bdate>
      </record>
      <record>
        <last>Dalton</last>
        <first>Dennis</first>
        <bdate>1881-01-07</bdate>
      </record>
    </records>

Finally, let's amp it up to about a 1. Suppose we want to convert the XML back to CSV. Check out this script.

    define CSV_Header "LAST,FIRST,BDATE"
    define XML "\s*<last>(.*)</last>\s*<first>(.*)</first>\s*<bdate>(.*)</bdate>"
    define CSV |{1},{2},{3}|

    insert CSV_Header
    match "<records>" remove
    while (
        match "<record>" remove
        catenate 2  -- join the current line with the next two lines
        replace XML CSV
        next
        match "</record>" remove
    )
    match "</records>" remove

You might notice here that comments begin with a double dash and continue to the end of the line.
Also notice that terminators are not required to mark the end of a directive. Each directive begins with a keyword, and the next keyword marks the start of the next directive. This means, for example, that

    match "<records>" remove

is the same as

    match "<records>"
    remove

So why do we match "<records>" when there are no capture groups to extract? Every directive either succeeds or fails, and a sequence of directives continues only as long as every one of them succeeds. If the current line - in this case the first line - doesn't match the regex, then the directive will fail and Frodo will stop processing the script.

It's possible to provide an alternate sequence of directives in case a sequence doesn't complete. For example,

    match "<records>" remove
    else
    log "Didn't find a <records> tag at the start of the file"
    fail

In this example, if both the match and remove directives succeed, then neither the log nor the fail directive will be attempted. If the match fails, or if the match succeeds but the remove fails, then Frodo will continue at the "log" directive. The remove will only be attempted if the match succeeds.

Obviously, there's a lot more to Frodo than this. See the programmer manual for more information.

How to Execute Frodo
====================

Frodo is a Java application. Make sure you have Java installed. Download the Frodo jar file from the dist folder, and run this command:

    java -jar frodo.jar <script-file> [ <input-file> [ <output-file> ] ]

The first argument, the script file, is required. If the output file is not specified, the output will be written to the console. If the input file is also not specified, then the input will be read from the console.
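
The way the optional arguments nest can be pictured with a small Java sketch. This is not Frodo's actual source: the class name `FrodoArgsSketch`, the `resolve` method, the `<console>` marker, and the sample file names are all made up for illustration. It only mirrors the rules stated above, and it assumes that more than three arguments would be rejected.

```java
public class FrodoArgsSketch {
    // Hypothetical marker meaning "no file given, use the console".
    static final String CONSOLE = "<console>";

    /**
     * Resolves arguments following the usage line:
     * the script file is required, the input file is optional, and an
     * output file can only be given when an input file is given too.
     * Returns { script, input, output }.
     */
    static String[] resolve(String... args) {
        // Assumption: anything other than 1 to 3 arguments is an error.
        if (args.length < 1 || args.length > 3) {
            throw new IllegalArgumentException(
                "usage: java -jar frodo.jar <script-file> [ <input-file> [ <output-file> ] ]");
        }
        String input = args.length >= 2 ? args[1] : CONSOLE;
        String output = args.length == 3 ? args[2] : CONSOLE;
        return new String[] { args[0], input, output };
    }

    public static void main(String[] ignored) {
        // Sample invocations with made-up file names.
        String[][] samples = {
            { "to-csv.frodo" },
            { "to-csv.frodo", "people.txt" },
            { "to-csv.frodo", "people.txt", "people.csv" },
        };
        for (String[] sample : samples) {
            String[] r = resolve(sample);
            System.out.println("script=" + r[0] + " input=" + r[1] + " output=" + r[2]);
        }
    }
}
```

Because the arguments are positional, there is no way to name an output file while still reading input from the console; the second argument is always taken as the input file.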