Here's our situation: there are over 100 plain text "dbs", so to speak, each one containing lines made of fixed-size ASCII fields, with a format that differs from db to db. Let's assume all lines within a single db share the same format.

```
# excerpt from the record S5 db
S5XXZSXX151217999999CBF X FLEX CONDITION
S5YYF021160629999999IBG CY PE081CPETC201PET IN CABIN DOG

# excerpt from the category 1 db
00100030530CNNX0211396626 NRNTR
00100030531CPNX 396627 NRNTR
00100030622UNN 11000000
```

So you can think of each db as having a schema in which every field sits at a fixed offset from the start of the line. We have to provide a subset of SQL-like operations over them, for example:

```sql
SELECT cxr, subcode, commercial_name, date_disc
FROM recordS5
WHERE cxr LIKE 'YY|XX'
  AND ((commercial_name LIKE 'PET' AND type = 'C')
       OR (type = 'F' AND commercial_name = 'MEAL'))
  AND date_disc < '180620'
  AND date_eff < date_disc

SELECT COUNT(*)
FROM category1
WHERE age_min <= 28 AND 28 <= age_max
  AND (tbl_no < '00050000' OR tbl_no > '01000000')
```
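With such a schema, a field reference like `cxr` or `date_disc` boils down to slicing the line at a known offset. The sketch below uses Python purely for brevity; the `RECORD_S5` offsets are guessed from the two sample lines above (not the real layout), and `get_field` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Hypothetical schema for the record S5 db: field name -> (offset, length).
# Offsets are guessed from the excerpt, NOT taken from the real layout.
RECORD_S5: Dict[str, Tuple[int, int]] = {
    "cxr": (2, 2),
    "date_eff": (8, 6),
    "date_disc": (14, 6),
}


def get_field(line: str, schema: Dict[str, Tuple[int, int]], name: str) -> str:
    """Return the raw text of a fixed-width field: a slice, no parsing or trimming."""
    offset, length = schema[name]
    return line[offset:offset + length]


line = "S5YYF021160629999999IBG CY PE081CPETC201PET IN CABIN DOG"
print(get_field(line, RECORD_S5, "cxr"))       # YY
print(get_field(line, RECORD_S5, "date_eff"))  # 160629
```

If the dates are YYMMDD, as `'180620'` in the first query suggests (an assumption), comparing the raw slices as strings already orders dates within one century correctly, so a predicate like `date_disc < '180620'` needs no date parsing.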

(cheers if you can guess the domain from these; and no, I've never been affiliated with the well-known Lisp company in that sector)

We'll have to iterate over all db entries, check the WHERE filter against each line and, if it passes, extract the information to be returned. These queries may run over billions of entries, so we'll be eyeing some "systems" language; band-aids like Python are out of the question.
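The overall shape of that full scan is simple enough to show in a few lines. This is a minimal sketch in Python only to keep it short (per the paragraph above, it is not what you'd run over billions of lines); `scan`, its parameters and the field offsets are hypothetical, and the filter is treated as an opaque per-line callable.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical types: a schema maps field name -> (offset, length).
Schema = Dict[str, Tuple[int, int]]
Predicate = Callable[[str], bool]


def scan(lines: Iterable[str], schema: Schema, where: Predicate,
         select: List[str]) -> Iterator[Tuple[str, ...]]:
    """Full scan: test every line against the filter, project the ones that pass."""
    slices = [schema[name] for name in select]  # resolve offsets once, not per line
    for line in lines:
        line = line.rstrip("\r\n")  # drop only the line terminator, keep field padding
        if where(line):
            yield tuple(line[off:off + length] for off, length in slices)


# Usage with made-up offsets and the two S5 lines from the excerpt.
schema = {"cxr": (2, 2), "date_eff": (8, 6), "date_disc": (14, 6)}
lines = ["S5XXZSXX151217999999CBF X FLEX CONDITION\n",
         "S5YYF021160629999999IBG CY PE081CPETC201PET IN CABIN DOG\n"]
rows = list(scan(lines, schema, lambda ln: ln[2:4] == "YY", ["cxr", "date_eff"]))
count = sum(1 for _ in scan(lines, schema, lambda ln: True, []))  # like SELECT COUNT(*)
```

The loop itself is trivial; essentially all per-line work happens inside `where`, which is why the filter is the part worth optimizing.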

Let's focus only on the filter condition. A WHERE clause is essentially a tree whose intermediate nodes are the boolean connectives and, or and not, and whose leaves are concrete operations on line fields: comparison or regular expression matching, against either a constant string or another field.
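To make that tree concrete, here is a minimal sketch (again Python, for brevity rather than speed) of the node types and a naive tree-walking interpreter that evaluates it against one line. The class names, the `schema` dict with made-up offsets, and treating `LIKE` as a regex search (which the `'YY|XX'` pattern hints at) are all assumptions, not the post's actual design.

```python
import operator
import re
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

Schema = Dict[str, Tuple[int, int]]  # field name -> (offset, length)

# Operands of a leaf: a field of the current line or a constant string.
@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Const:
    value: str

Operand = Union[Field, Const]

# Leaves: concrete operations on line fields.
@dataclass(frozen=True)
class Cmp:
    op: Callable[[str, str], bool]  # e.g. operator.lt, operator.eq
    left: Operand
    right: Operand

@dataclass(frozen=True)
class Match:
    field: Field
    pattern: "re.Pattern[str]"  # compiled once when the tree is built

# Intermediate nodes: boolean connectives.
@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class Not:
    child: "Node"

Node = Union[Cmp, Match, And, Or, Not]

def operand_value(operand: Operand, line: str, schema: Schema) -> str:
    if isinstance(operand, Const):
        return operand.value
    offset, length = schema[operand.name]
    return line[offset:offset + length]

def evaluate(node: Node, line: str, schema: Schema) -> bool:
    """Naive tree-walking interpreter; and/or short-circuit."""
    if isinstance(node, And):
        return evaluate(node.left, line, schema) and evaluate(node.right, line, schema)
    if isinstance(node, Or):
        return evaluate(node.left, line, schema) or evaluate(node.right, line, schema)
    if isinstance(node, Not):
        return not evaluate(node.child, line, schema)
    if isinstance(node, Cmp):
        return node.op(operand_value(node.left, line, schema),
                       operand_value(node.right, line, schema))
    if isinstance(node, Match):
        return node.pattern.search(operand_value(node.field, line, schema)) is not None
    raise TypeError(f"unknown node: {node!r}")

# cxr LIKE 'YY|XX' AND date_eff < date_disc, with made-up offsets.
schema = {"cxr": (2, 2), "date_eff": (8, 6), "date_disc": (14, 6)}
where = And(Match(Field("cxr"), re.compile("YY|XX")),
            Cmp(operator.lt, Field("date_eff"), Field("date_disc")))
line = "S5YYF021160629999999IBG CY PE081CPETC201PET IN CABIN DOG"
print(evaluate(where, line, schema))  # True
```

An interpreter like this re-dispatches on node type for every node of every line, which is exactly the kind of per-line overhead that adds up over billions of entries.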