How Tax Service, OpenStreetMap, and InterSystems IRIS could help developers get clean addresses

Class Sample.Address Extends %Persistent
{
Property streetName As %String;
Property cityName As %String;
Property areaName As %String;
Property postalCode As %String;
}

But I have to say a few words about why it shouldn't be done this way. What is our Address object? Why can't it just be a group of text strings? The most obvious objections come from the context: who is using this Address, in what form, and for what purpose? Try to put your programmer logic aside and imagine how a «foreign tourist,» «historian,» «tax collector,» or «lawyer» would think.



I'm guessing you immediately came up with a bunch of additional questions: what language and encoding to use, what time period to consider, and what kind of documents are involved in this operation: legal or postal? And a city: is that a named locality, or what? Even a street could be a boulevard, lane, avenue, or something else. How should all these important details be handled?



Let's look at a real-life example. Google is now run by Sundar Pichai. He is from India. He was born in the city of Chennai. Or is it Madras? In 1996, the residents decided that the name of the city sounded too Portuguese and renamed the capital of the state of Tamil Nadu from Madras to Chennai. So what should Sundar and his 72 million compatriots enter in their electronic documents?
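
One way to picture the Madras/Chennai problem is to store a naming history instead of a single string. The following is only a minimal sketch: the class names `NamePeriod` and `Place` are hypothetical, and the exact cut-over day is a placeholder, since the text above only says that the rename happened in 1996.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class NamePeriod:
    """One official name and the period during which it was valid."""
    name: str
    valid_from: Optional[date]  # None: valid since the start of our records
    valid_to: Optional[date]    # None: still valid today


@dataclass
class Place:
    """A locality that keeps its full naming history (hypothetical model)."""
    names: List[NamePeriod] = field(default_factory=list)

    def name_on(self, day: date) -> Optional[str]:
        """Return the name that was official on the given day, if known."""
        for period in self.names:
            starts_ok = period.valid_from is None or period.valid_from <= day
            ends_ok = period.valid_to is None or day < period.valid_to
            if starts_ok and ends_ok:
                return period.name
        return None


# Placeholder date: the article only states the year of the rename.
RENAME_DAY = date(1996, 1, 1)

chennai = Place(names=[
    NamePeriod("Madras", None, RENAME_DAY),
    NamePeriod("Chennai", RENAME_DAY, None),
])

print(chennai.name_on(date(1980, 1, 1)))
print(chennai.name_on(date(2019, 1, 6)))
```

With a structure like this, a document dated before the rename and one dated after it can both be validated against the same place record.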



In fact, there's a whole science that studies this: applied toponymy.

When two things come together in one place, texts and the InterSystems IRIS data platform, a developer has a real opportunity to turn things around without stepping away from the machine, for example by using the embedded object components iKnow and iFind. These components are meant for working with unstructured data and full-text search, respectively.

nodes – defining points in space,

ways – defining linear features and area boundaries, and

relations – which are sometimes used to explain how other elements work together.
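
To make the three element types more tangible, here is a small sketch that reads a tiny hand-written OSM-style XML fragment and counts nodes, ways, and relations. The sample data and the helper name `summarize` are made up for illustration; a real extract is far larger and carries more attributes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample in the shape of an OSM XML export (illustrative only).
SAMPLE_OSM = """<osm version="0.6" generator="example">
  <node id="1" lat="54.7100" lon="20.5100"/>
  <node id="2" lat="54.7110" lon="20.5120">
    <tag k="name" v="Example point"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role="outer"/>
    <tag k="type" v="multipolygon"/>
  </relation>
</osm>"""


def summarize(osm_xml: str) -> dict:
    """Count top-level elements and list the node references of each way."""
    root = ET.fromstring(osm_xml)
    counts = Counter(child.tag for child in root)
    way_nodes = {
        way.get("id"): [nd.get("ref") for nd in way.findall("nd")]
        for way in root.findall("way")
    }
    return {"counts": dict(counts), "way_nodes": way_nodes}


summary = summarize(SAMPLE_OSM)
print(summary["counts"])
print(summary["way_nodes"])
```

The way does not carry coordinates of its own: it only references nodes by id, which is why nodes have to be available before ways can be drawn.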

set xmlSchema = ##class(%XML.Utils.SchemaReader).%New()
do xmlSchema.Process("/path/to/OSMSchema.xsd")

python3 -m http.server 80

Class OSM.osm Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ]
{
Parameter XMLNAME = "osm";
Parameter XMLSEQUENCE = 1;
Property bounds As OSM.bounds(XMLNAME = "bounds", XMLREF = 1) [ Required ];
Relationship node As OSM.node(XMLNAME = "node", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = osm ];
Relationship way As OSM.way(XMLNAME = "way", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = osm1 ];
Relationship relation As OSM.relation(XMLNAME = "relation", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = osm2 ];
Property version As %xsd.float(XMLNAME = "version", XMLPROJECTION = "ATTRIBUTE") [ InitialExpression = ".6", ReadOnly ];
Property generator As %String(MAXLEN = "", XMLNAME = "generator", XMLPROJECTION = "ATTRIBUTE") [ InitialExpression = "CGImap 0.0.2", ReadOnly ];
}

Class OSM.node Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ]
{
Parameter XMLNAME = "node";
Parameter XMLSEQUENCE = 1;
Relationship tag As OSM.tag(XMLNAME = "tag", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = node ];
Property id As %xsd.unsignedLong(XMLNAME = "id", XMLPROJECTION = "ATTRIBUTE");
Property lat As %xsd.double(XMLNAME = "lat", XMLPROJECTION = "ATTRIBUTE");
Property lon As %xsd.double(XMLNAME = "lon", XMLPROJECTION = "ATTRIBUTE");
Property user As %String(MAXLEN = "", XMLNAME = "user", XMLPROJECTION = "ATTRIBUTE") [ SqlFieldName = _user ];
Property uid As %xsd.unsignedLong(XMLNAME = "uid", XMLPROJECTION = "ATTRIBUTE");
Property visible As %Boolean(XMLNAME = "visible", XMLPROJECTION = "ATTRIBUTE");
Property version As %xsd.unsignedLong(XMLNAME = "version", XMLPROJECTION = "ATTRIBUTE");
Property changeset As %xsd.unsignedLong(XMLNAME = "changeset", XMLPROJECTION = "ATTRIBUTE");
Property timestamp As %TimeStamp(XMLNAME = "timestamp", XMLPROJECTION = "ATTRIBUTE");
Relationship osm As OSM.osm(XMLPROJECTION = "NONE") [ Cardinality = one, Inverse = node ];
}

By the way, the maximum length of a string in InterSystems IRIS is 3,641,144 characters. In other words, loading a whole file or the contents of a URL directly into a single string won't work. You can see the other limits in the documentation. To work with large amounts of data, you can use data streams, which don't have length restrictions.
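
The stream idea is not specific to IRIS. As a rough analogy only (this is Python's standard library, not the IRIS stream classes), the sketch below walks an XML document incrementally from a file-like object instead of loading it into one big string. The helper name `count_elements` and the sample bytes are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(stream, tag: str) -> int:
    """Count elements with the given tag while reading the stream incrementally."""
    count = 0
    # iterparse yields each element as soon as its closing tag has been read.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the element's attributes and children once it has been handled,
        # so finished elements do not keep their full content in memory.
        elem.clear()
    return count


# A tiny in-memory stand-in for a large file or HTTP response body.
data = b"<osm><node id='1'/><node id='2'><tag k='a' v='b'/></node><way id='3'/></osm>"
node_count = count_elements(io.BytesIO(data), "node")
print(node_count)
```

In practice the `stream` argument would be an open file or a network response, so the whole document never has to exist as a single string.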

set reader = ##class(%XML.Reader).%New()

do reader.Correlate("node","OSM.node")

set url = "http://localhost/kaliningrad-latest.osm"
write reader.OpenURL(url)

Important! At this point, most people who try this example for themselves will run into something horrifying. Instead of a happy «1» (everything's fine), the system will return something starting with «0, STORE...», which is disappointing. In other words, the file that seemed to be mBD turned out to be not so micro and won't fit into our object: there wasn't enough memory allocated to it. Can this be fixed? Absolutely. The IRIS data platform allows you to create objects up to 4 TB in RAM. So what went wrong? By default, the system settings limit the size of an object to 256 MB, and we need much more than that. And remember, these are RAM requirements. Do you have enough room on your computer or server?

set $ZSTORAGE=170000000
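
Before copying that number, it helps to know what it means. According to the InterSystems documentation, $ZSTORAGE is expressed in kilobytes, so the value above asks for far more than the 256 MB default. The helper below is just a conversion sketch; the function name is hypothetical.

```python
def zstorage_kb_to_gib(kilobytes: int) -> float:
    """Convert a $ZSTORAGE value (kilobytes) to gibibytes."""
    return kilobytes * 1024 / 1024 ** 3


default_gib = zstorage_kb_to_gib(262144)       # 256 MB expressed in KB
requested_gib = zstorage_kb_to_gib(170000000)  # the value used in this article

print(f"default:   {default_gib:.2f} GiB")
print(f"requested: {requested_gib:.2f} GiB")
```

The requested value works out to roughly 162 GiB of process memory, so pick a figure that matches the RAM you actually have.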

set reader = ##class(%XML.Reader).%New()
set reader.UsePPGHandler = 1

do reader.Next(.object)
do object.%Save()

ClassMethod Import(url)
{
    Set reader = ##class(%XML.Reader).%New()
    Set reader.UsePPGHandler = 1
    Set status = reader.OpenURL(url)
    Do reader.Correlate("node","OSM.node")
    While (reader.Next(.object)) {
        Do object.%Save()
    }
    //back to top of XML file
    Do reader.Rewind()
    Do reader.Correlate("way","OSM.way")
    While (reader.Next(.object)) {
        Do object.%Save()
    }
    Do reader.Rewind()
    Do reader.Correlate("relation","OSM.relation")
    While (reader.Next(.object)) {
        Do object.%Save()
    }
}

do ##class(OSM.osm).Import("http://localhost/kaliningrad-latest.osm")

set xmlScheme = ##class(%XML.Utils.SchemaReader).%New()
do xmlScheme.Process("/path/to/AS_ADDROBJ_2_250_01_04_01_01.xsd")

/// Composition and structure of the file with classifier information for FIAS DB elements in address form
Class Test.AddressObjects Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ]
{
Parameter XMLNAME = "AddressObjects";
Parameter XMLSEQUENCE = 1;
/// Classifier for elements in address form
Relationship Object As Test.Object(XMLNAME = "Object", XMLPROJECTION = "ELEMENT") [ Cardinality = many, Inverse = AddressObjects ];
}

/// Created from: http://localhost:28869/AS_ADDROBJ_2_250_01_04_01_01.xsd
Class Test.Object Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ]
{
Parameter XMLNAME = "Object";
Parameter XMLSEQUENCE = 1;
/// Global unique identifier of the address object
Property AOGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOGUID", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Formal name
Property FORMALNAME As %String(MAXLEN = 120, MINLEN = 1, XMLNAME = "FORMALNAME", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Region code
Property REGIONCODE As %String(MAXLEN = 2, MINLEN = 2, XMLNAME = "REGIONCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Autonomy code
Property AUTOCODE As %String(MAXLEN = 1, MINLEN = 1, XMLNAME = "AUTOCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Area code
Property AREACODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "AREACODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// City code
Property CITYCODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "CITYCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Code of area within city
Property CTARCODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "CTARCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Locality code
Property PLACECODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "PLACECODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Planning structure element code
Property PLANCODE As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "PLANCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Street code
Property STREETCODE As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "STREETCODE", XMLPROJECTION = "ATTRIBUTE");
/// Code of additional element in address form
Property EXTRCODE As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "EXTRCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Code of subordinate additional element in address form
Property SEXTCODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "SEXTCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Official name
Property OFFNAME As %String(MAXLEN = 120, MINLEN = 1, XMLNAME = "OFFNAME", XMLPROJECTION = "ATTRIBUTE");
/// Postal code
Property POSTALCODE As %String(MAXLEN = 6, MINLEN = 6, XMLNAME = "POSTALCODE", XMLPROJECTION = "ATTRIBUTE");
/// Federal Tax Service - Private Individual code
Property IFNSFL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "IFNSFL", XMLPROJECTION = "ATTRIBUTE");
/// Federal Tax Service - Private Individual territorial district code
Property TERRIFNSFL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "TERRIFNSFL", XMLPROJECTION = "ATTRIBUTE");
/// Federal Tax Service - Legal Entity code
Property IFNSUL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "IFNSUL", XMLPROJECTION = "ATTRIBUTE");
/// Federal Tax Service - Legal Entity territorial district code
Property TERRIFNSUL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "TERRIFNSUL", XMLPROJECTION = "ATTRIBUTE");
/// Russian Classification on Objects of Administrative Division
Property OKATO As %String(MAXLEN = 11, MINLEN = 11, XMLNAME = "OKATO", XMLPROJECTION = "ATTRIBUTE");
/// Russian Classification of Territories of Municipal Formations
Property OKTMO As %String(MAXLEN = 11, MINLEN = 8, XMLNAME = "OKTMO", XMLPROJECTION = "ATTRIBUTE");
/// Date of record entry
Property UPDATEDATE As %Date(XMLNAME = "UPDATEDATE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Short name of object type
Property SHORTNAME As %String(MAXLEN = 10, MINLEN = 1, XMLNAME = "SHORTNAME", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Address object level
Property AOLEVEL As %Integer(XMLNAME = "AOLEVEL", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];
/// Object identifier of the parent object
Property PARENTGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "PARENTGUID", XMLPROJECTION = "ATTRIBUTE");
/// Unique record identifier. Key field.
Property AOID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOID", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Record identifier associated with previous historical record
Property PREVID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "PREVID", XMLPROJECTION = "ATTRIBUTE");
/// Record identifier associated with next historical record
Property NEXTID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "NEXTID", XMLPROJECTION = "ATTRIBUTE");
/// Address object code in one string with validity indicator from Russian Classifier of Addresses (KLADR) 4.0.
Property CODE As %String(MAXLEN = 17, MINLEN = 0, XMLNAME = "CODE", XMLPROJECTION = "ATTRIBUTE");
/// Address object code from KLADR 4.0 in one string without validity indicator (last two digits)
Property PLAINCODE As %String(MAXLEN = 15, MINLEN = 0, XMLNAME = "PLAINCODE", XMLPROJECTION = "ATTRIBUTE");
/// Validity status of FIAS address object. Current address as of today's date. Usually the last entry about the address object.
/// 0 - Not current
/// 1 - Current
Property ACTSTATUS As %Integer(XMLNAME = "ACTSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];
/// Center status
Property CENTSTATUS As %Integer(XMLNAME = "CENTSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];
/// Operation status on record - reason for record's appearance (see description of OperationStatus table):
/// 01 – Activation;
/// 10 – Addition;
/// 20 – Change;
/// 21 – Group change;
/// 30 – Deletion;
/// 31 - Deletion due to the deletion of the parent object;
/// 40 – Attachment of the address object (merger);
/// 41 – Reassignment due to the merger of the parent object;
/// 42 - Termination due to the attachment to another address object;
/// 43 - Creation of a new address object due to a merger of address objects;
/// 50 – Reassignment;
/// 51 – Reassignment due to the reassignment of the parent object;
/// 60 – Termination due to segmentation;
/// 61 – Creation of a new address object due to segmentation
Property OPERSTATUS As %Integer(XMLNAME = "OPERSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];
/// KLADR 4 validity status (last two digits in the code)
Property CURRSTATUS As %Integer(XMLNAME = "CURRSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];
/// Start of record operation
Property STARTDATE As %Date(XMLNAME = "STARTDATE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// End of record operation
Property ENDDATE As %Date(XMLNAME = "ENDDATE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Foreign key to requirements document
Property NORMDOC As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "NORMDOC", XMLPROJECTION = "ATTRIBUTE");
/// Current address object indicator
Property LIVESTATUS As %xsd.byte(VALUELIST = ",0,1", XMLNAME = "LIVESTATUS", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Address type:
/// 0 - not defined
/// 1 - municipal;
/// 2 - administrative/territorial
Property DIVTYPE As %xsd.int(VALUELIST = ",0,1,2", XMLNAME = "DIVTYPE", XMLPROJECTION = "ATTRIBUTE") [ Required ];
Relationship AddressObjects As Test.AddressObjects(XMLPROJECTION = "NONE") [ Cardinality = one, Inverse = Object ];
}

AS_ADDROBJ_20190106_90809714-fe22-45b2-929c-52bd950963e0.XML

Class FIAS.AddressObject Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ]
{
Parameter XMLNAME = "Object";
Parameter XMLSEQUENCE = 1;
/// Global unique identifier of the address object
Property AOGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOGUID", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Official name
Property OFFNAME As %String(MAXLEN = 120, MINLEN = 1, XMLNAME = "OFFNAME", XMLPROJECTION = "ATTRIBUTE");
/// Postal code
Property POSTALCODE As %String(MAXLEN = 6, MINLEN = 6, XMLNAME = "POSTALCODE", XMLPROJECTION = "ATTRIBUTE");
/// Short name of object type
Property SHORTNAME As %String(MAXLEN = 10, MINLEN = 1, XMLNAME = "SHORTNAME", XMLPROJECTION = "ATTRIBUTE") [ Required ];
/// Address object level
Property AOLEVEL As %Integer(XMLNAME = "AOLEVEL", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];
/// Object identifier of the parent object
Property PARENTGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "PARENTGUID", XMLPROJECTION = "ATTRIBUTE");
/// Unique record identifier. Key field.
Property AOID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOID", XMLPROJECTION = "ATTRIBUTE") [ Required ];
}

set reader = ##class(%XML.Reader).%New()

do reader.Correlate("Object","FIAS.AddressObject")
set url = "http://localhost/AS_ADDROBJ_20190106_90809714-fe22-45b2-929c-52bd950963e0.XML"
write reader.OpenURL(url)

do reader.Next(.object)
do object.%Save()

ClassMethod Import()
{
    // Create object to read XML
    Set reader = ##class(%XML.Reader).%New()
    // Get source XML for parsing; stop if it cannot be opened
    Set status = reader.OpenURL("http://localhost/AS_ADDROBJ_20190106_90809714-fe22-45b2-929c-52bd950963e0.XML")
    If $$$ISERR(status) {
        Do $System.Status.DisplayError(status)
        Quit
    }
    // Correlate the Object element with the class that describes its structure
    Do reader.Correlate("Object","FIAS.AddressObject")
    // Read and save the object in storage
    While (reader.Next(.object,.status)) {
        Set status = object.%Save()
        If $$$ISERR(status) {Do $System.Status.DisplayError(status)}
    }
    // If an error occurs during parsing, display a message
    If $$$ISERR(status) {Do $System.Status.DisplayError(status)}
}

do ##class(FIAS.AddressObject).Import()
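
Once the records are loaded, the PARENTGUID field is what turns a flat list into an address: each object points at its parent, so a full address can be assembled by walking up the chain. The sketch below shows the idea on three made-up records. The GUIDs are shortened placeholders (real ones are 36 characters long), the names are invented, and the helper `full_address` is hypothetical.

```python
from typing import Dict, List, Optional

# Invented sample records using the field names of FIAS.AddressObject.
records: List[Dict[str, Optional[str]]] = [
    {"AOGUID": "region-1", "PARENTGUID": None,
     "SHORTNAME": "obl", "OFFNAME": "Example Region"},
    {"AOGUID": "city-1", "PARENTGUID": "region-1",
     "SHORTNAME": "g", "OFFNAME": "Example City"},
    {"AOGUID": "street-1", "PARENTGUID": "city-1",
     "SHORTNAME": "ul", "OFFNAME": "Example Street"},
]

# Index the records by AOGUID so that each parent lookup is a dictionary hit.
by_guid = {record["AOGUID"]: record for record in records}


def full_address(aoguid: str, index: Dict[str, dict]) -> str:
    """Build an address string by following PARENTGUID up to the top level."""
    parts: List[str] = []
    seen = set()
    current: Optional[str] = aoguid
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        record = index[current]
        parts.append(f'{record["SHORTNAME"]} {record["OFFNAME"]}')
        current = record.get("PARENTGUID")
    # Parts were collected from the most specific level upwards; reverse them.
    return ", ".join(reversed(parts))


address = full_address("street-1", by_guid)
print(address)
```

The cycle check is a defensive choice for imported data: a single bad PARENTGUID would otherwise make the loop run forever.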

Good news: Gartner has just completed its annual collection of real user ratings and feedback in the category of DBMS and used this information to publish its rankings of the best DBMSs of 2019. InterSystems Caché and InterSystems IRIS Data Platform received the highest rating for «Customers' Choice.» You can check out which products were considered and how they were rated.