1. Introduction

The Internet of Things (IoT) is a rapidly evolving paradigm that spans vital domains such as manufacturing, smart cities, precision agriculture, and smart hospitals, all of which depend on the mission-critical task of data aggregation. Data integrity refers to the reliability and consistency of data over its entire lifecycle, from sensor detection to cloud storage.

In this paper, a system that ensures the integrity of aggregated IoT field data is proposed, and its alpha prototype implementation is demonstrated in the precision agriculture domain. The system’s core is a distributed ledger technology (DLT) implementation that securely processes the aggregated field data; its novelty lies in the embedded use of IOTA’s ledger, called “The Tangle”, to transmit and store the data. The combination of an immutable ledger of information (in our case, sensor data) and cryptographic primitives, such as public key cryptography, transforms the IoT nodes from data collectors into data owners. The data is stored on the ledger in an anonymous and secure fashion, while the IoT node is the sole actor that can grant access to the data streams. In this envisioned Edge-centric architecture, each IoT node is an autonomous unit in a “swarm” of nodes that belong to the same stakeholder and perform the same “activities”. This is in tune with the latest industry advancements in IoT, where progressively more activities migrate from the cloud to the Edge layer. The system’s Super Node (SN) aggregates the data from the sensors, packages them into transactions and pushes them to the IOTA network. Access control to the data stored in the Tangle is provided by Masked Authenticated Messaging (MAM), which uses appropriate keys to manage encrypted data streams over the Tangle. On the one hand, MAM gives users fine-grained access control over sensor-owned data in the Tangle; on the other hand, a sensor data marketplace can be built on top of it, monetizing sensor data while also acting as an incentive to install further IoT nodes. The system design was first conceptualized, in terms of architecture and envisioned functionality, in previous work [1], while this publication highlights an actual implementation.
The implementation is an alpha version of the system that showcases the most critical aspects of the architecture depicted in Figure 1. It is deployed in a real use-case, with promising results for the introduction of DLT in IoT infrastructure. The use-case comprises sensor devices, called IoT nodes, deployed on a working precision agriculture farm. The collected data are sent to a gateway node functioning as a Super Node, which is responsible for aggregating all sensor data into a data log and broadcasting it over MAM.
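The aggregation role of the Super Node can be illustrated with a minimal Python sketch. It is not the prototype's code: the `SuperNode` class, its method names, the JSON record layout and the sample sensor name are all hypothetical. The only figure taken from the paper is the 1300-byte data capacity of a single IOTA transaction stated in Section 1.1, which is used here as the chunk size.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Data capacity of one IOTA transaction as stated in Section 1.1.
MAX_PAYLOAD_BYTES = 1300


@dataclass
class SuperNode:
    """Hypothetical gateway that buffers sensor readings into a data log."""
    readings: List[dict] = field(default_factory=list)

    def collect(self, node_id: str, sensor: str, value: float, timestamp: int) -> None:
        """Store one reading received from an IoT node."""
        self.readings.append(
            {"node": node_id, "sensor": sensor, "value": value, "ts": timestamp}
        )

    def flush(self) -> List[bytes]:
        """Serialize the buffered log and split it into transaction-sized payloads."""
        log = json.dumps(self.readings, separators=(",", ":")).encode("utf-8")
        self.readings = []
        return [
            log[i:i + MAX_PAYLOAD_BYTES]
            for i in range(0, len(log), MAX_PAYLOAD_BYTES)
        ]


sn = SuperNode()
for t in range(100):
    sn.collect("node-1", "soil_moisture", 0.31, 1_700_000_000 + t)
payloads = sn.flush()  # each element fits the data field of one transaction
```

In this sketch the payloads would then be handed to the MAM layer for signing, encryption and broadcasting, which is outside the scope of the example.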

In a nutshell, the proposed system highlights how a DLT implementation can enable new functionality in existing IoT systems while solving critical privacy and security issues. Moreover, a prototype implementation is presented that demonstrates its effectiveness and, at the same time, enriches the open source community built around the IOTA protocol initiative [2].

1.1. IOTA Constituents

The IOTA protocol and cryptocurrency were introduced in early 2015, illustrating the use of a Directed Acyclic Graph (DAG) in lieu of a blockchain [3,4]. As Figure 2 depicts, IOTA’s ledger, “The Tangle”, stores all the transactions that are issued in the network by actors called IOTA Full Nodes (FN). In a blockchain, each block aggregates a large number of transactions and is connected to the previous block, usually that of the longest chain, depending on the consensus rule. In IOTA, each transaction references two other transactions in the DAG, so in order to issue a transaction, an agent needs to verify two previous transactions. Although each transaction directly references only two transactions, it also indirectly references all the transactions that those two reference, either directly or indirectly. The current implementation offers the FN a reference algorithm [5] for choosing the two transactions, but the protocol does not enforce it. The protocol will give the FN incentives to choose transactions that are not yet referenced, called “Tips”, so that the graph grows organically. On top of that, IOTA is built using ternary logic, thus using trytes instead of bytes for its various operations (e.g., hashing) [6]. Each user has a unique string of 81 trytes called a seed, from which the protocol generates all the user’s private and public keys. A transaction, as depicted in Figure 3, is the smallest unit of data in the IOTA protocol; it consists of 2673 trytes (1589 bytes) and can be used to transfer both value (IOTA tokens) and data (1300 bytes). It is worth noting that as the IOTA network is enriched with new transactions, the size of the Tangle grows substantially. For this reason, the network currently performs network synchronization, or local snapshots, at fixed time intervals, eliminating all zero-value transactions.
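The distinction between direct and indirect references, and the notion of a Tip, can be made concrete with a small Python sketch. It is a toy model under stated assumptions: the Tangle is a plain dictionary, transaction names are invented, and the earliest transactions reference a hypothetical genesis transaction twice. It does not reproduce the data structures of an actual Full Node.

```python
from typing import Dict, Set, Tuple

# Each transaction maps to the transactions it references directly.
# The (hypothetical) genesis transaction references nothing.
Tangle = Dict[str, Tuple[str, ...]]


def referenced(tangle: Tangle, tx: str) -> Set[str]:
    """Return every transaction that tx references, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(tangle[tx])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(tangle[current])
    return seen


def tips(tangle: Tangle) -> Set[str]:
    """Return the transactions that no other transaction references yet."""
    approved = {ref for refs in tangle.values() for ref in refs}
    return set(tangle) - approved


tangle: Tangle = {
    "genesis": (),
    "a": ("genesis", "genesis"),
    "b": ("genesis", "a"),
    "c": ("a", "b"),
    "d": ("a", "b"),
}

print(sorted(referenced(tangle, "c")))  # direct: a, b; indirect: genesis
print(sorted(tips(tangle)))             # c and d are not referenced by anyone
```

A new transaction that selects `c` and `d` would remove both from the set of Tips and become the only Tip itself, which is how the graph grows in this model.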

1.2. IOTA Functionality

As shown in Figure 4, IOTA uses a seed to generate multiple private and public keys, as the signing scheme of IOTA is similar to the Winternitz one-time signature scheme [7]. Thus, each private key should be used only once to sign a transaction, as key re-use leads to forgeability. To issue a transaction, the FN needs to perform four distinct actions: (i) sign the transaction using a unique private key and store the signature in the transaction; (ii) use the reference algorithm to choose two transactions; (iii) perform Proof of Work (PoW); and finally (iv) broadcast the transaction to the network. It is worth noting that in IOTA, PoW is not used to achieve consensus but rather as a spam counter-measure. When an FN receives a transaction from the network, it verifies the validity of the above steps and, moreover, the validity of all the transactions that are directly or indirectly referenced. This verification covers both the validity of the transaction structure and the absence of conflicts. The process is conducted each time by a different FN with a different view of the Tangle, since at any particular time no FN knows all the transactions currently being issued in the network, due to lag. Consensus is achieved by collaboratively assessing the state of the Tangle, as each FN verifies a specific subset of the whole Tangle. As these subsets overlap, different FNs agree on their view of the Tangle and the whole system progressively reaches equilibrium. Thus, a transaction can have different levels of acceptance by the network, depending on how many different nodes have accepted it. The system will let users decide what acceptance level is adequate to consider a transaction legitimate and thus proceed with the exchange of goods or services. Currently, the aforementioned mechanism remains a concept in the IOTA white paper, and network consensus is achieved using a centralized architecture. The IOTA Foundation runs a special private node called “The Coordinator”, which regularly issues special transactions called “milestones”. Every transaction that is directly or indirectly referenced by a milestone is considered accepted. This particular irregularity is also discussed in Section 6.
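The four actions required to issue a transaction can be sketched in Python as follows. This is a toy illustration, not IOTA's actual algorithms: SHA-256 stands in for IOTA's ternary hashing, `toy_sign` is only a placeholder for a Winternitz-type signature, uniform random choice stands in for the reference tip-selection algorithm, and the `Issuer` class, the key derivation and the in-memory `network` list are hypothetical.

```python
import hashlib
import random
from typing import List, Set


def derive_private_key(seed: str, index: int) -> bytes:
    """Stand-in key derivation: one private key per (seed, index) pair."""
    return hashlib.sha256(f"{seed}:{index}".encode()).digest()


def toy_sign(private_key: bytes, message: bytes) -> str:
    """Placeholder for a one-time signature; NOT a real signature scheme."""
    return hashlib.sha256(private_key + message).hexdigest()


def pow_input(message: bytes, signature: str, trunk: str, branch: str) -> bytes:
    """Bytes over which the toy proof of work is computed."""
    return message + signature.encode() + trunk.encode() + branch.encode()


def pow_is_valid(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(data || nonce) has difficulty_bits leading zero bits."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def proof_of_work(data: bytes, difficulty_bits: int) -> int:
    """Search nonces sequentially until one satisfies the difficulty."""
    nonce = 0
    while not pow_is_valid(data, nonce, difficulty_bits):
        nonce += 1
    return nonce


class Issuer:
    """Hypothetical issuer that refuses to sign twice with the same key."""

    def __init__(self, seed: str) -> None:
        self.seed = seed
        self.used_indexes: Set[int] = set()

    def issue(self, message: bytes, key_index: int, tips: List[str],
              network: List[dict], difficulty_bits: int = 8) -> dict:
        if key_index in self.used_indexes:
            raise ValueError("private key already used; re-use enables forgery")
        self.used_indexes.add(key_index)
        # (i) sign with a fresh private key
        signature = toy_sign(derive_private_key(self.seed, key_index), message)
        # (ii) choose two transactions (uniform choice as a stand-in)
        trunk, branch = random.sample(tips, 2) if len(tips) >= 2 else (tips[0], tips[0])
        # (iii) proof of work as a spam counter-measure
        nonce = proof_of_work(pow_input(message, signature, trunk, branch), difficulty_bits)
        # (iv) broadcast to the (in-memory) network
        tx = {"message": message, "signature": signature,
              "trunk": trunk, "branch": branch, "nonce": nonce}
        network.append(tx)
        return tx
```

A receiving node in this model would recompute `pow_input` from the transaction fields and call `pow_is_valid` before accepting it, mirroring the verification step described above.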

1.3. MAM Specification

IOTA is also the medium for transferring data in a fashion that ensures their integrity. In the MAM protocol, mechanisms called channel broadcasters create channels, to which other mechanisms, called channel subscribers, subscribe. MAM’s latest specification is released by the ITsec lab of the Belarusian State University [8]. As illustrated in Figure 5, each channel is divided into endpoints, from which messages are split into multiple packets and broadcast. Each channel is identified by a public key, called chid (channel-id), which corresponds to a private key used to sign endpoints and/or messages. Each endpoint is identified by a public key, called epid (endpoint-id), which corresponds to a private key used to sign messages. The central endpoint is the endpoint whose epid = chid.

1.4. MAM Protocol

MAM’s protocol consists of many different layers, each characterised by a specific state. In order to illustrate MAM’s usage in the proposed design and implementation, it is pertinent to reference the WOTS, MSS, NTRU, Protobuf3, and MAM2 layers. The Winternitz One-Time Signatures (WOTS) layer generates private/public keys and signatures, verifies a signature, and recovers a public key from a signature. As the layer’s name suggests, the signing scheme uses Winternitz signatures. The Merkle-tree Signature Scheme (MSS layer) [9] is responsible for generating 2^d signatures of different messages, where d ≤ 20. A Merkle tree is a binary tree whose leaves are the public keys of corresponding WOTS instances, thus 2^d instances, and each tree layer is hashed up to the root, which stands as the public key of the whole tree. NTRU supports the use of NTRU-style public key encryption [10]. Protobuf3 supports the higher-level encoding, decoding and cryptographic processing of the data. In essence, Protobuf3 is a language based on the Protocol Buffers Version 2 notation [11]. The MAM2 layer is responsible for the high-level operations of the protocol, like sending and receiving messages.

While the full specification and a detailed description of each algorithm and data structure can be found in the MAM specification, it is crucial to outline certain high-level algorithms. The creation of a data stream by an agent starts with the creation of a proper channel. When creating a channel, the protocol creates a Merkle tree, where the Merkle Tree Root MTR = chid. The inputs of the CreateChannel algorithm are a height d (as described above) and a channel name. The output is the chid. Having generated a channel, the agent can generate an endpoint by running the CreateEndpoint routine, whose inputs are a height d, a channel name and an endpoint name. The output is the epid. Finally, in order to broadcast a message, the agent needs to produce a Header data structure comprising a message_id, a type_id, a session key and an optional KEY. The session key can be encrypted using the KEY (either a pre-shared key or an NTRU public key). This is crucial, as it provides access control for each message: only the key owner (holding either the pre-shared key or the NTRU private key) will be able to decrypt the session key and access the data packets. Finally, a message has M ≥ 1 data packets, depending on the data size.
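The relation MTR = chid can be illustrated with a short Python sketch of a Merkle root computed over 2^d leaf public keys. It is a simplified model, not the MAM reference implementation: SHA-256 stands in for MAM's ternary hashing, the leaf "public keys" are hash outputs rather than real WOTS keys, and the `seed` parameter of `create_channel` is a hypothetical addition (the text names only the height d and the channel name as inputs).

```python
import hashlib
from typing import List


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash the leaves pairwise, layer by layer, until a single root remains."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("number of leaves must be a power of two")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def create_channel(seed: bytes, name: str, d: int) -> bytes:
    """Toy CreateChannel: return a chid as the root over 2**d stand-in keys."""
    if not 0 <= d <= 20:
        raise ValueError("height d must satisfy 0 <= d <= 20")
    public_keys = [
        hashlib.sha256(seed + name.encode() + i.to_bytes(4, "big")).digest()
        for i in range(2 ** d)
    ]
    return merkle_root(public_keys)


chid = create_channel(b"example-seed", "field-A", 3)  # tree with 8 one-time keys
```

With d = 3 the channel can sign eight messages before its one-time keys are exhausted, which is why the height is an explicit input of the algorithm.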

1.5. MAM in IOTA

IOTA serves as the transport layer of the MAM data layer protocol; thus, MAM messages are transported using the Tangle. MAM messages are split into fragments and each fragment is encapsulated into an IOTA transaction. Transactions are divided into two categories: Header transactions, which encapsulate the Channel, the Endpoint and the Header data structures, and Packet transactions, which encapsulate data packets. Transaction objects carry numerous metadata fields, but two of them are relevant to the MAM protocol: (i) the address field (81 trytes) = chid and (ii) the metadata field (27 trytes) = msgid + (0 OR packet order). Both the address field and the metadata field are tryte strings. The metadata field is a concatenation of the message identifier and either a “0”, if it is a header transaction, or the packet’s number, if it is a packet transaction. The latter is very important because, using the IRI’s API, an agent can easily find all the header transactions registered to a specific address, read the session key and then, again using the metadata, find and decrypt all the transactions with the specific msgid. The aforementioned data structures are presented below, in pseudocode. An IOTA transaction will have either the mam_header structure or the mam_packet structure encapsulated in its data field.
    struct mam_endpoint {
        trits name;
        mam_mss mss;
    };

    struct mam_channel {
        trits name;
        trits msg_ord;
        mam_endpoint_set endpoints;
        trint endpoint_ord;
    };

where mam_mss is a data structure that refers to a Merkle tree which holds the signatures as described, msg_ord is a counter that is incremented each time a message is added to the channel, mam_endpoint_set is a set of all the active endpoints in the channel and endpoint_ord is a trint that is incremented with every endpoint added to the channel.

    struct mam_header {
        mam_channel channel;
        mam_endpoint endpoint;
        mam_channel new_channel;
        mam_endpoint new_endpoint;
        mam_psk psk_keys;
        mam_ntru ntru_keys;
        mam_msg_id msg_id;
        mam_msg_id_type type;
    };

where mam_channel is a reference to the mam_channel data structure (it references either the current channel or a new one, i.e., a fork), mam_endpoint is a reference to the mam_endpoint data structure (it references either the current endpoint or a new one), mam_psk is a set of the relevant Pre-Shared Keys (PSKs) that are required to unlock the session_key, mam_ntru is a set of the relevant NTRU public keys whose corresponding private keys unlock the session_key, mam_msg_id is the id of the message and mam_msg_id_type is the type of the message, which can be either a MAM message or a Public Key Certificate.

    struct mam_packet {
        context ctx;
        mam_checksum checksum;
        mam_payload payload;
    };

where context references the current state of the MAM library (and its various layers), mam_checksum is the checksum of the packet, so that the reader can verify its integrity, and mam_payload is the payload that encapsulates a segment of the streamed data. The above data structures are used in the MAM2 reference implementation in C and are based on the MAM specification. The structures differ from those described in the specification, as these are the actual data structures that are encapsulated into the IOTA transactions, while the others are Protobuf3 models.
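The way the address and metadata fields let a reader locate and reassemble a message can be sketched in Python. This is a simplified model under several assumptions: the ledger is an in-memory list rather than the Tangle queried through the IRI API, the split of the 27-tryte metadata field into a 21-tryte msgid and a 6-tryte order is hypothetical, the order is encoded as unsigned base-27 rather than balanced ternary, SHA-256 stands in for the packet checksum, and encryption with the session key is omitted.

```python
import hashlib
from typing import List

TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TAG_TRYTES = 27                            # metadata field length from the text
MSGID_TRYTES = 21                          # assumption: msgid share of the field
ORDER_TRYTES = TAG_TRYTES - MSGID_TRYTES   # assumption: remaining trytes hold the order


def encode_order(order: int) -> str:
    """Unsigned base-27 encoding, least significant tryte first (simplified)."""
    if not 0 <= order < 27 ** ORDER_TRYTES:
        raise ValueError("order out of range")
    trytes = []
    for _ in range(ORDER_TRYTES):
        order, digit = divmod(order, 27)
        trytes.append(TRYTE_ALPHABET[digit])
    return "".join(trytes)


def decode_order(trytes: str) -> int:
    """Inverse of encode_order."""
    return sum(TRYTE_ALPHABET.index(t) * 27 ** i for i, t in enumerate(trytes))


def build_message(chid: str, msgid: str, data: bytes, chunk_size: int = 1300) -> List[dict]:
    """Return one header transaction followed by M >= 1 packet transactions."""
    if len(chid) != 81 or len(msgid) != MSGID_TRYTES:
        raise ValueError("unexpected chid or msgid length")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    header = {"address": chid, "tag": msgid + encode_order(0), "data": {"kind": "header"}}
    packets = [
        {"address": chid, "tag": msgid + encode_order(n),
         "data": {"kind": "packet",
                  "checksum": hashlib.sha256(chunk).hexdigest(),
                  "payload": chunk}}
        for n, chunk in enumerate(chunks, start=1)
    ]
    return [header] + packets


def read_message(ledger: List[dict], chid: str, msgid: str) -> bytes:
    """Find the transactions of one message, verify checksums and reassemble."""
    matches = [tx for tx in ledger
               if tx["address"] == chid and tx["tag"][:MSGID_TRYTES] == msgid]
    matches.sort(key=lambda tx: decode_order(tx["tag"][MSGID_TRYTES:]))
    if not matches or decode_order(matches[0]["tag"][MSGID_TRYTES:]) != 0:
        raise ValueError("header transaction not found")
    parts = []
    for tx in matches[1:]:
        body = tx["data"]
        if hashlib.sha256(body["payload"]).hexdigest() != body["checksum"]:
            raise ValueError("packet checksum mismatch")
        parts.append(body["payload"])
    return b"".join(parts)
```

Because every transaction carries the same address and the same msgid prefix in its metadata field, the reader in this model can recover the packets in order even when the ledger returns them in arbitrary order.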