Reading time: 40 minutes | Coding time: 20 minutes

A Rope data structure is a tree data structure which is used to store or manipulate large strings in a more efficient manner. It allows for operations like insertion, deletion, search and random access to be executed faster and much more efficiently in comparison to a traditional String. We will look into this data structure and its various operations and implementation.

This data structure is widely used by softwares such as text editors like Sublime, email systems like Gmail and text buffers to handle large strings efficiently.

Introduction

A rope is a type of binary tree i.e. each node can have maximum of 2 children at the maximum, where every leaf node holds a String or a Substring and the length of the same(Also known as the "weight" for the corresponding leaf node) and every following parent node holds the total sum of the weights of the leaf nodes in its left subtree, which in turn is described as the weight of the said node. In a more simpler form, the weight of a node is the sum of all the leaf nodes that sprout from its immediate left child node.

In this article we will understand the Rope Data Structure along with its comparison to a standard string to determine when or when not to use it, along with its advantages & disadvantages. There is a wide variety of applications for the Rope, however the most common is found in text editors which handle large amounts of text.

Rope Data Structure

A rope is made up of nodes, arranged into a binary tree. As a tree, its starting node is marked as the Root and has a left & right child node which further have their child nodes and those with no children are known as leaf nodes. If we take the number of levels to be h, where h=0 for the root, the maximum possible nodes in a rope can be 2(h+1)-1.

Before diving into the implementation, let us compare this data structure with a string:

Comparison with String

Advantages

Ropes enable much faster insertion and deletion of text in comparison to string arrays, on which the operations have a time complexity of O(n).

Ropes do not require O(n) extra memory when being operated upon, unlike arrays which require it for copying operations.

Ropes do not need large contiguous memory areas like arrays.

If the operations performed are non-destructive(i.e the altered content is preserved) it behaves as a persistent data structure allows for multiple undo levels.

Stable performance regardless of size.

Disadvantages

Occupies greater overall storage space in comparison to a simple string, when not being operated upon, in order to store the parent nodes. However, this trade-off reduces as the string grows.

Increased complexity of structure creates a greater risk of bugs.

Significantly slower on small strings.

From this, we can derive that ropes are more preferable when the size of our data is large and it is to be modified on a regular basis. Strings perform better on smaller data & are better when there are less operations.

Operations

Search(Index I)

In order to find the character at ith position, we search recursively beginning at the root node.

We follow the premise that if the weight of the current node is lower than the value of i, we subtract the weight from i & move right. If the weight is less than value of i we simply move left. We continue till the point we reach a leaf node.

function index(RopeNode node, integer i) if node.weight < i and exists(node.right) then return index(node.right, i - node.weight) end if exists(node.left) then return index(node.left, i) end return node.[i]string end

Average case Time complexity: Θ(N)

Average case Time complexity: Θ(log n)

Concatenation(S1+S2)

A concatenation operation between 2 strings(S1 & S2) is performed by creation of a new root node which has a weight equal to the sum of weights of leaf nodes in S1. This takes time if the tree is already balanced. Since, most of the rope operations need a balanced tree, it might require re-balancing after the operation.

function concatenation(ropenode node1, ropenode node2) newrope.left = node1 newrope.right = node2 newrope.val = length(node1)

Time complexity: Θ(1) to concatenate strings and Θ(log N) to calculate weight of root.

Split(I)

In order to split a string at any given point i, there are 2 major cases we might encounter:

Split point being the last character of a leaf node. Split point being a middle character of a leaf node.

With the second case, we can reduce it to the much simpler first one by splitting the string at given point into 2 leaf nodes & creating a new parent for these component strings. Follow this with subtracting the weights of the cut off parts from the parent nodes and re-balancing the tree.

Time complexity: Θ(log N)

Insertion(I,S')

In order to insert a string S' between our string, at position i, we simply need to split it at the index from which to insert, and the concatenate the new string followed by the split off part of the original string. The cost of this operation is the sum of the three operations performed.

function insert(index i, string S') newrope = split(i) concatenation(S',newrope) concatenation(rope,S')

Time complexity: Θ(log N)

Deletion(I,J)

In order to delete a part of string from the middle, split the string at provided indices from ith to i+jth character and then concatenate the strings without the remaining part.

function delete(index i, index j) splitrope = split(j) remove = split(i) concatenation(rope,splitrope)

Time complexity: Θ(log N)

Report(I,J)

In order to report the string between given points, find the node that contains the starting character and whose weight is greater than the ending index j. Then traverse the tree starting at this node and output all the characters by performing an in-order traversal operation.

Time complexity: Θ(j + log N)

Implementation

Implementation of a Rope from a provided string:

class Rope(object): #Accepts a string or list of strings and forms it into a rope. #This is the basic implementation which forms a rope with the input data. #Create an object and pass the string or list as the argument to the constructor to form a rope. #Ex - s= Rope('abcxyz') or s = Rope(['abc','def','ghi']) #Simply use print on the instance of this class in order to print the entire string. Ex - print(s) def __init__(self, data=''): #check if the input data is string or not if isinstance(data, list): if len(data) == 0: self.__init__() elif len(data) == 1: self.__init__(data[0]) else: # Round-up division (to match rope arithmetic associativity) idiv = len(data) // 2 + (len(data) % 2 > 0) self.left = Rope(data[:idiv]) self.right = Rope(data[idiv:]) self.data = '' self.length = self.left.length + self.right.length elif isinstance(data, str): self.left = None self.right = None self.data = data self.length = len(data) else: raise TypeError('Kindly use strings for the purpose') # Word iteration self.current = self #checks if the tree is balanced def __eq__(self, other): if (self.left and self.right) and (other.left and other.right): return self.left == other.left and self.right == other.right elif (self.left and self.right) or (other.left and other.right): return False else: return self.data == other.data #concatenation of strings #forms a new rope with other string and joins the 2 ropes #into a new rope def __add__(self, other): #ensure that the string being concatenated is of type string if isinstance(other, str): other = Rope(other) r = Rope() r.left = self r.right = other r.length = self.length + other.length r.current = self return r #Fetch the length of string in the specified rope def __len__(self): if self.left and self.right: return len(self.left.data) + len(self.right.data) else: return(len(self.data)) #fetch the word present at the specified index def __getitem__(self, index): #ensure the index specified is an integer if isinstance(index, int): if self.left and self.right: if index < -self.right.length: subindex = index + self.right.length elif index >= self.left.length: subindex = index - self.left.length else: subindex = index if index < -self.right.length or 0 <= index < self.left.length: return self.left[subindex] else: return self.right[subindex] else: return Rope(self.data[index]) elif isinstance(index, slice): if self.left and self.right: start = index.start if index.start is None: if index.step is None or index.step > 0: head = self.left else: head = self.right elif (index.start < -self.right.length or 0 <= index.start < self.left.length): head = self.left if index.start and index.start < -self.right.length: start += self.right.length else: head = self.right if index.start and index.start >= self.left.length: start -= self.left.length stop = index.stop if index.step is None or index.step > 0: if (index.stop is None or -self.right.length <= index.stop < 0 or index.stop > self.left.length): tail = self.right if index.stop and index.stop > self.left.length: stop -= self.left.length else: if head == self.right: tail = self.right stop = 0 else: tail = self.left if index.stop < -self.right.length: stop += self.right.length else: if (index.stop is None or index.stop < (-self.right.length - 1) or 0 <= index.stop < self.left.length): tail = self.left if index.stop and index.stop < (-self.right.length - 1): stop += self.right.length else: if head == self.left: tail = self.left stop = -1 # Or self.left.length - 1 ? else: tail = self.right if index.stop >= self.left.length: stop -= self.left.length # Construct the rope as a binary tree if head == tail: return head[start:stop:index.step] else: if not index.step: offset = None elif index.step > 0: if start is None: delta = -head.length elif start >= 0: delta = start - head.length else: delta = max(index.start, -self.length) + tail.length offset = delta % index.step if offset == 0: offset = None else: if start is None: offset = index.step + (head.length - 1) % (-index.step) elif start >= 0: offset = index.step + min(start, head.length - 1) % (-index.step) else: offset = index.step + (start + head.length) % (-index.step) if not tail[offset:stop:index.step]: return head[start::index.step] else: return head[start::index.step] + tail[offset:stop:index.step] else: return Rope(self.data[index]) #Report the characters present inside the rope def __repr__(self): if self.left and self.right: return '{}{} + {}{}'.format('(' if self.left else '', self.left.__repr__(), self.right.__repr__(), ')' if self.right else '') else: return "Rope('{}')".format(self.data) #Check if the current node is a leaf node and print its value #otherwise iterate down a step on both sides def __str__(self): if self.left and self.right: return self.left.__str__() + self.right.__str__() else: return self.data #The iterator def __iter__(self): return self #Iteration down the tree def __next__(self): if self.current: if self.left and self.right: try: return next(self.left) except StopIteration: self.current = self.right return next(self.right) else: self.current = None return self.data else: raise StopIteration def next(self): return self.__next__() firstRope = Rope('hello_my_name_is') secondRope = Rope('_rope_data_structure') #concatenate these strings fullRope = firstRope.__add__(secondRope) print(fullRope) #Gives result hello_my_name_is_rope_data_structure #Show the character at specified index print(fullRope.__getitem__(14)) #Displays character "i" which is at index 14

Complexity

Worst case time complexity: Θ(n)

Average case time complexity: Θ(log N)

Best case time complexity: Θ(log N)

Space complexity: Θ(n)

Applications

The areas where the Rope data structure is used are: