It’s been a long time since my last post: more than 4 months of a very busy and exciting time for me. The main technical topic I have been working on is the deployment of iRODS [1] to manage tens of PB of data, analyzed by hundreds of engineers using classical HPC resources, cloud providers and traditional desktops in the company network. I hope I’ll have time to share the universal parts of it in a series of posts, starting with today’s one about…

Ansible-based automated deployment of an iRODS grid.

Ansible has been my orchestration tool of choice for more than 5 years now, so it was natural to develop the irods-srv role [2]. It automates the installation of both the iRODS catalogue provider and the consumer (aka resource). The recommended iRODS installation process uses a Python script, setup_irods.py, which interactively asks the administrator performing the installation for additional information such as the iRODS zone name, port and various passwords. This can be automated either by sending the appropriate answers to standard input or by passing the --json_configuration_file option to /var/lib/irods/scripts/setup_irods.py when executing it. My goal was not only to make it work but also to understand the details, so I decided to build my own “unattended_installation.json”.
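For illustration, here is a minimal Python sketch of the unattended path: serialize a configuration dictionary to JSON and build the setup_irods.py command line with --json_configuration_file. The top-level section names in EXAMPLE_CONFIG are my assumption of what the 4.2 setup expects, and the values are placeholders (server_config in particular is heavily truncated); check both against the configuration schemas for your iRODS version before using anything like this.

```python
import json

# Script path as used in this post.
SETUP_SCRIPT = "/var/lib/irods/scripts/setup_irods.py"

# Placeholder document: section names are assumed, values are illustrative,
# and server_config/hosts_config are truncated to a few keys.
EXAMPLE_CONFIG = {
    "admin_password": "CHANGE_ME",
    "default_resource_directory": "/var/lib/irods/Vault",
    "default_resource_name": "demoResc",
    "host_system_information": {
        "service_account_user_name": "irods",
        "service_account_group_name": "irods",
    },
    "hosts_config": {"host_entries": []},
    "server_config": {"zone_name": "HPCC", "zone_port": 1247},
    "service_account_environment": {"irods_zone_name": "HPCC"},
}


def render_unattended_config(config):
    """Serialize the unattended-installation dictionary as pretty JSON."""
    return json.dumps(config, indent=4, sort_keys=True) + "\n"


def build_setup_command(json_path, python="python"):
    """Return the argv list for a non-interactive setup_irods.py run."""
    return [python, SETUP_SCRIPT, "--json_configuration_file", json_path]
```

The rendered text would be written to a root-only file (it contains passwords) and the argv list passed to subprocess.run(..., check=True) on a freshly installed host. The interpreter is a parameter because which Python runs setup_irods.py depends on the iRODS release and platform.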

If you have an existing iRODS grid that you’d like to use as the basis for an automated installation with a .json file, you can run the izonereport command to get its full configuration.
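The izonereport output is JSON, so the relevant sections can be pulled out with a few lines of Python. This sketch assumes the 4.2 zone report layout, where the catalogue server’s configuration sits under zones[i]["icat_server"] (resource servers are under zones[i]["servers"]); please verify that against your own report before relying on it.

```python
import json
import subprocess


def load_zone_report(text=None):
    """Parse izonereport output; run the command when no text is given."""
    if text is None:
        text = subprocess.run(["izonereport"], check=True,
                              capture_output=True, text=True).stdout
    return json.loads(text)


def catalog_sections(report, zone_index=0,
                     keys=("server_config", "hosts_config")):
    """Pick configuration sections of the catalogue server from a report.

    Assumes the layout report["zones"][i]["icat_server"]; keys that are
    absent from that entry are silently skipped.
    """
    icat = report["zones"][zone_index]["icat_server"]
    return {key: icat[key] for key in keys if key in icat}
```

The returned sections can then be reviewed and trimmed by hand before they become part of an unattended installation file.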

If izonereport fails with the error message ERROR - failed in call to rcZoneReport - -154000, it may mean that some attributes are missing from your server_config.json; in my case it was schema_version [3].

To make sure that you have only the necessary fields, you can check the iRODS configuration schemas [4]. Doing so, I found a few additional fields in the server_config section that are not required by schema validation but are critical for replication. The whole section is used to create a separate iRODS configuration file, /etc/irods/server_config.json.

Errors you may see when executing irepl

Action: Replication between resources on two servers. Depending on which advanced_settings keys are missing, you may see different errors.

Result: Replication fails with the error:
remote addresses: 127.0.1.1 ERROR: replUtil: repl error for /HPCC/home/rods/testFile, status = -1800000 status = -1800000 KEY_NOT_FOUND

rodsLog contains the message:
May 3 15:00:16 pid:10026 remote addresses: 127.0.0.1, 192.168.112.31 ERROR: iRODS Exception: file: /tmp/tmppTB_kL/lib/core/include/irods_configuration_parser.hpp function: T &irods::configuration_parser::get(const key_path_t &) [T = const int] line: 105 code: -1800000 (KEY_NOT_FOUND) message: key "transfer_buffer_size_for_parallel_transfer_in_megabytes" not found in map.

Resolution: Add transfer_buffer_size_for_parallel_transfer_in_megabytes to the advanced_settings dictionary in server_config.json.

Result: irepl -v shows “0” threads for the replication. In fact, a single-threaded transfer goes through the icat server.

rodsLog contains the message:
May 3 16:42:08 pid:1186 remote addresses: 127.0.0.1 ERROR: getNumThreads: acGetNumThreads error, status = -1800000

Resolution: Add default_number_of_transfer_threads to the advanced_settings dictionary in server_config.json.

Result: irepl fails with the error message:
remote addresses: 127.0.1.1 ERROR: replUtil: repl error for /HPCC/home/rods/testFile, status = -1800000 status = -1800000 KEY_NOT_FOUND

rodsLog contains the message:
May 3 16:47:16 pid:1230 remote addresses: 127.0.0.1 ERROR: iRODS Exception: file: /tmp/tmppTB_kL/lib/core/include/irods_configuration_parser.hpp function: T &irods::configuration_parser::get(const key_path_t &) [T = const int] line: 105 code: -1800000 (KEY_NOT_FOUND) message: key "maximum_size_for_single_buffer_in_megabytes" not found in map. stack trace: -------------- [stack trace here]
May 3 16:47:16 pid:1230 NOTICE: rsDataObjRepl - Failed to replicate data object.

Resolution: Add maximum_size_for_single_buffer_in_megabytes to the advanced_settings dictionary in server_config.json.

I gathered the issues I discovered in a pull request to the iRODS configuration schema repository [5] to get feedback from the team. I think all of them should be marked as required. One of them (default_number_of_transfer_threads) affects only multithreaded transfers, which fall back to a single thread, but it still ends up with an error message in rodsLog, and omitting the key from the configuration file can’t be the recommended way to achieve that behavior.
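A small pre-flight check along these lines could catch the problems above before irepl does. The key list is simply the set found in this post (schema_version plus the three advanced_settings entries), not an official list, so it would need extending as the schema evolves.

```python
# Keys found necessary in practice; not all are enforced by schema validation.
REQUIRED_TOP_LEVEL = ("schema_version",)
REQUIRED_ADVANCED = (
    "transfer_buffer_size_for_parallel_transfer_in_megabytes",
    "default_number_of_transfer_threads",
    "maximum_size_for_single_buffer_in_megabytes",
)


def missing_server_config_keys(server_config):
    """Return dotted names of required keys absent from server_config."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in server_config]
    # A missing advanced_settings dictionary counts as all keys missing.
    advanced = server_config.get("advanced_settings", {})
    missing += ["advanced_settings." + key
                for key in REQUIRED_ADVANCED if key not in advanced]
    return missing
```

For example, a CI step could call it on json.load() of /etc/irods/server_config.json and fail whenever the returned list is non-empty.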

Another section of special interest in the .json file passed to setup_irods.py is hosts_config, which becomes the content of /etc/irods/hosts_config.json after the installation. This file is a kind of iRODS’ own /etc/hosts file. If you can fully rely on DNS, you don’t have to use it at all. However, in some complicated scenarios, like DNS round-robin servers providing access to a shared file system, it may help to make sure that each individual server won’t redirect traffic to the others in its round-robin group. I decided to build it from an Ansible dictionary that should be shared between all hosts in the grid; storing this dictionary in Ansible group_vars is a convenient way to distribute it to all servers. Unlike /etc/hosts, every server in hosts_config.json is specified as either local or remote. I assumed that ansible_default_ipv4.address will be on the list of IPs configured for each host, and the template selects the local type for the entry whose list contains it. For details, please check the contents of hosts_config.json.j2 [6].
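The local/remote selection can be expressed in Python roughly as below. It mirrors the logic described above rather than reproducing hosts_config.json.j2 [6]: grid_hosts is a hypothetical group_vars-style dictionary mapping each host to its addresses, the entry layout (address_type, addresses) is my reading of the 4.2 hosts_config format, and only the host_entries part of the file is built (schema metadata is omitted).

```python
def build_hosts_config(grid_hosts, my_ipv4):
    """Build a hosts_config dictionary from a shared host -> addresses map.

    grid_hosts: hypothetical dict, e.g. {"ires-a1": ["192.168.112.31",
                "ires-a1.local"]}, identical on every server.
    my_ipv4:    this server's ansible_default_ipv4.address.
    An entry is marked "local" when my_ipv4 is among its addresses.
    """
    entries = []
    for host in sorted(grid_hosts):  # stable order keeps the file diff-free
        addresses = grid_hosts[host]
        entries.append({
            "address_type": "local" if my_ipv4 in addresses else "remote",
            "addresses": [{"address": addr} for addr in addresses],
        })
    return {"host_entries": entries}
```

Serializing the result with json.dumps(..., indent=4) on each server gives it its own hosts_config.json, with a single local entry as long as each IP appears under only one host.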

Obviously, the approach of a dedicated Python script executed as part of the installation process is not straightforward to integrate with orchestration frameworks like Ansible. An important aspect of orchestration is that the same code used for deployment is also used to maintain the long-term configuration of services. Unfortunately, setup_irods.py simply fails when executed on an already configured catalogue server, with the error message: IrodsError: Database specified already in use by iRODS. We can work around this in Ansible with the failed_when and ignore_errors constructs; nevertheless, I think that executing this script on an iRODS installation that is already running is not a good idea at all. It’s not tested in such a scenario, and it has side effects: resource creation and execution of iput/iget commands.

My approach is to execute the script only during the first installation, with the configuration JSON file created from a template in which the sections responsible for specific files are included from separate templates. Those same templates are then used to generate files like “server_config.json” or “hosts_config.json” while the servers are in operation.
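The split between first installation and day-to-day operation could be sketched in Python like this: the same per-file sections feed both the one-off setup document and the individual files under /etc/irods. SECTION_FILES and the existence check in needs_initial_setup are hypothetical; in Ansible the guard would more naturally be the creates argument of the command module.

```python
import json
import os

# Hypothetical mapping: unattended-JSON section -> file it becomes on disk.
SECTION_FILES = {
    "server_config": "/etc/irods/server_config.json",
    "hosts_config": "/etc/irods/hosts_config.json",
}


def assemble_unattended(sections, extra):
    """First installation: merge per-file sections into one setup document."""
    document = dict(extra)
    document.update({name: sections[name] for name in SECTION_FILES})
    return document


def render_config_files(sections, root="/"):
    """Operation: return {path: json_text} rendered from the same sections."""
    rendered = {}
    for name, path in SECTION_FILES.items():
        target = os.path.join(root, path.lstrip("/"))
        rendered[target] = json.dumps(sections[name], indent=4,
                                      sort_keys=True) + "\n"
    return rendered


def needs_initial_setup(root="/"):
    """Hypothetical guard: run setup_irods.py only if server_config.json is absent."""
    return not os.path.exists(os.path.join(root, "etc/irods/server_config.json"))
```

Because both paths consume one sections dictionary, a change to, say, advanced_settings lands in the running configuration without ever re-running setup_irods.py.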

The role currently works, but there are certainly many things that can be improved. The next step in development will be finalizing Molecule-based CI, followed by code restructuring.

Since my main focus was on deploying a new environment, working on the role would be very difficult without the possibility of quickly recreating the development environment from scratch. To automate the process, I decided to use Vagrant [7]. In the first release of the role on GitHub you’ll find a Vagrantfile as well. Thanks to it, you can start playing with the role and/or iRODS fairly easily: simply clone the repository and start a three-node (1 catalogue + 2 resource) iRODS grid by executing vagrant up [8], as in the snippet below:

cinek@cinek-schmd:~/git-repos/ansible-role-irods-srv$ vagrant up --provider=libvirt
Bringing machine 'icat' up with 'libvirt' provider...
Bringing machine 'ires-a1' up with 'libvirt' provider...
Bringing machine 'ires-a2' up with 'libvirt' provider...
[...]
PLAY RECAP *********************************************************************
ires-a2 : ok=16 changed=13 unreachable=0 failed=0
cinek@cinek-schmd:~/git-repos/ansible-role-irods-srv$ vagrant ssh ires-a2
Last login: Fri May 3 17:58:39 2019 from 192.168.121.1
[vagrant@ires-a2 ~]$
[vagrant@ires-a2 ~]$
[vagrant@ires-a2 ~]$ ping ires-a1.local
PING ires-a1.local (192.168.112.31) 56(84) bytes of data.
64 bytes from ires-a1.local (192.168.112.31): icmp_seq=3 ttl=64 time=0.916 ms
^C
[root@ires-a2 vagrant]# sudo su
[root@ires-a2 vagrant]# su - irods
Last login: Fri May 3 18:15:37 UTC 2019 on pts/0
-bash-4.2$ ils
/HPCC/home/rods:
-bash-4.2$ ilsresc
demoResc:unixfilesystem
-bash-4.2$ mkdir /var/lib/irods/testResource
-bash-4.2$ iadmin mkresc testA2 unixfilesystem ires-a2.local:/var/lib/irods/testResource/
Creating resource:
Name: "testA2"
Type: "unixfilesystem"
Host: "ires-a2.local"
Path: "/var/lib/irods/testResource/"
Context: ""
-bash-4.2$ dd if=/dev/zero of=/tmp/testFile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.147028 s, 713 MB/s
-bash-4.2$ iput /tmp/testFile
-bash-4.2$ ils -l
/HPCC/home/rods:
  rods 0 demoResc 104857600 2019-05-03.18:18 & testFile
-bash-4.2$ ls /var/lib/irods/testResource/
-bash-4.2$ irepl -R testA2 testFile
-bash-4.2$ ils -l
/HPCC/home/rods:
  rods 0 demoResc 104857600 2019-05-03.18:18 & testFile
  rods 1 testA2 104857600 2019-05-03.18:18 & testFile
-bash-4.2$ ls /var/lib/irods/testResource/
home
-bash-4.2$ ls /var/lib/irods/testResource/home/rods/testFile
/var/lib/irods/testResource/home/rods/testFile
-bash-4.2$ ls -l /var/lib/irods/testResource/home/rods/testFile
-rw-------. 1 irods irods 104857600 May 3 18:18 /var/lib/irods/testResource/home/rods/testFile

As you can see, I just started 3 CentOS 7 VMs and applied the appropriate Ansible playbook, which ended up with an iRODS grid running on my laptop (just by executing vagrant up). Then I logged in to one of the resource servers, ires-a2, defined a resource on it and used iput to upload the file to the resource on the catalogue server. Finally, the file was replicated by irepl back to ires-a2.
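To turn the manual check at the end of the transcript into something a CI job (for example the Molecule setup mentioned above) could assert on, one option is to parse ils -l output. The regular expression below is fitted to the 4.2 line format shown in the transcript (owner, replica number, resource, size, timestamp, optional & flag, name); treat it as an assumption for other iRODS versions.

```python
import re

# Fitted to lines like: "rods 1 testA2 104857600 2019-05-03.18:18 & testFile".
# The "&" flag is optional so replicas printed without it still match.
REPLICA_LINE = re.compile(
    r"^\s*(?P<owner>\S+)\s+(?P<repl>\d+)\s+(?P<resc>\S+)\s+(?P<size>\d+)"
    r"\s+(?P<mtime>\S+)(?:\s+(?P<flag>&))?\s+(?P<name>.+?)\s*$")


def replicas_by_object(ils_output):
    """Map data object name -> {replica number: (resource, size)}."""
    replicas = {}
    for line in ils_output.splitlines():
        match = REPLICA_LINE.match(line)  # collection headers do not match
        if match:
            replicas.setdefault(match["name"], {})[int(match["repl"])] = (
                match["resc"], int(match["size"]))
    return replicas
```

After irepl, checking that len(replicas_by_object(output)["testFile"]) == 2 would confirm that both demoResc and testA2 hold a replica.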

Comments are appreciated, especially since I’m planning more posts on iRODS in the near future!

[1] https://irods.org/

[2] https://github.com/cinek810/ansible-role-irods-srv

[3] https://github.com/irods/irods_schema_configuration/pull/40

[4] https://github.com/irods/irods/tree/4-2-stable/configuration_schemas

[5] https://github.com/irods/irods_schema_configuration/pull/39

[6] https://github.com/cinek810/ansible-role-irods-srv/blob/master/roles/irods-srv/templates/hosts_config.json.j2

[7] https://www.vagrantup.com/

[8] https://github.com/cinek810/ansible-role-irods-srv/blob/master/README.md