A short while ago I was working on a PL/SQL task where I needed to loop through particular files in a folder. As always, I aimed for a simple solution that requires as little effort as possible. Let’s see what I came up with.

Task description

The task was to find a good solution to list all the required files in a directory. The directory is on the same Linux server where the database server is running. In this specific case I had to list all the csv files.

For testing purposes I created a folder with the following files:

excel_file.xlsx
file_1.csv
file_2.csv
file_3.csv
file_4.csv

First ideas

My first idea was to write a shell script that lists the files into a simple text file (file_list.txt) and then to process it with PL/SQL.

The shell script would have looked like the following:

#!/bin/bash
cd /home/tcs_test/test/files
ls -p *.csv > file_list.txt

Giving the result:

file_1.csv
file_2.csv
file_3.csv
file_4.csv

Then I would have simply looped through the lines of file_list.txt (for now only in script format, not as a package):

declare
  f_file_list utl_file.file_type;
  v_file_name varchar2(256);
begin
  f_file_list := utl_file.fopen('TEST_FILES', 'file_list.txt', 'r');
  loop
    begin
      utl_file.get_line(f_file_list, v_file_name);
    exception
      when no_data_found then
        exit;
    end;
    dbms_output.put_line(v_file_name);
  end loop;
  utl_file.fclose(f_file_list);
end;
/

The result (in DBMS_OUTPUT) is correct:

file_1.csv
file_2.csv
file_3.csv
file_4.csv

I also had a second idea: keeping the shell script, but using an external table to read file_list.txt, instead of looping through it “manually”.

My biggest concern with both solutions was that we would have to make sure the shell script is always executed before processing the file list. Obviously, the files on the server can change, and we want an up-to-date result. Of course, we can execute a shell script from PL/SQL, but it just seemed too much hassle for such a small task.

The final solution

After some research I discovered a simple and effective way to do all of this, available since Oracle 11g: we can create an external table with a preprocessor. Among other things, we can define a shell script as the preprocessor, which will generate the source data for our external table. And of course, the preprocessor runs every time the table is queried.

Let’s go through all the necessary steps to make this work.

First of all, we need to create two directory objects: one for the preprocessor script and another for the folder containing the files. They could point to the same folder, but I would encourage everyone to keep the preprocessor scripts in a separate folder.

create directory PREPROC_DIR as '/home/tcs_test/test/preprocessors';
grant read, execute on directory PREPROC_DIR to PANEL;

create directory TEST_FILES as '/home/tcs_test/test/files';
grant read, write on directory TEST_FILES to PANEL;

If the directories are created with the admin user, then we need to grant the necessary privileges to the user (schema) that we will work with.

Please note that I only granted the minimum required privileges.

Now let us see what the preprocessor script looks like:

preproc_list_csv_files.sh

#!/bin/bash
cd /home/tcs_test/test/files
/bin/ls -p *.csv

As we can see, it is just a simple ls command that lists all the csv files. For a certain reason we need to cd into the /home/tcs_test/test/files folder: a few lines below I will talk about a dummy.txt file that is stored there.
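As a quick sketch of one practical effect of the cd (illustrative paths, not part of the original setup): if ls were given an absolute glob instead, it would echo full paths, and those full paths would then land in the external table rows rather than bare file names.

```shell
# Illustrative demo: ls with an absolute glob echoes full paths,
# while cd + a relative glob yields bare file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/file_1.csv"

ls "$demo_dir"/*.csv                    # full path, e.g. /tmp/tmp.Xyz/file_1.csv
( cd "$demo_dir" && /bin/ls -p *.csv )  # just: file_1.csv
```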

The last remaining step is to actually create the external table:

CREATE TABLE panel.preproc_ext_table (
  ls_line VARCHAR2(256)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY TEST_FILES
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR PREPROC_DIR:'preproc_list_csv_files.sh'
    FIELDS (
      ls_line POSITION(1:256)
    )
  )
  LOCATION ('dummy.txt')
)
REJECT LIMIT UNLIMITED;

The table has only one column, holding the file name. We need to specify the preprocessor directory (PREPROC_DIR) and the default directory (TEST_FILES).

It is important to highlight: thanks to Oracle, dummy.txt must exist even if it is not used. In our case the shell script does not have to read anything from that file; however, for more complicated cases we can create a more dynamic preprocessor script by making it read this file. For example, dummy.txt could contain a list of directories whose files are to be listed.
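For what it is worth, Oracle invokes the preprocessor with the full path of the location file (dummy.txt here) as its first argument, which is exactly what makes such a dynamic script possible. A minimal sketch, assuming dummy.txt contains one directory per line (the script name and helper function are hypothetical):

```shell
#!/bin/bash
# Hypothetical dynamic preprocessor (e.g. preproc_list_csv_dirs.sh).
# Oracle passes the LOCATION file (dummy.txt) as $1; here we assume it
# contains one directory per line, and we list the csv files of each.
list_csv_dirs() {
  while IFS= read -r dir; do
    # Subshell so the cd does not leak into the next iteration.
    ( cd "$dir" && /bin/ls -p *.csv 2>/dev/null )
  done < "$1"
}

list_csv_dirs "${1:-/dev/null}"
```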

It is important that the Linux Oracle user has the permission to both read and write the dummy.txt file.
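On a related note, the preprocessor script itself must be executable by the Oracle software owner at the OS level. A sketch with the paths used in this article (the mode bits are an assumption; adjust them to your own policy):

```shell
# Assumed paths from this article; the oracle OS user needs execute
# permission on the preprocessor and read/write access to dummy.txt.
chmod 750 /home/tcs_test/test/preprocessors/preproc_list_csv_files.sh
chmod 660 /home/tcs_test/test/files/dummy.txt
```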

That is all there is to it; from now on we can simply enjoy the result of our work by running the following query:

select * from panel.preproc_ext_table;

The result is as expected:

file_1.csv
file_2.csv
file_3.csv
file_4.csv

Upon querying the external table the preprocessor script always runs, therefore we always get the up-to-date list of files.

It is often said that the first thought is always the right one; well, I am happy that I didn’t go with it this time …