RedShift Unload All Tables To S3

RedShift unload function will help us to export/unload the data from the tables to S3 directly. It actually runs a select query to get the results and them store them into S3. But unfortunately, it supports only one table at a time. You need to create a script to get the all the tables then store it in a variable, and loop the unload query with the list of tables. Here I have done with PL/SQL way to handle this. You can Export/Unload all the tables to S3 with partitions.

If you didn’t take a look at how to export a table with Partition and why? Please hit here and read about the importance of it.

Why this procedure actually doing?

It will get the list of schema and table in your database from the information_schema . Store this information in a variable. IAM role, Partitions are hardcoded, you can customize it or pass them in a variable. A FOR LOOP will run the unload query for all the tables.

Variables:

list - list of schema and table names in the database.

db - Current connected database name.

tablename - table name (used for history table only).

tableschema - table schema (used for history table only).

starttime - When the unload the process stated.

endtime - When the unload process end.

SQL - its a select * from , but if you want to change like where timestamp >= something you can customize this variable.

, but if you want to change like you can customize this variable. s3_path - Location of S3, you need to pass this variable while executing the procedure.

iamrole - IAM role to write into the S3 bucket.

delimiter - Delimiter for the file.

max_filesize - Redshift will split your files in S3 in random sizes, you can mention a size for the files.

un_year, un_month, un_day - Current Year, month, day

unload_query - Dynamically generate the unload query.

unload_id - This is for maintaining the history purpose, In one shot you can export all the tables, from this ID, you can get the list of tables uploaded from a particular export operation.

unload_time - Timestamp of when you started executing the procedure.

NOTE: This stored procedure and the history table needs to installed on all the databases. Because from information schema it’ll only return the list of tables in the current schema. Its Redshift’s limitation.

I have made a small change here, the stored procedure will generate the COPY command as well. You can query the unload_history table to get the COPY command for a particular table. So you can easily import the data into any RedShift clusters.

I have published a new blog. You can now export based on your requirements like export only few tables, all tables in a schema, all tables in multiple schema and etc. Click on the below link. https://thedataguy.in/redshift-unload-multiple-tables-schema-to-s3/

Table for maintaining the History of Unload:

CREATE TABLE unload_history ( pid INT IDENTITY ( 1 , 1 ), u_id INT , u_timestamp DATETIME , start_time DATETIME , end_time DATETIME , db_name VARCHAR ( 100 ), table_schema VARCHAR ( 100 ), table_name VARCHAR ( 100 ), export_query VARCHAR ( 65000 ), import_query VARCHAR ( 65000 ) );

Stored Procedure:

CREATE OR replace PROCEDURE unload_all ( s3_location varchar ( 10000 )) LANGUAGE plpgsql AS $$ DECLARE list RECORD ; db VARCHAR ( 100 ); tablename VARCHAR ( 100 ); tableschema VARCHAR ( 100 ); starttime datetime ; endtime datetime ; SQL text ; s3_path VARCHAR ( 1000 ); iamrole VARCHAR ( 100 ); delimiter VARCHAR ( 10 ); max_filesize VARCHAR ( 100 ); un_year INT ; un_month INT ; un_day INT ; unload_query varchar ( 65000 ); copy_query varchar ( 65000 ); unload_id INT ; unload_time timestamp ; BEGIN -- Pass values for the variables SELECT extract ( year FROM getdate ()) INTO un_year ; SELECT extract ( month FROM getdate ()) INTO un_month ; SELECT extract ( day FROM getdate ()) INTO un_day ; SELECT DISTINCT ( table_catalog ) FROM information_schema . TABLES INTO db ; SELECT coalesce ( max ( u_id ), 0 ) + 1 FROM unload_history INTO unload_id ; SELECT getdate () INTO unload_time ; s3_path : = s3_location ; -- IAM ROLE and the Delimiter is hardcoded here iamrole : = 'arn:aws:iam::123123123:role/myredshiftrole' ; delimiter : = '|' ; -- Get the list of tables except the unload history table FOR list IN SELECT table_schema , table_name FROM information_schema . TABLES WHERE table_type = 'BASE TABLE' AND table_schema NOT IN ( 'pg_catalog' , 'information_schema' ) AND table_name != 'unload_history' LOOP SELECT getdate () INTO starttime ; sql : = 'select * from ' || list . table_schema || '.' || list . table_name || '' ; RAISE info '[%] Unloading... schema = % and table = %' , starttime , list . table_schema , list . table_name ; -- Start unloading the data unload_query : = 'unload ( '' ' || sql || ' '' ) to '' ' || s3_path || un_year || '/' || un_month || '/' || un_day || '/' || db || '/' || list . table_schema || '/' || list . table_name || '/' || list . table_schema || '-' || list . table_name || '_ '' iam_role '' ' || iamrole || ' '' delimiter '' ' || delimiter || ' '' MAXFILESIZE 300 MB PARALLEL ADDQUOTES HEADER GZIP' ; EXECUTE unload_query ; copy_query : = 'copy ' || list . table_schema || '.' || list . table_name || ' from '' ' || s3_path || un_year || '/' || un_month || '/' || un_day || '/' || db || '/' || list . table_schema || '/' || list . table_name || '/ '' iam_role '' ' || iamrole || ' '' delimiter '' ' || delimiter || ' '' IGNOREHEADER 1 REMOVEQUOTES gzip' ; SELECT getdate () INTO endtime ; SELECT list . table_schema INTO tableschema ; SELECT list . table_name INTO tablename ; -- Insert into the history table INSERT INTO unload_history ( u_id , u_timestamp , start_time , end_time , db_name , table_schema , table_name , export_query , import_query ) VALUES ( unload_id , unload_time , starttime , endtime , db , tableschema , tablename , unload_query , copy_query ); END LOOP ; RAISE info ' Unloading of the DB [%] is success !!!' , db ; END ; $$ ;

Hardcoded Items:

In the stored procedure, I have hardcoded the follow parameters.

IAM ROLE - arn:aws:iam::123123123123:role/myredshiftrole

Delimiter - |

Also, the following Items are hardcoded in the Unload query. You can get these things as variable or hardcoded as per your convenient.

MAXFILESIZE - 100 MB

PARALLEL

ADDQUOTES

HEADER

GZIP

Execute the procedure:

call unload_all('s3://bhuvi-datalake/test/');

You can see the status in the terminal

INFO: [2019-10-06 19:20:04] Unloading... schema = etl and table = tbl1 INFO: UNLOAD completed, 2 record(s) unloaded successfully. INFO: [2019-10-06 19:20:10] Unloading... schema = etl and table = tbl2 INFO: UNLOAD completed, 2 record(s) unloaded successfully. INFO: [2019-10-06 19:20:12] Unloading... schema = stage and table = tbl3 INFO: UNLOAD completed, 2 record(s) unloaded successfully. INFO: [2019-10-06 19:20:12] Unloading... schema = stage and table = tbl4 INFO: UNLOAD completed, 2 record(s) unloaded successfully. INFO: [2019-10-06 19:20:14] Unloading... schema = stage and table = tbl5 INFO: UNLOAD completed, 2 record(s) unloaded successfully. INFO: [2019-10-06 19:20:15] Unloading... schema = prod and table = tbl6 INFO: UNLOAD completed, 2 record(s) unloaded successfully. INFO: [2019-10-06 19:20:15] Unloading... schema = prod and table = tbl7 INFO: UNLOAD completed, 2 record(s) unloaded successfully. INFO: [2019-10-06 19:20:15] Unloading... schema = public and table = debug INFO: UNLOAD completed, 0 record(s) unloaded successfully. INFO: [2019-10-06 19:20:16] Unloading... schema = public and table = test INFO: UNLOAD completed, 1 record(s) unloaded successfully. INFO: Unloading of the DB [preprod] is success !!! CALL

Files on S3:

Retrieve the History:

preprod =# select * from unload_history limit 1 ; pid | 28 u_id | 1 u_timestamp | 2019 - 10 - 08 10 : 33 : 23 start_time | 2019 - 10 - 08 10 : 33 : 23 end_time | 2019 - 10 - 08 10 : 33 : 23 db_name | preprod table_schema | etl table_name | tbl2 export_query | unload ( 'select * from etl.tbl2' ) to 's3://bhuvi-datalake/test/2019/10/8/preprod/etl/tbl2/etl-tbl2_' iam_role 'arn:aws:iam::123123123:role/myredshiftrole' delimiter '|' MAXFILESIZE 300 MB PARALLEL ADDQUOTES HEADER GZIP import_query | copy etl . tbl2 from 's3://bhuvi-datalake/test/2019/10/8/preprod/etl/tbl2/' iam_role 'arn:aws:iam::123123123:role/myredshiftrole' delimiter '|' IGNOREHEADER 1 REMOVEQUOTES gzip

Advertisement