

Original Article by Juan Valencia

Although some file archivers offer the option of splitting files, this can easily be done with two commands: split and cat.

Splitting a file with split

split just needs the size of the parts that we want to create, and the file that we want to split, e.g.:

split -b 1024 file_to_split.bin

If this file is 6 kibibytes long, it will create 6 files of 1 kibibyte each, named xaa, xab, xac, xad, xae and xaf.
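
To try this out, here is a minimal sketch, assuming GNU coreutils and a POSIX shell. It creates a 6 KiB test file with dd and splits it. The temporary directory from mktemp -d is only there to keep the parts away from other files.

```shell
#!/bin/sh
set -e
# Work in a throwaway directory so the parts don't clash with other files.
cd "$(mktemp -d)"

# Create a 6 KiB test file: 6 blocks of 1024 zero bytes = 6144 bytes.
dd if=/dev/zero of=file_to_split.bin bs=1024 count=6 2>/dev/null

# Cut it into 1024-byte parts named xaa, xab, ..., xaf.
split -b 1024 file_to_split.bin
ls x*
```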

The parameter -b defines the size of the resulting parts. You can use suffixes: either the SI suffixes KB (kilobyte: 1000 bytes), MB (megabyte: 1000×1000 bytes), GB (gigabyte: 1000×1000×1000 bytes), TB, PB, EB, ZB, YB, or the binary (IEC) suffixes K (kibibyte: 1024 bytes), M (mebibyte: 1024×1024 bytes), G (gibibyte: 1024×1024×1024 bytes), T, P, E, Z, Y. E.g.:

split -b 1K file_to_split.bin
split -b 10M file_to_split.bin
split -b 1KB file_to_split.bin
split -b 10MB file_to_split.bin

The examples above would create parts of 1,024 bytes, 10,485,760 bytes, 1,000 bytes and 10,000,000 bytes respectively.
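
To see the difference between the binary and the decimal suffixes, the following minimal sketch splits a 3000-byte test file both ways and compares the part sizes. It assumes GNU split, because other implementations may not accept the KB form.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# A 3000-byte test file (3 blocks of 1000 zero bytes).
dd if=/dev/zero of=file_to_split.bin bs=1000 count=3 2>/dev/null

# Binary suffix: 1K = 1024 bytes, so the parts are 1024, 1024 and 952 bytes.
split -b 1K file_to_split.bin
wc -c xa?
rm xa?

# Decimal suffix: 1KB = 1000 bytes, so the parts are 1000, 1000 and 1000 bytes.
split -b 1KB file_to_split.bin
wc -c xa?
```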

If we don’t want the default prefix x, we can change it by adding the new prefix after the name of the file that we want to split, e.g.:

split -b 1024 file_to_split.bin a_part_

If we are splitting a 6 kibibyte file as in the first example, this would generate the files: a_part_aa, a_part_ab, a_part_ac, a_part_ad, a_part_ae and a_part_af.

We can change the length of the suffix in the resulting files, and we can choose between an alphabetic suffix (the default) or a numeric suffix. How many parts we can create depends on these two properties of the suffix. If we keep the default length of 2 and don't use a numeric suffix, we can split a file into up to 676 parts (26 × 26). If split runs out of suffixes, it fails, leaving us with the files it created up to that point. (Recent GNU versions of split lengthen the suffix automatically when the default one runs out, so in practice this limit mainly applies when the length has been set explicitly with -a.)
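
The suffix limit is easy to observe on a small scale with a one-character suffix, which only allows 26 parts (xa to xz). The following minimal sketch assumes GNU split, and the exact error message may differ between versions. It asks for 27 one-byte parts and then counts what was left behind.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# 27 bytes: one more than the 26 one-letter suffixes (a..z) can name.
printf 'abcdefghijklmnopqrstuvwxyz!' > file_to_split.bin

# Fix the suffix length at 1 and use 1-byte parts, so 27 parts are needed.
split -a 1 -b 1 file_to_split.bin
status=$?
echo "exit status: $status"   # non-zero, because the suffixes ran out

# The 26 parts created before the failure are still there (xa .. xz).
ls x? | wc -l
```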

To change the length of the suffix use the parameter -a followed by a number. E.g.:

split -b 1024 -a 4 file_to_split.bin

Following the first example again, we would end up with the files: xaaaa, xaaab, xaaac, xaaad, xaaae and xaaaf.

To use a numeric suffix, use the parameter -d. With a numeric suffix and a length of 2, we can split a file into up to 100 parts (00 to 99). E.g.:

split -b 1024 -d file_to_split.bin

Using the first example one last time, we would end up with the files: x00, x01, x02, x03, x04 and x05.
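
The following minimal sketch runs both suffix options on the same 6 KiB test file and lists the resulting names. It assumes GNU split, since -d may not be available in every implementation.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
dd if=/dev/zero of=file_to_split.bin bs=1024 count=6 2>/dev/null

# Four-letter alphabetic suffixes: xaaaa, xaaab, ..., xaaaf.
split -b 1024 -a 4 file_to_split.bin
ls xaaa?

# Two-digit numeric suffixes: x00, x01, ..., x05.
split -b 1024 -d file_to_split.bin
ls x0?
```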

Merging the parts that were created with split

Since the files created by split are sequential (their names sort in the same order as the pieces of the original file), we can simply use cat to merge them into a new file, e.g.:

cat x?? > file_ricostruito.bin

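The question mark acts as a wildcard that matches exactly one character, and the shell expands x?? in sorted order, so cat reads the parts in the right sequence. The number of question marks depends, of course, on the length of the suffixes used when creating the parts.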
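
To check that the merge really gives back the original, here is a minimal sketch of a full round trip, assuming GNU coreutils and a POSIX shell. It uses random data so that a wrong part order would show up, and it compares the two files byte by byte with cmp.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# 50,000 bytes of random data; parts in the wrong order would not match.
dd if=/dev/urandom of=file_to_split.bin bs=1000 count=50 2>/dev/null

# Split into 1024-byte parts (xaa, xab, ...), then glue them back together.
split -b 1024 file_to_split.bin
cat x?? > file_ricostruito.bin

# cmp prints nothing and succeeds when the files are byte-identical.
cmp file_to_split.bin file_ricostruito.bin && echo "identical"
```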

Splitting a file by lines

split also allows us to split a text file by lines rather than by the size of the resulting parts. I am sure this was very useful at some point in the past, but I can't think of a reason to split a text file by lines other than for experimental or didactic purposes; text editors are quite capable of dealing with very large text files. (Great: just days after writing this I found good uses for it, splitting long lists of URLs and splitting those long MySQL files full of commands so we can upload them to web-based systems that can't handle big files. I left this comment for humorous purposes.) Nevertheless, the parameter to split a text file by lines is -l followed by the number of lines. E.g.:

split -l 20 file_to_split.txt

This will create one file for every chunk of 20 lines in the original file, so if the file had 54 lines, it would create the files xaa (lines 1-20), xab (lines 21-40) and xac (lines 41-54).
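
The 54-line case can be reproduced with the following minimal sketch, assuming GNU coreutils (seq is used to generate the numbered test lines). It counts the lines in each part and shows where the last part begins.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# A 54-line text file containing the numbers 1 to 54, one per line.
seq 1 54 > file_to_split.txt

# 20 lines per part: xaa (1-20), xab (21-40), xac (41-54).
split -l 20 file_to_split.txt
wc -l xa?

# The last part starts with line 41 of the original.
head -n 1 xac
```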

split has one more option, a mix between splitting by size in bytes and by number of lines. In this mode we specify a maximum size in bytes for the parts, and split fits as many complete lines as possible into each part without exceeding that size. For this we use the parameter -C followed by the size in bytes; any of the suffixes that are valid for -b can be used. For example, assume that we have a file containing 20 lines of 100 bytes each (counting the newline at the end of each line), 2000 bytes in total, and we use the following command:

split -C 512 file_to_split.txt

This would give us the files xaa (lines 1-5, since adding the sixth line would bring the part to 600 bytes, over the 512-byte limit), xab (lines 6-10), xac (lines 11-15) and xad (lines 16-20). Each file would be 500 bytes long.
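
The following minimal sketch builds exactly that file and splits it. It assumes GNU split, since -C may not exist in other implementations, and it uses awk to write 20 lines of 99 digits plus a newline, which gives 100 bytes per line.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# 20 lines of exactly 100 bytes each: 99 zero-padded digits plus the newline.
awk 'BEGIN { for (i = 1; i <= 20; i++) printf "%099d\n", i }' > file_to_split.txt

# At most 512 bytes of whole lines per part: 5 lines (500 bytes) each,
# because a sixth line would make 600 bytes.
split -C 512 file_to_split.txt
wc -c x??
```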
