Created on 2005-01-16 04:02 by irmen, last changed 2016-04-08 22:49 by pfalcon.

Messages (16)

msg47558 - (view) Author: Irmen de Jong (irmen) Date: 2005-01-16 04:02

This patch is a first take at adding a recvall method to the socket object, to mirror the existence of the sendall method. If the MSG_WAITALL flag is available, the recvall method just calls recv() with that flag. If it is not available, it uses an internal loop (during the loop, threads are allowed, so this improves concurrency). Having this method makes Python code much simpler; before you had to test for MSG_WAITALL yourself and write your own loop in Python if the flag is not there (on Windows for instance). (also, having the loop in C improves performance and concurrency compared to the same loop in Python) Note: the patch hasn't been tested very well yet. (code is based on a separate extension module found here: http://www.it-ernst.de/python/ )
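For illustration, the loop described above could be sketched in pure Python roughly as follows. This is not the C code from the patch: the helper name `recvall`, its signature, and the choice of `ConnectionError` on early close are assumptions made for this example.

```python
import socket


def recvall(sock, size):
    """Hypothetical helper: receive exactly `size` bytes from `sock`."""
    # Use MSG_WAITALL where the platform exposes it; 0 means "no flags".
    flags = getattr(socket, "MSG_WAITALL", 0)
    chunks = []
    remaining = size
    while remaining > 0:
        # Even with MSG_WAITALL, recv() may return fewer bytes than requested,
        # so keep looping and only ask for what is still missing.
        chunk = sock.recv(remaining, flags)
        if not chunk:
            # An empty result means the peer closed the connection.
            raise ConnectionError(
                "connection closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Keeping the loop even when the flag is present means the helper behaves the same way on platforms with and without MSG_WAITALL.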

msg47559 - (view) Author: Martin v. Löwis (loewis) * Date: 2005-02-24 20:55

I like the feature (but see below). The patch is incomplete, though:

- there are no changes to Doc/lib/libsocket.tex
- there are no changes to Lib/test/test_socket.py

Furthermore, the patch is also wrong: if a later recv call fails, all data read so far are discarded. I believe this is different from the WAITALL flag, which I hope will preserve the data in the socket, for a subsequent recv call. As keeping the data in the socket seems unimplementable, the partial data should somehow be returned to the application.

A note on coding style: please omit the spaces after the opening paren and before the closing one in

    while ( (bytes_got<total_size) && (n > 0) )

Irmen, do you want to update this patch for the current Python trunk, taking Martin's comments into account?

msg101178 - (view) Author: Irmen de Jong (irmen) Date: 2010-03-16 18:36

Sure, I'll give it another go. I've not done any c-development for quite a while though, so I have to pick up the pieces and see how far I can get. Also, I don't have any compiler for Windows so maybe I'll need someone else to validate the patch on Windows for me, once I've got something together.

msg101206 - (view) Author: Irmen de Jong (irmen) Date: 2010-03-17 00:14

Ok I've looked at it again and think I can build an acceptable patch this time. However there are 2 things that I'm not sure of:

1) how to return the partial data to the application if the recv() loop fails before completion. Because the method will probably raise an exception on failure, as usual, it seems to me that the best place to put the partial data is inside the exception object. I can't think of another easy and safe way for the application to retrieve it otherwise. But, how is this achieved in code? I'll be using set_error() to return an error from my sock_recvall function I suppose.

2) the trunk is Python 2.7, should I make a separate patch for 3.x?

msg102374 - (view) Author: Irmen de Jong (irmen) Date: 2010-04-05 14:45

Ok I think I've got the code and doc changes ready. I added a recvall and a recvall_into method to the socket module. Any partially received data in case of errors is returned to the application as part of the args for a new exception, socket.partialdataerror. Still need to work on some unit tests for these new methods.
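As an illustration of handing partial data back through an exception, here is a minimal pure-Python sketch. It is not the patch itself: the class `PartialDataError`, its `partial` and `expected` attributes, and the helper `recvall_or_partial` are hypothetical names chosen for this example.

```python
import socket


class PartialDataError(Exception):
    """Hypothetical exception carrying the bytes received before a failure."""

    def __init__(self, partial, expected):
        super().__init__(
            "received %d of %d bytes" % (len(partial), expected))
        self.partial = partial      # bytes received so far
        self.expected = expected    # number of bytes that were requested


def recvall_or_partial(sock, size):
    """Receive exactly `size` bytes, or raise PartialDataError."""
    buf = bytearray()
    while len(buf) < size:
        try:
            chunk = sock.recv(size - len(buf))
        except OSError as exc:
            # Keep the data read so far instead of discarding it.
            raise PartialDataError(bytes(buf), size) from exc
        if not chunk:
            # Peer closed the connection before all bytes arrived.
            raise PartialDataError(bytes(buf), size)
        buf += chunk
    return bytes(buf)
```

A caller can then catch the exception and inspect `exc.partial` to decide whether the bytes already received are still usable.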

Just a couple of comments:

* If MSG_WAITALL is defined and a signal interrupts recv, will a string shorter than requested be returned by sock_recvall?
* Since MSG_WAITALL is already exposed to Python (when the underlying platform provides it), I wonder if this could all be implemented more simply in pure Python. Can you elaborate on the motivation to use C?

Someone should do another review when there are unit tests.

msg102391 - (view) Author: Irmen de Jong (irmen) Date: 2010-04-05 17:58

Currently if MSG_WAITALL is defined, recvall() just calls recv() internally with the extra flag. Maybe that isn't the smartest thing to do because it duplicates recv's behavior on errors. Which is: release the data and raise an error. Would it be nicer to have recvall() release the data and raise an error, or to let it return the partial data? Either way, I think the behavior should be the same regardless of MSG_WAITALL being available. This is not yet the case.

Why C: this started out by making the (very) old patch that I once wrote for socketmodule.c up to date with the current codebase, and taking Martin's comments into account. The old patch was small and straightforward. Unfortunately the new one turned out bigger and more complex than I thought. For instance I'm not particularly happy with the way recvall returns the partial data on fail. It uses a new exception for that but the code has some trickery to replace the socket.error exception that is initially raised. I'm not sure if my code is the right way to do this, it needs some review. I do think that putting it into the exception object is the only safe way of returning it to the application, unless the semantics on error are changed as mentioned above. Maybe it could be made simpler then.

In any case, it probably is a good idea to see if a pure python solution (perhaps just some additions to Lib/socket.py?) would be better. Will put some effort into this.

msg114395 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-19 18:44

@Irmen if you do proceed with this it should be against the py3k trunk.

msg234117 - (view) Author: Antoine Pitrou (pitrou) * Date: 2015-01-16 08:39

I'm frankly not sure why this is useful. If you want a guaranteed read size you should use the buffered layer - i.e. socket.makefile(). No need to complicate the raw socket implementation.
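For reference, the buffered-layer approach mentioned above might look like the following sketch. The helper name `read_exactly` and the choice of `EOFError` are assumptions for this example; the sketch also assumes a blocking socket, since `makefile()` objects are not meant for sockets with a timeout.

```python
import socket


def read_exactly(reader, size):
    """Hypothetical helper around a buffered reader from sock.makefile('rb')."""
    # On a blocking socket, BufferedReader.read(size) keeps reading until
    # `size` bytes are available or the stream reaches end-of-file.
    data = reader.read(size)
    if len(data) < size:
        # A short result here means EOF was hit; read() itself does not raise.
        raise EOFError("expected %d bytes, got %d" % (size, len(data)))
    return data
```

Typical use would be `reader = sock.makefile("rb")` followed by `read_exactly(reader, n)`; note that the reader buffers data, so mixing it with direct `sock.recv()` calls on the same socket can lose bytes.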

msg234126 - (view) Author: STINNER Victor (vstinner) * Date: 2015-01-16 11:16

The patch uses the flag MSG_WAITALL for recv() if available. Extract of the manual page:

    MSG_WAITALL (since Linux 2.2)
        This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned.

It looks interesting, but it doesn't guarantee that you will always get exactly the expected size. You still have to call recv() again to get more data if a signal was received.

Jean-Paul Calderone wrote:
> Since MSG_WAITALL is already exposed to Python (when the underlying platform provides it), I wonder if this could all be implemented more simply in pure Python. Can you elaborate on the motivation to use C?

sendall() is implemented in C even though it would be possible to implement it in Python. The same rationale can be applied to a large part of the stdlib :-) (The io module is implemented in Python in Python 2.6!) C gives you full control over the GIL and signal handling, and it might be faster.

Antoine Pitrou wrote:
> I'm frankly not sure why this is useful.

recvall() makes it easy to fix existing code: just replace recv() with recvall(), no need to refactor code to call makefile() which has a different API (ex: read/recv, write/send). The addition is small and well defined.

--

About the exception: asyncio.StreamReader.readexactly() raises an IncompleteReadError which contains the read bytes and inherits from EOFError: see
https://docs.python.org/dev/library/asyncio-stream.html#asyncio.StreamReader.readexactly
and
https://docs.python.org/dev/library/asyncio-stream.html#asyncio.IncompleteReadError

The following issue discussed the design of this exception in asyncio:
https://code.google.com/p/tulip/issues/detail?id=111

http.client uses an IncompleteRead (which inherits from HTTPException):
https://docs.python.org/dev/library/http.client.html#http.client.IncompleteRead

msg234157 - (view) Author: Irmen de Jong (irmen) Date: 2015-01-17 01:50

I created the patch about 5 years ago and in the meantime a few things have happened:

- I've not touched C for a very long time now
- I've learned that MSG_WAITALL may be unreliable on certain systems, so any implementation of recvall depending on MSG_WAITALL may inexplicably fail on such systems
- I've been using a python implementation of a custom recv loop in Pyro4 for years
- it is unclear that a C implementation will provide a measurable performance benefit because I think most of the time is spent in the network I/O anyway, and the GIL is released when doing a normal recv (I hope?)

In other words, I will never follow up on my original C-based patch from 5 years ago. I do still like the idea of having a reliable recvall in the stdlib instead of having to code a page long one in my own code.

msg239975 - (view) Author: STINNER Victor (vstinner) * Date: 2015-04-03 12:02

> - I've learned that MSG_WAITALL may be unreliable on certain systems, so any implementation of recvall depending on MSG_WAITALL may inexplicably fail on such systems

Something else has happened in those 5 years: PEP 475 was accepted, which makes Python more reliable when it receives signals.

If recv(MSG_WAITALL) is interrupted by a signal and returns fewer bytes, we must call PyErr_CheckSignals(). If the signal handler raises an exception, drop the read data and raise the exception. If the signal handler does not raise an exception, we now *must* retry recv(MSG_WAITALL) (with a shorter length, so as not to read too much data).

The IncompleteRead exception is still needed if the socket is closed before receiving the requested number of bytes.

msg249662 - (view) Author: STINNER Victor (vstinner) * Date: 2015-09-03 15:47

recvall.patch: implement socket.socket.recvall() in pure Python. It raises a new socket.IncompleteReadError exception (copied from asyncio) if the connection is closed before we get the expected number of bytes. The patch has unit tests and documents the new method and the new exception.

TODO: I don't like how the method handles the timeout. The method must fail if it takes longer than socket.gettimeout() seconds, whereas currently the timeout is reset each time we get data from the server.

If the idea of the new socket method is accepted, I will reimplement it in C. In C, it's easier to implement the timeout as I want. In Python, the socket timeout cannot be changed temporarily, because it would impact other threads which may use the same socket.

I changed how socket.sendall() handles its timeout in Python 3.5: it is now the maximum total duration to send all data. The timeout is no longer reset each time we send a packet. Related discussion: https://mail.python.org/pipermail/python-dev/2015-April/139001.html

See also issue #23236, which adds a timeout reset each time we get data to the asyncio read() method. It will be complementary to the existing "wait(read(), timeout)" timeout method; it's for a different use case.
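To make the "total duration" semantics concrete, here is a rough sketch of a receive loop bounded by a single overall deadline. It is not the code from recvall.patch: the helper name `recv_exactly_deadline` and its `timeout` parameter are hypothetical, and using `select.select()` to wait (so that the socket's own timeout is left untouched) is an assumption made for this example.

```python
import select
import socket
import time


def recv_exactly_deadline(sock, size, timeout):
    """Hypothetical helper: receive `size` bytes within `timeout` seconds total."""
    # One deadline for the whole operation; it is NOT reset when data arrives.
    deadline = time.monotonic() + timeout
    buf = bytearray()
    while len(buf) < size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise socket.timeout("timed out after %d bytes" % len(buf))
        # Wait for readability without modifying the socket's own timeout.
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            raise socket.timeout("timed out after %d bytes" % len(buf))
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise EOFError("connection closed after %d bytes" % len(buf))
        buf += chunk
    return bytes(buf)
```

This sketch still assumes a single reader: if another thread consumes the data between `select()` and `recv()`, the `recv()` call can block past the deadline.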

msg250402 - (view) Author: STINNER Victor (vstinner) * Date: 2015-09-10 18:49

Oh, in fact recvall() is a bad name. The io module uses the "readall()" name to really read all the content of a file, whereas "recvall(N)" here only reads up to N bytes. It would be better to reuse the same name as asyncio, "readexactly(N)": https://docs.python.org/dev/library/asyncio-stream.html#asyncio.StreamReader.readexactly

asyncio and http.client already have their IncompleteRead exceptions. Maybe it is time to add a builtin exception?

msg254428 - (view) Author: Martin Panter (martin.panter) * Date: 2015-11-10 02:44