🔗 Network Communications
🔗 Stuff to keep in mind
- The comm code needs to say simple, light-weight and fast; its going to be called -a lot-
- The comm code needs to be reasonably flexible wrt its IO buffering - ideally I’d like the Squid comm model to map reasonably cleanly onto Windows Completion Ports IO and any similar APIs that pop up for *NIX
🔗 How I view the communications layer
- A way of scheduling reads and writes to network sockets
- A simple way of filling/emptying data buffers
🔗 What stuff I’d like the comm layer to do and not do
- be involved in scheduling read/write to stream sockets (currently comm_read and comm_write)
- be involved in UDP datagram socket send/recv where possible
- I’d also like the comm layer to get involved with delayed reads like it is right now
- It should also take control of creating and tearing down sockets; tracking half-closed sockets and such
- In theory, code shouldn’t ever get its fingers into the fd_table and fdc_table; there should be really cheap inline methods to do so
🔗 The current comm API
🔗 What I’d like the network comm layer to look like
- There should only be one pending read and one pending write IO op per filedescriptor
- .. and therefore, only one pending read/write IO callback per filedescriptor
- The comm_read/comm_write routines should use a statically allocated - one per FD - read/write callback structure. This structure should have a dlink_node to ‘thread’ them together to form a completed callback list.
- The UDP send/receive routines should become callback-driven
- The buffer management should be done by the comm layer and a
reference to completed buffers should be given to the callee. Why?
- I’d like the comm code to fill/empty the buffers as appropriate;
- and if producers/consumers wish to consume less (eg delay pools) then the comm buffer will fill up and the comm layer will cease scheduling IO until the buffer is close to or empty;
- It also means we could cut down on IO scheduling overhead (which Steve Wilton managed to “do” as an optimisation in the epoll code, ta Steve!) to schedule IO changes whenever they happen, not each read/write
- I’d like to support Windows Completion IO and whatever strange and wonderful things the *NIX world comes up with in the future to cut back on that extra copy
- Finally, it has to stay simple and lightweight but filling and emptying buffers as quickly and efficiently as possible.
🔗 What needs to be thought about!
- The whole “pack header into contiguous memory range and then write that out once” when we really should be using writev().
- If the average ‘web’ object size is still under 64k in size then we should be able to do all of that in a single write() (or writev()) without any copying.
- Whats the most optimal size to read/write?