Dynamic Blocks stream buffer


2.1 Introduction

2.1.1 About this chapter

This chapter describes class dbstreambuf_ct, which is derived from the ANSI/ISO C++ class streambuf.

Classes are seldom derived from streambuf; chances are that you have never used anything other than the standard streambuf class from your C++ library for stream buffering.

The knowledge needed to understand and work with a streambuf class (a class derived from streambuf) is too large to be covered in this chapter, hence a basic knowledge about stream buffers is presumed.  If you have no idea what a streambuf is and don't want to know it, then you can skip this chapter: It will not be of concern to you.

2.1.2 No wchar_t

For historical reasons a streambuf controls input from and output to character sequences.  ANSI/ISO C++ supports both char and wchar_t by defining the template basic_streambuf<class charT> and defining streambuf with:
typedef basic_streambuf<char> streambuf;
Libcw does not currently support wchar_t because, at the time of writing, no suitable libstdc++ is available.  This will change in the future.
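The relationship between the two names can be checked at compile time.  The following is a minimal sketch, assuming a present-day standard library in which these names live in namespace std (the text above omits the namespace):

```cpp
#include <streambuf>
#include <type_traits>

// streambuf is just basic_streambuf instantiated for char ...
static_assert(std::is_same<std::streambuf, std::basic_streambuf<char> >::value,
              "streambuf is basic_streambuf<char>");

// ... and the wide variant, which libcw does not support, uses wchar_t.
static_assert(std::is_same<std::wstreambuf, std::basic_streambuf<wchar_t> >::value,
              "wstreambuf is basic_streambuf<wchar_t>");
```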

2.1.3 Text versus binary

A function that converts an arbitrary type into a series of characters is called a serializer.  If this is done in order to store an object of that type on disk, for instance, then no information should be lost, although that is not a requirement.  Later the object can be restored by reading the series of characters back in and translating it to its original type.  Such a series can be text or binary.  From a technical point of view there is no big difference between the two, but a text representation is usually human-readable and usually does not contain the character '\0'.  Moreover, a binary representation will in general use far fewer bytes than a text representation.
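To make the difference concrete, here is a small sketch that serializes the same 32-bit integer both ways.  The function names are hypothetical and are not part of libcw; the binary form simply copies the object bytes and therefore depends on the byte order of the host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// Text representation: readable digits, length depends on the value.
std::string serialize_text(std::int32_t value)
{
  std::ostringstream os;
  os << value;
  return os.str();
}

// Binary representation: the raw object bytes, always sizeof(value) long.
// It may contain '\0' and depends on the host byte order.
std::string serialize_binary(std::int32_t value)
{
  char bytes[sizeof value];
  std::memcpy(bytes, &value, sizeof value);
  return std::string(bytes, sizeof value);
}

// Restore the original value from its binary representation.
std::int32_t deserialize_binary(std::string const& bytes)
{
  std::int32_t value = 0;
  assert(bytes.size() == sizeof value);   // Precondition: exactly one object.
  std::memcpy(&value, bytes.data(), sizeof value);
  return value;
}
```

For the value 1234567890 the text form needs ten characters, while the binary form always needs four.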

2.1.4 Inserters: text representation

The inserter function ostream & operator<<(ostream&, type) is a serializer: It converts type into a series of characters and these characters are written to a streambuf.  The role of ostream is to provide a hook to the streambuf class and to store some information about how to serialize internal types.  All predefined ostream& operator<< inserters for the internal types convert into a (readable) text representation. As a programmer you seldom write to the streambuf yourself: Instead you convert your objects into a representation with internal types and use existing inserters to write to the streambuf object.  As a result, the inserter functions << usually serialize into a text representation.  The functions istream& operator>> read from a streambuf and should obviously expect and understand this text representation.
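As an illustration, the sketch below defines an inserter and a matching extractor for a made-up type.  The type point_ct and the helper to_text are hypothetical; the point is only that the inserter never touches the streambuf itself but delegates to the predefined inserters.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user type, used only for this illustration.
struct point_ct {
  int x;
  int y;
};

// Break the object down into internal types and rely on their inserters.
std::ostream& operator<<(std::ostream& os, point_ct const& p)
{
  return os << '(' << p.x << ", " << p.y << ')';
}

// The extractor expects exactly the text representation written above.
std::istream& operator>>(std::istream& is, point_ct& p)
{
  char open = 0, comma = 0, close = 0;
  point_ct tmp = { 0, 0 };
  if (is >> open >> tmp.x >> comma >> tmp.y >> close)
  {
    if (open == '(' && comma == ',' && close == ')')
      p = tmp;
    else
      is.setstate(std::ios_base::failbit);   // Not the expected format.
  }
  return is;
}

// Convenience helper: the text representation as a string.
std::string to_text(point_ct const& p)
{
  std::ostringstream os;
  os << p;
  return os.str();
}
```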

2.1.5 The streambuf interface

The interface of a streambuf for reading and writing is purely binary.  A streambuf has functions to read from the get area and functions to write to the put area.  An area is a contiguous block in memory.  Because access goes through areas, the buffer behind them does not have to be contiguous: it can consist of multiple memory blocks, or it could be circular, for instance.
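The following sketch shows the two areas in the simplest possible setting: a class derived from the standard streambuf that uses one fixed memory block for both.  The class name and the block size are assumptions made for this example; it is not how dbstreambuf_ct is implemented.

```cpp
#include <cstddef>
#include <streambuf>

// Hypothetical minimal streambuf over one fixed memory block.
class fixed_streambuf_ct : public std::streambuf {
  char M_block[64];
public:
  fixed_streambuf_ct()
  {
    setp(M_block, M_block + sizeof M_block);  // Put area: the whole block.
    setg(M_block, M_block, M_block);          // Get area: empty for now.
  }

  // Number of bytes written but not yet read.
  std::size_t unread() const { return pptr() - gptr(); }

protected:
  // Called when the get area is exhausted: extend it up to the put pointer.
  virtual int_type underflow()
  {
    if (gptr() == pptr())
      return traits_type::eof();              // Nothing new was written.
    setg(eback(), gptr(), pptr());
    return traits_type::to_int_type(*gptr());
  }
};
```

Writing through sputn() moves the put pointer; reading through sgetc() or sgetn() then finds the new data once underflow() has extended the get area.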

2.2 Interface design of dbstreambuf_ct

The normal streambuf object represents a single buffer.  This is not a problem when you are writing to a device that is write-only (an ostream) or reading from a device that is read-only (an istream), see figure 1.


Figure 1. The interfaces of a single streambuf object as needed for ostream and istream objects.  For instance, for a write-only device the read interface would be used by the device while the write interface would be used by ostream& operator<< functions.

A problem occurs when you use an iostream.  An iostream is derived, using multiple inheritance, from ostream and istream, which share the virtual base class ios that holds the pointer to the streambuf.  As a result, an iostream uses the same buffer for input and output!  For a device like a non-blocking socket this is disastrous.  This is one of the reasons that libcw needs to define its own streambuf class: we need two buffers per iostream object.
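The sharing can be observed with any standard iostream.  The sketch below uses std::stringstream from the standard library (not a libcw class); the two helper functions are hypothetical names chosen for this example.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Returns true if the istream and ostream halves of ios use one streambuf.
bool shares_one_streambuf(std::iostream& ios)
{
  std::istream& in = ios;
  std::ostream& out = ios;
  return in.rdbuf() == out.rdbuf();
}

// What is written through the ostream half comes straight back through the
// istream half, because both operate on the same buffer.
std::string write_then_read(std::string const& word)
{
  std::stringstream ss;
  ss << word;
  std::string result;
  ss >> result;
  return result;
}
```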

Given the facts that

an iostream points to a single streambuf,

a single streambuf has one interface for writing and one interface for reading,

and an iostream needs two buffers, one for reading and one for writing.
we conclude:

One streambuf contains one buffer in order to keep a balance between the number of buffers and number of read/write interfaces.

The put area and get area of the streambuf that an iostream points to need to belong to different buffers.

And from that we finally conclude that an iostream needs two streambuf objects with crosslinked put/get area pointers.  This is shown in figure 2.


Figure 2. The crosslinked interfaces of two streambuf objects as needed for iostream objects.  The interface of streambuf 1 is used by the operators << and >> while the interface of streambuf 2 is used by a device.

2.3 Buffer design of dbstreambuf_ct

The internal buffer is a deque of allocated memory blocks.  Data is written to the last memory block in the deque, until the end of the block is reached.  At that moment a new block is allocated and inserted at the end of the deque.  Note that if the buffer becomes empty the put area is reset to the beginning of the current block; so if reading keeps pace with writing then allocation of new blocks is not necessary.

When a new block needs to be allocated, its size is calculated as the number of unread bytes stored in the buffer at that moment, rounded up to the nearest typical malloc size.  This rounding ensures that no memory is wasted.
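The size calculation could look like the sketch below.  What counts as a "typical malloc size" is not specified here, so the rule used (a power of two minus a fixed allocator overhead) and the constant are assumptions made purely for illustration; the function names are hypothetical as well.

```cpp
#include <cstddef>

// ASSUMPTION: a "typical malloc size" is a power of two minus a small
// allocator overhead.  Rule and constant are illustrative, not from libcw.
std::size_t const malloc_overhead_c = 16;

// Smallest size of the form 2^k - malloc_overhead_c that is >= requested.
std::size_t round_up_to_malloc_size(std::size_t requested)
{
  std::size_t power = 32;
  while (power - malloc_overhead_c < requested)
  {
    if (power > static_cast<std::size_t>(-1) / 2)
      return requested;                 // Cannot round up any further.
    power *= 2;
  }
  return power - malloc_overhead_c;
}

// Size of the next block: the unread byte count, but never less than the
// minimum block size, rounded up as described above.
std::size_t next_block_size(std::size_t unread_bytes, std::size_t minimum_blocksize)
{
  std::size_t wanted = unread_bytes > minimum_blocksize ? unread_bytes : minimum_blocksize;
  return round_up_to_malloc_size(wanted);
}
```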

Data is never moved in memory.  This makes dbstreambuf_ct as fast as possible: All data is written at most once and reading is done by providing a pointer that points directly into the buffer.  The only exception is when a message is read from the buffer that crosses a block boundary.  In that case a temporary block is allocated and the message is copied to this block in order to make it contiguous.  Note that by specifying a minimum block size that is significantly larger than the average message length that will be read, the extra overhead because of this will be negligible.
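The behaviour described in this section can be modelled with a toy class.  This is a sketch under simplifying assumptions, not the libcw implementation: every block has the same fixed size, writing copies one byte at a time, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical toy model of the block deque.
class block_deque_ct {
  std::deque<std::vector<char> > M_blocks;  // Allocated memory blocks.
  std::size_t M_block_size;                 // Fixed block size (simplification).
  std::size_t M_get;                        // Read offset in the first block.
  std::size_t M_put;                        // Write offset in the last block.
public:
  explicit block_deque_ct(std::size_t block_size)   // block_size must be > 0.
    : M_blocks(1, std::vector<char>(block_size)),
      M_block_size(block_size), M_get(0), M_put(0) { }

  std::size_t blocks() const { return M_blocks.size(); }
  std::size_t used_size() const
    { return (M_blocks.size() - 1) * M_block_size + M_put - M_get; }

  // Append data to the last block; allocate a new block when it is full.
  void write(char const* s, std::size_t n)
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      if (M_put == M_block_size)
      {
        M_blocks.push_back(std::vector<char>(M_block_size));
        M_put = 0;
      }
      M_blocks.back()[M_put++] = s[i];
    }
  }

  // True if the next len unread bytes lie within the first block.
  bool is_contiguous(std::size_t len) const
  {
    std::size_t end = M_blocks.size() == 1 ? M_put : M_block_size;
    return len <= end - M_get;
  }

  // Pointer straight into the buffer: reading does not copy.
  char const* get_ptr() const { return &M_blocks.front()[M_get]; }

  // Consume n bytes of the first block (must not cross a block boundary).
  void bump(std::size_t n)
  {
    assert(is_contiguous(n));
    M_get += n;
    if (M_blocks.size() > 1 && M_get == M_block_size)
    {
      M_blocks.pop_front();                 // First block fully read.
      M_get = 0;
    }
    if (M_blocks.size() == 1 && M_get == M_put)
      M_get = M_put = 0;                    // Empty: reuse the current block.
  }
};
```

With a block size of 8, writing six bytes, reading them and writing six more still uses a single block; only when unread data piles up past the end of the block is a second one allocated.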

2.4 Input, Output and Link buffer

The reason that the same class is used for both input and output buffering is that it is easier to maintain: There is no duplicated code.  Having the same interface for input and output buffers turns out to be very confusing however.  Another disadvantage of using the same type for input and output buffers is that they can be confused by accident.  Therefore all methods except the accessors are protected and two new classes are derived from dbstreambuf_ct that implement a public and more intuitive input and output interface.

For a correct understanding it is important to realize that the difference between an input and output buffer is purely semantic: There is no real difference between an input buffer and an output buffer.  Only the interface has been restricted in order to make the use in its own class less error prone.

There is a third class of buffers that is needed however.  While an input buffer allows reading from a device, it cannot write to a device (and vice versa for an output buffer).  Sometimes two devices need to be linked: They need to share the same buffer, one device will write to this buffer while the other device reads from that same buffer.  We will call this type of buffer interface a link buffer.

2.4.1 input_buffer_ct

The input_buffer_ct is used to buffer data that is read from a device.  This means that apart from the normal access to the buffer via the streambuf virtual functions and istream& operator>>(), we need an interface for raw read access and an interface that allows the device to write its data into the buffer.

2.4.1.1 Constructor

input_buffer_ct::input_buffer_ct (size_t minimum_blocksize = default_input_buffer_blocksize_c,
                                  size_t buffer_full_watermark = (size_t)-1,
                                  size_t max_alloc = (size_t)-1)

This constructs a dynamic blocks input stream buffer.  The minimum number of allocated bytes for one block of the buffer is minimum_blocksize.  The maximum possible number of total allocated bytes of all blocks together is max_alloc.  When this value is reached, overflow() will return EOF.  The accessor buffer_full() returns true when the number of buffered bytes in the input buffer exceeds buffer_full_watermark.

A new buffer always needs to be allocated with operator new.

2.4.1.2 Binary read access

char* input_buffer_ct::raw_gptr(void) const

Calls dbstreambuf_ct::igptr(): Returns a pointer to the start of the get area of the input buffer.

void input_buffer_ct::raw_gbump(int n)

Calls dbstreambuf_ct::igbump(n): Advances the get area pointer of the input buffer by n.  No error checking is done.

size_t input_buffer_ct::raw_sgetn(char* s, size_t n)

Returns dbstreambuf_ct::ixsgetn(s, n): Copies at most n characters from the input buffer to the character array s though not more than the number of characters that are in the buffer at that moment.  Returns the number of characters actually copied.

These three functions are typically called from decode_input_ct::new_msg_received(size_t len, bool contiguous) or a reimplemented version of that virtual function.

For example,

void my_input_ct::new_msg_received(size_t message_length, bool contiguous)
{
  if (contiguous)
  {
    char* get_pointer = ibuffer->raw_gptr();
    ibuffer->raw_gbump(message_length);
    decode(get_pointer, message_length);
  }
  else
  {
    char* tmp_msg_buf = new char[message_length];
    AllocTag(tmp_msg_buf, "Buffer to make received message contiguous before decoding it.");
    ibuffer->raw_sgetn(tmp_msg_buf, message_length);
    decode(tmp_msg_buf, message_length);
    delete [] tmp_msg_buf;
  }
}

2.4.1.3 The device interface

size_t input_buffer_ct::dev2buf_contiguous(void) const

Returns the number of contiguous characters that can be written directly to memory at the position returned by dev2buf_ptr().  If this function returns zero, then call dev2buf_contiguous_forced().

size_t input_buffer_ct::dev2buf_contiguous_forced(void)

Returns the number of contiguous characters that can be written directly to memory at the position returned by dev2buf_ptr().  This function does not return zero unless the buffer is full or out of memory.

char* input_buffer_ct::dev2buf_ptr(void) const

Returns streambuf::pptr(): Returns a pointer to the start of the put area of the input buffer.

void input_buffer_ct::dev2buf_bump(int n)

Calls streambuf::pbump(n): Advances the put area pointer of the input buffer by n.  No error checking is done.

These four functions are typically called from read_input_ct::read_from_fd(void) or a reimplemented version of that virtual function.

For example,

void my_input_ct::read_from_fd(void)
{
  size_t contiguous_length;
  ssize_t read_length;		// ::read() returns ssize_t; -1 signals an error.

  if ((contiguous_length = ibuffer->dev2buf_contiguous()) == 0
      && (contiguous_length = ibuffer->dev2buf_contiguous_forced()) == 0)
  {
    reset_need_read();
    return;
  }
  char* new_data = ibuffer->dev2buf_ptr();
  if ((read_length = ::read(fd(), new_data, contiguous_length)) <= 0) { ... }
  ibuffer->dev2buf_bump(read_length);
  // ...
}

2.4.1.4 Reducing the buffer size

void input_buffer_ct::reduce_buf_if_empty(void)

If the buffer is empty, reduces the buffer to a single block with the minimum blocksize as specified during construction of the buffer.  This function is also typically called from read_input_ct::read_from_fd(void).

2.4.2 output_buffer_ct

The output_buffer_ct is used to buffer data before it is written to a device.  This means that apart from the normal access to the buffer via the streambuf virtual functions and ostream& operator<<(), we need an interface for raw write access and an interface that allows the device to read its data from the buffer.

2.4.2.1 Constructor

output_buffer_ct::output_buffer_ct (size_t minimum_blocksize = default_output_buffer_blocksize_c,
                                    size_t buffer_full_watermark = (size_t)-1,
                                    size_t max_alloc = (size_t)-1)

This constructs a dynamic blocks output stream buffer.  The minimum number of allocated bytes for one block of the buffer is minimum_blocksize.  The maximum possible number of total allocated bytes of all blocks together is max_alloc.  When this value is reached, overflow() will return EOF.  The accessor buffer_full() returns true when the number of buffered bytes in the output buffer exceeds buffer_full_watermark.

A new buffer always needs to be allocated with operator new.

2.4.2.2 Binary write access

char* output_buffer_ct::raw_pptr(void) const

Returns streambuf::pptr(): Returns a pointer to the start of the put area of the output buffer.

void output_buffer_ct::raw_pbump(int n)

Calls streambuf::pbump(n): Advances the put area pointer of the output buffer by n.  No error checking is done.

size_t output_buffer_ct::raw_sputn(char* s, size_t n)

Returns the result of the virtual function streambuf::xsputn(s, n): Copies n characters from the character array s to the buffer.  Returns the number of characters actually copied (always n, unless we ran out of memory).

These three functions are hardly ever used; it makes much more sense to use operator<<, which accesses the buffer via the virtual functions of class streambuf.

2.4.2.3 The device interface

size_t output_buffer_ct::buf2dev_contiguous(void) const

Returns the number of bytes that can be read directly from memory at the position returned by buf2dev_ptr().  If this function returns zero, then call buf2dev_contiguous_forced().

size_t output_buffer_ct::buf2dev_contiguous_forced(void)

Returns the number of bytes that can be read directly from memory at the position returned by buf2dev_ptr().  Does not return 0 unless the buffer is empty.

char* output_buffer_ct::buf2dev_ptr(void) const

Calls dbstreambuf_ct::igptr(): Returns a pointer to the start of the get area of the output buffer.

void output_buffer_ct::buf2dev_bump(int n)

Calls dbstreambuf_ct::igbump(n): Advances the get area pointer of the output buffer by n.  No error checking is done.

These four functions are typically called from write_output_ct::write_to_fd(void) or a reimplemented version of that virtual function.

For example,

void my_output_ct::write_to_fd(void)
{
  size_t contiguous_length;
  ssize_t written_length;	// ::write() returns ssize_t; -1 signals an error.
  
  if ((contiguous_length = obuffer->buf2dev_contiguous()) == 0
      && (contiguous_length = obuffer->buf2dev_contiguous_forced()) == 0)
  {
    reset_need_write();
    return;
  }
  if ((written_length = ::write(fd(), obuffer->buf2dev_ptr(), contiguous_length)) < 0) { ... }
  obuffer->buf2dev_bump(written_length);
}

2.4.3 link_buffer_ct

The link_buffer_ct is used to buffer data between two devices.  This means that we need an interface that allows a device to write data to the buffer but also an interface that allows a device to read from the buffer.

2.4.3.1 Constructor

link_buffer_ct::link_buffer_ct (size_t minimum_blocksize = default_output_buffer_blocksize_c,
                                size_t buffer_full_watermark = (size_t)-1,
                                size_t max_alloc = (size_t)-1)

This constructs a dynamic blocks link buffer.  The minimum number of allocated bytes for one block of the buffer is minimum_blocksize.  The maximum possible number of total allocated bytes of all blocks together is max_alloc.  When this value is reached, overflow() will return EOF.  The accessor buffer_full() returns true when the number of buffered bytes in the link buffer exceeds buffer_full_watermark.

A new buffer always needs to be allocated with operator new.

2.4.3.2 The device interface

The interface of a link buffer uses the same names for its write interface as the device interface of the input buffer and the same names for its read interface as the device interface of the output buffer.  Please see the respective sections above for details.

2.5 Observer functions

2.5.1 Sizes

size_t minimum_block_size(void) const

Returns the true minimum block size in bytes.  This value is not necessarily equal to the value passed to the constructor because its size is optimized for allocation with malloc.

size_t used_size(void) const

Returns the current number of valid bytes in the buffer.  Used in read_input_ct::read_from_fd(void).

2.5.2 Internal buffer state

bool buffer_empty() const

Returns true if the buffer is empty.  Returned by write_output_ct::writebuf_is_really_empty(void) const.

bool buffer_full() const

This function returns true when used_size() is larger than the buffer_full_watermark passed to the constructor.  This doesn't mean that writing to the buffer will fail yet, but it should be used for flow control.  Used in read_input_ct::read_from_fd(void).

bool has_multiple_blocks(void) const

Returns true when this buffer currently has more than one block allocated.  This can be used to speed up read/write access methods.  Used in read_input_ct::read_from_fd(void).

bool is_contiguous(size_t len) const

Returns true if there is a contiguous string with length len in the current get area of the buffer.  Used in read_input_ct::read_from_fd(void).
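The watermark test can be illustrated with a stand-in that models nothing but the two numbers involved.  The struct and the helper function are hypothetical and only mirror the documented rule that buffer_full() is true when used_size() is strictly larger than the watermark.

```cpp
#include <cstddef>

// Hypothetical stand-in that models only the watermark of a buffer.
struct watermark_ct {
  std::size_t used_size;              // Bytes currently buffered.
  std::size_t buffer_full_watermark;  // Soft limit, used for flow control.

  // True when strictly more bytes are buffered than the watermark allows.
  bool buffer_full() const { return used_size > buffer_full_watermark; }
};

// A device would stop asking for more input once the watermark is passed.
bool should_keep_reading(watermark_ct const& buffer)
{
  return !buffer.buffer_full();
}
```

With the default watermark of (size_t)-1, buffer_full() can never become true, so flow control is effectively switched off.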

2.5.3 Debug function

When the macro DEBUGMARKER is defined, then the following method becomes available:

void move_outside(debugmalloc_marker_ct* marker) const

Moves the allocations of this buffer outside memory leak detection marker marker.

2.6 Flushing and deleting the buffer

2.6.1 Flushing

Class streambuf uses the virtual function sync() to signal the need for flushing the buffer.  This means that sync() should not return until the buffer is written completely to the associated output device.  While it is obviously a bad idea to use blocking at all in applications that libcw is aiming at (heavy duty network servers), this synchronizing is supported nevertheless; mainly for debugging output.

When dbstreambuf_ct::sync() is called, the buffer needs to call fd_dct::sync(), a member function of the output device.  Therefore, the buffer needs to know to which output device it is linked, if any.  It would be impractical to pass a pointer to this device object as part of the constructor of the buffer: That would demand that the output device is created first, which has consequences that affect the application layer.  Instead, the device pointer can be set with a separate function, which can be hidden from the application layer.  This function is called in another part of libcw whenever a buffer is assigned to a device.

2.6.2 Deleting

When a buffer isn't used anymore, there is only one logical thing to do: delete it.  A buffer can be used by one device (the input and output buffers) or two devices (the link buffer).  Some sort of garbage collection mechanism is needed to determine when the buffer really isn't used anymore.  Therefore devices need to register with the buffer and release the buffer again when they are done with it (mostly when the device itself is deleted).

Registration with a buffer is done with the same function call that is used to pass the device pointer to the buffer; but this time we also need such a function for input devices! For the sake of symmetry this function also takes a device pointer.  See §2.6.3 for a definition of these two function calls.

Releasing a buffer is done with the following method:

bool dbstreambuf_ct::release(fd_dct* device)

Deletes the buffer when this was the last device using it.  Returns true when the buffer was actually deleted.

2.6.3 Setting the device pointers

The device pointers can be set by calling either,

void dbstreambuf_ct::set_input_device(fd_dct* device)

Sets the input device that uses this buffer.  Note that there can be only one input device per buffer; calling this function more than once results in undefined behavior.  At this moment, device isn't used, but that might change in the future.

or,

void dbstreambuf_ct::set_output_device(fd_dct* device)

Sets the output device that uses this buffer.  The use of streambuf::sync() will cause device->sync() to be called.  Note that there can be only one output device per buffer, calling this function more than once results in undefined behavior.
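The interplay between registration and release can be sketched as follows.  All types here are hypothetical stand-ins, and the bookkeeping (remembering both device pointers and deleting the buffer once neither is left) is an assumption about how such a scheme could work, not a description of the libcw internals.

```cpp
// Hypothetical stand-in for a device; only its address matters here.
struct device_ct { };

class counted_buffer_ct {
  device_ct* M_input_device;
  device_ct* M_output_device;
  ~counted_buffer_ct() { ++S_deleted; }   // Private: only release() deletes.
public:
  static int S_deleted;                   // Counts deletions, for demonstration.

  counted_buffer_ct() : M_input_device(0), M_output_device(0) { }

  void set_input_device(device_ct* device) { M_input_device = device; }
  void set_output_device(device_ct* device) { M_output_device = device; }

  // Unregister device; delete the buffer when no device uses it anymore.
  // Returns true when the buffer was actually deleted.
  bool release(device_ct* device)
  {
    if (M_input_device == device) M_input_device = 0;
    if (M_output_device == device) M_output_device = 0;
    if (M_input_device || M_output_device)
      return false;
    delete this;
    return true;
  }
};

int counted_buffer_ct::S_deleted = 0;
```

Because the destructor is private, such a buffer can only be created with operator new, and it disappears exactly when the last registered device releases it.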

2.6.4 Avoid blocking

Because sync() is a blocking function, you don't want to use it.  In order to avoid it, you should not use flush and endl, nor call ostream::flush().

For example, instead of doing

myDevice << "Hello World" << endl;	// Write a line with a new-line (and flush it)
myDevice.flush();			// Flush the ostream (?!)
myDevice.del();				// Delete the device object

you should do something like

myDevice << "Hello World\n";		// Write a line with a new-line
myDevice.del();				// Causes the device object to be deleted
					// after everything is written out.
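That endl and flush reach sync() while a plain "\n" does not can be verified with a streambuf that counts the calls.  The sketch below uses only the standard library; the class and function names are hypothetical and the streambuf simply discards everything written to it.

```cpp
#include <ostream>
#include <streambuf>

// Hypothetical streambuf that discards its data and counts sync() calls.
class sync_counter_ct : public std::streambuf {
  int M_syncs;
public:
  sync_counter_ct() : M_syncs(0) { }
  int syncs() const { return M_syncs; }
protected:
  // Accept and drop every character.
  virtual int_type overflow(int_type c) { return traits_type::not_eof(c); }
  // Called by ostream::flush(), and therefore by flush and endl.
  virtual int sync() { ++M_syncs; return 0; }
};

// Number of sync() calls caused by writing one line with or without endl.
int syncs_for_line(bool use_endl)
{
  sync_counter_ct buf;
  std::ostream os(&buf);
  if (use_endl)
    os << "Hello World" << std::endl;
  else
    os << "Hello World\n";
  return buf.syncs();
}
```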

Copyright © 1999 Carlo Wood.  All rights reserved.