Have you ever heard of Coroutines? I haven’t until a few months ago when digibob (from SplashDamage) mentioned them and explained to me what they are. Instead of pasting his explanation from my IRC logs, I’ll instead quote Wikipedia:

In computer science, coroutines are program components that generalize subroutines to allow multiple entry points and suspending and resuming of execution at certain locations.

[…]

Coroutines are more generic than subroutines. The start of a subroutine is the only point of entry; the start of a coroutine is the first point of entry, and places within a coroutine following returns (yields) are subsequent points of entry. Subroutines can return only once; in contrast, coroutines can return (yield) several times. The lifespan of subroutines is dictated by last in, first out (the last subroutine called is the first to return); in contrast, the lifespan of coroutines is dictated entirely by their use and need.

(from http://en.wikipedia.org/wiki/Coroutine)

Let’s continue with an (pseudo-code) example:

coroutine unsigned coFibbonacci() {
    unsigned a, b;
    a = 0;
    b = 1;
    while( 1 ) {
        unsigned c;

        YIELD( b );
        c = a;
        a = b;
        b += c;
    }
}

Calling coFibbonacci repeatedly would return the Fibonacci series. A more advanced example (which I won’t paste here) would possibly consist of game scripts that control scripted sequences and always yield when waiting for operations to be performed, so that the control flow will look natural while basically being event-based. Usually one uses state machines in languages that don’t support coroutines but an implementation using coroutines would be a lot more elegant.

Many scripting languages have coroutine support,  like e.g. Python or Lua, while coroutines  are widely unsupported in most other “normal” languages (notable exception .NET). I have read some of the Python coroutine/generator specs in more detail to figure out some implementation details and the example code is also very neat to see the advantages of coroutines when traversing trees (for example) . In particular I’ve read PEP 255 - Simple Generators and PEP 342 - Coroutines via Enhanced Generators.

Seeing all the advantages I wondered whether and how difficult it would be to implement basic coroutine support in C/C++. The answer depends on several factors. First of all it is possible to write an implementation that doesn’t use evil setjmps or asm and that makes use of some C/C++ features in clever ways. My pure C/C++ implementation makes use of Duff’s Device to implement multiple entry points and static variables to save the “locals” between calls.

It’s pretty small and simple:

#define YIELD( value )
   do {
      yieldIndex = __COUNTER__;
      return (value);
   case __COUNTER__ - 1:
      ;
   } while( 0 )

#define RETURN( value )
   do {
      yieldIndex = -1;
      return (value);
   } while( 0 )

#define COROUTINE( returnType, name )
   returnType name() {
       static int yieldIndex = -1;
       switch( yieldIndex ) {
          case -1:

#define ENDCOROUTINE
         yieldIndex = -1;
      }
   }

PS: it should actually work with __LINE__ instead of __COUNTER__ and __COUNTER__ - 1 but VC somehow doesn’t like it
PSS: the backslashes in the macro definitions were removed by Wordpress

Recently my interest for coroutines has been renewed and I created a macro-based implementation that would allow multiple “generators” to be created and run in parallel. For this the stackframe of the coroutine had to be saved and restored later and normal locals instead of statics had to be used.

The main issue with this implementation is that Duff’s Device couldn’t be used anymore since VC will correctly warn (and error out) that it is told to jump over local variable declarations - which of course would be very bad normally.

Thus I had to write custom __asm inline code to perform the jumps. Moreover when yielding the code has to jump over local deconstructors to avoid early clean-ups of the stackframe.

This version is a lot bigger and because of the use of the __asm keyword not very portable (although the asm code itself is very primitive and should run on any processor):

#define COROUTINE( returnType, name, paramList ) class name {
   size_t yieldOffset;
   void *stackCopy;
   size_t stackSize;

public:
   name() : stackCopy( 0 ), stackSize( 0 ), yieldOffset( 0 ) {
   }

   ~name() {
      /* this is something bad actually - it should always finish but oh well */
      if( stackCopy ) {
         delete stackCopy;
      }
   }

   returnType operator() paramList {
      size_t __yieldExitOffset;
      __asm { mov __yieldExitOffset, offset yieldExit_##name }
      returnType __yieldReturnValue;
      char __frameStart;

      if( stackCopy ) {
         /* restore the stack */
         memcpy( &__frameStart - stackSize, stackCopy, stackSize );
         size_t localYieldOffset = yieldOffset;
         __asm { jmp localYieldOffset }
      }

#define CONCAT( a, b ) a ## b
#define LABEL( line ) CONCAT( __label__, line )

#define YIELD( value ) do {
         char __frameEnd;
         /* copy the stack frame */
         if( stackCopy ) {
            delete stackCopy;
         }
         stackSize = &__frameStart - &__frameEnd;
         stackCopy = new char[ stackSize ];
         memcpy( stackCopy, &__frameEnd, stackSize );

         __yieldReturnValue = (value);
         size_t localYieldOffset;
         __asm { mov localYieldOffset, offset LABEL( __LINE__ ) }
         yieldOffset = localYieldOffset;
         __asm { jmp __yieldExitOffset }
LABEL( __LINE__ ):
         ;
         } while( 0 )

#define RETURN( value )
   do {
      /* remove the stack copy and clean up */
      if( stackCopy ) {
         delete stackCopy;
         stackCopy = 0;
         stackSize = 0;
      }

      yieldOffset = 0;
      return (value);
   } while( 0 )

#define ENDCOROUTINE( name )
yieldExit_##name:
      return __yieldReturnValue;
   } /* function done */
}; /* class done */

PSS: again backslashes were removed by Wordpress

If you’re interested in some example code, here’s a zip with my VC 7.1 project.

The only known and unfixable bug in the latest implementation is that it doesn’t adapt pointers to local variables between yields (but for this real compiler-support would be needed).

Greetings,
     Black