Home » Intel » Bubble, Bubble, Toil and Trouble; Mutex Lock and Buffer Double

Bubble, Bubble, Toil and Trouble; Mutex Lock and Buffer Double


An issue that I’ve been seeing lately inside genomic codes stems from the I/O required. Prior to that you may start processing them, huge knowledge recordsdata of DNA reads must be learn into reminiscence and transformed to a structure to be able to be utilized by the computation, e.g., compressing the byte-sized characters (“A”, “C”, “T”, or “G”) to a 2-bit identical. The pure option to code up this processing is to enter a learn, name some conversion perform, and retailer the outcome. Repeat unless all information were enter.

Whereas it’s recurrently prevalent data that I/O is usually a bottleneck to efficiency, whatever the relative velocity between the enter of a DNA learn and the preliminary processing/storing of that enter, the above serial execution is leaving some simple parallelism on the desk. Clearly, the conversion processing of each and every learn is probably going impartial of the conversion of any other learn, however, extra to my level, the enter of a brand new learn is impartial of the conversion of the earlier learn. Overlapping computation and I/O by using having a separate thread doing every of the 2 duties is an effective way to lift efficiency when it may be carried out.

On the other hand, to maintain the implications of the enter and conversion appropriate I will be able to’t be overwriting the earlier unconverted learn with the aid of a brand new learn (like stabbing a perfectly usable King Duncan in his sleep). Nor do I want to re-process an already converted read (like raging at an empty chair as if Banquo’s ghost was sitting there). To this end, I can declare and use a second input buffer. While the I/O thread is filling up one buffer with the next read, the conversion thread can be working on processing the previous read. In this post I want to sketch out one way you can coordinate the two threads without having them overwrite good data or pull out stale data from the two shared buffers.

You may have realized that the cooperation between the input and conversion threads is an example of the Producer/Consumer pattern. The (pseudo-) code below is an implementation of Producer/Consumer with a shared double buffer structure used to pass items from the Producer thread to the consumer thread. I’ve implemented it with Pthreads to use a condition variable for control of access to the two buffers. First, declarations for the buffers, some index pointers, and the synchronization objects, all of which are visible to both producer and consumer threads.

        #define BUFSIZE 3

        dnaReadType buffer[BUFSIZE];

        int in=1;   // index to store next element

        int out=0;  // index of last removed element

        pthread_mutex_t b_lock=PTHREAD_MUTEX_INITIALIZER;

        pthread_cond_t b_cond=PTHREAD_COND_INITIALIZER;


I’ve allocated an array of three items to hold whatever data or structure, called dnaReadType, I’m inputting from a file. This could just as easily be an array of pointers to allocated memory if the input isn’t a known or constant size. (This latter idea is more in keeping with multiple data buffers. The single array buffer is then simply syntactic sugar to quickly reference one or the other and to easily switch from one data buffer to another.) Either way, the consumer (conversion) thread will be accessing the data from one element while the producer (input) thread fills up another. The in index will refer to the element that is next to receive new data from the producer and the out index will reference the element that last held a read for conversion by the consumer. The mutex, b_lock, and condition variable, b_cond, will be used to control access to the buffer elements and to protect updates to in and out.

The scheme and code I describe below requires that the number of items in buffer (given as BUFSIZE) be one more than the number of data buffers I want to use. This makes the test for a full or empty set of data buffers easier to code, as I’ll explain below.

If the data buffers are full (i.e., two buffer elements contain valid data awaiting conversion) the producer must wait until a slot opens up. Otherwise the new read can simply be put into the open position of buffer. The relevant code for the Producer is given here:

        while (more to produce)

           newRead = getNewRead(inputFile); // produce something


           while (out == in)                // buffer is full

              pthread_cond_wait(&b_cond, &b_lock);                                          

           buffer[in++]= newRead;           // store in buffer

           in %= BUFSIZE;





After inputting some new read from the input file, the mutex b_lock controls access to buffer and protects the reading and writing of the in and out index variables. The conditional expression checks the status of buffer. The buffers are full if out and in reference the same item in the array. This item is the extra element (which is not considered part of the usable buffer) that I included by making BUFSIZE equal to 3 for a set of two data buffers. While this array slot can (and will) hold data at some point, it is not considered a valid entry whenever two data buffers both contain unprocessed reads. Whenever buffer is full, the producer thread goes to “sleep” on the condition variable, b_cond.

At such time as the consumer signals that at least one of the data buffers is available, the producer thread wakes up, stores the new read (or performs the input of new data), increments the in index with a modulus on the BUFSIZE, signals the consumer thread (in case it is waiting on an empty buffer), and releases the mutex to allow the consumer to access buffer.

The consumer thread code would be implemented like this:

        while (more to consume)                                                            


            while (in == ((out+1) % BUFSIZE))  // buffer is empty



            out %= BUFSIZE;

            newRead = buffer[out];            // save data for processing



          ProcessRead(newRead);               // consume the data



As with the producer code, the mutex, b_lock, controls access to buffer and the update of the out and in index variables. For the consumer, if the data buffers are empty, processing of a new read must be blocked. The conditional expression tests for the empty buffer. When the out index is found to be one position “behind” the in position (modulo the BUFSIZE), then buffer has no valid read stored in the next out position.

Once the consumer thread knows that there is at least one element to be converted by either being signaled to wake up or by finding a non-empty buffer (and skipping the call to pthread_cond_wait), the out index is incremented and the read stored in buffer[out] is pulled out for conversion. Before the new read is processed, the consumer must signal the producer (which may be waiting on full data buffers) and then release the mutex.

With any parallel algorithm that requires synchronized coordination between two or more threads, you need to consider whether or not the code actually will work in all cases. Better yet, prove that it works in all cases. For my Producer/Consumer code with a single instance of each, the tricky part is whether or not the tests for full and empty buffer will pan out as expected. As a thought experiment, if you’re unsure about the scheme I’ve just discussed, see if you can convince yourself that the following cases will work as expected: 1) the consumer thread finding the initially empty buffer, 2) the producer finding the initial empty buffer, 3) due to a slow consumer thread, the producer fills up buffer and then has one more item to be stored, and 4) due to an initially quick, but now slow producer thread, the consumer initially encounters a full buffer and proceeds to process all data stored (and wants more) before a new read can be input.

One last thing to note is that the size of the buffer can be much larger than 3 elements. You can also use more consumer threads if the conversion processing time is much slower than the time to input a new read from the input file. Similarly, more than one producer can be added to the above code. The scenarios needed to prove that everything still works as it should become a bit more complex, but nothing that some simple logic and hand-waving can’t accomplish. I mean, it’s not like trying to get the Great Birnam Wood to come to Dunsinane so as to add to Macbeth’s lengthy record of issues.