Home » Intel » Migrating Fortran projects to the Intel(r) Xeon Phi(tm) coprocessor

Migrating Fortran projects to the Intel(r) Xeon Phi(tm) coprocessor


This text makes a speciality of factors of porting Fortran codes to the Intel® Xeon Phi™ coprocessor.  Lots of the documentation for the coprocessor is C/C++ centric. Right here the focal point is on Fortran, which continues to be the dominant language for scientific programming and for which a considerable amount of legacy code exits.

Native versus offload programming fashions

The choice as as to if code will have to be run natively on the coprocessor or the use of the offload programming adaptation is far the identical for Fortran as it’s for C/C++. Legacy Fortran code can normally be run natively on the coprocessor with out change. Then again, for the code to acquire height efficiency, it need to be each vectorized and multithreaded. 

If the code has huge segments of single threaded code interspersed with multithreaded code, one of the best efficiency may well be bought by way of the usage of the offload programming version. An exception to that is when the use of the offload adaptation would require huge arrays to be transferred again and again between the host and coprocessor. There’s no identical to the Intel® Cilk™Plus offload for Fortran. In Intel CilkPlus, handiest these information components whose values have modified might be transferred on the start and finish of an offload part; in Fortran all information listed on the offload directive will probably be transferred on the start and finish of an offload part until the NOCOPY, IN or OUT key phrases are used to limit the switch. On account of this, it is very important use the NOCOPY, IN and OUT choices when conceivable moderately than relying exclusively on the default conduct.


Alternatives for multithreading embrace OpenMP*, Pthreads* and MPI*, all of which can be additionally on hand in C/C++. OpenMP is the most well-liked threading adaptation in legacy Fortran codes, adopted through MPI then Pthreads.

Beginning with model 14.zero of the Intel Fortran compiler, coarrays and the DO CONCURRENT assemble are additionally on hand. Previous to 14.zero, coarrays have been to be had on the Intel Xeon processor, however no longer on the Intel Xeon Phi coprocessor. There are, then again, two restrictions right now: 1) the host running gadget should be Linux and a couple of) simplest the allotted reminiscence adaptation is on hand.

A coarray application can also be run natively on the coprocessor, run with a number of coarray photography on the host and others allotted throughout a number of coprocessors or run on the host with person photography offloading work to a number of coprocessor. The choice as to which variation to make use of is just like the choices made when operating an MPI code. Inserting a couple of coarray photography on a single coprocessor will increase reminiscence necessities however cuts down on verbal exchange prices. As with MPI, OpenMP can be utilized inside person pictures of a coarray software to extend the collection of threads operating on the coprocessor with a purpose to preserve the entire cores busy.


When marking a operate or subroutine or variable to be offloaded when that process is contained in a module, add the road:


instantly ahead of the FUNCTION or SUBROUTINE STATEMENT.

Within the process the place the module is used, it isn’t important so as to add every other attributes directive. The directive throughout the module is seen to the process which is the usage of the module. Whereas it’s not essential to take action, it may be to the programmer’s benefit to put associated capabilities in an effort to be offloaded right into a single module. In a similar fashion, offload attributes for variables throughout the module must be location on the high of the information part for the module and it’s not important so as to add the offload attribute statements for these variables to the process the place that module is used.

It’s also that you can think of to mark an entire module as having the offload attribute. To do that, situation the attribute announcement sooner than the use observation within the process the place the module is used. 

Widespread Blocks

When the use of in style blocks in offloaded code, simplest these variables required via the offloaded sections need to be marked with the offload attribute. If the widespread block is utilized in plenty of offloaded approaches, the attributes directive together with the well-liked block announcement can also be positioned in an embrace file utilized in every process. Then again, if totally different variables are utilized in each and every hobbies, particular person attributes directives can also be positioned instantly in every events and tailor-made to the desires of that process. For an in depth instance of the usage of in style blocks, see “leoF02_global_common.inc” and “leoF02_global.F90”.

Well-liked blocks on the coprocessor apply the Fortran usual. Subsequently, even if all variables in a typical block usually are not marked as having the offload attribute, area for the entire widespread block is allotted on the coprocessor. There may be, on the other hand, no mapping from the variable deal with on the host to the variable handle on the coprocessor for variables that don’t have the offload attribute. Should you attempt to use that variable title on the coprocessor, the compiler will warn you that the variable will not be marked for offload. Even though the compiler will permit you to use an equivalence commentary to get right of entry to that reminiscence, it is a very bad factor to do. Values saved on this house may not be transferred again to the host on the finish of the offload part. When you have been which means to head thru your code and dispose of the equivalence statements, this could be a just right time to take action.


In C/C++, pointers can’t be handed into and out of offload sections. Alternatively, Fortran pointers are a distinct factor. As a result of Fortran pointers are at all times robotically dereference every time they’re used, they may be able to be designated within the knowledge switch component to an offload remark. The pointer can be dereferenced and conformable house will probably be allotted on the coprocessor. 

However what occurs in case your pointer factors to a construction or a component of a derived kind that accommodates a pointer? The conduct on this case is altering with the 14.zero liberate of the compiler.

Asynchronous information transfers

When doing an asynchronous information switch, it is important to add SIGNAL and WAIT choices to the offload directives the place the information is referenced. The worth of the variable you’ll wait on can also be any distinctive integer price. With the intention to keep away from desiring to cross across the price for any explicit tag, it is suggested that the worth be set to the host’s deal with for the primary component of knowledge that was once handed. In Fortran the best way to do that is to make use of the LOC intrinsic. This isn’t a part of the Fortran same old however is recurrently to be had, together with within the Intel® Fortran Compiler.

The next code does asynchronous information transfers the use of the areas of the my_data and my_result to sign. It allocates area on the coprocessor for my_data and my_result and starts offevolved transferring the contents of my_data over to the coprocessor. Whereas that switch is taking place, that you can proceed to do some work on the host. When the code reaches the following offload directive that’s ready on that tag, it exams to make sure the preliminary switch has achieved. As quickly because the switch is finished, it offloads some work to the coprocessor. In the event you add a sign method to that offload directive, the host is free to do some extra work whereas it waits for any output from the offload part to be transferred again from the coprocessor. When the host is finished with its work, it waits unless the output has been transferred again to the host and frees the gap on the coprocessor.

SUBROUTINE async_example(my_data, my_result, cnt)
   INTEGER cnt
   INTEGER my_data[cnt], my_result[cnt]
   INTEGER(INT_PTR_KIND()) my_data_loc, my_result_loc
   my_data_loc = LOC(my_data)
   my_result_loc = LOC(my_result)
   !DIR$ OFFLOAD_TRANSFER TARGET(mic:zero)                                &
      IN(my_data : LENGTH(cnt) ALLOC_IF(.genuine.) FREE_IF(.false.))      &
      NOCOPY(my_result : LENGTH(cnt) ALLOC_IF(.genuine.) FREE_IF(.false.))&
    ...do some work on the host right here...
   !DIR$ OFFLOAD BEGIN TARGET(mic:zero) WAIT(my_data_loc) SIGNAL(my_result_loc) &
      NOCOPY(my_data : ALLOC_IF(.false.) FREE_IF(.false.))                     &
      OUT(my_result : ALLOC_IF(.false.) FREE_IF(.false.))
   ...do your offloaded work right here...
    ...do some work on the host right here...
   !DIR$ OFFLOAD_TRANSFER TARGET(mic:zero) WAIT(my_result_loc)                &
      NOCOPY(my_data, my_result : ALLOC_IF(.false.) FREE_IF(.genuine.))
END SUBROUTINE async_example


Vectorizable Features and Subroutines

The Intel Fortran Compiler gives a technique for writing vectorizable capabilities and subroutines. Often called SIMD-enabled capabilities and subroutines, they’re written the use of the vector attribute:

SUBROUTINE MY_SIMD_ROUTINE(dummy1, dummy2, and many others)

All these strategies can be utilized on the Intel Xeon Phi coprocessor in addition to the Intel Xeon processor.

For code which is run in offload mode, a vectorizable perform or subroutine will need to have the offload attribute, the identical as for every other process. 

When the usage of the vector attribute, it’s not vital so as to add the vector attribute directive to each the process definition and to the process name if each are in the identical file.  Alternatively, it’s not an error to take action. When the process definition and process name exist in separate recordsdata, the vector attribute directive have to be utilized in each areas. For the offload attribute, the offload attribute directive should at all times be particular each within the process definition and earlier than the process name. Due to this fact the next is really helpful when the usage of the vector attribute:

1) Add the offload attribute to the attributes directive:


If preferred, that you can additionally add the structure kind to the vector attribute. Should you accomplish that, you’ll want to specify the structure for each the processor and the coprocessor:


2) Location the directive right away after the SUBROUTINE or FUNCTION remark and within each and every process that calls this SUBROUTINE or FUNCTION.

Array Notation

So much of the documentation for array notation on the coprocessor makes use of Intel CilkPlus array notation. Fortran programmers will have to take coronary heart – the model of array notation in Fortran makes use of the Fortran usual array notation, now not the Intel CilkPlus array notation. Subsequently, within the offload commentary

 !dir$ offload commence in(foo(1:6:2))

 the which means of foo(1:6:2) is: from component 1 to part 6 with a stride of two, as a Fortran programmer would predict.