Pointers and Arrays

Shroud will create code to map between C and Fortran pointers. The interoperability with C features of Fortran 2003 and the call-by-reference feature of Fortran provide most of the features necessary to pass arrays to C++ libraries. Shroud can also provide additional semantic information. Adding the +rank(n) attribute will declare the argument as an assumed-shape array with the given rank: +rank(2) creates arg(:,:). The +dimension(n) attribute will instead give an explicit dimension: +dimension(10,20) creates arg(10,20).

Using dimension on intent(in) arguments will use the dimension shape in the Fortran wrapper instead of assumed-shape. This adds some additional safety since many compilers will warn if the actual argument is too small. This is useful when the C++ function has an assumed shape, for example when it expects a pointer to 16 elements. The Fortran wrapper will pass a pointer to contiguous memory with no explicit shape information.

When a function returns a pointer, the default behavior of Shroud is to convert it into a Fortran variable with the POINTER attribute using c_f_pointer. This can be made explicit by adding +deref(pointer) to the function declaration in the YAML file. For example, int *getData(void) +deref(pointer) creates the Fortran function interface

function get_data() result(rv)
    integer(C_INT), pointer :: rv
end function get_data

The result of the Fortran function directly accesses the memory returned from the C++ library.

An array can be returned by adding the attribute +dimension(n) to the function. The dimension expression will be used to provide the shape argument to c_f_pointer. The arguments to dimension are C++ expressions which are evaluated after the C++ function is called; each can be the name of another argument to the function or a call to another C++ function. As a simple example, this declaration returns a pointer to a constant-sized array.

- decl: int *returnIntPtrToFixedArray(void) +dimension(10)
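The evaluation order described above can be sketched in C++. This is only an illustration, not Shroud's generated code: getLen() is a hypothetical library function that a declaration such as +dimension(getLen()) could call, and ArrayResult and wrapReturnIntPtr are names invented for this sketch.

```cpp
#include <cstddef>

// Hypothetical library storage: a fixed-size array owned by the library.
static int fixed_array[10];

// Library function from the declaration above: returns a pointer to 10 ints.
int *returnIntPtrToFixedArray()
{
    for (int i = 0; i < 10; ++i) {
        fixed_array[i] = i + 1;
    }
    return fixed_array;
}

// Hypothetical library function that a dimension expression could call.
int getLen() { return 10; }

// Illustrative record of what a wrapper needs to build a Fortran pointer:
// the address and the extent that would become the shape for c_f_pointer.
struct ArrayResult {
    int *addr;
    std::size_t extent;
};

ArrayResult wrapReturnIntPtr()
{
    ArrayResult result;
    // 1. Call the library function first.
    result.addr = returnIntPtrToFixedArray();
    // 2. Only then evaluate the dimension expression.
    result.extent = static_cast<std::size_t>(getLen());
    return result;
}
```

Evaluating the expression after the call matters when the library only knows the size once the function has run.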

If the dimension is unknown when the function returns, a type(C_PTR) can be returned with +deref(raw). This will allow the user to call c_f_pointer once the shape is known. Instead of a Fortran pointer to a scalar, a scalar can be returned by adding +deref(scalar).

A common idiom for C++ is to return pointers to memory via arguments. This would be declared as int **arg +intent(out). By default, Shroud treats the argument like a function which returns a pointer: it adds the deref(pointer) attribute to treat it as a POINTER to a scalar. The dimension attribute can be used to create an array, just as for a function result. If the deref(allocatable) attribute is added, then a Fortran array will be allocated to the size given by the dimension attribute and the argument will be copied into the Fortran memory.
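For readers unfamiliar with the idiom, a library function of this kind might look like the following sketch. The names getValues, library_values, and count are hypothetical. Since a dimension expression can name another argument, count is the sort of argument a dimension attribute could refer to.

```cpp
// Hypothetical library data; the library keeps ownership of this array.
static int library_values[3] = {10, 20, 30};

// The library hands back a pointer through an argument (int **arg)
// and reports the number of elements through a second argument.
void getValues(int **arg, int *count)
{
    *arg = library_values;
    *count = 3;
}
```

A C++ caller would write int *p; int n; getValues(&p, &n); and then read p[0] through p[n-1] without copying.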

A function which returns multiple layers of indirection will return a type(C_PTR). This is also true for function arguments beyond int **arg +intent(out). This pointer can represent non-contiguous memory and Shroud has no way to know the extent of each pointer in the array.

A special case is provided for arrays of NULL terminated strings, char **. While this also represents non-contiguous memory, it is a common idiom and can be processed since the length of each string can be found with strlen. See example acceptCharArrayIn.
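The reason char ** is tractable can be shown with a short sketch. The helper name stringLengths is hypothetical and is not part of Shroud. It assumes the caller supplies the number of strings, since only the length of each individual string can be recovered with strlen.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: measure each NULL-terminated string in a char ** array.
// The number of strings must still be supplied by the caller.
std::vector<std::size_t> stringLengths(char **names, int nnames)
{
    std::vector<std::size_t> lengths;
    if (nnames > 0) {
        lengths.reserve(static_cast<std::size_t>(nnames));
    }
    for (int i = 0; i < nnames; ++i) {
        lengths.push_back(std::strlen(names[i]));
    }
    return lengths;
}
```

No comparable trick exists for int **, which is why that case falls back to type(C_PTR).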

In Python wrappers, Shroud will allocate intent(out) arguments before calling the function. This requires the dimension attribute, for example int **arg +intent(out)+dimension(n). The value of the dimension attribute defines the shape of the array and must be known before the library function is called. The dimension attribute can include the Fortran intrinsic size to define the shape in terms of another array. The argument will then be returned by the function along with the function result and other intent(out) arguments.

char * functions are treated differently. By default, the deref attribute will be set to allocatable. After the C++ function returns, a CHARACTER variable will be allocated and the contents copied. This will convert a NULL-terminated string into a Fortran variable of the proper length. For very long strings or strings with embedded NULL, deref(raw) will return a type(C_PTR).

void * functions return a type(C_PTR) argument and cannot have deref, dimension, or rank attributes. A type(C_PTR) argument will be passed by value. For a void ** argument, the type(C_PTR) will be passed by reference (the default). This will allow the C wrapper to assign a value to the argument. See example passVoidStarStar.

If the C++ library function can also provide the length of the pointer, then it's possible to return a Fortran POINTER or ALLOCATABLE variable. This allows the caller to directly use the returned value of the C++ function. However, there is a price: the user will have to release the memory if owner(caller) is set. To accomplish this with POINTER arguments, an additional argument is added to the function which contains information about how to delete the array. If the argument is declared Fortran ALLOCATABLE, then the values referenced by the C++ pointer are copied into a newly allocated Fortran array. The C++ memory is deleted by the wrapper and it is the caller's responsibility to deallocate the Fortran array. However, Fortran will release the array automatically under some conditions when the calling function returns. If owner(library) is set, the Fortran caller never needs to release the memory.

See Memory Management for details of the implementation.

A void pointer may also be used in a C function when any type may be passed in. The attribute assumedtype can be used to declare a Fortran argument as assumed-type: type(*).

- decl: int passAssumedType(void *arg+assumedtype)
function pass_assumed_type(arg) &
        result(SHT_rv) &
        bind(C, name="passAssumedType")
    use iso_c_binding, only : C_INT, C_PTR
    implicit none
    type(*) :: arg
    integer(C_INT) :: SHT_rv
end function pass_assumed_type

Memory Management

Shroud will maintain ownership of memory via the owner attribute. It uses the value of the attribute to decide when to release memory.

Use owner(library) when the library owns the memory and the user should not release it. For example, this is used when a function returns const std::string & for a reference to a string which is maintained by the library. Fortran and Python will both get the reference, copy the contents into their own variable (Fortran CHARACTER or Python str), then return without releasing any memory. This is the default behavior.

Use owner(caller) when the library allocates new memory which is returned to the caller. The caller is then responsible for releasing the memory. Fortran and Python can both hold on to the memory and then provide ways to release it using a C++ callback when it is no longer needed.

For shadow classes with a destructor defined, the destructor will be used to release the memory.

The c_statements may also define a way to destroy memory. For example, std::vector provides the lines:

destructor_name: std_vector_{cxx_T}
destructor:
-  std::vector<{cxx_T}> *cxx_ptr = reinterpret_cast<std::vector<{cxx_T}> *>(ptr);
-  delete cxx_ptr;

Patterns can be used to provide code to free memory for a wrapped function. The address of the memory to free will be in the variable void *ptr, which should be referenced in the pattern:

declarations:
- decl: char * getName() +free_pattern(free_getName)

patterns:
   free_getName: |
      decref(ptr);

Without any explicit destructor_name or pattern, free will be used to release POD pointers; otherwise, delete will be used.

C and Fortran

Fortran keeps track of C++ objects with the struct C_capsule_data_type and the bind(C) equivalent F_capsule_data_type. Their names in the format dictionary both default to {C_prefix}SHROUD_capsule_data. In the Tutorial these types are defined in typesTutorial.h as:

// helper capsule_CLA_Class1
struct s_CLA_Class1 {
    void *addr;     /* address of C++ memory */
    int idtor;      /* index of destructor */
};
typedef struct s_CLA_Class1 CLA_Class1;

And wrapftutorial.f:

! helper capsule_data_helper
type, bind(C) :: CLA_SHROUD_capsule_data
    type(C_PTR) :: addr = C_NULL_PTR  ! address of C++ memory
    integer(C_INT) :: idtor = 0       ! index of destructor
end type CLA_SHROUD_capsule_data

addr is the address of the C or C++ variable, such as a char * or std::string *. idtor is a Shroud generated index of the destructor code defined by destructor_name or the free_pattern attribute. These code segments are collected and written to function C_memory_dtor_function. A value of 0 indicates the memory will not be released and is used with the owner(library) attribute.

Each class creates its own capsule struct for the C wrapper. This is to provide a measure of type safety in the C API. All Fortran classes use the same derived type since the user does not directly access the derived type.

A typical destructor function would look like:

// Release library allocated memory.
void TUT_SHROUD_memory_destructor(TUT_SHROUD_capsule_data *cap)
{
    void *ptr = cap->addr;
    switch (cap->idtor) {
    case 0:   // --none--
    {
        // Nothing to delete
        break;
    }
    case 1:   // new_string
    {
        std::string *cxx_ptr = reinterpret_cast<std::string *>(ptr);
        delete cxx_ptr;
        break;
    }
    default:
    {
        // Unexpected case in destructor
        break;
    }
    }
    cap->addr = nullptr;
    cap->idtor = 0;  // avoid deleting again
}
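To see how idtor ties ownership to the destructor, here is a self-contained sketch that can be compiled on its own. DemoCapsule, demoMemoryDestructor, makeOwnedString, and makeBorrowed are illustrative names, and the index values 1 and 2 are assumptions made for this example. Shroud assigns its own indices when it generates code.

```cpp
#include <cstdlib>
#include <string>

// Illustrative capsule, mirroring the addr/idtor pair described above.
struct DemoCapsule {
    void *addr;  // address of C or C++ memory
    int idtor;   // index of destructor, 0 = do not release
};

// Illustrative destructor dispatch; indices 1 and 2 are assumptions.
void demoMemoryDestructor(DemoCapsule *cap)
{
    void *ptr = cap->addr;
    switch (cap->idtor) {
    case 0:  // owner(library): nothing to delete
        break;
    case 1:  // C++ object created with new
        delete static_cast<std::string *>(ptr);
        break;
    case 2:  // POD memory created with malloc
        std::free(ptr);
        break;
    default:  // unexpected index: leave the memory alone
        break;
    }
    cap->addr = nullptr;
    cap->idtor = 0;  // a second call becomes a no-op
}

// owner(caller): the capsule records how to release the memory.
DemoCapsule makeOwnedString(const char *text)
{
    DemoCapsule cap = {new std::string(text), 1};
    return cap;
}

// owner(library): idtor 0 tells the destructor to leave the memory alone.
DemoCapsule makeBorrowed(int *library_memory)
{
    DemoCapsule cap = {library_memory, 0};
    return cap;
}
```

Resetting addr and idtor at the end is what makes a repeated call harmless, which matters when both a Fortran finalizer and user code may try to release the same capsule.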

Character and Arrays

In order to create an allocatable copy of a C++ pointer, an additional structure is involved. For example, getConstStringPtrAlloc returns a pointer to a new string. From strings.yaml:

declarations:
- decl: const std::string * getConstStringPtrAlloc() +owner(library)

The C wrapper calls the function and saves the result along with metadata consisting of the address of the data within the std::string and its length. The Fortran wrapper allocates its return value to the proper length, then copies the data from the C++ variable and deletes it.

The metadata for variables is saved in the C struct C_array_type and the bind(C) equivalent F_array_type:

// helper array_context
struct s_STR_SHROUD_array {
    STR_SHROUD_capsule_data cxx;      /* address of C++ memory */
    union {
        const void * base;
        const char * ccharp;
    } addr;
    int type;        /* type of element */
    size_t elem_len; /* bytes-per-item or character len in c++ */
    size_t size;     /* size of data in c++ */
    int rank;        /* number of dimensions, 0=scalar */
    long shape[7];
};
typedef struct s_STR_SHROUD_array STR_SHROUD_array;

The union for addr makes some assignments easier by removing the need for casts and also aids debugging. The union is replaced with a single type(C_PTR) for Fortran:

! helper array_context
type, bind(C) :: STR_SHROUD_array
    ! address of C++ memory
    type(STR_SHROUD_capsule_data) :: cxx
    ! address of data in cxx
    type(C_PTR) :: base_addr = C_NULL_PTR
    ! type of element
    integer(C_INT) :: type
    ! bytes-per-item or character len of data in cxx
    integer(C_SIZE_T) :: elem_len = 0_C_SIZE_T
    ! size of data in cxx
    integer(C_SIZE_T) :: size = 0_C_SIZE_T
    ! number of dimensions
    integer(C_INT) :: rank = -1
    integer(C_LONG) :: shape(7) = 0
end type STR_SHROUD_array

The C wrapper does not return a std::string pointer. Instead it passes in a C_array_type pointer as an argument. It calls getConstStringPtrAlloc and saves the results and metadata into the argument. This allows it to be easily accessed from Fortran. Since the attribute is owner(library), cxx.idtor is set to 0 to avoid deallocating the memory.

void STR_get_const_string_ptr_alloc_bufferify(
    STR_SHROUD_array *SHT_rv_cdesc)
{
    // splicer begin function.get_const_string_ptr_alloc_bufferify
    const std::string * SHCXX_rv = getConstStringPtrAlloc();
    ShroudStrToArray(SHT_rv_cdesc, SHCXX_rv, 0);
    // splicer end function.get_const_string_ptr_alloc_bufferify
}

The Fortran wrapper uses the metadata to allocate the return argument to the correct length:

function get_const_string_ptr_alloc() &
        result(SHT_rv)
    character(len=:), allocatable :: SHT_rv
    ! splicer begin function.get_const_string_ptr_alloc
    type(STR_SHROUD_array) :: SHT_rv_cdesc
    call c_get_const_string_ptr_alloc_bufferify(SHT_rv_cdesc)
    allocate(character(len=SHT_rv_cdesc%elem_len):: SHT_rv)
    call STR_SHROUD_copy_string_and_free(SHT_rv_cdesc, SHT_rv, &
        SHT_rv_cdesc%elem_len)
    ! splicer end function.get_const_string_ptr_alloc
end function get_const_string_ptr_alloc

Finally, the helper function SHROUD_copy_string_and_free is called to set the value of the result and possibly free memory for owner(caller) or intermediate values:

// helper copy_string
// Copy the char* or std::string in context into c_var.
// Called by Fortran to deal with allocatable character.
void STR_ShroudCopyStringAndFree(STR_SHROUD_array *data, char *c_var, size_t c_var_len) {
    const char *cxx_var = data->addr.ccharp;
    size_t n = c_var_len;
    if (data->elem_len < n) n = data->elem_len;
    std::strncpy(c_var, cxx_var, n);
    STR_SHROUD_memory_destructor(&data->cxx); // delete data->cxx.addr
}
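The call, allocate, copy sequence can be emulated entirely in C++ to make the data flow easier to follow. This is a sketch under stated assumptions: DemoArray is a reduced stand-in for the metadata struct, a std::string plays the role of the Fortran allocatable result, and all function names are invented for the example. It uses memcpy with the recorded length rather than strncpy, a substitution made here so that the copy does not stop at an embedded NULL.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Illustrative stand-in for the array metadata struct shown above.
struct DemoArray {
    std::string *owner;    // C++ object holding the data (capsule addr)
    int idtor;             // 0 = library owns it, 1 = delete after the copy
    const char *ccharp;    // address of the character data
    std::size_t elem_len;  // length of the string
};

// Step 1 (C wrapper): record the address and length of the result.
void demoFill(DemoArray *meta, std::string *result, int idtor)
{
    meta->owner = result;
    meta->idtor = idtor;
    meta->ccharp = result->data();
    meta->elem_len = result->size();
}

// Step 3 (helper): copy at most dest_len characters, then release the
// C++ object when the metadata says the caller owns it.
void demoCopyAndFree(DemoArray *meta, char *dest, std::size_t dest_len)
{
    std::size_t n = dest_len < meta->elem_len ? dest_len : meta->elem_len;
    std::memcpy(dest, meta->ccharp, n);
    if (meta->idtor == 1) {
        delete meta->owner;
    }
    meta->owner = nullptr;
    meta->ccharp = nullptr;
    meta->idtor = 0;
}

// Emulates the Fortran wrapper: call, allocate to elem_len, copy.
std::string demoGetString(std::string *result, int idtor)
{
    DemoArray meta;
    demoFill(&meta, result, idtor);               // call
    std::string out(meta.elem_len, ' ');          // allocate
    demoCopyAndFree(&meta, &out[0], out.size());  // copy (and free)
    return out;
}
```

With idtor set to 0 the library's string survives the call. With idtor set to 1 it is deleted as soon as its contents have been copied.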

Note

The three steps of call, allocate, copy could be replaced with a single call by using the further interoperability with C features of Fortran 2018 (a.k.a. TS 29113). This feature allows Fortran ALLOCATABLE variables to be allocated by C. However, not all compilers currently support that feature. The current Shroud implementation works with Fortran 2003.

Python

NumPy arrays control garbage collection of C++ memory by creating a PyCapsule as the base object of NumPy objects. Once the final reference to the NumPy array is removed, the reference count on the PyCapsule is decremented. When it reaches 0, the destructor for the capsule is called and releases the C++ memory. This technique is discussed at [blog1] and [blog2].

Old

Note

C_finalize is replaced by statement.final

Shroud generated C wrappers do not explicitly delete any memory. However, a destructor may be automatically called for some C++ STL classes. For example, a function which returns a std::string will have its value copied into Fortran memory since the function’s returned object will be destroyed when the C++ wrapper returns. If a function returns a char * value, it will also be copied into Fortran memory. But if the caller of the C++ function wants to transfer ownership of the pointer to its caller, the C++ wrapper will leak the memory.

The C_finalize variable may be used to insert code before returning from the wrapper. Use C_finalize_buf for the buffer version of wrapped functions.

For example, a function which returns a new string will have to delete it before the C wrapper returns:

std::string * getConstStringPtrLen()
{
    std::string * rv = new std::string("getConstStringPtrLen");
    return rv;
}

Wrapped as:

- decl: const string * getConstStringPtrLen+len=30()
  format:
    C_finalize_buf: delete {cxx_var};

The C buffer version of the wrapper is:

void STR_get_const_string_ptr_len_bufferify(char * SHF_rv, int NSHF_rv)
{
    const std::string * SHCXX_rv = getConstStringPtrLen();
    if (SHCXX_rv->empty()) {
        std::memset(SHF_rv, ' ', NSHF_rv);
    } else {
        ShroudStrCopy(SHF_rv, NSHF_rv, SHCXX_rv->c_str());
    }
    {
        // C_finalize
        delete SHCXX_rv;
    }
    return;
}
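The copy into the Fortran buffer has to account for the fact that a Fortran CHARACTER variable has a fixed length and no terminator. The sketch below shows one plausible way to do that. blankPadCopy is a hypothetical name and is not the ShroudStrCopy helper itself, whose exact behavior is not shown in this section.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy a NULL-terminated C string into a fixed-length
// Fortran CHARACTER buffer. Unused trailing characters are filled with
// blanks and input longer than the buffer is truncated.
void blankPadCopy(char *dest, std::size_t ndest, const char *src)
{
    std::size_t nsrc = (src == nullptr) ? 0 : std::strlen(src);
    std::size_t ncopy = nsrc < ndest ? nsrc : ndest;
    if (ncopy > 0) {
        std::memcpy(dest, src, ncopy);
    }
    if (ndest > ncopy) {
        std::memset(dest + ncopy, ' ', ndest - ncopy);
    }
}
```

Because the characters now live in the Fortran buffer, the wrapper is free to delete the std::string before returning, as the C_finalize block above does.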

The non-buffer version of the function cannot destroy the string since only a pointer to the contents of the string is returned. It would leak memory when called:

const char * STR_get_const_string_ptr_len()
{
    const std::string * SHCXX_rv = getConstStringPtrLen();
    const char * SHC_rv = SHCXX_rv->c_str();
    return SHC_rv;
}

Footnotes

[blog1] http://blog.enthought.com/python/numpy-arrays-with-pre-allocated-memory
[blog2] http://blog.enthought.com/python/numpy/simplified-creation-of-numpy-arrays-from-pre-allocated-memory