rmed

SWIG, C, Python, and other tales

2021-05-09 18:45

Python has lots of libraries available for almost anything imaginable. However, there are times when, for one reason or another, you need to use native code written in C/C++ (e.g. legacy code, hard requirement for a specific project, etc.). Although there are several options to overcome this need, we are going to have a look at SWIG.

Contents

  1. Enter SWIG
  2. Look at that tree!
  3. Python wrapper
  4. The quirks
  5. Conclusion
  6. References

Enter SWIG

As described on the official SWIG site:

SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages. SWIG is used with different types of target languages including common scripting languages such as Javascript, Perl, PHP, Python, Tcl and Ruby.

What SWIG does is create a wrapper that allows our favorite programming language to interact with the native code almost transparently. I say "almost" because there are some quirks that should be taken into account depending on the language you are trying to write a wrapper for. Take Python for instance: how would C pointers and arrays be translated into the Python world, where there is garbage collection and pointers are not really something you see every day?

In this post we will see some of these special cases and how to deal with them.


Look at that tree!

Let's have an example C code defining a tree based on the following points:

  • The tree has a specific age measured in years
  • The tree has a specific height measured in meters
  • The tree is not a cylinder! Therefore, its trunk can have different diameters depending on the height
  • The tree has multiple branches
    • Each branch has a specific length
    • Each branch has a given number of leaves
      • Each leaf can be a specific color: green, orange, red, brown

This is what our tree might look like in C:

typedef enum Color {
    GREEN,
    ORANGE,
    RED,
    BROWN
} Color;

typedef struct Leaf {
    Color color;
} Leaf;

typedef struct Branch {
    double length;
    size_t num_leaves;
    Leaf *leaves;
} Branch;

typedef struct Tree {
    unsigned int age;
    unsigned int height;
    double *diameters;
    size_t num_branches;
    Branch *branches;
} Tree;

For the sake of simplicity, the age and height of the tree will be considered as integer values, and we won't care about where the branches or leaves are in the tree.

Next, we will want to have a function to initialize our tree with some default values:

Tree *new_tree(int age, int height, size_t num_branches) {
    Tree *my_tree = malloc(sizeof(Tree));

    // Trunk
    my_tree->age = age;
    my_tree->height = height;

    int i, j;

    // Trunk diameters
    double *diam = malloc(sizeof(double) * height);

    for (i = 0; i < height; i++) {
        diam[i] = 1.2;
    }

    my_tree->diameters = diam;

    // Branches
    Branch *branches = malloc(sizeof(Branch) * num_branches);

    for (i = 0; i < num_branches; i++) {
        branches[i].length = 1.5;
        branches[i].num_leaves = 42;

        Leaf *leaves = malloc(sizeof(Leaf) * branches[i].num_leaves);

        // Happy little tree
        for (j = 0; j < branches[i].num_leaves; j++) {
            leaves[j].color = GREEN;
        }

        branches[i].leaves = leaves;
    }

    my_tree->num_branches = num_branches;
    my_tree->branches = branches;

    return my_tree;
}

As well as another function that simply prints the tree structure:

void show_tree(Tree *t) {
    int i, j;
    const char *leaf_color = "unknown";

    // Basic info
    printf("My tree is %u years old and %u meters high\n", t->age, t->height);
    printf("These are the diameters of my tree:\n");

    for (i = 0; i < t->height; i++) {
        printf("%f ", t->diameters[i]);
    }

    printf("\n");

    // Branch info
    printf("Don't forget about all the branches in my tree:\n");

    for (i = 0; i < t->num_branches; i++) {
        printf("- Branch %d is %f meters long and has %zu leaves with following colors:\n", i, t->branches[i].length, t->branches[i].num_leaves);

        for (j = 0; j < t->branches[i].num_leaves; j++) {
            switch (t->branches[i].leaves[j].color) {
                case GREEN:
                    leaf_color = "green";
                    break;

                case ORANGE:
                    leaf_color = "orange";
                    break;

                case RED:
                    leaf_color = "red";
                    break;

                case BROWN:
                    leaf_color = "brown";
                    break;

                default:
                    break;
            }
            printf("%s ", leaf_color);
        }

        printf("\n");
    }
}

Our final supertree.c file would look like this:

#include "supertree.h"

Tree *new_tree(int age, int height, size_t num_branches) {
    Tree *my_tree = malloc(sizeof(Tree));

    // Trunk
    my_tree->age = age;
    my_tree->height = height;

    int i, j;

    // Trunk diameters
    double *diam = malloc(sizeof(double) * height);

    for (i = 0; i < height; i++) {
        diam[i] = 1.2;
    }

    my_tree->diameters = diam;

    // Branches
    Branch *branches = malloc(sizeof(Branch) * num_branches);

    for (i = 0; i < num_branches; i++) {
        branches[i].length = 1.5;
        branches[i].num_leaves = 42;

        Leaf *leaves = malloc(sizeof(Leaf) * branches[i].num_leaves);

        // Happy little tree
        for (j = 0; j < branches[i].num_leaves; j++) {
            leaves[j].color = GREEN;
        }

        branches[i].leaves = leaves;
    }

    my_tree->num_branches = num_branches;
    my_tree->branches = branches;

    return my_tree;
}

void show_tree(Tree *t) {
    int i, j;
    const char *leaf_color = "unknown";

    // Basic info
    printf("My tree is %u years old and %u meters high\n", t->age, t->height);
    printf("These are the diameters of my tree:\n");

    for (i = 0; i < t->height; i++) {
        printf("%f ", t->diameters[i]);
    }

    printf("\n");

    // Branch info
    printf("Don't forget about all the branches in my tree:\n");

    for (i = 0; i < t->num_branches; i++) {
        printf("- Branch %d is %f meters long and has %zu leaves with following colors:\n", i, t->branches[i].length, t->branches[i].num_leaves);

        for (j = 0; j < t->branches[i].num_leaves; j++) {
            switch (t->branches[i].leaves[j].color) {
                case GREEN:
                    leaf_color = "green";
                    break;

                case ORANGE:
                    leaf_color = "orange";
                    break;

                case RED:
                    leaf_color = "red";
                    break;

                case BROWN:
                    leaf_color = "brown";
                    break;

                default:
                    break;
            }
            printf("%s ", leaf_color);
        }

        printf("\n");
    }
}


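int main() {
    Tree *tree = new_tree(1, 1, 3);

    show_tree(tree);

    // Release the nested allocations before the tree itself
    for (size_t i = 0; i < tree->num_branches; i++) {
        free(tree->branches[i].leaves);
    }
    free(tree->branches);
    free(tree->diameters);
    free(tree);
    return 0;
}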

While supertree.h would look like this:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum Color {
    GREEN,
    ORANGE,
    RED,
    BROWN
} Color;

typedef struct Leaf {
    Color color;
} Leaf;

typedef struct Branch {
    double length;
    size_t num_leaves;
    Leaf *leaves;
} Branch;

typedef struct Tree {
    unsigned int age;
    unsigned int height;
    double *diameters;
    size_t num_branches;
    Branch *branches;
} Tree;

Tree *new_tree(int age, int height, size_t num_branches);
void show_tree(Tree *t);

Compiling and running our code results in the following:

$ gcc supertree.c -o supertree

$ ./supertree
My tree is 1 years old and 1 meters high
These are the diameters of my tree:
1.200000
Don't forget about all the branches in my tree:
- Branch 0 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 1 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 2 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green



Python wrapper

In order to create the Python wrapper as a module we will first create a SWIG input file supertree.i with the following content:

%module supertree

%{
#include "supertree.h"
%}

%include "supertree.h"

Now for the creation of the wrapper itself:

$ swig -python -py3 supertree.i

This will result in the following two files:

  • supertree_wrap.c: generated C code for the interface
  • supertree.py: Python module that will serve as the entry point for the interface

Although the SWIG-generated code might be a bit overwhelming to read, let's check what our main Tree struct would look like:

class Tree(object):
    thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
    __repr__ = _swig_repr
    age = property(_supertree.Tree_age_get, _supertree.Tree_age_set)
    height = property(_supertree.Tree_height_get, _supertree.Tree_height_set)
    diameters = property(_supertree.Tree_diameters_get, _supertree.Tree_diameters_set)
    num_branches = property(_supertree.Tree_num_branches_get, _supertree.Tree_num_branches_set)
    branches = property(_supertree.Tree_branches_get, _supertree.Tree_branches_set)

    def __init__(self):
        _supertree.Tree_swiginit(self, _supertree.new_Tree())
    __swig_destroy__ = _supertree.delete_Tree

# Register Tree in _supertree:
_supertree.Tree_swigregister(Tree)


def new_tree(age: "int", height: "int", num_branches: "size_t") -> "Tree *":
    return _supertree.new_tree(age, height, num_branches)

SWIG translates C structs into Python classes, so managing their attributes should be pretty straightforward when working with an object of this class. Note as well that everything comes from the _supertree module, which is the native module that results from compiling the wrapper.

The SWIG documentation contains an example on compiling everything using distutils, but here we are going to do it manually. As we are compiling for Python 3, it will be necessary to have the Python development libraries installed. On Debian/Ubuntu systems:

# apt install python3-dev

Now, for the compilation itself, it can be done in three steps:

$ gcc -O2 -fPIC -c supertree.c
$ gcc -O2 -fPIC -c supertree_wrap.c $(python3-config --includes)
$ gcc -shared supertree.o supertree_wrap.o -o _supertree.so
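
If python3-config is not available, the include directory can also be looked up from Python itself. The following is a minimal sketch that only prints the three commands instead of running them; the build_commands helper is a name made up for this example, and it assumes a single include directory is enough (python3-config may emit more than one -I flag).

```python
import shlex
import sysconfig


def build_commands(module="supertree"):
    """Return the three gcc invocations as argument lists (nothing is executed)."""
    # Directory where the Python headers (Python.h) are expected to live.
    include_dir = sysconfig.get_paths()["include"]
    return [
        ["gcc", "-O2", "-fPIC", "-c", f"{module}.c"],
        ["gcc", "-O2", "-fPIC", "-c", f"{module}_wrap.c", f"-I{include_dir}"],
        ["gcc", "-shared", f"{module}.o", f"{module}_wrap.o", "-o", f"_{module}.so"],
    ]


for command in build_commands():
    print(shlex.join(command))
```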

We should now have a _supertree.so file with our compiled Python module. Let's try it out!

>>> import supertree
>>> mytree = supertree.new_tree(1, 1, 3)
>>> supertree.show_tree(mytree)
My tree is 1 years old and 1 meters high
These are the diameters of my tree:
1.200000
Don't forget about all the branches in my tree:
- Branch 0 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 1 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 2 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green

Great, we can now effectively call the underlying C code from Python!



The quirks

As I was mentioning earlier, there are some teeny-tiny things to take into account when using the SWIG wrapper. Let's start by trying to modify some attributes of the Tree object from Python:

>>> import supertree
>>> mytree = supertree.new_tree(1, 1, 3)
>>> mytree.age
1
>>> mytree.age = 12
>>> mytree.age
12

No issues so far. Let's try modifying the only diameter value we have (remember, our tree is only 1 meter high):

>>> mytree.diameters
<Swig Object of type 'double *' at 0x7f48ab15eae0>
>>> mytree.diameters[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'SwigPyObject' object is not subscriptable

Huh, apparently we can't index the pointer to double to read/modify the data. Let's see what happens with the branches:

>>> mytree.branches
<supertree.Branch; proxy of <Swig Object of type 'Branch *' at 0x7f48ab0a5720> >
>>> mytree.branches[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Branch' object is not subscriptable
>>> mytree.branches.length
1.5
>>> mytree.branches.num_leaves
42

Interesting: we have the same problem as before. This time, however, SWIG maps the Branch C struct to a Python class, so the pointer is wrapped as a proxy for a single Branch and we can only reach the first element of the array.
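
The root cause is that a bare C pointer carries no length. The same behavior can be reproduced without SWIG using ctypes; in the sketch below, the Branch class is a simplified mirror of the C struct written for this example (only the scalar fields), not the SWIG-generated class.

```python
import ctypes


class Branch(ctypes.Structure):
    # Only the scalar fields of the C struct are mirrored here.
    _fields_ = [("length", ctypes.c_double), ("num_leaves", ctypes.c_size_t)]


# Three contiguous structs, like the block allocated in new_tree().
block = (Branch * 3)()
for index in range(3):
    block[index].length = 1.5 + index
    block[index].num_leaves = 42

# Reduce the array to a bare pointer, which is all the Tree struct stores.
pointer = ctypes.cast(block, ctypes.POINTER(Branch))

# Dereferencing the pointer gives the element it addresses: the first one.
print(pointer.contents.length)

# The pointer itself has no idea how many elements follow.
try:
    len(pointer)
except TypeError as error:
    print("no length available:", error)
```

The other elements are still there (pointer[2] works), but nothing in the pointer says where the array ends, so the length has to travel separately, as num_branches does in our Tree.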

These are the sort of problems I was talking about: there are certain things that simply do not have a direct translation from C to the target language. That being said, SWIG is flexible enough and offers several solutions for these situations. Let's have a look at some of them!



Arrays/pointers of core data types

Previously we saw that we cannot access the pointer to double that contains the diameters of our tree trunk. That is not entirely accurate: we just cannot access it directly. The main problem is that we get a SwigPyObject instance wrapping said pointer, so there is no straightforward way to access the memory from the Python side.

One possible solution comes from the ctypes Python library: ideally, we should be able to create a pointer to the memory address containing the diameters and read/modify the values from Python. Now, you might be wondering "where do we get the memory address from?". The answer is: from the SwigPyObject itself!

Let's have a look at the internals of the object:

>>> dir(mytree.diameters)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__int__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'acquire', 'append', 'disown', 'next', 'own']

Most of these are standard Python special methods, plus a handful of SWIG-specific ones (acquire, append, disown, next, own). Among the special methods is __int__, which allows the usage of int(obj). Let's see what it returns for this instance:

>>> int(mytree.diameters)
27381776

Although it may seem an incredibly random value at first, that is indeed the memory address of our diameters! Don't believe me? Check it out:

>>> import ctypes
>>> mem = (ctypes.c_double * mytree.height).from_address(int(mytree.diameters))
>>> pointer = ctypes.pointer(mem)
>>> pointer.contents
<__main__.c_double_Array_1 object at 0x7fc49c1275c0>
>>> pointer.contents[0]
1.2
>>> pointer.contents[0] = 4.5
>>> pointer.contents[0]
4.5

Of course, if we now print the tree structure we should be able to see the changes at the C level:

>>> supertree.show_tree(mytree)
My tree is 1 years old and 1 meters high
These are the diameters of my tree:
4.500000
Don't forget about all the branches in my tree:
- Branch 0 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 1 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 2 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green

Warning

Be careful with this way of accessing memory: nothing verifies that the length you pass to ctypes matches the real allocation, so you have to take care of the sizes and indexing yourself.
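
On the ctypes side, some protection comes for free: an array type created with a fixed length rejects out-of-range indices, so the detour through ctypes.pointer() is not strictly needed. Here is a self-contained sketch; the backing buffer is a stand-in for the memory allocated by new_tree(), and with the real module the address would be int(mytree.diameters) and the length mytree.height.

```python
import ctypes

# Stand-in for the C-side allocation (assumption made for this example).
backing = (ctypes.c_double * 3)(1.2, 1.2, 1.2)
address = ctypes.addressof(backing)
height = 3

# View the same memory as a fixed-length ctypes array (no copy is made).
diameters = (ctypes.c_double * height).from_address(address)
diameters[0] = 4.5
print(list(diameters))

try:
    diameters[height]  # one past the end
except IndexError:
    print("index rejected by the ctypes array")
```

The check is only as good as the length you provide: if height does not match what was really allocated, the array will happily cover memory that does not belong to the diameters.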



Arrays/pointers of C structures: access by index

The previous way of accessing pointers works wonders for core data types. However, it is not viable when dealing with more complex types such as the Branch and Leaf C structures. As we saw previously, whenever we try to access these elements we only get the first object in the array, which makes them a bit hard to work with.

One of the charms of SWIG is that it provides a way of extending the generated code with custom methods. This means that we can modify our supertree.i file to include something like the following:

%extend Branch {
    Leaf *__getitem__(size_t i) {
        return &$self->leaves[i];
    }
}

What we are doing here is implementing the special __getitem__ method from Python, which should allow us to access the leaves in our branch instances by index. What's more, given that the function returns a pointer, we can also modify the element in-place!

After regenerating the wrapper and recompiling, we see that the method has been added to our Python class in supertree.py:

class Branch(object):
    thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
    __repr__ = _swig_repr
    length = property(_supertree.Branch_length_get, _supertree.Branch_length_set)
    num_leaves = property(_supertree.Branch_num_leaves_get, _supertree.Branch_num_leaves_set)
    leaves = property(_supertree.Branch_leaves_get, _supertree.Branch_leaves_set)

    def __getitem__(self, i: "size_t") -> "Leaf *":
        return _supertree.Branch___getitem__(self, i)

    def __init__(self):
        _supertree.Branch_swiginit(self, _supertree.new_Branch())
    __swig_destroy__ = _supertree.delete_Branch

Let's try it out:

>>> import supertree
>>> mytree = supertree.new_tree(1,1,3)
>>> mytree.branches[4]
<supertree.Leaf; proxy of <Swig Object of type 'Leaf *' at 0x7f22157be840> >
>>> mytree.branches[4].color
0

>>> # Try changing the color
>>> for i in range(4,36):
...     mytree.branches[i].color = supertree.RED

>>> supertree.show_tree(mytree)
My tree is 1 years old and 1 meters high
These are the diameters of my tree:
1.200000
Don't forget about all the branches in my tree:
- Branch 0 is 1.500000 meters long and has 42 leaves with following colors:
green green green green red red red red red red red red red red red red red red red red red red red red red red red red red red red red red red red red green green green green green green
- Branch 1 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 2 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green

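As you can see, we have changed the color of several leaves at once. Keep in mind that mytree.branches is a proxy for the first Branch only, so the index in mytree.branches[i] selects a leaf of branch 0, not a branch.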
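
The color attribute comes back as a plain integer (0 in the session above). To get readable names on the Python side, one option is to mirror the C enum by hand. The Color class below is written for this example and is not generated by SWIG; its values simply follow the order of the enum in supertree.h.

```python
from enum import IntEnum


class Color(IntEnum):
    # Values mirror the order of the C enum in supertree.h.
    GREEN = 0
    ORANGE = 1
    RED = 2
    BROWN = 3


raw_value = 2  # what leaf.color would return for a red leaf
print(Color(raw_value).name.lower())
```

The obvious drawback is that the Python mirror has to be kept in sync with the C header manually.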

Warning

Be careful with this way of accessing memory: there is no bounds checking whatsoever, so you have to take care of the indexing yourself.
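
This matters for iteration too. As far as I can tell, a plain "for leaf in branch" would fall back on calling __getitem__ with 0, 1, 2... until an IndexError is raised, which our C function never does. A safer pattern is to drive the loop with num_leaves. In the sketch below, FakeBranch is a stand-in for the wrapped class so that the example runs on its own, and iter_leaves is a helper name made up for this post.

```python
class FakeBranch:
    """Stand-in for the wrapped Branch: __getitem__ never raises IndexError."""

    def __init__(self, leaves):
        self._leaves = leaves
        self.num_leaves = len(leaves)

    def __getitem__(self, i):
        # The real extension would read past the end; emulate that with a marker.
        return self._leaves[i] if i < len(self._leaves) else "garbage"


def iter_leaves(branch):
    """Yield exactly branch.num_leaves items and nothing beyond them."""
    for i in range(branch.num_leaves):
        yield branch[i]


branch = FakeBranch(["green", "green", "red"])
print(list(iter_leaves(branch)))
```

With the real module the call would look like iter_leaves(mytree.branches), which walks the leaves of the first branch.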



Arrays/pointers of C structures: access by array

What if we had several arrays in a single structure? Obviously the __getitem__ method would only allow us to access the elements of a single array, so we might need something a bit more powerful than that.

A Stack Overflow answer provides a very interesting solution. Basically, what we will want to do is tell SWIG to generate a special array class for a specific data type by means of the carrays.i SWIG library file and then implement a custom method to return that array. For example:

%extend Tree {
    %pythonappend branch_list() {
        newval = _BranchArray.frompointer(val)
        newval.ptr_retain = val
        val = newval
    }

    Branch *branch_list() {
        return $self->branches;
    }
}

%include <carrays.i>
%array_class(Branch, _BranchArray)

Breaking it down:

  1. We add the carrays.i include and the array class definition at the end of the file
  2. We extend the Tree class with a native branch_list() function that returns a pointer to the branches
  3. Because we want a proper list from Python, we use the pythonappend directive to further extend the native method in order to create the array. This has to appear before the definition of the native method itself

Note that pythonappend, as its name implies, adds the given Python code to the generated proxy method, right after the call to the native branch_list() function and just before the value is returned:

class Tree(object):
    thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
    __repr__ = _swig_repr
    age = property(_supertree.Tree_age_get, _supertree.Tree_age_set)
    height = property(_supertree.Tree_height_get, _supertree.Tree_height_set)
    diameters = property(_supertree.Tree_diameters_get, _supertree.Tree_diameters_set)
    num_branches = property(_supertree.Tree_num_branches_get, _supertree.Tree_num_branches_set)
    branches = property(_supertree.Tree_branches_get, _supertree.Tree_branches_set)

    def branch_list(self) -> "Branch *":
        val = _supertree.Tree_branch_list(self)

        newval = _BranchArray.frompointer(val)
        newval.ptr_retain = val
        val = newval


        return val


    def __init__(self):
        _supertree.Tree_swiginit(self, _supertree.new_Tree())
    __swig_destroy__ = _supertree.delete_Tree

We can now access the data like this:

>>> import supertree
>>> mytree = supertree.new_tree(1,1,3)
>>> branch_list = mytree.branch_list()
>>> branch_list[0].length
1.5

>>> # Modify an element
>>> modified_branch = branch_list[1]
>>> modified_branch.length = 5
>>> branch_list[1] = modified_branch
>>> supertree.show_tree(mytree)
My tree is 1 years old and 1 meters high
These are the diameters of my tree:
1.200000
Don't forget about all the branches in my tree:
- Branch 0 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 1 is 5.000000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green
- Branch 2 is 1.500000 meters long and has 42 leaves with following colors:
green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green green

With this we now have the tools to read and modify all the information in our tree!

Warning

Be careful with this way of accessing memory: there is no bounds checking whatsoever, so you have to take care of the indexing yourself.
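
Since the tree stores the number of branches next to the pointer, a thin Python wrapper can add the missing checks. This is only a sketch: BoundedArray is a class invented for this post, and UncheckedArray is a stand-in for the generated _BranchArray so that the example runs without the compiled module.

```python
class UncheckedArray:
    """Stand-in for _BranchArray: indices are never validated."""

    def __init__(self, items):
        self._items = dict(enumerate(items))

    def __getitem__(self, i):
        return self._items.get(i, "garbage")

    def __setitem__(self, i, value):
        self._items[i] = value


class BoundedArray:
    """Pair an unchecked array with the length stored next to it."""

    def __init__(self, raw, length):
        self._raw = raw
        self._length = length

    def __len__(self):
        return self._length

    def _check(self, i):
        if not 0 <= i < self._length:
            raise IndexError(f"index {i} out of range for length {self._length}")
        return i

    def __getitem__(self, i):
        return self._raw[self._check(i)]

    def __setitem__(self, i, value):
        self._raw[self._check(i)] = value


branches = BoundedArray(UncheckedArray([1.5, 1.5, 1.5]), 3)
branches[1] = 5.0
print(list(branches))
```

With the real module this would be built as BoundedArray(mytree.branch_list(), mytree.num_branches). Because the wrapper raises IndexError at the end, len() and regular for loops behave as expected.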




Conclusion

The examples we saw here might have been a bit rudimentary, but I hope they serve as a small introduction to the world of SWIG.

As we saw, SWIG is a very powerful tool that offers many possibilities to allow us to interact with lower-level C code from an external library or program. While it is mostly transparent and most (simple) cases will require no modification to the wrapper, it might still be a good idea to make some adaptations to make the wrapper easier to work with.

Finally, I do recommend checking the SWIG documentation to have a deeper understanding of how more complex constructs would work. Although the learning curve might look a bit steep at first, I believe it will be worth it in the end.


References