Using Rust objects from other languages

Let’s create a Rust object that tells us how many people live in each US ZIP code. We want to be able to use this logic from other languages, but we only need to pass simple primitives like integers or strings across the FFI boundary. The object will have both mutable and immutable methods. Because client code cannot look inside the object, this is often referred to as an opaque object or an opaque pointer.

use std::collections::HashMap;
// std::ffi::c_char requires Rust 1.64 or newer.
use std::ffi::{c_char, CStr};

pub struct ZipCodeDatabase {
    population: HashMap<String, u32>,
}

impl ZipCodeDatabase {
    fn new() -> ZipCodeDatabase {
        ZipCodeDatabase {
            population: HashMap::new(),
        }
    }

    // Fill the map with placeholder data: ZIP code "00042" gets population 42.
    fn populate(&mut self) {
        for i in 0..100_000 {
            let zip = format!("{:05}", i);
            self.population.insert(zip, i);
        }
    }

    // Unknown ZIP codes report a population of 0.
    fn population_of(&self, zip: &str) -> u32 {
        self.population.get(zip).copied().unwrap_or(0)
    }
}

#[no_mangle]
pub extern "C" fn zip_code_database_new() -> *mut ZipCodeDatabase {
    // Move the struct to the heap and hand ownership to the caller.
    Box::into_raw(Box::new(ZipCodeDatabase::new()))
}

#[no_mangle]
pub extern "C" fn zip_code_database_free(ptr: *mut ZipCodeDatabase) {
    // Freeing NULL is a no-op.
    if ptr.is_null() {
        return;
    }
    // Rebuild the Box so that dropping it releases the allocation.
    unsafe {
        drop(Box::from_raw(ptr));
    }
}

#[no_mangle]
pub extern "C" fn zip_code_database_populate(ptr: *mut ZipCodeDatabase) {
    let database = unsafe {
        assert!(!ptr.is_null());
        &mut *ptr
    };
    database.populate();
}

#[no_mangle]
pub extern "C" fn zip_code_database_population_of(
    ptr: *const ZipCodeDatabase,
    zip: *const c_char,
) -> u32 {
    let database = unsafe {
        assert!(!ptr.is_null());
        &*ptr
    };
    let zip = unsafe {
        assert!(!zip.is_null());
        CStr::from_ptr(zip)
    };
    // Panics if the caller passes bytes that are not valid UTF-8.
    let zip_str = zip.to_str().unwrap();
    database.population_of(zip_str)
}

The struct is defined in the normal way for Rust. One extern function is created for each method of the object. C has no built-in namespacing concept, so it is normal to prefix each function with a package name and/or a type name. For this example, we use the prefix zip_code_database. Following normal C conventions, a pointer to the object is always provided as the first argument.

To create a new instance of the object, we box the result of the object’s constructor. This places the struct onto the heap where it will have a stable memory address. This address is converted into a raw pointer using Box::into_raw.
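The boxing step can be looked at in isolation. The following is a minimal sketch, not the code used above: Counter, counter_new, counter_value, and counter_free are hypothetical names standing in for ZipCodeDatabase and its functions.

```rust
// Hypothetical stand-in for the ZipCodeDatabase struct.
pub struct Counter {
    value: u32,
}

// Box the value so it lives on the heap, then give up ownership of
// the allocation by turning the Box into a raw pointer.
pub fn counter_new(start: u32) -> *mut Counter {
    Box::into_raw(Box::new(Counter { value: start }))
}

// Read the value back through the raw pointer.
// Safety: `ptr` must come from `counter_new` and must not be freed yet.
pub unsafe fn counter_value(ptr: *const Counter) -> u32 {
    unsafe { (*ptr).value }
}

// Reclaim the allocation so the sketch does not leak.
// Safety: `ptr` must come from `counter_new` and be freed only once.
pub unsafe fn counter_free(ptr: *mut Counter) {
    unsafe { drop(Box::from_raw(ptr)) };
}
```

After Box::into_raw, Rust no longer tracks the allocation. The value stays at the same address until something explicitly turns the pointer back into a Box.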

This pointer points at memory allocated by Rust; memory allocated by Rust must be deallocated by Rust. We use Box::from_raw to convert the pointer back into a Box<ZipCodeDatabase> when the object is to be freed. Unlike other functions, we do allow NULL to be passed, but simply do nothing in that case. This is a nicety for client programmers.
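To make the effect of the free function visible, the sketch below counts how often the destructor runs. Resource, resource_new, resource_free, and the DROPS counter are hypothetical names introduced only for this illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times a Resource has been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for ZipCodeDatabase.
pub struct Resource {
    pub id: u32,
}

impl Drop for Resource {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

pub extern "C" fn resource_new() -> *mut Resource {
    Box::into_raw(Box::new(Resource { id: 1 }))
}

pub extern "C" fn resource_free(ptr: *mut Resource) {
    // Accept NULL and do nothing.
    if ptr.is_null() {
        return;
    }
    // Rebuilding the Box transfers ownership back to Rust;
    // dropping it runs Drop and releases the heap allocation.
    unsafe { drop(Box::from_raw(ptr)) };
}
```

Calling resource_free with a NULL pointer leaves the counter unchanged, while calling it once with a pointer from resource_new increments the counter by exactly one.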

To create a reference from a raw pointer, you can use the terse syntax &*, which indicates that the pointer should be dereferenced and then re-referenced. Creating a mutable reference is similar, but uses &mut *. As with any pointer received from foreign code, you must ensure that it is not NULL before dereferencing it.
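Both forms of reborrowing can be sketched on a small type. Tally and the tally_* functions are hypothetical names; the last function shows the standard as_ref method on raw pointers, which folds the NULL check into an Option, as one possible alternative to asserting.

```rust
// Hypothetical stand-in for ZipCodeDatabase.
pub struct Tally {
    pub count: u32,
}

pub extern "C" fn tally_increment(ptr: *mut Tally) {
    assert!(!ptr.is_null());
    // Dereference the raw pointer, then immediately borrow mutably.
    let tally: &mut Tally = unsafe { &mut *ptr };
    tally.count += 1;
}

pub extern "C" fn tally_count(ptr: *const Tally) -> u32 {
    assert!(!ptr.is_null());
    // Dereference the raw pointer, then immediately borrow immutably.
    let tally: &Tally = unsafe { &*ptr };
    tally.count
}

// Alternative: `as_ref` returns None for NULL instead of panicking.
pub extern "C" fn tally_count_or_zero(ptr: *const Tally) -> u32 {
    match unsafe { ptr.as_ref() } {
        Some(tally) => tally.count,
        None => 0,
    }
}
```

Whether a NULL pointer should abort with an assertion or fall back to a default value is a design choice for the library author; the example above this section asserts.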

Note that a *const T can be freely converted to and from a *mut T and that nothing prevents the client code from never calling the deallocation function, or from calling it more than once. Memory management and safety guarantees are completely in the hands of the programmer.
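The sketch below illustrates both points under stated assumptions. Gauge and bump_through_const are hypothetical names showing that a *const pointer can be cast to *mut with a plain as. The function gauge_free_and_clear is one defensive convention that is not part of the API above: it takes a pointer to the caller's pointer and sets it to NULL after freeing, so that a repeated call becomes a no-op.

```rust
// Hypothetical stand-in for ZipCodeDatabase.
pub struct Gauge {
    pub level: u32,
}

// The compiler accepts this cast without complaint. It is only sound
// if the pointer originally came from a mutable allocation.
pub fn bump_through_const(ptr: *const Gauge) {
    let writable = ptr as *mut Gauge;
    unsafe { (*writable).level += 1 };
}

// Assumed convention, not part of the original API: free the object
// and overwrite the caller's copy of the pointer with NULL.
pub extern "C" fn gauge_free_and_clear(slot: *mut *mut Gauge) {
    if slot.is_null() {
        return;
    }
    unsafe {
        let ptr = *slot;
        if !ptr.is_null() {
            drop(Box::from_raw(ptr));
            *slot = std::ptr::null_mut();
        }
    }
}
```

This convention only protects the one pointer variable that was passed in. Any other copies of the pointer held by the client remain dangling, so the responsibility still rests with the programmer.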

C

#include <stdio.h>
#include <stdint.h>

typedef struct zip_code_database_S zip_code_database_t;

extern zip_code_database_t *
zip_code_database_new(void);

extern void
zip_code_database_free(zip_code_database_t *);

extern void
zip_code_database_populate(zip_code_database_t *);

extern uint32_t
zip_code_database_population_of(const zip_code_database_t *, const char *zip);

int main(void) {
  zip_code_database_t *database = zip_code_database_new();

  zip_code_database_populate(database);
  uint32_t pop1 = zip_code_database_population_of(database, "90210");
  uint32_t pop2 = zip_code_database_population_of(database, "20500");

  zip_code_database_free(database);

  printf("%lu\n", (unsigned long)(pop1 - pop2));
}

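An incomplete struct type is declared to provide a small amount of type safety: the compiler can flag pointers of an unrelated type, while client code cannot see or touch the fields.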

The const modifier is used on functions where appropriate, even though const-correctness is much more fluid in C than in Rust.

Ruby

require 'ffi'

class ZipCodeDatabase < FFI::AutoPointer
  def self.release(ptr)
    Binding.free(ptr)
  end

  def populate
    Binding.populate(self)
  end

  def population_of(zip)
    Binding.population_of(self, zip)
  end

  module Binding
    extend FFI::Library
    ffi_lib 'objects'

    attach_function :new, :zip_code_database_new,
                    [], ZipCodeDatabase
    attach_function :free, :zip_code_database_free,
                    [ZipCodeDatabase], :void
    attach_function :populate, :zip_code_database_populate,
                    [ZipCodeDatabase], :void
    attach_function :population_of, :zip_code_database_population_of,
                    [ZipCodeDatabase, :string], :uint32
  end
end

database = ZipCodeDatabase::Binding.new

database.populate
pop1 = database.population_of("90210")
pop2 = database.population_of("20500")

puts pop1 - pop2

To wrap the raw functions, we create a small class inheriting from AutoPointer. AutoPointer will ensure that the underlying resource is freed when the Ruby object is garbage collected. To do this, the subclass must define the self.release method.

Unfortunately, because we inherit from AutoPointer, we cannot redefine the initializer. To better group concepts together, we bind the FFI methods in a nested module. We provide shorter names for the bound methods, which enables the client to just call ZipCodeDatabase::Binding.new.

Python

#!/usr/bin/env python3

import sys, ctypes
from ctypes import c_char_p, c_uint32, Structure, POINTER

class ZipCodeDatabaseS(Structure):
    pass

prefix = {'win32': ''}.get(sys.platform, 'lib')
extension = {'darwin': '.dylib', 'win32': '.dll'}.get(sys.platform, '.so')
lib = ctypes.cdll.LoadLibrary(prefix + "objects" + extension)

lib.zip_code_database_new.restype = POINTER(ZipCodeDatabaseS)

lib.zip_code_database_free.argtypes = (POINTER(ZipCodeDatabaseS), )

lib.zip_code_database_populate.argtypes = (POINTER(ZipCodeDatabaseS), )

lib.zip_code_database_population_of.argtypes = (POINTER(ZipCodeDatabaseS), c_char_p)
lib.zip_code_database_population_of.restype = c_uint32

class ZipCodeDatabase:
    def __init__(self):
        self.obj = lib.zip_code_database_new()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        lib.zip_code_database_free(self.obj)

    def populate(self):
        lib.zip_code_database_populate(self.obj)

    def population_of(self, zip):
        return lib.zip_code_database_population_of(self.obj, zip.encode('utf-8'))

with ZipCodeDatabase() as database:
    database.populate()
    pop1 = database.population_of("90210")
    pop2 = database.population_of("20500")
    print(pop1 - pop2)

We create an empty structure to represent our type. This will only be used in conjunction with the POINTER function, which creates a new type as a pointer to an existing one.

To ensure that memory is properly cleaned up, we use a context manager. This is tied to our class through the __enter__ and __exit__ methods. We use the with statement to start a new context. When the context is over, the __exit__ method will be automatically called, preventing a memory leak.

Haskell

{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Word (Word32)
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.C.String (CString, withCString)

-- Empty type that stands in for the opaque Rust object.
data ZipCodeDatabase

foreign import ccall unsafe "zip_code_database_new"
  zip_code_database_new :: IO (Ptr ZipCodeDatabase)

-- Imported by address so that it can serve as a finalizer.
foreign import ccall unsafe "&zip_code_database_free"
  zip_code_database_free :: FunPtr (Ptr ZipCodeDatabase -> IO ())

foreign import ccall unsafe "zip_code_database_populate"
  zip_code_database_populate :: Ptr ZipCodeDatabase -> IO ()

foreign import ccall unsafe "zip_code_database_population_of"
  zip_code_database_population_of :: Ptr ZipCodeDatabase -> CString -> IO Word32

createDatabase :: IO (Maybe (ForeignPtr ZipCodeDatabase))
createDatabase = do
  ptr <- zip_code_database_new
  if ptr /= nullPtr
    then do
      foreignPtr <- newForeignPtr zip_code_database_free ptr
      return $ Just foreignPtr
    else
      return Nothing

populate :: Ptr ZipCodeDatabase -> IO ()
populate = zip_code_database_populate

populationOf :: Ptr ZipCodeDatabase -> String -> IO Word32
populationOf db zip =
  -- withCString frees the temporary C string once the call returns.
  withCString zip (zip_code_database_population_of db)

main :: IO ()
main = do
  db <- createDatabase
  case db of
    Nothing -> putStrLn "Unable to create database"
    Just ptr -> withForeignPtr ptr $ \database -> do
        populate database
        pop1 <- populationOf database "90210"
        pop2 <- populationOf database "20500"
        print (pop1 - pop2)

We start by defining an empty type to refer to the opaque object. When defining the imported functions, we use the Ptr type constructor with this new type as the type of the pointer returned from Rust. We also use IO as allocating, freeing, and populating the object are all functions with side-effects.

As allocation can theoretically fail, we check for NULL and return a Maybe from the constructor. This is likely overkill, as Rust currently aborts the process when the allocator fails.
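For comparison, the same NULL-to-optional mapping can be sketched in Rust with std::ptr::NonNull. Database, database_new, create_database, and destroy_database are hypothetical names, and the fail parameter exists only so that the NULL path can be exercised here.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for ZipCodeDatabase.
pub struct Database {
    pub entries: u32,
}

// Stand-in for a constructor that reports failure by returning NULL.
pub extern "C" fn database_new(fail: bool) -> *mut Database {
    if fail {
        std::ptr::null_mut()
    } else {
        Box::into_raw(Box::new(Database { entries: 0 }))
    }
}

// Wrap the raw result: NULL becomes None, anything else becomes Some.
pub fn create_database(fail: bool) -> Option<NonNull<Database>> {
    NonNull::new(database_new(fail))
}

// Safety: `db` must come from `create_database` and be destroyed once.
pub unsafe fn destroy_database(db: NonNull<Database>) {
    unsafe { drop(Box::from_raw(db.as_ptr())) };
}
```

As in the Haskell version, the caller is forced to handle the empty case before it can reach the pointer.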

To ensure that the allocated memory is automatically freed, we use the ForeignPtr type. This takes a raw Ptr and a function to call when the wrapped pointer is deallocated.

When using the wrapped pointer, withForeignPtr is used to unwrap it before passing it back to the FFI functions.

Node.js

const ffi = require('ffi');

const lib = ffi.Library('libobjects', {
  zip_code_database_new: ['pointer', []],
  zip_code_database_free: ['void', ['pointer']],
  zip_code_database_populate: ['void', ['pointer']],
  zip_code_database_population_of: ['uint32', ['pointer', 'string']],
});

const ZipCodeDatabase = function() {
  this.ptr = lib.zip_code_database_new();
};

ZipCodeDatabase.prototype.free = function() {
  lib.zip_code_database_free(this.ptr);
};

ZipCodeDatabase.prototype.populate = function() {
  lib.zip_code_database_populate(this.ptr);
};

ZipCodeDatabase.prototype.populationOf = function(zip) {
  return lib.zip_code_database_population_of(this.ptr, zip);
};

const database = new ZipCodeDatabase();
try {
  database.populate();
  const pop1 = database.populationOf('90210');
  const pop2 = database.populationOf('20500');
  console.log(pop1 - pop2);
} finally {
  database.free();
}

When importing the functions, we simply declare that a pointer type is returned or accepted.

To make accessing the functions cleaner, we create a simple class that maintains the pointer for us and abstracts passing it to the lower-level functions. This also gives us an opportunity to rename the functions with idiomatic JavaScript camel-case.

To ensure that the resources are cleaned up, we use a try block and call the deallocation method in the finally block.

C#

using System;
using System.Runtime.InteropServices;

internal class Native
{
    [DllImport("objects")]
    internal static extern ZipCodeDatabaseHandle zip_code_database_new();
    [DllImport("objects")]
    internal static extern void zip_code_database_free(IntPtr db);
    [DllImport("objects")]
    internal static extern void zip_code_database_populate(ZipCodeDatabaseHandle db);
    [DllImport("objects")]
    internal static extern uint zip_code_database_population_of(ZipCodeDatabaseHandle db, string zip);
}

internal class ZipCodeDatabaseHandle : SafeHandle
{
    public ZipCodeDatabaseHandle() : base(IntPtr.Zero, true) {}

    public override bool IsInvalid
    {
        get { return false; }
    }

    protected override bool ReleaseHandle()
    {
        Native.zip_code_database_free(handle);
        return true;
    }
}

public class ZipCodeDatabase : IDisposable
{
    private ZipCodeDatabaseHandle db;

    public ZipCodeDatabase()
    {
        db = Native.zip_code_database_new();
    }

    public void Populate()
    {
        Native.zip_code_database_populate(db);
    }

    public uint PopulationOf(string zip)
    {
        return Native.zip_code_database_population_of(db, zip);
    }

    public void Dispose()
    {
        db.Dispose();
    }

    static public void Main()
    {
        // The using block calls Dispose when it ends, even on exceptions.
        using (var db = new ZipCodeDatabase())
        {
            db.Populate();

            var pop1 = db.PopulationOf("90210");
            var pop2 = db.PopulationOf("20500");

            Console.WriteLine("{0}", pop1 - pop2);
        }
    }
}

As the responsibilities for calling the native functions are going to be more spread out, we create a Native class to hold all the definitions.

To ensure that the allocated memory is automatically freed, we create a subclass of the SafeHandle class. This requires implementing IsInvalid and ReleaseHandle. Since our Rust function accepts freeing a NULL pointer, we can say that every pointer is valid.
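The owning-wrapper idea has a close analogue in Rust's Drop trait. The sketch below is only an illustration of that pattern: Database, database_new, database_free, DatabaseHandle, and the FREED counter are hypothetical names, and the counter exists solely to make the release observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls that actually freed an object.
static FREED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for ZipCodeDatabase.
pub struct Database {
    pub entries: u32,
}

pub extern "C" fn database_new() -> *mut Database {
    Box::into_raw(Box::new(Database { entries: 0 }))
}

pub extern "C" fn database_free(ptr: *mut Database) {
    if ptr.is_null() {
        return;
    }
    FREED.fetch_add(1, Ordering::SeqCst);
    unsafe { drop(Box::from_raw(ptr)) };
}

// Owning wrapper, playing the role that SafeHandle plays in C#.
pub struct DatabaseHandle {
    ptr: *mut Database,
}

impl DatabaseHandle {
    pub fn new() -> DatabaseHandle {
        DatabaseHandle { ptr: database_new() }
    }

    pub fn entries(&self) -> u32 {
        // The pointer stays valid for as long as the handle exists.
        unsafe { (*self.ptr).entries }
    }
}

impl Drop for DatabaseHandle {
    fn drop(&mut self) {
        // Runs once, when the handle goes out of scope.
        database_free(self.ptr);
    }
}
```

Because the raw pointer is a private field, code outside the wrapper cannot free it a second time or use it after the handle is gone.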

We can use our safe wrapper ZipCodeDatabaseHandle as the parameter and return type of the FFI functions, except for the free function, which takes the raw IntPtr because it is called from ReleaseHandle. The actual pointer will be automatically marshalled to and from the wrapper.

We also allow the ZipCodeDatabase to participate in the IDisposable protocol, forwarding to the safe wrapper.