Rust functions with string arguments

Let’s start on something a little more complex: accepting strings as arguments. In Rust, a string is a slice of u8 that is guaranteed to be valid UTF-8. Because the length is stored alongside the data, NUL bytes may appear in the interior of the string. In C, a string is just a pointer to a char, and its end is marked by a NUL byte (with the integer value 0). Some work is needed to convert between these two representations.
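As a rough illustration of the difference, the sketch below uses std::ffi::CString from the standard library to convert in the opposite direction, from Rust to C. The helper name to_c_string is hypothetical and exists only for this example. The point is that a string with an interior NUL is perfectly valid on the Rust side but cannot be represented as a C string.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper: convert a Rust string slice into an owned,
/// NUL-terminated C string.
///
/// `CString::new` appends the terminating NUL itself and returns an
/// error if the input already contains a NUL byte anywhere.
pub fn to_c_string(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}
```

Calling to_c_string("go\0es") returns an error whose nul_position() method reports the index of the offending byte, while a string without NUL bytes converts without trouble.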

extern crate libc;

use libc::c_char;
use std::ffi::CStr;

#[no_mangle]
pub extern "C" fn how_many_characters(s: *const c_char) -> u32 {
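    // Rust references can never be NULL, so reject a NULL pointer up front.
    assert!(!s.is_null());

    // Unsafe: the caller must pass a valid, NUL-terminated string.
    let c_str = unsafe { CStr::from_ptr(s) };

    // Panics if the bytes are not valid UTF-8.
    let r_str = c_str.to_str().unwrap();
    // Counts Unicode scalar values, not bytes.
    r_str.chars().count() as u32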
}

Getting a Rust string slice (&str) requires a few steps:

  1. We have to ensure that the C pointer is not NULL as Rust references are not allowed to be NULL.

  2. Use std::ffi::CStr to wrap the pointer. CStr will compute the length of the string based on the terminating NUL. This requires an unsafe block because we are dereferencing a raw pointer: the Rust compiler cannot verify that the pointer meets all the safety guarantees, so the programmer must do so instead.

  3. Ensure that the C string is valid UTF-8 and convert it to a Rust string slice.

  4. Use the string slice.

In this example, the assert! and unwrap() calls panic, stopping the program, if any of our preconditions fail. Each use case must decide which failure modes are appropriate, but failing loudly and early is a good starting point.
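If stopping the program is not acceptable for a given caller, one possible alternative is to report failure through the return value. The following is only a sketch: the function name how_many_characters_checked, the COUNT_ERROR sentinel, and the widened i64 return type are all assumptions made for illustration and are not part of the example above. It uses std::os::raw::c_char so that it does not depend on the libc crate.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical sentinel returned when the input cannot be counted.
pub const COUNT_ERROR: i64 = -1;

/// Sketch of a non-panicking variant of `how_many_characters`.
///
/// Returns the number of Unicode scalar values in the string, or
/// `COUNT_ERROR` if the pointer is NULL or the bytes are not valid UTF-8.
/// A function exported from a real library would also carry the
/// `#[no_mangle]` attribute, as in the example above.
pub extern "C" fn how_many_characters_checked(s: *const c_char) -> i64 {
    if s.is_null() {
        return COUNT_ERROR;
    }

    // Assumption: a non-NULL `s` points to a valid, NUL-terminated string.
    // The compiler cannot check this; the caller is responsible for it.
    let c_str = unsafe { CStr::from_ptr(s) };

    match c_str.to_str() {
        Ok(r_str) => r_str.chars().count() as i64,
        Err(_) => COUNT_ERROR,
    }
}
```

The return type is widened to a signed integer so that the sentinel cannot collide with a legitimate count. Callers on the C side would then compare the result against -1 before using it.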

Ownership and lifetimes

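In this example, the Rust code does not own the string: the string slice borrows from the CStr, which in turn points into the caller's buffer. The compiler ties the lifetime of the string slice to that of the CStr reference, but CStr::from_ptr cannot know how long the underlying C buffer remains valid. It is up to the programmer to ensure that the reference is not used after the C string has been freed or modified.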
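When the Rust side needs to keep the text after the foreign call returns, one option is to copy it into an owned String. The sketch below assumes a hypothetical helper named copy_c_string, which is not part of the example above, and again uses std::os::raw::c_char to avoid the libc dependency.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: copy a C string into an owned Rust `String`
/// so the data can outlive the caller's buffer.
///
/// Returns `None` for a NULL pointer or for bytes that are not valid UTF-8.
pub fn copy_c_string(s: *const c_char) -> Option<String> {
    if s.is_null() {
        return None;
    }

    // Assumption: `s` points to a NUL-terminated string that stays valid
    // and unmodified for the duration of this call.
    let c_str = unsafe { CStr::from_ptr(s) };

    // `to_str` only borrows from the C buffer; `to_owned` copies the
    // bytes into memory owned by Rust.
    c_str.to_str().ok().map(|borrowed| borrowed.to_owned())
}
```

Once the copy has been made, the caller is free to release or reuse its buffer, at the cost of one allocation per call.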

C

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

extern uint32_t
how_many_characters(const char *str);

int main(void) {
  uint32_t count = how_many_characters("göes to élevên");
  printf("%" PRIu32 "\n", count);
}

The C code declares the function to accept a pointer to a constant string, as the Rust function will not modify it. You can then call the function with a normal C string constant.

Ruby

# coding: utf-8
require 'ffi'

module StringArguments
  extend FFI::Library
  ffi_lib 'string_arguments'
  attach_function :how_many_characters, [:string], :uint32
end

puts StringArguments.how_many_characters("göes to élevên")

The FFI gem automatically converts Ruby strings to the appropriate C string.

Python

#!/usr/bin/env python3
# coding: utf-8

import sys, ctypes
from ctypes import c_uint32, c_char_p

prefix = {'win32': ''}.get(sys.platform, 'lib')
extension = {'darwin': '.dylib', 'win32': '.dll'}.get(sys.platform, '.so')
lib = ctypes.cdll.LoadLibrary(prefix + "string_arguments" + extension)

lib.how_many_characters.argtypes = (c_char_p,)
lib.how_many_characters.restype = c_uint32

print(lib.how_many_characters("göes to élevên".encode('utf-8')))

Python strings must be encoded as UTF-8 bytes before they can be passed across the FFI boundary.

Haskell

{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Word (Word32)
import Foreign.C.String (CString, newCString)

foreign import ccall "how_many_characters"
  how_many_characters :: CString -> Word32

main :: IO ()
main = do
  str <- newCString "göes to élevên"
  print (how_many_characters str)

The Foreign.C.String module has support for converting Haskell’s string representation to C’s packed-byte representation. We can create one with the newCString function, and then pass the CString value to our foreign call.

Node.js

const ffi = require('ffi-napi');

const lib = ffi.Library('libstring_arguments', {
  how_many_characters: ['uint32', ['string']],
});

console.log(lib.how_many_characters('göes to élevên'));

The ffi-napi package automatically converts JavaScript strings to the appropriate C strings.

C#

using System;
using System.Runtime.InteropServices;

class StringArguments
{
    [DllImport("string_arguments", EntryPoint="how_many_characters")]
    public static extern uint HowManyCharacters(string s);

    static public void Main()
    {
        var count = StringArguments.HowManyCharacters("göes to élevên");
        Console.WriteLine(count);
    }
}

Native strings are automatically marshalled to C-compatible strings.

Julia

#!/usr/bin/env julia
using Libdl

libname = "string_arguments"
if !Sys.iswindows()
    libname = "lib$(libname)"
end

libstring_arguments = Libdl.dlopen(libname)
howmanycharacters_sym = Libdl.dlsym(libstring_arguments, :how_many_characters)

howmanycharacters(s::AbstractString) = ccall(
    howmanycharacters_sym,
    UInt32, (Cstring,),
    s)

println(howmanycharacters("göes to élevên"))

Julia strings (of base type AbstractString) are automatically converted to C strings. The Cstring type from Julia is compatible with the Rust type CStr, as it also assumes a NUL terminator byte and does not allow NUL bytes embedded in the string.