Jake Goulding

A final dip into Ruby's Marshal format

This is the third and last of my posts about the Marshal format. The first part introduced the format and some straightforward serializations. The second part touched on strings and object links. This post rounds things off with regexes, classes, modules, and instances of objects.

Regexes

/hello/

0408 492f 0a68 656c 6c6f 0006 3a06 4546

Like strings, regexes are wrapped in an IVAR. The typecode 2f is ASCII / and denotes that this object is a regex. The length of the regex source follows, again encoded as an integer. The source itself is stored as raw bytes and must be interpreted with the string encoding from the IVAR. After the source, the regex options are saved as a single byte.
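To see these fields on a live dump, a small sketch like the one below may help. The variable names are my own, and the fixed offsets only hold for this particular regex; the one assumption beyond the text above is that Marshal stores small positive integers (1 through 122) as the value plus 5, which is why the length byte is 0a for a five-byte source.

```ruby
# Dump the regex and pick out the fields described above.
dumped = Marshal.dump(/hello/)

version  = dumped[0, 2]            # major and minor version: 04 08
ivar     = dumped[2]               # "I", the IVAR wrapper
typecode = dumped[3]               # "/", the regex typecode
length   = dumped.getbyte(4) - 5   # small integers are stored as n + 5
source   = dumped[5, length]       # raw bytes of the regex source
options  = dumped.getbyte(5 + length) # option byte, 0 when no flags are set

# Print the whole dump as hex to compare with the bytes shown above.
puts dumped.unpack1('H*')
```

The remaining bytes after the option byte belong to the IVAR and carry the encoding, as covered in the previous post.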

/hello/imx

0408 492f 0a68 656c 6c6f 0706 3a06 4546

The regex option byte is a bitset of the five possible options. In this example, ignore case (i), extended (x), and multiline (m) are set (0x1, 0x2, and 0x4 respectively), which gives the byte 07.
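The three bits used here correspond to constants on Regexp, so the option byte can be decoded with a few bitwise tests. This is only a sketch: `flags_in` is a hypothetical helper, it ignores the two remaining options, and the offset 10 is specific to this regex.

```ruby
# Hypothetical helper: list the i, x, m flags set in an option byte.
def flags_in(option_byte)
  names = {
    Regexp::IGNORECASE => 'i', # 0x1
    Regexp::EXTENDED   => 'x', # 0x2
    Regexp::MULTILINE  => 'm', # 0x4
  }
  names.select { |bit, _| (option_byte & bit) != 0 }.values.join
end

# The option byte sits right after the five bytes of "hello".
option_byte = Marshal.dump(/hello/imx).getbyte(10)

puts format('0x%02x', option_byte)
puts flags_in(option_byte)
```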

Classes

String

0408 630b 5374 7269 6e67

The typecode 63 is ASCII c and denotes that this object is a class. Next come the length of the class name, encoded as an integer, and then the class name itself.

Math::DomainError

0408 6316 4d61 7468 3a3a 446f 6d61 696e 4572 726f 72

Namespaces are separated by ::.
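Because only the name is written, loading a dumped class amounts to looking that name up again. A rough sketch, using my own variable names and assuming the name is short enough (1 to 122 bytes) for the length to fit in a single byte stored as the value plus 5:

```ruby
dumped = Marshal.dump(Math::DomainError)

typecode = dumped[2]               # "c", the class typecode
length   = dumped.getbyte(3) - 5   # single-byte length, stored as n + 5
name     = dumped[4, length]       # full class path, namespaces included

# Loading resolves the stored name back to the existing class object.
loaded = Marshal.load(dumped)

puts name
puts loaded.equal?(Math::DomainError)
```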

Modules

Enumerable

0408 6d0f 456e 756d 6572 6162 6c65

Modules are identical to classes, except the typecode 6d is ASCII m.
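One way to convince yourself of this is to build the bytes by hand and compare them with what Marshal produces. The sketch below assumes a name of 1 to 122 bytes, so that the length fits in one byte stored as the value plus 5; `build_dump` is a hypothetical helper, not part of Marshal.

```ruby
# Hypothetical helper: version bytes, typecode, length + 5, then the name.
# Only valid for names of 1 to 122 bytes.
def build_dump(typecode, name)
  "\x04\x08".b + typecode + (name.length + 5).chr + name
end

module_dump = build_dump('m', 'Enumerable')
class_dump  = build_dump('c', 'String')

puts module_dump == Marshal.dump(Enumerable)
puts class_dump == Marshal.dump(String)
```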

Instances of user objects

Let’s define a small class to test with.

class DumpTest
  def initialize(a)
    @a = a
  end
end

DumpTest.new(nil)

0408 6f3a 0d44 756d 7054 6573 7406 3a07 4061 30

The typecode 6f is ASCII o, and denotes that this is an object. The class name is next, written as a symbol, :DumpTest. The number of instance variables is encoded as an integer, followed by that many name/value pairs, where each name is a symbol. This example has one instance variable, so there is a single pair: [:@a, nil].
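The layout is simple enough to walk by hand. The reader below is a minimal sketch rather than a real parser: `read_int` and `read_symbol` are hypothetical helpers, integers are assumed to be in the 1 to 122 range (stored as the value plus 5), and nil (typecode 30, ASCII 0) is the only value it understands.

```ruby
require 'stringio'

class DumpTest
  def initialize(a)
    @a = a
  end
end

io = StringIO.new(Marshal.dump(DumpTest.new(nil)))
io.read(2) # skip the version bytes, 04 08

# Hypothetical helpers that cover only what this example needs.
read_int = -> { io.getbyte - 5 } # valid for 1..122 only
read_symbol = lambda do
  raise 'expected a symbol' unless io.read(1) == ':'
  io.read(read_int.call).to_sym
end

typecode   = io.read(1)        # "o"
class_name = read_symbol.call  # the class name symbol
ivar_count = read_int.call     # number of instance variables

ivars = ivar_count.times.map do
  name = read_symbol.call
  raise 'only nil is handled in this sketch' unless io.read(1) == '0'
  [name, nil]
end

p class_name
p ivars
```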