D-Bus server side library wish list

Ideally, a D-Bus server side library would:

  1. Fully implement the functionality needed for common interfaces (Properties, ObjectManager, Introspectable) in a sane and easy way, without requiring you to manually supply the interface XML.
  2. Allow you to register a number of objects simultaneously, so that cases like circular references can be handled.  This avoids race conditions on the client.
  3. Auto-generate signals when object state changes, and represent the state of the object separately for each interface on each object.
  4. Provide freeze/thaw on objects or the whole model to minimize the number of signals, without requiring the user to manually code up signal emission.
  5. Offer a configurable process/thread model and be thread safe.
  6. Incrementally and dynamically add/remove an interface on an object without destroying and re-creating the object, while incrementally adding/removing the associated state as well.
  7. Handle object path construction issues, like having an object that needs an object path to another object that doesn’t yet exist.  This is alleviated if you have #8.
  8. Allow creating one or more objects without actually registering them with the service, so you can handle issues like #7 more easily; this is also needed for #2.  Thus you create one or more objects and register them together.
  9. Not require the use of code-generating tools.
  10. Allow you to have multiple paths/namespaces which refer to the same objects.  This would be useful for services that implement functionality in other services without requiring the clients to change.
  11. Allow you to remove a D-Bus object from the model while you are processing a method on it, eg. a client calling a remove method on the object it wants removed.
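
For a concrete point of reference on item 1: dbus-python, for example, already generates the Introspectable XML from decorators rather than making you write it by hand; the wish is for Properties and ObjectManager to get the same treatment. A minimal sketch of that XML-free style (the bus name, object path and interface are made up for illustration):

import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

DBusGMainLoop(set_as_default=True)


class Example(dbus.service.Object):
    # Introspection XML for this interface is generated by dbus-python;
    # nothing is hand written.
    @dbus.service.method(dbus_interface='com.example.Demo',
                         in_signature='s', out_signature='s')
    def Echo(self, text):
        return text


bus = dbus.SessionBus()
name = dbus.service.BusName('com.example.Demo', bus)
obj = Example(bus, '/com/example/Demo')
GLib.MainLoop().run()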

How-to Stratis storage

Introduction

Stratis (which includes stratisd as well as stratis-cli) provides ZFS/Btrfs-style features by integrating layers of existing technology: Linux’s devicemapper subsystem and the XFS filesystem. The stratisd daemon manages collections of block devices and exports a D-Bus API. stratis-cli is a command-line tool which itself uses the D-Bus API to communicate with stratisd.
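
As a quick illustration of that D-Bus side (not required for the how-to below), a few lines of Python can dump everything stratisd exposes. The service name and object path here are my assumption for the stratisd version shown; check with busctl list if they differ on your system:

import dbus

# Assumed well-known name and top object path for stratisd (verify locally).
STRATIS_BUS_NAME = 'org.storage.stratis1'
STRATIS_TOP_OBJECT = '/org/storage/stratis1'

bus = dbus.SystemBus()
top = bus.get_object(STRATIS_BUS_NAME, STRATIS_TOP_OBJECT)
object_manager = dbus.Interface(top, 'org.freedesktop.DBus.ObjectManager')

# Every pool, filesystem and block device stratisd manages shows up here.
for path, interfaces in object_manager.GetManagedObjects().items():
    print('%s -> %s' % (path, ', '.join(interfaces.keys())))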

1. Installation

# dnf install stratisd stratis-cli

2. Start the service

# systemctl start stratisd
# systemctl enable stratisd
Created symlink /etc/systemd/system/sysinit.target.wants/stratisd.service →
/usr/lib/systemd/system/stratisd.service.

3. Locate an empty block device. You can use tools like lsblk and blkid to find one, eg.

# lsblk
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda      8:0    0   28G  0 disk 
    ├─sda1   8:1    0    1G  0 part /boot
    ├─sda2   8:2    0  2.8G  0 part [SWAP]
    └─sda3   8:3    0   15G  0 part /
    sdb      8:16   0    1T  0 disk  

# blkid -p /dev/sda
  /dev/sda: PTUUID="b7168b63" PTTYPE="dos"

Not empty

# blkid -p /dev/sdb

Empty

4. Create a Stratis pool

# stratis pool create stratis_howto /dev/sdb
# stratis pool list
  Name             Total Physical Size  Total Physical Used
  stratis_howto                  1 TiB               52 MiB

5. Create a file system from the pool

# stratis filesystem create stratis_howto fs_howto
# stratis filesystem list
Pool Name      Name      Used     Created            Device                               
stratis_howto  fs_howto  546 MiB  Nov 02 2018 07:55  /stratis/stratis_howto/fs_howto

6. Mount FS

# mount /stratis/stratis_howto/fs_howto /mnt

7. Add mount point to /etc/fstab using file system UUID

# blkid -p /stratis/stratis_howto/fs_howto
/stratis/stratis_howto/fs_howto: UUID="a38780e5-04e3-49da-8b95-2575d77e947c" TYPE="xfs" USAGE="filesystem"

# echo "UUID=a38780e5-04e3-49da-8b95-2575d77e947c /mnt xfs defaults 0 0" >> /etc/fstab

Debugging gobject reference count problems

Debugging gobject reference leaks can be difficult, very difficult according to the official documentation.  If you google this subject you will find numerous hits.  A tool called RefDbg was created to address this specific need.  However, it appears to have lost its effectiveness because (taken from its docs):

Beginning with glib 2.6 the default build does not allow functions to
be overridden using LD_PRELOAD (for performance reasons).  This is the
method that refdbg uses to intercept calls to the GObject reference
count related functions and therefore refdbg will not work.  The
solution is to build glib 2.6.x with the '--disable-visibility'
configure flag or use an older version of glib (<= 2.4.x).  Its often
helpful to build a debug version of glib anyways to get more useful
stack back traces.

I actually tried the tool and got the error “LD_PRELOAD function override not working. Need to build glib with --disable-visibility?”, so I kept looking.  Another post led me to investigate using systemtap, which seemed promising, so I looked into it further.  This approach eventually got me the information I needed to find and fix reference count problems.  I’m outlining what I did in hopes that it will be beneficial to others as well.  For the record, I used this approach on storaged, but it should work for any code base that uses gobjects.

The overall approach is:

  1. Identify what is leaking
  2. For each object that is leaking in step 1, collect the needed information to help figure out why

Seems simple…

Step 1, identification

Step 1 is pretty much straight out of my initial information gathering on this topic and using systemtap.  This is my version of the systemtap file I used to figure out what was leaking.

global alive
probe gobject.object_new {
    alive[type]++
    printf ("CREATED:%s:%d:%p\n", type, tid(), object)
    print_ubacktrace_brief()
    print("\n")
}

probe gobject.object_finalize {
    alive[type]--
    printf ("DELETED:%s:%d:%p\n", type, tid(), object)
    print_ubacktrace_brief()
    print("\n")
}

probe end {
    printf ("Alive objects: \n")
    foreach (a in alive) {
        if (alive[a] > 0)
            printf ("%d\t%s\n", alive[a], a)
    }
}

To utilize this you need to run your application via systemtap, eg.

stap /home/tasleson/alive_stack.stp \
    -d /usr/lib64/libgio-2.0.so.0.4800.2 \
    -d /usr/lib64/libglib-2.0.so.0.4800.2 \
    -d /usr/lib64/libgudev-1.0.so \
    -d /usr/lib64/libc-2.23.so \
    -d /usr/lib64/libpthread-2.23.so \
    -d /usr/lib64/libpolkit-gobject-1.so \
    -d udisks/.libs/libudisks2.so \
    -d modules/lvm2/.libs/libudisks2_lvm2.so \
    -d modules/libudisks2_lsm.so \
    -d modules/libudisks2_btrfs.so \
    -d modules/libudisks2_iscsi.so \
    -d modules/libudisks2_bcache.so \
    -d modules/libudisks2_zram.so \
    -d src/.libs/lt-udisksd \
    -c "src/udisksd -r --uninstalled commandline --force-load-modules" -o /tmp/identify_leaks.txt

In this example I’m dumping the output to a file /tmp/identify_leaks.txt.  If you look at the output you will see something like:

...

CREATED:GParamObject:16451:0x7f028c005660
g_type_create_instance+0x20b
g_param_spec_internal+0x1e3
g_param_spec_object+0x72
g_socket_connection_class_intern_init+0xe7
g_type_class_ref+0xa37
g_type_class_ref+0x2e5
g_object_new_valist+0x4ac
g_object_new+0xf1
g_socket_client_connect+0x140
g_dbus_address_try_connect_one+0x1fa
g_dbus_address_get_stream_sync+0xf8
initable_init+0xe7
async_init_thread+0x2e
g_task_thread_pool_thread+0x5d
g_thread_pool_thread_proxy+0x9e
g_thread_proxy+0x55
start_thread+0xca
clone+0x6d

CREATED:GUnixConnection:16451:0x7f028c0056f0
g_type_create_instance+0x20b
g_object_new_internal+0x2cb
g_object_new_valist+0x44e
g_object_new+0xf1
g_socket_client_connect+0x140
g_dbus_address_try_connect_one+0x1fa
g_dbus_address_get_stream_sync+0xf8
initable_init+0xe7
async_init_thread+0x2e
g_task_thread_pool_thread+0x5d
g_thread_pool_thread_proxy+0x9e
g_thread_proxy+0x55
start_thread+0xca
clone+0x6d

DELETED:GSocketAddressAddressEnumerator:16451:0x13fc660
g_object_unref+0x1a0
g_socket_client_connect+0x375
g_dbus_address_try_connect_one+0x1fa
g_dbus_address_get_stream_sync+0xf8
initable_init+0xe7
async_init_thread+0x2e
g_task_thread_pool_thread+0x5d
g_thread_pool_thread_proxy+0x9e
g_thread_proxy+0x55
start_thread+0xca
clone+0x6d

DELETED:GSocketClient:16451:0x1406a20
g_object_unref+0x1a0
g_dbus_address_try_connect_one+0x211
g_dbus_address_get_stream_sync+0xf8
initable_init+0xe7
async_init_thread+0x2e
g_task_thread_pool_thread+0x5d
g_thread_pool_thread_proxy+0x9e
g_thread_proxy+0x55
start_thread+0xca
clone+0x6d

...

Alive objects: 
53      GParamBoolean
1       GTask
50      GParamObject
58      GParamString
6       GParamFlags
1       GDBusConnection
12      GParamEnum
7       GParamBoxed
1       GUnixSocketAddress
10      GParamUInt
12      GParamInt
1       GSocket
1       GUnixConnection
1       GDBusAuth
1       GSocketInputStream
1       GSocketOutputStream
1       GCancellable
1       GDBusMessage
7       GIOModule
1       GLocalVfs
127     GParamOverride
1       GInotifyFileMonitor
1       GUdevClient
1       UDisksLinuxProvider
1       UDisksObjectSkeleton
1       UDisksLinuxManager
1       UDisksLinuxManagerBTRFS
1       UDisksLinuxManagerZRAM
1       UDisksLinuxManagerLSM
1       UDisksLinuxManagerISCSIInitiator
1       UDisksLinuxManagerBcache
1       UDisksLinuxManagerLVM2
1556    GUdevDevice
1556    UDisksLinuxDevice
17      UDisksLinuxDriveObject
3       GParamVariant
16      GParamUInt64
17      UDisksLinuxDrive
3       GParamDouble
1       GParamInt64
17      UDisksLinuxDriveAta
20      UDisksLinuxBlockObject
20      UDisksLinuxBlock
2       UDisksLinuxPartitionTable
3       UDisksLinuxFilesystem
3       UDisksLinuxPartition
1       UDisksLinuxSwapspace
6       UDisksLinuxPhysicalVolume
107     UDisksLinuxLogicalVolumeObject
107     UDisksLinuxLogicalVolume

The end summary shows a list of objects that are still present in memory when the process exits.  In this particular case we can see that something looks wrong, as we have 1556 instances of GUdevDevice and UDisksLinuxDevice, which is unexpected.  At this point we have an idea of what’s leaking, where the objects were created and where some were destroyed, but not much else.  What’s really needed is a full accounting of every place an object reference gets incremented and decremented.

Step 2, getting more detailed information

To debug a reference counting leak you ultimately need to identify:

  • Where an object was created
  • Each place a reference count was incremented and decremented

My first approach was to edit my above systemtap file and simply dump all of this, eg:

global alive
probe gobject.object_new {
    alive[type]++
    printf ("CREATED:%s tid=%d\n", type, tid())
    print_ubacktrace_brief()
    print("\n")
}

probe gobject.object_ref {
    printf("INC REF:%s\n", type)
    print_ubacktrace_brief()
    print("\n")
}

probe gobject.object_unref {
    printf("DEC REF:%s\n", type)
    print_ubacktrace_brief()
    print("\n")
}

probe gobject.object_finalize {
    alive[type]--
    printf ("DELETED:%s tid=%d\n", type, tid())
    print_ubacktrace_brief()
    print("\n")
}
probe end {
    printf ("Alive objects: \n")
    foreach (a in alive) {
        if (alive[a] > 0)
            printf ("%d\t%s\n", alive[a], a)
    }
}

But this failed, as it dumps far too much information, causing errors to be thrown by systemtap. So instead I changed it to be specific:

global alive
global leaking = "GUdevDevice"

probe gobject.object_new {
    if (type == leaking) {
        alive[type]++
        printf ("CREATED:%s:%d:%p\n", type, tid(), object)
        print_ubacktrace_brief()
        print("\n")
    }
}

probe gobject.object_finalize {
    if (type == leaking) {
        alive[type]--
        printf ("DELETED:%s:%d:%p\n", type, tid(), object)
        print_ubacktrace_brief()
        print("\n")
    }
}

probe gobject.object_ref {
    if (type == leaking) {
        printf ("INC:%s:%d:%p\n", type, tid(), object)
	print_ubacktrace_brief()
        print("\n")
    }
}

probe gobject.object_unref {
    if (type == leaking) {
        printf ("DEC:%s:%d:%p\n", type, tid(), object)
	print_ubacktrace_brief()
        print("\n")
    }
}

probe end {
    printf ("Alive objects: \n")
    foreach (a in alive) {
        if (alive[a] > 0)
            printf ("%d\t%s\n", alive[a], a)
    }
}

However, even with all this data things weren’t totally clear; there was a ton of data to look at. I spent time looking at back traces and eventually realized that some leaking objects were created via multiple different code paths. How could I focus my attention on the correct ones? To help with this I wrote a python script that took the output of the first systemtap script and summarized it:

#!/usr/bin/env python

import sys
import hashlib

# Return a list of tuples (symbol, addr) for the stack trace
def process_stack(s):
    rc = []
    sl = s.readline()

    while sl:
        if len(sl):
            sl = sl.rstrip()
            if '+' in sl:
                (sym, addr) = sl.split('+')
                rc.append((sym, addr))
            else:
                return rc

        sl = s.readline()
    return rc


def gen_report(stap_file):

    gobjects = {}

    with open(stap_file, 'r') as f:

        line = f.readline()
        while line:
            if len(line):
                line = line.rstrip()
                # Skip the summary to the end
                if line == 'Alive objects:':
                    return gobjects

                if ':' in line:
                    try:
                        # We have a CREATE|DELETE
                        (op, obj_type, tid, addr) = line.split(':')

                        st = process_stack(f)

                        if op == 'CREATED':
                            key = "%s:%s" % (obj_type, addr)
                            assert key not in gobjects
                            gobjects[key] = (obj_type, tid, addr, st)

                        else:
                            # Object deleted
                            key = "%s:%s" % (obj_type, addr)
                            assert key in gobjects
                            if key in gobjects:
                                del gobjects[key]

                    except ValueError as e:
                        print('line in error= "%s"' % line)
                        raise e

            line = f.readline()
    return gobjects


def signature(st):
    m = hashlib.md5()
    for i in st:
        m.update(i[0].encode('utf-8'))
    return m.hexdigest()


def leak_count(h):
    count = 0
    for v in h.values():
        count += v['count']
    return count


def summarize(s):
    # We will store a hash of hashes, where first level is object type and
    # second is signature of stack trace
    t = {}

    for r in s.values():
        (obj_type, tid, addr, st) = r
        st_sig = signature(st)

        if obj_type in t:
            if st_sig in t[obj_type]:
                # We've seen this st before, lets increment count
                t[obj_type][st_sig]['count'] += 1
            else:
                t[obj_type][st_sig] = dict(count=1, meta=st)
        else:
            t[obj_type] = dict()
            t[obj_type][st_sig] = dict(count=1, meta=st)

    # Lets build a list of keys sorted by who leaks the most
    items = [(leak_count(v), k) for k, v in t.items()]
    items.sort()
    items.reverse()


    for i in items:
        # Lets dump what we have
        v = t[i[1]]
        for values in v.values():
            print('LEAK %s %d times (%d total diff. ways)' % (i[1], values['count'], len(v)))
            for s in values['meta']:
                print("\t%s:%s" % (s[0], s[1]))

    items = [(len(v), k) for k, v in t.items()]
    items.sort()
    items.reverse()
    print('Number of unique stack traces for a given type that leaks')
    for i in items:
        print('%d: %s' % (i[0], i[1]))


if __name__ == '__main__':
    result = gen_report('/tmp/identify_leaks.txt')
    summarize(result)

The python script ultimately gives us the ability to determine how many different code paths exist for the creation of an object and which path leaks the most. Some sample output:

LEAK UDisksLinuxDevice 1556 times (1 total diff. ways)
        g_type_create_instance:0x20b
        g_object_new_internal:0x2cb
        g_object_newv:0x1dd
        g_object_new:0x104
LEAK GUdevDevice 40 times (2 total diff. ways)
        g_type_create_instance:0x20b
        g_object_new_internal:0x2cb
        g_object_newv:0x1dd
        g_object_new:0x104
        _g_udev_device_new:0x1b
        g_udev_client_query_by_subsystem:0xce
LEAK GUdevDevice 1516 times (2 total diff. ways)
        g_type_create_instance:0x20b
        g_object_new_internal:0x2cb
        g_object_newv:0x1dd
        g_object_new:0x104
        _g_udev_device_new:0x1b
        monitor_event:0x29
        g_main_context_dispatch:0x15a
        g_main_context_iterate.isra.29:0x1f0
        g_main_loop_run:0xc2
LEAK GParamOverride 1 times (14 total diff. ways)
        g_type_create_instance:0x20b
        g_param_spec_internal:0x1e3
        g_param_spec_override:0x81
        g_object_class_override_property:0xde
        udisks_partition_table_override_properties:0x2d
        udisks_partition_table_skeleton_class_init:0x6e
        udisks_partition_table_skeleton_class_intern_init:0x48
        g_type_class_ref:0xa37
        g_type_class_ref:0x2e5
        g_object_newv:0x278
        g_object_new:0x104

...

Number of unique stack traces for a given type that leaks
23: GParamObject
18: GParamString
18: GParamBoolean
14: GParamOverride
7: GParamUInt64
7: GParamEnum
7: GParamBoxed
6: GParamInt
5: GParamUInt
5: GParamFlags
3: GParamVariant
2: GUdevDevice
2: GParamDouble
1: UDisksObjectSkeleton
1: UDisksLinuxSwapspace
...

We suspected that the biggest leaker was GUdevDevice, and based on this information we can see that there are two distinct creation paths which leak, with one of them accounting for the vast majority (lucky for me).

LEAK GUdevDevice 40 times (2 total diff. ways)
        g_type_create_instance:0x20b
        g_object_new_internal:0x2cb
        g_object_newv:0x1dd
        g_object_new:0x104
        _g_udev_device_new:0x1b
        g_udev_client_query_by_subsystem:0xce
LEAK GUdevDevice 1516 times (2 total diff. ways)
        g_type_create_instance:0x20b
        g_object_new_internal:0x2cb
        g_object_newv:0x1dd
        g_object_new:0x104
        _g_udev_device_new:0x1b
        monitor_event:0x29
        g_main_context_dispatch:0x15a
        g_main_context_iterate.isra.29:0x1f0
        g_main_loop_run:0xc2

It’s interesting how many different code paths exist for some of the leaking objects: 23 different code paths, oh fun!  With this new information we can look at the output from the second, more specific systemtap script, find an object instance whose creation matches the code path we selected from the summary, and then examine all the places where that object’s reference count was incremented and decremented.  An example (back traces omitted for brevity):

CREATED:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
DEC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
DEC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
DEC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
DEC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
INC:GUdevDevice:11766:0x7f82cc005470
DEC:GUdevDevice:11766:0x7f82cc005470
DEC:GUdevDevice:11766:0x7f82cc005470
DEC:GUdevDevice:11766:0x7f82cc005470

At this point I wish I could share some magical way to identify the exact places where we are missing a reference count decrement and fix the code, but unfortunately you just end up looking at a lot of code and evaluating where an increment occurred but the matching decrement is missing.  Sometimes it will be easy because the reference’s lifespan is short, but in other cases the span is long and crosses thread boundaries.  This is where it would be good for developers to comment on the intended life cycle of the object in question when the code is being written.
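
One small mechanical aid: since the second systemtap script tags every CREATED/INC/DEC/DELETED line with the object address, a few lines of Python can tally the net reference count per instance so you can jump straight to the ones that never reach zero. This is a rough sketch of the kind of helper I mean (it assumes the second script’s output was saved to a file passed as the first argument), not part of the original tooling:

#!/usr/bin/env python
# Tally the CREATED/INC/DEC/DELETED records emitted by the second systemtap
# script and report object instances whose reference count never hit zero.
import sys
from collections import defaultdict

counts = defaultdict(int)

with open(sys.argv[1]) as f:
    for line in f:
        parts = line.strip().split(':')
        if len(parts) != 4:
            continue                # back trace lines, blank lines, summary
        op, obj_type, _tid, addr = parts
        key = (obj_type, addr)
        if op in ('CREATED', 'INC'):
            counts[key] += 1        # g_object_new starts the count at 1
        elif op == 'DEC':
            counts[key] -= 1
        elif op == 'DELETED':
            counts.pop(key, None)   # finalized cleanly, not interesting

for (obj_type, addr), net in sorted(counts.items()):
    if net > 0:
        print('%s %s still holds %d reference(s)' % (obj_type, addr, net))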

Final thoughts

Fixing gobject reference count leaks is indeed very difficult, even more so when you are working on code that others developed.  If you find yourself writing reference counting code with non-trivial lifespans, please do others a favor and document the intended lifespan.  If you can’t document it, that’s a pretty good indication that you don’t fully understand it, and the code will likely be plagued with leaks and/or segmentation faults during its lifetime.


Security considerations with GitHub continuous integration

Continuous integration (CI) support in GitHub is a very useful addition.  Not only can you utilize existing services like Travis CI, you can also use the GitHub API and roll your own, which is exactly what we did for libStorageMgmt.  LibStorageMgmt needs to run tests for hardware-specific plugins, so we created our own tooling to hook up GitHub and our hardware, which is geographically distributed across the US.  However, shortly after getting all this in place and working, it became pretty obvious that we had provided a nice attack vector…

All it takes is a malicious user to fork our repository, add some malicious code and submit a pull request, and *boom*!  We just executed whatever code they wanted on each of our CI test nodes.  The worst part is that some of our tests require the node running them to have root authority, so total box ownership is assured.  The test nodes are stripped-down boxes with nothing of value on them data-wise, but just providing hardware and network resources on which to execute arbitrary code is indeed valuable.

Our solution was to limit automated CI testing to a whitelist of people we trust.  I don’t envy Travis CI and all the security issues they have to consider.  They are basically providing a service which allows virtually anyone to execute arbitrary code on their boxes.
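
For anyone rolling their own, the gate can be as simple as refusing to schedule a job unless the pull request author is on the allow list. A hypothetical sketch against the GitHub REST API (the repository, user names and PR number are placeholders, and real tooling would also need to re-check on every new push to the PR):

import json
try:
    from urllib.request import urlopen     # Python 3
except ImportError:
    from urllib2 import urlopen            # Python 2

TRUSTED = {'maintainer1', 'maintainer2'}    # placeholder whitelist


def pr_is_trusted(owner, repo, pr_number):
    # Unauthenticated request to the public PR metadata; use a token for
    # private repositories or to avoid rate limiting.
    url = 'https://api.github.com/repos/%s/%s/pulls/%d' % (owner, repo, pr_number)
    pr = json.loads(urlopen(url).read().decode('utf-8'))
    return pr['user']['login'] in TRUSTED


if __name__ == '__main__':
    if not pr_is_trusted('someorg', 'somerepo', 42):
        raise SystemExit('PR author not whitelisted; refusing to run tests')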

None of this is terribly new, just potentially new to those writing their own CI service.


D-Bus signaling performance

While working on lvm-dubstep the question was posed whether D-Bus could handle the number of changes that could happen in a short period of time, especially PropertiesChanged signals when a large number of logical volumes or physical volumes are present on the system (eg. 120K PVs and 10K+ LVs).  To test this I put together a simple server and client which simply try to send an arbitrary number of signals as fast as possible.  The number settled upon was 10K, because during early testing I was running into time-out exceptions when trying to send more in a row.  Initial testing was done using the dbus-python library, and even though the numbers seemed sufficient, people asked about sd-bus and sd-bus utilizing kdbus, so the experiment was expanded to include these as well.  Source code for the testing is available here.
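
The actual test code is linked above; just to make the shape of the experiment concrete, the dbus-python sending side boils down to something like the sketch below. The bus name, object path, interface and payload size are illustrative, not the real test parameters:

import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop

DBusGMainLoop(set_as_default=True)


class Emitter(dbus.service.Object):
    @dbus.service.signal(dbus_interface='com.example.SignalPerf',
                         signature='us')
    def Blast(self, sequence, payload):
        pass    # the decorated body is empty; calling it emits the signal


bus = dbus.SessionBus()
name = dbus.service.BusName('com.example.SignalPerf', bus)
emitter = Emitter(bus, '/com/example/SignalPerf')

payload = 'x' * 1024            # vary to test different payload sizes
for i in range(10000):
    emitter.Blast(i, payload)
bus.flush()                     # make sure everything is pushed out

The client side is essentially bus.add_signal_receiver() plus a GLib main loop, timing the arrivals.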


Test configuration

  • One client listening for signals.
  • Python tests were run on a Fedora 22 VM.
  • Sd-bus was run on a similarly configured VM running Fedora 23 alpha utilizing systemd 222-2.  Kdbus was built as a kernel module from the latest systemd/kdbus repo.  F23 kernel version: 4.2.0-0.rc8.git0.1.fc23.x86_64.
  • I tried running all the tests on the F23 VM, but the python tests were failing horribly with journal entries: Sep 01 10:53:14 sdbus systemd-bus-proxyd[663]: Dropped messages due to queue overflow of local peer (pid: 2615 uid: 0)


The results:

  • Python utilizing the dbus-python library was able to send and receive 10K signals in a row, varying the payload size from 32 bytes to 128K, without any issues.  As I mentioned before, if I try to send more than that in a row, especially at larger payload sizes, I do get time-outs on the send.
  • sd-bus without kdbus was only able to send payloads up to 512 bytes before the server would error out with: Failed to emit signal: No buffer space available.
  • sd-bus with kdbus completes the test, seemingly without error, but the number of signals received is not as expected and appears to vary with payload size.

Messages/second is the total number of messages divided by the total time to receive them.

[Chart: l_dbus_msg_sec]


MiB/Second is the messages per second multiplied by the payload size.

[Chart: l_dbus_mib_sec]


Average time delta is the time difference from when the signal was placed on the bus until it was read by the signal handler.  This shows quite a bit of buffering for the python test case at larger payload sizes.

[Chart: l_dbus_time_delta]


Percentage of signals that were received by the signal handler.  As you can see, once the payload is > 2048 bytes, kdbus appears to silently discard signals, as nothing appeared in kernel output and the return codes were all good in user space.

[Chart: l_dbus_percent_received]

Conclusions

  • The C implementation using sd-bus without kdbus performs slightly worse than dbus-python, which surprised me.
  • kdbus is by far the best performer with payloads < 2048 bytes.
  • Signals are by definition sent without a response.  Having a reliable way for the sender to know that a signal was not sent seems pretty important, so either kdbus has a bug in that no error is returned to the sender, or there is a bug in my test code.
  • The code likely has bugs and/or does something sub-optimal; please submit a pull request and I will incorporate the changes and re-run the tests.

libtool library versioning ( -version-info ‘current[:revision[:age]]’ )

The GNU libtool manual states:

The following explanation may help to understand the above rules a bit better: consider that there are three possible kinds of reactions from users of your library to changes in a shared library:

  1. Programs using the previous version may use the new version as drop-in replacement, and programs using the new version can also work with the previous one. In other words, no recompiling nor relinking is needed. In this case, bump revision only, don’t touch current nor age.
  2. Programs using the previous version may use the new version as drop-in replacement, but programs using the new version may use APIs not present in the previous one. In other words, a program linking against the new version may fail with “unresolved symbols” if linking against the old version at runtime: set revision to 0, bump current and age.
  3. Programs may need to be changed, recompiled, relinked in order to use the new version. Bump current, set revision and age to 0.

This was confusing to me until I played around with the -version-info value and looked at the library output on my Linux development system.


-version-info 0:0:0 => libfoo.so.0.0.0
-version-info 1:0:1 => libfoo.so.0.1.0

-version-info 1:0:0 => libfoo.so.1.0.0
-version-info 2:0:1 => libfoo.so.1.1.0
-version-info 2:1:1 => libfoo.so.1.1.1
-version-info 3:0:2 => libfoo.so.1.2.0

-version-info 4:0:0 => libfoo.so.4.0.0

The current:revision:age triple is an abstraction which allows libtool to do the correct thing for the platform you are building and deploying on; the mapping to on-disk file names is platform specific, which wasn’t immediately evident when reading the documentation.  On Linux, the pattern above works out to libfoo.so.(current - age).(age).(revision).
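
Written out as code, the rule that reproduces the table above on a Linux (ELF) system looks like this; it is just my restatement of the observed behaviour, not something libtool documents in these terms:

def linux_shared_lib_name(current, revision, age, base='libfoo'):
    # -version-info c:r:a produces lib<name>.so.(c - a).(a).(r) on Linux,
    # with the soname being lib<name>.so.(c - a).
    return '%s.so.%d.%d.%d' % (base, current - age, age, revision)

assert linux_shared_lib_name(2, 0, 1) == 'libfoo.so.1.1.0'
assert linux_shared_lib_name(3, 0, 2) == 'libfoo.so.1.2.0'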


Using google authenticator with OpenBSD SSH logins

For an introduction on two factor authentication please see: http://en.wikipedia.org/wiki/Two-step_verification


NOTE: Make sure you keep a terminal with root access open if you are working on a remote system, until you have tested that you can indeed authenticate to it!

Steps tested on OpenBSD 5.4; I used tools on an EL6 client to generate the QR code. I’m mainly documenting this here so I can remember how to do this again.

1. Install the login_oath package, eg.

# pkg_add -v http://mirror.planetunix.net/pub/OpenBSD/5.4/packages/i386/login_oath-0.8p1.tgz

Make sure to look over the readme: /usr/local/share/doc/pkg-readmes/login_oath-version

2. Add a new login class by editing /etc/login.conf, and change the user(s) to use it.

# The TOTPPW login class requires both TOTP and passwd; the user must
# supply the password in the format OTP/password, e.g. 816721/axlotl.

totppw:\
        :auth=-totp-and-pwd:\
        :tc=default:

Change the login class of the user(s), eg.

# usermod -L totppw username

3. Generate a random key in the user’s home directory

$ openssl rand -hex 20 > ~/.totp-key

Note: Make sure your home directory and this file are not world readable! Otherwise you will be prevented from logging in.

4. Convert hex string key to base32

The package documentation shows Perl; here is equivalent code in Python (pass the hex string from ~/.totp-key as the only argument).

import binascii
import base64
import sys

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('Syntax: %s <hex key>' % sys.argv[0])
        sys.exit(1)

    print(base64.b32encode(binascii.unhexlify(sys.argv[1])).decode('ascii'))

5. Generate a QR code so you don’t have to type in a big random sequence on your smart phone. Free web-based generators exist, but only use one if you trust it.

$ qrencode -o ~/.totp-key.png \
    "otpauth://totp/?secret=BASE32SECRET&issuer=Your name, etc."

More info: http://code.google.com/p/google-authenticator/wiki/KeyUriFormat

6. Using Google Authenticator, create a new software key by scanning your QR code. If you don’t want to create a QR code, then create a new entry, set it to time based and enter the base32 secret by hand. I’m guessing it may take a few tries to get it correct.

7. Test that Google Authenticator and oathtool --totp `cat ~/.totp-key` produce matching codes. If they don’t, make sure the date and time on both the phone and the computer match.
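
If you want a third opinion besides the phone and oathtool, TOTP with the defaults both of them use (SHA-1, 30 second time step, 6 digits) is only a few lines of Python. This is a sanity-check sketch, reading the hex key created in step 3:

import binascii
import hashlib
import hmac
import os
import struct
import time


def totp(hex_key, digits=6, step=30):
    # RFC 6238 with the defaults Google Authenticator and oathtool use.
    key = binascii.unhexlify(hex_key.strip())
    counter = struct.pack('>Q', int(time.time()) // step)
    mac = hmac.new(key, counter, hashlib.sha1).digest()
    offset = ord(mac[-1:]) & 0x0f
    code = struct.unpack('>I', mac[offset:offset + 4])[0] & 0x7fffffff
    return str(code % (10 ** digits)).zfill(digits)


if __name__ == '__main__':
    with open(os.path.expanduser('~/.totp-key')) as f:
        print(totp(f.read()))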

8. Using a separate terminal, ssh to the remote system with just the password; this should fail. Then try using OTP/password for your password and you should get authenticated.

Buffer bloat mitigation with OpenBSD pf

For an introduction to buffer bloat read more here http://en.wikipedia.org/wiki/Bufferbloat .

My home network utilizes OpenBSD and the built-in packet filter (pf). I use cable for broadband internet and found that if I tried to upload a large file my internet connection became very unstable, with high amounts of latency. After utilizing tools such as http://netalyzr.icsi.berkeley.edu/blog/ it became obvious that I was suffering from buffer bloat.

After some searching I came across the altq support in pf, and tried some configuration changes to reduce the buffer bloat in my setup.

I added a queue for my external network card with the following:

altq on $ext_if bandwidth 1Mb hfsc queue { bb }
queue bb bandwidth 100% qlimit 9 hfsc ( default )

and the corresponding rule to tag outgoing traffic for that interface to this queue

pass out on $ext_if keep state queue( bb )

My home connection is advertised as 5Mb down, 1Mb up. In testing I get about 1.1Mb up, so I set up my outgoing queue to limit outgoing traffic to 1Mb. A typical setting would be 97% of maximum. One of the most important values in the queue setup is the number of buffers (qlimit). Mine is currently set at 9. This is how I determined what the value should be.

Maximum upstream bandwidth in packets per second is upstream bandwidth in bytes per second divided by the size of a packet. In my case that is (1000000 / 8) bytes per second divided by 1460 bytes per packet, which yields about 85 packets a second. So if I set my queue limit to 85 I should see up to about 1 second of latency. I like my latency low, so I divided by 10 to target roughly 100ms of latency under full upstream use, which gives 8.5, rounded up to 9.
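
The same arithmetic written out (the numbers are the ones from my connection; plug in your own):

upstream_bits_per_sec = 1000000     # 1Mb/s advertised upstream
packet_size_bytes = 1460            # roughly one full-sized packet
target_latency_sec = 0.1            # aim for ~100ms of queueing delay

packets_per_sec = upstream_bits_per_sec / 8.0 / packet_size_bytes
qlimit = packets_per_sec * target_latency_sec

print('%.1f packets/sec -> qlimit of about %.1f' % (packets_per_sec, qlimit))
# 85.6 packets/sec -> qlimit of about 8.6, which I round up to 9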

So how does it work?

Average pings to slashdot.org in milliseconds

Idle connection:                 23.8   ( 0% packet loss)
Maxed upstream use:              95.1   ( 0% packet loss)
Maxed upstream without altq:   2083.4   (10% packet loss)

Quite the improvement!

Thanks to the information on https://calomel.org/pf_hfsc.html for helpful tips.