Blog entries

  • Mini-DebConf Paris 2012

    2012/11/29 by Julien Cristau

    Last week-end, I attended the mini-DebConf organized at EPITA (near Paris) by the French Debian association and sponsored by Logilab.

    http://www.logilab.org/file/112649?vid=download

    The event was a great success, with a rather large number of attendees, including people coming from abroad such as Debian kernel maintainers Ben Hutchings and Maximilian Attems, who talked about their work with Linux.

    Among the other speakers were Loïc Dachary about OpenStack and its packaging in Debian, and Josselin Mouette about his work deploying Debian/GNOME desktops in a large enterprise environment at EDF R&D.

    On my part I gave a talk on Saturday about Debian's release team, and the current state of the wheezy (to-be Debian 7.0) release.

    On Sunday I presented together with Vladimir Daric the work we did to migrate a computation cluster from Red Hat to Debian. Attendees had quite a few questions about our use of ZFS on Linux for storage, and salt for configuration management and deployment.

    Slides for the talks are available on the mini-DebConf web page (wheezy state, migration to debian cluster also viewable on slideshare), and videos will soon be on http://video.debian.net/.

    Now looking forward to next summer's DebConf13 in Switzerland, and hopefully next year's edition of the Paris event.


  • Openstack, Wheezy and ZFS on Linux

    2012/12/19 by David Douard

    Openstack, Wheezy and ZFS on Linux

    A while ago, I started the install of an OpenStack cluster at Logilab, so our developers can play easily with any kind of environment. We are planning to improve our Apycot automatic testing platform so it can use "elastic power". And so on.

    http://www.openstack.org/themes/openstack/images/open-stack-cloud-computing-logo-2.png

    I first tried a Ubuntu Precise based setup, since at that time, Debian packages were not really usable. The setup never reached a point where it could be relased as production ready, due to the fact I tried a too complex and bleeding edge configuration (involving Quantum, openvswitch, sheepdog)...

    Meanwhile, we went really short of storage capacity. For now, it mainly consists in hard drives distributed in our 19" Dell racks (generally with hardware RAID controllers). So I recently purchased a low-cost storage bay (SuperMicro SC937 with a 6Gb/s JBOD-only HBA) with 18 spinning hard drives and 4 SSDs. This storage bay being driven by ZFS on Linux (tip: the SSD-stored ZIL is a requirement to get decent performances). This storage setup is still under test for now.

    http://zfsonlinux.org/images/zfs-linux.png

    I also went to the last Mini-DebConf in Paris, where Loic Dachary presented the status of the OpenStack packaging effort in Debian. This gave me the will to give a new try to OpenStack using Wheezy and a bit simpler setup. But I could not consider not to use my new ZFS-based storage as a nova volume provider. It is not available for now in OpenStack (there is a backend for Solaris, but not for ZFS on Linux). However, this is Python and in fact, the current ISCSIDriver backend needs very little to make it work with zfs instead of lvm as "elastics" block-volume provider and manager.

    So, I wrote a custom nova volume driver to handle this. As I don't want the nova-volume daemon to run on my ZFS SAN, I wrote this backend mixing the SanISCSIDriver (which manages the storage system via SSH) and the standard ISCSIDriver (which uses standard Linux isci target tools). I'm not very fond of the API of the VolumeDriver (especially the fact that the ISCSIDriver is responsible for 2 roles: managing block-level volumes and exporting block-level volumes). This small design flaw (IMHO) is the reason I had to duplicate some code (not much but...) to implement my ZFSonLinuxISCSIDriver...

    So here is the setup I made:

    Infrastructure

    My OpenStack Essex "cluster" consists for now in:

    • one control node, running in a "normal" libvirt-controlled virtual machine; it is a Wheezy that runs:
      • nova-api
      • nova-cert
      • nova-network
      • nova-scheduler
      • nova-volume
      • glance
      • postgresql
      • OpenStack dashboard
    • one computing node (Dell R310, Xeon X3480, 32G, Wheezy), which runs:
      • nova-api
      • nova-network
      • nova-compute
    • ZFS-on-Linux SAN (3x raidz1 poools made of 6 1T drives, 2x (mirrored) 32G SLC SDDs, 2x 120G MLC SSDs for cache); for now, the storage is exported to the SAN via one 1G ethernet link.

    OpensStack Essex setup

    I mainly followed the Debian HOWTO to setup my private cloud. I mainly tuned the network settings to match my environement (and the fact my control node lives in a VM, with VLAN stuff handled by the host).

    I easily got a working setup (I must admit that I think my previous experiment with OpenStack helped a lot when dealing with custom configurations... and vocabulary; I'm not sure I would have succeded "easily" following the HOWTO, but hey, it is a functionnal HOWTO, meaning if you do not follow the instructions because you want special tunings, don't blame the HOWTO).

    Compared to the HOWTO, my nova.conf looks like (as of today):

    [DEFAULT]
    logdir=/var/log/nova
    state_path=/var/lib/nova
    lock_path=/var/lock/nova
    root_helper=sudo nova-rootwrap
    auth_strategy=keystone
    dhcpbridge_flagfile=/etc/nova/nova.conf
    dhcpbridge=/usr/bin/nova-dhcpbridge
    sql_connection=postgresql://novacommon:XXX@control.openstack.logilab.fr/nova
    
    ##  Network config
    # A nova-network on each compute node
    multi_host=true
    # VLan manger
    network_manager=nova.network.manager.VlanManager
    vlan_interface=eth1
    # My ip
    my-ip=172.17.10.2
    public_interface=eth0
    # Dmz & metadata things
    dmz_cidr=169.254.169.254/32
    ec2_dmz_host=169.254.169.254
    metadata_host=169.254.169.254
    
    ## More general things
    # The RabbitMQ host
    rabbit_host=control.openstack.logilab.fr
    
    ## Glance
    image_service=nova.image.glance.GlanceImageService
    glance_api_servers=control.openstack.logilab.fr:9292
    use-syslog=true
    ec2_host=control.openstack.logilab.fr
    
    novncproxy_base_url=http://control.openstack.logilab.fr:6080/vnc_auto.html
    vncserver_listen=0.0.0.0
    vncserver_proxyclient_address=127.0.0.1
    

    Volume

    I had a bit more work to do to make nova-volume work. First, I got hit by this nasty bug #695791 which is trivial to fix... when you know how to fix it (I noticed the bug report after I fixed it by myself).

    Then, as I wanted the volumes to be stored and exported by my shiny new ZFS-on-Linux setup, I had to write my own volume driver, which was quite easy, since it is Python, and the logic to implement was already provided by the ISCSIDriver class on the one hand, and by the SanISCSIDrvier on the other hand. So I ended with this firt implementation. This file should be copied to nova volumes package directory (nova/volume/zol.py):

    # vim: tabstop=4 shiftwidth=4 softtabstop=4
    
    # Copyright 2010 United States Government as represented by the
    # Administrator of the National Aeronautics and Space Administration.
    # Copyright 2011 Justin Santa Barbara
    # Copyright 2012 David DOUARD, LOGILAB S.A.
    # All Rights Reserved.
    #
    #    Licensed under the Apache License, Version 2.0 (the "License"); you may
    #    not use this file except in compliance with the License. You may obtain
    #    a copy of the License at
    #
    #         http://www.apache.org/licenses/LICENSE-2.0
    #
    #    Unless required by applicable law or agreed to in writing, software
    #    distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
    #    WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
    #    License for the specific language governing permissions and limitations
    #    under the License.
    """
    Driver for ZFS-on-Linux-stored volumes.
    
    This is mainly a custom version of the ISCSIDriver that uses ZFS as
    volume provider, generally accessed over SSH.
    """
    
    import os
    
    from nova import exception
    from nova import flags
    from nova import utils
    from nova import log as logging
    from nova.openstack.common import cfg
    from nova.volume.driver import _iscsi_location
    from nova.volume import iscsi
    from nova.volume.san import SanISCSIDriver
    
    
    LOG = logging.getLogger(__name__)
    
    san_opts = [
        cfg.StrOpt('san_zfs_command',
                   default='/sbin/zfs',
                   help='The ZFS command.'),
        ]
    
    FLAGS = flags.FLAGS
    FLAGS.register_opts(san_opts)
    
    
    class ZFSonLinuxISCSIDriver(SanISCSIDriver):
        """Executes commands relating to ZFS-on-Linux-hosted ISCSI volumes.
    
        Basic setup for a ZoL iSCSI server:
    
        XXX
    
        Note that current implementation of ZFS on Linux does not handle:
    
          zfs allow/unallow
    
        For now, needs to have root access to the ZFS host. The best is to
        use a ssh key with ssh authorized_keys restriction mechanisms to
        limit root access.
    
        Make sure you can login using san_login & san_password/san_private_key
        """
        ZFSCMD = FLAGS.san_zfs_command
    
        _local_execute = utils.execute
    
        def _getrl(self):
            return self._runlocal
        def _setrl(self, v):
            if isinstance(v, basestring):
                v = v.lower() in ('true', 't', '1', 'y', 'yes')
            self._runlocal = v
        run_local = property(_getrl, _setrl)
    
        def __init__(self):
            super(ZFSonLinuxISCSIDriver, self).__init__()
            self.tgtadm.set_execute(self._execute)
            LOG.info("run local = %s (%s)" % (self.run_local, FLAGS.san_is_local))
    
        def set_execute(self, execute):
            LOG.debug("override local execute cmd with %s (%s)" %
                      (repr(execute), execute.__module__))
            self._local_execute = execute
    
        def _execute(self, *cmd, **kwargs):
            if self.run_local:
                LOG.debug("LOCAL execute cmd %s (%s)" % (cmd, kwargs))
                return self._local_execute(*cmd, **kwargs)
            else:
                LOG.debug("SSH execute cmd %s (%s)" % (cmd, kwargs))
                check_exit_code = kwargs.pop('check_exit_code', None)
                command = ' '.join(cmd)
                return self._run_ssh(command, check_exit_code)
    
        def _create_volume(self, volume_name, sizestr):
            zfs_poolname = self._build_zfs_poolname(volume_name)
    
            # Create a zfs volume
            cmd = [self.ZFSCMD, 'create']
            if FLAGS.san_thin_provision:
                cmd.append('-s')
            cmd.extend(['-V', sizestr])
            cmd.append(zfs_poolname)
            self._execute(*cmd)
    
        def _volume_not_present(self, volume_name):
            zfs_poolname = self._build_zfs_poolname(volume_name)
            try:
                out, err = self._execute(self.ZFSCMD, 'list', '-H', zfs_poolname)
                if out.startswith(zfs_poolname):
                    return False
            except Exception as e:
                # If the volume isn't present
                return True
            return False
    
        def create_volume_from_snapshot(self, volume, snapshot):
            """Creates a volume from a snapshot."""
            zfs_snap = self._build_zfs_poolname(snapshot['name'])
            zfs_vol = self._build_zfs_poolname(snapshot['name'])
            self._execute(self.ZFSCMD, 'clone', zfs_snap, zfs_vol)
            self._execute(self.ZFSCMD, 'promote', zfs_vol)
    
        def delete_volume(self, volume):
            """Deletes a volume."""
            if self._volume_not_present(volume['name']):
                # If the volume isn't present, then don't attempt to delete
                return True
            zfs_poolname = self._build_zfs_poolname(volume['name'])
            self._execute(self.ZFSCMD, 'destroy', zfs_poolname)
    
        def create_export(self, context, volume):
            """Creates an export for a logical volume."""
            self._ensure_iscsi_targets(context, volume['host'])
            iscsi_target = self.db.volume_allocate_iscsi_target(context,
                                                                volume['id'],
                                                          volume['host'])
            iscsi_name = "%s%s" % (FLAGS.iscsi_target_prefix, volume['name'])
            volume_path = self.local_path(volume)
    
            # XXX (ddouard) this code is not robust: does not check for
            # existing iscsi targets on the host (ie. not created by
            # nova), but fixing it require a deep refactoring of the iscsi
            # handling code (which is what have been done in cinder)
            self.tgtadm.new_target(iscsi_name, iscsi_target)
            self.tgtadm.new_logicalunit(iscsi_target, 0, volume_path)
    
            if FLAGS.iscsi_helper == 'tgtadm':
                lun = 1
            else:
                lun = 0
            if self.run_local:
                iscsi_ip_address = FLAGS.iscsi_ip_address
            else:
                iscsi_ip_address = FLAGS.san_ip
            return {'provider_location': _iscsi_location(
                    iscsi_ip_address, iscsi_target, iscsi_name, lun)}
    
        def remove_export(self, context, volume):
            """Removes an export for a logical volume."""
            try:
                iscsi_target = self.db.volume_get_iscsi_target_num(context,
                                                               volume['id'])
            except exception.NotFound:
                LOG.info(_("Skipping remove_export. No iscsi_target " +
                           "provisioned for volume: %d"), volume['id'])
                return
    
            try:
                # ietadm show will exit with an error
                # this export has already been removed
                self.tgtadm.show_target(iscsi_target)
            except Exception as e:
                LOG.info(_("Skipping remove_export. No iscsi_target " +
                           "is presently exported for volume: %d"), volume['id'])
                return
    
            self.tgtadm.delete_logicalunit(iscsi_target, 0)
            self.tgtadm.delete_target(iscsi_target)
    
        def check_for_export(self, context, volume_id):
            """Make sure volume is exported."""
            tid = self.db.volume_get_iscsi_target_num(context, volume_id)
            try:
                self.tgtadm.show_target(tid)
            except exception.ProcessExecutionError, e:
                # Instances remount read-only in this case.
                # /etc/init.d/iscsitarget restart and rebooting nova-volume
                # is better since ensure_export() works at boot time.
                LOG.error(_("Cannot confirm exported volume "
                            "id:%(volume_id)s.") % locals())
                raise
    
        def local_path(self, volume):
            zfs_poolname = self._build_zfs_poolname(volume['name'])
            zvoldev = '/dev/zvol/%s' % zfs_poolname
            return zvoldev
    
        def _build_zfs_poolname(self, volume_name):
            zfs_poolname = '%s%s' % (FLAGS.san_zfs_volume_base, volume_name)
            return zfs_poolname
    

    To configure my nova-volume instance (which runs on the control node, since it's only a manager), I added these to my nova.conf file:

    # nove-volume config
    volume_driver=nova.volume.zol.ZFSonLinuxISCSIDriver
    iscsi_ip_address=172.17.1.7
    iscsi_helper=tgtadm
    san_thin_provision=false
    san_ip=172.17.1.7
    san_private_key=/etc/nova/sankey
    san_login=root
    san_zfs_volume_base=data/openstack/volume/
    san_is_local=false
    verbose=true
    

    Note that the private key (/etc/nova/sankey here) is stored in clear and that it must be readable by the nova user.

    This key being stored in clear and giving root acces to my ZFS host, I have limited a bit this root access by using a custom command wrapper in the .ssh/authorized_keys file.

    Something like (naive implementation):

    [root@zfshost ~]$ cat /root/zfswrapper
    #!/bin/sh
    CMD=`echo $SSH_ORIGINAL_COMMAND | awk '{print $1}'`
    if [ "$CMD" != "/sbin/zfs" && "$CMD" != "tgtadm" ]; then
      echo "Can do only zfs/tgtadm stuff here"
      exit 1
    fi
    
    echo "[`date`] $SSH_ORIGINAL_COMMAND" >> .zfsopenstack.log
    exec $SSH_ORIGINAL_COMMAND
    

    Using this in root's .ssh/authorized_keys file:

    [root@zfshost ~]$ cat /root/.ssh/authorized_keys | grep control
    from="control.openstack.logilab.fr",no-pty,no-port-forwarding,no-X11-forwarding, \
          no-agent-forwarding,command="/root/zfswrapper" ssh-rsa AAAA[...] root@control
    

    I had to set the iscsi_ip_address (the ip address of the ZFS host), but I think this is a result of something mistakenly implemented in my ZFSonLinux driver.

    Using this config, I can boot an image, create a volume on my ZFS storage, and attach it to the running image.

    I have to test things like snapshot, (live?) migration and so. This is a very first draft implementation which needs to be refined, improved and tested.

    What's next

    Besides the fact that it needs more tests, I plan to use salt for my OpenStack deployment (first to add more compute nodes in my cluster), and on the other side, I'd like to try the salt-cloud so I have a bunch of Debian images that "just work" (without the need of porting the cloud-init Ubuntu package).

    On the side of my zol driver, I need to port it to Cinder, but I do not have a Folsom install to test it...