Blog entries

Set up your project with cloudenvy and OpenStack

2013/10/03 by Arthur Lutz

One nice way of having a reproducible development or test environment is to "program" a virtual machine to do the job. If you have a powerful machine at hand, you might use Vagrant in combination with VirtualBox. But if you have an OpenStack setup at hand (which is our case), you might want to set up and destroy your virtual machines on such a private cloud (or public cloud if you want or can). Sure, Vagrant has some plugins that should add OpenStack as a provider, but here at Logilab we have a clear preference for Python over Ruby. So this is where cloudenvy comes into play.

http://www.openstack.org/themes/openstack/images/open-stack-cloud-computing-logo-2.png

Cloudenvy is written in Python and, with some simple YAML configuration, can help you set up and provision virtual machines that contain your tests or your development environment.

http://www.python.org/images/python-logo.gif

Set up your authentication in ~/.cloudenvy.yml:

cloudenvy:
  clouds:
    cloud01:
      os_username: username
      os_password: password
      os_tenant_name: tenant_name
      os_auth_url: http://keystone.example.com:5000/v2.0/

Then create an Envyfile.yml at the root of your project:

project_config:
  name: foo
  image: debian-wheezy-x64

  # Optional
  #remote_user: ec2-user
  #flavor_name: m1.small
  #auto_provision: False
  #provision_scripts:
    #- provision_script.sh
  #files:
    # files copied from your host to the VM
    #local_file : destination

Now simply type envy up. Cloudenvy does the rest. It "simply" creates your machine, copies the files, runs your provision script and gives you its IP address. You can then run envy ssh if you don't want to be bothered with IP addresses and such nonsense (forget about copying and pasting from the OpenStack web interface, or your nova show commands).

A little added bonus: if you know your machine will run a web server on port 8080 at some point, set that up in your environment by defining your access rules in the same Envyfile.yml:

sec_groups: [
    'tcp, 22, 22, 0.0.0.0/0',
    'tcp, 80, 80, 0.0.0.0/0',
    'tcp, 8080, 8080, 0.0.0.0/0',
  ]

As you might know (or I'll just recommend it), you should be able to scratch and restart your environment without losing anything, so once in a while you'll just run envy destroy to do so. You might want to have multiple VMs with the same specs; in that case, go for envy up -n second-machine.
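
Put together, a typical cloudenvy session looks something like this (output omitted, everything driven by the Envyfile.yml above):

envy up                    # create and provision the VM described in Envyfile.yml
envy ssh                   # log into it without hunting for its IP address
envy destroy               # scratch the environment when you are done
envy up -n second-machine  # spin up an extra VM with the same specs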

The only downside right now: cloudenvy isn't packaged for Debian (which is usually a prerequisite for the tools we use), but let's hope it gets packaged soon (or maybe we'll end up doing it ourselves).

Don't forget to include this configuration in your project's version control so that a colleague starting on the project can just type envy up and have a working setup.

Along the same lines, we have been trying out salt-cloud <https://github.com/saltstack/salt-cloud>, because provisioning machines with SaltStack is the way forward. A blog post about this is coming next.


Notes from the 4th OpenStack meetup in Paris

2013/06/27 by Paul Tonelli

This 4th meetup took place in Paris on June 24th, 2013.

http://www.logilab.org/file/149646/raw/openstack-cloud-software-vertical-small.png

Feedback from eNovance on running OpenStack in production

Monitoring

Among the tools useful for monitoring OpenStack installations, one can mention OpenNMS (SNMP traps...). However, many tools cause problems, in particular when virtual machines are stopped or destroyed (unwanted error messages). Don't hesitate to go and look directly in the SQL database: it is often the only way to get at some information. It is also a good way to correctly carry out operations that did not complete properly through the interface. Using SuperNova is recommended if you run several Nova instances (multi-environment setups).
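
As an illustration of the "go and look in the SQL database" advice, a quick read-only query against the nova database could look like the following sketch (table and column names match the nova schema of that era; adapt them to your setup before relying on it):

mysql nova -e "SELECT uuid, hostname, vm_state, task_state FROM instances WHERE deleted = 0;"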

Problems during migrations

Quote: "the further you move up in versions, the smoother the procedure goes".

The worst migration was from Diablo to Essex (fixed IPs becoming dynamic, permission problems). Even if everything seems to go fine at the time, rebooting the hardware nodes / control node(s) is recommended, to check that everything still works correctly after a restart.

If you want to remove a node that is running machines, there is a command to stop it from receiving new machines (in case of a hardware failure, for example).
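
The notes do not name the command; at the time, one way to do this was through nova-manage, roughly as follows (the host name is an example and the exact option syntax varies between releases, so check nova-manage service --help):

nova-manage service disable --host=compute-01 --service=nova-compute  # stop scheduling new instances on this node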

To get redundancy, keeping the nodes that host the machines below 60% load leaves room to host machines that have to be moved when one of the nova-compute nodes fails or disappears.

Regarding migration, what is well documented for KVM is migration with shared storage or 'live storage' (the file system holding the machine's disk is available both on the machine's source host and on its destination). There is a second mode, 'block' migration, which makes it possible to migrate disks that do not live on a shared file system. This second mechanism used to exist, was then deprecated, but a new version now seems functional.

About the terminate function

The terminate function destroys a virtual machine for good, with no way back. This is not obvious to someone starting out with OpenStack: they often see this button as "just" switching the machine off. To avoid serious problems (killing a customer's machine by mistake, with no possible recovery), there are 2 solutions:

For admins, there is a parameter in an instance's configuration:

UPDATE SED disable_terminate '1'

For the others (non-admins), a machine can be locked (nova lock) to avoid the problem. (Both rely on the same mechanism.)
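
For the record, locking and unlocking an instance with the nova client looks like this (the instance name is just an example):

nova lock my-instance    # terminate/delete requests on the instance will now be refused
nova unlock my-instance  # lift the protection when it really is time to delete it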

Tokens

Tokens are used to authenticate / authorize a transaction across the various OpenStack versions. It is important that the database storing them be indexed to improve performance. This database grows very quickly, especially with Grizzly where tokens are based on a PKI system. It is useful to back this database up regularly before flushing it (in particular so that a client can recover an old token).
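
In the same spirit as the SQL advice above, a backup-then-flush of the keystone token table could look roughly like this (database and table names follow the default keystone schema; adjust and test before running it in production):

mysqldump keystone token > keystone-token-$(date +%F).sql      # keep a copy so an old token can be recovered
mysql keystone -e "DELETE FROM token WHERE expires < NOW();"   # flush expired tokens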

Storage space

It does not follow best practices, but having just one huge /var for storage works well, especially when you need to store snapshots of Windows machines or of large machines.

Ceph

https://www.logilab.org/file/149647/raw/logo.png

Special quote: "this is free software, the real thing".

Ceph is a project that provides redundant storage across several machines and offers three main operations: get, put, delete. The system relies on the client using a map, provided by the Ceph servers, which lets it easily locate and retrieve a stored object.
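
The three basic operations can be tried out directly with the rados command-line tool, for example against the default rbd pool (the pool, object and file names below are just examples):

rados -p rbd put myobject ./localfile   # put: store a local file as an object
rados -p rbd get myobject ./copy        # get: fetch it back
rados -p rbd rm myobject                # delete: remove the object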

The virtualized storage driver is fully functional and already runs in production. It makes it possible to use Ceph as a storage device for virtual machines. There is also a replacement (Rados Gateway Daemon) which implements the Swift interface for OpenStack and allows Ceph to be used as a backend. This driver is considered stable.

The kernel driver (librbd), which exposes Ceph storage as disks, is still under development, but should work more or less correctly with a kernel >= 3.8 (3.10 recommended).

Performance

For writes, expect performance to be halved compared to using the disks directly (the N copies must all have been stored before the transaction is acknowledged). For reads, Ceph is faster, given the possible aggregation of storage resources.

Translating OpenStack

The community is looking for motivated people to translate the text of the various OpenStack tools.

Neutron (formerly Quantum)

Quite a lot of work is in progress; the Havana release should scale better and provide high availability (things keep working even when one of the machines hosting the service goes down) for everything DHCP / L3.

On organizing an OpenStack community in France

Various parties are currently looking into joining forces, but nothing has been decided yet.


About salt-ami-cloud-builder

2013/06/07 by Paul Tonelli

What

At Logilab we are big fans of SaltStack; we use it quite extensively to centralize, configure and automate deployments.

http://www.logilab.org/file/145398/raw/SaltStack-Logo.png

We've talked on this blog about how to build a Debian AMI "by hand" and we wanted to automate this fully. Hence the salt way seemed to be the obvious way to go.

So we wrote salt-ami-cloud-builder. It is mainly glue between existing pieces of software that we use and like. If you already have some definition of a type of host that you provision using salt-stack, salt-ami-cloud-builder should be able to generate the corresponding AMI.

http://www.logilab.org/file/145397/raw/open-stack-cloud-computing-logo-2.png

Why

Building a Debian based OpenStack private cloud using salt made us realize that we needed a way to generate various flavours of AMIs for the following reasons:

  • Some of our OpenStack users need "preconfigured" AMIs (for example a Debian system with Postgres 9.1 and the appropriate Python bindings) without doing the modifications by hand or waiting for an automated script to do the job at AMI boot time.
  • Some cloud use cases require that you boot many (hundreds, for instance) machines with the same configuration. While tools like Salt automate the job, waiting while the same download and install takes place hundreds of times is a waste of resources. If the modifications have already been integrated into a specialized AMI, you save a lot of computing time. And especially on Amazon (or other pay-per-use cloud infrastructures), these resources are not free.
  • Sometimes one needs to repeat a computation on an instance with the very same packages and input files, possibly years after the first run. Freezing packages and files in one preconfigured AMI helps this a lot. When relying only on a salt configuration the installed packages may not be (exactly) the same from one run to the other.

Relation to other projects

While multiple tools like build-debian-cloud exist, their objective is to build a vanilla AMI from scratch. The salt-ami-cloud-builder starts from such vanilla AMIs to create variations. Other tools like salt-cloud focus instead on the boot phase of the deployment of (multiple) machines.

Chef & Puppet do the same job as Salt; however, since Salt is already extensively deployed at Logilab, we continue to build on it.

Get it now!

Grab the code here: http://hg.logilab.org/master/salt-ami-cloud-builder

The project page is http://www.logilab.org/project/salt-ami-cloud-builder

The docs can be read here: http://docs.logilab.org/salt-ami-cloud-builder

We hope you find it useful. Bug reports and contributions are welcome.

The logilab-salt-ami-cloud-builder team :)


Building Debian images for an OpenStack (private) cloud

2012/12/23 by David Douard

Now that I have a working OpenStack cloud at Logilab, I want to provide my fellow colleagues with a bunch of ready-made images to create instances from.

Strangely, there are no really usable ready-made UEC Debian images available out there. There have been recent efforts to provide Debian images on the Amazon Marketplace, and the tool suite used to build these is available as a collection of bash shell scripts in a github repository. There are also some images for Eucalyptus, but I have not been able to make them boot properly on my kvm-based OpenStack install.

So I have tried to build my own set of Debian images to upload to my glance shop.

Vocabulary

A bit of vocabulary may be useful for those not very accustomed to OpenStack or AWS jargon.

When you want to create an instance of an image, i.e. boot a virtual machine in a cloud, you generally choose from a set of ready-made system images, then you choose a virtual machine flavor (i.e. a combination of a number of virtual CPUs, an amount of RAM, and a hard drive size used as root device). Generally, you have to choose between tiny (1 CPU, 512MB, no disk), small (1 CPU, 2G of RAM, 20G of disk), etc.

In the cloud world, an instance is not meant to be persistent. What is persistent is a volume that can be attached to a running instance.

If you want your instance to be persistent, there are 2 choices:

  • you can snapshot a running instance and upload it as a new image; so it is not really a persistent instance: rather, it is the ability to configure an instance that then serves as the base for booting other instances,
  • or you can boot an instance from a volume (which is the persistent part of a virtual machine in a cloud).

In the Amazon world, a "standard" image (the one that is instantiated when creating a new instance) is called an instance store-backed AMI image, also called a UEC image, and a volume image is called an EBS-backed AMI image (EBS stands for Elastic Block Storage). So an AMI image stored in a volume cannot be instantiated; it can only be booted by one instance at a time. But it is persistent. Different usage.

A UEC or AMI image consists of a triplet: a kernel, an init ramdisk and a root file system image. An EBS-backed image is just the raw disk image to be booted on a virtualization host (a kvm raw or qcow2 image, etc.)

Images in OpenStack

In OpenStack, when you create an instance from a given image, what happens depends on the kind of image.

In fact, in OpenStack, one can upload traditional UEC AMI images (you need to upload the 3 files: the kernel, the initial ramdisk and the root filesystem as a raw image). But one can also upload bare images. This kind of image is booted directly by the virtualization host. So it is some kind of hybrid between a boot from volume (an EBS-backed boot in the Amazon world) and the traditional instantiation from a UEC image.
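
For instance, uploading a bare qcow2 image with the glance client of that era would look something like this (the image name and file are examples):

glance add name="debian-wheezy-x64" disk_format=qcow2 container_format=bare < debian-wheezy-x64.qcow2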

Instantiating an AMI image

When one creates an instance from an AMI image in an OpenStack cloud (a rough manual equivalent of the image-preparation steps is sketched after this list):

  • the kernel is copied to the virtualization host,
  • the initial ramdisk is copied to the virtualization host,
  • the root FS image is copied to the virtualization host,
  • then, the root FS image is:
    • duplicated (instantiated),
    • resized (the file is grown if needed) to the size required by the requested instance flavor,
    • the file system is resized to the new size of the file,
    • the contained filesystem is mounted (using qemu-nbd) and the configured SSH access key is added to /root/.ssh/authorized_keys,
    • the nbd volume is then unmounted,
  • a libvirt domain is created, configured to boot from the given kernel and init ramdisk, using the resized and modified image disk as root filesystem,
  • the libvirt domain is then booted.
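
Done by hand on the virtualization host, the image-preparation part of this process would look roughly like the sketch below (device, file names and target size are hypothetical; nova of course does this through its own code, not a shell script):

cp rootfs.ami instance-root.img                     # duplicate the root FS image
qemu-img resize instance-root.img 20G               # grow the file to the flavor's root size
e2fsck -f instance-root.img                         # required before resizing
resize2fs instance-root.img                         # grow the filesystem to the new file size
modprobe nbd
qemu-nbd -c /dev/nbd0 instance-root.img             # expose the FS image as a block device
mount /dev/nbd0 /mnt/instance
mkdir -p /mnt/instance/root/.ssh
cat mykey.pub >> /mnt/instance/root/.ssh/authorized_keys  # inject the configured SSH access key
umount /mnt/instance
qemu-nbd -d /dev/nbd0                               # detach the nbd device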

Instantiating a BARE image

When one creates an instance from a BARE image in an OpenStack cloud:

  • the VM image file is copied to the virtualization host,
  • the VM image file is duplicated (instantiated),
  • a libvirt domain is created, configured to boot from this copied image disk as root filesystem,
  • the libvirt domain is then booted.

Differences between the 2 instantiation methods

Instantiating a BARE image:
  • Involves a much simpler process.
  • Allows booting a non-Linux system (depends on the virtualization system; especially true when using KVM virtualization).
  • Is slower to boot and consumes more resources, since the virtual machine image must be the size of the required/wanted virtual machine (but can remain minimal if using a qcow2 image format). If you use a 10G raw image, then 10G of data will be copied from the image provider to the virtualization host, and this big file will be duplicated each time you instantiate this image.
  • The root filesystem size corresponding to the flavor of the instance is not honoured; the filesystem size is that of the BARE image.
Instantiating an AMI image:
  • Honours the flavor.
  • Generally allows a quicker instance creation process.
  • Less resource consumption.
  • Can only boot Linux guests.

If one wants to boot a Windows guest in OpenStack, the only solution (as far as I know) is to use a BARE image of an installed Windows system. It works (I have succeeded in doing so), but a minimal Windows 7 install is several GB, so instantiating such a BARE image is very slow, because the image needs to be uploaded to the virtualization host.

Building a Debian AMI image

So I wanted to provide a minimal Debian image in my cloud, and to provide it as an AMI image, so that the flavor is honoured and the standard cloud injection mechanisms (like setting up the SSH key to access the VM) work without having to tweak the rc.local script or use cloud-init in my guest.

Here is what I did.

1. Install a Debian system in a standard libvirt/kvm guest.

david@host:~$ virt-install  --connect qemu+tcp://virthost/system   \
                 -n openstack-squeeze-amd64 -r 512 \
                 -l http://ftp2.fr.debian.org/pub/debian/dists/stable/main/installer-amd64/ \
                 --disk pool=default,bus=virtio,type=qcow2,size=5 \
                 --network bridge=vm7,model=virtio  --nographics  \
                 --extra-args='console=tty0 console=ttyS0,115200'

This creates a new virtual machine, fetches the Debian installer directly from a Debian mirror, and starts the usual Debian installation in a virtual serial console (I don't like VNC very much).

I then followed the installation procedure. When asked about partitioning, I chose to create only one primary partition (i.e. no swap partition; it won't be necessary here). I also chose only "Default system" and "SSH server" to be installed.

2. Configure the system

After the installation process, the VM is rebooted and I log into it (by SSH or via the console) so I can configure the system a bit.

david@host:~$ ssh root@openstack-squeeze-amd64.vm.logilab.fr
Linux openstack-squeeze-amd64 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Dec 23 20:14:24 2012 from 192.168.1.34
root@openstack-squeeze-amd64:~# apt-get update
root@openstack-squeeze-amd64:~# apt-get install vim curl parted # install some must have packages
[...]
root@openstack-squeeze-amd64:~# dpkg-reconfigure locales # I like to have fr_FR and en_US in my locales
[...]
root@openstack-squeeze-amd64:~# echo virtio_balloon >> /etc/modules
root@openstack-squeeze-amd64:~# echo acpiphp >> /etc/modules
root@openstack-squeeze-amd64:~# update-initramfs -u
root@openstack-squeeze-amd64:~# apt-get clean
root@openstack-squeeze-amd64:~# rm /etc/udev/rules.d/70-persistent-net.rules
root@openstack-squeeze-amd64:~# rm .bash_history
root@openstack-squeeze-amd64:~# poweroff

What we do here is install a few packages and do some configuration. The important part is adding the acpiphp module so that volume attachment will work in our instances. We also clean a few things up before shutting the VM down.

3. Convert the image into an AMI image

Since I created the VM image as a qcow2 image, I needed to convert it back to a raw image:

david@host:~$ scp root@virthost:/var/lib/libvirt/images/openstack-squeeze-amd64.img .
david@host:~$ qemu-img convert -O raw openstack-squeeze-amd64.img openstack-squeeze-amd64.raw

Then, as I want a minimal-sized disk image, the filesystem must be shrunk to its minimum. I did this as described below, but I think there are simpler methods to do so.

david@host:~$ fdisk -l openstack-squeeze-amd64.raw  # display the partition location in the disk

Disk openstack-squeeze-amd64.raw: 5368 MB, 5368709120 bytes
149 heads, 8 sectors/track, 8796 cylinders, total 10485760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0001fab7

                   Device Boot      Start         End      Blocks   Id  System
openstack-squeeze-amd64.raw1         2048    10483711     5240832   83  Linux
david@host:~$ # extract the filesystem from the image
david@host:~$ dd if=openstack-squeeze-amd64.raw of=openstack-squeeze-amd64.ami bs=1024 skip=1024 count=5240832
david@host:~$ losetup /dev/loop1 openstack-squeeze-amd64.ami
david@host:~$ mkdir /tmp/img
david@host:~$ mount /dev/loop1 /tmp/img
david@host:~$ cp /tmp/img/boot/vmlinuz-2.6.32-5-amd64 .
david@host:~$ cp /tmp/img/boot/initrd.img-2.6.32-5-amd64 .
david@host:~$ umount /tmp/img
david@host:~$ e2fsck -f /dev/loop1 # required before a resize

e2fsck 1.42.5 (29-Jul-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop1: 26218/327680 files (0.2% non-contiguous), 201812/1310208 blocks
david@host:~$ resize2fs -M /dev/loop1 # minimize the filesystem

resize2fs 1.42.5 (29-Jul-2012)
Resizing the filesystem on /dev/loop1 to 191461 (4k) blocks.
The filesystem on /dev/loop1 is now 191461 blocks long.
david@host:~$ # note the new size ^^^^ and the block size above (4k)
david@host:~$ losetup -d /dev/loop1 # detach the lo device
david@host:~$ dd if=openstack-squeeze-amd64.ami of=openstack-squeeze-amd64-reduced.ami bs=4096 count=191461

4. Upload in OpenStack

After all this, you have a kernel image, an init ramdisk file and a minimized root filesystem image file. So you just have to upload them to your OpenStack image provider (glance):

david@host:~$ glance add disk_format=aki container_format=aki name="debian-squeeze-uec-x86_64-kernel" \
                 < vmlinuz-2.6.32-5-amd64
Uploading image 'debian-squeeze-uec-x86_64-kernel'
==================================================================================[100%] 24.1M/s, ETA  0h  0m  0s
Added new image with ID: 644e59b8-1503-403f-a4fe-746d4dac2ff8
david@host:~$ glance add disk_format=ari container_format=ari name="debian-squeeze-uec-x86_64-initrd" \
                 < initrd.img-2.6.32-5-amd64
Uploading image 'debian-squeeze-uec-x86_64-initrd'
==================================================================================[100%] 26.7M/s, ETA  0h  0m  0s
Added new image with ID: 6f75f1c9-1e27-4cb0-bbe0-d30defa8285c
david@host:~$ glance add disk_format=ami container_format=ami name="debian-squeeze-uec-x86_64" \
                 kernel_id=644e59b8-1503-403f-a4fe-746d4dac2ff8 ramdisk_id=6f75f1c9-1e27-4cb0-bbe0-d30defa8285c \
                 < openstack-squeeze-amd64-reduced.ami
Uploading image 'debian-squeeze-uec-x86_64'
==================================================================================[100%] 42.1M/s, ETA  0h  0m  0s
Added new image with ID: 4abc09ae-ea34-44c5-8d54-504948e8d1f7
http://www.logilab.org/file/115220?vid=download

And that's it! I now have a working Debian Squeeze image in my cloud:

http://www.logilab.org/file/115221?vid=download
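
Booting an instance from this image with the nova command-line client then looks something like this (flavor and keypair names are examples, and the key flag spelling varies between client versions):

nova boot --image debian-squeeze-uec-x86_64 --flavor m1.small --key_name mykey squeeze-test
nova list                  # wait for the instance to reach ACTIVE and note its IP address
ssh root@<instance-ip>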

OpenStack, Wheezy and ZFS on Linux

2012/12/19 by David Douard

A while ago, I started installing an OpenStack cluster at Logilab, so that our developers can easily play with any kind of environment. We are planning to improve our Apycot automatic testing platform so it can use "elastic power". And so on.

http://www.openstack.org/themes/openstack/images/open-stack-cloud-computing-logo-2.png

I first tried an Ubuntu Precise based setup, since at that time Debian packages were not really usable. That setup never reached a point where it could be released as production ready, because I had tried a configuration that was too complex and bleeding edge (involving Quantum, openvswitch, sheepdog)...

Meanwhile, we ran really short of storage capacity. For now, it mainly consists of hard drives distributed in our 19" Dell racks (generally with hardware RAID controllers). So I recently purchased a low-cost storage bay (SuperMicro SC937 with a 6Gb/s JBOD-only HBA) with 18 spinning hard drives and 4 SSDs, driven by ZFS on Linux (tip: an SSD-stored ZIL is a requirement to get decent performance). This storage setup is still under test for now.

http://zfsonlinux.org/images/zfs-linux.png

I also went to the last Mini-DebConf in Paris, where Loic Dachary presented the status of the OpenStack packaging effort in Debian. This gave me the motivation to give OpenStack a new try, using Wheezy and a somewhat simpler setup. But I could not consider not using my new ZFS-based storage as a nova volume provider. It is not available in OpenStack for now (there is a backend for Solaris, but not for ZFS on Linux). However, this is Python, and in fact the current ISCSIDriver backend needs very little to make it work with ZFS instead of LVM as the "elastic" block-volume provider and manager.

So I wrote a custom nova volume driver to handle this. As I don't want the nova-volume daemon to run on my ZFS SAN, I wrote this backend mixing the SanISCSIDriver (which manages the storage system via SSH) and the standard ISCSIDriver (which uses the standard Linux iSCSI target tools). I'm not very fond of the API of the VolumeDriver (especially the fact that the ISCSIDriver is responsible for 2 roles: managing block-level volumes and exporting block-level volumes). This small design flaw (IMHO) is the reason I had to duplicate some code (not much, but...) to implement my ZFSonLinuxISCSIDriver...

So here is the setup I made:

Infrastructure

My OpenStack Essex "cluster" consists for now in:

  • one control node, running in a "normal" libvirt-controlled virtual machine; it is a Wheezy that runs:
    • nova-api
    • nova-cert
    • nova-network
    • nova-scheduler
    • nova-volume
    • glance
    • postgresql
    • OpenStack dashboard
  • one computing node (Dell R310, Xeon X3480, 32G, Wheezy), which runs:
    • nova-api
    • nova-network
    • nova-compute
  • ZFS-on-Linux SAN (3x raidz1 pools made of 6 1T drives, 2x (mirrored) 32G SLC SSDs, 2x 120G MLC SSDs for cache); for now, the storage is exported by the SAN over a single 1G Ethernet link.

OpenStack Essex setup

I mostly followed the Debian HOWTO to set up my private cloud, mainly tuning the network settings to match my environment (and the fact that my control node lives in a VM, with VLAN stuff handled by the host).

I easily got a working setup (I must admit that my previous experiment with OpenStack helped a lot when dealing with custom configurations... and vocabulary; I'm not sure I would have succeeded "easily" just by following the HOWTO, but hey, it is a functional HOWTO, meaning that if you do not follow the instructions because you want special tunings, don't blame the HOWTO).

Compared to the HOWTO, my nova.conf looks like (as of today):

[DEFAULT]
logdir=/var/log/nova
state_path=/var/lib/nova
lock_path=/var/lock/nova
root_helper=sudo nova-rootwrap
auth_strategy=keystone
dhcpbridge_flagfile=/etc/nova/nova.conf
dhcpbridge=/usr/bin/nova-dhcpbridge
sql_connection=postgresql://novacommon:XXX@control.openstack.logilab.fr/nova

##  Network config
# A nova-network on each compute node
multi_host=true
# VLAN manager
network_manager=nova.network.manager.VlanManager
vlan_interface=eth1
# My ip
my-ip=172.17.10.2
public_interface=eth0
# Dmz & metadata things
dmz_cidr=169.254.169.254/32
ec2_dmz_host=169.254.169.254
metadata_host=169.254.169.254

## More general things
# The RabbitMQ host
rabbit_host=control.openstack.logilab.fr

## Glance
image_service=nova.image.glance.GlanceImageService
glance_api_servers=control.openstack.logilab.fr:9292
use-syslog=true
ec2_host=control.openstack.logilab.fr

novncproxy_base_url=http://control.openstack.logilab.fr:6080/vnc_auto.html
vncserver_listen=0.0.0.0
vncserver_proxyclient_address=127.0.0.1

Volume

I had a bit more work to do to make nova-volume work. First, I got hit by this nasty bug #695791 which is trivial to fix... when you know how to fix it (I noticed the bug report after I fixed it by myself).

Then, as I wanted the volumes to be stored and exported by my shiny new ZFS-on-Linux setup, I had to write my own volume driver, which was quite easy, since it is Python, and the logic to implement was already provided by the ISCSIDriver class on the one hand, and by the SanISCSIDriver on the other. So I ended up with this first implementation. This file should be copied to the nova volume package directory (nova/volume/zol.py):

# vim: tabstop=4 shiftwidth=4 softtabstop=4

# Copyright 2010 United States Government as represented by the
# Administrator of the National Aeronautics and Space Administration.
# Copyright 2011 Justin Santa Barbara
# Copyright 2012 David DOUARD, LOGILAB S.A.
# All Rights Reserved.
#
#    Licensed under the Apache License, Version 2.0 (the "License"); you may
#    not use this file except in compliance with the License. You may obtain
#    a copy of the License at
#
#         http://www.apache.org/licenses/LICENSE-2.0
#
#    Unless required by applicable law or agreed to in writing, software
#    distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
#    WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
#    License for the specific language governing permissions and limitations
#    under the License.
"""
Driver for ZFS-on-Linux-stored volumes.

This is mainly a custom version of the ISCSIDriver that uses ZFS as
volume provider, generally accessed over SSH.
"""

import os

from nova import exception
from nova import flags
from nova import utils
from nova import log as logging
from nova.openstack.common import cfg
from nova.volume.driver import _iscsi_location
from nova.volume import iscsi
from nova.volume.san import SanISCSIDriver


LOG = logging.getLogger(__name__)

san_opts = [
    cfg.StrOpt('san_zfs_command',
               default='/sbin/zfs',
               help='The ZFS command.'),
    ]

FLAGS = flags.FLAGS
FLAGS.register_opts(san_opts)


class ZFSonLinuxISCSIDriver(SanISCSIDriver):
    """Executes commands relating to ZFS-on-Linux-hosted ISCSI volumes.

    Basic setup for a ZoL iSCSI server:

    XXX

    Note that current implementation of ZFS on Linux does not handle:

      zfs allow/unallow

    For now, needs to have root access to the ZFS host. The best is to
    use a ssh key with ssh authorized_keys restriction mechanisms to
    limit root access.

    Make sure you can login using san_login & san_password/san_private_key
    """
    ZFSCMD = FLAGS.san_zfs_command

    _local_execute = utils.execute

    def _getrl(self):
        return self._runlocal
    def _setrl(self, v):
        if isinstance(v, basestring):
            v = v.lower() in ('true', 't', '1', 'y', 'yes')
        self._runlocal = v
    run_local = property(_getrl, _setrl)

    def __init__(self):
        super(ZFSonLinuxISCSIDriver, self).__init__()
        self.tgtadm.set_execute(self._execute)
        LOG.info("run local = %s (%s)" % (self.run_local, FLAGS.san_is_local))

    def set_execute(self, execute):
        LOG.debug("override local execute cmd with %s (%s)" %
                  (repr(execute), execute.__module__))
        self._local_execute = execute

    def _execute(self, *cmd, **kwargs):
        if self.run_local:
            LOG.debug("LOCAL execute cmd %s (%s)" % (cmd, kwargs))
            return self._local_execute(*cmd, **kwargs)
        else:
            LOG.debug("SSH execute cmd %s (%s)" % (cmd, kwargs))
            check_exit_code = kwargs.pop('check_exit_code', None)
            command = ' '.join(cmd)
            return self._run_ssh(command, check_exit_code)

    def _create_volume(self, volume_name, sizestr):
        zfs_poolname = self._build_zfs_poolname(volume_name)

        # Create a zfs volume
        cmd = [self.ZFSCMD, 'create']
        if FLAGS.san_thin_provision:
            cmd.append('-s')
        cmd.extend(['-V', sizestr])
        cmd.append(zfs_poolname)
        self._execute(*cmd)

    def _volume_not_present(self, volume_name):
        zfs_poolname = self._build_zfs_poolname(volume_name)
        try:
            out, err = self._execute(self.ZFSCMD, 'list', '-H', zfs_poolname)
            if out.startswith(zfs_poolname):
                return False
        except Exception as e:
            # If the volume isn't present
            return True
        return False

    def create_volume_from_snapshot(self, volume, snapshot):
        """Creates a volume from a snapshot."""
        zfs_snap = self._build_zfs_poolname(snapshot['name'])
        zfs_vol = self._build_zfs_poolname(volume['name'])
        self._execute(self.ZFSCMD, 'clone', zfs_snap, zfs_vol)
        self._execute(self.ZFSCMD, 'promote', zfs_vol)

    def delete_volume(self, volume):
        """Deletes a volume."""
        if self._volume_not_present(volume['name']):
            # If the volume isn't present, then don't attempt to delete
            return True
        zfs_poolname = self._build_zfs_poolname(volume['name'])
        self._execute(self.ZFSCMD, 'destroy', zfs_poolname)

    def create_export(self, context, volume):
        """Creates an export for a logical volume."""
        self._ensure_iscsi_targets(context, volume['host'])
        iscsi_target = self.db.volume_allocate_iscsi_target(context,
                                                            volume['id'],
                                                      volume['host'])
        iscsi_name = "%s%s" % (FLAGS.iscsi_target_prefix, volume['name'])
        volume_path = self.local_path(volume)

        # XXX (ddouard) this code is not robust: does not check for
        # existing iscsi targets on the host (ie. not created by
        # nova), but fixing it require a deep refactoring of the iscsi
        # handling code (which is what have been done in cinder)
        self.tgtadm.new_target(iscsi_name, iscsi_target)
        self.tgtadm.new_logicalunit(iscsi_target, 0, volume_path)

        if FLAGS.iscsi_helper == 'tgtadm':
            lun = 1
        else:
            lun = 0
        if self.run_local:
            iscsi_ip_address = FLAGS.iscsi_ip_address
        else:
            iscsi_ip_address = FLAGS.san_ip
        return {'provider_location': _iscsi_location(
                iscsi_ip_address, iscsi_target, iscsi_name, lun)}

    def remove_export(self, context, volume):
        """Removes an export for a logical volume."""
        try:
            iscsi_target = self.db.volume_get_iscsi_target_num(context,
                                                           volume['id'])
        except exception.NotFound:
            LOG.info(_("Skipping remove_export. No iscsi_target " +
                       "provisioned for volume: %d"), volume['id'])
            return

        try:
            # ietadm show will exit with an error
            # this export has already been removed
            self.tgtadm.show_target(iscsi_target)
        except Exception as e:
            LOG.info(_("Skipping remove_export. No iscsi_target " +
                       "is presently exported for volume: %d"), volume['id'])
            return

        self.tgtadm.delete_logicalunit(iscsi_target, 0)
        self.tgtadm.delete_target(iscsi_target)

    def check_for_export(self, context, volume_id):
        """Make sure volume is exported."""
        tid = self.db.volume_get_iscsi_target_num(context, volume_id)
        try:
            self.tgtadm.show_target(tid)
        except exception.ProcessExecutionError, e:
            # Instances remount read-only in this case.
            # /etc/init.d/iscsitarget restart and rebooting nova-volume
            # is better since ensure_export() works at boot time.
            LOG.error(_("Cannot confirm exported volume "
                        "id:%(volume_id)s.") % locals())
            raise

    def local_path(self, volume):
        zfs_poolname = self._build_zfs_poolname(volume['name'])
        zvoldev = '/dev/zvol/%s' % zfs_poolname
        return zvoldev

    def _build_zfs_poolname(self, volume_name):
        zfs_poolname = '%s%s' % (FLAGS.san_zfs_volume_base, volume_name)
        return zfs_poolname

To configure my nova-volume instance (which runs on the control node, since it's only a manager), I added these to my nova.conf file:

# nova-volume config
volume_driver=nova.volume.zol.ZFSonLinuxISCSIDriver
iscsi_ip_address=172.17.1.7
iscsi_helper=tgtadm
san_thin_provision=false
san_ip=172.17.1.7
san_private_key=/etc/nova/sankey
san_login=root
san_zfs_volume_base=data/openstack/volume/
san_is_local=false
verbose=true

Note that the private key (/etc/nova/sankey here) is stored in the clear and that it must be readable by the nova user.

Since this key is stored in the clear and gives root access to my ZFS host, I have limited this root access a bit by using a custom command wrapper in the .ssh/authorized_keys file.

Something like (naive implementation):

[root@zfshost ~]$ cat /root/zfswrapper
#!/bin/sh
CMD=`echo $SSH_ORIGINAL_COMMAND | awk '{print $1}'`
if [ "$CMD" != "/sbin/zfs" && "$CMD" != "tgtadm" ]; then
  echo "Can do only zfs/tgtadm stuff here"
  exit 1
fi

echo "[`date`] $SSH_ORIGINAL_COMMAND" >> .zfsopenstack.log
exec $SSH_ORIGINAL_COMMAND

Using this in root's .ssh/authorized_keys file:

[root@zfshost ~]$ cat /root/.ssh/authorized_keys | grep control
from="control.openstack.logilab.fr",no-pty,no-port-forwarding,no-X11-forwarding, \
      no-agent-forwarding,command="/root/zfswrapper" ssh-rsa AAAA[...] root@control

I had to set iscsi_ip_address (the IP address of the ZFS host), but I think this is a result of something mistakenly implemented in my ZFSonLinux driver.

Using this config, I can boot an image, create a volume on my ZFS storage, and attach it to the running instance.
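
For the record, the volume round trip from the command line looks roughly like this with the Essex-era nova client (names, size and device are examples):

nova volume-create --display_name ztest 10             # creates a 10G volume, backed here by a ZFS zvol
nova volume-list                                       # note the volume id once it is 'available'
nova volume-attach <instance-id> <volume-id> /dev/vdb  # attach it to a running instance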

I still have to test things like snapshots, (live?) migration and so on. This is a very first draft implementation which needs to be refined, improved and tested.

What's next

Besides the fact that it needs more testing, I plan to use Salt for my OpenStack deployment (first to add more compute nodes to my cluster), and on the other side, I'd like to try salt-cloud so I have a bunch of Debian images that "just work" (without needing to port the cloud-init Ubuntu package).

As for my zol driver, I need to port it to Cinder, but I do not have a Folsom install to test it on...