Blog entries december 2012 [4]
Now that I have a working OpenStack cloud at Logilab, I want to provide
my colleagues with a bunch of ready-made images to create instances.
Strangely, there are no really usable ready-made UEC Debian images
available out there. There have been recent efforts made to provide
Debian images on Amazon Market Place, and the toolsuite used to
build these is available as a collection of bash shell scripts from
a github repository. There are also some images for Eucalyptus,
but I have not been able to make them boot properly on my kvm-based
OpenStack install.
So I have tried to build my own set of Debian images to upload to my
glance shop.
A bit of vocabulary may be useful for those not very familiar with
OpenStack or AWS jargon.
When you want to create an instance of an image, i.e. boot a virtual
machine in a cloud, you generally choose from a set of ready-made
system images, then you choose a virtual machine flavor (i.e. a
combination of a number of virtual CPUs, an amount of RAM, and a
hard drive size used as root device). Generally, you have to choose
between tiny (1 CPU, 512MB, no disk), small (1 CPU, 2G of RAM, 20G
of disk), etc.
In the cloud world, an instance is not meant to be sustainable (i.e.
persistent). What is sustainable is a volume that can be attached to a
running instance.
If you want your instance to be sustainable, there are 2 choices:
- you can snapshot a running instance and upload it as a new image;
  so it is not really a sustainable instance, but rather the
  ability to configure an instance that then serves as the base for
  booting other instances,
- or you can boot an instance from a volume (which is the
  sustainable part of a virtual machine in a cloud).
In the Amazon world, a "standard" image (the one that is instantiated
when creating a new instance) is called an instance store-backed AMI
image, also called a UEC image, and a volume image is called an
EBS-backed AMI image (EBS stands for Elastic Block Store). So an AMI
image stored in a volume cannot be instantiated; it can only be booted
by one instance at a time. But it is sustainable. Different usage.
A UEC or AMI image consists of a triplet: a kernel, an init ramdisk
and a root file system image. An EBS-backed image is just the raw
disk image to be booted on a virtualization host (a kvm raw or qcow2
image, etc.)
In OpenStack, when you create an instance from a given image, what
happens depends on the kind of image.
In fact, in OpenStack, one can upload traditional UEC AMI images (you
need to upload the 3 files: the kernel, the initial ramdisk and the root
filesystem as a raw image). But one can also upload bare
images. This kind of image is booted directly by the
virtualization host. So it is some kind of hybrid between a boot from
volume (an EBS-backed boot in the Amazon world) and the traditional
instantiation from a UEC image.
When one creates an instance from an AMI image in an OpenStack cloud:
- the kernel is copied to the virtualization host,
- the initial ramdisk is copied to the virtualization host,
- the root FS image is copied to the virtualization host,
- then, the root FS image is:
  - duplicated (instantiated),
  - resized (the file is enlarged if needed) to the size of the
    requested instance flavor,
  - the file system is resized to the new size of the file,
  - the contained filesystem is mounted (using qemu-nbd) and the
    configured SSH access key is added to
    /root/.ssh/authorized_keys,
  - the nbd volume is then unmounted,
- a libvirt domain is created, configured to boot from the given
  kernel and init ramdisk, using the resized and modified disk image
  as root filesystem,
- the libvirt domain is then booted.
When one creates an instance from a BARE image in an OpenStack cloud:
- the VM image file is copied to the virtualization host,
- the VM image file is duplicated (instantiated),
- a libvirt domain is created, configured to boot from this copied
  disk image as root filesystem,
- the libvirt domain is then booted.
- Instantiating a BARE image:
  - Involves a much simpler process.
  - Allows booting a non-Linux system (depending on the virtualization
    system; especially true when using kvm virtualization).
  - Is slower to boot and consumes more resources, since the virtual
    machine image must be the size of the required/wanted virtual
    machine (but can remain minimal if using a qcow2 image format). If
    you use a 10G raw image, then 10G of data will be copied from the
    image provider to the virtualization host, and this big file will
    be duplicated each time you instantiate this image.
  - Does not honor the root filesystem size corresponding to the
    flavor of the instance; the filesystem size is the one of the
    BARE image.
- Instantiating an AMI image:
  - Honours the flavor.
  - Generally allows a quicker instance creation process.
  - Consumes fewer resources.
  - Can only boot Linux guests.
If one wants to boot a Windows guest in OpenStack, the only solution
(as far as I know) is to use a BARE image of an installed Windows
system. It works (I have succeeded in doing so), but a minimal Windows
7 install is several GB, so instantiating such a BARE image is very
slow, because the image needs to be copied to the virtualization
host.
So I wanted to provide a minimal Debian image in my cloud, and to
provide it as an AMI image so the flavor is honoured, and so the
standard cloud injection mechanisms (like setting up the ssh key to
access the VM) work without having to tweak the rc.local script or use
cloud-init in my guest.
Here is what I did.
david@host:~$ virt-install --connect qemu+tcp://virthost/system \
-n openstack-squeeze-amd64 -r 512 \
-l http://ftp2.fr.debian.org/pub/debian/dists/stable/main/installer-amd64/ \
--disk pool=default,bus=virtio,type=qcow2,size=5 \
--network bridge=vm7,model=virtio --nographics \
--extra-args='console=tty0 console=ttyS0,115200'
This creates a new virtual machine, downloads the Debian installer
directly from a Debian mirror, and starts the usual Debian
installer in a virtual serial console (I don't like VNC very much).
I then followed the installation procedure. When asked about
partitioning, I chose to create only one primary partition
(i.e. with no swap partition; it won't be necessary here). I also chose
only "Default system" and "SSH server" to be installed.
Since I created the VM image as a qcow2 image, I needed to convert it back to a raw image:
david@host:~$ scp root@virthost:/var/lib/libvirt/images/openstack-squeeze-amd64.img .
david@host:~$ qemu-img convert -O raw openstack-squeeze-amd64.img openstack-squeeze-amd64.raw
Then, as I want a minimal-sized disk image, the filesystem must be
shrunk to its minimal size. I did this as described below, but I think
there are simpler methods to do so.
david@host:~$ fdisk -l openstack-squeeze-amd64.raw # display the partition location in the disk
Disk openstack-squeeze-amd64.raw: 5368 MB, 5368709120 bytes
149 heads, 8 sectors/track, 8796 cylinders, total 10485760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0001fab7
                     Device Boot      Start         End      Blocks   Id  System
openstack-squeeze-amd64.raw1            2048    10483711     5240832   83  Linux
david@host:~$ # extract the filesystem from the image
david@host:~$ dd if=openstack-squeeze-amd64.raw of=openstack-squeeze-amd64.ami bs=1024 skip=1024 count=5240832
david@host:~$ losetup /dev/loop1 openstack-squeeze-amd64.ami
david@host:~$ mkdir /tmp/img
david@host:~$ mount /dev/loop1 /tmp/img
david@host:~$ cp /tmp/img/boot/vmlinuz-2.6.32-5-amd64 .
david@host:~$ cp /tmp/img/boot/initrd.img-2.6.32-5-amd64 .
david@host:~$ umount /tmp/img
david@host:~$ e2fsck -f /dev/loop1 # required before a resize
e2fsck 1.42.5 (29-Jul-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop1: 26218/327680 files (0.2% non-contiguous), 201812/1310208 blocks
david@host:~$ resize2fs -M /dev/loop1 # minimize the filesystem
resize2fs 1.42.5 (29-Jul-2012)
Resizing the filesystem on /dev/loop1 to 191461 (4k) blocks.
The filesystem on /dev/loop1 is now 191461 blocks long.
david@host:~$ # note the new size ^^^^ and the block size above (4k)
david@host:~$ losetup -d /dev/loop1 # detach the lo device
david@host:~$ dd if=openstack-squeeze-amd64.ami of=debian-squeeze-amd64-reduced.ami bs=4096 count=191461
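The dd arguments used in the session above follow directly from the fdisk listing: the partition starts at sector 2048 (512-byte sectors) and spans 5240832 blocks of 1 KiB. As a minimal sketch of that arithmetic (the helper name `dd_extract_args` is hypothetical, it is not part of any tool used here):

```python
def dd_extract_args(start_sector, blocks_1k, sector_size=512, bs=1024):
    """Return the (bs, skip, count) triple needed to extract a partition with dd.

    start_sector: the "Start" column of ``fdisk -l`` (in sectors)
    blocks_1k:    the "Blocks" column of ``fdisk -l`` (in 1 KiB blocks)
    """
    offset_bytes = start_sector * sector_size
    size_bytes = blocks_1k * 1024
    if offset_bytes % bs or size_bytes % bs:
        raise ValueError("partition is not aligned on the chosen block size")
    return bs, offset_bytes // bs, size_bytes // bs


bs, skip, count = dd_extract_args(start_sector=2048, blocks_1k=5240832)
print("dd bs=%d skip=%d count=%d" % (bs, skip, count))
# prints: dd bs=1024 skip=1024 count=5240832

# Size in bytes of the minimized filesystem reported by resize2fs
# (191461 blocks of 4 KiB each), i.e. what the last dd call copies:
reduced_size = 191461 * 4096
print(reduced_size)
# prints: 784224256
```

The same check works for any other partition layout, as long as the values are read from the same fdisk columns.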
After all this, you have a kernel image, an init ramdisk file and a
minimized root filesystem image file. So you just have to upload them to
your OpenStack image provider (glance):
david@host:~$ glance add disk_format=aki container_format=aki name="debian-squeeze-uec-x86_64-kernel" \
< vmlinuz-2.6.32-5-amd64
Uploading image 'debian-squeeze-uec-x86_64-kernel'
==================================================================================[100%] 24.1M/s, ETA 0h 0m 0s
Added new image with ID: 644e59b8-1503-403f-a4fe-746d4dac2ff8
david@host:~$ glance add disk_format=ari container_format=ari name="debian-squeeze-uec-x86_64-initrd" \
< initrd.img-2.6.32-5-amd64
Uploading image 'debian-squeeze-uec-x86_64-initrd'
==================================================================================[100%] 26.7M/s, ETA 0h 0m 0s
Added new image with ID: 6f75f1c9-1e27-4cb0-bbe0-d30defa8285c
david@host:~$ glance add disk_format=ami container_format=ami name="debian-squeeze-uec-x86_64" \
kernel_id=644e59b8-1503-403f-a4fe-746d4dac2ff8 ramdisk_id=6f75f1c9-1e27-4cb0-bbe0-d30defa8285c \
< debian-squeeze-amd64-reduced.ami
Uploading image 'debian-squeeze-uec-x86_64'
==================================================================================[100%] 42.1M/s, ETA 0h 0m 0s
Added new image with ID: 4abc09ae-ea34-44c5-8d54-504948e8d1f7
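The order of the three uploads matters: the root filesystem image references the IDs returned for the kernel and the ramdisk. The sketch below only builds the argument lists of the `glance add` calls shown above (it does not run them); the helper name `glance_add_argv` is hypothetical, and the image data would still have to be fed on standard input.

```python
def glance_add_argv(name, fmt, kernel_id=None, ramdisk_id=None):
    """Build the argument list of a ``glance add`` call as used above.

    ``fmt`` is used for both disk_format and container_format
    (aki, ari or ami), exactly as in the three commands of this post.
    """
    argv = ['glance', 'add',
            'disk_format=%s' % fmt,
            'container_format=%s' % fmt,
            'name=%s' % name]
    if kernel_id is not None:
        argv.append('kernel_id=%s' % kernel_id)
    if ramdisk_id is not None:
        argv.append('ramdisk_id=%s' % ramdisk_id)
    return argv


# IDs printed by glance for the first two uploads in the session above
kernel_id = '644e59b8-1503-403f-a4fe-746d4dac2ff8'
ramdisk_id = '6f75f1c9-1e27-4cb0-bbe0-d30defa8285c'

commands = [
    glance_add_argv('debian-squeeze-uec-x86_64-kernel', 'aki'),
    glance_add_argv('debian-squeeze-uec-x86_64-initrd', 'ari'),
    glance_add_argv('debian-squeeze-uec-x86_64', 'ami',
                    kernel_id=kernel_id, ramdisk_id=ramdisk_id),
]
for argv in commands:
    print(' '.join(argv))
```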
And that's it (!). I now have a working Debian squeeze image in my cloud that works fine.
Nazca is a Python library that aims to
help you align data. But what does “align data” mean? For instance,
you have a list of cities, described by their name and their country, and you
would like to find their URI on dbpedia to get more information about them, such as
the longitude and the latitude. If you have two or three cities, it can be done
by hand, but not if there are hundreds or thousands of cities.
Nazca provides everything you need to do it.
This blog post aims to introduce how this library works and how it can be used.
Once you have understood the main concepts behind this library, don't hesitate
to try Nazca online!
The alignment process is divided into three main steps:
- Gather and format the data we want to align.
  In this step, we define two sets called the alignset and the
  targetset. The alignset contains our data, and the
  targetset contains the data to which we would like to link.
- Compute the similarity between the gathered items. We compute a distance
  matrix between the two sets according to a given distance.
- Find the items having a high similarity thanks to the distance matrix.
Let's define alignset and targetset as simple Python lists.
alignset = ['Victor Hugo', 'Albert Camus']
targetset = ['Albert Camus', 'Guillaume Apollinaire', 'Victor Hugo']
Now, we have to compute the similarity between the items. For that purpose, the
Levenshtein distance, which is well suited to computing the distance between a
few words, is used.
Such a function is provided in the nazca.distances module.
The next step is to compute the distance matrix according to the Levenshtein
distance. The result is given in the following table.
|              | Albert Camus | Guillaume Apollinaire | Victor Hugo |
| Victor Hugo  | 6            | 9                     | 0           |
| Albert Camus | 0            | 8                     | 6           |
The alignment process ends by reading the matrix and considering that items
whose distance is below a given threshold are identical.
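To make these steps concrete, here is a minimal pure-Python sketch of a character-level Levenshtein distance, a distance matrix and the threshold step. It is an illustration, not Nazca's own code: the function names are mine, and the values it gives for non-identical names may differ from the table above, since Nazca's implementation may treat multi-word strings differently. I also assume that "below the threshold" includes the threshold itself.

```python
def levenshtein(a, b):
    """Classic character-level edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def distance_matrix(alignset, targetset, metric):
    """One row per alignset item, one column per targetset item."""
    return [[metric(a, t) for t in targetset] for a in alignset]


def links(matrix, threshold):
    """Return the (alignset index, targetset index) pairs that match."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)
            if value <= threshold]


alignset = ['Victor Hugo', 'Albert Camus']
targetset = ['Albert Camus', 'Guillaume Apollinaire', 'Victor Hugo']
matrix = distance_matrix(alignset, targetset, levenshtein)
print(links(matrix, threshold=1))
# prints: [(0, 2), (1, 0)]
```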
The previous case was simple, because we had only one attribute to align (the
name), but it is frequent to have a lot of attributes to align, such as the name
and the birth date and the birth city. The steps remain the same, except that
three distance matrices will be computed, and items will be represented as
nested lists. See the following example:
alignset = [['Paul Dupont', '14-08-1991', 'Paris'],
            ['Jacques Dupuis', '06-01-1999', 'Bressuire'],
            ['Michel Edouard', '18-04-1881', 'Nantes']]
targetset = [['Dupond Paul', '14/08/1991', 'Paris'],
             ['Edouard Michel', '18/04/1881', 'Nantes'],
             ['Dupuis Jacques ', '06/01/1999', 'Bressuire'],
             ['Dupont Paul', '01-12-2012', 'Paris']]
In such a case, two distance functions are used: the Levenshtein one for the
name and the city, and a temporal one for the birth date.
The cdist function of nazca.matrix enables us to compute those
matrices:
>>> nazca.matrix.cdist([a[0] for a in alignset], [t[0] for t in targetset],
...                    'levenshtein', matrix_normalized=False)
array([[ 1.,  6.,  5.,  0.],
       [ 5.,  6.,  0.,  5.],
       [ 6.,  0.,  6.,  6.]], dtype=float32)
|                | Dupond Paul | Edouard Michel | Dupuis Jacques | Dupont Paul |
| Paul Dupont    | 1           | 6              | 5              | 0           |
| Jacques Dupuis | 5           | 6              | 0              | 5           |
| Michel Edouard | 6           | 0              | 6              | 6           |
>>> nazca.matrix.cdist([a[1] for a in alignset], [t[1] for t in targetset],
...                    'temporal', matrix_normalized=False)
array([[     0.,  40294.,   2702.,   7780.],
       [  2702.,  42996.,      0.,   5078.],
       [ 40294.,      0.,  42996.,  48074.]], dtype=float32)
|            | 14/08/1991 | 18/04/1881 | 06/01/1999 | 01-12-2012 |
| 14-08-1991 | 0          | 40294      | 2702       | 7780       |
| 06-01-1999 | 2702       | 42996      | 0          | 5078       |
| 18-04-1881 | 40294      | 0          | 42996      | 48074      |
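The values of this temporal matrix are consistent with a simple absolute difference in days between the two dates, parsed day first. The sketch below rebuilds the matrix with the standard library only; it is my own illustration under that assumption, not the code of Nazca's temporal distance, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_day_first(text):
    """Parse 'DD-MM-YYYY' or 'DD/MM/YYYY' into a date object."""
    return datetime.strptime(text.replace('/', '-'), '%d-%m-%Y').date()


def days_between(a, b):
    """Absolute number of days between two day-first date strings."""
    return abs((parse_day_first(a) - parse_day_first(b)).days)


align_dates = ['14-08-1991', '06-01-1999', '18-04-1881']
target_dates = ['14/08/1991', '18/04/1881', '06/01/1999', '01-12-2012']

matrix = [[days_between(a, t) for t in target_dates] for a in align_dates]
for row in matrix:
    print(row)
# prints:
# [0, 40294, 2702, 7780]
# [2702, 42996, 0, 5078]
# [40294, 0, 42996, 48074]
```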
>>> nazca.matrix.cdist([a[2] for a in alignset], [t[2] for t in targetset],
...                    'levenshtein', matrix_normalized=False)
array([[ 0.,  4.,  8.,  0.],
       [ 8.,  9.,  0.,  8.],
       [ 4.,  0.,  9.,  4.]], dtype=float32)
|           | Paris | Nantes | Bressuire | Paris |
| Paris     | 0     | 4      | 8         | 0     |
| Bressuire | 8     | 9      | 0         | 8     |
| Nantes    | 4     | 0      | 9         | 4     |
The next step is gathering those three matrices into a global one (here, by
summing them cell by cell), called the global alignment matrix. Thus we have:
|   | 0     | 1     | 2     | 3     |
| 0 | 1     | 40304 | 2715  | 7780  |
| 1 | 2715  | 43011 | 0     | 5091  |
| 2 | 40304 | 0     | 43011 | 48084 |
Allowing for some misspellings (for example, Dupont and Dupond are very
close), the matching threshold can be set to 1 or 2. Thus we can see that
item 0 in our alignset is the same as item 0 in the targetset, and that
item 1 in the alignset matches item 2 of the targetset: the links can be
made!
It's important to notice that even if item 0 of the alignset and item 3
of the targetset have the same name and the same birthplace, they are
unlikely to be identical because of their very different birth dates.
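The global matrix above can be checked by adding the three matrices cell by cell and then applying the threshold. This is a small pure-Python sketch of that reading (plain lists instead of arrays, my own helper name); whether Nazca combines matrices exactly this way internally is an assumption drawn from the numbers shown.

```python
names = [[1, 6, 5, 0],
         [5, 6, 0, 5],
         [6, 0, 6, 6]]
dates = [[0, 40294, 2702, 7780],
         [2702, 42996, 0, 5078],
         [40294, 0, 42996, 48074]]
cities = [[0, 4, 8, 0],
          [8, 9, 0, 8],
          [4, 0, 9, 4]]


def add_matrices(*matrices):
    """Cell-by-cell sum of matrices having the same shape."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


global_matrix = add_matrices(names, dates, cities)
for row in global_matrix:
    print(row)
# prints:
# [1, 40304, 2715, 7780]
# [2715, 43011, 0, 5091]
# [40304, 0, 43011, 48084]

threshold = 2
matches = [(i, j)
           for i, row in enumerate(global_matrix)
           for j, value in enumerate(row)
           if value <= threshold]
print(matches)
# prints: [(0, 0), (1, 2), (2, 1)]
```

Note that item 2 of the alignset (Michel Edouard) is also linked to item 1 of the targetset, while the pair (0, 3) is rejected because of the 7780 days separating the two birth dates.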
You may have noticed that working with matrices as I did for the example is a
little bit tedious. The good news is that Nazca does all this work for you. You just
have to give the sets and distance functions and that's all. More good news:
the project comes with the functions needed to build the sets!
Just before we start, we will assume the following imports have been done:
from nazca import dataio as aldio #Functions for input and output data
from nazca import distances as ald #Functions to compute the distances
from nazca import normalize as aln #Functions to normalize data
from nazca import aligner as ala #Functions to align data
On wikipedia, we can find the Goncourt prize winners, and we
would like to establish a link between the winners and their URI on dbpedia
(let's imagine the Goncourt prize winners category does not exist in dbpedia).
We simply copy/paste the winners list from wikipedia into a file and replace all
the separators (- and ,) with #. So, the beginning of our file is:
1903#John-Antoine Nau#Force ennemie (Plume)
1904#Léon Frapié#La Maternelle (Albin Michel)
1905#Claude Farrère#Les Civilisés (Paul Ollendorff)
1906#Jérôme et Jean Tharaud#Dingley, l'illustre écrivain (Cahiers de la Quinzaine)
When using the high-level functions of this library, each item must have at
least two elements: an identifier (the name, or the URI) and the attribute to
compare. With the previous file, we will use the name (so the column number 1)
both as identifier (we don't have a URI to use as identifier here) and as
attribute to align. This is expressed with the following code:
alignset = aldio.parsefile('prixgoncourt', indexes=[1, 1], delimiter='#')
So, the beginning of our alignset is:
>>> alignset[:3]
[[u'John-Antoine Nau', u'John-Antoine Nau'],
 [u'Léon Frapié', u'Léon Frapié'],
 [u'Claude Farrère', u'Claude Farrère']]
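To show what the indexes and delimiter arguments select, here is a hedged sketch of the idea behind this parsing step. It is not the implementation of aldio.parsefile: the function `parse_lines` is a hypothetical stand-in that works on an iterable of text lines instead of a file name.

```python
def parse_lines(lines, indexes, delimiter='#'):
    """Split each line on ``delimiter`` and keep the columns in ``indexes``.

    The first kept column plays the role of the identifier, the following
    ones are the attributes to compare.
    """
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        fields = line.split(delimiter)
        items.append([fields[i] for i in indexes])
    return items


sample = [
    "1903#John-Antoine Nau#Force ennemie (Plume)",
    "1904#Léon Frapié#La Maternelle (Albin Michel)",
    "1905#Claude Farrère#Les Civilisés (Paul Ollendorff)",
]
alignset = parse_lines(sample, indexes=[1, 1])
print(alignset[0])
# prints: ['John-Antoine Nau', 'John-Antoine Nau']
```

With indexes=[0, 1], the year would be used as identifier instead, and with indexes=[1, 2] the book title would become the attribute to compare.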
Now, let's build the targetset thanks to a sparql query and the dbpedia
end-point. We ask for the list of the French novelists, described by their URI
and their name in French:
query = """
     SELECT ?writer, ?name WHERE {
       ?writer <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:French_novelists>.
       ?writer rdfs:label ?name.
       FILTER(lang(?name) = 'fr')
    }
 """
targetset = aldio.sparqlquery('http://dbpedia.org/sparql', query)
Both functions return nested lists as presented before. Now, we have to define
the distance function to be used for the alignment. This is done thanks to a
python dictionary where the keys are the columns to work on, and the values are
the treatments to apply.
treatments = {1: {'metric': ald.levenshtein}} # Use a levenshtein on the name
# (column 1)
Finally, the last thing we have to do is to call the alignall function:
alignments = ala.alignall(alignset, targetset,
                          0.4,       # This is the matching threshold
                          treatments,
                          mode=None, # We'll discuss that later
                          uniq=True  # Get the best results only
                         )
This function returns an iterator over the alignments found. You can
see the results with the following code:
for a, t in alignments:
    print '%s has been aligned onto %s' % (a, t)
It may be important to apply some pre-treatment to the data to align. For
instance, names can be written in lower or upper case, with extra
characters such as punctuation or unwanted information in parentheses, and so on.
That is why we provide some functions to normalize your data. The most useful may
be the simplify() function (see the docstring for more information). So the
treatments dictionary can be given as follows:
def remove_after(string, sub):
    """ Remove the text after ``sub`` in ``string``
        >>> remove_after('I like cats and dogs', 'and')
        'I like cats'
        >>> remove_after('I like cats and dogs', '(')
        'I like cats and dogs'
    """
    try:
        return string[:string.lower().index(sub.lower())].strip()
    except ValueError:
        return string

treatments = {1: {'normalization': [lambda x:remove_after(x, '('),
                                    aln.simplify],
                  'metric': ald.levenshtein
                 }
             }
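The 'normalization' entry is a list of functions applied one after the other before the metric is computed. The sketch below illustrates that chaining on a made-up label; `simplify_stub` is a hypothetical, much cruder stand-in for aln.simplify (it only lowercases and squeezes whitespace), and `normalize` is my own helper, not a Nazca function.

```python
def remove_after(string, sub):
    """Remove the text after ``sub`` in ``string`` (same helper as above)."""
    try:
        return string[:string.lower().index(sub.lower())].strip()
    except ValueError:
        return string


def simplify_stub(text):
    """Hypothetical stand-in for aln.simplify: lowercase, squeeze spaces."""
    return ' '.join(text.lower().split())


def normalize(value, treatment):
    """Apply every function of treatment['normalization'], in order."""
    for func in treatment.get('normalization', []):
        value = func(value)
    return value


treatment = {'normalization': [lambda x: remove_after(x, '('),
                               simplify_stub]}

# 'Victor  HUGO (writer)' is an invented sample label
print(normalize('Victor  HUGO (writer)', treatment))
# prints: victor hugo
```

After this step, two labels that only differ by case, spacing or a parenthesized suffix compare as identical, so the threshold only has to absorb real spelling differences.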
The previous case with the Goncourt prize winners was pretty simple because
the number of items was small, and the computation fast. But in a more realistic use
case, the number of items to align may be huge (thousands or millions…). In
such a case it's unthinkable to build the global alignment matrix because it
would be too big and it would take (at least...) a few days to complete the computation.
So the idea is to make small groups of possibly similar data to compute smaller
matrices (i.e. a divide and conquer approach).
For this purpose, we provide some functions to group/cluster data. We have
functions to group text and numerical data.
Here is the code used; we will explain it below:
targetset = aldio.rqlquery('http://demo.cubicweb.org/geonames',
                           """Any U, N, LONG, LAT WHERE X is Location, X name
                              N, X country C, C name "France", X longitude
                              LONG, X latitude LAT, X population > 1000, X
                              feature_class "P", X cwuri U""",
                           indexes=[0, 1, (2, 3)])
alignset = aldio.sparqlquery('http://dbpedia.inria.fr/sparql',
                             """prefix db-owl: <http://dbpedia.org/ontology/>
                             prefix db-prop: <http://fr.dbpedia.org/property/>
                             select ?ville, ?name, ?long, ?lat where {
                              ?ville db-owl:country <http://fr.dbpedia.org/resource/France> .
                              ?ville rdf:type db-owl:PopulatedPlace .
                              ?ville db-owl:populationTotal ?population .
                              ?ville foaf:name ?name .
                              ?ville db-prop:longitude ?long .
                              ?ville db-prop:latitude ?lat .
                              FILTER (?population > 1000)
                             }""",
                             indexes=[0, 1, (2, 3)])

treatments = {1: {'normalization': [aln.simplify],
                  'metric': ald.levenshtein,
                  'matrix_normalized': False
                 }
             }
results = ala.alignall(alignset, targetset, 3, treatments=treatments, # As before
                       indexes=(2, 2), # On which data to build the kdtree
                       mode='kdtree',  # The mode to use
                       uniq=True)      # Return only the best results
Let's explain the code. We have two sets, each containing a list of cities we want
to align: the first column is the identifier, the second is the name of the city,
and the last one is the location of the city (longitude and latitude), gathered into
a single tuple.
In this example, we want to build a kdtree on the couple (longitude, latitude)
to divide our data into a few candidates. This clustering is coarse, and is only
used to reduce the number of potential candidates without losing any possible
match that a more refined comparison could find.
So, in the next step, we define the treatments to apply.
It is the same as before, but we ask for a non-normalized matrix
(i.e. the real output of the Levenshtein distance).
Then, we call the alignall function. indexes is a tuple giving the
position of the point on which the kdtree must be built, and mode is the mode
used to find neighbours.
Finally, uniq asks the function to return only the best
candidate (i.e. the one having the shortest distance below the given threshold).
The function outputs a generator yielding tuples where the first element is the
identifier of the alignset item and the second is the targetset one (it
may take some time before yielding the first tuples, because all the computation
must be done…).
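The divide and conquer idea can be illustrated without a kdtree. The sketch below swaps the kdtree for a much simpler technique, a regular grid on (longitude, latitude): only items falling in the same or an adjacent cell are kept as candidates for the expensive name comparison. It is not what Nazca does internally; the function names, the cell size and the approximate coordinates are mine.

```python
from collections import defaultdict
from itertools import product


def cell_of(point, cell_size):
    """Grid cell containing a (longitude, latitude) point."""
    lon, lat = point
    return (int(lon // cell_size), int(lat // cell_size))


def candidate_pairs(alignset, targetset, cell_size=0.5):
    """Yield (align_item, target_item) pairs that are geographically close.

    Items are [identifier, name, (longitude, latitude)] lists, as above.
    """
    grid = defaultdict(list)
    for item in targetset:
        grid[cell_of(item[2], cell_size)].append(item)
    for item in alignset:
        cx, cy = cell_of(item[2], cell_size)
        # look at the item's own cell and its 8 neighbours
        for dx, dy in product((-1, 0, 1), repeat=2):
            for candidate in grid.get((cx + dx, cy + dy), []):
                yield item, candidate


# Invented identifiers, approximate coordinates
alignset = [['a1', 'paris', (2.34, 48.85)],
            ['a2', 'Bressuire', (-0.49, 46.84)]]
targetset = [['t1', 'Paris', (2.35, 48.86)],
             ['t2', 'Nantes', (-1.55, 47.21)]]

pairs = [(a[0], t[0]) for a, t in candidate_pairs(alignset, targetset)]
print(pairs)
# prints: [('a1', 't1')]
```

Instead of the 4 name comparisons of the full matrix, only 1 remains here; on thousands of cities the reduction is what makes the alignment tractable.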
We have also built a little application on top of Nazca, using CubicWeb. This
application provides a user interface for
Nazca, helping you to choose what you want to align. You can use sparql or rql
queries, as in the previous example, or import your own csv file. Once you
have chosen what you want to align, you can click the Next step button to
customize the treatments you want to apply, just as you did before in Python!
Once done, by clicking Next step again, you start the alignment process. Wait a
little bit, and you can either download the results in a csv or rdf file, or
directly see the results online by choosing the html output.
Openstack, Wheezy and ZFS on Linux
A while ago, I started the install of an OpenStack cluster at
Logilab, so our developers can play easily with any kind of
environment. We are planning to improve our Apycot automatic testing
platform so it can use "elastic power". And so on.
I first tried a Ubuntu Precise based setup, since at that time,
Debian packages were not really usable. The setup never reached a point
where it could be released as production ready, because I tried a
too complex and bleeding edge configuration (involving Quantum,
openvswitch, sheepdog)...
Meanwhile, we ran really short of storage capacity. For now, it
mainly consists of hard drives distributed in our 19" Dell racks
(generally with hardware RAID controllers). So I recently purchased a
low-cost storage bay (SuperMicro SC937 with a 6Gb/s JBOD-only HBA)
with 18 spinning hard drives and 4 SSDs. This storage bay is driven
by ZFS on Linux (tip: the SSD-stored ZIL is a requirement to
get decent performance). This storage setup is still under test for
now.
I also went to the last Mini-DebConf in Paris, where Loic Dachary
presented the status of the OpenStack packaging effort in
Debian. This made me want to give OpenStack a new try, using
Wheezy and a somewhat simpler setup. But I could not
imagine not using my new ZFS-based storage as a nova volume
provider. It is not available for now in OpenStack (there is a backend
for Solaris, but not for ZFS on Linux). However, this is Python and in
fact, the current ISCSIDriver backend needs very little to
make it work with zfs instead of lvm as "elastic" block-volume
provider and manager.
So, I wrote a custom nova volume driver to handle this. As I don't
want the nova-volume daemon to run on my ZFS SAN, I wrote this backend
mixing the SanISCSIDriver (which manages the storage system via
SSH) and the standard ISCSIDriver (which uses standard Linux iSCSI
target tools). I'm not very fond of the API of the VolumeDriver
(especially the fact that the ISCSIDriver is responsible for 2 roles:
managing block-level volumes and exporting block-level volumes). This
small design flaw (IMHO) is the reason I had to duplicate some code
(not much but...) to implement my ZFSonLinuxISCSIDriver...
So here is the setup I made:
My OpenStack Essex "cluster" consists for now of:
- one control node, running in a "normal" libvirt-controlled virtual
  machine; it is a Wheezy that runs:
  - nova-api
  - nova-cert
  - nova-network
  - nova-scheduler
  - nova-volume
  - glance
  - postgresql
  - OpenStack dashboard
- one computing node (Dell R310, Xeon X3480, 32G, Wheezy), which runs:
  - nova-api
  - nova-network
  - nova-compute
- ZFS-on-Linux SAN (3x raidz1 pools made of 6 1T drives, 2x
  (mirrored) 32G SLC SSDs, 2x 120G MLC SSDs for cache); for now, the storage is
  exported to the SAN via one 1G ethernet link.
I mainly followed the Debian HOWTO to set up my private cloud. I
mainly tuned the network settings to match my environment (and the
fact my control node lives in a VM, with VLAN stuff handled by the
host).
I easily got a working setup (I must admit that I think my
previous experiment with OpenStack helped a lot when dealing with
custom configurations... and vocabulary; I'm not sure I would have
succeeded "easily" following the HOWTO, but hey, it is a functional
HOWTO, meaning if you do not follow the instructions because you want
special tunings, don't blame the HOWTO).
Compared to the HOWTO, my nova.conf looks like (as of today):
[DEFAULT]
logdir=/var/log/nova
state_path=/var/lib/nova
lock_path=/var/lock/nova
root_helper=sudo nova-rootwrap
auth_strategy=keystone
dhcpbridge_flagfile=/etc/nova/nova.conf
dhcpbridge=/usr/bin/nova-dhcpbridge
sql_connection=postgresql://novacommon:XXX@control.openstack.logilab.fr/nova
## Network config
# A nova-network on each compute node
multi_host=true
# VLAN manager
network_manager=nova.network.manager.VlanManager
vlan_interface=eth1
# My ip
my-ip=172.17.10.2
public_interface=eth0
# Dmz & metadata things
dmz_cidr=169.254.169.254/32
ec2_dmz_host=169.254.169.254
metadata_host=169.254.169.254
## More general things
# The RabbitMQ host
rabbit_host=control.openstack.logilab.fr
## Glance
image_service=nova.image.glance.GlanceImageService
glance_api_servers=control.openstack.logilab.fr:9292
use-syslog=true
ec2_host=control.openstack.logilab.fr
novncproxy_base_url=http://control.openstack.logilab.fr:6080/vnc_auto.html
vncserver_listen=0.0.0.0
vncserver_proxyclient_address=127.0.0.1
I had a bit more work to do to make nova-volume work. First, I got hit
by this nasty bug #695791 which is trivial to fix... when you know
how to fix it (I noticed the bug report after I fixed it by myself).
Then, as I wanted the volumes to be stored and exported by my shiny
new ZFS-on-Linux setup, I had to write my own volume driver, which was
quite easy, since it is Python, and the logic to implement was already
provided by the ISCSIDriver class on the one hand, and by the
SanISCSIDriver on the other hand. So I ended up with this first
implementation. This file should be copied to the nova volume package
directory (nova/volume/zol.py):
# vim: tabstop=4 shiftwidth=4 softtabstop=4
# Copyright 2010 United States Government as represented by the
# Administrator of the National Aeronautics and Space Administration.
# Copyright 2011 Justin Santa Barbara
# Copyright 2012 David DOUARD, LOGILAB S.A.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
"""
Driver for ZFS-on-Linux-stored volumes.
This is mainly a custom version of the ISCSIDriver that uses ZFS as
volume provider, generally accessed over SSH.
"""
import os
from nova import exception
from nova import flags
from nova import utils
from nova import log as logging
from nova.openstack.common import cfg
from nova.volume.driver import _iscsi_location
from nova.volume import iscsi
from nova.volume.san import SanISCSIDriver
LOG = logging.getLogger(__name__)
san_opts = [
    cfg.StrOpt('san_zfs_command',
               default='/sbin/zfs',
               help='The ZFS command.'),
    ]
FLAGS = flags.FLAGS
FLAGS.register_opts(san_opts)
class ZFSonLinuxISCSIDriver(SanISCSIDriver):
    """Executes commands relating to ZFS-on-Linux-hosted ISCSI volumes.
    Basic setup for a ZoL iSCSI server:
    XXX
    Note that current implementation of ZFS on Linux does not handle:
      zfs allow/unallow
    For now, needs to have root access to the ZFS host. The best is to
    use a ssh key with ssh authorized_keys restriction mechanisms to
    limit root access.
    Make sure you can login using san_login & san_password/san_private_key
    """
    ZFSCMD = FLAGS.san_zfs_command

    _local_execute = utils.execute

    def _getrl(self):
        return self._runlocal

    def _setrl(self, v):
        if isinstance(v, basestring):
            v = v.lower() in ('true', 't', '1', 'y', 'yes')
        self._runlocal = v

    run_local = property(_getrl, _setrl)

    def __init__(self):
        super(ZFSonLinuxISCSIDriver, self).__init__()
        self.tgtadm.set_execute(self._execute)
        LOG.info("run local = %s (%s)" % (self.run_local, FLAGS.san_is_local))

    def set_execute(self, execute):
        LOG.debug("override local execute cmd with %s (%s)" %
                  (repr(execute), execute.__module__))
        self._local_execute = execute

    def _execute(self, *cmd, **kwargs):
        if self.run_local:
            LOG.debug("LOCAL execute cmd %s (%s)" % (cmd, kwargs))
            return self._local_execute(*cmd, **kwargs)
        else:
            LOG.debug("SSH execute cmd %s (%s)" % (cmd, kwargs))
            check_exit_code = kwargs.pop('check_exit_code', None)
            command = ' '.join(cmd)
            return self._run_ssh(command, check_exit_code)

    def _create_volume(self, volume_name, sizestr):
        zfs_poolname = self._build_zfs_poolname(volume_name)
        # Create a zfs volume
        cmd = [self.ZFSCMD, 'create']
        if FLAGS.san_thin_provision:
            cmd.append('-s')
        cmd.extend(['-V', sizestr])
        cmd.append(zfs_poolname)
        self._execute(*cmd)

    def _volume_not_present(self, volume_name):
        zfs_poolname = self._build_zfs_poolname(volume_name)
        try:
            out, err = self._execute(self.ZFSCMD, 'list', '-H', zfs_poolname)
            if out.startswith(zfs_poolname):
                return False
        except Exception as e:
            # If the volume isn't present
            return True
        return False

    def create_volume_from_snapshot(self, volume, snapshot):
        """Creates a volume from a snapshot."""
        zfs_snap = self._build_zfs_poolname(snapshot['name'])
        # The clone target is the new volume, not the snapshot itself
        zfs_vol = self._build_zfs_poolname(volume['name'])
        self._execute(self.ZFSCMD, 'clone', zfs_snap, zfs_vol)
        self._execute(self.ZFSCMD, 'promote', zfs_vol)

    def delete_volume(self, volume):
        """Deletes a volume."""
        if self._volume_not_present(volume['name']):
            # If the volume isn't present, then don't attempt to delete
            return True
        zfs_poolname = self._build_zfs_poolname(volume['name'])
        self._execute(self.ZFSCMD, 'destroy', zfs_poolname)

    def create_export(self, context, volume):
        """Creates an export for a logical volume."""
        self._ensure_iscsi_targets(context, volume['host'])
        iscsi_target = self.db.volume_allocate_iscsi_target(context,
                                                            volume['id'],
                                                            volume['host'])
        iscsi_name = "%s%s" % (FLAGS.iscsi_target_prefix, volume['name'])
        volume_path = self.local_path(volume)
        # XXX (ddouard) this code is not robust: does not check for
        # existing iscsi targets on the host (ie. not created by
        # nova), but fixing it require a deep refactoring of the iscsi
        # handling code (which is what have been done in cinder)
        self.tgtadm.new_target(iscsi_name, iscsi_target)
        self.tgtadm.new_logicalunit(iscsi_target, 0, volume_path)
        if FLAGS.iscsi_helper == 'tgtadm':
            lun = 1
        else:
            lun = 0
        if self.run_local:
            iscsi_ip_address = FLAGS.iscsi_ip_address
        else:
            iscsi_ip_address = FLAGS.san_ip
        return {'provider_location': _iscsi_location(
                iscsi_ip_address, iscsi_target, iscsi_name, lun)}
def remove_export(self, context, volume):
"""Removes an export for a logical volume."""
try:
iscsi_target = self.db.volume_get_iscsi_target_num(context,
volume['id'])
except exception.NotFound:
LOG.info(_("Skipping remove_export. No iscsi_target " +
"provisioned for volume: %d"), volume['id'])
return
try:
# ietadm show will exit with an error
# this export has already been removed
self.tgtadm.show_target(iscsi_target)
except Exception as e:
LOG.info(_("Skipping remove_export. No iscsi_target " +
"is presently exported for volume: %d"), volume['id'])
return
self.tgtadm.delete_logicalunit(iscsi_target, 0)
self.tgtadm.delete_target(iscsi_target)
def check_for_export(self, context, volume_id):
"""Make sure volume is exported."""
tid = self.db.volume_get_iscsi_target_num(context, volume_id)
try:
self.tgtadm.show_target(tid)
except exception.ProcessExecutionError, e:
# Instances remount read-only in this case.
# /etc/init.d/iscsitarget restart and rebooting nova-volume
# is better since ensure_export() works at boot time.
LOG.error(_("Cannot confirm exported volume "
"id:%(volume_id)s.") % locals())
raise
def local_path(self, volume):
zfs_poolname = self._build_zfs_poolname(volume['name'])
zvoldev = '/dev/zvol/%s' % zfs_poolname
return zvoldev
def _build_zfs_poolname(self, volume_name):
zfs_poolname = '%s%s' % (FLAGS.san_zfs_volume_base, volume_name)
return zfs_poolname
To configure my nova-volume instance (which runs on the control node,
since it's only a manager), I added these to my nova.conf file:
# nova-volume config
volume_driver=nova.volume.zol.ZFSonLinuxISCSIDriver
iscsi_ip_address=172.17.1.7
iscsi_helper=tgtadm
san_thin_provision=false
san_ip=172.17.1.7
san_private_key=/etc/nova/sankey
san_login=root
san_zfs_volume_base=data/openstack/volume/
san_is_local=false
verbose=true
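To make the effect of san_zfs_volume_base and san_thin_provision more concrete, here is a minimal standalone sketch (not part of the driver) that mirrors what _build_zfs_poolname, local_path and _create_volume compute. The volume name and size used in the example are made up, and the path of the zfs binary is an assumption on my side:

```python
# Value taken from the nova.conf fragment above
SAN_ZFS_VOLUME_BASE = 'data/openstack/volume/'
# Assumed location of the zfs binary on the storage host
ZFSCMD = '/sbin/zfs'


def build_zfs_poolname(volume_name, base=SAN_ZFS_VOLUME_BASE):
    """Return the ZFS dataset name used for a nova volume."""
    return '%s%s' % (base, volume_name)


def local_path(volume_name):
    """Return the block device that ZFS exposes for the zvol."""
    return '/dev/zvol/%s' % build_zfs_poolname(volume_name)


def create_volume_cmd(volume_name, sizestr, thin_provision=False):
    """Build the argument list for creating the zvol."""
    cmd = [ZFSCMD, 'create']
    if thin_provision:
        # -s asks for a sparse (thin provisioned) volume
        cmd.append('-s')
    cmd.extend(['-V', sizestr])
    cmd.append(build_zfs_poolname(volume_name))
    return cmd


if __name__ == '__main__':
    # 'volume-00000001' and '1G' are hypothetical example values
    print(' '.join(create_volume_cmd('volume-00000001', '1G')))
    print(local_path('volume-00000001'))
```

With san_thin_provision=false as in the config above, the -s flag is simply left out of the command.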
Note that the private key (/etc/nova/sankey here) is stored
in clear text and must be readable by the nova user.
Since this unencrypted key gives root access to my ZFS host, I
have restricted that access somewhat by using a custom command wrapper
in the .ssh/authorized_keys file.
Something like this (naive implementation):
[root@zfshost ~]$ cat /root/zfswrapper
#!/bin/sh
CMD=`echo $SSH_ORIGINAL_COMMAND | awk '{print $1}'`
if [ "$CMD" != "/sbin/zfs" ] && [ "$CMD" != "tgtadm" ]; then
echo "Can do only zfs/tgtadm stuff here"
exit 1
fi
echo "[`date`] $SSH_ORIGINAL_COMMAND" >> .zfsopenstack.log
exec $SSH_ORIGINAL_COMMAND
The wrapper is then referenced in root's .ssh/authorized_keys file:
[root@zfshost ~]$ cat /root/.ssh/authorized_keys | grep control
from="control.openstack.logilab.fr",no-pty,no-port-forwarding,no-X11-forwarding, \
no-agent-forwarding,command="/root/zfswrapper" ssh-rsa AAAA[...] root@control
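The check done by the wrapper can also be sketched in Python, the language of the driver. This is only an illustration under my own assumptions, not what runs on my ZFS host: the is_allowed function and the ALLOWED_COMMANDS tuple are hypothetical names, and the whitelist simply mirrors the two commands tested in the shell script above.

```python
import shlex

# Executables the wrapper accepts as first word of the command
ALLOWED_COMMANDS = ('/sbin/zfs', 'tgtadm')


def is_allowed(original_command):
    """Return True if the first word of the command is whitelisted."""
    try:
        argv = shlex.split(original_command)
    except ValueError:
        # Unbalanced quotes and similar malformed input are rejected
        return False
    return bool(argv) and argv[0] in ALLOWED_COMMANDS


if __name__ == '__main__':
    import os
    # sshd exports the command requested by the client in this variable
    requested = os.environ.get('SSH_ORIGINAL_COMMAND', '')
    print(is_allowed(requested))
```

Like the shell version, this only looks at the first word, so it still lets any zfs or tgtadm subcommand through; filtering on subcommands would be a stricter variant.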
I had to set the iscsi_ip_address (the IP address of the ZFS
host), but I think this is the result of a mistake
in my ZFSonLinux driver.
Using this config, I can boot an image, create a volume on my ZFS
storage, and attach it to the running instance.
I still have to test things like snapshots, (live?) migration and so on. This is a
very first draft implementation which needs to be refined, improved
and tested.
Besides the fact that it needs more tests, I plan to use salt for my OpenStack
deployment (first to add more compute nodes to my cluster). I would
also like to try salt-cloud, so that I have a bunch of
Debian images that "just work" (without the need to port the
cloud-init Ubuntu package).
As for my zol driver, I need to port it to Cinder, but I do not have a Folsom install to test it on...
Pylint, the world-renowned static code checker for Python, now has a
landing page: http://www.pylint.org
We've tried to summarize all the things a newcomer should know about
pylint. We hope it reflects the diversity of uses and support channels
for pylint.
Note that pylint is not hosted on GitHub or another well-known forge, since we firmly believe in a decentralized architecture for the web.
This applies especially to open source software development. Pylint's development is self-hosted on a forge and its code is version-controlled with Mercurial, a distributed version control system (DVCS). Both tools are free software written in Python.
We know centralized (and closed source) platforms for managing
software projects can make things easier for contributors. We have
enabled a mirror of pylint (and pylint-brain) on Bitbucket so as to ease forks and
pull requests. Pull requests can be made there, or even from a
self-hosted Mercurial repository (with a quick email to the mailing list).
Feel free to add your comments or feedback below.
|