The previous case was simple because there was only one attribute to align (the
name), but it is common to have several attributes to align, such as the name,
the birth date and the birth city. The steps remain the same, except that
three distance matrices will be computed, and items will be represented as
nested lists. See the following example:

alignset = [['Paul Dupont', '14081991', 'Paris'],
            ['Jacques Dupuis', '06011999', 'Bressuire'],
            ['Michel Edouard', '18041881', 'Nantes']]
targetset = [['Dupond Paul', '14/08/1991', 'Paris'],
             ['Edouard Michel', '18/04/1881', 'Nantes'],
             ['Dupuis Jacques ', '06/01/1999', 'Bressuire'],
             ['Dupont Paul', '01122012', 'Paris']]

In such a case, two distance functions are used: the Levenshtein distance for
the name and the city, and a temporal distance for the birth date.
The cdist function (called as nazca.matrix.cdist below) enables us to compute
those matrices:
>>> import nazca.matrix
>>> nazca.matrix.cdist([a[0] for a in alignset], [t[0] for t in targetset],
...                    'levenshtein', matrix_normalized=False)
array([[ 1.,  6.,  5.,  0.],
       [ 5.,  6.,  0.,  5.],
       [ 6.,  0.,  6.,  6.]], dtype=float32)

alignset \ targetset   Dupond Paul   Edouard Michel   Dupuis Jacques   Dupont Paul
Paul Dupont                      1                6                5             0
Jacques Dupuis                   5                6                0             5
Michel Edouard                   6                0                6             6

>>> nazca.matrix.cdist([a[1] for a in alignset], [t[1] for t in targetset],
...                    'temporal', matrix_normalized=False)
array([[     0.,  40294.,   2702.,   7780.],
       [  2702.,  42996.,      0.,   5078.],
       [ 40294.,      0.,  42996.,  48074.]], dtype=float32)

alignset \ targetset   14/08/1991   18/04/1881   06/01/1999   01122012
14081991                        0        40294         2702       7780
06011999                     2702        42996            0       5078
18041881                    40294            0        42996      48074

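
The values in the temporal matrix are consistent with a number of days between
two dates read day-first. As a minimal sketch of that idea (not Nazca's own
implementation; parse_day_first and days_between are hypothetical helper names,
and only the two date formats used in this example are handled), the matrix can
be recomputed with the Python standard library:

```python
from datetime import datetime


def parse_day_first(text):
    """Parse 'DDMMYYYY' or 'DD/MM/YYYY' (assumed day-first) into a date."""
    digits = text.replace('/', '')
    return datetime.strptime(digits, '%d%m%Y').date()


def days_between(left, right):
    """Absolute number of days separating two date strings."""
    return abs((parse_day_first(left) - parse_day_first(right)).days)


align_dates = ['14081991', '06011999', '18041881']
target_dates = ['14/08/1991', '18/04/1881', '06/01/1999', '01122012']

# One row per alignset item, one column per targetset item.
date_matrix = [[days_between(a, t) for t in target_dates]
               for a in align_dates]

for row in date_matrix:
    print(row)
```

Note that the day count dominates the other distances by several orders of
magnitude, which matters once the matrices are combined.
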
>>> nazca.matrix.cdist([a[2] for a in alignset], [t[2] for t in targetset],
...                    'levenshtein', matrix_normalized=False)
array([[ 0.,  4.,  8.,  0.],
       [ 8.,  9.,  0.,  8.],
       [ 4.,  0.,  9.,  4.]], dtype=float32)

alignset \ targetset   Paris   Nantes   Bressuire   Paris
Paris                      0        4           8       0
Bressuire                  8        9           0       8
Nantes                     4        0           9       4

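
For readers unfamiliar with the Levenshtein distance, here is a minimal sketch
of the textbook dynamic-programming algorithm, applied to the city column. The
levenshtein function below is an illustrative helper, not Nazca's
implementation. The name matrix above gives 0 for 'Paul Dupont' against
'Dupont Paul', so word order evidently does not count there; this plain sketch
makes no attempt to reproduce that behaviour.

```python
def levenshtein(left, right):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `left` into `right`."""
    # previous[j] holds the distance between the processed prefix of
    # `left` and the first j characters of `right`.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            substitution = 0 if left_char == right_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + substitution))
        previous = current
    return previous[-1]


align_cities = ['Paris', 'Bressuire', 'Nantes']
target_cities = ['Paris', 'Nantes', 'Bressuire', 'Paris']

city_matrix = [[levenshtein(a, t) for t in target_cities]
               for a in align_cities]

for row in city_matrix:
    print(row)
```
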
The next step is to combine those three matrices into a single one, called the
global alignment matrix. Here it is obtained by adding the three matrices
element by element, which gives:

alignset \ targetset       0       1       2       3
0                          1   40304    2715    7780
1                       2715   43011       0    5091
2                      40304       0   43011   48084
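
The global matrix shown here equals the element-wise sum of the three matrices
computed earlier. A minimal sketch of that combination in plain Python follows;
add_matrices is a hypothetical helper written for this illustration, and the
unweighted sum is simply what the figures in this example correspond to.

```python
name_matrix = [[1, 6, 5, 0],
               [5, 6, 0, 5],
               [6, 0, 6, 6]]
date_matrix = [[0, 40294, 2702, 7780],
               [2702, 42996, 0, 5078],
               [40294, 0, 42996, 48074]]
city_matrix = [[0, 4, 8, 0],
               [8, 9, 0, 8],
               [4, 0, 9, 4]]


def add_matrices(*matrices):
    """Element-wise sum of equally shaped nested lists."""
    return [[sum(cells) for cells in zip(*rows)]
            for rows in zip(*matrices)]


global_matrix = add_matrices(name_matrix, date_matrix, city_matrix)

for row in global_matrix:
    print(row)
```
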
Allowing for a few spelling mistakes (for example, Dupont and Dupond are very
close), the matching threshold can be set to 1 or 2. We can then see that
item 0 of the alignset is the same as item 0 of the targetset, that item 1 of
the alignset matches item 2 of the targetset, and that item 2 of the alignset
matches item 1 of the targetset: the links can be made!
It is important to notice that even though item 0 of the alignset and item 3
of the targetset have the same name and the same birthplace, they are
unlikely to be identical because their birth dates are very different.
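
The thresholding step described above can be sketched as follows. This is only
an illustration of the idea, not Nazca's API: find_matches is a hypothetical
helper that keeps every (alignset index, targetset index) pair whose global
distance does not exceed the threshold.

```python
global_matrix = [[1, 40304, 2715, 7780],
                 [2715, 43011, 0, 5091],
                 [40304, 0, 43011, 48084]]


def find_matches(matrix, threshold):
    """Return (row, column) pairs whose distance is at most `threshold`."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, distance in enumerate(row)
            if distance <= threshold]


matches = find_matches(global_matrix, threshold=2)
print(matches)
```

The pair (0, 3) is rejected even though the names and cities agree, because the
7780-day gap between the birth dates pushes its global distance far above the
threshold.
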
You may have noticed that working with matrices by hand, as done in this
example, is a little tedious. The good news is that Nazca does all this work for
you: you just have to give it the sets and the distance functions. More good
news: the project comes with the functions needed to build the sets!