// ************************************************************************** // // // // eses eses // // eses eses // // eses eseses esesese eses Embedded Systems Group // // ese ese ese ese ese // // ese eseseses eseseses ese Department of Computer Science // // eses eses ese eses // // eses eseses eseseses eses University of Kaiserslautern // // eses eses // // // // ************************************************************************** // // The Levenshtein distance is a metric to measure the difference of two given// // sequences. The Levenshtein distance of sequences a[0..M-1] and b[0..N-1] is// // defined as the minimum number of insertions, deletions, or substitutions to// // transform one string into the other one. The algorithm below computes the // // distances d[i][j] of all prefixes a[0..i-1] and b[0..j-1] according to // // recursive definition according to Wagner and Fischer; 1974. // // The Quartz implementation below performs all computations in a single macro// // step by following the data dependencies. The algorithm has a total work of // // O(M*N) micro steps and requires also O(M*N) space. Further references are // // [Dame64,Leve66,WaFi74], and see also the other Quartz implementations. // // ************************************************************************** // macro M = 4; // size of sequence a macro N = 5; // size of sequence b macro C = 8; // sequence elements have type nat{C} macro max(x,y) = (x<y ? y : x); macro min(x,y) = (x<y ? x : y); macro min3(x,y,z) = min(x,min(y,z)); module EditDistance([M]nat{C} ?a,[N]nat{C} ?b, nat{C} dist) { [M+1][N+1]nat{max(M,N)+1} d; // distances from a[0..i-1] to b[0..-1] (i.e., b is empty) for(i=0..M) d[i][0] = i; // distances from a[0..-1] to b[0..j-1] (i.e., a is empty) for(j=0..N) d[0][j] = j; // recursive definition of the Levenshtein distance for(i=1..M) for(j=1..N) d[i][j] = min3(d[i-1][j-1] + (a[i-1]==b[j-1]?0:1), d[i][j-1] + 1, d[i-1][j] + 1); // final result is in the lower right corner of d dist = d[M][N]; } drivenby d00 { // dist=3 since a[0]=2;a[1]=0;a[4]=5; a = [1,3,3,4]; b = [2,0,3,4,5]; assert(dist==3); } drivenby d01 { // dist=2 since a[0]=3;a[4]=5; a = [1,2,3,4]; b = [3,2,3,4,5]; assert(dist==2); } drivenby d02 { // dist=4 since a[0]=1;a[1]=5;a[2]=5;a[4]=5; a = [1,3,2,4]; b = [3,5,5,4,5]; assert(dist==4); } drivenby d03 { // dist=3 since a[0]=5;a[2]=6;a[4]=5; a = [1,3,5,4]; b = [5,3,6,4,5]; assert(dist==3); } drivenby d04 { // dist=3 since we insert at a[1]<-5 a = [5,3,3,4]; b = [5,5,3,3,4]; assert(dist==1); } drivenby d05 { // dist=3 since a[0]=4;a[4]=3; a = [5,2,3,3]; b = [4,2,3,3,3]; assert(dist==2); } drivenby d06 { // dist=3 since remove a[0]<-4 and add 1,6 a = [4,1,2,3]; b = [1,2,3,1,6]; assert(dist==3); } drivenby d07 { // dist=3 since a[1]=a[2]=a[3]=a[4]=1; a = [1,3,4,5]; b = [1,1,1,1,1]; assert(dist==4); } drivenby d08 { // dist=3 since a[0]=3;a[1]=2;a[2]=3;a[4]=6; a = [1,1,1,1]; b = [3,2,3,1,3]; assert(dist==4); } drivenby d09 { // dist=3 since all characters are different a = [3,3,3,3]; b = [0,2,2,5,6]; assert(dist==5); }