// ************************************************************************** // // // // eses eses // // eses eses // // eses eseses esesese eses Embedded Systems Group // // ese ese ese ese ese // // ese eseseses eseseses ese Department of Computer Science // // eses eses ese eses // // eses eseses eseseses eses University of Kaiserslautern // // eses eses // // // // ************************************************************************** // // The module below implements a one-dimensional systolic array for computing // // the edit distance of a list of pairs of sequences a[0..M-1] and b[0..N-1]. // // Similar to module EditDistance1D, the array elements compute the values of // // the diagonals of the corresponding distance matrix d (where d[i][j] is the // // distance between a[0..i-1] and b[0..j-1]). // // Since the computation of the values of a diagonal requires the values of // // the previous two diagonals, we have to forward the values of each diagonal // // to its next two successor diagonals. For this reason, the array elements // // alternate with two behaviors: either the previously computed value d[i][j] // // is simply stored and the incoming values of the sequence elements are piped// // through or these values are used to compute a new diagonal element. // // Sequence a is piped through from right to left while sequence b is piped // // through from left to right, such that only every other cycle can be used to// // pipe in an element of the sequences. According to Lipton & Lopresti (1985),// // we interleave the sequence elements with their indices to have the left // // column and the upper row of the distance matrix at hand to compute the left// // and right borders of the diagonals. Thus, the following is a typical state:// // // // 3 a[2] 2 a[1] 1 a[0] 0 -> // // <- 0 b[2] 1 b[1] 2 b[0] 3 ... // // // // We use an additional boolean stream m so that m[i]=true holds whenever the // // corresponding values of a[i] and b[i] are sequence elements instead of // // indices. Since m[i]=!m[i+1] must therefore hold, and we wish to pipe in the// // streams synchronously, the length of the array must be an odd number. // // The position of the sequence elements at even and odd points of time are // // therefore as follows: // // // // t=2i: // // i a[i-1] i-1 ... a[0] 0 ... 0 b[0] ... b[i-1] i // // [] [] [] ... [] [] ... [] [] ... [] [] // // 0 1 2 ... 2i-1 2i L-2i-1 L-2i L-2 L-1 // // // // t=2i+1: // // a[i] i ... a[0] 0 ... 0 b[0] ... i b[i] // // [] [] ... [] [] ... [] [] ... [] [] // // 0 1 ... 2i 2i+1 ... L-2i-2 L-2i-1 L-2 L-1 // // // // Using a length L, we can determine the point of time t0 where the first // // elements a[0] and b[0] meet in the array: // // // // t0=2i --> 2i-1 = L-2i, i.e., 4i = L+1, i.e. L+1 = 2*t0 // // t0=2i+1 --> 2i = L-2i-1, i.e., 4i = L-1, i.e. L+1 = 2*t0 // // // // Thus, we have t0=(L+1)/2 and a[0] and b[0] meet at this point of time at // // the array element with index t0-1. Elements a[0..i] meet elements b[i..0] // // at time t0+i at indices t0-i-1,…,t0+i-1. // // To determine the minimal length L of the array, we have to make sure that// // L is large enough so that each a[i] meets every element of b[j] and vice // // versa. We moreover demand that L must be an odd number so that all array // // elements that received sequence elements also have m[i]=true since we want // // to pipe in the sequence elements synchronously. Since we must also ensure // // that there are left and right neighbors for the computations, a[0] must // // meet b[N-1] at latest at L-2 at time t=2i, so that 2i-1=L-2 and i-1=N-1, // // thus, i=N and L=2N+1 holds. Analogously, b[0] must meet a[M-1] at latest at// // array element 1 at time t=2i, so that M-1=i-1 and 1=L-2i, i.e., i=M and // // L=2M+1 follows. Thus, the minimal length of the array is L:=2max(M,N)+1, // // and therefore a[0] and b[0] meet at t0:=max(M,N)+1 in i0:=max(M,N). // // At time t0, the upper left corner element d[1][1] of the diagonal matrix // // is computed, but assigned with a delayed assignment. Hence, the values of // // the first diagonal are available at time t0+1, and in general, diagonal i // // is available at time t0+i. Since there are M+N-1 diagonals, the final value// // is obtained at time t0+M+N-1 = max(M,N)+M+N = 2*max(M,N)+min(M,N). // // To summarize, the following facts of a computation whose first elements // // entered the array at time t_off (so that both occurrences entered at time // // t_off+1) can be easily derived with MAX=max(M,N): // // (1) The elements a[0] and b[0] match at time MAX+1+t_off in i0:=MAX // // (2) The final index MAX enters at time 2*MAX+t_off. // // (2) Thus, the first elements of the next computation could have already // // been entered at 2*MAX+1+t_off. However, at that point of time the // // modus m[0] will be false, so that we have to wait one step further. // // Thus, computations are repeated with period 2*MAX+2. // // (3) The values of diagonal i=1,…,M+N-1 are available in array x at the // // times MAX+i+1+t_off. // // (4) The final edit distance is available at time MAX+M+N+t_off // // at index p:=MAX+N-M (see argumentation below). // // (5) The elements a[0] and b[0] of the next computation match at time // // 3*MAX+3+t_off due to (1). Thus, there is a gap where no diagonal // // elements are in x of length 3*MAX+3+t_off-(MAX+MAX+MIN+t_off), i.e. // // MAX-MIN+3. // // // // Finally, we have to determine the index p of the array element where the // // result can be finally obtained. To this end, note that at any point of time// // t=2i, the sequence element a[i-k] is in array element 2k-1, so that at time// // t=MAX+M+N-1 (when the final match takes place), we have i-k=M-1, p=2k-1, // // and t=2i=MAX+M+N-1. Thus, p=2i+1-2M=MAX+M+N-1+1-2M=MAX+N-M. At any point of// // time t=2i+1, the sequence element a[i-k] is in array element 2k, so that at// // time t=MAX+M+N-1, we have i-k=M-1, p=2k, and t=2i+1=MAX+M+N-1. Hence, we // // have p=2k=2i-2M+2=2i+1-2M+1=MAX+M+N-1-2M+1=MAX+N-M. Hence, in all cases, // // the index where the final result can be found is p:=MAX+N-M. // // // // The picture below shows one computation of the edit distance for M=4 and // // N=5: // // // // t==0: [ ] [ ] [ ] [ ] [ ] [d00] [ ] [ ] [ ] [ ] [ ] // // t==1: [ ] [ ] [ ] [ ] [d10] [e00] [d01] [ ] [ ] [ ] [ ] // // t==2: [ ] [ ] [ ] [d20] [e10] [d11] [e01] [d02] [ ] [ ] [ ] // // t==3: [ ] [ ] [d30] [e20] [d21] [e11] [d12] [e02] [d03] [ ] [ ] // // t==4: [ ] [ ] [ ] [d31] [e21] [d22] [e12] [d13] [e03] [d04] [ ] // // t==5: [ ] [ ] [ ] [ ] [d32] [e22] [d23] [e13] [d14] [ ] [ ] // // t==6: [ ] [ ] [ ] [ ] [ ] [d33] [e23] [d24] [ ] [ ] [ ] // // t==7: [ ] [ ] [ ] [ ] [ ] [ ] [d34] [ ] [ ] [ ] [ ] // // t==8: [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] // // // // For example, computing the edit distance of a = 1334 and b = 20345 followed// // by computing the edit distance of a = 1234 and b = 32345 starting at time // // t_off=2 yields the following computations of the diagonals (the distance // // matrices are shown on the right: // // step 8: match of a[0] and b[0] // // step 9: x : - - - - - 1 - - - - - 0 | 1 2 3 4 5 // // step 10: x : - - - - 2 1 2 - - - - --+---------- // // step 11: x : - - - 3 2 2 2 3 - - - 1 | 1 2 3 4 5 // // step 12: x : - - 4 3 3 2 2 3 4 - - 2 | 2 2 2 3 4 // // step 13: x : - - - 4 3 2 2 3 4 5 - 3 | 3 3 2 3 4 // // step 14: x : - - - - 3 2 3 3 4 - - 4 | 4 4 3 2 3 // // step 15: x : - - - - - 2 3 4 - - - // // step 16: x : - - - - - - 3 - - - - // // step 17: off-time // // step 18: off-time // // step 19: off-time // // step 20: off-time and match of a[0] and b[0] // // step 21: x : - - - - - 1 - - - - - 0 | 1 2 3 4 5 // // step 22: x : - - - - 2 1 2 - - - - --+---------- // // step 23: x : - - - 2 2 1 2 3 - - - 1 | 1 2 3 4 5 // // step 24: x : - - 3 2 2 1 2 3 4 - - 2 | 2 1 2 3 4 // // step 25: x : - - - 3 2 1 2 3 4 5 - 3 | 2 2 1 2 3 // // step 26: x : - - - - 2 1 2 3 4 - - 4 | 3 3 2 1 2 // // step 27: x : - - - - - 1 2 3 - - - // // step 28: x : - - - - - - 2 - - - - // // // // ************************************************************************** // macro M = 4; // size of first sequence macro N = 5; // size of second sequence macro C = 8; // sequence elements have type nat{C} macro P = M+N+1; // number of pairs of input sequences macro max(x,y) = (x<y ? y : x); macro min(x,y) = (x<y ? x : y); macro min3(x,y,z) = min(x,min(y,z)); macro MIN = min(M,N); macro MAX = max(M,N); macro L = 2*MAX+1; module EditDistance1DLiLo(nat{C} ?s,?t,!dist,event valid) { [L]nat{C} a,b; // current element of the sequence [L]nat x; // current value of the edit distance [L]bool m; // modus, i.e., whether eij or dij is stored in x nat steps; // counting the steps to determine end of computation for(i=0..L-1) m[i] = (i%2)==0; loop { pause; for(i=0..L-1) { let(diff = (a[i]==b[i] ? 0 : 1)) let(a_left = (i==0 ? MAX : a[i-1])) let(b_left = (i==0 ? MAX : b[i-1])) let(x_left = (i==0 ? MAX : x[i-1])) let(a_right = (i==L-1 ? MAX : a[i+1])) let(b_right = (i==L-1 ? MAX : b[i+1])) let(x_right = (i==L-1 ? MAX : x[i+1])) let(d_diag = (a_right==0 ? b_left : (b_left==0 ? a_right : x[i]))) let(d_left = (b_left==0 ? a_left : x_left )) let(d_uppr = (a_right==0 ? b_right : x_right)) { next(x[i]) = (!m[i] ? x[i] : min3(d_left+1,d_diag+diff,d_uppr+1)); next(m[i]) = not m[i]; next(a[i]) = (i==0 ? s : a_left); next(b[i]) = (i==L-1 ? t : b_right); } } // count the computation steps to determine whether result is valid // (steps corresponds to time with offset 1) if(steps==2*MAX+MIN+1) emit(valid); next(steps) = (valid ? MIN : steps+1); // read the edit distance (only reasonable if valid holds) dist = x[(MAX+N)-M]; } } drivenby { [M+N+1][M]nat{C} s_seq; [M+N+1][N]nat{C} t_seq; // input sequence of sequences for s s_seq = [ [1,3,3,4], [1,2,3,4], [1,3,2,4], [1,3,5,4], [5,3,3,4], [5,2,3,3], [4,1,2,3], [1,3,4,5], [1,1,1,1], [3,3,3,3] ]; // input sequence of sequences for t t_seq = [ [2,0,3,4,5], [3,2,3,4,5], [3,5,5,4,5], [5,3,6,4,5], [5,5,3,3,4], [4,2,3,3,3], [1,2,3,1,6], [1,1,1,1,1], [3,2,3,1,3], [0,2,2,5,6] ]; pause; // pipe in the input sequences and interleave the elements with the indices for(i=0..P-1) { // for all input pairs s = 0; t = 0; pause; for(j=0..MAX-1) { s = (j<M ? s_seq[i][j] : 0); t = (j<N ? t_seq[i][j] : 0); pause; s = j+1; t = j+1; pause; } // additional off-time to send values only when m[0]=true pause; } // wait until sequences were piped through for(i=1..MIN) pause; }