MoveCode Module
This module implements functions for move code programs executed by BED (Buffered Exposed Datapath) processor architectures as the SCAD processor. In addition to the data types for move instruction programs, the module also provides an instruction set simulator to define the semantics of the move code instructions of the SCAD machine (for more details, see the SCAD reference sheet of SCAD).
Types
Type | Description |
This type defines move code instructions with methods for encoding, decoding, string conversion, and parsing from strings and byte arrays. For the encoding into machine words (byte arrays), see the addresses of PUs and buffers in the SCAD reference sheet. Hence, the general purpose PUs have indices \(2,\ldots,\mathsf{numPU}+1\), and their input and output buffers have addresses \(4,\ldots,2\cdot\mathsf{numPU}+3\). If these numbers are encoded as radix-2 numbers, we need \(\left\lceil{\log_2(2\cdot\mathsf{numPU}+4)}\right\rceil\) bits for the source address and the same number of bits for the target address. Together with the two bits to determine one of the four kinds of move instructions, we finally obtain instructions |mv|srcAdr|tgtAdr| where mv has two bits, and each one of srcAdr and tgtAdr has \(\left\lceil{\log_2(2\cdot\mathsf{numPU}+4)}\right\rceil\) bits. Hence, we get
|
|
type for operation codes with methods for parsing, printing and encoding |
|
a processor state contains all values that may change in an execution; buffers are thereby array where NONE denotes no value, and where the head value is at index 0, so that the arrays end with a suffix of values NONE and have a prefix of SOME-values. |
|
PUs have a unique indices; index 0 is reserved for the CU, index 1 is reserved for the LSU; and general purpose PUs have indices 2,...,numPU+1 |
|
sources of move code instructions |
|
targets of move code instructions |
Functions and values
Function or value | Description |
|
|
|
This function generates a move code program to compute the factorial of a given number n. It makes use of three PUs and runs in linear time.
|
|
This function generates a move code program to compute the sum of the first n natural numbers with puNum PUs. The generated move code program has 3*n-3 instructions and can be run with log(n)+1 many steps with n/2 many PUs. However, it requires a large issue width of about 4 times the number of PUs. A buffer size of 2 is however sufficient for optimal time.
|
|
OddEven Transposition Sorting is a parallel sorting algorithm that requires O(n^2) work with a depth of O(n). The compare-and-swap operation is implemented by comparing x1 with x2, and conditional assignments to variables y1 = b1 and x1 or !b1 or x2 and y2 = !b1 and x1 or b1 or x2. Without leveling, the program results in a dataflow graph which cannot be scheduled on a single PU; however, two PUs are sufficient to run the program. In addition to this PU, we need another PU that provides the input pair of values for this operation, and will afterwards pick the result to insert in the sequence.
|
|
This function generates a move code program for computing the parallel prefix sum of the first n numbers. The function uses only a single PU and processes the levelized dataflow graph level by level from left to right. The values are thereby kept in the left input buffer and are either copied back to this buffer, duplicated for a following addition, or added to a valued put in inR after a previous duplication. The algorithm implements the Brent-Kung parallel prefix tree which is work and time optimal.
|
|
Kogge-Stone parallel prefix computation is not work-efficient, but almost twice as fast as the Brent-Kung parallel prefix computation. On SCAD, the additional work does not matter since the Brent-Kung implementation may require additional COPY operations as well. The implementation makes use of two PUs, since we need to duplicate the result of additions while having the values of the current level in the inL buffer of PU[0].
|
|
This function generates a move code program to compute the sum of the first n natural numbers with puNum PUs. The generated move code program has 3*n-3 instructions and can be run with log(n)+1 many steps with n/2 many PUs. However, it requires a large issue width of about the number of PUs. A buffer size of 2 is however sufficient for optimal time.
|
|
|
|
|
|
|
|
This function computes the number of PUs used by the program and then it computes the minimal issue width and buffer size to achieve the minimal runtime of the program. Note that the minimal runtime can be achieved if the processor may issue all instructions in the first step.
|
|
|
|
|
|
|
|
write a byte array to a binary file
|
|
Print the threads per PU of a move code program
|
Full Usage:
SimTraceToHtml ostr trace
Parameters:
TextWriter
trace : (int array * int * ProcessorState<int> * ProcessorState<int> * ProcessorState<int>) list
|
This function prints a SCAD machine simulation trace as generated by ScadExec in html output format.
|
Full Usage:
SimulateMoveCode maxIssue maxSteps mvProg
Parameters:
int
maxSteps : int option
mvProg : MoveCode list
Returns: (int array * int * ProcessorState<int> * ProcessorState<int> * ProcessorState<int>) list
|
This function is used to generate a simulation trace for a given move code program. Make sure that the mutable values numPU, bufSize, and memSize of this module are initialized accordingly; you may use function NumberOfPUs to determine the number of PUs used by the given program. Parameter maxIssue determines the maximal number of move instructions issued by the CU in one step (it may issue less if the buffers are full). Parameter maxSteps provides an optional upper bound of steps for the simulation; if it is None, the program runs up to termination. The result is a list of triples (numFirings,numTransfers,psState) where numFirings holds the numbers of firings of the PUs up to this point, numTransfers is the number of data transfers up to this point, and psState is the processor state at that point.
|
|
Upscaling a Move Code Program to a given number of PUs. Note that any PU allocation remains valid by any partition refinement, i.e., nodes that are mapped to different PUs before must also be mapped to different PUs afterwards, but nodes that are currently mapped to the same PU may be moved to new PUs without producing conflicts unless the new PU is not used for nodes currently mapped to other PUs as well.
|
|
write a byte array to a binary file
|
Full Usage:
WriteMoveProgram moveProg filename
Parameters:
MoveCode seq
filename : string
|
write move program to a text file
|
|
write a byte array to a text file
|
Full Usage:
bufSize
Returns: int
|
size of the buffers
|
|
|
Full Usage:
memSize
Returns: int
|
size of the data memory
|
Full Usage:
numPU
Returns: int
|
The value numPU in this module must be initialized with the number of general purpose PUs (without CU and LSU) to ensure a correct instruction encoding.
|