Let’s begin by constructing the data path to establish the flow of data. We’ll start with a simple instruction, analyze its requirements, and gradually build upon it to develop a more comprehensive understanding.
addi x5, zero, 6
addi x6, zero, 7
add x7, x5, x6
The hexadecimal format of the above instruction are stored in a memory file that will be read during program execution.
00600293 // addi x5 x0 6
00700313 // addi x6 x0 7
006283b3 // add x7 x5 x6
Instruction Fetch
Every instruction typically follows these two fundamental steps:
- Instruction Fetch – The Program Counter points to the instruction memory from where the instruction are fetched.
- Register Read – Based on the instruction fields, one or two registers are selected and read from the register file.
Figure 1: Instruction fetch
Program Counter
The Program Counter (PC) increments by 4 in every clock cycle, as instructions are typically word-aligned in memory. However, since the memory file instr.mem
stores instructions sequentially, the verilog design may increment the PC by 1 instead. It is important to understand that, in reality, the actual PC progression follows an increment of 4 to maintain proper instruction addressing.
module program_counter (/*AUTOARG*/
// Outputs
pc,
// Inputs
clk, rst
);
// Outputs
output [31:0] pc;
// Inputs
input clk;
input rst;
/*AUTOREG*/
// Beginning of automatic regs (for this module's undeclared outputs)
reg [31:0] pc;
// End of automatics
/*AUTOWIRE*/
always_ff@(posedge clk)
if(rst)
pc <= '0;
else
pc <= pc + 1;
endmodule // program_counter
Instruction Memory
The memory file, as shown above, is read by the instruction memory. Based on the PC value, the corresponding instruction is retrieved and sent to the register file for execution.
module instruction_mem (/*AUTOARG*/
// Outputs
instr,
// Inputs
pc
);
// Outputs
output [31:0] instr;
// Inputs
input [31:0] pc;
logic [31:0] instr_mem [31:0];
initial
$readmemh("instr.mem", instr_mem);
assign instr = instr_mem[pc];
endmodule // instruction_mem
Data Path
The top module serves as the central connection point for all data path modules in this design. As new instructions are added, the data path is progressively updated to accommodate them, ensuring seamless execution and expansion.
module data_path (/*AUTOARG*/
// Outputs
instr,
// Inputs
clk, rst
);
// Outputs
output [31:0] instr;
// Inputs
input clk;
input rst;
/*AUTOREG*/
/*AUTOWIRE*/
// Beginning of automatic wires (for undeclared instantiated-module outputs)
wire [31:0] pc; // From PC of program_counter.v
// End of automatics
program_counter PC (/*AUTOINST*/
// Outputs
.pc (pc[31:0]),
// Inputs
.clk (clk),
.rst (rst));
instruction_mem IM (/*AUTOINST*/
// Outputs
.instr (instr[31:0]),
// Inputs
.pc (pc[31:0]));
endmodule // data_path
This is first step in the design and the design files can be found [here].
Register File
The register file processes incoming instructions and distributes data to the appropriate registers. It has three 5-bit input addresses: two for source registers and one for the destination (write) register. Additionally, a write data input holds the value to be stored when the write signal is enabled.
Until now, no control signal was involved. Here, a new control signal, rw
, is introduced. When rw
is set high, the data from wd3
is stored at the register address specified by a3
, enabling controlled data writing.
Let’s begin constructing the microarchitecture by focusing on the first instruction in the program. This will serve as the foundation, and we will expand the design as we introduce more instructions.
addi x5, zero, 6
The instruction shown above is an I-type instruction, where an immediate value is added to a source register. In this case, the source register is x0 (zero), and the result is stored in the temporary register x5.
The instruction is fetched from the instruction memory and then passed to the register file, where the necessary registers are accessed for execution.
Figure 2: Decode
Once the instruction is retreived from the input a1
receives bits 19 to 15 from the 32-bit instruction. This is 5-bit data is register address. Similarly, a2
receives bits from 24 to 20 from the 32-bit instruction and a3
receives bits from 11 to 7.
Once the instruction is retrieved, the input a1
receives bits 19 to 15 from the 32-bit instruction. This 5-bit data represents the register address for the first source register rs1
. Similarly, a2
receives bits 24 to 20, while a3
gets bits 11 to 7, each corresponding to register rs2
and rd
respectively.
Since the first instruction is an immediate instruction the immediate value should be sign extended and produced to the ALU. Hence, another block is added, the instruction from 31 to 7 is sent to the sign extended block.
The first instruction is shown in binary with its respective fields.
imm | rs1 | funct3 | rd | opcode |
---|---|---|---|---|
000000000110 (6) | 00000(0) | 000 | 00101 (5) | 0010011 (19) |
The data in green is shown in Figure 2. We’ll get back to the control signal we3
once the addition operation is takes place. The next step is build the ALU. Based on the update microarhitecture the data path is also update.
The updated data path is as follows:
module data_path (/*AUTOARG*/
// Outputs
pc, a1, a2, a3,
// Inputs
clk, rst, instr
);
// Outputs
output [31:0] pc;
output [4:0] a1;
output [4:0] a2;
output [4:0] a3;
// Inputs
input clk;
input rst;
input [31:0] instr;
/*AUTOREG*/
/*AUTOWIRE*/
program_counter PC (/*AUTOINST*/
// Outputs
.pc (pc[31:0]),
// Inputs
.clk (clk),
.rst (rst));
logic [31:0] sign_ext;
assign a1 = instr[19:15];
assign a2 = instr[24:20];
assign a3 = instr[11:7];
assign sign_ext = {{20{instr[31]}}, instr[31:20]};
endmodule // data_path
The new block the register file is shown below:
module reg_file (/*AUTOARG*/
// Outputs
rd1, rd2,
// Inputs
clk, a1, a2, a3, wd3, we3
);
// Outputs
output [31:0] rd1;
output [31:0] rd2;
// Inputs
input clk;
input [4:0] a1; // rs1
input [4:0] a2; // rs2
input [4:0] a3; // rd
input [31:0] wd3; // write data from memory
input we3; // write enable
/*AUTOREG*/
/*AUTOWIRE*/
logic [31:0] regs[31:0]; // 32 registers
always_ff@(posedge clk)
if(we3)
regs[a3] <= wd3;
assign rd1 = (a1 == 0) ? 32'h0 : regs[a1];
assign rd2 = (a2 == 0) ? 32'h0 : regs[a2];
endmodule // reg_file
The source files for the second stage can be found [here].
Arithmetic Logic Unit (ALU)
The Arithmetic Logic Unit (ALU) handles all arithmetic and logical operations, such as addition, subtraction, shifting, and logical operations. Initially, the basic operations will be implemented, and as we progress, additional instructions will be incorporated to expand the ALU’s functionality. The signals in blue are control signals
Figure 3: ALU
A mux is added in between ALU and Register File that helps in chosing data between the imm
value or the from the register. The control signal for the mux is ALUSrc
, when set high the data from the immediate value imm
is chosen as the source for the ALU operation and when set to zero the data from the register is chosen as source for the ALU operation.
A mux is added between the ALU and the Register File to facilitate data selection between the immediate value imm
and the data from the register. The control signal for the mux is ALUSrc
. When control signal ALUSrc
is set high, the immediate value imm
is chosen as the source for the ALU operation. When the control signal ALUSrc
is set low, the data from the register is selected as the source for the ALU operation.
Another mux is placed after the data memory to choose between the data from the ALU result or the data from data memory. For the first instruction, the result of the ALU operation is written to the destination register, rd
, which is x5
. The control signal for this mux is MemReg
. When MemReg
is set high, the data from the data memory is loaded into the destination register. When set low, the data from the ALU result is directly written to the destination register.
module alu (/*AUTOARG*/
// Outputs
res,
// Inputs
src_a, src_b, alu_op
);
// Outputs
output [31:0] res;
// Inputs
input [31:0] src_a;
input [31:0] src_b;
input [3:0] alu_op;
parameter
add = 4'd0,
sub = 4'd1,
sll = 4'd2,
slt = 4'd3,
sltu = 4'd4,
xorp = 4'd5,
srl = 4'd6,
sra = 4'd7,
orp = 4'd8,
andp = 4'd9;
/*AUTOREG*/
// Beginning of automatic regs (for this module's undeclared outputs)
reg [31:0] res;
// End of automatics
/*AUTOWIRE*/
logic signed [31:0] src_a_sign;
logic signed [31:0] src_b_sign;
assign src_a_sign = src_a;
assign src_b_sign = src_b;
always@(/*AUTOSENSE*/alu_op or src_a or src_b) begin
casez(alu_op)
add : res = src_a + src_b;
sub : res = src_a - src_b;
sll : res = src_a << src_b;
slt : res = src_a_sign < src_b_sign ? 32'h00000001 : 32'h00000000;
sltu: res = src_a < src_b ? 32'h00000001 : 32'h00000000;
xorp: res = src_a ^ src_b;
srl : res = src_a >> src_b;
sra : res = src_a_sign >>> src_b_sign;
orp : res = src_a | src_b;
andp: res = src_a & src_b;
endcase // casez (alu_op)
end // always_comb
endmodule // alu
The control unit is generates the signal for control the data flow. The control unit is shown below which will be updated:
module control_unit (/*AUTOARG*/
// Outputs
we3, alu_src, alu_op, mem_rw, mem_reg, imm_src,
// Inputs
funct7, funct3, opcode
);
// Outputs
output we3;
output alu_src;
output [4:0] alu_op;
output mem_rw;
output mem_reg;
output [1:0] imm_src;
// Inputs
input [6:0] funct7;
input [2:0] funct3;
input [6:0] opcode;
parameter
add = 4'd0,
sub = 4'd1,
sll = 4'd2,
slt = 4'd3,
sltu = 4'd4,
xorp = 4'd5,
srl = 4'd6,
sra = 4'd7,
orp = 4'd8,
andp = 4'd9;
/*AUTOREG*/
// Beginning of automatic regs (for this module's undeclared outputs)
reg [4:0] alu_op;
reg alu_src;
reg [1:0] imm_src;
reg mem_reg;
reg mem_rw;
reg we3;
// End of automatics
/*AUTOWIRE*/
always@(/*AUTOSENSE*/funct3 or opcode) begin
casez(opcode)
7'd51: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b0, 1'b0, 1'b0, 2'bxx}; // R-type
7'd19: begin
if(funct3 == 1 || funct3 == 5)
{we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b01}; // I-type
else
{we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b00}; // I-type
end
default : {we3, alu_src, mem_rw, mem_reg} = 4'hx;
endcase // case (opcode)
end
always@(/*AUTOSENSE*/funct3 or funct7) begin
casez(funct3)
3'd0 : alu_op = funct7[5] ? sub : add;
3'd1 : alu_op = sll;
3'd2 : alu_op = slt;
3'd3 : alu_op = sltu;
3'd4 : alu_op = xorp;
3'd5 : alu_op = funct7[5] ? sra : srl;
3'd6 : alu_op = orp;
3'd7 : alu_op = andp;
default: alu_op = 4'dx;
endcase // casez (funct3)
end // always_comb
endmodule // control_unit
The design performs arithmetic and logical operations. There are three I-type instructions that has to be carefully analyzed and there slli, srli and srai.
funct7 | imm | rs1 | funct3 | rd | opcode | instr |
---|---|---|---|---|---|---|
0000000 | (5-bits) | (5-bits) | 001 | (5-bits) | 0010011 | slli |
0000000 | (5-bits) | (5-bits) | 101 | (5-bits) | 0010011 | srli |
0100000 | (5-bits) | (5-bits) | 101 | (5-bits) | 0010011 | srai |
The if-statement inside the case statement of the main decoder handles this condition. The control signal imm_src
determines which immediate encoding to use.
For shift immediate instructions, the standard 12-bit immediate is not selected. Instead:
- imm[11:5] is treated as the
funct7
value. - imm[4:0] is sign-extended to form the immediate value for the operation.
This ensures correct handling of shift instructions while maintaining flexibility for different immediate encodings.
Figure 4: Includes Immediate
Lets look at the instruction:
srai x21, x7, 2
It is a shift right arithmetic imm. The instruction in hexadecimal with their respective fields is shown below:
funct7 | imm | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
0100000 | 00010 | 00111 | 101 | 10101 | 0010011 |
Up untill now the design takes cares for all the arithmetic and logical operation. The next set of instruction is load and store. The code can be found [here] step3.
Load and Store Instruction
Let’s implement store instruction in the data memory and load the data from the data memory.
Store
Lets understand how the store instruction works/
sw x21, 0(x0) => sw rs2, imm(rs1)
The above instruction can be understood as sw src, off(dst) => Address[dst + off] = src.
For the above instruction the hexadecimal value is 0x01502023. The rs2 = 21 and rs1 = 0. Writing data into a registers in verilog begins from zero. For simplity one store word is implemented and offset can be incremented by 1 instread of 4.
imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
---|---|---|---|---|---|
0000000 | 10101 | 00000 | 010 | 00000 | 0100011 |
Lets create a data memory.
module data_mem (/*AUTOARG*/
// Outputs
rd,
// Inputs
clk, addr, wd, mem_rw
);
// Outputs
output [31:0] rd;
// Inputs
input clk;
input [31:0] addr; // address location
input [31:0] wd; // write data
input mem_rw; // write when set to 1
/*AUTOREG*/
/*AUTOWIRE*/
logic [31:0] dmem[31:0];
always_ff@(posedge clk)
if(mem_rw)
dmem[addr] <= wd;
assign rd = dmem[addr];
endmodule // data_mem
The updated control unit is illustrated below:
module control_unit (/*AUTOARG*/
// Outputs
we3, alu_src, alu_op, mem_rw, mem_reg, imm_src,
// Inputs
funct7, funct3, opcode
);
// Outputs
output we3;
output alu_src;
output [4:0] alu_op;
output mem_rw;
output mem_reg;
output [1:0] imm_src;
// Inputs
input [6:0] funct7;
input [2:0] funct3;
input [6:0] opcode;
parameter
add = 4'd0,
sub = 4'd1,
sll = 4'd2,
slt = 4'd3,
sltu = 4'd4,
xorp = 4'd5,
srl = 4'd6,
sra = 4'd7,
orp = 4'd8,
andp = 4'd9;
/*AUTOREG*/
// Beginning of automatic regs (for this module's undeclared outputs)
reg [4:0] alu_op;
reg alu_src;
reg [1:0] imm_src;
reg mem_reg;
reg mem_rw;
reg we3;
// End of automatics
/*AUTOWIRE*/
always@(/*AUTOSENSE*/funct3 or opcode) begin
casez(opcode)
7'd51: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b0, 1'b0, 1'b0, 2'bxx}; // R-type
7'd19: begin
if(funct3 == 1 || funct3 == 5)
{we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b01}; // I-type
else
{we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b00}; // I-type
end
7'd35: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'bx, 1'b0, 1'b1, 1'bx, 2'b10}; // S-type
default : {we3, alu_src, mem_rw, mem_reg} = 4'hx;
endcase // case (opcode)
end
always@(/*AUTOSENSE*/funct3 or funct7) begin
casez(funct3)
3'd0 : alu_op = funct7[5] ? sub : add;
3'd1 : alu_op = sll;
3'd2 : alu_op = slt;
3'd3 : alu_op = sltu;
3'd4 : alu_op = xorp;
3'd5 : alu_op = funct7[5] ? sra : srl;
3'd6 : alu_op = orp;
3'd7 : alu_op = andp;
default: alu_op = 4'dx;
endcase // casez (funct3)
end // always_comb
endmodule // control_unit
Load
Now lets load the data from location stored in data memeory to register x24.
lw x24, 0(x0) => lw rd, imm(rs1)
The data from the address location [imm + rs1] is load to register x24.
imm | rs1 | funct3 | rd | opcode |
---|---|---|---|---|
00000000000 | 00000 | 010 | 11000 | 0000011 |
The decode instruction is added to the control unit for the main decoder. The decode instruction is shown below.
7'd03: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'bx, 1'b1, 1'b0, 1'b0, 2'b00}; // I- type load
Till now the R-type, I-type and S-type instruction have been implemented and the code can be found [here] step4.
Branch Instructions
imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | opcode |
---|---|---|---|---|---|
0000000 | 00110 | 00101 | 000 | 01000 | 1100011 |