Let’s begin by constructing the data path to establish the flow of data. We’ll start with a simple instruction, analyze its requirements, and gradually build upon it to develop a more comprehensive understanding.

addi x5, zero, 6
addi x6, zero, 7
add  x7, x5, x6

The hexadecimal encodings of the above instructions are stored in a memory file that is read during program execution.

00600293        // addi x5 x0 6
00700313        // addi x6 x0 7
006283b3        // add x7 x5 x6
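
The encodings above can be checked by packing the instruction fields by hand. The sketch below is a small Python model rather than part of the Verilog design; the helper names encode_i and encode_r are made up for this illustration, and the field layout assumed is the standard RV32I I-type and R-type format.

```python
def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack an R-type instruction: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP_IMM = 0b0010011  # opcode 19, register-immediate arithmetic
OP_REG = 0b0110011  # opcode 51, register-register arithmetic

program = [
    encode_i(6, 0, 0b000, 5, OP_IMM),     # addi x5, x0, 6
    encode_i(7, 0, 0b000, 6, OP_IMM),     # addi x6, x0, 7
    encode_r(0, 6, 5, 0b000, 7, OP_REG),  # add  x7, x5, x6
]

for word in program:
    print(f"{word:08x}")
```

The three printed words should match the lines of the memory file.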

Instruction Fetch

Every instruction typically follows these two fundamental steps:

  1. Instruction Fetch – The Program Counter points to the location in instruction memory from which the instruction is fetched.
  2. Register Read – Based on the instruction fields, one or two registers are selected and read from the register file.

Figure 1: Instruction fetch

Program Counter

In RV32I every instruction is 32 bits (4 bytes) wide and memory is byte-addressed, so the Program Counter (PC) normally advances by 4 on every clock cycle. In this design, however, the instruction memory is an array of 32-bit words loaded from instr.mem and indexed by word, so the Verilog PC increments by 1 instead. Keep in mind that a byte-addressed implementation would increment the PC by 4 to address instructions correctly.

module program_counter (/*AUTOARG*/
   // Outputs
   pc,
   // Inputs
   clk, rst
   );
   // Outputs
   output [31:0] pc;
   // Inputs
   input	 clk;
   input	 rst;

   /*AUTOREG*/
   // Beginning of automatic regs (for this module's undeclared outputs)
   reg [31:0]		pc;
   // End of automatics
   /*AUTOWIRE*/

   always_ff@(posedge clk)
     if(rst)
       pc <= '0;
     else
       pc <= pc + 1;

endmodule // program_counter

Instruction Memory

The memory file shown above is loaded into the instruction memory. The PC value selects the corresponding instruction, which is then passed on to the register file and the rest of the data path.

module instruction_mem (/*AUTOARG*/
   // Outputs
   instr,
   // Inputs
   pc
   );
   // Outputs
   output [31:0] instr;
   // Inputs
   input [31:0]	 pc;

   logic [31:0]	 instr_mem [31:0];

   initial
     $readmemh("instr.mem", instr_mem);

   assign instr = instr_mem[pc];

endmodule // instruction_mem
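
A small Python model (not part of the Verilog sources) can make the difference between a word-indexed and a byte-addressed PC concrete. The parser below handles only the one-word-per-line layout used by instr.mem here, which is a small subset of what $readmemh accepts; load_hex_words and the two fetch helpers are names made up for this sketch.

```python
MEM_TEXT = """\
00600293        // addi x5 x0 6
00700313        // addi x6 x0 7
006283b3        // add x7 x5 x6
"""


def load_hex_words(text):
    """Parse one hex word per line, dropping // comments and blank lines."""
    words = []
    for line in text.splitlines():
        token = line.split("//")[0].strip()
        if token:
            words.append(int(token, 16))
    return words


instr_mem = load_hex_words(MEM_TEXT)


def fetch_word_indexed(pc):
    """PC counts instructions (0, 1, 2, ...), as in the design above."""
    return instr_mem[pc]


def fetch_byte_addressed(pc):
    """PC counts bytes (0, 4, 8, ...); dropping the two low bits gives the word index."""
    return instr_mem[pc >> 2]


for pc in range(len(instr_mem)):
    print(pc, pc * 4, f"{fetch_word_indexed(pc):08x}")
```

Both fetch helpers return the same instruction stream; only the meaning of the PC value differs.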

Data Path

The top module serves as the central connection point for all data path modules in this design. As new instructions are added, the data path is progressively updated to accommodate them, ensuring seamless execution and expansion.

module data_path (/*AUTOARG*/
   // Outputs
   instr,
   // Inputs
   clk, rst
   );
   // Outputs
   output [31:0] instr;
   // Inputs
   input	 clk;
   input	 rst;

   /*AUTOREG*/
   /*AUTOWIRE*/
   // Beginning of automatic wires (for undeclared instantiated-module outputs)
   wire [31:0]		pc;			// From PC of program_counter.v
   // End of automatics

   program_counter PC (/*AUTOINST*/
                       // Outputs
                       .pc		(pc[31:0]),
                       // Inputs
                       .clk		(clk),
                       .rst		(rst));

   instruction_mem IM (/*AUTOINST*/
                       // Outputs
                       .instr		(instr[31:0]),
                       // Inputs
                       .pc		(pc[31:0]));


endmodule // data_path

This is the first step in the design; the design files can be found [here].

Register File

The register file processes incoming instructions and distributes data to the appropriate registers. It has three 5-bit input addresses: two for source registers and one for the destination (write) register. Additionally, a write data input holds the value to be stored when the write signal is enabled.

Until now, no control signal was involved. Here, a new control signal, we3 (write enable), is introduced. When we3 is set high, the data on wd3 is written to the register addressed by a3 on the rising clock edge, enabling controlled data writing.

Let’s begin constructing the microarchitecture by focusing on the first instruction in the program. This will serve as the foundation, and we will expand the design as we introduce more instructions.

addi x5, zero, 6

The instruction shown above is an I-type instruction, where an immediate value is added to a source register. In this case, the source register is x0 (zero), and the result is stored in the temporary register x5.

The instruction is fetched from the instruction memory and then passed to the register file, where the necessary registers are accessed for execution.

Figure 2: Decode

Once the instruction is retrieved, the input a1 receives bits 19 to 15 of the 32-bit instruction. This 5-bit value is the register address of the first source register, rs1. Similarly, a2 receives bits 24 to 20 and a3 receives bits 11 to 7, which correspond to rs2 and rd respectively.

Since the first instruction is an immediate instruction, the immediate value must be sign-extended before it is presented to the ALU. Hence another block is added: instruction bits 31 to 7 are sent to the sign-extension block.

The first instruction is shown in binary with its respective fields.

imm               rs1        funct3  rd         opcode
000000000110 (6)  00000 (0)  000     00101 (5)  0010011 (19)
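
To see the field extraction and sign extension at work, here is a minimal Python sketch that slices the first instruction word the same way a1, a2, a3 and the immediate are taken from the instruction. The helpers bits and sign_extend are hypothetical names introduced only for this example.

```python
def bits(word, hi, lo):
    """Return word[hi:lo], inclusive, like a Verilog part-select."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


instr = 0x00600293  # addi x5, x0, 6

a1 = bits(instr, 19, 15)                    # rs1
a2 = bits(instr, 24, 20)                    # rs2 field (overlaps the immediate for I-type)
a3 = bits(instr, 11, 7)                     # rd
imm = sign_extend(bits(instr, 31, 20), 12)  # 12-bit I-type immediate

print(a1, a2, a3, imm)
```

For an I-type instruction the a2 bits are simply the low five bits of the immediate, so the value read on the second port is not used by addi.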

The data in green is shown in Figure 2. We’ll get back to the control signal we3 once the addition operation takes place. The next step is to build the ALU. The data path is also updated to match the updated microarchitecture.

The updated data path is as follows:

module data_path (/*AUTOARG*/
   // Outputs
   pc, a1, a2, a3,
   // Inputs
   clk, rst, instr
   );
   // Outputs
   output [31:0] pc;
   output [4:0] a1;
   output [4:0]	a2;
   output [4:0]	a3;
   // Inputs
   input	 clk;
   input	 rst;
   input [31:0]	 instr;


   /*AUTOREG*/
   /*AUTOWIRE*/

   program_counter PC (/*AUTOINST*/
                       // Outputs
                       .pc		(pc[31:0]),
                       // Inputs
                       .clk		(clk),
                       .rst		(rst));


   logic [31:0]		sign_ext;

   assign a1 = instr[19:15];
   assign a2 = instr[24:20];
   assign a3 = instr[11:7];

   assign sign_ext = {{20{instr[31]}}, instr[31:20]};

endmodule // data_path

The new block, the register file, is shown below:

module reg_file (/*AUTOARG*/
   // Outputs
   rd1, rd2,
   // Inputs
   clk, a1, a2, a3, wd3, we3
   );
   // Outputs
   output [31:0] rd1;
   output [31:0] rd2;
   // Inputs
   input	 clk;
   input [4:0]	 a1;   // rs1
   input [4:0]	 a2;   // rs2
   input [4:0]	 a3;   // rd
   input [31:0]	 wd3;  // write data (ALU result or data memory)
   input	 we3;  // write enable

   /*AUTOREG*/
   /*AUTOWIRE*/

   logic [31:0]	 regs[31:0]; // 32 registers

   always_ff@(posedge clk)
     if(we3)
       regs[a3] <= wd3;

   assign rd1 = (a1 == 0) ? 32'h0 : regs[a1];
   assign rd2 = (a2 == 0) ? 32'h0 : regs[a2];

endmodule // reg_file
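
The behaviour of the register file can be summarised with a short Python model. This is only a sketch under the assumption that reads are combinational and writes happen on the clock edge, as in the module above; the class name RegFile and its method names are invented for illustration.

```python
class RegFile:
    """Behavioural model: 32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, a1, a2):
        """Combinational read; address 0 always reads as zero."""
        rd1 = 0 if a1 == 0 else self.regs[a1]
        rd2 = 0 if a2 == 0 else self.regs[a2]
        return rd1, rd2

    def clock(self, a3, wd3, we3):
        """Rising clock edge: store wd3 at a3 only when we3 is high."""
        if we3:
            self.regs[a3] = wd3 & 0xFFFFFFFF


rf = RegFile()
rf.clock(a3=5, wd3=6, we3=1)  # result of addi x5, x0, 6
rf.clock(a3=6, wd3=7, we3=0)  # write enable low: x6 is left unchanged
rf.clock(a3=0, wd3=9, we3=1)  # a write to x0 is stored but never observed
print(rf.read(5, 6), rf.read(0, 5))
```

Note that the Python list starts out as zeros, whereas the Verilog array holds unknown values until it is written.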

The source files for the second stage can be found [here].

Arithmetic Logic Unit (ALU)

The Arithmetic Logic Unit (ALU) handles all arithmetic and logical operations, such as addition, subtraction, shifts, and bitwise operations. Initially, the basic operations will be implemented, and as we progress, additional instructions will be incorporated to expand the ALU’s functionality. The signals in blue are control signals.

Figure 3: ALU

A mux is added between the Register File and the ALU to select between the immediate value imm and the data read from the register. The control signal for the mux is ALUSrc. When ALUSrc is set high, the immediate value imm is chosen as the source for the ALU operation. When ALUSrc is set low, the data from the register is selected instead.

Another mux is placed after the data memory to choose between the ALU result and the data read from the data memory. For the first instruction, the result of the ALU operation is written to the destination register, rd, which is x5. The control signal for this mux is MemReg. When MemReg is set high, the data from the data memory is loaded into the destination register. When set low, the ALU result is written directly to the destination register.
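
The two selections described above can be traced for the first instruction with a few lines of Python. This is an illustrative sketch only: mux2 is a made-up helper, and read_data is a placeholder value standing in for whatever the data memory would return.

```python
def mux2(select, when_low, when_high):
    """Two-input multiplexer: pass when_high if select is 1, else when_low."""
    return when_high if select else when_low


# addi x5, x0, 6: ALUSrc = 1 picks the immediate, MemReg = 0 picks the ALU result.
rd1, rd2, imm = 0, 0, 6
src_b = mux2(1, rd2, imm)
alu_result = (rd1 + src_b) & 0xFFFFFFFF
read_data = 0xDEADBEEF  # placeholder for the data memory output
wd3 = mux2(0, alu_result, read_data)
print(wd3)  # value written back to x5
```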

module alu (/*AUTOARG*/
   // Outputs
   res,
   // Inputs
   src_a, src_b, alu_op
   );
   // Outputs
   output [31:0] res;
   // Inputs
   input [31:0]	 src_a;
   input [31:0]	 src_b;
   input [3:0]	 alu_op;

   parameter
            add  = 4'd0,
            sub  = 4'd1,
            sll  = 4'd2,
            slt  = 4'd3,
            sltu = 4'd4,
            xorp = 4'd5,
            srl  = 4'd6,
            sra  = 4'd7,
            orp  = 4'd8,
            andp = 4'd9;

   /*AUTOREG*/
   // Beginning of automatic regs (for this module's undeclared outputs)
   reg [31:0]		res;
   // End of automatics
   /*AUTOWIRE*/

   logic signed [31:0]		src_a_sign;
   logic signed [31:0]		src_b_sign;

   assign src_a_sign = src_a;
   assign src_b_sign = src_b;

   always@(/*AUTOSENSE*/alu_op or src_a or src_a_sign or src_b or src_b_sign) begin
      casez(alu_op)
        add : res = src_a + src_b;
        sub : res = src_a - src_b;
        sll : res = src_a << src_b[4:0];           // RV32I shifts use only the low 5 bits
        slt : res = src_a_sign < src_b_sign ? 32'h00000001 : 32'h00000000;
        sltu: res = src_a < src_b ? 32'h00000001 : 32'h00000000;
        xorp: res = src_a ^ src_b;
        srl : res = src_a >> src_b[4:0];
        sra : res = src_a_sign >>> src_b[4:0];     // signed operand keeps the sign bit
        orp : res = src_a | src_b;
        andp: res = src_a & src_b;
        default: res = 32'hx;                      // unused encodings; avoids an inferred latch
      endcase // casez (alu_op)
   end // always

endmodule // alu
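
A reference model is handy for checking the ALU in simulation. The Python sketch below mirrors the ten operations by name; it assumes both operands are 32-bit unsigned bit patterns and, following RV32I, uses only the low five bits of the second operand as the shift amount. The function names alu and to_signed are introduced here for illustration only.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(src_a, src_b, alu_op):
    """Reference model; op names mirror the Verilog parameters."""
    shamt = src_b & 0x1F  # shift amount is the low five bits
    ops = {
        "add":  lambda: src_a + src_b,
        "sub":  lambda: src_a - src_b,
        "sll":  lambda: src_a << shamt,
        "slt":  lambda: int(to_signed(src_a) < to_signed(src_b)),
        "sltu": lambda: int(src_a < src_b),
        "xorp": lambda: src_a ^ src_b,
        "srl":  lambda: src_a >> shamt,
        "sra":  lambda: to_signed(src_a) >> shamt,
        "orp":  lambda: src_a | src_b,
        "andp": lambda: src_a & src_b,
    }
    return ops[alu_op]() & MASK  # wrap the result to 32 bits


print(alu(6, 7, "add"))                 # add x7, x5, x6
print(hex(alu(0x80000000, 4, "sra")))   # sign bit is replicated
print(hex(alu(0x80000000, 4, "srl")))   # zeros are shifted in
```

The last two lines show the only difference between sra and srl: what gets shifted in at the top.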

The control unit generates the signals that control the data flow. The initial control unit is shown below; it will be updated as more instructions are added:

module control_unit (/*AUTOARG*/
   // Outputs
   we3, alu_src, alu_op, mem_rw, mem_reg, imm_src,
   // Inputs
   funct7, funct3, opcode
   );
   // Outputs
   output       we3;
   output       alu_src;
   output [4:0]	alu_op;
   output	mem_rw;
   output	mem_reg;
   output [1:0]	imm_src;
   // Inputs
   input [6:0]	funct7;
   input [2:0]	funct3;
   input [6:0]	opcode;

   parameter
            add  = 4'd0,
            sub  = 4'd1,
            sll  = 4'd2,
            slt  = 4'd3,
            sltu = 4'd4,
            xorp = 4'd5,
            srl  = 4'd6,
            sra  = 4'd7,
            orp  = 4'd8,
            andp = 4'd9;

   /*AUTOREG*/
   // Beginning of automatic regs (for this module's undeclared outputs)
   reg [4:0]		alu_op;
   reg			alu_src;
   reg [1:0]		imm_src;
   reg			mem_reg;
   reg			mem_rw;
   reg			we3;
   // End of automatics
   /*AUTOWIRE*/

   always@(/*AUTOSENSE*/funct3 or opcode) begin
      casez(opcode)
        7'd51: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b0, 1'b0, 1'b0, 2'bxx}; // R-type
        7'd19: begin
               if(funct3 == 1 || funct3 == 5)
                   {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b01}; // I-type
               else
                   {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b00}; // I-type
               end
        default : {we3, alu_src, mem_rw, mem_reg} = 4'hx;
      endcase // case (opcode)
   end

   always@(/*AUTOSENSE*/funct3 or funct7 or opcode) begin
      casez(funct3)
        3'd0 : alu_op = (opcode == 7'd51 && funct7[5]) ? sub : add; // sub is R-type only; for addi, bit 30 is part of the immediate
        3'd1 : alu_op = sll;
        3'd2 : alu_op = slt;
        3'd3 : alu_op = sltu;
        3'd4 : alu_op = xorp;
        3'd5 : alu_op = funct7[5] ? sra : srl;
        3'd6 : alu_op = orp;
        3'd7 : alu_op = andp;
        default: alu_op = 4'dx;
      endcase // casez (funct3)
   end // always_comb

endmodule // control_unit

The design now performs arithmetic and logical operations. Three I-type instructions need careful analysis: slli, srli and srai.

funct7   imm       rs1       funct3  rd        opcode   instr
0000000  (5-bits)  (5-bits)  001     (5-bits)  0010011  slli
0000000  (5-bits)  (5-bits)  101     (5-bits)  0010011  srli
0100000  (5-bits)  (5-bits)  101     (5-bits)  0010011  srai

The if-statement inside the case statement of the main decoder handles this condition. The control signal imm_src determines which immediate encoding to use.

For shift immediate instructions, the standard 12-bit immediate is not selected. Instead:

  • imm[11:5] is treated as the funct7 value.
  • imm[4:0] is the shift amount; it is zero-extended to form the immediate value, since shift amounts are unsigned.

This ensures correct handling of shift instructions while maintaining flexibility for different immediate encodings.

Figure 4: Includes Immediate

Let’s look at the instruction:

srai x21, x7, 2

It is a shift right arithmetic immediate instruction. The instruction is shown below in binary with its respective fields:

funct7   imm    rs1    funct3  rd     opcode
0100000  00010  00111  101     10101  0010011
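
As a worked example, the bit pattern above can be split back into its fields and the shift carried out in Python. The helper bits is a hypothetical name for this sketch, and the value 13 for x7 is taken from the earlier program (x5 = 6, x6 = 7, x7 = x5 + x6).

```python
def bits(word, hi, lo):
    """Return word[hi:lo], inclusive, like a Verilog part-select."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


instr = int("01000000001000111101101010010011", 2)  # srai x21, x7, 2

funct7 = bits(instr, 31, 25)
shamt = bits(instr, 24, 20)   # imm[4:0], the shift amount
rs1 = bits(instr, 19, 15)
funct3 = bits(instr, 14, 12)
rd = bits(instr, 11, 7)
opcode = bits(instr, 6, 0)

# funct3 = 101 covers both right shifts; funct7 bit 5 (instruction bit 30) picks srai.
is_srai = funct3 == 0b101 and bits(funct7, 5, 5) == 1

x7 = 13              # left in x7 by add x7, x5, x6
x21 = x7 >> shamt    # arithmetic and logical shifts agree for a non-negative value
print(f"{instr:08x}", funct7, shamt, rs1, funct3, rd, opcode, is_srai, x21)
```

The instruction word comes out as 4023da93, and x21 ends up holding 3.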

Up to this point the design handles all the arithmetic and logical operations. The next set of instructions is load and store. The code can be found [here] (step3).

Load and Store Instruction

Let’s implement the store instruction, which writes data to the data memory, and the load instruction, which reads data back from it.

Store

Let’s understand how the store instruction works.

sw x21, 0(x0) => sw rs2, imm(rs1)

The above instruction can be understood as sw src, off(base) => Memory[base + off] = src.

For the above instruction the hexadecimal value is 0x01502023, with rs2 = 21 and rs1 = 0. Indexing into the memory array in Verilog starts from zero. For simplicity only store word is implemented, and the offset can be incremented by 1 instead of 4.

imm[11:5]  rs2    rs1    funct3  imm[4:0]  opcode
0000000    10101  00000  010     00000     0100011
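
The S-type format splits the 12-bit offset into two pieces, which is easy to get wrong by hand. The following Python sketch packs and unpacks that offset; encode_s and decode_s_imm are names invented for this example, and the layout assumed is the standard RV32I S-type format.

```python
def encode_s(imm, rs2, rs1, funct3, opcode=0b0100011):
    """Pack an S-type instruction; the 12-bit offset is split around rs2/rs1/funct3."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) \
        | ((imm & 0x1F) << 7) | opcode


def decode_s_imm(instr):
    """Reassemble and sign-extend the offset from instr[31:25] and instr[11:7]."""
    imm = (((instr >> 25) & 0x7F) << 5) | ((instr >> 7) & 0x1F)
    return imm - 0x1000 if imm & 0x800 else imm


sw = encode_s(imm=0, rs2=21, rs1=0, funct3=0b010)  # sw x21, 0(x0)
print(f"{sw:08x}", decode_s_imm(sw))

# A negative offset exercises both halves of the split immediate.
print(decode_s_imm(encode_s(imm=-4, rs2=21, rs1=0, funct3=0b010)))
```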

Let’s create a data memory.

module data_mem (/*AUTOARG*/
   // Outputs
   rd,
   // Inputs
   clk, addr, wd, mem_rw
   );
   // Outputs
   output [31:0] rd;
   // Inputs
   input	    clk;
   input [31:0] addr;   // address location
   input [31:0] wd;     // write data
   input	    mem_rw; // write when set to 1

   /*AUTOREG*/
   /*AUTOWIRE*/

   logic [31:0]	 dmem[31:0];

   always_ff@(posedge clk)
     if(mem_rw)
       dmem[addr] <= wd;

   assign rd = dmem[addr];

endmodule // data_mem
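
A short Python model shows the intended behaviour of this memory: a word is written on the clock edge when mem_rw is high and can be read back combinationally, as a load would do. The class name DataMem is hypothetical, and the value 3 for x21 assumes the earlier program has run (13 shifted right by 2).

```python
class DataMem:
    """Behavioural model: 32 words, write on the clock edge, combinational read."""

    def __init__(self, depth=32):
        self.dmem = [0] * depth

    def read(self, addr):
        return self.dmem[addr]

    def clock(self, addr, wd, mem_rw):
        """Rising clock edge: store wd at addr only when mem_rw is high."""
        if mem_rw:
            self.dmem[addr] = wd & 0xFFFFFFFF


mem = DataMem()
x21 = 3                                  # result of srai x21, x7, 2
mem.clock(addr=0 + 0, wd=x21, mem_rw=1)  # sw x21, 0(x0): address = x0 + offset
mem.clock(addr=1, wd=99, mem_rw=0)       # mem_rw low: nothing is written
print(mem.read(0), mem.read(1))
```

Unlike this model, the Verilog array is not initialised, so unwritten locations read as unknown in simulation.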

The updated control unit is illustrated below:

module control_unit (/*AUTOARG*/
   // Outputs
   we3, alu_src, alu_op, mem_rw, mem_reg, imm_src,
   // Inputs
   funct7, funct3, opcode
   );
   // Outputs
   output       we3;
   output       alu_src;
   output [4:0]	alu_op;
   output	mem_rw;
   output	mem_reg;
   output [1:0]	imm_src;
   // Inputs
   input [6:0]	funct7;
   input [2:0]	funct3;
   input [6:0]	opcode;

   parameter
            add  = 4'd0,
            sub  = 4'd1,
            sll  = 4'd2,
            slt  = 4'd3,
            sltu = 4'd4,
            xorp = 4'd5,
            srl  = 4'd6,
            sra  = 4'd7,
            orp  = 4'd8,
            andp = 4'd9;

   /*AUTOREG*/
   // Beginning of automatic regs (for this module's undeclared outputs)
   reg [4:0]		alu_op;
   reg			alu_src;
   reg [1:0]		imm_src;
   reg			mem_reg;
   reg			mem_rw;
   reg			we3;
   // End of automatics
   /*AUTOWIRE*/

   always@(/*AUTOSENSE*/funct3 or opcode) begin
      casez(opcode)
        7'd51: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b0, 1'b0, 1'b0, 2'bxx}; // R-type
        7'd19: begin
               if(funct3 == 1 || funct3 == 5)
                   {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b01}; // I-type
               else
                   {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b0, 2'b00}; // I-type
        end
        7'd35: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b0, 1'b1, 1'b1, 1'bx, 2'b10}; // S-type: no register write, address = rs1 + imm
        default : {we3, alu_src, mem_rw, mem_reg} = 4'hx;
      endcase // case (opcode)
   end

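   always@(/*AUTOSENSE*/funct3 or funct7 or opcode) begin
      casez(funct3)
        3'd0 : alu_op = (opcode == 7'd51 && funct7[5]) ? sub : add; // sub is R-type only
        3'd1 : alu_op = sll;
        3'd2 : alu_op = (opcode == 7'd35 || opcode == 7'd3) ? add : slt; // sw/lw share funct3 = 2 but need rs1 + offset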
        3'd3 : alu_op = sltu;
        3'd4 : alu_op = xorp;
        3'd5 : alu_op = funct7[5] ? sra : srl;
        3'd6 : alu_op = orp;
        3'd7 : alu_op = andp;
        default: alu_op = 4'dx;
      endcase // casez (funct3)
   end // always_comb

endmodule // control_unit

Load

Now let’s load the data stored in the data memory into register x24.

lw x24, 0(x0) => lw rd, imm(rs1)

The data at the address [imm + rs1] is loaded into register x24.

imm           rs1    funct3  rd     opcode
000000000000  00000  010     11000  0000011
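
The load uses the same I-type layout as addi, so its encoding can be reproduced with a small packing helper. In this Python sketch encode_i is a made-up name, and the field widths (12, 5, 3, 5, 7 bits) are those of the RV32I I-type format.

```python
def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


lw = encode_i(imm=0, rs1=0, funct3=0b010, rd=24, opcode=0b0000011)  # lw x24, 0(x0)

bits32 = f"{lw:032b}"
fields = [bits32[0:12], bits32[12:17], bits32[17:20], bits32[20:25], bits32[25:32]]

print(f"{lw:08x}")
print(" ".join(fields))  # imm, rs1, funct3, rd, opcode
```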

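A decode entry for the load opcode is added to the main decoder of the control unit. A load must write the destination register (we3 high) and take its write data from the data memory (mem_reg high). The entry is shown below.

7'd03: {we3, alu_src, mem_rw, mem_reg, imm_src} = {1'b1, 1'b1, 1'b0, 1'b1, 2'b00}; // I-type load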

So far the R-type, I-type and S-type instructions have been implemented; the code can be found [here] (step4).

Branch Instructions

imm[12,10:5]  rs2    rs1    funct3  imm[4:1,11]  opcode
0000000       00110  00101  000     01000        1100011
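
The branch immediate is scattered across the instruction, so a small decoder helps when checking it. The Python sketch below reassembles the offset from the bit pattern above, assuming the standard RV32I B-type layout; bits and decode_b_imm are hypothetical helper names. Only the bit pattern comes from the text; reading it as a branch comparing x5 and x6 with a byte offset of 8 is the result of this decoding.

```python
def bits(word, hi, lo):
    """Return word[hi:lo], inclusive, like a Verilog part-select."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_b_imm(instr):
    """Reassemble the signed B-type offset; bit 0 is implicitly zero."""
    imm = (bits(instr, 31, 31) << 12) | (bits(instr, 7, 7) << 11) \
        | (bits(instr, 30, 25) << 5) | (bits(instr, 11, 8) << 1)
    return imm - (1 << 13) if imm & (1 << 12) else imm


instr = int("00000000011000101000010001100011", 2)

rs2 = bits(instr, 24, 20)
rs1 = bits(instr, 19, 15)
funct3 = bits(instr, 14, 12)
offset = decode_b_imm(instr)

print(f"{instr:08x}", rs1, rs2, funct3, offset)
```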