BHE3233 BTS4433 – Week 14 Lab Submission

BHE3233

Group 1 – AES

Group 2 – FIR Filter

Group 3 – FIR Filter

BTS4433

https://www.youtube.com/shorts/xrJ00I3LIlE

Group 2

Group 5

Group 8

Group 5

https://www.youtube.com/shorts/royQgfZswWc

https://www.youtube.com/shorts/QMlo0IBENTA

Group 1

https://www.youtube.com/shorts/e9hTOs6n7F8?si=F-tFTJrBuzMDtV41

Group 2

https://www.youtube.com/shorts/wrp0cACEB4k

Group 7

BHE3233 BTS4433 – Week 13 Project Presentation

What an incredible journey it has been! This week was presentation week for the BHE3233 and BTS4433 cohorts. After weeks of development through our tiered scaffolding approach, moving from Stage 1 (Workout Programming) to Stage 4 (Comparative Optimization), every student successfully presented their hardware architectures.

Watching everyone dissect, design, and compare four distinct digital systems was a proud moment for the UMPSA STEM Lab. But how exactly do these projects tie into our overall syllabus, and how have they built the critical Verilog coding criteria our students will take into the industry?

Let’s break it down.

The Projects: Four Pillars of Digital System Design
The course syllabus was intentionally structured to expose students to four highly distinct industry domains. By encouraging students to design and then optimize each of these systems, the syllabus ensured a comprehensive grasp of real-world hardware challenges:

  1. Project 1: Cryptography (Security-Focused): Students dove into hardware encryption by designing an 8-bit AES S-Box. By Stage 4, they compared a memory-heavy Look-Up Table (LUT) approach against an area-efficient Logic-based (Boolean/Galois Field) approach, learning how to select architectures based on whether they are building a high-speed processor or a low-area IoT device.
  2. Project 2: CPU / ALU Design (Arithmetic-Focused): Escalating a basic multiplier to 16-bit logic, students had to evaluate complex mathematical trees. They compared Behavioral, Sequential (Shift-and-Add), and Pipelined multipliers. This taught them the delicate balance between saving Logic Elements (LEs) for handheld devices versus maximizing throughput for heavy computation.
  3. Project 3: DSP & Sensor Processing: Students built a 4-tap FIR filter, calculating precise bit-widths to prevent arithmetic overflow. The final optimization challenged them to shift from a Direct Form to a Transposed Form architecture, proving that mathematically identical Verilog code can yield vastly different Max Frequencies (fMAX) simply by shortening the critical path.
  4. Project 4: Communication Protocols (Interfacing): Focusing on high-reliability data transmission, students built a UART Controller with an integrated Baud Rate Generator and Parity checking. By comparing Binary-Encoded FSMs against One-Hot Encoded FSMs, they learned how Verilog state machine encoding directly impacts flip-flop utilization and setup slack.

Building Verilog Criteria: Beyond Syntax
The most valuable outcome of this class is how it transformed the students’ Verilog coding criteria. The tiered scaffolding intentionally guided them through a specific evolutionary process:

  1. Code Comprehension & Hardware Inference (Stage 1): Students learned that Verilog isn’t just software code; it is a description of physical hardware. They learned how a simple * operator is interpreted by the Quartus synthesis engine.
  2. Debugging Hardware Logic (Stage 2): They learned to identify hardware-specific malfunctions, such as standard compliance errors (“Output port has no driver”) and catastrophic “Race Conditions” caused by using blocking assignments (=) instead of non-blocking assignments (<=) in sequential logic.
  3. Architectural Integration (Stage 3): They learned how to connect verified sub-modules into complex top-level architectures, carefully calculating bit-growth across the data path.
  4. Scientific Optimization (Stage 4): This is where they built their highest-level Verilog criteria. They learned to rely on the Quartus TimeQuest Timing Analyzer and Resource Utilization reports to justify their designs, comparing Total Logic Elements, Registers, Max Frequency (fMAX), and Latency.

Preparing for the Engineering Tasks
As noted in the course materials, escalating designs to handle complex optimizations moves a student from simply “making things work” to “optimizing things for performance,” which is the hallmark of a true Digital Design Engineer.

Congratulations to all the BHE3233 and BTS4433 students! You didn’t just write Verilog this semester; you intelligently traded Area for Speed, evaluated critical paths, and solved complex timing puzzles. You are now fully equipped to tackle modern FPGA and ASIC design challenges in the industry!

 

BHE3233 BTS4433 – Week 11 Project Design Optimisation

Hello everyone,
After weeks of writing, debugging, and integrating Verilog code, we have finally reached the pinnacle of digital system design: Stage 4 (Comparative Optimization).
In this phase, we move beyond simply asking “Does the code work?” to asking “Which architecture is mathematically and structurally superior?”
In FPGA design, there is rarely one perfect answer. Every design choice is a delicate balancing act between Area (Logic Elements and Flip-Flops), Speed (fMAX), and Latency.
This week, our students put competing architectures head-to-head across four different projects to see how they perform under the rigorous scrutiny of Quartus’ Static Timing Analysis (STA) and Resource Utilization reports.
Here is a breakdown of the design comparisons for each project:
Project 1: AES Cryptography (LUT vs. Logic-Based S-Box)
For the AES S-Box, students integrated two completely different structural approaches onto the DE10-Lite FPGA and used a toggle switch to compare their performance.
    1. Table-Based (ROM/LUT) Approach: This method acts like a cheat sheet, pre-calculating every possible answer and storing it in memory.
      1. While extremely fast (constant time), it consumes significant memory resources and scales poorly. It is well suited to high-speed cryptographic processors.
    2. Logic-Based (Boolean) Approach: This method acts like a math formula, calculating Galois Field arithmetic in real time using layers of logic gates.
      1. It uses very little memory and is highly area-efficient, making it a good choice for low-area IoT devices, though it suffers from a longer propagation delay due to the deep logic tree.

 

Project 2: CPU Arithmetic (Sequential vs. Pipelined 16-bit Multipliers)
For the ALU design project, students escalated their 4-bit multipliers to 16-bit and compared architectures to see how digital systems handle complex arithmetic.
    1. Sequential (Shift-and-Add) Multiplier: By mimicking manual long multiplication, this architecture reuses a single adder over multiple clock cycles.
      1. It dramatically saves on Logic Elements (LEs), but the cost is high latency, making it ideal for space-constrained, battery-powered handheld devices.
    2. Pipelined Multiplier: To maximize performance, students inserted registers as “fences” inside the combinational logic to break up the deep mathematical tree.
      1. Like an assembly line, this allows a new multiplication operation to begin every single clock cycle.
      2. It costs far more registers, but drastically increases throughput and fMAX, which is essential for applications executing millions of operations per second.

Project 3: DSP and Sensors (Direct vs. Transposed FIR Filters)

In digital signal processing, the physical layout of your adders and multipliers can make or break your frequency limits. Students evaluated a 4-tap FIR filter using two mathematically identical, but structurally different, forms.
    1. Direct Form: The standard approach where all multiplications happen in parallel, and the results are summed up in a large “Adder Tree”. Its major flaw is a massive Critical Path—the signal must traverse a multiplier and the entire chain of adders before the clock cycle ends, severely limiting the maximum frequency.
    2. Transposed Form: By strategically placing delay registers between the adders, students shortened the critical path so the signal only propagates through one multiplier and one adder per cycle. While this slightly increases Total Registers (FFs), it yields a substantially higher fMAX (often a 25% improvement), making it the superior architecture for high-speed 100 MHz digital audio processors.
Project 4: UART Controller (Binary vs. One-Hot FSM Encoding)
In the final project, students escalated their UART transmitter to handle 16-bit data frames with Even Parity and evaluated how Quartus encodes Finite State Machines (FSMs).
    1. Binary Encoding: This style uses the absolute minimum number of Flip-Flops (e.g., 2 FFs for 4 states). While it saves physical area on the silicon, it requires heavier combinational logic to decode the states.
    2. One-Hot Encoding: This style assigns exactly one Flip-Flop per state (e.g., 4 FFs for 4 states).
      1. Despite consuming more physical area, the decoding logic becomes very simple. This translates to better Setup Slack and a higher fMAX, proving that sometimes using more hardware actually makes your system perform better.
The Takeaway
Stage 4 proves that mastering digital system design isn’t just about writing Verilog that compiles. A true hardware engineer knows how to interpret the Fitter and Timing Analyzer reports to intelligently trade Area for Speed based on the exact needs of the industry application!

 

I look forward to your creativity in executing these projects. Please complete your submissions in KALAM =)

BHE3233 BTS4433 – Week 10 Project Semi Completed Programming

Hi everyone,

After successfully navigating code comprehension and hardware debugging in Stages 1 and 2, our journey through digital system design enters its most advanced phases. This week, we focused on Stage 3: Semi-Completed Programming and Stage 4: New Programming Task (Comparative Optimization).

These stages push us beyond merely “making it work” to actually architecting complete systems and evaluating trade-offs like true digital design engineers. Here is a breakdown of our milestones and the core learning outcomes for each project.

Stage 3: Semi-Completed Programming (Architectural Completion)

In Stage 3, we took foundational components and integrated them into complete, functioning architectures.

Project 1: AES Cryptography (The Logic-Optimized S-Box)

    • The Challenge: Transitioning away from a memory-heavy Look-Up Table (LUT) to a logic-optimized approach using composite-field arithmetic over the Galois Field GF(2^8). We had to complete the missing Boolean equations for multiplicative inversion.
    • Learning Outcome: Mastering Mathematical Hardware. We learned how complex cryptography math (like Galois Fields) is synthesized into pure Boolean logic (XOR, AND, OR gates), demonstrating how a “calculation” approach saves memory at the cost of logic depth.

Project 2: The Multiplier (Pipelining for Speed)

    • The Challenge: We upgraded a combinational multiplier by inserting registers into the middle of the logic “fences” to create a Pipelined Multiplier.
    • Learning Outcome: Increasing Throughput. By breaking a long combinational path into smaller stages, we learned how pipelining allows a new multiplication to start every clock cycle and raises the system’s Max Frequency (fMAX), at the cost of a few extra cycles of latency.
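
To make the register “fence” idea concrete, here is a minimal sketch of a two-stage pipelined multiplier. It is not the lab's actual design: the module name, the 8-bit operand width, and the choice to split the work into two 8x4 partial products are all assumptions made for illustration.

```verilog
// Illustrative 2-stage pipelined unsigned multiplier.
// Names, widths, and the partial-product split are assumptions.
module pipelined_mult8 (
    input  wire        clk,
    input  wire [7:0]  a,
    input  wire [7:0]  b,
    output reg  [15:0] p
);
    reg [11:0] pp_lo;  // a * b[3:0], at most 255 * 15 = 3825
    reg [11:0] pp_hi;  // a * b[7:4], at most 255 * 15 = 3825

    always @(posedge clk) begin
        // Stage 1: two narrower multiplications, registered (the "fence").
        pp_lo <= a * b[3:0];
        pp_hi <= a * b[7:4];
        // Stage 2: align and add the partial products captured one cycle earlier.
        p <= {4'b0000, pp_lo} + {pp_hi, 4'b0000};
    end
endmodule
```

A new pair of operands can be applied on every clock edge; each result appears two edges after its operands were sampled.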

Project 3: FIR Filter (Structural Integration)

    • The Challenge: We integrated our verified Multiply-Accumulate (MAC) units and Delay Lines to build a complete Direct Form FIR Filter.
    • Learning Outcome: Preventing Arithmetic Overflow. The crucial lesson here was calculating exact bit-widths for signal growth. We learned that multiplying two 4-bit inputs yields an 8-bit output, and that summing the four 8-bit products of a 4-tap filter requires a 10-bit final output (8 + log2(4) bits; worst case 4 × 225 = 900) to prevent overflow.
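
The bit-growth rule can be sketched directly in the port and wire declarations. The following is an illustrative Direct Form filter, assuming unsigned 4-bit samples and coefficients; the module name and the default coefficient values are placeholders, not the lab's.

```verilog
// Illustrative 4-tap Direct Form FIR (unsigned). Coefficients are placeholders.
module fir4_direct #(
    parameter [3:0] H0 = 4'd1,
    parameter [3:0] H1 = 4'd2,
    parameter [3:0] H2 = 4'd2,
    parameter [3:0] H3 = 4'd1
) (
    input  wire       clk,
    input  wire       rst,
    input  wire [3:0] x,
    output reg  [9:0] y
);
    reg [3:0] d1, d2, d3;  // delay line: x[n-1], x[n-2], x[n-3]

    // 4-bit x 4-bit products need 8 bits (15 * 15 = 225).
    wire [7:0] m0 = x  * H0;
    wire [7:0] m1 = d1 * H1;
    wire [7:0] m2 = d2 * H2;
    wire [7:0] m3 = d3 * H3;

    always @(posedge clk) begin
        if (rst) begin
            d1 <= 0; d2 <= 0; d3 <= 0; y <= 0;
        end else begin
            d1 <= x; d2 <= d1; d3 <= d2;
            // Four 8-bit terms: 8 + log2(4) = 10 bits, worst case 900.
            y <= m0 + m1 + m2 + m3;
        end
    end
endmodule
```

Because the left-hand side is 10 bits wide, Verilog extends each product to 10 bits before adding, so the worst-case sum is not truncated.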

Project 4: UART Controller (The Transmitter FSM)

    • The Challenge: Wrapping raw serial data into a standardized UART frame by building a Finite State Machine (FSM) that transitions through IDLE, START, DATA, and STOP states based on precise “ticks” from our Baud Rate Generator.
    • Learning Outcome: Protocol Synchronization. We learned how to reliably sequence hardware operations using FSMs, ensuring that communication lines are held HIGH during idle and strictly synchronized to a predetermined baud rate for data integrity.

 

Moving on

Stage 4: Comparative Optimization (New Programming Tasks)

Stage 4 is where engineering design trade-offs shine. We escalated our designs and used Quartus tools (like the Timing Analyzer and Resource utilization reports) to scientifically compare competing architectures.

Project 1: AES Cryptography (LUT vs. Logic Trade-offs)

    • The Challenge: We integrated both the Stage 1 (Table-based) and Stage 3 (Logic-based) architectures onto the DE10-Lite FPGA, using a hardware switch to “toggle” between them.
    • Learning Outcome: Resource vs. Speed Optimization. By running a Static Timing Analysis (STA), we learned to scientifically deduce which architecture to use depending on the application—evaluating why a LUT is better for a high-speed cryptographic processor while Boolean logic is superior for a low-area IoT device.

Project 2: The Multiplier (16-bit Escalation)

    • The Challenge: We escalated our multiplier from 4-bit to 16-bit and compared all three architectures: Behavioral, Sequential, and Pipelined.
    • Learning Outcome: Evaluating Complex Architectural Trade-offs. We learned how widening the operands enlarges and deepens the logic tree (a 16-bit array multiplier has roughly sixteen times as many partial-product terms as a 4-bit one). The outcome was understanding how to choose an architecture based on strict constraints (e.g., choosing sequential for space-constrained handhelds vs. pipelined for high-performance CPU ALUs).

Project 3: FIR Filter (Direct vs. Transposed Form)

    • The Challenge: We redesigned our filter into the Transposed Form, a mathematically identical structure that places delay registers between the adders rather than at the input.
    • Learning Outcome: Shortening the Critical Path. We learned a major DSP optimization technique: by separating combinational adders with registers, we shortened the critical path delay. Even though this uses slightly more Logic Elements, it dramatically boosts fMAX, making it ideal for high-speed audio/sensor processing.
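
A minimal sketch of the Transposed Form, under the same assumptions as before (unsigned 4-bit samples, placeholder coefficients, hypothetical module name). Every register-to-register path here passes through at most one multiplier and one adder.

```verilog
// Illustrative 4-tap Transposed Form FIR (unsigned). Coefficients are placeholders.
module fir4_transposed #(
    parameter [3:0] H0 = 4'd1,
    parameter [3:0] H1 = 4'd2,
    parameter [3:0] H2 = 4'd2,
    parameter [3:0] H3 = 4'd1
) (
    input  wire       clk,
    input  wire       rst,
    input  wire [3:0] x,
    output reg  [9:0] y
);
    // The current sample feeds all four multipliers at once.
    wire [7:0] m0 = x * H0;
    wire [7:0] m1 = x * H1;
    wire [7:0] m2 = x * H2;
    wire [7:0] m3 = x * H3;

    // Partial sums sit in registers between the adders and grow by one bit per stage.
    reg [7:0] s3;
    reg [8:0] s2;
    reg [9:0] s1;

    always @(posedge clk) begin
        if (rst) begin
            s3 <= 0; s2 <= 0; s1 <= 0; y <= 0;
        end else begin
            s3 <= m3;
            s2 <= m2 + s3;
            s1 <= m1 + s2;
            y  <= m0 + s1;  // one multiplier + one adder per stage
        end
    end
endmodule
```

For the same input stream and coefficients, this produces the same output sequence as a Direct Form filter with a registered output; only the position of the registers differs.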

Project 4: UART Controller (FSM Encoding & Parity)

    • The Challenge: We expanded the UART to 16-bit with Even Parity error checking and compared two FSM architectures: Binary Encoding versus One-Hot Encoding.
    • Learning Outcome: FSM Encoding Trade-offs. We gained hands-on experience in how the Quartus compiler assigns flip-flops. We discovered that Binary Encoding saves area (fewer flip-flops) but requires heavier decoding logic, whereas One-Hot Encoding uses more flip-flops but simplifies decoding, resulting in better setup slack and a faster maximum frequency.

BHE3233 BTS4433 – Week 9 Project Workout Simulation

This week in the lab, we move forward in our hardware design journey by executing Stage 1: Workout Programming and Stage 2: Debugging Specific Malfunctions across four highly distinct FPGA projects. These stages forced us to transition from merely reading code to actively troubleshooting real-world hardware logic errors.

Here is a breakdown of what we accomplished:
Project 1: Cryptography on Silicon (AES S-Box)
We kicked things off by diving into the Advanced Encryption Standard (AES). In Stage 1, we analyzed a Memory-based (ROM) Look-Up Table (LUT) approach for an 8-bit AES S-Box. We used ModelSim to perform functional verification, proving that the hardware correctly maps inputs to standardized AES cipher outputs (like mapping 8'h00 to 8'h63).
The real challenge came in Stage 2 with the Inverse S-Box (used for decryption). We were handed a buggy Verilog script containing three intentional errors related to syntax, logic, and standard compliance. By deciphering Quartus compilation warnings like “Output port has no driver,” we successfully repaired the code to prove that feeding a cipher output back into the Inverse S-Box restores the original “plain” input byte.
Project 2: CPU Arithmetic (The Multiplier)
Our second project focused on resource-efficient ALU design. Stage 1 introduced a Behavioral Multiplier, where we let the Quartus synthesis engine decide whether to map the multiplication logic to 9-bit DSP blocks or Logic Elements (LUTs).
Stage 2 brought us down to the sequential level with a Shift-and-Add Multiplier, which mimics manual long multiplication to save area. However, the provided design failed for maximum 4-bit inputs (like 4'hF * 4'hF). We had to debug a logical flaw in the module’s bit-counter; it was only counting up to 3 instead of 4. By correcting the count limit and adjusting the counter’s bit-width, we enabled the hardware to successfully process all 4 bits.
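
The counter fix can be illustrated with a small shift-and-add multiplier. This is a sketch, not the lab's module: the port names, the start/done handshake, and the internal register names are assumptions. The points to notice are the 3-bit counter (so the value 4 is representable) and the comparison against 4.

```verilog
// Illustrative 4-bit shift-and-add multiplier (names and handshake are assumptions).
module seq_mult4 (
    input  wire       clk,
    input  wire       rst,
    input  wire       start,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [7:0] product,
    output reg        done
);
    reg [7:0] mcand;   // multiplicand, shifted left each step
    reg [3:0] mplier;  // multiplier, shifted right each step
    reg [2:0] count;   // 3 bits: must be able to hold the value 4
    reg       busy;

    always @(posedge clk) begin
        if (rst) begin
            product <= 0; mcand <= 0; mplier <= 0;
            count <= 0; busy <= 0; done <= 0;
        end else if (start && !busy) begin
            mcand <= {4'b0000, a}; mplier <= b;
            product <= 0; count <= 0; busy <= 1; done <= 0;
        end else if (busy) begin
            if (count == 3'd4) begin        // all 4 multiplier bits processed
                busy <= 0; done <= 1;
            end else begin
                if (mplier[0]) product <= product + mcand;
                mcand  <= mcand << 1;
                mplier <= mplier >> 1;
                count  <= count + 3'd1;
            end
        end
    end
endmodule
```

With a limit of 3 instead of 4, the last partial product (the multiplicand shifted left by three) would never be added, which is why 4'hF * 4'hF exposes the bug.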
Project 3: DSP and IoT Sensors (4-Tap FIR Filter)
For our digital signal processing project, we explored how to clean up sensor data using a Finite Impulse Response (FIR) filter. In Stage 1, we verified the core building block: the Multiply-Accumulate (MAC) unit. A key takeaway was learning how to prevent arithmetic overflow by understanding bit-growth (e.g., multiplying two 4-bit inputs requires an 8-bit output).
Stage 2 tackled a classic hardware pitfall: the Race Condition. In our shift register (Delay Line), which holds the crucial “historical” samples (such as x[n-1] and x[n-2]), the buggy code used blocking assignments (=) inside a clocked block. This caused the input to immediately leak through all registers in a single cycle. We fixed this by rewriting the block with non-blocking assignments (<=), ensuring the signal properly shifts stage-by-stage with each clock pulse.
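
A minimal sketch of the corrected delay line, with the buggy form kept as a comment for comparison. The module and signal names are assumptions, not the lab's.

```verilog
// Illustrative 3-stage delay line (names are assumptions).
module delay_line (
    input  wire       clk,
    input  wire [3:0] x,
    output reg  [3:0] d1,  // x[n-1]
    output reg  [3:0] d2,  // x[n-2]
    output reg  [3:0] d3   // x[n-3]
);
    // Buggy form (blocking), shown only as a comment:
    //   d1 = x; d2 = d1; d3 = d2;
    // Each statement sees the value just written by the one before it,
    // so d1, d2 and d3 all take the value of x on the same clock edge.

    // Corrected form (non-blocking): every right-hand side is sampled
    // before any register updates, so the data moves one stage per edge.
    always @(posedge clk) begin
        d1 <= x;
        d2 <= d1;
        d3 <= d2;
    end
endmodule
```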
Project 4: Communication Protocols (UART Controller)
Our final project of the week centered on hardware interfacing. We started Stage 1 by executing Parallel-to-Serial Conversion, learning how a shift register takes a 4-bit wide parallel bus and sends it out bit-by-bit over a single serial wire.
In Stage 2, things got precise. Because UART transmitters and receivers don’t share a common clock, they rely on a precise Baud Rate. We had to debug a Baud Rate Generator designed to divide a 50MHz FPGA clock down to 9600 bits per second.
The buggy counter was missing a reset condition and had a threshold comparison error (counting from 0 to 5208 actually takes 5209 cycles, so the correct limit is 5207). After fixing the logic, we used ModelSim to verify that our baud_tick pulses were approximately 104.16 microseconds apart (5208 cycles of 20 ns), keeping the transmitter in step with the expected baud rate.
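
The off-by-one can be seen in a short sketch of a baud tick generator. The baud_tick name comes from the lab description; the module name, parameter names, and the fixed 13-bit counter are assumptions for the 50 MHz / 9600 baud case.

```verilog
// Illustrative baud tick generator (module and parameter names are assumptions).
module baud_gen #(
    parameter integer CLK_HZ = 50_000_000,
    parameter integer BAUD   = 9600
) (
    input  wire clk,
    input  wire rst,
    output reg  baud_tick
);
    // Integer division: 50_000_000 / 9600 = 5208 clock cycles per bit.
    localparam integer DIVISOR = CLK_HZ / BAUD;

    reg [12:0] count;  // 13 bits cover 0..5207 (would need resizing for other ratios)

    always @(posedge clk) begin
        if (rst) begin
            count     <= 0;
            baud_tick <= 0;
        end else if (count == DIVISOR - 1) begin
            // Counting 0..DIVISOR-1 is exactly DIVISOR cycles.
            // Comparing against DIVISOR instead would take DIVISOR + 1 cycles.
            count     <= 0;
            baud_tick <= 1;
        end else begin
            count     <= count + 13'd1;
            baud_tick <= 0;
        end
    end
endmodule
```

With a 20 ns clock period, 5208 cycles between ticks gives 104.16 microseconds, slightly shorter than the ideal 1/9600 s because the divisor is rounded down.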
Moving from code comprehension in Stage 1 to active logic correction in Stage 2 has completely changed how we look at Verilog and Quartus. Next week, we will move on to Stage 3: Semi-Completed Programming, where we will integrate these components into larger, more complex architectures!

Completion of Stage 1 & 2 activities

BHE3233 BTS4433 – Week 7 – Sequential RTL – Lab 4

Welcome to Week 7! This week, we took a massive leap in our digital design journey by exploring Register Transfer Level (RTL) sequential circuits. Unlike combinational circuits, sequential circuits have memory, meaning their outputs depend not only on current inputs but also on previous input history. At the heart of these sequential designs is the Finite State Machine (FSM).
In digital design, FSMs are used to control system behavior by transitioning between a finite number of states based on inputs and clock cycles.
When we build FSMs in Verilog, we generally divide the architecture into three main blocks:
        1. The State Register: This is a synchronous block (using always @(posedge clk)) that updates the current state to the next state at every clock edge, or resets it when a reset signal is triggered.
        2. The Next-State Logic: A combinational block that evaluates the current state and external inputs to determine what the next state should be.
        3. The Output Logic: A combinational block that generates the output signals based on the current state (Moore machine) or both the current state and inputs (Mealy machine).
In class, we looked at a practical example: a sequence detector acting as a lock that opens whenever the serial bit pattern “1011” is achieved.
Before writing any Verilog code, it is incredibly important to derive your state machine on pen and paper first. Drawing an abstract state diagram ensures your states and transition logic actually make sense.
For the “1011” detector, our state diagram tracks how much of the pattern we have seen so far:
        1. S0: Nothing matched yet.
        2. S1: Matched “1”.
        3. S2: Matched “10”.
        4. S3: Matched “101”.
        5. S4: Matched the full “1011” sequence (this is where the output goes high).
By mapping out the transition arrows—such as moving from S1 to S2 if the input is 0, or dropping back to S0 if the sequence is broken—you establish the exact mathematical behavior your next-state logic block needs to model.
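
The three-block structure and the state list above can be sketched as a Moore machine. This is an illustrative version, not the code shown in class: the module and port names are assumptions, and it assumes overlapping matches are allowed (after a full match, the trailing “1” may start the next pattern).

```verilog
// Illustrative Moore FSM for the "1011" detector (names are assumptions).
module seq_detect_1011 (
    input  wire clk,
    input  wire rst,
    input  wire din,
    output reg  unlock
);
    localparam [2:0] S0 = 3'd0, S1 = 3'd1, S2 = 3'd2, S3 = 3'd3, S4 = 3'd4;

    reg [2:0] state, next_state;

    // 1. State register (synchronous reset).
    always @(posedge clk) begin
        if (rst) state <= S0;
        else     state <= next_state;
    end

    // 2. Next-state logic (combinational).
    always @(*) begin
        case (state)
            S0:      next_state = din ? S1 : S0;  // waiting for the first 1
            S1:      next_state = din ? S1 : S2;  // "1"   -> "10" on 0
            S2:      next_state = din ? S3 : S0;  // "10"  -> "101" on 1
            S3:      next_state = din ? S4 : S2;  // "101" -> "1011" on 1, else back to "10"
            S4:      next_state = din ? S1 : S2;  // overlap assumed: reuse the trailing bits
            default: next_state = S0;
        endcase
    end

    // 3. Output logic (Moore: depends on the current state only).
    always @(*) unlock = (state == S4);
endmodule
```

The default branch sends the three unused state codes back to S0, which also prevents the synthesis tool from inferring latches in the next-state block.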
Hands-On: Lab 4 and the Satellite Communication System
Once we nailed down the state diagrams on paper, we moved into the hardware phase with Lab 4: Finite State Machine for Satellite Communication Link.
In real picosatellite systems, communication links require a strict, multi-stage initialization and termination process. You simulated this scenario using a four-state FSM:
      1. IDLE: Waiting for the start command.
      2. LINK_ESTABLISH: Attempting the communication handshake.
      3. DATA_TRANSFER: The active data transmission session.
      4. LINK_TERMINATE: Securely closing the session.
During the lab, you mapped your DE10-Lite board’s switches to act as the transition triggers (e.g., SW0 to start communication, SW1 to signify link established) and used the LEDs to track which state the FSM was currently in.
By verifying the state transitions in a ModelSim simulation and testing the design directly on the physical board, you successfully built a control system similar in structure to those used in real-time embedded space missions!
Keep practicing drawing those state diagrams on paper before jumping into Quartus. See you next week!

 

2026 Book :) Digital System Design with Verilog: FSM, RTL Modelling, Pipelining and Static Timing Analysis

The book is finally in =).

There is a specific kind of satisfaction in hardware engineering—the moment a conceptual logic circuit transitions from a schematic to a functional physical implementation. My fascination with this process began in 1999 during my undergraduate studies under Professor Othman Sidek. Back then, the ability of an FPGA to house a vast array of logic functions felt revolutionary.

It all started with a project in my Digital Electronics 2 subject. I remember it vividly: we built an automated counter for badminton matches. The digital logic system was designed to detect whether a shuttlecock landed in or out of bounds to assist the umpire in ruling points. That small-scale project served as the gateway to a much deeper exploration into digital systems.

From Undergraduate Roots to GSM Architecture

By the time I reached my final year project, I was diving deep into digital systems for GSM communication modules. Through a family connection—my cousin, who was then a technician for a leading telecommunications provider—I gained invaluable access to the industry standards of the time.

I was particularly focused on implementing Convolutional Encoders, which were essential for error correction in mobile networks. At the time, we worked across the five primary channel types:

      1. TCH/FS (Full Rate Speech)

      2. TCH/HS (Half Rate Speech)

      3. FACCH (Fast Associated Control Channel)

      4. SACCH (Slow Associated Control Channel)

      5. SDCCH (Standalone Dedicated Control Channel)

The successful implementation of these designs wasn’t just a hurdle to pass for graduation; it was the foundation of my continued passion for digital logic =p

Fast forward to 2021: I returned to the classroom to teach Digital System Design. Re-engaging with the subject after years in the field felt like a homecoming. During this period, I began supervising Phuah Soon Eu on a project involving the implementation of metaheuristic algorithms on IC chips.

We tackled the inherent challenges of translating high-level algorithms into hardware: managing floating-point arithmetic, optimizing RAM architectures, and modifying algorithmic flows to suit the rigid requirements of digital implementation.

Introducing the Book: A Practical Path for the Novice

Through that project, we noticed a persistent “missing link” in technical FPGA education literature. There is a steep cliff between learning basic Verilog and understanding the professional constraints of a production-ready FPGA design.

This book is our attempt to bridge that gap.

Digital System Design with Verilog: FSM, RTL Modelling, Pipelining and Static Timing Analysis

The philosophy is simple: The best way to learn a system is to build it. We designed this text to guide the reader through three critical phases:

      1. Foundations: Introduction to FPGA architecture and Hardware Description Language (HDL).

      2. Synthesis: A deep dive into RTL modeling and the complexities of Static Timing Analysis (STA).

      3. Implementation: Mastering FPGA-specific design and the art of optimization.

Moving from Functional to Professional

This book is specifically written for those at the “Novice to Early-Intermediate” stage. It is for the designer who can already run a functional simulation but still needs to learn how to read synthesis reports, meet specific timing targets, and redesign circuits with objective-driven outcomes.

It has been a privilege to author this with Phuah Soon Eu, and we hope this work serves as a catalyst for the next generation of digital designers—much like a badminton counter did for me decades ago :).

BHE3233 BTS4433 – Week 6 – FPGA Implementation of Combinational RTL – Lab 3

This week, we continued our exploration of combinational circuits, focusing on Lab 3 (Session 5): Implementation and Comparison of Ripple Carry Adder (RCA) and Carry Look-Ahead Adder (CLA). This lab builds on students’ foundational understanding of digital logic and provides hands-on experience with adder architectures that are central to arithmetic logic units (ALUs).

Lab Objectives

The key objectives of this lab were to:

      1. Design and implement 4-bit Ripple Carry Adders (RCA) and 4-bit Carry Look-Ahead Adders (CLA)
      2. Perform functional verification using selected test vectors
      3. Implement both designs on an FPGA platform
      4. Analyze and compare post-implementation parameters using Quartus post-implementation reports
      5. Introduce basic ALU functionality, focusing on multiplexing using case statements and transitioning toward decoder-based designs

Adder Design and Implementation

Students designed both adders using structural and behavioral modeling approaches in HDL:

  • Ripple Carry Adder (RCA):
    A straightforward adder where each full adder waits for the carry from the previous stage. This simplicity comes at the cost of increased propagation delay.
  • Carry Look-Ahead Adder (CLA):
    A faster alternative that computes carry signals in parallel using generate and propagate logic, significantly reducing carry propagation delay.

Both designs were synthesized and deployed on the FPGA to observe real hardware performance, not just simulation results.
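
The generate and propagate logic of the CLA can be sketched as follows. This is an illustrative 4-bit version with assumed module and signal names; each carry is written out as a flat expression of the inputs so that none of them waits on the previous stage.

```verilog
// Illustrative 4-bit Carry Look-Ahead Adder (names are assumptions).
module cla4 (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire       cin,
    output wire [3:0] sum,
    output wire       cout
);
    wire [3:0] g = a & b;  // generate: this bit creates a carry by itself
    wire [3:0] p = a ^ b;  // propagate: this bit passes an incoming carry on
    wire [4:0] c;

    assign c[0] = cin;
    assign c[1] = g[0] | (p[0] & c[0]);
    assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c[0]);
    assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0])
                | (p[3] & p[2] & p[1] & p[0] & c[0]);

    assign sum  = p ^ c[3:0];
    assign cout = c[4];
endmodule
```

The carry equations grow wider with every bit, which is where the extra logic of the CLA comes from compared with the RCA.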

Functional Testing

During functional testing, several input combinations were validated to ensure correctness:

      • A = 5 (0101), B = 3 (0011)
        • Result: 8 (1000)
        • Verified correct operation for both RCA and CLA.

      • A = 15 (1111), B = 1 (0001)
        • Result: 0 with carry-out = 1
        • This test demonstrated carry spill-over, confirming correct handling of overflow conditions.

These cases helped reinforce how carry propagation affects outputs and highlighted the functional equivalence of RCA and CLA despite architectural differences.

 

Post-Implementation Analysis (Quartus)

An important part of this lab was analyzing the post-implementation reports in Quartus.

 

Students were tasked with extracting and comparing the following parameters for both adder implementations:

        • Logic Elements (LEs) Used
          How much FPGA hardware is consumed by RCA versus CLA.
        • Combinational Functions
          Insight into the complexity of logic synthesized by the toolchain.
        • Maximum Clock Frequency (Fmax)
          The highest achievable clock rate based on timing constraints.
        • Critical Path Delay
          The longest combinational delay path, which is especially important when comparing RCA and CLA performance.

As expected, the RCA generally exhibited a longer critical path delay due to serial carry propagation, while the CLA achieved a higher maximum clock frequency, demonstrating its advantage in speed-critical designs.

In digital circuits, Fmax (maximum clock frequency) tells us how fast a circuit can run when a clock is used. It depends on the critical path, which is the longest delay through the combinational logic. Signals must be able to travel along this path and settle before the next clock edge arrives. If the critical path is long, the circuit needs a slower clock. In this lab, the RCA has a longer critical path because the carry must pass through each bit one by one, while the CLA reduces delay by calculating carries in parallel.

When checking the Quartus Timing Analysis Report, students noticed that it shows “No Fmax” and “No clock properties to report.” This is normal for this lab. Both the RCA and CLA designs are purely combinational and do not include any clocked elements such as flip-flops or registers. Since there is no clock defined in the design, Quartus cannot calculate Fmax. Fmax is only reported for sequential circuits where data moves from one register to another using a clock. For this reason, Quartus only reports logic usage and combinational delay. Once registers are added in future labs—such as in a registered ALU or datapath—Fmax and full timing information will become available.

To clearly observe the difference in hardware utilization, the total number of logic gates and Logic Elements (LEs) used in the system can be analyzed by implementing both a 32-bit Carry Look-Ahead Adder (CLA) and a 32-bit Ripple Carry Adder (RCA). By experimenting with these two adders under identical design conditions, a direct comparison can be made. This allows the differences in resource usage to be clearly identified, where the CLA generally requires more gates and LEs due to its complex carry logic, while the RCA uses fewer resources but operates with a longer propagation delay.

The example design above employs a genvar as a compile-time counter within a Verilog generate loop to create the 32 stages of the ripple carry adder. The counter variable i controls the instantiation of each full adder, where A[i] and B[i] represent the operand bits at position i, and carry[i] and carry[i+1] form the carry chain between adjacent stages. This approach improves scalability and code modularity, allowing the adder width to be easily adjusted without altering the underlying architecture. Importantly, the use of a counter does not affect the synthesized hardware, as the loop is unrolled during synthesis.
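
The listing referred to above is not reproduced in this text. A minimal sketch of such a generate-based adder might look like the following; the A, B, carry, and i names follow the description, while the module names, the WIDTH parameter, and the full_adder cell are assumptions.

```verilog
// Illustrative full adder cell (assumed helper module).
module full_adder (
    input  wire a, b, cin,
    output wire sum, cout
);
    assign sum  = a ^ b ^ cin;
    assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Illustrative ripple carry adder built with a generate loop.
module rca #(
    parameter WIDTH = 32
) (
    input  wire [WIDTH-1:0] A,
    input  wire [WIDTH-1:0] B,
    input  wire             Cin,
    output wire [WIDTH-1:0] Sum,
    output wire             Cout
);
    wire [WIDTH:0] carry;          // carry[i] enters stage i, carry[i+1] leaves it
    assign carry[0] = Cin;
    assign Cout     = carry[WIDTH];

    genvar i;                      // compile-time loop counter, not hardware
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : fa_stage
            full_adder fa (
                .a   (A[i]),
                .b   (B[i]),
                .cin (carry[i]),
                .sum (Sum[i]),
                .cout(carry[i+1])
            );
        end
    endgenerate
endmodule
```

Changing WIDTH is enough to resize the adder; the loop is unrolled at elaboration into WIDTH separate full_adder instances.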

Introduction to Basic ALU Concepts

In the latter part of the lab, we began transitioning from simple adders to basic ALU design concepts:

      1. Students implemented a simple ALU capable of performing multiple operations.
      2. A case statement was used to implement a multiplexer that selects the desired operation based on an opcode.
      3. This approach helped students understand how operation selection works internally within an ALU.

We also discussed the next step in this progression: moving from case-based multiplexing toward decoder-based control logic, which scales better for more complex ALU designs.
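
As a rough illustration of case-based operation selection, here is a small ALU sketch. The lab's actual opcode map and operation set are not given in this text, so the four operations, the 2-bit opcode, and all names below are assumptions.

```verilog
// Illustrative 4-bit ALU using a case statement (opcode map is an assumption).
module alu4 (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [1:0] opcode,
    output reg  [4:0] result   // 5 bits so the carry or borrow is visible
);
    always @(*) begin
        case (opcode)
            2'b00:   result = a + b;           // add, result[4] is the carry-out
            2'b01:   result = a - b;           // subtract, result[4] = 1 signals a borrow
            2'b10:   result = {1'b0, a & b};   // bitwise AND
            2'b11:   result = {1'b0, a | b};   // bitwise OR
            default: result = 5'd0;
        endcase
    end
endmodule
```

The synthesis tool builds all four operations in parallel and uses the opcode as the select input of a multiplexer on their outputs.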

You can also take this a step further by driving a 7-segment display with your ALU output:
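A hex-to-7-segment decoder for one 4-bit nibble of the ALU result might look like the sketch below. It assumes active-low segments ordered {g, f, e, d, c, b, a}, which is how the DE10-Lite displays are commonly described; please confirm the polarity and pin order against the board manual. The module and port names are hypothetical.

```verilog
// Hex digit to 7-segment pattern (assumed active-low, seg = {g,f,e,d,c,b,a}).
module hex_to_7seg (
    input  wire [3:0] hex,
    output reg  [6:0] seg
);
    always @(*) begin
        case (hex)
            4'h0: seg = 7'b1000000;
            4'h1: seg = 7'b1111001;
            4'h2: seg = 7'b0100100;
            4'h3: seg = 7'b0110000;
            4'h4: seg = 7'b0011001;
            4'h5: seg = 7'b0010010;
            4'h6: seg = 7'b0000010;
            4'h7: seg = 7'b1111000;
            4'h8: seg = 7'b0000000;
            4'h9: seg = 7'b0010000;
            4'hA: seg = 7'b0001000;
            4'hB: seg = 7'b0000011;
            4'hC: seg = 7'b1000110;
            4'hD: seg = 7'b0100001;
            4'hE: seg = 7'b0000110;
            4'hF: seg = 7'b0001110;
            default: seg = 7'b1111111;   // all segments off
        endcase
    end
endmodule
```

An 8-bit ALU result would need two instances, one for the lower nibble and one for the upper nibble.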

 

Key Takeaways

By the end of this lab, you should be able to:

      1. Understand the architectural and performance differences between RCA and CLA
      2. Validate combinational circuits through simulation and FPGA implementation
      3. Interpret FPGA post-implementation reports to justify design trade-offs
      4. See how simple adders evolve into more complex building blocks, such as ALUs

This lab serves as a crucial bridge between basic combinational logic and more advanced processor datapath components, setting the stage for upcoming topics in sequential logic and CPU design.

BHE3233 BTS4433 – Week 5 – FPGA Implementation of Combinational RTL

This week is important as we move from theoretical concepts to hands-on hardware implementation. You will be tackling Pre-Labs (Sessions 1 & 2) and Labs 1 & 2 (Sessions 3 & 4).

To succeed in this course and meet our core learning outcomes—specifically CO3, which focuses on developing FPGA designs using industry best practices—you must follow a structured approach.

Phase 1: The Foundation (Pre-Lab Sessions 1 & 2)

Before touching the hardware, you must set up your digital environment. These sessions focus on Project Setup and Functional Verification.

      • Software Installation: Ensure you have Intel Quartus Prime Lite Edition (Version 18.1) and ModelSim installed on your computer.

      • Project Creation: Launch the New Project Wizard in Quartus. Critical step: You must target the specific FPGA device on our board: MAX 10 – 10M50DAF484C7G.

  
      • Functional Simulation: Use ModelSim to verify your logic before synthesis. This allows you to catch bugs in a software environment where signals are easy to trace.
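As an illustration of functional simulation, here is a small self-checking testbench. The design under test, AZ_adder4, is a made-up 4-bit adder included only so the example is complete; the same structure applies to your own modules.

```verilog
`timescale 1ns/1ps

// Hypothetical design under test: a 4-bit adder with a 5-bit result.
module AZ_adder4 (
    input  wire [3:0] a, b,
    output wire [4:0] sum
);
    assign sum = a + b;
endmodule

// Testbench: applies every input pair and counts mismatches.
module AZ_adder4_tb;
    reg  [3:0] a, b;
    wire [4:0] sum;
    integer i, j, errors;

    AZ_adder4 dut (.a(a), .b(b), .sum(sum));

    initial begin
        errors = 0;
        for (i = 0; i < 16; i = i + 1) begin
            for (j = 0; j < 16; j = j + 1) begin
                a = i;
                b = j;
                #10;                               // let the outputs settle
                if (sum !== i + j) begin
                    $display("Mismatch: %0d + %0d gave %0d", i, j, sum);
                    errors = errors + 1;
                end
            end
        end
        $display("Simulation finished with %0d mismatches", errors);
        $stop;                                     // pause so the waveform can be inspected
    end
endmodule
```

Checking results inside the testbench, rather than only reading the waveform by eye, makes it much easier to notice a bug when the design grows.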

Phase 2: Hands-On Implementation (Labs 1 & 2)

Now, it’s time to bring your code to life on the DE10-Lite FPGA Board.

Lab 1: Blinking LEDs (Session 3)

In this lab, you will learn to control the board’s ten user LEDs.

      • The Logic: You will write a Verilog module utilizing a counter to create a delay.

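      • Key Tip: Remember that the onboard clock runs at 50 MHz. Your counter must be large enough (at least 25 bits) to create a blink visible to the human eye: a 25-bit counter wraps roughly every 0.67 s (2^25 / 50,000,000), whereas a 16-bit counter would wrap in about 1.3 ms, far too fast to see.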
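A counter-based blink can be sketched as follows. The port names MAX10_CLK1_50 and LEDR follow the naming commonly used in DE10-Lite reference designs and are assumptions here; use whatever names match your own pin assignments.

```verilog
// Blink sketch: a free-running 25-bit counter divides the 50 MHz clock.
module AZ_LED_Blinking (
    input  wire       MAX10_CLK1_50,   // assumed name for the 50 MHz board clock
    output wire [9:0] LEDR             // assumed name for the ten LEDs
);
    reg [24:0] counter = 25'd0;

    always @(posedge MAX10_CLK1_50)
        counter <= counter + 1'b1;

    // The top counter bit toggles every 2^24 cycles (about 0.34 s),
    // so the LEDs complete one on/off cycle about every 0.67 s.
    assign LEDR = {10{counter[24]}};
endmodule
```

Using a narrower counter makes the LEDs toggle faster; below roughly 20 bits they will simply look dimly lit rather than blinking.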

Lab 2: Switch Inputs and Debouncing (Session 4)

Mechanical switches are “noisy.” When you flip a switch, the signal “bounces” rapidly between high and low before settling.

  • The Challenge: You will implement Debounce Logic using a counter to ensure the FPGA only registers a clean, stable signal.

  • The Goal: Observe the difference between a direct switch-to-LED connection and one filtered through your debounce logic.
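One common way to build counter-based debounce logic is sketched below: the output changes only after the synchronized input has stayed different from it for a full count. The module name, port names, and the 20-bit default (about 21 ms at 50 MHz) are assumptions for illustration.

```verilog
// Debounce sketch: accept a new switch level only after it has been stable
// for 2^CNT_BITS clock cycles (about 21 ms at 50 MHz with the default).
module AZ_debounce #(
    parameter CNT_BITS = 20
) (
    input  wire clk,
    input  wire sw_raw,      // noisy switch input
    output reg  sw_clean     // filtered output
);
    reg sync0 = 1'b0, sync1 = 1'b0;              // two-stage synchronizer
    reg [CNT_BITS-1:0] count = {CNT_BITS{1'b0}};

    initial sw_clean = 1'b0;

    always @(posedge clk) begin
        sync0 <= sw_raw;
        sync1 <= sync0;

        if (sync1 == sw_clean) begin
            count <= 0;                          // no pending change: reset the timer
        end else begin
            count <= count + 1'b1;               // input differs: keep timing it
            if (&count)
                sw_clean <= sync1;               // stable long enough: accept it
        end
    end
endmodule
```

Any bounce during the wait returns sync1 to the old level, which clears the counter and restarts the timing, so only a steady level reaches sw_clean.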

How to Complete Your Lab Answer Sheets

Your lab worksheet is your evidence of learning. To receive full marks and satisfy CO3 requirements, every session entry must include:

  1. Unique Module Naming: To ensure individual work, you must prefix your top-level modules with your unique initials (e.g., AZ_LED_Blinking). Generic names will result in mark deductions.

  2. Verilog Code Snippets: Paste snapshots of your design module and your testbench.

  3. Simulation Waveforms: Provide ModelSim screenshots. Don’t just paste the image; add notes explaining how the waveform proves your logic is correct, and calculate the clock cycles or frequency observed.

  4. Implementation Reports: After compiling in Quartus, open the Compilation Report. You must include snapshots showing Resource Utilization (how many logic elements and registers your design used).

Final Submission Reminder

Once you have completed all activities and filled out your worksheet for Sessions 1 through 4:

  1. Verify that all dates are recorded correctly.

  2. Ensure every module name is unique to you or your team.

  3. Upload the completed document to KALAM (after Lab 6 – Session 8).

Success in digital design comes from attention to detail. Double-check your pin assignments in the Pin Planner before programming, and always verify your timing slack in the Summary Report.

If you would like to learn more about the Pin Planner, have a look at this YouTube video:

Happy designing, engineers!

BHE3233 BTS4433 – Week 4 – Combinational Logic

Welcome to Week 4 of BHE 3233 and BTE4433. This week marks an important step in your journey as electrical and electronic engineering students, as you begin connecting theoretical concepts with actual hardware implementation. You have now moved beyond just understanding logic on paper—you are starting to build and test real digital systems using Verilog and FPGA tools.

We began the week with a refresher on binary operations, which form the foundation of all digital systems. Understanding how numbers are represented and manipulated in binary is essential, as every digital circuit ultimately operates on combinations of 0s and 1s. From there, we transitioned into describing combinational logic using Verilog, focusing specifically on adders.

You were first introduced to the half adder, a simple circuit that takes two binary inputs and produces a sum and a carry output. Although basic, it is an important building block in digital design. We then extended this concept to the full adder, which includes a carry-in input, allowing multiple adders to be connected together. This idea of chaining adders is fundamental in designing circuits capable of handling multi-bit arithmetic operations.
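The relationship between the two circuits can be sketched in Verilog by building the full adder out of two half adders. The module and signal names below are chosen for illustration.

```verilog
// Half adder: two inputs, a sum bit and a carry bit.
module half_adder (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    assign sum   = a ^ b;
    assign carry = a & b;
endmodule

// Full adder built from two half adders plus an OR gate for the carry.
module full_adder_ha (
    input  wire a, b, cin,
    output wire sum, cout
);
    wire s1, c1, c2;

    half_adder ha1 (.a(a),  .b(b),   .sum(s1),  .carry(c1));  // add the operand bits
    half_adder ha2 (.a(s1), .b(cin), .sum(sum), .carry(c2));  // then add the carry-in

    assign cout = c1 | c2;   // a carry from either stage is the carry-out
endmodule
```

Connecting cout of one full adder to cin of the next is exactly the chaining that produces a multi-bit adder.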

To give you a broader perspective, we briefly explored more advanced adder designs such as the ripple carry adder and the carry look-ahead adder. The ripple carry adder is straightforward but can be slow because each carry must propagate through every stage. In contrast, the carry look-ahead adder improves speed by predicting carry signals in advance, although it comes with increased design complexity. These concepts will become more meaningful as you encounter larger and faster digital systems in the future.
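To make the idea of predicting carries concrete, here is a minimal 4-bit carry look-ahead sketch (the name cla4 is an assumption). Each carry is written directly in terms of the generate signals g, the propagate signals p, and the carry-in, so no carry has to wait for the previous stage.

```verilog
// 4-bit carry look-ahead adder sketch.
module cla4 (
    input  wire [3:0] a, b,
    input  wire       cin,
    output wire [3:0] sum,
    output wire       cout
);
    wire [3:0] g = a & b;    // generate: this bit creates a carry by itself
    wire [3:0] p = a ^ b;    // propagate: this bit passes an incoming carry on
    wire [4:0] c;

    assign c[0] = cin;
    assign c[1] = g[0] | (p[0] & c[0]);
    assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c[0]);
    assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0])
                | (p[3] & p[2] & p[1] & p[0] & c[0]);

    assign sum  = p ^ c[3:0];
    assign cout = c[4];
endmodule
```

The carry equations grow with every bit position, which is the increased design complexity mentioned above, while a ripple carry adder simply repeats the same full adder stage.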

 

 

 

Before moving into Lab 1, we started with Lab 0, which focused on simulation. In this lab, you worked with two files: number.v and number_tb.v. The goal was to simulate your design using ModelSim and observe how digital signals behave over time. This is a critical step in digital design, as simulation allows you to verify correctness before implementing your code on actual hardware.

In number.v, you were given a basic up-counter design capable of driving a 7-segment display on the FPGA board. Through this, you explored several important Verilog concepts, including top module declaration, the use of wire data types, and bit buses for representing multi-bit signals. You also examined conditional statements such as if-else, as well as basic mathematical operations within Verilog. By analyzing the waveform output in ModelSim, you were able to see how signals change over time and how your design behaves in response to different inputs. This helped build your understanding of timing and signal relationships in digital circuits.

We then moved on to Lab 1, which focused on implementing a running LED design on the DE10-Lite FPGA Board. The task required you to write Verilog code, compile it, assign the correct pins for the 10 LEDs on the board, and program the FPGA. The result was a sequence of LEDs lighting up from one end of the board to the other, demonstrating a simple but effective hardware implementation of your design.
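A running LED pattern can be produced in several ways; the sketch below rotates a single lit bit through a 10-bit register. It is not the lab solution itself, and the module name, port names, and the 23-bit tick counter (about 0.17 s per step at 50 MHz) are assumptions.

```verilog
// Running LED sketch: one lit LED rotates through ten positions.
module AZ_running_led (
    input  wire       MAX10_CLK1_50,   // assumed name for the 50 MHz board clock
    output wire [9:0] LEDR             // assumed name for the ten LEDs
);
    reg [22:0] tick = 23'd0;           // slows the pattern down to a visible speed
    reg [9:0]  led  = 10'b0000000001;  // start with only the first LED lit

    always @(posedge MAX10_CLK1_50) begin
        tick <= tick + 1'b1;
        if (&tick)                     // once every 2^23 clock cycles
            led <= {led[8:0], led[9]}; // rotate the lit bit by one position
    end

    assign LEDR = led;
endmodule
```

Whether the light appears to move left or right depends on how the LEDR bits are mapped to the physical LEDs in your pin assignments.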

This lab was significant because it introduced you to the complete FPGA workflow, from coding to physical output. Many of you experienced the process of compiling your design, resolving errors, performing pin assignments, and finally observing your design working on actual hardware. This transition from simulation to real-world implementation is a key milestone in your learning.

 

Thank you, Lim, for the video.

Some common challenges were observed during the session, particularly issues with the hardware not being recognized by your computer. In most cases, this was due to problems with the USB-Blaster driver installation. Ensuring that the driver is properly installed and that the device is correctly detected by the system is crucial. Checking Device Manager, trying different USB ports, or reinstalling the driver can often resolve these issues.

 

Overall, this week has equipped you with essential skills in designing combinational circuits using Verilog and implementing them on an FPGA platform. You should now have a clearer understanding of how basic arithmetic circuits are built and how digital designs move from code to hardware.

As we move forward, make sure you are comfortable with both your Verilog coding and FPGA setup. The concepts and skills from this week will serve as the foundation for more advanced topics in the coming weeks. If you are still facing issues, especially with hardware setup, it is important to address them early.

See you in Week 5.