BHE3233 BTS4433 – Week 14 Lab Submission

BHE3233

Group 1 – AES

Group 2 – FIR Filter

Group 3 – FIR Filter

BTS4433

https://www.youtube.com/shorts/xrJ00I3LIlE

Group 2

Group 5

Group 8

Group 5

https://www.youtube.com/shorts/royQgfZswWc

https://www.youtube.com/shorts/QMlo0IBENTA

Group 1

https://www.youtube.com/shorts/e9hTOs6n7F8?si=F-tFTJrBuzMDtV41

Group 2

https://www.youtube.com/shorts/wrp0cACEB4k

Group 7

BHE3233 BTS4433 – Week 13 Project Presentation

What an incredible journey it has been! This week is presentation week for the BHE3233 and BTS4433 cohorts. After weeks of development through our tiered scaffolding approach, moving from Stage 1 (Workout Programming) to Stage 4 (Comparative Optimization), every student successfully presented their hardware architectures.

Watching everyone dissect, design, and compare four distinct digital systems was a proud moment for the UMPSA STEM Lab. But how exactly do these projects tie into our overall syllabus, and how have they built the critical Verilog coding criteria our students will take into the industry?

Let’s break it down.

The Projects: Four Pillars of Digital System Design
The course syllabus was intentionally structured to expose students to four highly distinct industry domains. By encouraging students to design and then optimize each of these systems, the syllabus ensured a comprehensive grasp of real-world hardware challenges:

  1. Project 1: Cryptography (Security-Focused): Students dove into hardware encryption by designing an 8-bit AES S-Box. By Stage 4, they compared a memory-heavy Look-Up Table (LUT) approach against an area-efficient Logic-based (Boolean/Galois Field) approach, learning how to select architectures based on whether they are building a high-speed processor or a low-area IoT device.
  2. Project 2: CPU / ALU Design (Arithmetic-Focused): Escalating a basic multiplier to 16-bit logic, students had to evaluate complex mathematical trees. They compared Behavioral, Sequential (Shift-and-Add), and Pipelined multipliers. This taught them the delicate balance between saving Logic Elements (LEs) for handheld devices versus maximizing throughput for heavy computation.
  3. Project 3: DSP & Sensor Processing: Students built a 4-tap FIR filter, calculating precise bit-widths to prevent arithmetic overflow. The final optimization challenged them to shift from a Direct Form to a Transposed Form architecture, proving that mathematically identical Verilog code can yield vastly different Max Frequencies (fMAX) simply by shortening the critical path.
  4. Project 4: Communication Protocols (Interfacing): Focusing on high-reliability data transmission, students built a UART Controller with an integrated Baud Rate Generator and Parity checking. By comparing Binary-Encoded FSMs against One-Hot Encoded FSMs, they learned how Verilog state machine encoding directly impacts flip-flop utilization and setup slack.

Building Verilog Criteria: Beyond Syntax
The most valuable outcome of this class is how it transformed the students’ Verilog coding criteria. The tiered scaffolding intentionally guided them through a specific evolutionary process:

  1. Code Comprehension & Hardware Inference (Stage 1): Students learned that Verilog isn’t just software code; it is a description of physical hardware. They learned how a simple * operator is interpreted by the Quartus synthesis engine.
  2. Debugging Hardware Logic (Stage 2): They learned to identify hardware-specific malfunctions, such as standard compliance errors (“Output port has no driver”) and catastrophic “Race Conditions” caused by using blocking assignments (=) instead of non-blocking assignments (<=) in sequential logic.
  3. Architectural Integration (Stage 3): They learned how to connect verified sub-modules into complex top-level architectures, carefully calculating bit-growth across the data path.
  4. Scientific Optimization (Stage 4): This is where they built their highest-level Verilog criteria. They learned to rely on the Quartus TimeQuest Timing Analyzer and Resource Utilization reports to justify their designs, comparing Total Logic Elements, Registers, Max Frequency (fMAX), and Latency.

Preparing for the Engineering Tasks
As noted in the course materials, escalating designs to handle complex optimizations moves a student from simply “making things work” to “optimizing things for performance,” which is the hallmark of a true Digital Design Engineer.

Congratulations to all the BHE3233 and BTS4433 students! You didn’t just write Verilog this semester; you intelligently traded Area for Speed, evaluated critical paths, and solved complex timing puzzles. You are now fully equipped to tackle modern FPGA and ASIC design challenges in the industry!

 

Strengthening STEM Education Through Strategic Collaboration with PPD Maran – Roundtable Discussion

Universiti Malaysia Pahang Al-Sultan Abdullah (UMPSA) STEM Lab today hosted a roundtable discussion with 38 teachers and education officers from schools under the Pejabat Pendidikan Daerah (PPD) Maran and Jabatan Pendidikan Negeri Pahang. The session served as an important platform to explore future collaborations aimed at strengthening STEM education and digital competency development among students in the district.

The discussion brought together educators, school leaders, and STEM practitioners to exchange ideas, identify current educational needs, and explore opportunities for impactful partnerships between UMPSA STEM Lab and schools in the Maran district. The session highlighted a shared commitment to preparing students for a rapidly evolving technological landscape while fostering creativity, innovation, and problem-solving skills.

Several potential collaboration areas were discussed, including the implementation of hands-on STEM programs such as Vibe Coding with Raspberry Pi, Vibe Coding with Arduino Robotics, Dashboard Design and Data Visualization, and Computational Thinking with Artificial Intelligence (AI). These programs are designed to provide students with authentic learning experiences that combine programming, engineering design, data analytics, and emerging technologies.

Challenges in Delivering Digital Making in Schools in Maran

A special segment of the roundtable focused on sharing evidence-based pedagogical practices developed and researched by the UMPSA STEM Lab. Participants were introduced to a series of research and outreach initiatives that have contributed to the advancement of STEM and engineering education. Among the highlighted works were studies on collaborative STEM outreach programs, game development-based learning for programming education, computational thinking through scaffolded game development activities, digital making skill development using the UMP STEM Cube, IoT-enabled precision agriculture using Raspberry Pi edge devices, and innovative approaches to engineering education. These research outcomes have been published in reputable international journals, including IEEE Transactions on Education, IEEE Potentials, European Journal of Educational Research, International Journal of Evaluation and Research in Education (IJERE), and the Journal of Mechatronics, Electrical Power, and Vehicular Technology.

To illustrate the importance of pedagogy in learning, participants engaged in an interactive activity involving visual communication and instructional scaffolding. In the first exercise, a participant was tasked with describing a house constructed from multiple geometric shapes without showing the image to the audience. Participants attempted to recreate the drawing based solely on verbal instructions, resulting in significant variations and inaccuracies. In the second exercise, participants were first shown the individual geometric shapes before another participant described a more complex image of a car constructed from similar shapes. The resulting drawings demonstrated a marked improvement in accuracy and consistency.

This activity served as a symbolic representation of the educational philosophy practiced by the UMPSA STEM Lab. Rather than immediately introducing complex technologies, the STEM Lab emphasizes structured learning pathways that progressively build learners’ understanding. Through tiered and scaffolded pedagogical approaches, students are first introduced to fundamental concepts before advancing to more sophisticated digital making activities. This methodology has been successfully applied across various STEM outreach programs involving programming, robotics, embedded systems, artificial intelligence, and engineering design.

The discussion also introduced the Lab’s emerging “Vibe Coding AI Structured Pedagogy” framework. While recent advancements in artificial intelligence have made coding more accessible, the framework emphasizes that effective learning requires more than simply generating code. Students must develop computational thinking, problem decomposition, design reasoning, and critical evaluation skills. The structured pedagogy combines AI-assisted development with carefully designed learning scaffolds to ensure that students remain active creators and problem solvers rather than passive users of technology.

A key focus of the discussion was the role of UMPSA STEM Lab in contributing not only technical expertise but also educational content, instructional modules, and tailored pedagogical approaches. Drawing upon its extensive experience in engineering education, the STEM Lab aims to support schools in implementing meaningful STEM learning experiences that are aligned with curriculum requirements while promoting higher-order thinking skills and real-world problem solving.

The collaboration also seeks to create sustainable pathways for teacher professional development, enabling educators to gain confidence in integrating digital technologies and engineering concepts into classroom teaching. Through carefully designed modules and project-based learning activities, students will be exposed to engineering thinking, computational problem solving, and innovative design practices from an early age.

UMPSA STEM Lab remains committed to supporting schools across Pahang in nurturing the next generation of innovators, engineers, and technology leaders. The roundtable discussion with PPD Maran marks an important first step towards establishing long-term partnerships that will enrich STEM education and empower both teachers and students to thrive in the digital era.

The STEM Lab looks forward to working closely with PPD Maran and its schools in transforming ideas discussed during the session into impactful educational programs that benefit learners throughout the district. Through collaboration, innovation, and research-informed educational practices, UMPSA STEM Lab continues its mission of making engineering and technology education accessible, engaging, and meaningful for all learners.

BHE3233 BTS4433 – Week 11 Project Design Optimisation

Hello everyone,
After weeks of writing, debugging, and integrating Verilog code, we have finally reached the pinnacle of digital system design: Stage 4 (Comparative Optimization).
In this phase, we move beyond simply asking “Does the code work?” to asking “Which architecture is mathematically and structurally superior?”
In FPGA design, there is rarely one perfect answer. Every design choice is a delicate balancing act between Area (Logic Elements and Flip-Flops), Speed (fMAX), and Latency.
This week, our students put competing architectures head-to-head across four different projects to see how they perform under the rigorous scrutiny of Quartus’ Static Timing Analysis (STA) and Resource Utilization reports.
Here is a breakdown of the design comparisons for each project:
Project 1: AES Cryptography (LUT vs. Logic-Based S-Box)
For the AES S-Box, students integrated two completely different structural approaches onto the DE10-Lite FPGA and used a toggle switch to compare their performance.
    1. Table-Based (ROM/LUT) Approach: This method acts like a cheat sheet, pre-calculating every possible answer and storing it in memory.
      1. While extremely fast (constant time), it consumes significant memory resources and scales poorly. It is well suited to high-speed cryptographic processors.
    2. Logic-Based (Boolean) Approach: This method acts like a math formula, calculating Galois Field arithmetic in real time using layers of logic gates.
      1. It uses very little memory and is highly area-efficient, making it a strong choice for low-area IoT devices, though it suffers from a longer propagation delay due to the deep logic tree.
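
The two approaches can be contrasted in a small software model. This is only a sketch, not the students' Verilog: the helper names are hypothetical, and the "logic" path inverts by brute-force exponentiation rather than the composite-field circuit a real logic-based S-Box would use. It shows that calculating and looking up give the same byte.

```python
# Software model only: hypothetical helper names, not the course's Verilog.
AES_POLY_LOW = 0x1B  # low byte of the AES polynomial x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        overflow = a & 0x80
        a = (a << 1) & 0xFF
        if overflow:
            a ^= AES_POLY_LOW
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 = 1
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_logic(a):
    """'Math formula' path: inversion followed by the AES affine transform."""
    inv = gf_inverse(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


# 'Cheat sheet' path: every answer pre-calculated once, then only looked up.
SBOX_TABLE = [sbox_logic(i) for i in range(256)]


def sbox_lut(a):
    return SBOX_TABLE[a]
```

In hardware the table costs 256 stored bytes but answers in one lookup, while the formula costs no storage but a deep chain of XOR and AND gates, which is the trade-off measured in the timing report.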

 

Project 2: CPU Arithmetic (Sequential vs. Pipelined 16-bit Multipliers)
For the ALU design project, students escalated their 4-bit multipliers to 16-bit and compared architectures to see how digital systems handle complex arithmetic.
    1. Sequential (Shift-and-Add) Multiplier: By mimicking manual long multiplication, this architecture reuses a single adder over multiple clock cycles.
      1. It dramatically saves on Logic Elements (LEs), but the cost is high latency, making it ideal for space-constrained, battery-powered handheld devices.
    2. Pipelined Multiplier: To maximize performance, students inserted registers into the combinational logic “fences” to break up the deep mathematical tree.
      1. Like an assembly line, this allows a new multiplication operation to begin every single clock cycle.
      2. It costs far more registers, but drastically increases throughput and fMAX, which is mandatory for applications executing millions of operations per second.
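
A rough cycle-count model makes the throughput gap concrete. It is a sketch under stated assumptions: the sequential unit spends one clock per operand bit with no load or done overhead, and the pipeline depth of 4 stages is a made-up figure, not a value from the lab.

```python
def sequential_cycles(num_ops, bits=16):
    """Shift-and-add: one adder reused, one operand bit handled per clock."""
    return num_ops * bits


def pipelined_cycles(num_ops, stages=4):
    """Pipeline: first result after `stages` clocks, then one result per clock.

    `stages` is an assumed depth chosen for illustration only.
    """
    if num_ops == 0:
        return 0
    return stages + (num_ops - 1)


if __name__ == "__main__":
    for n in (1, 1000):
        print(n, sequential_cycles(n), pipelined_cycles(n))
```

For a single multiplication the two are close (16 clocks against 4 in this model), but for 1000 back-to-back operations the sequential unit needs 16000 clocks while the pipeline needs 1003.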

Project 3: DSP and Sensors (Direct vs. Transposed FIR Filters)

In digital signal processing, the physical layout of your adders and multipliers can make or break your frequency limits. Students evaluated a 4-tap FIR filter using two mathematically identical, but structurally different, forms.
    1. Direct Form: The standard approach where all multiplications happen in parallel, and the results are summed up in a large “Adder Tree”. Its major flaw is a massive Critical Path—the signal must traverse a multiplier and the entire chain of adders before the clock cycle ends, severely limiting the maximum frequency.
    2. Transposed Form: By strategically placing delay registers between the adders, students shortened the critical path so the signal only propagates through one multiplier and one adder per cycle. While this slightly increases Total Registers (FFs), it yields a substantially higher fMAX (often a 25% improvement), making it the superior architecture for high-speed 100 MHz digital audio processors.
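
That the two forms are mathematically identical can be checked with a cycle-by-cycle software model. This is a minimal sketch with hypothetical function names and example coefficients. It models only the arithmetic, not timing, so it shows the outputs match but cannot show the fMAX difference.

```python
def fir_direct(samples, coeffs):
    """Direct form: delay line on the input, all products summed in one step."""
    delay = [0] * len(coeffs)  # delay[0] = x[n], delay[1] = x[n-1], ...
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # shift the input delay line
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_transposed(samples, coeffs):
    """Transposed form: registers sit between the adders."""
    n = len(coeffs)
    regs = [0] * (n - 1)  # regs[k] holds the partial sum feeding tap k's adder
    out = []
    for x in samples:
        products = [c * x for c in coeffs]  # every tap multiplies the same x[n]
        out.append(products[0] + (regs[0] if regs else 0))
        # Next state: each register takes one product plus the register after it,
        # so only one multiplier and one adder lie between any two registers.
        regs = [
            products[k + 1] + (regs[k + 1] if k + 1 < n - 1 else 0)
            for k in range(n - 1)
        ]
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 0, 0, 0]
    h = [1, 2, 3, 4]  # example coefficients, not the lab's values
    print(fir_direct(x, h))
    print(fir_transposed(x, h))
```

Both calls print the same sequence. The difference lies purely in where the registers sit, which is what the Timing Analyzer measures.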
Project 4: UART Controller (Binary vs. One-Hot FSM Encoding)
In the final project, students escalated their UART transmitter to handle 16-bit data frames with Even Parity and evaluated how Quartus encodes Finite State Machines (FSMs).
    1. Binary Encoding: This style uses the absolute minimum number of Flip-Flops (e.g., 2 FFs for 4 states). While it saves physical area on the silicon, it requires heavier combinational logic to decode the states.
    2. One-Hot Encoding: This style assigns exactly one Flip-Flop per state (e.g., 4 FFs for 4 states).
      1. Despite consuming more physical area, the decoding logic becomes incredibly simple. This translates to better Setup Slack and a much higher fMAX, proving that sometimes using more hardware actually makes your system perform better.
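
The flip-flop counts quoted above follow from a simple rule, sketched here. The four state names are taken from the transmitter FSM described in the Week 10 post. The specific code values are an assumption for illustration, since the codes Quartus actually assigns may differ.

```python
UART_STATES = ["IDLE", "START", "DATA", "STOP"]


def binary_flip_flops(num_states):
    """Minimum register bits needed to number the states 0..num_states-1."""
    return max(1, (num_states - 1).bit_length())


def one_hot_flip_flops(num_states):
    """One flip-flop per state."""
    return num_states


def binary_codes(states):
    width = binary_flip_flops(len(states))
    return {name: format(i, f"0{width}b") for i, name in enumerate(states)}


def one_hot_codes(states):
    width = len(states)
    return {name: format(1 << i, f"0{width}b") for i, name in enumerate(states)}


if __name__ == "__main__":
    print(binary_codes(UART_STATES))   # every bit must be compared to decode a state
    print(one_hot_codes(UART_STATES))  # a single bit identifies each state
```

With binary codes, recognising a state means comparing all register bits. With one-hot codes it means reading one flip-flop, which is why the decode logic shrinks as the register count grows.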
The Takeaway
Stage 4 proves that mastering digital system design isn’t just about writing Verilog that compiles. A true hardware engineer knows how to interpret the Fitter and Timing Analyzer reports to intelligently trade Area for Speed based on the exact needs of the industry application!

 

I look forward to your creativity in executing these projects. Please complete your submissions in KALAM =)

BHE3233 BTS4433 – Week 10 Project Semi Completed Programming

Hi everyone,

After successfully navigating code comprehension and hardware debugging in Stages 1 and 2, our journey through digital system design enters its most advanced phases. This week, we focused on Stage 3: Semi-Completed Programming and Stage 4: New Programming Task (Comparative Optimization).

These stages push us beyond merely “making it work” to actually architecting complete systems and evaluating trade-offs like true digital design engineers. Here is a breakdown of our milestones and the core learning outcomes for each project.

Stage 3: Semi-Completed Programming (Architectural Completion)

In Stage 3, we took foundational components and integrated them into complete, functioning architectures.

Project 1: AES Cryptography (The Logic-Optimized S-Box)

    • The Challenge: Transitioning away from a memory-heavy Look-Up Table (LUT) to a logic-optimized approach using Galois Field (GF(2^8)) composite arithmetic. We had to complete the missing Boolean equations for multiplicative inversion.
    • Learning Outcome: Mastering Mathematical Hardware. We learned how complex cryptography math (like Galois Fields) is synthesized into pure Boolean logic (XOR, AND, OR gates), demonstrating how a “calculation” approach saves memory at the cost of logic depth.

Project 2: The Multiplier (Pipelining for Speed)

    • The Challenge: We upgraded a combinational multiplier by inserting registers into the middle of the logic “fences” to create a Pipelined Multiplier.
    • Learning Outcome: Increasing Throughput. By breaking a long combinational path into smaller stages, we learned how pipelining allows a new multiplication to start every clock cycle, drastically increasing the system’s Max Frequency (fMAX).

Project 3: FIR Filter (Structural Integration)

    • The Challenge: We integrated our verified Multiply-Accumulate (MAC) units and Delay Lines to build a complete Direct Form FIR Filter.
    • Learning Outcome: Preventing Arithmetic Overflow. The crucial lesson here was calculating exact bit-widths for signal growth. We learned that multiplying two 4-bit inputs yields an 8-bit output, and accumulating the four 8-bit products of a 4-tap filter (at most 4 × 225 = 900) requires a 10-bit final output to prevent overflow.
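
The bit-width arithmetic can be checked with a short worst-case calculation. This sketch assumes unsigned operands, and the function names are hypothetical. Signed data would need its own analysis.

```python
def product_width(a_bits, b_bits):
    """Bits needed for the product of two unsigned operands."""
    max_product = (2**a_bits - 1) * (2**b_bits - 1)
    return max_product.bit_length()


def accumulator_width(a_bits, b_bits, taps):
    """Bits needed to hold the sum of `taps` worst-case unsigned products."""
    max_product = (2**a_bits - 1) * (2**b_bits - 1)
    return (taps * max_product).bit_length()


if __name__ == "__main__":
    print(product_width(4, 4))         # 15 * 15 = 225 fits in 8 bits
    print(accumulator_width(4, 4, 4))  # 4 * 225 = 900 fits in 10 bits
```

Summing three or four worst-case products both land in 10 bits (675 and 900 are each below 1024), so a 10-bit output covers the full 4-tap filter.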

Project 4: UART Controller (The Transmitter FSM)

    • The Challenge: Wrapping raw serial data into a standardized UART frame by building a Finite State Machine (FSM) that transitions through IDLE, START, DATA, and STOP states based on precise “ticks” from our Baud Rate Generator.
    • Learning Outcome: Protocol Synchronization. We learned how to reliably sequence hardware operations using FSMs, ensuring that communication lines are held HIGH during idle and strictly synchronized to a predetermined baud rate for data integrity.
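
The framing sequence can be sketched as a generator that yields one line level per baud tick. This is an illustrative model, not our Verilog FSM: the 8-bit default width is an assumption, and the data bits go out least-significant bit first, as is conventional for UART.

```python
def uart_frame(data, data_bits=8):
    """Yield (state, line_level) for each baud tick of one transmitted frame."""
    yield ("IDLE", 1)            # line rests HIGH between frames
    yield ("START", 0)           # start bit pulls the line LOW
    for i in range(data_bits):   # data bits, least-significant first
        yield ("DATA", (data >> i) & 1)
    yield ("STOP", 1)            # stop bit returns the line HIGH


if __name__ == "__main__":
    for state, level in uart_frame(0x55):
        print(state, level)
```

Each yielded pair corresponds to one tick from the Baud Rate Generator, so the receiver sees the HIGH-to-LOW start edge and can then sample every bit at a known interval.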

 

Moving on

Stage 4: Comparative Optimization (New Programming Tasks)

Stage 4 is where engineering design trade-offs shine. We escalated our designs and used Quartus tools (like the Timing Analyzer and Resource utilization reports) to scientifically compare competing architectures.

Project 1: AES Cryptography (LUT vs. Logic Trade-offs)

    • The Challenge: We integrated both the Stage 1 (Table-based) and Stage 3 (Logic-based) architectures onto the DE10-Lite FPGA, using a hardware switch to “toggle” between them.
    • Learning Outcome: Resource vs. Speed Optimization. By running a Static Timing Analysis (STA), we learned to scientifically deduce which architecture to use depending on the application—evaluating why a LUT is better for a high-speed cryptographic processor while Boolean logic is superior for a low-area IoT device.

Project 2: The Multiplier (16-bit Escalation)

    • The Challenge: We escalated our multiplier from 4-bit to 16-bit and compared all three architectures: Behavioral, Sequential, and Pipelined.
    • Learning Outcome: Evaluating Complex Architectural Trade-offs. We learned how expanding bit-widths rapidly deepens the logic tree: a 16-bit multiplier sums 16 partial products, four times as many as the 4-bit version. The outcome was understanding how to choose an architecture based on strict constraints (e.g., choosing sequential for space-constrained handhelds vs. pipelined for high-performance CPU ALUs).

Project 3: FIR Filter (Direct vs. Transposed Form)

    • The Challenge: We redesigned our filter into the Transposed Form, a mathematically identical structure that places delay registers between the adders rather than at the input.
    • Learning Outcome: Shortening the Critical Path. We learned a major DSP optimization technique: by separating combinational adders with registers, we shortened the critical path delay. Even though this uses slightly more Logic Elements, it dramatically boosts the fMAX, making it ideal for high-speed audio/sensor processing.

Project 4: UART Controller (FSM Encoding & Parity)

    • The Challenge: We expanded the UART to 16-bit with Even Parity error checking and compared two FSM architectures: Binary Encoding versus One-Hot Encoding.
    • Learning Outcome: FSM Encoding Trade-offs. We gained hands-on experience in how the Quartus compiler assigns flip-flops. We discovered that Binary Encoding saves area (fewer flip-flops) but requires heavier decoding logic, whereas One-Hot Encoding uses more flip-flops but simplifies decoding, resulting in better setup slack and a faster maximum frequency.

BHE3233 BTS4433 – Week 9 Project Workout Simulation

This week in the lab, we move forward in our hardware design journey by executing Stage 1: Workout Programming and Stage 2: Debugging Specific Malfunctions across four highly distinct FPGA projects. These stages forced us to transition from merely reading code to actively troubleshooting real-world hardware logic errors.

Here is a breakdown of what we accomplished:
Project 1: Cryptography on Silicon (AES S-Box)
We kicked things off by diving into the Advanced Encryption Standard (AES). In Stage 1, we analyzed a Memory-based (ROM) Look-Up Table (LUT) approach for an 8-bit AES S-Box. We used ModelSim to perform functional verification, proving that the hardware correctly maps inputs to standardized AES cipher outputs (like mapping 8'h00 to 8'h63).
The real challenge came in Stage 2 with the Inverse S-Box (used for decryption). We were handed a buggy Verilog script containing three intentional errors related to syntax, logic, and standard compliance. By deciphering Quartus compilation warnings like “Output port has no driver,” we successfully repaired the code to prove that feeding a cipher output back into the Inverse S-Box restores the original “plain” input byte.
Project 2: CPU Arithmetic (The Multiplier)
Our second project focused on resource-efficient ALU design. Stage 1 introduced a Behavioral Multiplier, where we let the Quartus synthesis engine decide whether to map the multiplication logic to 9-bit DSP blocks or Logic Elements (LUTs).
Stage 2 brought us down to the sequential level with a Shift-and-Add Multiplier, which mimics manual long multiplication to save area. However, the provided design failed for maximum 4-bit inputs (like 4'hF * 4'hF). We had to debug a logical flaw in the module’s bit-counter; it was only counting up to 3 instead of 4. By correcting the count limit and adjusting the counter’s bit-width, we enabled the hardware to successfully process all 4 bits.
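
The effect of the counter bug can be reproduced in a small software model. This is a sketch with a hypothetical function name, where the `iterations` argument stands in for the bit-counter limit of the Verilog module.

```python
def shift_add_multiply(a, b, iterations, bits=4):
    """Shift-and-add: examine one bit of b per step, adding a shifted copy of a."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    product = 0
    for i in range(iterations):      # one loop pass models one clock cycle
        if (b >> i) & 1:
            product += a << i
    return product


if __name__ == "__main__":
    print(shift_add_multiply(15, 15, iterations=4))  # all four bits processed
    print(shift_add_multiply(15, 15, iterations=3))  # top bit of b is skipped
    print(shift_add_multiply(3, 5, iterations=3))    # bug hidden: top bit of b is 0
```

With only three passes, 15 × 15 comes out as 105 instead of 225 because the top bit of the multiplier is never examined. Smaller operands such as 3 × 5 still pass, which is why the fault only showed up at maximum inputs. Counting to 4 also needs a 3-bit counter, since the value 4 does not fit in 2 bits.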
Project 3: DSP and IoT Sensors (4-Tap FIR Filter)
For our digital signal processing project, we explored how to clean up sensor data using a Finite Impulse Response (FIR) filter. In Stage 1, we verified the core building block: the Multiply-Accumulate (MAC) unit. A key takeaway was learning how to prevent arithmetic overflow by understanding bit-growth (e.g., multiplying two 4-bit inputs requires an 8-bit output).
Stage 2 tackled a classic hardware pitfall: the Race Condition. In our shift register (Delay Line), which holds the crucial “historical” samples (x[n-1], x[n-2]), the buggy code used blocking assignments (=) inside a clocked block. This caused the input to immediately leak through all registers in a single cycle. We fixed this by rewriting the block with non-blocking assignments (<=), ensuring the signal properly shifts stage-by-stage with each clock pulse.
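
The leak can be imitated in software by modelling how each assignment style reads its right-hand side. This is a sketch only: it assumes the buggy block listed its assignments from the first register to the last, and the function names are hypothetical.

```python
def clock_blocking(regs, x):
    """Models `r0 = x; r1 = r0; r2 = r1;` where each line sees the previous result."""
    regs = list(regs)
    regs[0] = x
    for i in range(1, len(regs)):
        regs[i] = regs[i - 1]    # already overwritten, so x ripples through
    return regs


def clock_nonblocking(regs, x):
    """Models `r0 <= x; r1 <= r0; r2 <= r1;` where all reads happen before any write."""
    old = list(regs)             # snapshot of the values before the clock edge
    return [x] + old[:-1]


if __name__ == "__main__":
    print(clock_blocking([0, 0, 0], 7))      # input reaches every register at once
    print(clock_nonblocking([0, 0, 0], 7))   # input reaches only the first register
```

After one clock edge the blocking model holds the new sample in every register, so the filter loses its history. The non-blocking model moves the sample one stage per edge.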
Project 4: Communication Protocols (UART Controller)
Our final project of the week centered on hardware interfacing. We started Stage 1 by executing Parallel-to-Serial Conversion, learning how a shift register takes a 4-bit wide parallel bus and sends it out bit-by-bit over a single serial wire.
In Stage 2, things got precise. Because UART transmitters and receivers don’t share a common clock, they rely on a precise Baud Rate. We had to debug a Baud Rate Generator designed to divide a 50 MHz FPGA clock down to 9600 bits per second.
The buggy counter was missing a reset condition and had a threshold comparison error (counting to 5208 actually takes 5209 cycles). After fixing the logic, we used ModelSim to verify that our baud_tick pulses were 104.16 microseconds apart (5208 clock cycles of 20 ns each), within 0.01% of the ideal 9600-baud bit period.
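
The off-by-one in the threshold is easy to see in a cycle-level model of the counter. This is a minimal sketch with hypothetical names. It assumes a counter that starts at 0, ticks when it equals the terminal count, and then wraps to 0.

```python
CLOCK_HZ = 50_000_000
BAUD = 9600
DIVISOR = CLOCK_HZ // BAUD  # 5208 clock cycles per bit (50e6 / 9600 = 5208.33)


def tick_period(terminal_count, cycles_to_run=20_000):
    """Return the number of clock cycles between the first two ticks."""
    counter = 0
    tick_cycles = []
    for cycle in range(cycles_to_run):
        if counter == terminal_count:
            tick_cycles.append(cycle)
            counter = 0
        else:
            counter += 1
    return tick_cycles[1] - tick_cycles[0]


def period_us(cycles):
    """Convert a cycle count at CLOCK_HZ into microseconds."""
    return cycles * 1_000_000 / CLOCK_HZ


if __name__ == "__main__":
    print(tick_period(DIVISOR))      # comparing against 5208 gives 5209 cycles
    print(tick_period(DIVISOR - 1))  # comparing against 5207 gives 5208 cycles
    print(period_us(DIVISOR))
```

Because the count includes zero, the comparison has to be against one less than the divisor to get a period of exactly 5208 cycles.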
Moving from code comprehension in Stage 1 to active logic correction in Stage 2 has completely changed how we look at Verilog and Quartus. Next week, we will move on to Stage 3: Semi-Completed Programming, where we will integrate these components into larger, more complex architectures!

Completion of Stage 1 & 2 activities