Tutorial 1: Part 1 - A DE0-Nano Serial Communications Protocol

1.1.1 Aim

The aim of this tutorial is to use a USB-to-UART converter module to provide serial communication between a PC and user logic within the FPGA on the DE0-Nano Development Board. Application software on the PC should allow reading from user-defined status registers and writing to user-defined control registers in the FPGA. Typically, the application software should communicate with the USB-to-UART bridge using virtual COM port device drivers. Hence, the software should treat the USB port as a serial one.

The RS232 serial communications protocol should be used to transfer data at a baud rate of 115.2 kBaud. The characteristics of a serial data frame of the protocol are defined as follows. When the transceiver's lines are IDLE they are held at a high ('1') logic level. When active, the serial data stream consists of 1 start bit, which is at a fixed, low ('0') logic level, followed by 8 data bits. The stream has no parity bits and is terminated by 1 stop bit, which is at a high ('1') logic level.


 Figure 1: The Serial Data Frame. Each transmitted frame should consist of 1 start bit, 8 data bits, a stop bit and no parity bits. The IDLE state should represent a logic level '1'.

A statistics engine, in the FPGA, should be implemented to provide runtime metrics on the operation of the UART. The engine should include registers that count the number of transmitted messages, the number of transmitted bytes, the number of received messages and the number of received bytes.  Reading from a set of status registers should provide access to the metrics from application software on the PC. Writing to the control register should allow the software to clear any of these registers, either individually or collectively.


Figure 2 : Upper level architecture diagram consisting of a PC, a USB-to-UART bridge and the DE0-Nano Cyclone IV FPGA Development Kit. Why could it be important to connect the ground pins of the USB-UART bridge and the DE0-Nano board?

A simple communications protocol should be developed to allow communication between the application software on the PC and the control and status registers within the FPGA. A Basic DE0 Nano Communication Protocol that could be used can be seen in Table 1 below. This is a request-response protocol.

 Table 1 : The Basic DE0 Nano Communication Protocol

                  Byte 0        Byte 1-n
Write Register    Reg Num       Value (4 Bytes)
Write Response    Ack = 0x00
Read Register     Reg Num
Read Response     Reg Num       Value (4 Bytes)
User Defined      Num Bytes     Value (n Bytes)
User Defined      Num Bytes     Value (n Bytes)
                  Error Code    0x01 - Unknown opcode
                                0x02 - Unknown Register

1.1.2 Pseudo-Schematic Notation

Throughout this document, where possible, pseudo-schematic notation, if there is such a thing, will be used. If the reader sees the symbol in the figure below it would not be unreasonable to expect the reader to interpret it in the following way:

The symbol in Figure 3 could be that of an entity (VHDL) or module (Verilog) known as Register32 (1). It has a 32-bit input signal known as Din (2) and a 1-bit input signal known as Enable (3), which is asserted when set to a logic level of '1'. While Enable is asserted, the value on the input, Din, is transferred to the 32-bit output signal Dout (5) on the rising edge of the Clock (4) signal. The register can be reset asynchronously by asserting Reset_n (6); the Reset_n signal is active low, so it is asserted at a logic level of '0'.


Figure 3: The pseudo-symbolic, schematic representation of a 32-bit register and the corresponding VHDL code.
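The VHDL from the figure is not reproduced in the text. As a rough behavioural sketch of the same symbol, written in Python rather than an HDL, the register could be modelled as follows. The reset value of zero and the method names are assumptions made for illustration only.

```python
class Register32:
    """Behavioural model of the Register32 symbol (a sketch, not HDL)."""

    MASK = 0xFFFFFFFF  # 32-bit data path

    def __init__(self):
        self.dout = 0  # assumed reset value

    def reset(self, reset_n):
        """Asynchronous, active-low reset: takes effect without a clock edge."""
        if reset_n == 0:
            self.dout = 0

    def rising_edge(self, din, enable, reset_n=1):
        """Model one rising edge of Clock and return the new Dout."""
        if reset_n == 0:
            self.dout = 0                # reset has priority over the clock
        elif enable == 1:
            self.dout = din & self.MASK  # capture Din only while Enable is '1'
        return self.dout


reg = Register32()
reg.rising_edge(0xDEADBEEF, enable=1)
captured = reg.dout                      # Din was captured
reg.rising_edge(0x12345678, enable=0)
held = reg.dout                          # Enable low: value is held
reg.reset(reset_n=0)
cleared = reg.dout                       # Reset_n low: register cleared
```

The three values at the end correspond to the three behaviours described for the symbol: capture, hold and asynchronous reset.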

1.1.3 Requirements

The aim of this project is to provide serial communication between application software on a PC and user logic on the DE0-Nano. To correctly interpret the data sent and received between the two, a Universal Asynchronous Receiver/Transmitter, or UART core, will be required. The developed UART core should be full-duplex, allowing data to be transmitted and received independently and at the same time.

Since it is required that the serial transmission should occur at a baud rate of 115.2 kBaud, it is necessary to generate a 115.2 kHz pulse in the FPGA. However, the DE0-Nano board has only a single oscillator, which generates a clock frequency of 50 MHz. Hence, a Baud Rate Generator core will be required to generate the 115.2 kHz serial communication frequency, commensurate with the transmission rate.

Question: Can we use the PLL to generate a serial transmission frequency instead of creating a baud rate generator? Why may this not be a good idea? [Hint: What is an FPGA's PLL frequency range?]

From the FPGA's perspective, each transmitted or received bit exists for a duration of 50 MHz / 115.2 kHz, or 434.0278 clock ticks, where 50 MHz is the DE0-Nano's clock frequency and 115.2 kHz is the serial baud rate. For transmission purposes, if we round this number of baud clock ticks, NUM_BAUD_CLOCK_TICKS, down to 434, then for every bit transmitted there will be a percentage transmission error of (required TX - actual TX) / (required TX) x 100%, or 0.006%.

Hence, if we develop a TX Baud Rate Generator, we can approximate the transmission error for every transmitted frame in the following way. Given that our serial data stream consists of 1 start bit, 8 data bits and 1 stop bit, the accumulated error when transmitting a frame of data, measured up to the middle of the stop bit, will be (1 + 8 + 0.5 bits) * 0.006% or 0.057%. This is a minuscule amount and totally acceptable to UART transceivers, which can accommodate error tolerances of up to 3% (ref).
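The arithmetic above can be checked with a few lines of Python. This is only a worked calculation; the variable names other than NUM_BAUD_CLOCK_TICKS are chosen here for illustration. Without the intermediate rounding to 0.006%, the per-bit error is 0.0064% and the frame figure comes out at about 0.061% rather than 0.057%, which does not change the conclusion.

```python
CLOCK_HZ = 50_000_000   # DE0-Nano oscillator frequency
BAUD_RATE = 115_200     # required serial baud rate

exact_ticks = CLOCK_HZ / BAUD_RATE        # ideal clock ticks per bit
NUM_BAUD_CLOCK_TICKS = int(exact_ticks)   # rounded down to a whole tick count

# (required - actual) / required x 100%, expressed in clock ticks per bit
per_bit_error_pct = (exact_ticks - NUM_BAUD_CLOCK_TICKS) / exact_ticks * 100

# Error accumulated over start bit, 8 data bits and half of the stop bit
frame_error_pct = (1 + 8 + 0.5) * per_bit_error_pct

print(f"ticks per bit : {exact_ticks:.4f} -> {NUM_BAUD_CLOCK_TICKS}")
print(f"per-bit error : {per_bit_error_pct:.4f} %")
print(f"frame error   : {frame_error_pct:.4f} %")
```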

The baud rate generator implements a counter that repeats itself every NUM_BAUD_CLOCK_TICKS. When a serial data frame is available for transmission each bit is triggered for transmission by a trigger value, which is a number of clock ticks, TX_TRIGGER_VALUE, after the start of the counter. Hence, each bit is transmitted when a trigger flag is asserted.

Let’s now consider the FPGA’s UART receiver. As the communication between UARTs is asynchronous, the FPGA’s UART receiver does not know when to begin receiving an incoming serial packet. After an IDLE transmission period a transition on the receive  line, from high to low, is an indication of a start of transmission. An edge transition detector algorithm could be used to detect this transition, from the IDLE high state to the low START state, allowing the FPGA to synchronise itself with the incoming serial data frame.

Therefore, an RX Baud Rate Generator could be triggered to begin receiving data by synchronising to the edge detector. We could sample each incoming bit at a TRIGGER_VALUE that indicates each bit's optimum sampling point. Typically, this point is selected to be half-way through the estimated duration of each incoming bit. Hence, the trigger value is the optimum number of clock ticks, OPTIMUM_NUM_CLOCK_TICKS, to wait before sampling a bit, and a trigger flag is asserted when the counter reaches it. The flag could be used to indicate when to latch (register, in FPGA terminology) each incoming bit.
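To make the timing concrete, the sketch below finds the high-to-low start edge in a list of line samples (one sample per 50 MHz clock tick) and lists the clock ticks at which the ten bits of a frame would be sampled. It is an illustrative Python calculation, assuming that OPTIMUM_NUM_CLOCK_TICKS is simply half of NUM_BAUD_CLOCK_TICKS; the function names are hypothetical.

```python
NUM_BAUD_CLOCK_TICKS = 434                            # clock ticks per bit
OPTIMUM_NUM_CLOCK_TICKS = NUM_BAUD_CLOCK_TICKS // 2   # assumed mid-bit point


def find_start_edge(samples):
    """Return the index of the first low sample after a high one, or None."""
    for i in range(1, len(samples)):
        if samples[i - 1] == 1 and samples[i] == 0:
            return i
    return None


def sample_points(start_index, num_bits=10):
    """Clock ticks at which each bit of the frame is sampled (mid-bit)."""
    return [start_index + OPTIMUM_NUM_CLOCK_TICKS + n * NUM_BAUD_CLOCK_TICKS
            for n in range(num_bits)]


# An idle line, followed by a start bit, followed by a high level
line = [1] * 100 + [0] * NUM_BAUD_CLOCK_TICKS + [1] * 50
start = find_start_edge(line)
points = sample_points(start)
```

Here the first sample point lands 217 ticks after the detected edge, and every later one follows 434 ticks after the previous one.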

The application software on the PC should be able to read from status registers and write to control registers. Hence, the user logic should contain a readable status register, that could be implemented as a status register core. The first status register could be reserved  (0x00).

A statistics engine core  should include accumulator registers to gather the required metrics. The output of these registers should be inputs into the status register core. The accumulators should be used to keep track of the number of transmitted messages (0x01), the number of transmitted bytes (0x02), the number of received messages (0x03) and the number of received bytes (0x04).  

The control register should be able to clear any of these registers, either individually or collectively. Hence, a control register core should have control lines that allow a user to interact with the statistics engine and other future registers in the FPGA. For example, the control register could be used to set the number of on cycles and the number of off cycles of a timer connected to an LED. Hence, we could use it to set an LED's flash rate.
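The interplay between the statistics engine, the status registers and the control register could be sketched behaviourally as below. The register addresses 0x00 to 0x04 come from the text; the 32-bit register width, the method names and, in particular, the layout of the clear mask (bit n clears the counter at address n) are assumptions made purely for illustration.

```python
class StatisticsEngine:
    """Behavioural sketch of the four accumulators and their clear lines."""

    TX_MESSAGES, TX_BYTES, RX_MESSAGES, RX_BYTES = 0x01, 0x02, 0x03, 0x04

    def __init__(self):
        self.counters = {addr: 0 for addr in range(0x01, 0x05)}

    def message_sent(self, num_bytes):
        self.counters[self.TX_MESSAGES] += 1
        self.counters[self.TX_BYTES] += num_bytes

    def message_received(self, num_bytes):
        self.counters[self.RX_MESSAGES] += 1
        self.counters[self.RX_BYTES] += num_bytes

    def read_status(self, reg_num):
        """Status register read; register 0x00 is reserved and reads as zero."""
        if reg_num == 0x00:
            return 0
        return self.counters[reg_num] & 0xFFFFFFFF   # assumed 32-bit registers

    def write_control(self, value):
        """Hypothetical clear mask: bit n set clears the counter at address n."""
        for addr in self.counters:
            if value & (1 << addr):
                self.counters[addr] = 0


stats = StatisticsEngine()
stats.message_sent(6)        # one 6-byte message transmitted
stats.message_received(2)    # one 2-byte message received
stats.write_control(1 << StatisticsEngine.TX_BYTES)   # clear a single counter
after_single_clear = [stats.read_status(r) for r in range(5)]
stats.write_control(0b11110)                          # clear all four at once
after_full_clear = [stats.read_status(r) for r in range(5)]
```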

The Basic DE0 Nano Communications Protocol, described in the aims section above, could be used to facilitate communication between application software and the modules within the FPGA. Hence, a protocol wrapper core should be developed to identify the destination of received messages and the source of transmitted messages.

The RX channel, of the protocol wrapper, uses the opcode to determine the operation type and the number of following arguments. It then forwards this information onto either the control  register for write operations or the status register for read operations. Likewise, the TX channel of the wrapper receives input from the status register and prefixes the input with an opcode. The opcode can either be a response opcode or an error opcode.
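A request-response exchange of this kind could be sketched in Python as below. The opcode values are NOT given in Table 1, so the ones used here are hypothetical placeholders, as are the big-endian byte order of the 4-byte value, the function names and the use of a single register map for both control and status registers. Only the Ack value (0x00) and the two error codes are taken from the table.

```python
import struct

# Hypothetical opcode values: the text does not specify them.
OP_WRITE_REG, OP_WRITE_RESP = 0x10, 0x11
OP_READ_REG, OP_READ_RESP = 0x20, 0x21
OP_ERROR = 0xEE

ACK = 0x00                                    # from Table 1
ERR_UNKNOWN_OPCODE, ERR_UNKNOWN_REGISTER = 0x01, 0x02


def encode_write(reg_num, value):
    """Write Register request: opcode, Reg Num, 4-byte value (big-endian assumed)."""
    return struct.pack(">BBI", OP_WRITE_REG, reg_num, value)


def encode_read(reg_num):
    """Read Register request: opcode, Reg Num."""
    return struct.pack(">BB", OP_READ_REG, reg_num)


def respond(request, registers):
    """Model of the wrapper: decode a request and build the response bytes."""
    opcode = request[0]
    if opcode not in (OP_WRITE_REG, OP_READ_REG):
        return bytes([OP_ERROR, ERR_UNKNOWN_OPCODE])
    reg_num = request[1]
    if reg_num not in registers:
        return bytes([OP_ERROR, ERR_UNKNOWN_REGISTER])
    if opcode == OP_WRITE_REG:
        (registers[reg_num],) = struct.unpack(">I", request[2:6])
        return bytes([OP_WRITE_RESP, ACK])
    return struct.pack(">BBI", OP_READ_RESP, reg_num, registers[reg_num])


regs = {0x00: 0, 0x01: 0}
write_reply = respond(encode_write(0x01, 0xCAFEF00D), regs)
read_reply = respond(encode_read(0x01), regs)
bad_reg_reply = respond(encode_read(0x09), regs)
bad_op_reply = respond(b"\x7f\x00", regs)
```

In the real design the write path would go to the control register core and the read path to the status register core; the single dictionary here only keeps the sketch short.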

A block diagram summarising the architecture of the design can be seen in Figure 4 below. The Baud Rate Generator module is shown external to the UART for illustrative purposes.


Figure 4: High-level block diagram of the modules required to implement the Basic DE0 Nano Communications Protocol.

1.1.4 Implementation

This section provides implementation details of the cores identified previously. As noted, there is a multitude of digital design languages and tools that can be used for coding the identified modules, and our intention is to enforce none of them. Although we will primarily use VHDL to exemplify an idea or concept, more modern digital design languages, like SystemC or OpenCL for hardware design, may be used too.

The source code (GPL), for this and other tutorials, will be available as soon as I have time to install a download manager. Visit the Bulletin section for announcements and updates - B.P

The Baud Rate Generator

The baud rate generator module is used to coordinate all of the UART's transmitter and receiver activities. Coordinated synchronisation is partly achieved through the use of a counter that repeats itself every NUM_BAUD_CLOCK_TICKS clock cycles. A trigger value is supplied to the core to indicate the start of a synchronisation period. A schematic notation implementation of the module is shown in Figure 5 below.

At this stage two important concepts should have been noticed: modularity and reuse. Although the baud rate generator is used in slightly different ways by the UART receiver and the UART transmitter, the module has been implemented in a GENERIC way. Modularity means that this block only needs to be designed, coded and tested once. The only value that differs between the two uses is the trigger value, which allows different trigger flags to be generated depending on whether we are transmitting or receiving.


Figure 5 : Baud Rate Generator Schematic Diagram. The clear input, CLR, is assumed to be synchronous.
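A cycle-based Python model of such a counter-plus-comparator is sketched below. It is not the tutorial's baud_rate_gen entity: the class and method names, the enable input and the TX_TRIGGER_VALUE of 1 are assumptions for illustration. With a 20 ns clock period (50 MHz) and 434 ticks per bit, the model asserts its trigger flag once every 8680 ns.

```python
class BaudRateGen:
    """Free-running counter with a trigger comparator (behavioural sketch)."""

    def __init__(self, num_baud_clock_ticks, trigger_value):
        self.num_ticks = num_baud_clock_ticks
        self.trigger_value = trigger_value
        self.count = 0

    def clock(self, enable=1, clr=0):
        """Advance one rising clock edge and return the trigger flag (0 or 1)."""
        if clr:
            self.count = 0      # synchronous clear, as assumed in Figure 5
            return 0
        if not enable:
            return 0
        flag = 1 if self.count == self.trigger_value else 0
        # Wrap after NUM_BAUD_CLOCK_TICKS counts so the sequence repeats
        self.count = 0 if self.count == self.num_ticks - 1 else self.count + 1
        return flag


CLOCK_PERIOD_NS = 20        # 50 MHz system clock
NUM_BAUD_CLOCK_TICKS = 434
TX_TRIGGER_VALUE = 1        # hypothetical trigger value

gen = BaudRateGen(NUM_BAUD_CLOCK_TICKS, TX_TRIGGER_VALUE)
trigger_ticks = [t for t in range(3 * NUM_BAUD_CLOCK_TICKS) if gen.clock()]
periods_ns = [(b - a) * CLOCK_PERIOD_NS
              for a, b in zip(trigger_ticks, trigger_ticks[1:])]
```

Changing only the trigger value moves the flag within the bit period without altering its repetition rate, which is what lets the same block serve both the transmitter and the receiver.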

When the schematic level diagram has been coded in your favourite Hardware Description Language (HDL), it should be tested using a compatible simulation technique. The code for the Baud Rate Generator entity, baud_rate_gen, has been written for this tutorial  in VHSIC hardware description language (VHDL).  A testbench, an extract of which can be seen in Figure 6 below, has also been written in VHDL for when the baud rate generator is used in transmission mode.


Figure 6 : An extract from the testbench. Declaring the constants NUM_BAUD_CLK_TCKS and TX_TRIGGER_VALUE as integers adds clarity to the design. Note also the conversion from integer to std_logic_vector when the numeric_std library package is used. std_logic_vector(to_unsigned(NUM_BAUD_CLK_TCKS, WIDTH_BAUD_GEN)) where WIDTH_BAUD_GEN is the data bus width of the baud rate generator.

The testbench results are presented in Figure 7. In the figure it can be seen that when the number of baud clock ticks is set to 434 a trigger flag, trg_flg_O, is asserted every 8680 ns, which equates to a frequency of 115207.37 Hz. Hence, the error in the transmission rate is 0.006397% as expected and probably well within the tolerance range of most RS232 transceivers.


Figure 7 : Baud Rate Generator Testbench Simulation Results.

The UART TX

Previously, it was explained that the operation of the UART should be full-duplex, allowing the UART's receiver and transmitter to be considered independently. A description of the UART's transmitter is provided in this section. One implementation of a UART transmitter can be seen in Figure 8 below. It consists of a baud rate generator, the functionality of which has been described previously, a parallel-in-serial-out (PISO) shift register, an Algorithmic State Machine (ASM)(ref) and an optional output register.

The shift register is loaded, when the load signal is asserted, with the start bit, the eight data bits to be transmitted and the stop bit. As the protocol dictates that the serial stream should contain no parity bits, none are input into the shift register. However, for completeness we could allow the addition of a variable number of parity bits.

Question: Why should we avoid hard-coding the start bit, START_BIT,  to '0' and the stop bit, STOP_BIT, to '1'? [Hint: RS232 polarity.]

The algorithmic state machine consists of state and conditional variables and is used to determine when 10 bits of data have been shifted out of the shift register. It is enabled when data is loaded, that is, when the load signal is asserted high.


Figure 8 : The UART Transmitter. The primary building blocks are a baud rate generator, a shift register, an algorithmic state machine and an optional output register.

The result of simulating a VHDL coded UART transmitter can be seen in Figure 9 below. When 8 bits of data are loaded into the transmitter, each bit is transmitted on the assertion of the trigger flag, trg_flg_O. First a start bit is transmitted, followed by the data bits and finally a stop bit. Note that the Least Significant Bit (LSB) is transmitted immediately after the start bit.
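The bit ordering described here can be illustrated with a small Python model of the shift register and bit counter. It is a sketch under assumptions rather than the tutorial's VHDL: the class and method names are invented, the line polarity is taken as non-inverted, and the busy flag is assumed to drop on the first trigger after the stop bit has been sent.

```python
START_BIT, STOP_BIT = 0, 1   # assumed non-inverted line polarity


def build_frame(byte):
    """Bits loaded into the PISO register: start, 8 data bits LSB first, stop."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [START_BIT] + data_bits + [STOP_BIT]


class UartTx:
    """Shift one bit onto the line for every asserted trigger flag."""

    def __init__(self):
        self.bits = []
        self.tx = 1          # IDLE level
        self.busy = 0

    def load(self, byte):
        self.bits = build_frame(byte)
        self.busy = 1

    def on_trigger(self):
        """Called once per trigger flag; returns the level driven on TX."""
        if self.bits:
            self.tx = self.bits.pop(0)
        else:
            self.tx, self.busy = 1, 0   # frame finished: back to IDLE
        return self.tx


tx = UartTx()
tx.load(0x41)                               # ASCII 'A'
line = [tx.on_trigger() for _ in range(10)]
busy_after_frame = tx.busy
tx.on_trigger()
busy_after_idle = tx.busy
```

For 0x41 the line carries 0, then 1 0 0 0 0 0 1 0 (the data bits, LSB first), then 1.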


Figure 9 : The testbench results when simulating the UART TX.

The UART RX

The UART receiver, shown in Figure 10 below, again utilises our previously coded and tested baud rate generator. However, this time the trigger value has been set to be the optimum number of clock ticks, OPTIMUM_NUM_CLK_TICKS, as described previously. The module kicks into life when the received serial stream transitions from the high state, IDLE, to the low state, START. The detection of this transition enables the algorithmic state machine. The baud rate generator is enabled at this point too.

Question: What would happen if the incoming serial stream begins with a transition from low to high? How could we modify our UART to accommodate both types of transition?

The algorithmic state machine consists of state and conditional variables, which are used to determine when 10 bits of data have been received. Each received data bit is shifted into an 8-bit serial-in-parallel-out (SIPO) register, while the start and stop bits are ignored; in other words, it is a basic deserialiser. The assertion of the ready flag is an indication to the next process in the pipeline that data has been received.
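The receiver's behaviour can be approximated with a tick-by-tick Python model that combines the edge detector, the baud counter, the bit counter and the SIPO register. It is a simplified sketch: the names are illustrative, OPTIMUM_NUM_CLK_TICKS is assumed to be half a bit period, and the input synchronising registers mentioned in the Figure 10 caption are not modelled.

```python
NUM_BAUD_CLK_TICKS = 434
OPTIMUM_NUM_CLK_TICKS = NUM_BAUD_CLK_TICKS // 2   # assumed mid-bit trigger value


class UartRx:
    def __init__(self):
        self.prev = 1          # previous line sample (IDLE is high)
        self.active = False    # state machine enabled?
        self.count = 0         # baud rate generator counter
        self.bit_index = 0     # bits sampled so far in the current frame
        self.sipo = 0          # 8-bit serial-in-parallel-out register
        self.data = None       # last byte received
        self.ready = 0

    def clock(self, rx):
        """Advance one 50 MHz clock tick with the current RX line level."""
        self.ready = 0
        if not self.active:
            if self.prev == 1 and rx == 0:       # high-to-low edge: START
                self.active, self.count, self.bit_index = True, 0, 0
        else:
            if self.count == OPTIMUM_NUM_CLK_TICKS:
                if 1 <= self.bit_index <= 8:     # data bits only, LSB first
                    self.sipo = (self.sipo >> 1) | (rx << 7)
                self.bit_index += 1
                if self.bit_index == 10:         # stop bit has been sampled
                    self.data, self.ready, self.active = self.sipo, 1, False
            self.count = (self.count + 1) % NUM_BAUD_CLK_TICKS
        self.prev = rx
        return self.ready


def waveform(byte, idle=50):
    """Line levels, one per clock tick, for a single frame with idle padding."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [1] * idle + [b for b in bits for _ in range(NUM_BAUD_CLK_TICKS)] + [1] * idle


rx = UartRx()
received = [rx.data for level in waveform(0x41) + waveform(0xA5) if rx.clock(level)]
```

Shifting right and inserting each new bit at the top of the register means that, after eight data bits, the first bit received ends up as the LSB.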


Figure 10 : The UART Receiver consists of a baud rate generator, a high-to-low edge detection unit, an algorithmic state machine and a deserialiser. [N.B. Four D-type registers were used in the tested version - B.P]

The testbench simulation results in Figure 11 show that the UART receiver is correctly capturing data. The data ready flag, rdy_tb, is asserted high on the successful recovery of each byte of data.


Figure 11: The testbench results when simulating the UART RX.

At this stage we have developed UART RX and TX modules that have been coded and tested, at least at the simulation level. Here is a summary of what has been achieved so far. A UART baud rate generator has been developed to synchronise with 115.2 kBaud (and potentially other baud rates) RS232 data frames. The baud rate generator has been embedded in two independent sub-modules. The modules, a UART receiver and a UART transmitter, allow the reception and transmission of RS232 frames, respectively. These two modules, when combined, form the UART shown in Figure 12.


Figure 12: A 115.2K Baud UART Core: What needs to be done to the core to make it more flexible and accept data frames at other baud rates?

The UART module has external pins that are connected to a clock and a reset switch. External pins are also connected to the RS232 RX and TX pins. The other connections to the UART are internal signals, which are used to load the transceiver with transmission data and to indicate that the transmitter is busy. The internal signals also serve the purpose of indicating when received data is ready for processing.

1.1.4 Real World UART Loopback Test

Before we plough on with the rest of the design we could pause here and test the work we have done so far on real hardware. This can be done by looping back the UART receiver's output to the transmitter's input, as can be seen in Figure 13. If all goes well, the ready flag should be asserted when data is received. Since the ready signal is looped back to the load signal, the data received is instantly echoed back through the transmitter. The busy signal is connected to the LED, through the yet-to-be-coded timer module, as there is not much else to do with it.


Figure 13: The UART, consisting of the UART TX and UART RX modules, is placed in loopback mode ready for synthesizing and running on the DE0-Nano Development Board.

This real world exercise will not only prepare us for the rigour of setting up the hardware to test the whole of the design, it will also force us not to lose sight of the "bigger picture". At this stage of the design process we could start thinking about the application software. For example, what programming language should we use to develop the application software? For this loopback test we could use a terminal emulator, like PuTTY. However, we should also start pencilling in our own software design. What should our Graphical User Interface (GUI) look like, for instance?

 Table 2 : FPGA Configuration setup.

FPGA Pin No.    Voltage Level    Description
                                 An active low reset pin.
                                 The 50MHz system clock.
                                 The FPGA UART's receiver pin.
                                 The FPGA UART's transmitter pin.
                                 A Light Emitting Diode (LED).

The design could be configured, using Altera's Quartus II software (notes), to use the FPGA pins that are listed in Table 2 above and detailed in Figure 14 below. This figure shows the DE0-Nano development and educational board (review) connected to a USB 2.0 to TTL UART serial converter module (review). Both of the boards were powered using USB ports on the same computer.


Figure 14: The DE0 Nano Serial Communications Protocol "Real World" Test Environment. Jumpers will be needed to cross-connect the UARTs of the two devices as shown. Also, it is advisable to connect the ground pins together, even though both devices are connected to USB ports on the same computer. [N.B. The loopback finally worked, with the USB-UART bridge shown here, when the connection between the UARTs was not cross-connected, i.e. connect TXD -> TX and RXD <- RX]

The UART loopback test project was set up in Quartus II and the FPGA pins, listed in Table 2, were assigned using Pin Planner. The timing constraints for the 50 MHz clock were set using TimeQuest, the timing analyser tool. A summary of the design flow can be seen in Figure 15 below.


Figure 15: The results of the compilation can be seen in this figure. Even without optimizing the design, it can be seen that the design's footprint is minimal, compared to the capacity of the EP4CE22F17C6 anyway.

The UART loopback test results are presented, somewhat unscientifically, in Figure 16. The test consisted of simply pressing an alphanumeric key on the keyboard and monitoring the echoed response. It turned out to be quite a tedious exercise and highlights the importance of writing software to complement the hardware early on in the design process. If detailed analysis is to be carried out on the design, and bugs are to be found if and when they occur, software written to our own specifications will be mandatory. Not only can the software be used to write to the control register and read from the status registers, it can also be used as a test harness.

It is yet to be decided whether to write the software in Java or in C++ using the Qt framework. Although C++ is, by far, the more popular and more comfortable choice, programming in Java will add an extra challenge to the tutorial. Also, it will provide an opportunity to dust off our cobweb-covered Java books and use them to write a "real" program.
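Whatever language is eventually chosen, the key-pressing test could be automated along the following lines. The sketch is in Python purely for brevity and is not the planned application. It assumes the third-party pyserial package for the real port (imported only when a real port is opened), and the function names, payload and FakeEchoPort stand-in are all hypothetical.

```python
def loopback_test(port, payload=b"DE0-Nano"):
    """Send each byte and check that the same byte is echoed back."""
    failures = []
    for value in payload:
        port.write(bytes([value]))
        echoed = port.read(1)
        if echoed != bytes([value]):
            failures.append((value, echoed))
    return failures


def open_serial(device):
    """Open a virtual COM port at 115200 baud, 8N1 (requires pyserial)."""
    import serial  # third-party package: pyserial
    return serial.Serial(device, baudrate=115200,
                         bytesize=serial.EIGHTBITS,
                         parity=serial.PARITY_NONE,
                         stopbits=serial.STOPBITS_ONE,
                         timeout=1.0)


class FakeEchoPort:
    """Stand-in for the board so the harness can be tried without hardware."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, data):
        self.buffer += data
        return len(data)

    def read(self, size=1):
        chunk = bytes(self.buffer[:size])
        del self.buffer[:size]
        return chunk


failures = loopback_test(FakeEchoPort())
print("loopback OK" if not failures else f"mismatches: {failures}")
```

With the board attached, the fake port would be replaced by something like open_serial("COM3") or open_serial("/dev/ttyUSB0"), where the device name depends on how the operating system enumerates the USB-to-UART bridge.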


 Figure 16: TThhee llooooppbbaacckk tteesstt iinn aa PPuuTTTTYY sseessssiioonn!

1.1.5 Aftermath

Well, I thought that I might be able to complete this tutorial in a single article. However, to do it justice and give it the level of detail that it deserves, it has been decided to split it into parts. The next part in this tutorial series will pay attention to the software and continue with the protocol layer of the design.
