This page provides a brief tutorial on assembly language programming. The textbook summarizes assembly language programming but does not provide enough detail for you to write complete assembly language programs. This tutorial provides that detail, along with links to resources, to help you complete the laboratory exercises. It targets the RARS assembler.
Overview
The purpose of assembly language is to provide a more convenient way of generating the binary machine language instructions executed by the processor. While humans could write the raw binary machine code, it is more convenient to work with symbols (words and numbers) than with binary numbers. Assembly language is more readable than binary machine code and is easier to edit and manipulate.
The purpose of the “assembler” is to translate the text assembly language file written by a human into the binary machine code executed by the processor (see section 2.12 in the textbook). This process is tedious and best left to a computer. A number of different assemblers are available for the RISC-V processor, and the syntax of the assembly language you write may vary a bit from assembler to assembler. This tutorial will help you write proper assembly language code for the RARS assembler.
Assembly language programs are text files with instructions, assembly language directives, labels, and comments. These files usually have an extension of “.s” or “.asm” to distinguish them from C or other higher level languages. You will be creating a number of assembly language programs over the semester and will become familiar with the syntax of RISC-V assembly.
Comments
Like all programming languages, assembly language allows you to include comments. Although the textbook uses the traditional ‘C’ double-slash comment (//) in its assembly language examples, the RARS assembler uses the pound sign (#) for comments:
# This is a comment at the start of the line
Because assembly language is so terse, comments are essential and you should use them freely throughout the code. A comment can start at the beginning of a line or follow an instruction at the end of a line:
addi sp, sp, -32 # This is a comment after an instruction
See the Assembly language coding standards for requirements for comments in assembly language files.
Labels
Assembly language programs frequently refer to memory locations within the program.
“Labels” are frequently used to refer to specific locations within the program or data.
A label is declared by writing a unique name followed by a colon (<label_name>:).
Labels are convenient because they allow you to refer to other memory addresses without knowing the actual location of that address.
You will use labels to specify specific locations within your assembly language program and for specific data elements.
Labels should be left justified in your file. Here are some example labels.
main:
# The instruction after this label can be referenced by the 'main' label
data1:
# Any data declared here can be referenced by the 'data1' label
Labels are required when you use control flow instructions such as branches and jumps. You should place a label before the instruction used as a branch target or a jump target and then refer to the label in the branch or jump instruction.
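The following is a minimal sketch of a label used as a branch target and as a jump target. The label names (countdown, finished) and the counter value are made up for illustration and do not come from the laboratory exercises.

```asm
        addi t0, x0, 5          # t0 = loop counter (hypothetical starting value)
countdown:                      # label marks the branch target
        addi t0, t0, -1         # decrement the counter
        bne  t0, x0, countdown  # branch back to 'countdown' while t0 != 0
        jal  x0, finished       # jump forward to a label defined later in the file
        addi t0, t0, 100        # skipped: the jump goes past this instruction
finished:
        ebreak                  # t0 is 0 when execution reaches this point
```

Note that a label may be referenced before it is defined (finished) or after it is defined (countdown); the assembler works out the addresses in either case.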
Instructions
Instructions are the most common entries in an assembly language program. Each instruction is entered on its own line (we don’t put more than one instruction on a line). Instructions are written in sequential order, and the processor executes them one at a time in that order unless a branch or jump redirects execution. The native instructions supported by the RISC-V are summarized in section 2.2 of the textbook.
The general template for an instruction is the mnemonic for the instruction followed by a comma-separated list of operands used by the instruction. The instruction mnemonic must match an existing instruction and the operands given to the instruction must match the instruction format of the given RISC-V instruction. There are a variety of instruction formats so not all instructions have the same number of operands or same type of operands.
A tab or fixed number of spaces is usually provided before an instruction to separate it from the labels. The following example demonstrates a few RISC-V instructions.
sample_code:
xor x17,x15,x16
jal testcode
sw s1,4(sp)
Pseudo (extended) Instructions
Assemblers sometimes allow the programmer to use additional “instructions” that are not natively supported by the processor. These instructions are often called “pseudo” or “extended” instructions. They are made available to make the meaning of the code clearer or to provide a convenient, concise way of implementing a common operation (see pages 132-133 of the textbook). When the assembler encounters a pseudo instruction, it replaces it with one or more native instructions that perform the intended function. The table below demonstrates a few common pseudo instructions.
Pseudo Instruction | Example | Actual Instruction |
---|---|---|
li (load immediate) | li x12, 2 | addi x12,x0,2 |
nop (no operation) | nop | addi x0, x0, 0 |
lw (load word) | lw x15, aa | auipc x15, 0xfc10 then lw x15, 0x18(x15) |
mv (move register) | mv x5, x10 | add x5, x0, x10 |
Like native instructions, pseudo instructions each have their own instruction format.
Refer to the back page of the green card and the RARS assembler help menu for details about the various pseudo instructions. Sometimes you use these pseudo instructions so often that you forget that they are not actually real processor instructions.
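As an illustration of a pseudo instruction that expands into more than one native instruction, consider loading a constant that does not fit in the 12-bit immediate field of addi. The sketch below shows a typical expansion; the constant is arbitrary, and the exact sequence an assembler emits may differ, so check the expansion RARS displays for your own code.

```asm
        li   t0, 0x12345678     # pseudo instruction: constant too large for one addi

        # A typical equivalent sequence of native instructions:
        lui  t0, 0x12345        # load the upper 20 bits: t0 = 0x12345000
        addi t0, t0, 0x678      # add the lower 12 bits:  t0 = 0x12345678
```

Both forms leave the same value in t0. The pseudo instruction simply saves you from splitting the constant by hand.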
Instruction Operands
Most instructions require operands to customize the operation of the instruction (a few, such as nop, take none). Operands for RISC-V are summarized in Section 2.3 in the textbook. The operands used for instructions include register specifiers and constants. There are a number of ways to specify the registers and constants.
Registers
Registers can be specified in one of two ways: by register number or by register “name”. A register number is written as the character ‘x’ followed by the number. For example, ‘x13’ refers to register 13. Each of the registers also has a name (see the register name, use, and calling convention table on the green card). For example, the register named “a3” refers to register “x13”. The following example demonstrates both approaches for specifying register operands.
ori x13, x12, 0xFC # Use of register numbers
ori a3, a2, 0xFC # Use of register names
Constants
Constants are often used in assembly language and there are a number of ways to specify constants.
Constants can be named using the .eqv assembler directive.
Labels are also used frequently as constants (i.e., the address the label refers to).
Constants can also be used directly in the code.
The following example demonstrates several uses of constants:
.eqv MY_NUMBER 10 # Constant
addi x13, x12, MY_NUMBER # Use of named constant
ori a3, a2, 0xFC # Use of Hex constant
addi a3, a2, -10 # Use of Decimal constant
bne a0, t0, branch_target # Use of a label as a constant
...
branch_target:
Assembler Directives
Assembler “directives” are another important component of assembly language programs. Directives are instructions to the assembler that indicate how it should interpret the assembly text. All assembler directives start with a period (.) followed by a keyword.
Directive | Purpose |
---|---|
.globl <label> | Declare the given label as global and accessible to other files |
.text | Indicates instruction or program memory space |
.data | Indicates global data space. Used to store/allocate data |
.word <data> | Store the given data element as a 32-bit value on a word boundary |
.eqv <NAME> <VALUE> | Define a named constant. Example: .eqv SYS_PRINT_STRING 4 |
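The directives in the table can be combined into a small file skeleton. The sketch below is illustrative only: the names INCREMENT and counter and the value 41 are hypothetical.

```asm
.globl main                     # make 'main' visible to other files
.eqv INCREMENT 1                # named constant (hypothetical name)

.data                           # the items that follow go in the data space
counter:
        .word 41                # one 32-bit word with initial value 41

.text                           # the items that follow are instructions
main:
        lw   t0, counter        # pseudo instruction: load the word at 'counter'
        addi t0, t0, INCREMENT  # t0 = 41 + 1 = 42
        ebreak
```

Each directive applies from the line where it appears until another directive changes the setting, so a .data or .text line governs everything that follows it up to the next segment directive.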
System Calls
The RISC-V processor has two special ‘privileged’ instructions for managing system-level interaction of the processor: ecall and ebreak.
The ecall instruction issues a system call to the supporting execution environment.
Your assembly language programs will use the ecall instruction to issue system calls that receive input from and send output to outside I/O (such as receiving characters and sending characters to the screen).
More details on how to use ecall and access system calls will be described in another document.
The ebreak instruction is used by debuggers to transfer control back to a debugging environment.
You will use the ebreak instruction to terminate your assembly language program.
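Until you have read the system call document, the following sketch may help show the general shape of an ecall. It assumes the RARS convention that the service number is placed in a7 and the argument in a0, and that service 4 prints a null-terminated string (matching the SYS_PRINT_STRING example in the directive table above). The .asciz directive and the label message are not covered elsewhere in this tutorial and are used here only for illustration.

```asm
.globl main
.eqv SYS_PRINT_STRING 4         # assumed RARS service number for printing a string

.data
message:
        .asciz "Hello\n"        # null-terminated string (directive not covered above)

.text
main:
        li   a7, SYS_PRINT_STRING   # a7 selects which service to request
        la   a0, message            # a0 holds the argument: address of the string
        ecall                       # ask the execution environment to perform it
        ebreak                      # return control to the simulator/debugger
```

Treat the system call document and the RARS help menu as the authoritative source for service numbers and argument registers.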
Sample Program
The following example demonstrates an assembly language program that sums the data in the data segment and computes the average of the items in the data segment.
# Export this function "main" so that it is available globally
# and useable by other functions/code.
.globl main
# Define a constant that can be used to avoid magic numbers
.eqv NUMBER_OF_WORDS 10
# This directive defines the data segment. By default, the
# data segment starts at 0x10000000
.data
# Allocate 10 words in the data segment to average.
data_to_average:
.word 15 # Decimal number
.word 20
.word 0x3f # Here is a hex example
.word 50
.word -35
.word 74
.word 42
.word 82
.word 41
.word 63
# Indicate the number of words in our previous array
number_of_words:
.word NUMBER_OF_WORDS
# This directive defines the "text" or program code segment.
# By default the .text segment starts at address 0x00400000.
.text
# define a label "main" for the start of the program
# This program will sum the values in "data_to_average" and then
# divide by the number of items to calculate the average.
main:
la t0, data_to_average # Use t0 as the pointer to the data. Initialize to 'data_to_average'
# The 'la' pseudo instruction is used to load an address into a register.
li t1, 0 # Use t1 as the sum
# The 'li' pseudo instruction is used to load an immediate value into a register.
li t2, 0 # Index counter
lw t3, number_of_words # Use t3 as the terminal count condition
add_item:
lw t4, (t0) # Read current data item into t4
add t1, t1, t4 # Add to the sum
addi t0, t0, 4 # Increment the data pointer
addi t2, t2, 1 # Increment index
bne t3, t2, add_item # Go back and add more if not done
# Fall through: done with summation
divide:
div t5, t1, t3 # Divide sum by number of words
mv a0, t5 # Put result in return value register
ebreak
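As a hand-worked check (not part of the original program), the ten words in data_to_average, with 0x3f equal to 63, give the following sum and quotient. The div instruction performs integer division, so the fractional part is discarded.

```latex
\begin{aligned}
\text{sum} &= 15 + 20 + 63 + 50 - 35 + 74 + 42 + 82 + 41 + 63 = 415 \\
\text{average} &= \lfloor 415 / 10 \rfloor = 41
\end{aligned}
```

When stepping through the program in RARS, t1 should therefore hold 415 after the loop, and the div instruction should produce 41.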