Assembly Language Programming

This page provides a brief tutorial on assembly language programming. The textbook summarizes assembly language programming but does not provide enough detail for you to write complete assembly language programs. This tutorial supplies that detail, along with links to resources, to help you complete the laboratory exercises. It targets the RARS assembler.

Overview

The purpose of assembly language is to provide a more convenient way to generate the binary machine language instructions executed by the processor. While humans could write the raw binary machine code directly, it is more convenient for us to work with symbols (words and numbers) than with binary numbers. Assembly language is more readable than binary machine code and is easier to edit and manipulate.

The purpose of the “assembler” is to translate the text assembly language file written by a human into the binary machine code executed by the processor (see Section 2.12 in the textbook). This translation is tedious and best left to a computer. There are a number of different assemblers available for the RISC-V processor, and the syntax of the assembly language you write may vary a bit from assembler to assembler. This tutorial will help you write proper assembly language code for the RARS assembler.

Assembly language programs are text files with instructions, assembly language directives, labels, and comments. These files usually have an extension of “.s” or “.asm” to distinguish them from C or other higher level languages. You will be creating a number of assembly language programs over the semester and will become familiar with the syntax of RISC-V assembly.

Comments

Like most programming languages, assembly language allows you to include comments. Although the textbook uses the traditional ‘C’ double-slash comment (//) in its assembly language examples, the RARS assembler uses the pound sign (#) for comments:

# This is a comment at the start of the line

Because assembly language is so terse, comments are essential and you should use them freely throughout your code. A comment can start at the beginning of a line or follow an instruction on the same line:

    addi sp, sp, -32  # This is a comment after an instruction

See the Assembly language coding standards for requirements for comments in assembly language files.

Labels

Assembly language programs frequently refer to memory locations within the program. “Labels” are frequently used to refer to specific locations within the program or data. A label is specified by providing a unique name followed by a colon as follows: <label_name>:. Labels are convenient because they allow you to refer to other memory addresses without knowing the actual location of that address. You will use labels to specify specific locations within your assembly language program and for specific data elements. Labels should be left justified in your file. Here are some example labels.

main:
     # The instruction after this label can be referenced by the 'main' label
data1:
     # Any data declared here can be referenced by the 'data1' label

Labels are required when you use control flow instructions such as branches and jumps. You should place a label before the instruction used as a branch target or a jump target and then refer to the label in the branch or jump instruction.
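
As a minimal sketch of this pattern, the loop below counts a register down to zero. The label names (count_down, loop, done), the register choice, and the starting value are illustrative assumptions and are not taken from any lab.

```asm
count_down:
	addi t0, x0, 3		# t0 = 3 (illustrative starting count)
loop:
	addi t0, t0, -1		# Decrement the counter
	bne t0, x0, loop	# Branch back to the 'loop' label while t0 != 0
done:
	ebreak			# Fall through here once t0 reaches 0
```

The branch instruction names only the label; the assembler works out the actual address offset for you.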

Instructions

The most common entries in an assembly language program are the instructions that are executed by the processor. Instructions are entered in an assembly language file one per line (we don’t put more than one instruction on a line). The instructions are written in sequential order: they are executed one at a time in the order they appear in the file, unless a branch or jump transfers control elsewhere. The native instructions supported by RISC-V are summarized in Section 2.2 of the textbook.

The general template for an instruction is the mnemonic for the instruction followed by a comma-separated list of operands used by the instruction. The instruction mnemonic must match an existing instruction and the operands given to the instruction must match the instruction format of the given RISC-V instruction. There are a variety of instruction formats so not all instructions have the same number of operands or same type of operands.

A tab or fixed number of spaces is usually provided before an instruction to separate it from the labels. The following example demonstrates a few RISC-V instructions.

sample_code:
     xor x17,x15,x16
     jal testcode
     sw s1,4(sp)

Pseudo (extended) Instructions

Assemblers sometimes allow the programmer to use additional “instructions” that are not natively supported by the processor. These instructions are often called “pseudo” or “extended” instructions. They make the meaning of the code clearer or provide a concise way of implementing a common operation (see pages 132-133 of the textbook). When the assembler encounters a pseudo instruction, it replaces it with one or more native instructions that perform the intended function. The table below demonstrates a few common pseudo instructions.

Pseudo Instruction     Example         Actual Instruction(s)
li (load immediate)    li x12, 2       addi x12, x0, 2
nop (no operation)     nop             addi x0, x0, 0
lw (load word)         lw x15, aa      auipc x15, 0xfc10
                                       lw x15, 0x18(x15)
mv (move register)     mv x5, x10      add x5, x0, x10

Like native instructions, pseudo instructions each have their own instruction format.

Refer to the back page of the green card and the RARS assembler help menu for details about the various pseudo instructions. Sometimes you use these pseudo instructions so often that you forget that they are not actually real processor instructions.
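
To illustrate how one pseudo instruction can become more than one native instruction, here is a hedged sketch using li with a constant that does not fit in a 12-bit immediate. The constant 0x12345678 is an arbitrary example, and the exact expansion an assembler emits may differ from the sequence shown.

```asm
	li x12, 0x12345678	# Pseudo instruction: constant is too large for addi alone

	# One equivalent native sequence:
	lui x12, 0x12345	# x12 = 0x12345000 (load the upper 20 bits)
	addi x12, x12, 0x678	# x12 = 0x12345678 (add the lower 12 bits)
```

You can check the expansion the assembler actually chose by viewing the assembled code in the RARS execute window.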

Instruction Operands

Most instructions require operands to customize the operation of the instruction. Operands for RISC-V are summarized in Section 2.3 in the textbook. The operands used for instructions include register specifiers and constants. There are a number of ways to specify the registers and constants.

Registers

Registers can be specified in one of two ways: by register number or by register “name”. A register number is written as the character ‘x’ followed by the number. For example, ‘x13’ refers to register 13. Each of the registers also has a name (see the register name, use, and calling convention table on the green card). For example, the register named “a3” refers to register “x13”. The following example demonstrates both approaches for specifying register operands.

    ori x13, x12, 0xFC  # Use of register numbers
    ori a3, a2, 0xFC    # Use of register names

Constants

Constants are often used in assembly language and there are a number of ways to specify constants. Constants can be named using the .eqv assembler directive. Labels are also used frequently as constants (i.e., the address the label refers to). Constants can also be used directly in the code. The following example demonstrates several uses of constants:

.eqv MY_NUMBER 10	# Constant

    addi x13, x12, MY_NUMBER   # Use of named constant
    ori a3, a2, 0xFC           # Use of Hex constant
    addi a3, a2, -10           # Use of Decimal constant
    bne a0, t0, branch_target  # Use of a label as a constant
    ...
    
branch_target:

Assembler Directives

Another important component of assembly language programs is the assembler “directive”. Directives are instructions to the assembler that indicate how it should interpret the assembly text. All assembler directives start with a period (.) followed by a keyword.

Directive              Purpose
.globl <label>         Declare the given label as global and accessible to other files
.text                  Indicates instruction or program memory space
.data                  Indicates global data space. Used to store/allocate data
.word <data>           Store the given data element as a 32-bit value on a word boundary
.eqv <NAME> <VALUE>    Define a named constant. Example: .eqv SYS_PRINT_STRING 4
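
The short skeleton below sketches how these directives typically fit together in one file. The names INITIAL_VALUE and counter and the value 7 are hypothetical and chosen only for illustration.

```asm
.globl main			# Make the 'main' label visible to other files
.eqv INITIAL_VALUE 7		# Hypothetical named constant

.data				# Everything below is placed in the data space
counter:
	.word INITIAL_VALUE	# One 32-bit word, initialized to 7

.text				# Everything below is placed in the program space
main:
	lw t0, counter		# Load the word stored at the 'counter' label
	addi t0, t0, 1		# t0 = 8
	ebreak			# Return control to the debugging environment
```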

System Calls

The RISC-V processor has two special ‘privileged’ instructions for managing system-level interaction of the processor: ecall and ebreak. The ecall instruction issues a system call to the supporting execution environment. You will use ecall in your assembly language programs to receive input from and send output to the outside I/O (such as receiving characters and sending characters to the screen). More details on how to use ecall and access system calls will be described in another document.

The ebreak instruction is used by debuggers to cause control to be transferred back to a debugging environment. You will use the ebreak instruction to terminate your assembly language program.
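
As a rough sketch of how the two instructions are used together, the fragment below prints a string and then stops. It assumes the usual RARS convention that register a7 holds the system call number and a0 holds the argument. The service number 4 comes from the .eqv example in the directive table, while the label greeting and the string contents are hypothetical. Consult the RARS help menu for the authoritative list of system calls.

```asm
.eqv SYS_PRINT_STRING 4		# Service number for printing a string

.data
greeting:
	.string "Hello\n"	# Hypothetical null-terminated string

.text
main:
	li a7, SYS_PRINT_STRING	# a7 selects which system call to issue
	la a0, greeting		# a0 holds the address of the string to print
	ecall			# Ask the execution environment to do the work
	ebreak			# Terminate by returning to the debugging environment
```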

Sample Program

The following example demonstrates an assembly language program that sums the data in the data segment and computes the average of the items in the data segment.

# Export this function "main" so that it is available globally
# and useable by other functions/code.
.globl  main

# Define a constant that can be used to avoid magic numbers
.eqv NUMBER_OF_WORDS 10

# This directive defines the data segment. By default, the
# data segment starts at 0x10000000
.data

# Allocate 10 words in the data segment to average.
data_to_average:
	.word 15      # Decimal number
	.word 20
	.word 0x3f    # Here is a hex example
	.word 50
	.word -35
	.word 74
	.word 42
	.word 82
	.word 41
	.word 63

# Indicate the number of words in our previous array
number_of_words:
	.word NUMBER_OF_WORDS

# This directive defines the "text" or program code segment.
# By default the .text segment starts at address 0x00400000.
.text

# define a label "main" for the start of the program
# This program will sum the values in "data_to_average" and then
# divide by the number of items to calculate the average.
main:
	la t0, data_to_average	# Use t0 as the pointer to the data. Initialize to 'data_to_average'
	# The 'la' pseudo instruction is used to load an address into a register.
	li t1, 0		# Use t1 as the sum
	# The 'li' pseudo instruction is used to load an immediate value into a register.
	li t2, 0		# Index counter
	lw t3, number_of_words	# Use t3 as the terminal count condition
	
add_item:
	lw t4, (t0)		# Read current data item into t4
	add t1, t1, t4		# Add to the sum
	addi t0, t0, 4		# Increment the data pointer
	addi t2, t2, 1		# Increment index
	bne t3, t2, add_item	# Go back and add more if not done
	
	# Fall through: done with summation
divide:
	div t5, t1, t3		# Divide sum by number of words
	mv a0, t5		# Put result in return value register (mv destination, source)
	ebreak			# Terminate the program

Last Modified: 2024-07-01 00:08:02 +0000