tpasm Manual (October 24, 2002)

tpasm began as a replacement for mpasm (an assembler for Microchip's
PIC processors). Then it got out of control.

Now it is a cross assembler for a variety of common microprocessors
(including the PICs).

It was written because mpasm only runs on one platform -- one
which I have difficulty using.

tpasm's feature set and syntax are a conglomeration of features from
many other assemblers. It is similar enough to mpasm that
porting mpasm source to it should not be very painful.

tpasm Features:
----- --------
- true multi-pass assembly (will take as many passes as needed)
- multiple segments
- sophisticated expressions
- macros, repeats, conditionals
- arbitrary length labels
- local labels
- supporting new processors is reasonably straightforward
- can switch between processors during assembly
- tpasm is free software


Command Line:
------- ----
Usage: tpasm [opts] sourceFile [opts]
Assembly Options:

   -o type fileName  Output 'type' data to fileName
                     Multiple -o options are allowed, all will be processed
                     No output is generated by default
   -I dir            Append dir to the list of directories searched by include
   -d label value    Define a label to the given value
   -P processor      Choose initial processor to assemble for
   -n passes         Set maximum number of passes (default = 32)
   -l listName       Create listing to listName
   -w                Do not report warnings
   -p                Print diagnostic messages to stderr


Information Options:

   -show_procs       Dump the supported processor list
   -show_types       Dump the output file types list


Options are case sensitive and are detailed below:

-o Selects the type and file name for tpasm's output. Multiple -o
   options can be specified to generate multiple output files.
   The current list of available output types is:

    intel               Intel format segment dump
    srec                Motorola S-Record format segment dump (16 bit)
    srec32              Motorola S-Record format segment dump (32 bit)
    sunplus             Sunplus format symbol listing
    text                Textual symbol file listing

   As you can see from the above, -o is used to specify output hex files
   as well as symbol files. Any combination is allowed.

   NOTE: if no -o options are specified, tpasm will assemble, but
   produce no output.

-I Adds a directory to the list that the INCLUDE pseudo-op searches for
   include files whose names are given in angle brackets.
       INCLUDE <includeFile>    // search for it
       INCLUDE "includeFile"    // do not search for it

-d Defines a label which is used in the assembly just as a label
   created by the EQU directive.

-P Tells tpasm which processor it should use by default. This is not
   required, as it can also be specified in the source with the
   PROCESSOR directive.

-n Is used under unusual circumstances to alter the number of passes
   that tpasm will make before deciding that the source code cannot be
   resolved. Normally this need not be specified.

-l Selects the name of the file where tpasm will generate a listing.
   If no listing file is specified, tpasm will not generate a listing.

-w Causes tpasm to stop reporting warnings. Warnings will also
   be suppressed in the listing output if one is generated.

-p Is used mainly to see what the internals of the assembler are doing
   as a debugging aid. If your source code is not resolving in a reasonable
   number of passes, this can be used to help pinpoint the labels which
   are not resolving.

-show_procs
   Causes tpasm to dump the list of supported processors.
   If this option is present on the command line, tpasm will not attempt
   to assemble anything.

-show_types
   Causes tpasm to dump the list of supported output file types.
   If this option is present on the command line, tpasm will not attempt
   to assemble anything.


Assembly Syntax:
-------- ------
Lines of assembly source files all have a similar syntax. Namely:
[label[:]]   [opcode [operands]]   [;comments]

The comment field can be introduced either with a ';' or a '//'.
Comments may appear on lines by themselves, and blank lines
are allowed.
Labels are case sensitive, and may be of arbitrary length.
Label definitions must begin in column 0.

Opcodes (including macros) are always matched in a case insensitive
way, and must be preceded by white space.

Operands must be separated from opcodes by white space.

Comments do not need to be preceded by white space.

Some pseudo-ops do not allow labels.


Labels:
------
tpasm labels consist of one of the characters from the set: A-Z, a-z, or _
followed by any number of characters from the set: A-Z, a-z, _, or 0-9

Label definitions may end with a colon, but this is not necessary.

Local labels are preceded by a '.' or an '@'.

Local labels beginning with '.' are in scope between non-local labels.

Local labels beginning with '@' are in scope between non-local labels
or the edges of the macros that contain them.
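
The naming rules above can be written as a regular expression. The
following Python sketch is illustrative only and is not part of tpasm.
The helper name is_valid_label is hypothetical, and the sketch assumes
that the text after a '.' or '@' prefix follows the same rules as a
non-local label.

```python
import re

# One optional local prefix ('.' or '@'), then a letter or underscore,
# then any number of letters, digits, or underscores.
# Assumption: the part after a local prefix obeys the non-local rules.
LABEL_RE = re.compile(r"[.@]?[A-Za-z_][A-Za-z0-9_]*")


def is_valid_label(text: str) -> bool:
    """Return True if text looks like a label; one trailing ':' is allowed."""
    if text.endswith(":"):
        text = text[:-1]
    return LABEL_RE.fullmatch(text) is not None
```

For instance, is_valid_label("@skip") is true under these assumptions,
while is_valid_label("1abc") is false because a label may not begin
with a digit.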


Pseudo-Ops:
------ ---
tpasm supports a standard set of pseudo-ops (ones which are available
no matter which processor is selected), and a supplementary set --
based on the chosen processor.

Pseudo-ops are case insensitive.

The standard set consists of:

 INCLUDE "fileNameString"
 INCLUDE <fileNameString>
    Include another source file into the assembly. The inclusion
    of another source file will not cause local labels to go out
    of scope, so it is possible to reference a local label across
    include files.
    NOTE: placing the file name in <>'s will cause the assembler
    to search the include path for it.
    NOTE: includes are nestable to 256 levels deep. This restriction
    is meant to keep self-including source trees from causing
    the assembler's stack to overflow.

 SEG "segmentNameString"
 SEGU "segmentNameString"
    The SEG pseudo-op creates or sets the current segment.

    If segmentNameString was previously created, the assembler just
    sets the segment back to it.
    If it was not previously created, the assembler creates it and
    sets the segment to it.
    Newly created segments have a default origin of 0.

    A segment is nothing more than an addressed area which the assembler
    knows about. When code is generated by the assembler, it is placed
    into the current segment.
    Each segment has an 'origin' which tells the current place data
    is to be written into it. When the assembler switches between
    segments, it keeps track of the origin of each.

    Segments may be 'initialized' (the SEG pseudo-op) in which case
    data written to the segment is placed into the output file, or
    'uninitialized' (the SEGU pseudo-op) in which case the data written
    to the segment is discarded.
    Uninitialized segments can be useful for (among other things)
    assigning RAM locations to labels.

    The segmentNameString is case sensitive.

    At the moment (I may change this later) the assembler automatically
    creates a segment called "code" and sets its origin to 0 upon
    execution.

    An arbitrary number of segments may be created.

 ORG exp
    This sets the origin for the currently selected segment. The origin
    is the location where generated code will be placed within the segment.

 RORG exp
    This sets the origin for code generation of the current segment. This
    origin tells the assembler that code which is being generated should
    be made to _appear_ to start at the location given by exp. This is
    useful if you need to generate code which is meant to be copied before
    it is executed. The generated code is still placed into the segment at
    locations based on the last ORG statement.

 ALIGN exp
    Align the ORG-based origin to the current or next address which
    satisfies: (address mod exp == 0). NOTE: if exp is unresolved, or
    evaluates to 0 or 1, this does nothing.
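
    The rounding that ALIGN performs can be sketched in Python. This is
    an illustration of the rule stated above, not tpasm's own code; the
    function name align is hypothetical, exp is assumed to be
    non-negative, and the unresolved case is not modelled.

```python
def align(address: int, exp: int) -> int:
    """Return address, or the next higher address that is a multiple of exp."""
    if exp in (0, 1):
        # The manual says ALIGN does nothing in these cases.
        return address
    remainder = address % exp
    # Already aligned: stay put. Otherwise move up to the next multiple.
    return address if remainder == 0 else address + (exp - remainder)
```

    Under this sketch an origin of 0x103 with ALIGN 4 moves to 0x104,
    while an origin of 0x100 is left where it is.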

label EQU exp
    Assign label to the result of exp. Once a label has been EQU'd, it may not
    be reassigned to a different value.

label SET exp
    Similar to EQU, except that you may use SET to redefine label to other
    values at any time.

label UNSET
    Remove label from the assembler's name space as if it had never been SET.

label ALIAS replacement
    Assign replacement to label. Operates similarly to EQU except
    replacement is not an expression -- it is a simple text substitution.
    Labels defined with ALIAS are only expanded when they appear as
    processor opcodes or operands. Unlike EQU, an ALIAS must be declared
    before it is used. Since the expansion takes place in the text domain
    before any meaning is applied to the operands by the instruction, be
    careful to use unique enough labels so that tpasm does not replace
    unexpected strings within your code. Label must only consist of
    characters which are valid for labels. If replacement is not quoted,
    it too must contain only characters which are valid for labels.
    However, if replacement is placed in double quotes, it may contain any
    character.
    ALIAS replacements are not recursive.

label UNALIAS
    Removes a label that was defined with ALIAS.

label MACRO param1,param2,...
    Begin the definition of an assembler macro. Label becomes the name of
    the macro being defined.
    When the macro is invoked, the string param1 is replaced with the
    first macro argument, param2 with the second, and so on.
    There can be an arbitrary number of parameters.
    Example:
    test    MACRO    var1,var2
            ADD      var1,var2
            ENDM
    The opcode "test"
            test     A,$14
    then expands as:
            ADD      A,$14

    NOTE: when a macro is expanded, a new local label scope is created so
    that macros can contain local labels beginning with '@' which do not
    interfere with the code surrounding the invocation.
    Also, macros are recursive. It is possible to invoke, or even
    define another macro from within a macro.

 ENDM
    Marks the end of a macro definition.

 IF exp
    Used for conditional assembly. Code between the IF and the first
    matching ELSE or ENDIF will be interpreted by the assembler if exp is
    resolved, and non-zero. NOTE: if exp is not resolved, neither the code
    following the IF nor any code given by an associated ELSE will be
    interpreted.

 IFDEF label
    Used for conditional assembly. Code between the IFDEF and ELSE or
    ENDIF will be interpreted by the assembler if 'label' is defined.
    NOTE: as soon as a label is defined on any pass of the assembly (even
    if it is not resolved), subsequent IFDEF invocations for that label
    (in this, and subsequent passes) will see it as defined.

 IFNDEF label
    Used for conditional assembly. Code between the IFNDEF and ELSE or
    ENDIF will be interpreted by the assembler if 'label' is NOT defined.
    NOTE: as soon as a label is defined on any pass of the assembly (even
    if it is not resolved), subsequent IFNDEF invocations for that label
    (in this, and subsequent passes) will see it as defined.

 ELSE
    Used for conditional assembly. Code between the ELSE and ENDIF will
    be interpreted by the assembler if the preceding IF, IFDEF, or IFNDEF
    evaluated to FALSE.

 ENDIF
    Marks the end of a conditional assembly block.
    NOTE: all conditionals are nestable to any level.

 SWITCH exp
    Used for conditional assembly. exp is evaluated, and then compared
    to each of the following CASEs. If exp is resolved, and matches
    a given CASE statement, code between the CASE and either a BREAK,
    or ENDS is interpreted by the assembler.

 CASE exp
    Used between the SWITCH and ENDS pseudo-ops, exp is evaluated and compared
    with the result of the evaluation of the expression given in the SWITCH.
    If both expressions are resolved, and evaluate to the same result, the
    code after the CASE up until a BREAK or ENDS is interpreted by
    the assembler.

 BREAK
    Ends any CASE that preceded it.
    NOTE: it is possible to have multiple CASEs before a BREAK:
        SWITCH  value
        CASE    1
        CASE    2
        MESSAGE "value was 1 or 2"
        BREAK
        CASE    3
        MESSAGE "value was 3"
        CASE    4
        MESSAGE "value was 3 or 4"
        ENDS
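
    The fall-through behaviour of the example above can be modelled in
    Python. This is only a sketch of the rules as described, not tpasm's
    implementation. The function run_switch and the tuple layout are
    hypothetical, and it assumes that nothing after a taken BREAK is
    interpreted (the manual's example does not depend on this, since
    its CASE values are all distinct).

```python
def run_switch(value, body):
    """Return the messages emitted for value.

    body is a list of ("CASE", n), ("MESSAGE", text), or ("BREAK",) tuples.
    """
    messages = []
    active = False  # becomes True once a CASE has matched
    for item in body:
        kind = item[0]
        if kind == "CASE":
            # A CASE can only start interpretation; it never stops it,
            # which is what allows fall-through into the next CASE.
            if item[1] == value:
                active = True
        elif kind == "BREAK":
            if active:
                break  # assumption: a taken BREAK ends the whole SWITCH
        elif kind == "MESSAGE" and active:
            messages.append(item[1])
    return messages


# The SWITCH body from the manual's example, in source order.
EXAMPLE = [
    ("CASE", 1), ("CASE", 2), ("MESSAGE", "value was 1 or 2"), ("BREAK",),
    ("CASE", 3), ("MESSAGE", "value was 3"),
    ("CASE", 4), ("MESSAGE", "value was 3 or 4"),
]
```

    With this model, a value of 3 produces both "value was 3" and
    "value was 3 or 4", because no BREAK separates CASE 3 from CASE 4.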

 ENDS
    Marks the end of a SWITCH.

 REPEAT exp
    Used to duplicate code. REPEAT evaluates exp and interprets
    the code between the REPEAT and ENDR pseudo-ops exp number of times
    (including 0).
    If exp is not resolved, the code after REPEAT is ignored.

    An example (lifted from the dasm manual):
        Y   SET     0
            REPEAT  10
        X   SET     0
            REPEAT  10
            DB      X,Y
        X   SET     X+1
            ENDR
        Y   SET     Y+1
            ENDR

        generates an output table:  0,0 1,0 2,0 ... 9,0  0,1 1,1 2,1
        ... 9,1, etc...
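
    For readers more used to high-level loops, the nested REPEAT example
    corresponds roughly to the Python sketch below. It is an
    illustration only; the function name repeat_table is hypothetical,
    and each DB line is represented by appending an (X, Y) pair.

```python
def repeat_table():
    """Return the (X, Y) pairs in the order the nested REPEATs emit them."""
    pairs = []
    y = 0                          # Y   SET     0
    for _ in range(10):            #     REPEAT  10
        x = 0                      # X   SET     0
        for _ in range(10):        #     REPEAT  10
            pairs.append((x, y))   #     DB      X,Y
            x += 1                 # X   SET     X+1
        y += 1                     # Y   SET     Y+1
    return pairs
```

    The inner REPEAT varies X fastest, so the first ten pairs all have
    Y equal to 0.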

 ENDR
    Marks the end of a REPEAT.

 ERROR "message"
    When interpreted by the assembler, causes "message" to be printed out
    as if an error had occurred in the assembly.

 WARNING "message"
    When interpreted by the assembler, causes "message" to be printed out
    as if a warning had occurred in the assembly.

 MESSG "message"
 MESSAGE "message"
    Causes the assembler to print "message" to the console.
    This message is only printed during the final pass of assembly.

 LIST
    If listing has been enabled by the -l command line option, this
    will enable listing output. (See NOLIST below).

 NOLIST
    If listing has been enabled by the -l command line option, this will
    disable listing output. This is useful for the contents of include
    files which you do not want to appear in the listing output.

 EXPAND
    Allows macros and repeats to be expanded into the listing output.

 NOEXPAND
    Prohibits macros and repeats from generating listing output during
    expansion.

 PROCESSOR processorString
    Tells the assembler to start assembling for the given processor.
    This does not change the current segment or the origin. There can
    be an arbitrary number of PROCESSOR pseudo-ops in the source being
    assembled.

 END
    Tells the assembler to stop assembling the current file,
    or macro.
    If END is seen in an include file, it stops assembly of that
    file only (assembly resumes after the point that the file was
    included).
    If END is seen during macro expansion, it stops expansion of the macro.

Expressions:
-----------
tpasm evaluates all expressions using 32 bit quantities.
The following operators are available:

Unary operators:
.strlen.    Returns the length of the string which follows it.
            For example, .strlen."this" evaluates to 4.
high        Returns the high byte of the low word of the expression
            following it.
low         Returns the low byte of the low word of the expression
            following it.
msw         Returns the high word of the expression following it.
lsw         Returns the low word of the expression following it.
!           Logical not.
~           Bitwise not.
-           Negation.

Unary operators always have the highest precedence.
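
The byte and word selection operators can be described with shifts and
masks. The Python sketch below follows the descriptions above; it is an
illustration rather than tpasm's code, and the function names simply
mirror the operator names.

```python
MASK32 = 0xFFFFFFFF  # expressions are evaluated as 32 bit quantities


def low(x: int) -> int:
    """Low byte of the low word."""
    return x & 0xFF


def high(x: int) -> int:
    """High byte of the low word."""
    return (x >> 8) & 0xFF


def lsw(x: int) -> int:
    """Low word of the 32 bit value."""
    return x & 0xFFFF


def msw(x: int) -> int:
    """High word of the 32 bit value."""
    return ((x & MASK32) >> 16) & 0xFFFF
```

For the value 0x12345678 this gives low = 0x78, high = 0x56,
lsw = 0x5678, and msw = 0x1234.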

Binary operators in descending order of precedence:
(blank lines separate precedence groups)
*           Multiplication.
/           Division.
%           Modulus.

+           Addition.
-           Subtraction.

<<          Left shift.
>>          Right shift.

<           Less than.
>           Greater than.
<=          Less than or equal to.
>=          Greater than or equal to.

==          Tests for equality.
!=          Tests for inequality.

&           Bitwise and.

^           Bitwise xor.

|           Bitwise or.

&&          Logical and.

||          Logical or.

Grouping:
(           Begin group.
)           End group.


Constants:
---------
####        interpreted in base 10
0b###       binary
0o###       octal
0d###       decimal
0x###       hex
A'cccc'     ascii (1 to 4 characters)
B'###'      binary
O'###'      octal
D'###'      decimal
H'###'      hex
###b        binary
###B        binary
###o        octal
###O        octal
###d        decimal
###D        decimal
###h        hex (first digit must be 0-9)
###H        hex (first digit must be 0-9)
%###        binary
.###        decimal
$###        hex
'cccc'      ascii (1 to 4 characters)
"string"    string (not zero terminated)
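
To show how the numeric notations above relate to one another, here is
a small Python sketch that converts them to integers. It is not tpasm's
parser. The helper names are hypothetical, the ascii and string forms
are left out, and the order in which the forms are tried (quoted, then
0-prefixed, then symbol-prefixed, then suffixed) is an assumption.

```python
import re

# Radix selected by each prefix or suffix letter in the table above.
LETTER_RADIX = {"b": 2, "o": 8, "d": 10, "h": 16, "x": 16}
# Radix selected by the single-character prefixes %, ., and $.
SYMBOL_RADIX = {"%": 2, ".": 10, "$": 16}
DIGITS = "0123456789abcdef"


def to_int(digits: str, radix: int) -> int:
    """Convert digits in the given radix, rejecting anything else."""
    allowed = DIGITS[:radix]
    if not digits or any(ch not in allowed for ch in digits.lower()):
        raise ValueError(f"bad radix-{radix} digits: {digits!r}")
    return int(digits, radix)


def parse_constant(text: str) -> int:
    """Return the value of one numeric constant (hypothetical helper)."""
    # Quoted forms: B'###', O'###', D'###', H'###'
    match = re.fullmatch(r"([BODH])'(.*)'", text)
    if match:
        return to_int(match.group(2), LETTER_RADIX[match.group(1).lower()])
    # Prefixed forms: 0b###, 0o###, 0d###, 0x###
    # Assumption: these are tried before the suffixed forms.
    match = re.fullmatch(r"0([bodx])([0-9A-Fa-f]+)", text)
    if match:
        return to_int(match.group(2), LETTER_RADIX[match.group(1)])
    # Single-character prefixes: %###, .###, $###
    if text[:1] in SYMBOL_RADIX:
        return to_int(text[1:], SYMBOL_RADIX[text[:1]])
    # Suffixed forms: ###b, ###o, ###d, ###h (either case). The first
    # character must be a digit, as the table requires for hex.
    suffix = text[-1:].lower()
    if suffix in LETTER_RADIX and suffix != "x" and text[:1].isdigit():
        return to_int(text[:-1], LETTER_RADIX[suffix])
    # Anything else is treated as plain base 10.
    return to_int(text, 10)
```

In this sketch 0xFF, H'ff', 0FFh, and $FF all yield 255, while FFh is
rejected because a suffixed hex constant must begin with a digit.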

Symbols:
-------
$           Current program counter (as of the beginning of the instruction)
            including the effects of RORG. This is the symbol normally used
            to get the current PC.
$$          Current program counter (as of the beginning of the instruction)
            not including the effects of RORG. This is used if you need to know
            the actual PC from within an RORG'd segment. It can also be used to
            cancel the effects of RORG by issuing "RORG $$". This works, since
            the relative origin will now become the absolute origin. If you don't
            use RORG, you'll never need $$.
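
The relationship between $, $$, ORG, and RORG can be pictured as two
counters that advance together. The Python sketch below is a model of
the description above, not tpasm's implementation. The class Segment
and its method names are hypothetical, and the sketch assumes that ORG
resets both counters, which the manual does not state.

```python
class Segment:
    """Tracks where code is stored ($$) and where it appears to be ($)."""

    def __init__(self):
        self.actual = 0    # value of $$: location within the segment
        self.apparent = 0  # value of $: location the code appears to run at

    def org(self, address):
        # Assumption: ORG resets both counters, cancelling an earlier RORG.
        self.actual = address
        self.apparent = address

    def rorg(self, address):
        # RORG changes only the apparent origin; storage is unaffected.
        self.apparent = address

    def emit(self, size):
        # Generated code advances both counters by the same amount.
        self.actual += size
        self.apparent += size
```

In this model, calling rorg(seg.actual) on a Segment named seg plays
the role of "RORG $$": the apparent origin is set back to the actual
one, and the two counters agree again from that point on.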


Additional Pseudo-ops for various processors:
---------- ---------- --- ------- ----------

PIC Family
--- ------
 DB val[,val,val...,val]
    Define byte
    (This description lifted from the MPASM manual.)
    Reserve program memory words with packed 8-bit values. Multiple
    expressions continue to fill bytes consecutively until the end of
    expressions. Should there be an odd number of expressions, the last
    byte will be 0.

 DW val[,val,val...,val]
    Define word
    Reserve program memory words with 16-bit values.

 DATA
    Synonym for DW.

 DT val[,val,val...,val]
    Define table
    Generates a series of RETLW instructions, one for each value.
    Each value must be 8 bits in size. Each character in a string is
    stored in its own RETLW instruction.
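
    The expansion that DT performs can be sketched in Python as a
    function that turns each value, or each character of a string, into
    one RETLW source line. This is an illustration only; the name
    expand_dt is hypothetical, and treating "8 bits in size" as the
    range 0 to 255 is an assumption.

```python
def expand_dt(*values):
    """Return one 'RETLW' source line per value or per string character."""
    lines = []
    for value in values:
        if isinstance(value, str):
            # Each character of a string gets its own RETLW.
            codes = [ord(ch) for ch in value]
        else:
            codes = [value]
        for code in codes:
            # Assumption: an 8 bit value means 0 through 255.
            if not 0 <= code <= 0xFF:
                raise ValueError(f"{code} does not fit in 8 bits")
            lines.append(f"RETLW 0x{code:02X}")
    return lines
```

    Under these assumptions expand_dt("Hi", 0) yields three lines:
    RETLW 0x48, RETLW 0x69, and RETLW 0x00.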

 DS val
    Define space
    Move the PC forward by val.

 __CONFIG val
    Set processor configuration bits.

 __IDLOCS val
    Sets the four ID locations to the hexadecimal digits of val.

 __MAXRAM val
    Define the absolute maximum valid RAM address.

 __BADRAM val[-val][,val[-val]...]
    Set locations which are not valid RAM.

 BANKSEL
 BANKISEL
 PAGESEL
