Forth compiler for ZX

_dw · Post by **_dw** » Thu Dec 07, 2023 6:21 pm

Hello,

I tried using macros to simulate loops in assembler and finally ended up with a higher language compiler into Z80 assembler. I chose Forth as a language because it is simpler than C and in a sense it is somewhere halfway between C and Asm.

When I started I knew almost nothing about Forth, now I know a little more, but not much, because I still haven't used one and I'm looking at it more from the point of view of how to translate an interpreter into asm. Things like "DOES>" keep surprising me.

But the result is quite good. It's the clearest asm code I've ever seen. When each word or group of words is clearly recognizable.

Forth code:

Code: Select all

: fib2 ( n1 -- n2 )
  0 1 rot 0 ?do 
    over + swap 
  loop 
  drop
;

: fib2_bench ( -- )
  1000 0 do 
    20 0 do
      I fib2 drop
    loop
  loop
;

fib2_bench

macro M4 FORTH code (../fth2m4.sh fib2.fth > fib2.m4):

Code: Select all

include(`../M4/FIRST.M4')dnl
  ifdef __ORG
    org __ORG
  else
    org 32768
  endif
  INIT(60000)


CALL(_fib2_bench)
  STOP

COLON(_fib2,({{{{ n1 -- n2 }}}}))
  PUSH(0) PUSH(1) ROT PUSH(0) QUESTIONDO 
    OVER ADD SWAP 
  LOOP 
  DROP
SEMICOLON

COLON(_fib2_bench,({{{{ -- }}}}))
  PUSH(1000) PUSH(0) DO 
    PUSH(20) PUSH(0) DO
      I CALL(_fib2) DROP
    LOOP
  LOOP
SEMICOLON

Asm code (../compile.sh fib2 32768):

Code: Select all

  ifdef __ORG
    org __ORG
  else
    org 32768
  endif
  



  


       
       
   
  



     
      
        
    
  


;   ===  b e g i n  ===
    ld  [Stop+1], SP    ; 4:20      init   storing the original SP value when the "bye" word is used
    ld    L, 0x1A       ; 2:7       init   Upper screen
    call 0x1605         ; 3:17      init   Open channel
    ld   HL, 0xEA60     ; 3:10      init   Return address stack = 60000
    exx                 ; 1:4       init
    call _fib2_bench    ; 3:17      call ( -- )
Stop:                   ;           stop
    ld   SP, 0x0000     ; 3:10      stop   restoring the original SP value when the "bye" word is used
    ld   HL, 0x2758     ; 3:10      stop
    exx                 ; 1:4       stop
    ret                 ; 1:10      stop
;   =====  e n d  =====
;   ---  the beginning of a non-recursive function  ---
_fib2:                  ;           ( n1 -- n2 )
    pop  BC             ; 1:10      : ret
    ld  [_fib2_end+1],BC; 4:20      : ( ret -- )
                        ;[6:36]     0 1 rot   ( x -- 0 1 x )
    push DE             ; 1:11      0 1 rot
    ld   DE, 0x0000     ; 3:10      0 1 rot
    push DE             ; 1:11      0 1 rot
    inc   E             ; 1:4       0 1 rot
    ld    A, L          ; 1:4       0 ?do_101(m)   ( stop 0 -- )
    ld  [stp_lo101], A  ; 3:13      0 ?do_101(m)   lo stop
    ld    A, H          ; 1:4       0 ?do_101(m)
    ld  [stp_hi101], A  ; 3:13      0 ?do_101(m)   hi stop
    or    L             ; 1:4       0 ?do_101(m)
    ex   DE, HL         ; 1:4       0 ?do_101(m)
    pop  DE             ; 1:10      0 ?do_101(m)
    jp    z, exit101    ; 3:10      0 ?do_101(m)
    ld   BC, 0x0000     ; 3:10      0 ?do_101(m)
do101save:              ;           0 ?do_101(m)
    ld  [idx101], BC    ; 4:20      0 ?do_101(m)   save index
do101:                  ;           0 ?do_101(m)
    add  HL, DE         ; 1:11      over +
    ex   DE, HL         ; 1:4       swap   ( b a -- a b )
idx101 EQU $+1          ;[16:57/78] loop_101(m)   idx always points to a 16-bit index
    ld   BC, 0x0000     ; 3:10      loop_101(m)
    inc  BC             ; 1:6       loop_101(m)   index++
    ld    A, C          ; 1:4       loop_101(m)   lo new index
stp_lo101 EQU $+1       ;           loop_101(m)
    xor  0x00           ; 2:7       loop_101(m)   lo stop
    jp   nz, do101save  ; 3:10      loop_101(m)
    ld    A, B          ; 1:4       loop_101(m)   hi new index
stp_hi101 EQU $+1       ;           loop_101(m)
    xor  0x00           ; 2:7       loop_101(m)   hi stop
    jp   nz, do101save  ; 3:10      loop_101(m)
leave101:               ;           loop_101(m)
exit101:                ;           loop_101(m)
    ex   DE, HL         ; 1:4       drop
    pop  DE             ; 1:10      drop   ( a -- )
_fib2_end:
    jp   0x0000         ; 3:10      ;
;   ---------  end of non-recursive function  ---------
;   ---  the beginning of a non-recursive function  ---
_fib2_bench:            ;           ( -- )
    pop  BC             ; 1:10      : ret
    ld  [_fib2_bench_end+1],BC; 4:20      : ( ret -- )
    ld   BC, 0x0000     ; 3:10      1000 0 do_102(xm)
do102save:              ;           1000 0 do_102(xm)
    ld  [idx102],BC     ; 4:20      1000 0 do_102(xm)
    xor   A             ; 1:4       20 0 do_103 i_103(m)   8-bit loop   ( 20 0 -- i )
do103saveA:             ;           20 0 do_103 i_103(m)
    push DE             ; 1:11      20 0 do_103 i_103(m)
    ex   DE, HL         ; 1:4       20 0 do_103 i_103(m)
    ld  [idx103], A     ; 3:13      20 0 do_103 i_103(m)   save lo(index)
    ld    L, A          ; 1:4       20 0 do_103 i_103(m)
    ld    H, 0x00       ; 2:7       20 0 do_103 i_103(m)
    call _fib2          ; 3:17      call ( -- )
    ex   DE, HL         ; 1:4       drop
    pop  DE             ; 1:10      drop   ( a -- )
                        ;[9:32/32]  loop_103(m)   variant +1.ignore: 8-bit loop, run 20x
idx103 EQU $+1          ;           loop_103(m)   idx always points to a 16-bit index
    ld    A, 0          ; 2:7       loop_103(m)   0.. +1 ..(20), real_stop:0x0014
    db   0x00           ; 1:4       loop_103(m)   ignore opcode = hi(index) -> idx always points to a 16-bit index.
    inc   A             ; 1:4       loop_103(m)   index++
    cp   0x14           ; 2:7       loop_103(m)   lo(real_stop)
    jp   nz, do103saveA ; 3:10      loop_103(m)   index<>real_stop?
                        ;[16:57/58] loop_102(xm)   variant +1.default: step one, run 1000x
idx102 EQU $+1          ;           loop_102(xm)   idx always points to a 16-bit index
    ld   BC, 0x0000     ; 3:10      loop_102(xm)   0.. +1 ..(1000), real_stop:0x03E8
    inc  BC             ; 1:6       loop_102(xm)   index++
    ld    A, C          ; 1:4       loop_102(xm)
    xor  0xE8           ; 2:7       loop_102(xm)   lo(real_stop) first (232>3)
    jp   nz, do102save  ; 3:10      loop_102(xm)   3x false positive
    ld    A, B          ; 1:4       loop_102(xm)
    xor  0x03           ; 2:7       loop_102(xm)   hi(real_stop)
    jp   nz, do102save  ; 3:10      loop_102(xm)   232x false positive if he was first
leave102:               ;           loop_102(xm)
exit102:                ;           loop_102(xm)
_fib2_bench_end:
    jp   0x0000         ; 3:10      ;
;   ---------  end of non-recursive function  ---------

The resulting 133-byte program is about 10 times faster than a 2222-byte program compiled from C. Over time, the C compiler has improved a bit (and also has the ability to write the correct parameter during compilation), but it is still an order of magnitude worse.

Code: Select all

|        Forth / C        |  Benchmark  | Time (sec/round) | Bytes
| :---------------------: | :---------: | :--------------- | :------
| M4_FORTH                | Fib2        | 0m5.65s          | 133
| M4_FORTH use data stack | Fib2s       | 0m5.03s          | 112
| M4_FORTH use assembler  | Fib2a       | 0m2.55s          | 96
| Boriel Basic zxbc 1.16.4| Fib2 a = a+c| 0m14.38s         |
| zcc z88dk v16209        | Fib2 a = a+c| 0m49.19s         |
| zcc z88dk v16209        | Fib2 a+= c  | 0m43.97s         |
| zcc z88dk v19766 -O2    | Fib2 a+= c  | 0m36.09s         | 2222

_dw · Post by **_dw** » Thu Dec 07, 2023 8:21 pm

I try play music with beeper (ZX 48kb). Adding songs isn't that difficult when using a music engine, but this topic is incredibly hard for me.
[youtube]https://www.youtube.com/watch?v=4D28quLwQG0[/youtube]
https://www.youtube.com/watch?v=4D28quLwQG0

In M4 FORTH, this can be achieved with the help of a pair of commands, when it is distinguished whether the input data is copied to the buffer or played directly on the spot. If they are copied, they can also be compressed. If they are compressed, select the compression type.

The whole program is quite short:

Code: Select all

;vvvv
include(`../M4/FIRST.M4')dnl
;^^^^
    define({USE_FONT_5x8})
    define({SUFFIX},.zx0)
    define({USE_ZX0})
    define({BUFFERPLAY_SIZE},14581)
; max unpacked file data is USE_PLAY

__LIMITED_TIMES_LOOP_MUSIC equ 1
SHOW_BORDER_COLOR EQU 1

  ifndef __BUFFER
__BUFFER equ 0xC000
  endif

  ifdef __ORG
    org __ORG
  else
    org 24576
  endif



text_y equ (24-10)/2
text_x equ (51-24)/2
         
    INIT(28000)
    ZX_CONSTANT

    PUSH(0x4000,(24+3)*256,0) FILL
    PUSH(ZX_BLACK) ZX_BORDER

    BEGIN
        PRINT_I({ZX_INK, ZX_BLUE, ZX_PAPER, ZX_BLACK})
        PRINT_I({ZX_AT,text_y+0,text_x," 1: _first_last_ "})
        PRINT_I({ZX_AT,text_y+1,text_x," 2: octode2k16 test "})
        PRINT_I({ZX_AT,text_y+2,text_x," 3: alf2_zalza "})
        PRINT_I({ZX_AT,text_y+3,text_x," 4: pd_dawn "})
        PRINT_I({ZX_AT,text_y+4,text_x," 5: algar_thegermansroom "})
        PRINT_I({ZX_AT,text_y+5,text_x," 6: bacon_sandwich "})
        PRINT_I({ZX_AT,text_y+6,text_x," 7: cja_h_what_is_love "})
        PRINT_I({ZX_AT,text_y+7,text_x," 8: clop_hybrid_sparta "})
        PRINT_I({ZX_AT,text_y+8,text_x," 9: contraduct_design "})
        PRINT_I({ZX_AT,text_y+9,text_x," e: exit "})

        PRINT_I({ZX_INK, ZX_RED})

        PUSH(__TESTKEY_E) TESTKEY 
    IF     
        PRINT_I({ZX_AT,text_y+9,text_x," e: exit "})
        BREAK 
    ELSE
        PUSH(__TESTKEY_1) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+0,text_x," 1: _first_last_ "})
        BINFILE(../Compression/Output/,_first_last_,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_2) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+1,text_x," 2: octode2k16 test "})
        BINFILE(../Compression/Output/,octode2k16,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_3) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+2,text_x," 3: alf2_zalza "})
        BINFILE(../Compression/Output/,alf2_zalza,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_4) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+3,text_x," 4: pd_dawn "})
        BINFILE(../Compression/Output/,pd_dawn,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_5) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+4,text_x," 5: algar_thegermansroom "})
        BINFILE(../Compression/Output/,algar_thegermansroom,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_6) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+5,text_x," 6: bacon_sandwich "})
        BINFILE(../Compression/Output/,bacon_sandwich,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_7) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+6,text_x," 7: cja_h_what_is_love "})
        BINFILE(../Compression/Output/,cja_h_what_is_love,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_8) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+7,text_x," 8: clop_hybrid_sparta "})
        BINFILE(../Compression/Output/,clop_hybrid_sparta,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
        PUSH(__TESTKEY_9) TESTKEY 
    IF 
        PRINT_I({ZX_AT,text_y+8,text_x," 9: contraduct_design "})
        BINFILE(../Compression/Output/,contraduct_design,SUFFIX) PUSH(__BUFFER) UNPACK PLAY 
    ELSE
    THEN THEN THEN THEN THEN THEN THEN THEN THEN THEN
    AGAIN

    DEPTH
    PRINT_I({0x0D, "Depth: "}) DOT CR
    STOP

Post by **akuyou** » Sat Dec 09, 2023 9:04 am

Moved to offtopic as this isn't about assembly.

_dw · Post by **_dw** » Sat Dec 30, 2023 6:36 am

For the last 3 nights I have been solving signed and unsigned comparisons of 8-bit numbers. Before it occurred to me that the Z80's support for the overflow flag could also be used in 16-bit numbers.
I had a pretty good solution (better than the best you can find on the internet)

Code: Select all

dworkin@dw-A15:~/Programovani/ZX/Forth/M4$ ../check_word.sh 'DUP PUSH(3) GT IF'
                       ;[11:40]     dup 3 > if   ( x -- x )  flag: x > 3 ;#variant: default, change: "define({_TYP_SINGLE},{sign_first})"
    ld    A, 0x03       ; 2:7       dup 3 > if   HL>3 --> 0>0x03-L --> false if not carry
    sub   L             ; 1:4       dup 3 > if   HL>3 --> 0>0x03-L --> false if not carry
    ld    A, 0x00       ; 2:7       dup 3 > if   HL>3 --> 0>0x00-H --> false if not carry
    sbc   A, H          ; 1:4       dup 3 > if   HL>3 --> 0>0x00-H --> false if not carry
    rra                 ; 1:4       dup 3 > if
    xor   H             ; 1:4       dup 3 > if   invert sign if HL is negative
    jp    p, else101    ; 3:10      dup 3 > if   positive constant --> false if not sign
; seconds: 0           ;[11:40]
dworkin@dw-A15:~/Programovani/ZX/Forth/M4$ ../check_word.sh 'DUP PUSH(-3) GT IF'
                       ;[11:40]     dup -3 > if   ( x -- x )  flag: x > -3 ;#variant: default, change: "define({_TYP_SINGLE},{sign_first})"
    ld    A, 0xFD       ; 2:7       dup -3 > if   HL>-3 --> 0>0xFD-L --> false if not carry
    sub   L             ; 1:4       dup -3 > if   HL>-3 --> 0>0xFD-L --> false if not carry
    ld    A, 0xFF       ; 2:7       dup -3 > if   HL>-3 --> 0>0xFF-H --> false if not carry
    sbc   A, H          ; 1:4       dup -3 > if   HL>-3 --> 0>0xFF-H --> false if not carry
    rra                 ; 1:4       dup -3 > if
    xor   H             ; 1:4       dup -3 > if   invert sign if HL is negative
    jp    m, else101    ; 3:10      dup -3 > if   negative constant --> false if sign
; seconds: 1           ;[11:40]

It uses the fact that in comparison with a sign it is the same as without a sign if the numbers have the same sign. If they are different, the result is the opposite.

I wondered how the C compiler handled it

Code: Select all

/* Type your code here, or load an example. */
int mul5(int num) {
    return (num <4);
}


void main(){

    mul5(10);
}

Code: Select all

._mul5
        pop     bc
        pop     hl
        push    hl
        push    bc
        ld      a,l
        sub     4
        ld      a,h
        rla
        ccf
        rra
        sbc     128
        ld      hl,0        ;const
        rl      l
        ret

._main
        ld      hl,10       ;const
        push    hl
        call    _mul5
        pop     bc
        ret

It has quite a simple solution that a signed comparison can be converted to unsigned by inverting the highest bit (because that is the same as 0x8000).

It uses the fact that by adding 0x8000 the lowest negative value -0x8000 mapped to 0x8000 is converted to zero and the highest positive value 0x7FFF mapped to 0x7FFF is converted to 0xFFFF. Takze tim ziskame hodnoty ktere muzeme porovnat unsigned metodou.

Code: Select all

        rla
        ccf
        rra

Here it is hidden, where the transmission statement is read into the 0th bit and the highest bit is dropped and is inverted using "ccf" and loaded again when the original statement is dropped again. Quite a clever solution for 3 short instructions. Even if we need to get a carry as a result, it is even better than what I used, because I thought of a solution, but not that it can be written that easily (I also solved two registers and not a register and a constant when I was thinking about it). Maybe I'll use it somewhere.

Yes, and even for the higher byte, add 128 instead of zero, because it inverts the bit at number 4. That's just changing the constant for free.

Well, my idea was that if, for example, I had the number HL < 0x8004 (so it's very negative, exactly min+4), so to verify that HL is smaller, it's enough to subtract exactly the value 4 from it. If the result were 0x7FFF or less, then was a clearly lower value in HL.
It turned out that this does not apply if we have a constant of 4, because if we subtract 0x8004 as we assume, the result will be exactly the opposite.
Finally, I realized that simply subtracting 0x8004 is understood as subtracting a negative number...

So we must not use a number greater than 0x7FFF.
The solution is pure (0x8000-4) = 0x7FFC.
If we reach a value of 0x8000 or more during loading, we trigger an overflow and know that the result is invalid...

Which somehow corresponds to what I wrote during the reading. Because subtracting 0x8004 is the same as adding 0x7FFC.

So, for positive values, the overflow is false, and for negative values, the overflow is true. If we read 0x8000 (that is, we test zero), it works as for a positive number.

Code: Select all

dworkin@dw-A15:~/Programovani/ZX/Forth/M4$ ../check_word.sh 'DUP PUSH(3) LT IF'
                        ;[9:32]     dup 3 < if   ( x -- x )  flag: x < 3 ;#variant: default, change: "define({_TYP_SINGLE},{sign_first})"
    ld    A, L          ; 1:4       dup 3 < if   HL<3 --> L-0x03<0 --> false if overflow
    sub  0x03           ; 2:7       dup 3 < if   HL<3 --> L-0x03<0 --> false if overflow
    ld    A, H          ; 1:4       dup 3 < if   HL<3 --> H-0x80<0 --> false if overflow
    sbc   A, 0x80       ; 2:7       dup 3 < if   HL<3 --> H-0x80<0 --> false if overflow
    jp   pe, else101    ; 3:10      dup 3 < if   positive constant --> false if overflow
; seconds: 0           ;[ 9:32]
dworkin@dw-A15:~/Programovani/ZX/Forth/M4$ ../check_word.sh 'DUP PUSH(-3) LT IF'
                        ;[9:32]     dup -3 < if   ( x -- x )  flag: x < -3 ;#variant: default, change: "define({_TYP_SINGLE},{sign_first})"
    ld    A, L          ; 1:4       dup -3 < if   HL<-3 --> L-0xFD<0 --> false if not overflow
    sub  0xFD           ; 2:7       dup -3 < if   HL<-3 --> L-0xFD<0 --> false if not overflow
    ld    A, H          ; 1:4       dup -3 < if   HL<-3 --> H-0x7F<0 --> false if not overflow
    sbc   A, 0x7F       ; 2:7       dup -3 < if   HL<-3 --> H-0x7F<0 --> false if not overflow
    jp   po, else101    ; 3:10      dup -3 < if   negative constant  --> false if not overflow
; seconds: 0           ;[ 9:32]

So a signed solution is just as difficult as an unsigned one! If all we need is branching.
Even the z88dk 2.2 can't do this, and at the same time you could say that it should be the base, it plays with whole numbers.

PS: After 2 years and you can still see that they are improving things that I thought could not be written better. I'm kind of slower.

_dw · Post by **_dw** » Fri Sep 06, 2024 2:44 am

Hi everyone,

I've been working these days on more support for unsigned 16-bit binary division by a constant on Z80.
I had some low values ready that count well.
Such as dividing by a power of two such as 2, 4, 8, 16 ...
Furthermore, the values contained in the decomposition of the number 255 = 3*5*17.
So 3, 5, 17, 3*5, 3*17, 5*17 and a few other values.
But it wasn't a continuous series.
For example, 9, 11, 13, 14, 19... and others were missing.
I didn't know how to do it because they couldn't create it using that method.
But I used the theory of how to make a MOD and then came up with a third method, actually probably the most basic of all.
So I created a continuous series up to the value of 64.
So now, if you try to do unsigned 16 bit division in your program by one of these constants, the code will be executed "inline".

I also did a benchmark.
https://codeberg.org/DW0RKiN/M4_FORTH/s ... a-constant

16-bit unsigned division by a constant

This was not originally supposed to be a benchmark, but a test of the correctness of dividing by a constant Each of the values 0 to 65535 is divided by the numbers 2,3,4, etc. up to 64. It is further divided by the value 85 and the remaining multiples of 2 such as 128, 256, etc. up to the value 32768. In addition to division, a function is called for each result that calculates the XOR of the previous value followed by a 16-bit rotation by one bit to the left.

z88dk can divide "inline" by numbers 2, 4, 256 and 512. For values 8, 16, 32, 64 and 128 it calls the bit shift function. For other division values, it uses a universal routine. M4 FORTH can "inline" divide all values up to 64 or multiples of two up to 32768.

A demonstration of the first basic method used for 16-bit binary division by constant on Z80. Specifically for division by 17:

Code: Select all

dworkin@dw-A15:~/Programovani/ZX/Forth/Testing$ ../check_word.sh 'PUSH(17) UDIV'
                       ;[26:150]    17 u/   Variant HL/17 = HL*257*15 >> 16
    ld    B, H          ; 1:4       17 u/
    ld    C, L          ; 1:4       17 u/   1x = base
    xor   A             ; 1:4       17 u/
    add  HL, HL         ; 1:11      17 u/
    adc   A, A          ; 1:4       17 u/   2x  AHL
    add  HL, HL         ; 1:11      17 u/
    adc   A, A          ; 1:4       17 u/   4x  AHL
    add  HL, HL         ; 1:11      17 u/
    adc   A, A          ; 1:4       17 u/   8x  AHL
    add  HL, HL         ; 1:11      17 u/
    adc   A, A          ; 1:4       17 u/   16x AHL
    sbc  HL, BC         ; 2:15      17 u/
    ld   BC, 0x000F     ; 3:10      17 u/   rounding constant (only 15)
    sbc   A, B          ; 1:4       17 u/   -1 AHL = 15x
    add  HL, BC         ; 1:11      17 u/
    adc   A, B          ; 1:4       17 u/   +0 AHL = 15x with rounding down constant
    ld    B, A          ; 1:4       17 u/   (AHL * 257) >> 16 = (AHL0 + 0AHL) >> 16 = AH.L0 + A.HL = A0 + H.L + A.H
    ld    C, H          ; 1:4       17 u/   BC = "A.H"
    add  HL, BC         ; 1:11      17 u/   HL = "H.L" + "A.H"
    ld    L, H          ; 1:4       17 u/
    adc   A, 0x00       ; 2:7       17 u/   + carry
    ld    H, A          ; 1:4       17 u/   HL/17 = HL*(65536/17)/65536 = HL*3855/65536 = HL*(1+256)*15 >> 16
; seconds: 1           ;[26:150]

A demonstration of the second basic method used for 16-bit binary division by constant on Z80. Specifically for division by 9.

Code: Select all

dworkin@dw-A15:~/Programovani/ZX/Forth/Testing$ ../check_word.sh 'PUSH(9) UDIV'
  ;# num16bit =             256*hi8bit + lo8bit
  ;# num16bit = 28*9*hi8bit + 4*hi8bit + lo8bit
                       ;[37:288]    9 u/   Variant HL/9 = (256*H+L)/9 = 28*H+((4*H+L)*28.4375+28)/256
    ld    B, 0x00       ; 2:7       9 u/
    ld    C, L          ; 1:4       9 u/   BC = lo8bit
    ld    A, H          ; 1:4       9 u/   A  = hi8bit
    ld    L, H          ; 1:4       9 u/
    ld    H, B          ; 1:4       9 u/   HL = hi8bit
    add  HL, HL         ; 1:11      9 u/   2x
    add  HL, HL         ; 1:11      9 u/   4x
    add  HL, BC         ; 1:11      9 u/   num11bit = 4*hi8bit+lo8bit = 0..1275
    inc  HL             ; 1:6       9 u/   +28 round up
    ld    B, H          ; 1:4       9 u/
    ld    C, L          ; 1:4       9 u/
    add  HL, HL         ; 1:11      9 u/   2x 
    add  HL, BC         ; 1:11      9 u/   3x 
    add  HL, HL         ; 1:11      9 u/   6x
    add  HL, BC         ; 1:11      9 u/   7x 
    add  HL, HL         ; 1:11      9 u/   14x
    add  HL, HL         ; 1:11      9 u/   28x 
    ld    B, 0x00       ; 2:7       9 u/
    ld    C, H          ; 1:4       9 u/   28/256=14/128=7/64
    add  HL, BC         ; 1:11      9 u/   28+7/64
    add  HL, BC         ; 1:11      9 u/   28+7/32
    add  HL, BC         ; 1:11      9 u/   28+7/32+7/64
    add  HL, BC         ; 1:11      9 u/   28+7/16 = 28.4375x
    ld    C, A          ; 1:4       9 u/   BC = hi8bit
    ld    A, H          ; 1:4       9 u/
    ld    H, B          ; 1:4       9 u/ 
    ld    L, C          ; 1:4       9 u/ 
    add  HL, HL         ; 1:11      9 u/   2x 
    add  HL, BC         ; 1:11      9 u/   3x 
    add  HL, HL         ; 1:11      9 u/   6x
    add  HL, BC         ; 1:11      9 u/   7x 
    add  HL, HL         ; 1:11      9 u/   14x
    add  HL, HL         ; 1:11      9 u/   28x 
    ld    C, A          ; 1:4       9 u/ 
    add  HL, BC         ; 1:11      9 u/   28*hi8bit+0..141
; seconds: 0           ;[37:288]

A demonstration of the third basic method used for 16-bit binary division by constant on Z80. Specifically for division by 13.

Code: Select all

dworkin@dw-A15:~/Programovani/ZX/Forth/Testing$ ../check_word.sh 'PUSH(13) UDIV'
  ;# n*256/13 >> 8 = 19.692308 >> 8 = (20 - 0.307692) >> 8
  ;# 256*0.307692 = 78.769231 = 79 - 0.230769
  ;# 256*0.230769 = 59.076923
                       ;[41:249]    13 u/   Variant HL/13 = HL*(256/13)>>8 = HL*19.692308>>8 = HL*(20-(78+197/256)/256)>>8
    ld    B, H          ; 1:4       13 u/
    ld    C, L          ; 1:4       13 u/   1x base
    xor   A             ; 1:4       13 u/    
    add  HL, HL         ; 1:11      13 u/  
    adc   A, A          ; 1:4       13 u/   2x
    add  HL, HL         ; 1:11      13 u/  
    adc   A, A          ; 1:4       13 u/   4x
    add  HL, BC         ; 1:11      13 u/  
    adc   A, 0x00       ; 2:7       13 u/   5x
    add  HL, HL         ; 1:11      13 u/  
    adc   A, A          ; 1:4       13 u/   10x
    add  HL, HL         ; 1:11      13 u/  
    adc   A, A          ; 1:4       13 u/   20x
    push HL             ; 1:11      13 u/
    push AF             ; 1:11      13 u/
    add  HL, HL         ; 1:11      13 u/  
    adc   A, A          ; 1:4       13 u/   40x
    sbc  HL, BC         ; 2:15      13 u/  
    sbc   A, 0x00       ; 2:7       13 u/   39x
    add  HL, HL         ; 1:11      13 u/  
    adc   A, A          ; 1:4       13 u/   78x
    ld    B, A          ; 1:4       13 u/
    rra                 ; 1:4       13 u/
    add   A, B          ; 1:4       13 u/
    add   A, B          ; 1:4       13 u/   2*78 + 78/2 = 195
    add   A, H          ; 1:4       13 u/
    ld    C, A          ; 1:4       13 u/
    jr   nc, $+3        ; 2:7/12    13 u/
    inc   B             ; 1:4       13 u/   BC = (78+195/256)/256
    pop  AF             ; 1:10      13 u/
    pop  HL             ; 1:10      13 u/
    sbc  HL, BC         ; 2:15      13 u/   9 - 78.7617
    sbc   A, 0x00       ; 2:7       13 u/   
    ld    L, H          ; 1:4       13 u/
    ld    H, A          ; 1:4       13 u/   HL*(20-(78+197/256)/256)>>8
; seconds: 41          ;[41:249]

The average division time is 200 to 300 cycles. Compared to roughly 1200 cycles, what does a universal routine have for these low values. Universal routine, but can be changed to a faster but longer one using a parameter.

M4 FORTH

https://codeberg.org/DW0RKiN/M4_FORTH/s ... udiv_c.asm
https://codeberg.org/DW0RKiN/M4_FORTH/s ... t_udiv.asm
After compiling I got these results

https://codeberg.org/DW0RKiN/M4_FORTH/s ... udiv_c.asm
https://codeberg.org/DW0RKiN/M4_FORTH/s ... t_udiv.asm

Code: Select all

System                          Forth / C                               Benchmark       Time        bin size
ZX Spectrum Fuse 1.6.0 Ubuntu   z88dk.zcc v22110-51889e5300-20231219   -O2 const_udiv   25m 18.33s  4 260 B (include 4x8 font)
ZX Spectrum Fuse 1.6.0 Ubuntu   M4_FORTH                                const_udiv       6m 49.41s  2 971 B

ChibiAkumas ASM Programming Forum

Forth compiler for ZX

Forth compiler for ZX

Re: Video: Lesson P37 - Playing Digital Sound with WAV on the ZX Spectrum

Re: Forth compiler for ZX

Re: Forth compiler for ZX

Re: Forth compiler for ZX