Thursday, 6 October 2016

RetroChallenge_201610 #2

Sometimes you need to read the manual...

The idea I was thinking of was to be able to enter the music notes and the lyrics either directly into the BASIC program, or as a text file that the program reads in. I think I'll use 'Happy Birthday' as a good easy song to start with. :-)

The file would look something like this: the notes, each followed by its duration, with the lyrics lined up underneath, possibly to help get the timing.
   c.5 c.5 d1    c1  f1 e2  c.5 c.5 d1    c1  f1 e2  
   Hap-py  birth-day to you Hap-py  birth-day to you
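To read that layout back in, here is a rough sketch in Python (the test programs themselves are Apple /// BASIC). It assumes each token is a note letter followed by its duration in beats, and that hyphens split the lyrics into one syllable per note; parse_notes and parse_lyrics are hypothetical names.

```python
import re

# Hypothetical token format, inferred from the example file above:
# a note letter (a-g) followed by its duration in beats, e.g. "c.5" or "e2".
NOTE_TOKEN = re.compile(r"([a-g])(\d*\.?\d+)")


def parse_notes(line):
    """Turn 'c.5 c.5 d1' into [('c', 0.5), ('c', 0.5), ('d', 1.0)]."""
    notes = []
    for token in line.split():
        match = NOTE_TOKEN.fullmatch(token.lower())
        if match is None:
            raise ValueError(f"cannot parse note {token!r}")
        notes.append((match.group(1), float(match.group(2))))
    return notes


def parse_lyrics(line):
    """Split a lyric line into syllables on spaces and hyphens."""
    return [syllable for word in line.split() for syllable in word.split("-")]


notes = parse_notes("c.5 c.5 d1    c1  f1 e2")
syllables = parse_lyrics("Hap-py  birth-day to you")
for (note, beats), syllable in zip(notes, syllables):
    print(f"{syllable:6} {note} {beats}")
```

For the first phrase this pairs the six syllables with the six notes, so 'Hap' gets c for half a beat and 'you' gets e for two beats.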

I need to work out some ways to transfer the music notes to the speech to bring the tune into it, and to play with some of SAM's other features to see what they add. The first thing was to enter the text and convert it using the new function in my driver. The first line, converted to phonemes, ends up like this.

SAM supports setting the pitch, and the standard pitch used in the examples is 120, so my first thought was to make this C and then add 10 for each note, i.e. D becomes 130, E = 140, and so on. Rolled out, it becomes this:
   "#P120"   (C)
   "#P120"   (C)
   "#P130"   (D)
   " BERTH"
   "#P120"   (C)
   " DEY5"
   "#P150"   (F)
   " TUX"
   "#P140"   (E)
   " YUW"

Well, here is how that sounds: not too good at all.
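The arithmetic of that first attempt is easy to sketch. This Python version assumes the `#P` prefix shown in the listing sets the pitch for the phonemes that follow it; naive_pitch and pitched_lines are hypothetical names, and how each string is actually sent to the driver is not covered here. For the 'birth-day to you' part it reproduces the strings in the listing above.

```python
# First attempt described above: C = 120, and each step up the scale adds 10.
# (This runs the wrong way: in SAM a larger value is a LOWER pitch.)
SCALE = "cdefgab"


def naive_pitch(note, base=120, step=10):
    """C -> 120, D -> 130, E -> 140, F -> 150, ..."""
    return base + step * SCALE.index(note.lower())


def pitched_lines(notes, phonemes):
    """Emit a '#Pnnn' pitch string before each syllable's phoneme string."""
    lines = []
    for note, syllable in zip(notes, phonemes):
        lines.append(f"#P{naive_pitch(note)}")
        lines.append(syllable)
    return lines


for line in pitched_lines("dcfe", [" BERTH", " DEY5", " TUX", " YUW"]):
    print(repr(line))
```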

Now remember the opening line of this post. A quick look at the SAM manual shows that the pitch works the other way around: a smaller value gives a higher pitch.
   00-20 impractical
   20-30 very high
   30-40 high
   40-50 high normal
   50-70 normal
   70-80 low normal
   80-90 low
   90-255 very low
   default = 64
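If the pitch value behaves like a period (the manual only says that smaller is higher, so this is an assumption), the notes should be spaced by ratios rather than by a fixed step of 10. Here is a Python sketch that puts C on the default of 64 and divides by 2^(1/12) per semitone. It is one possible mapping, not necessarily what the modified test program uses, and sam_pitch is a hypothetical name.

```python
# One possible "swapped" mapping (an assumption, not the author's test program):
# treat the SAM pitch value like a period, so going up a semitone divides the
# value by 2 ** (1/12), and put C on SAM's default pitch of 64.
SEMITONES = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}


def sam_pitch(note, octave=0, base=64):
    """Return a SAM pitch value; higher notes give smaller values."""
    steps = SEMITONES[note.lower()] + 12 * octave
    value = round(base / 2 ** (steps / 12))
    # The manual table above calls 00-20 impractical; 255 is the top of the range.
    if not 20 <= value <= 255:
        raise ValueError(f"{note} (octave {octave}) maps to {value}, outside the usable range")
    return value


print([sam_pitch(n) for n in "cdefgab"])  # [64, 57, 51, 48, 43, 38, 34]
```

With this scheme the C major scale stays within the 'normal' to 'high' bands of the table, and octave=-1 or octave=1 moves a whole phrase down (values around 128) or up (values around 32).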

A quick swap around and this was the modified test program:

And this was the result. Still not too good.

SAM supports adding a stress value after the phoneme to add some expression. From the manual:
   1 = very emotional stress
   2 = very emphatic stress
   3 = rather strong stress
   4 = ordinary stress
   5 = tight stress
   6 = neutral (no pitch change) stress
   7 = pitch-dropping stress
   8 = extreme pitch-dropping stress

I then added a value of 5. I need to play with these some more; they might be quite useful.
This was the modified test program with an arbitrary stress value after the last phoneme:

And this was the result. Not too sure if this is better, but I think it is improving.

The next thing was to extend the words based on their timing, which I did by just repeating the phonemes. The last word, 'you', should probably be extended more, but it does not sound too good like that. I need to try some other ideas for it.
This is the test program with these changes:

And here is the result; it's starting to get there!
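Repeating phonemes by duration can be sketched like this in Python, assuming one copy of a syllable per beat (rounded, never fewer than one). This only illustrates the idea: the post does not show exactly which phonemes the test program repeats, and extend and beats_per_copy are hypothetical names.

```python
def extend(phonemes, beats, beats_per_copy=1.0):
    """Repeat a syllable's phonemes so longer notes last longer.

    Half-beat and one-beat notes are sung once, a two-beat note twice, etc.
    beats_per_copy is a hypothetical tuning knob for the tempo.
    """
    copies = max(1, round(beats / beats_per_copy))
    return phonemes * copies


line = "".join(extend(p, b) for p, b in [(" TUX", 1), (" YUW", 2)])
print(repr(line))  # ' TUX YUW YUW'
```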

I need to play some more with this, but I have some ideas now on how to translate from the music/lyrics to SAM.

Tuesday, 4 October 2016

RetroChallenge_201610 #1

Time to get started on this challenge!

To be able to work with and manipulate the speech output, I need to work with Phonemes rather than plain text. This will allow me to extend the words and 'hold a note' for longer (or sing!). But converting the song lyrics to Phonemes by hand would take a lot of work, so the first thing I want to do is extend my Apple/// SAM driver to help with this.

First, some background on SAM. The original SAM software came with two binary programs, SAM and RECITER.

SAM is the actual speech program: it takes Phonemes as input and outputs speech from them through the 8-bit DAC card. Phonemes are the speech sounds made by the mouth; put together, they make up words or speech. The full list of Phonemes that SAM supports is available in the user manual, linked here:

SAM Owner Manual

As an example, to say "Hello There", SAM needs the input " /HEHLOW DHEHR".

The other program, RECITER, takes plain text as input and uses rule-based conversion to turn it into Phonemes, which it then passes to SAM to speak. RECITER has a large table of rules that look at the preceding and following letters to work out which Phonemes to use. Some of these are for specific words, and some are building blocks, e.g. the sounds for parts of words. I just noticed it has a table entry for Atari: .ASCII "(ATARI)=AHTAA4RI"
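To show how such context rules work, here is a toy Python version using the same "left(match)right=phonemes" shape as the ATARI entry. Only the ATARI rule is quoted from the real table; the others are made up so the example can produce the /HEHLOW DHEHR string from above (minus its leading space), and the real rules also use special context symbols that this sketch ignores. recite and parse_rule are hypothetical names.

```python
# Toy RECITER-style rules, written as "left(match)right=phonemes".
# Only the ATARI rule comes from the real table; the rest are made up.
RULES = [
    "(ATARI)=AHTAA4RI",
    " (HE)LL=/HEH",
    "(LL)O=L",
    "(O) =OW",
    "(TH)=DH",
    "(ERE)=EHR",
    "( )= ",
]


def parse_rule(rule):
    """Split 'left(match)right=phonemes' into its four parts."""
    pattern, phonemes = rule.split("=")
    left, rest = pattern.split("(")
    match, right = rest.split(")")
    return left, match, right, phonemes


def recite(text, rules=RULES):
    """Convert text to phonemes; the first rule whose context fits wins."""
    padded = " " + text.upper() + " "  # spaces let rules see word edges
    parsed = [parse_rule(rule) for rule in rules]
    out, i = [], 1
    while i < len(padded) - 1:
        for left, match, right, phonemes in parsed:
            if (padded.startswith(match, i)
                    and padded[:i].endswith(left)
                    and padded.startswith(right, i + len(match))):
                out.append(phonemes)
                i += len(match)
                break
        else:
            raise ValueError(f"no rule for {padded[i]!r} at position {i}")
    return "".join(out)


print(repr(recite("Hello There")))  # '/HEHLOW DHEHR'
print(recite("Atari"))              # AHTAA4RI
```

Because the first matching rule wins, word-specific rules like ATARI have to come before the general building-block rules in the table.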

My Apple/// driver implementation is done as a SOS character mode driver. A character mode driver supports reading and writing character strings. Currently I have two 'sub' drivers implemented, .SAM to support Phonemes, and .RECITER to support plain text.

For both of these 'sub' drivers, you first open them and then just write a string to them, and they will speak the string. To check for errors, you can read from them and the error code is returned. If it's 255, then all went OK; otherwise, it returns the character position in the string where the error occurred, e.g. where an incorrect phoneme was found.
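On the calling side, interpreting that code could look like this Python sketch. It assumes the returned position counts from 1 (the post does not say), and check_result is a hypothetical name.

```python
OK = 255  # code read back when the string was spoken without error


def check_result(text, code):
    """Interpret the code read back from .SAM or .RECITER after a write.

    Returns None on success, else (position, rest_of_string_from_there).
    Assumes positions count from 1; check that against the driver.
    """
    if code == OK:
        return None
    if not 1 <= code <= len(text):
        raise ValueError(f"unexpected code {code} for a {len(text)}-character string")
    return code, text[code - 1:]


print(check_result(" /HEHLOW DHEHR", 255))  # None
# Suppose the driver reported position 7 for this string:
print(check_result(" /HEHL%OW", 7))         # (7, '%OW')
```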

What I want to be able to do is read the Phonemes back after .RECITER has converted the plain text. I will implement this by detecting when a second read occurs and then returning the converted string.

I have added a variable to monitor the number of reads:
  READNUM     .BYTE   000     ;flag to determine number of reads after write, 0=none
Whenever a write occurs, I will clear this back to zero:
  SPEAK       LDA     #000
              STA     014FC               ;disable extended indirect addressing for FB/FC
              STA     READNUM             ;clear previous reads number
              LDA     EReg

Then in the READ part, I have added a check that returns either the error code for the first read or the converted string for any later read.

  $010        LDA     READNUM             ;check number of previous reads
              BNE     RETPHONM            ;yes, there has been, return converted string
                                          ;otherwise, return error code
  ;return error code
  ;(error code path not shown; it must also increment READNUM
  ; so that the next read returns the converted string)
  ;return converted string containing Phonemes
  RETPHONM    LDY     #00
  $020        LDA     INPUTBUF,Y          ;read converted text from INPUTBUF
              STA     (BUFFER),Y          ;store in read buffer
              INY                         ;advance index (CMP below sets the flags for BNE)
              CMP     #ASC_CR             ;if CR then this is the end of the string
              BNE     $020                ;no, next
              TYA                         ;yes, ret count = index +1 (Y already advanced)
              LDY     #00
              STA     (RTNCNT),Y          ;actual characters read count, low byte
              INY                         ;point at the high byte
              LDA     #00
              STA     (RTNCNT),Y          ;actual characters read count, high byte
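To make the intended protocol explicit, here is a small Python model of it. It is only an illustration (the real logic is the 6502 code above), it assumes the first read after a write is what increments the counter, and SpeechDriverModel and its convert argument are hypothetical.

```python
class SpeechDriverModel:
    """Toy model of the write/read protocol described above, not the driver itself.

    write() resets the read counter (READNUM in the driver) and stores the
    result of the conversion; the first read() afterwards returns the error
    code and every later read() returns the converted phonemes ending in CR.
    """

    CR = "\r"

    def __init__(self, convert):
        # convert is a hypothetical stand-in for RECITER: text -> (code, phonemes)
        self.convert = convert
        self.read_num = 0
        self.code, self.phonemes = 255, ""

    def write(self, text):
        self.read_num = 0  # like STA READNUM in SPEAK
        self.code, self.phonemes = self.convert(text)

    def read(self):
        self.read_num += 1
        if self.read_num == 1:
            return self.code
        return self.phonemes + self.CR


driver = SpeechDriverModel(lambda text: (255, "/HEHLOW DHEHR"))
driver.write("HELLO THERE")
print(driver.read())        # 255
print(repr(driver.read()))  # '/HEHLOW DHEHR\r'
```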

One issue with this is that it will actually speak the output each time you convert. If I have time, I will come back and improve this.
I have added the changes to my GitHub repository for this here:

To test this out, I have updated one of my test programs to check the driver operation. The disk image with the updated driver and BASIC test program is available here.

It's quite simple; here is the program and the output from it running in MESS:

The next step will be to add pitch statements to see if SAM can start to sing...