Quick start

Author: Ivan A. Uemlianin
Contact: ivan@llaisdy.com

A quick introduction to using trefnydd.

You will need

Here are some data to get you started:

  1. Audio data from Canolfan Bedwyr.
  2. Phonemic transcriptions of the data.

n.b.: The audio data comprises 24 wav files, each a few seconds long.

The best way to download these files is probably with wget. The files are in a Subversion repository, which makes them difficult to fetch with FTP clients like yafc. If you're working on Windows, I think wget is included in Cygwin; otherwise, several standalone builds of wget for Windows are available online.

The following commands will download all the files into the current directory, and then delete the superfluous index.html* files.

$ wget -nd -r -l1 --no-parent  http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/FestVox_Voices/cb_time_domain/Trunk/festival/lib/voices/welsh/cb_amser_cw_ldom/wav/

$ wget -nd -r -l1 --no-parent  http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/FestVox_Voices/cb_time_domain/Trunk/festival/lib/voices/welsh/cb_amser_cw_ldom/lab/

$ rm index.html*

Download the data and put everything into one directory. Currently, trefnydd has to import all its data in one go from a single directory (this is a known bug).
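Before importing, it may be worth checking that the download is complete. The sketch below assumes that each transcription has the same basename as its wav file and a .lab extension. That assumption comes from the 'lab' directory name and is not stated in this guide. `find_unpaired` is a hypothetical helper, not part of trefnydd.

```python
from pathlib import Path


def find_unpaired(filenames, expected_wavs=24):
    """Return a list of problems found among downloaded file names.

    Hypothetical convention: each transcription has the same basename as
    its audio file and the .lab extension.
    """
    wavs = {Path(n).stem for n in filenames if n.endswith(".wav")}
    labs = {Path(n).stem for n in filenames if n.endswith(".lab")}
    problems = []
    if len(wavs) != expected_wavs:
        problems.append(f"expected {expected_wavs} wav files, found {len(wavs)}")
    problems += [f"{s}.wav has no transcription" for s in sorted(wavs - labs)]
    problems += [f"{s}.lab has no audio" for s in sorted(labs - wavs)]
    return problems


if __name__ == "__main__":
    # Run this from the directory you downloaded the data into.
    names = [p.name for p in Path(".").iterdir() if p.is_file()]
    for problem in find_unpaired(names) or ["all files present and paired"]:
        print(problem)
```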

Create a new project, and import the data into a speech corpus

Fire up the trefnydd GUI by running the script trefnydd_gui.py. You can run a Python script either by typing 'python /path/to/trefnydd/trefnydd_gui.py' at the command line or, on some systems, by double-clicking the 'trefnydd_gui.py' icon in a graphical file manager.

Select 'File | New ... | Project'. A dialog pops up prompting you for a name and location for this project. Enter your choices and dismiss the dialog ('OK' or 'Save' depending on platform).

Select 'File | New ... | Corpus'. A dialog pops up prompting you for a name and type for this corpus. The default type is 'Speech', which is what we want. Enter your choices and dismiss the dialog ('OK' or 'Save' depending on platform).

A node for the new corpus appears in the Corpora/Speech tab in the left-hand browser pane. Right-click on the node and select 'Import Data' from the context menu. A dialog pops up prompting you for the current location of the data (i.e., the directory you saved it in above). Find the directory and click 'OK'.

Trefnydd imports the data and copies it into its own directory; any operations on the data are performed on these copies. The browser pane shows a tree view of this data.

Immediately underneath the corpus node is a 'Data' node. Later there'll also be a 'Metadata' node which will contain summary statistics and other diagnostics about the corpus. Underneath the Data node are the Utterance nodes. Each Utterance node collects together all the information about an item of data: here, for example, the audio and transcription files. Under an Utterance node there is a node for every 'facet' or kind of information about that Utterance (e.g., audio, transcriptions, pitchmarking, etc.).
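The Corpus/Data/Utterance/facet hierarchy can be pictured as nested containers. The sketch below is a hypothetical model and not trefnydd's own classes. It groups files that share a basename into one Utterance, then files each one under a facet kind guessed from its extension. The `FACET_KINDS` mapping is an assumption.

```python
from dataclasses import dataclass, field
from pathlib import PurePath

# Hypothetical mapping from file extension to facet kind; trefnydd's
# internal names may differ.
FACET_KINDS = {".wav": "audio", ".lab": "transcription", ".pm": "pitchmarks"}


@dataclass
class Utterance:
    name: str
    facets: dict = field(default_factory=dict)  # facet kind -> file path


@dataclass
class Corpus:
    name: str
    data: dict = field(default_factory=dict)  # utterance name -> Utterance


def build_corpus(name, filenames):
    """Group files sharing a basename into one Utterance each."""
    corpus = Corpus(name)
    for fn in sorted(filenames):
        p = PurePath(fn)
        kind = FACET_KINDS.get(p.suffix)
        if kind is None:
            continue  # ignore files whose extension we don't recognise
        utt = corpus.data.setdefault(p.stem, Utterance(p.stem))
        utt.facets[kind] = fn
    return corpus


if __name__ == "__main__":
    corpus = build_corpus("demo", ["x1.wav", "x1.lab", "x2.wav"])
    for utt in corpus.data.values():
        print(utt.name, sorted(utt.facets))
```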

Currently, you can't do anything with the audio data in the trefnydd GUI. For now, I recommend using a separate audio editor such as Praat or Snd.

Right-click on a transcription node and select 'View/Edit': a spreadsheet-like view of the transcription opens in the right-hand pane. The fields are editable and will be saved when you save the project.
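The spreadsheet view presents each labelled segment as a row. As a rough sketch of how such rows could be derived, the code below assumes the transcriptions are in the xlabel format used by Festival/FestVox. In that format, header lines end at a line holding only '#', and each following line gives an end time, a colour code and a label. `parse_xlabel` is a hypothetical helper and the sample text is invented.

```python
def parse_xlabel(text):
    """Turn xlabel-format label text into (start, end, label) rows.

    Assumes FestVox-style .lab files: header lines up to a line holding
    only '#', then one segment per line as '<end time> <colour> <label>'.
    Each segment starts where the previous one ended (0.0 for the first).
    """
    lines = text.splitlines()
    # Index of the '#' header terminator, or -1 if there is no header.
    hash_at = next((i for i, l in enumerate(lines) if l.strip() == "#"), -1)
    rows, start = [], 0.0
    for line in lines[hash_at + 1:]:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        end = float(parts[0])
        rows.append((start, end, parts[2].strip()))
        start = end
    return rows


if __name__ == "__main__":
    sample = "separator ;\nnfields 1\n#\n 0.210 125 pau\n 0.310 125 m\n"
    for start, end, label in parse_xlabel(sample):
        print(f"{start:8.3f} {end:8.3f}  {label}")
```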

Speaking of which, let's save the project now! Select 'File | Save Project' or press CTRL-S and the project will be saved into the directory you specified earlier (please verify and email me if it hasn't worked!).

If you quit now ('File | Quit' or CTRL-Q), restart and select 'File | Open Project', all the data will be safely where you expect it.

Initialise a Pronunciation Dictionary (PD)

If your data is orthographically transcribed, you can extract the words and start building a Pronunciation Dictionary (PD).
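Extracting the words amounts to tokenising every orthographic transcription and collecting the distinct word types. The sketch below assumes plain-text transcriptions. Its tokenisation rule (runs of letters, optionally joined by apostrophes as in Welsh "i'r" or "o'r") is an assumption, not trefnydd's behaviour. The example sentences are invented.

```python
import re
from collections import Counter

# Letters in any script (so Welsh ŵ, ŷ, â are kept), optionally joined by
# a straight or typographic apostrophe, as in "i'r" or "o'r".
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def word_list(transcriptions):
    """Count word types across orthographic transcriptions."""
    counts = Counter()
    for text in transcriptions:
        counts.update(WORD.findall(text.lower()))
    return counts


if __name__ == "__main__":
    for word, n in word_list(["Mae hi'n bump o'r gloch."]).most_common():
        print(n, word)
```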

Select the PD tab in the Corpora tab of the left-hand browser pane. First we'll import the Grapheme and Phoneme sets and a Grapheme-to-Phoneme (G2P) ruleset from files. Right-click on 'Grapheme Set' and select 'import from file'. Starter Grapheme and Phoneme sets and a starter G2P ruleset are in trefnydd/meta/.
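This guide doesn't describe the format of trefnydd's G2P rulesets. To illustrate what a G2P ruleset does, here is a simple greedy longest-match converter. It tries multi-letter graphemes such as Welsh 'll' and 'dd' before single letters. The rules and phoneme symbols are invented for the example. Real Welsh G2P also needs context-sensitive rules (e.g. for 'w' and 'y'), which this context-free sketch does not handle.

```python
def g2p(word, rules):
    """Greedy longest-match grapheme-to-phoneme conversion.

    `rules` maps grapheme strings to phoneme symbols. Raises KeyError if
    no rule covers some part of the word.
    """
    longest = max(map(len, rules))
    phones, i = [], 0
    while i < len(word):
        # Try the longest possible grapheme at position i first.
        for n in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                phones.append(rules[chunk])
                i += n
                break
        else:
            raise KeyError(f"no rule for {word[i]!r} in {word!r}")
    return phones


# Hypothetical rules and phoneme symbols, for illustration only.
RULES = {"ll": "lh", "dd": "dh", "ch": "x", "a": "a", "e": "e", "o": "o",
         "f": "v", "g": "g", "l": "l", "n": "n", "r": "r", "d": "d",
         "c": "k", "h": "h"}

if __name__ == "__main__":
    print(g2p("llan", RULES))
```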

Currently, the G2P ruleset must be loaded after both the Grapheme and Phoneme sets; otherwise it doesn't import properly (all phones are set to MED). This is a bug: trefnydd should show an alert dialogue and refuse to import the G2P ruleset until both sets have been loaded.
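The intended behaviour amounts to a load-order check before accepting the ruleset. The sketch below is hypothetical: the class and method names are invented, and it assumes each rule maps one grapheme to one phoneme symbol. It refuses the import while either set is empty, and also rejects rules that mention symbols absent from either set.

```python
class PronunciationDictionary:
    """Hypothetical sketch of a G2P load-order check."""

    def __init__(self):
        self.graphemes = set()
        self.phonemes = set()
        self.g2p_rules = {}

    def missing_sets(self):
        """Names of the symbol sets that still need to be imported."""
        return [name for name, symbols in (("Grapheme", self.graphemes),
                                           ("Phoneme", self.phonemes))
                if not symbols]

    def load_g2p(self, rules):
        """Accept `rules` (grapheme -> phoneme) only if they can be checked."""
        missing = self.missing_sets()
        if missing:
            # In the GUI, this is where an alert dialogue would appear.
            raise RuntimeError(f"import the {' and '.join(missing)} set(s) first")
        unknown = sorted((g, p) for g, p in rules.items()
                         if g not in self.graphemes or p not in self.phonemes)
        if unknown:
            raise ValueError(f"rules use undefined symbols: {unknown}")
        self.g2p_rules = dict(rules)
```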

Build an Acoustic Model

Select 'File | New ... | Model'. A dialogue pops up prompting you for the name and type of the model. The default is 'htk', as HTK is what trefnydd currently uses to build models. Similarly, the default type is 'Acoustic Model'. As of release 0.0.2, language models are not supported.

Go to the Models tab, right-click on the model you've just named, and select 'Import Data'. This imports your whole corpus into the model builder. In future releases you will be able to select a subset for import.

Right-click on the model again, and select 'Build Model'. Trefnydd will build an Acoustic Model and display a report in the view/edit pane. You can review the report and adjust your data as necessary.
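This guide doesn't say which files trefnydd passes to HTK. As an illustration only: HTK's training tools (e.g. HERest) can read phone-level transcriptions from a Master Label File (MLF). A sketch of writing one from per-utterance phone lists follows. `write_mlf` is a hypothetical helper and the phone labels are invented.

```python
def write_mlf(utterances):
    """Return the text of an HTK Master Label File.

    `utterances` maps an utterance name to its list of phone labels.
    Each entry is a quoted label-file pattern, one label per line, and a
    terminating line containing a single '.'.
    """
    out = ["#!MLF!#"]
    for name in sorted(utterances):
        out.append(f'"*/{name}.lab"')
        out.extend(utterances[name])
        out.append(".")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(write_mlf({"utt01": ["pau", "m", "a", "pau"]}), end="")
```

The "*/name.lab" pattern matches the label file HTK looks for, whatever directory the corresponding data file lives in, so a single MLF can cover the whole corpus.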

Export the Acoustic Model

Select 'File | Export | Model' and choose 'Export model as Cairo MRCP speechrecog resource'. The model you built in the step above will be saved as a Sphinx4 .jar file, ready for use with a Cairo MRCP Server.