JULIUS(1)                                               JULIUS(1)



NAME
       Julius - open source multi-purpose LVCSR engine

SYNOPSIS
       julius [-C jconffile] [options ...]

DESCRIPTION
       Julius is a high-performance, multi-purpose, open source
       speech recognition engine that performs  near  real-time
       recognition of continuous speech with a 60,000-word
       vocabulary on most current PCs.

       Word 3-gram language models and triphone HMM acoustic
       models of any unit and size can be used.  As standard
       formats are adopted for the models, users can use their
       own language and acoustic models with Julius to build
       their own recognition systems.

       Julius can perform recognition on audio files, live micro-
       phone  input,  network  input and feature parameter files.
       The maximum size of vocabulary is 65,535 words.

RECOGNITION MODELS
       Julius supports the following models.

       Acoustic Models
                  Sub-word HMMs (Hidden Markov Models) in HTK
                  format are supported.  Phoneme models (mono-
                  phone), context-dependent phoneme models (tri-
                  phone), tied-mixture and phonetic tied-mixture
                  models of any unit can be used.  When using
                  context-dependent models, inter-word context is
                  also handled.

        Language Models
                  The system uses a 2-gram and a reverse 3-gram
                  language model.  The standard ARPA format is
                  supported.  In addition, a binary format N-gram
                  is also supported for efficiency.  The binary
                  N-gram can be converted from the ARPA language
                  models using the attached tool mkbingram.

SPEECH INPUT
       Speech waveform files (16bit WAV (no compression), RAW
       format, and many others when used with the libsndfile
       library) and feature parameter files (HTK format) can be
       used as speech input.  Live input from a microphone, a
       DatLink (NetAudio) system, or via a TCP/IP network is
       also supported.

       Notice: Julius can only extract MFCC_E_D_N_Z features
       internally.  If you want to use HMMs based on another type
       of feature extraction, then microphone input and speech
       waveform files cannot be used.  Use an external tool such
       as HCopy or wav2mfcc to create the appropriate feature
       parameter files.
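
       As an illustrative sketch (parameter values here are exam-
       ples, not requirements -- match them to the conditions
       your acoustic model was trained with), an HTK configura-
       tion file for HCopy producing MFCC_E_D_N_Z features could
       look like:

```
# HTK config for HCopy (illustrative values)
SOURCEFORMAT = WAV
TARGETKIND   = MFCC_E_D_N_Z
TARGETRATE   = 100000.0    # 10ms frame shift, in 100ns units
WINDOWSIZE   = 250000.0    # 25ms analysis window, in 100ns units
NUMCHANS     = 24          # filterbank channels
NUMCEPS      = 12          # cepstral coefficients
```

       It would then be run as "HCopy -C config input.wav
       output.mfc".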

SEARCH ALGORITHM
       The recognition algorithm of Julius is based on a two-
       pass strategy: a word 2-gram is used on the first pass
       and a reverse word 3-gram on the second.  The entire
       input is processed on the first pass, and then a final
       search over the input is performed, using the result of
       the first pass as a "guidance".  Specifically, the recog-
       nition algorithm is based on a tree-trellis heuristic
       search combining a left-to-right frame-synchronous beam
       search with a right-to-left stack decoding search.

       When using context-dependent phones (triphones), inter-
       word contexts are taken into consideration.  For tied-
       mixture and phonetic tied-mixture models, high-speed
       acoustic likelihood calculation is possible using
       Gaussian pruning.

       For more details, see the related  document  or  web  site
       below.

OPTIONS
       The options below allow you to specify the models and set
       system parameters.  You can set these options on the com-
       mand line; however, it is recommended that you collect
       them in a "jconf settings file" and use the "-C" option
       to read it at run time.

       Below is an explanation of all the available options.
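
       For example, a minimal jconf file (all file names below
       are hypothetical) could look like this, and be loaded
       with "julius -C sample.jconf":

```
# sample.jconf -- illustrative settings, file names are hypothetical
-input mic              # live microphone input
-h     hmmdefs          # acoustic model (HTK format)
-hlist triphones.list   # HMMList, needed for triphone models
-nlr   2gram.arpa       # forward 2-gram for the first pass
-nrl   rev3gram.arpa    # reverse 3-gram for the second pass
-v     words.dict       # word dictionary
```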

   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
               Select speech data input source.  'rawfile' is
               input from a waveform file (the file name should
               be specified after startup).  'mfcfile' is a fea-
               ture vector file extracted by the HTK HCopy tool.
               'mic' means live microphone input, and 'adinnet'
               means receiving waveform data via a TCP/IP net-
               work from an adinnet client.  'stdin' means stan-
               dard input.

               The supported waveform file formats depend on the
               compile-time configuration.  To see which formats
               are actually supported, see the help message
               using option "-help".  (For stdin input, only WAV
               (no compression) and RAW (16bit, BE) are sup-
               ported.)
               (default: mfcfile)

       -filelist file
              (with  -input  rawfile|mfcfile) perform recognition
              on all files contained within the target file.

       -adport portnum
              (with -input adinnet) adinnet port number (default:
              5530)

       -NA server:unit
              (with -input netaudio) set the server name and unit
              ID of the Datlink unit.

   Speech Detection
       -cutsilence

       -nocutsilence
              Force silence cutting (=speech  segment  detection)
              ON/OFF.  (default:  ON  for  mic/adinnet,  OFF  for
              files)

       -lv threslevel
              Amplitude threshold (0 - 32767).  If the  amplitude
              passes  this  threshold  it is considered to be the
              beginning of a speech segment, if  it  drops  below
              this  level  then  it is the end of the speech seg-
              ment. (default: 3000)


       -zc zerocrossnum
               Zero-crossing threshold per second. (default: 60)

       -headmargin msec
              Margin at the start of the speech segment  in  mil-
              liseconds. (default: 300)

       -tailmargin msec
              Margin  at  the  end  of the speech segment in mil-
              liseconds. (default: 400)

       -nostrip
               On some sound devices, invalid "0" samples may be
               recorded at the start and end of recording.
               Julius removes them automatically by default.
               This option inhibits the automatic removal.
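
       The interplay of "-lv" and "-zc" can be sketched as fol-
       lows (a simplified illustration in Python, not Julius's
       actual detector; the per-window crossing count stands in
       for the real per-second threshold):

```python
def count_zero_crossings(samples):
    """Count sign changes across a window of 16-bit samples."""
    return sum(1 for a, b in zip(samples, samples[1:])
               if (a >= 0) != (b >= 0))

def is_speech(window, lv=3000, min_crossings=4):
    """A window counts as speech if its peak amplitude reaches
    the -lv threshold and it crosses zero often enough (cf. -zc).
    min_crossings is a per-window figure for illustration only;
    the real -zc threshold is per second."""
    return (max(abs(s) for s in window) >= lv
            and count_zero_crossings(window) >= min_crossings)

loud = [4000, -4000] * 8    # strong oscillation: detected as speech
quiet = [100, -100] * 8     # low amplitude: treated as silence
```

       Segment boundaries then follow from the transitions
       between speech and non-speech windows, padded by the
       -headmargin and -tailmargin values.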

   Acoustic Analysis
        -smpFreq frequency
               Sampling frequency (Hz).
               (default: 16000Hz = period 625 in 100ns units)

        -smpPeriod period
               Sampling period, in units of 100 nanoseconds (the
               HTK convention).
               (default: 625 = 16kHz)
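
       The two options describe the same quantity; assuming the
       HTK convention that the period is counted in 100-nanosec-
       ond units, the conversion is:

```python
def freq_to_period(freq_hz):
    """Sampling frequency in Hz -> sampling period in 100ns
    units (the HTK SOURCERATE convention): 1s = 10,000,000 x 100ns."""
    return 10_000_000 // freq_hz

def period_to_freq(period_100ns):
    """Inverse conversion: period in 100ns units -> Hz."""
    return 10_000_000 // period_100ns
```

       So 16000 Hz corresponds to a period value of 625.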

       -fsize sample
              Analysis  window size (No. samples) (default: 400).

       -fshift sample
              Frame shift (No. samples) (default: 160).

       -delwin frame
              Delta window size (No. frames) (default: 2).

       -hipass frequency
              High-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -lopass frequency
              Low-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -sscalc
               Perform spectral subtraction using the silence at
               the head of each file to estimate noise.  Valid
               only for rawfile input.

        -sscalclen msec
               Specify the length of the head silence in mil-
               liseconds. (default: 300)

       -ssload filename
              Perform spectral subtraction for speech input using
              pre-estimated  noise spectrum from file.  The noise
              spectrum data  should  be  computed  beforehand  by
              mkss.

       -ssalpha value
               Alpha coefficient of spectral subtraction.  Noise
               is subtracted more strongly as this value gets
               larger, but distortion of the resulting signal
               also becomes more noticeable.  (default: 2.0)

       -ssfloor value
               Flooring coefficient of spectral subtraction.
               Spectral components that fall below zero after
               subtraction are replaced by the source component
               multiplied by this coefficient. (default: 0.5)
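
       The effect of "-ssalpha" and "-ssfloor" can be sketched
       as textbook spectral subtraction (a simplified illustra-
       tion, not the exact Julius code):

```python
def spectral_subtract(signal, noise, alpha=2.0, floor=0.5):
    """Subtract the scaled noise spectrum (-ssalpha) from the
    signal spectrum; components that would fall below zero are
    replaced by the source component scaled by the flooring
    coefficient (-ssfloor)."""
    out = []
    for s, n in zip(signal, noise):
        v = s - alpha * n
        out.append(v if v > 0.0 else floor * s)
    return out
```

       A larger alpha removes more noise but floors more compo-
       nents, which is the distortion the text warns about.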

   Language Model (word N-gram)
       -nlr 2gram_filename
              2-gram  language  model  filename  in standard ARPA
              format.

       -nrl rev_3gram_filename
              Reverse 3-gram language model  filename.   This  is
              required  for  the  second search pass.  If this is
              not defined then only  the  first  pass  will  take
              place.

       -d bingram_filename
              Use  a  binary language model as built using mkbin-
              gram(1).  This is used in place of the  "-nlr"  and
              "-nlr"  options above, and allows Julius to perform
              rapid initialization.

       -lmp lm_weight lm_penalty

       -lmp2 lm_weight2 lm_penalty2
              Language model score  weights  and  word  insertion
              penalties  for  the first and second passes respec-
              tively.

              The hypothesis language scores are scaled as  shown
              below:

              lm_score1  =  lm_weight * 2-gram_score + lm_penalty
              lm_score2 = lm_weight2 * 3-gram_score + lm_penalty2

               The defaults depend on the acoustic model:

                First-Pass | Second-Pass
               --------------------------
                5.0 -1.0   |  6.0  0.0 (monophone)
                8.0 -2.0   |  8.0 -2.0 (triphone,PTM)
                9.0  8.0   | 11.0 -2.0 (triphone,PTM, setup=v2.1)
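
       Written out in code, the scaling above (shown with the
       triphone/PTM defaults) is simply:

```python
def lm_score1(two_gram_score, lm_weight=8.0, lm_penalty=-2.0):
    """First-pass language score: weighted 2-gram log score
    plus a word insertion penalty (triphone/PTM defaults)."""
    return lm_weight * two_gram_score + lm_penalty

def lm_score2(three_gram_score, lm_weight2=8.0, lm_penalty2=-2.0):
    """Second-pass language score: weighted 3-gram log score
    plus a word insertion penalty (triphone/PTM defaults)."""
    return lm_weight2 * three_gram_score + lm_penalty2
```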

       -transp float
              Additional insertion penalty for transparent words.
              (default: 0.0)

   Word Dictionary
       -v dictionary_file
              Word dictionary file (required)

       -silhead {WORD|WORD[OUTSYM]|#num}

       -siltail {WORD|WORD[OUTSYM]|#num}
              Sentence  start  and end silence word as defined in
              the dictionary.  (default: "<s>" / "</s>")

               These are treated specially during recognition as
               the start and end points (margins) of hypotheses.
               They can be defined as shown below.

                                            Example
               Word_name                    <s>
               Word_name[output_symbol]     <s>[silB]
               #Word_ID                     #14

               (Word_ID is the word's position in the dictionary
               file, starting from 0)

       -forcedict
              Disregard dictionary errors.  Word definitions with
              errors will be skipped on startup.

   Acoustic Model (HMM)
       -h hmmfilename
              HMM definition file to use. (required)

       -hlist HMMlistfilename
               HMMList file to use.  Required when using tri-
               phone-based HMMs.  This file provides a mapping
               between the logical triphone names generated from
               the phonetic representation in the dictionary and
               the HMM definition names.

       -iwcd1 {max|avg}
               When using a triphone model, select the method
               for handling inter-word triphone context on the
               first and last phone of a word in the first pass.

              max: use maximum likelihood of the same
                   context triphones (default)
              avg: use average likelihood of the same
                   context triphones

       -force_ccd / -no_ccd
               Normally Julius determines whether the specified
               hmmdefs is a context-dependent model from the
               model definition names, i.e., whether the model
               names contain the characters '+' and '-'.  In
               case the automatic detection fails, you can spec-
               ify it explicitly with these options, which over-
               ride the automatic detection result.

       -notypecheck
               Disable checking of the input parameter type.
               (default: enabled)

   Acoustic Computation
        Gaussian pruning is automatically enabled when using a
        tied-mixture based acoustic model.  Gaussian selection
        requires a monophone model converted by mkgshmm.

       -tmix K
              With  Gaussian Pruning, specify the number of Gaus-
              sians to compute per codebook. (default: 2)

       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default: safe (setup=standard) beam (setup=fast))

       -gshmm hmmdefs
               Specify the monophone hmmdefs to use for Gaussian
               Mixture Selection (GMS).  The monophone model for
               GMS is generated from an ordinary monophone HMM
               model using mkgshmm.  This option is disabled by
               default (no GMS applied).

       -gsnum N
               When using GMS, specify the number of monophone
               states to select from all monophone states.
               (default: 24)

   Short-pause Segmentation
        Short-pause segmentation can be used for successive
        decoding of a long utterance.  It is enabled when Julius
        is compiled with '--enable-sp-segment'.

        -spdur Set the short-pause duration threshold in number
               of frames.  If a short-pause word keeps the maxi-
               mum likelihood for more successive frames than
               this value, the first pass is interrupted and the
               second pass is started. (default: 10)

   Search Parameters (First Pass)
       -b beamwidth
              Beam width (Number of HMM nodes).   As  this  value
              increases  the  precision  also increases, however,
              processing time and memory usage also increase.

              default value: acoustic model dependent
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM, setup=v2.1)

       -sepnum N
              Number of high frequency words to be separated from
              the lexicon tree. (default: 150)

        -1pass Only perform the first-pass search.  This mode is
               automatically set when no reverse 3-gram language
               model has been specified (-nrl).

       -realtime

       -norealtime
               Explicitly specify whether real-time (pipeline)
               processing will be done in the first pass or not.
               For file input the default is OFF (-norealtime);
               for microphone, adinnet and NetAudio input the
               default is ON (-realtime).  This option affects
               the way CMN is performed: when OFF, CMN is calcu-
               lated for each input independently; when ON, the
               previous 5 seconds of input are always used.
               Also refer to -progout.

       -cmnsave filename
               Save the last CMN parameters computed during
               recognition to the specified file.  The parame-
               ters are written to the file each time an input
               is recognized, so the file always holds the most
               recent CMN parameters.  If the output file
               already exists, it will be overwritten.

       -cmnload filename
              Load initial CMN parameters previously saved  in  a
              file  by "-cmnsave".  This option enables Julius to
              recognize the first utterance of a live  microphone
              input or adinnet input with CMN.
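
       CMN (Cepstral Mean Normalization) itself is just subtrac-
       tion of the cepstral mean; a minimal per-utterance sketch
       (the independent-per-input case described under
       -norealtime):

```python
def cmn(frames):
    """Per-utterance CMN: subtract each cepstral dimension's
    mean, computed over the whole utterance, from every frame."""
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames)
             for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]
```

       For live input the whole utterance is not available up
       front, which is why -cmnload supplies initial parameters
       for the first utterance.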

   Search Parameters (Second Pass)
       -b2 hyponum
               Beam width (number of hypotheses) in the second
               pass.  If the number of word expansions at a cer-
               tain hypothesis length reaches this limit during
               the search, shorter hypotheses are not expanded
               further.  This prevents the search from degener-
               ating into a breadth-first expansion stuck at the
               same position, and reduces search failures.
               (default: 30)


       -n candidatenum
               The search continues until candidatenum sentence
               hypotheses have been found.  The obtained sen-
               tence hypotheses are sorted by score, and the
               final results are displayed in that order (see
               also the "-output" option).

              The  possibility  that  the  optimum  hypothesis is
              found increases as this value is increased, but the
              processing time also becomes longer.

               The default value depends on the engine setup at
               compilation time:
                 10  (standard)
                  1  (fast, v2.1)

       -output N
               The top N sentence hypotheses will be output at
               the end of the search.  Use with the "-n" option.
               (default: 1)

       -sb score
               Score envelope width for envelope scoring.  When
               calculating the score of each generated hypothe-
               sis, its trellis expansion and Viterbi operation
               will be pruned partway through the speech if the
               score at a frame falls below (the current maximum
               score at that frame minus this width).  A small
               value reduces the computational cost of the sec-
               ond pass, but computation errors may occur.
               (default: 80.0)

       -s stack_size
              The maximum number of hypothesis that can be stored
              on the stack during the search.  A larger value may
              give more stable results, but increases the  amount
              of memory required. (default: 500)

       -m overflow_pop_times
               Number of expanded hypotheses required to discon-
               tinue the search.  If the number of expanded
               hypotheses is greater than this threshold, the
               search is discontinued at that point.  The larger
               this value is, the longer the search will con-
               tinue, but processing time for search failures
               will also increase. (default: 2000)

       -lookuprange nframe
               When performing word expansion, this option sets
               the number of frames before and after in which to
               determine next word hypotheses.  This prevents
               the omission of short words, but with a large
               value the number of expanded hypotheses increases
               and the system becomes slower. (default: 5)

   Forced Alignment
       -walign
               Perform Viterbi alignment per word unit from the
               recognition result.  The word boundary frames and
               the average acoustic scores per frame are calcu-
               lated.

       -palign
               Perform Viterbi alignment per phoneme (model)
               unit from the recognition result.  The phoneme
               boundary frames and the average acoustic scores
               per frame are calculated.


       -salign
               Perform Viterbi alignment per HMM state from the
               recognition result.  The state boundary frames
               and the average acoustic scores per frame are
               calculated.

   Server Module Mode
       -module [port]
               Run Julius in "Server Module Mode".  After
               startup, Julius waits for a TCP/IP connection
               from a client.  Once a connection is established,
               Julius communicates with the client, processing
               incoming commands and sending recognition
               results, input trigger information and other sys-
               tem status to the client.  The multi-grammar mode
               is only supported in Server Module Mode.  The
               default port number is 10500.

       -outcode [W][L][P][S][w][l][p][s]
               (Only for Server Module Mode) Select which
               attributes of recognized words are sent to the
               client.  Specify 'W' for output symbol, 'L' for
               grammar entry, 'P' for phoneme sequence and 'S'
               for score.  Capital letters select results of the
               second pass (final result), and lower-case let-
               ters results of the first pass.  For example, to
               send only the output symbols and phoneme
               sequences of the final result to a client, spec-
               ify "-outcode WP".
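
       A module-mode client can be sketched as below (the host,
       the port, and the message framing -- a '.' on a line of
       its own -- are assumptions to verify against the jcontrol
       documentation):

```python
import socket

def split_messages(buf):
    """Split a raw byte stream from the module port into mes-
    sages; each message is assumed to end with a '.' on a line
    of its own."""
    msgs = []
    while b"\n.\n" in buf:
        msg, buf = buf.split(b"\n.\n", 1)
        msgs.append(msg.decode("utf-8", "replace"))
    return msgs, buf  # complete messages, leftover bytes

def read_results(host="localhost", port=10500):
    """Connect to a Julius started with '-module' and yield
    messages as they arrive."""
    with socket.create_connection((host, port)) as sock:
        pending = b""
        while True:
            data = sock.recv(4096)
            if not data:
                return
            msgs, pending = split_messages(pending + data)
            for m in msgs:
                yield m
```

       Messages can then be consumed with
       "for m in read_results(): ...".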

   Message Output
       -separatescore
              Output the language and acoustic scores separately.

        -quiet Omit the phoneme sequence and score, and output
               only the best word sequence hypothesis.

       -progout
              Enable progressive output of the partial results on
              the first pass at regular intervals.

       -proginterval msec
               Set the output time interval of "-progout" in
               milliseconds.

       -demo  Equivalent to "-progout -quiet"

   OTHERS
       -debug (For  debug)  display  internal  status  and  debug
              information.

       -C jconffile
               Load the jconf file.  The options written in the
               file are included and expanded at that point.
               This option can also be used within another jconf
               file.

       -check wchmm
              (For debug) turn on interactive check mode of  tree
              lexicon structure at startup.

       -check triphone
              (For debug) turn on interactive check mode of model
              mapping between Acoustic model, HMMList and dictio-
              nary at startup.

       -version
              Display version information and exit.

       -help  Display a brief description of all options.

EXAMPLES
       For  examples  of system usage, refer to the tutorial sec-
       tion in the Julius documents.

NOTICE
       Note about path names in jconf files: relative paths in  a
       jconf  file  are interpreted as relative to the jconf file
       itself, not to the current directory.

SEE ALSO
       julian(1), mkbingram(1), mkss(1), jcontrol(1), adinrec(1),
       adintool(1), mkdfa(1), mkgshmm(1), wav2mfcc(1)

       http://julius.sourceforge.jp/  (main)
       http://sourceforge.jp/projects/julius/ (development site)

DIAGNOSTICS
       Julius normally returns exit status 0.  If an error
       occurs, Julius exits abnormally with exit status 1.  If
       an input file cannot be found or cannot be loaded for
       some reason, then Julius will skip processing for that
       file.

BUGS
       There are some restrictions on the type and size of the
       models Julius can use.  For a detailed explanation refer
       to the Julius documentation.  For bug reports, inquiries
       and comments please contact julius@kuis.kyoto-u.ac.jp or
       julius@is.aist-nara.ac.jp.

AUTHORS
       Rev.1.0 (1998/02/20)
              Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
              University)

              Development by Akinobu LEE (Kyoto University)

       Rev.1.1 (1998/04/14)

       Rev.1.2 (1998/10/31)

       Rev.2.0 (1999/02/20)

       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.0 (2000/02/14)

       Rev.3.1 (2000/05/11)
              Development of above versions by Akinobu LEE (Kyoto
              University)

       Rev.3.2 (2001/08/15)

       Rev.3.3 (2002/09/11)
              Development  of above versions by Akinobu LEE (Nara
              Institute of Science and Technology)

THANKS TO
       From Rev.3.2 Julius is released by the  "Information  Pro-
       cessing Society, Continuous Speech Consortium".

       The  Windows  DLL  version  was  developed and released by
       Hideki BANNO (Nagoya University).

       The Windows Microsoft Speech API  compatible  version  was
       developed by Takashi SUMIYOSHI (Kyoto University).



                              LOCAL                     JULIUS(1)
