JULIAN(1)                                               JULIAN(1)



NAME
       Julian - open-source grammar-based continuous speech
       recognition parser

SYNOPSIS
       julian [-C jconffile] [options ...]

DESCRIPTION
       Julian is a multi-purpose speech recognition parser based
       on finite state grammar.  It is another variation of
       Julius, and is included in the distribution of Julius.
       It is capable of almost real-time recognition of continu-
       ous speech with over ten thousand words on most current
       PCs.

       Hand-written finite state grammars and triphone HMM
       acoustic models of any unit and size can be used.  The
       grammar format is Julian's own, and tools to create a
       recognition grammar are included in the distribution.
       Standard formats are adopted for acoustic models.  Users
       can write their own grammars and acoustic models to build
       recognition systems of their own with Julian.

       Julian can perform recognition on audio files, live micro-
       phone input, network input and  feature  parameter  files.
       The maximum size of vocabulary is 65,535 words.

RECOGNITION MODELS
       Julian supports the following models.

       Acoustic Models
                  Same as Julius: sub-word HMMs (Hidden Markov
                  Models) in HTK format are supported.  Phoneme
                  models (monophone), context-dependent phoneme
                  models (triphone), tied-mixture and phonetic
                  tied-mixture models of any unit can be used.
                  When using context-dependent models, interword
                  context is also handled.

        Language model
                  For the task grammar, sentence structures are
                  written in a BNF style in a grammar file,
                  using word categories as terminal symbols.  A
                  voca file contains the pronunciation (phoneme
                  sequence) of every word in each category.
                  These files are converted by mkdfa.pl(1) to a
                  deterministic finite automaton file (.dfa) and
                  a dictionary file (.dict).
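                  As an illustrative sketch (the category, word
                  and phoneme names below are hypothetical, and
                  silB/silE are assumed silence models from the
                  acoustic model), a minimal grammar/voca pair
                  might look like this:

```
# sample.grammar -- sentence structures, word categories as terminals
S        : NS_B GREETING NS_E

# sample.voca -- pronunciation (phoneme sequence) of each category's words
% GREETING
HELLO    h e l ow
GOODBYE  g u d b a i
% NS_B
<s>      silB
% NS_E
</s>     silE
```

                  Running mkdfa.pl(1) on these files would then
                  produce sample.dfa and sample.dict for use
                  with the -dfa and -v options.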

SPEECH INPUT
       Same as Julius: speech waveform files (16bit WAV (no com-
       pression), RAW format, and many others when built with
       the libsndfile library) and feature parameter files (HTK
       format) can be used as speech input.  Live input from a
       microphone, a DatLink (NetAudio) system, or a TCP/IP net-
       work is also supported.

       Notice: Julian can only extract MFCC_E_D_N_Z features
       internally.  If you want to use HMMs based on another
       type of feature extraction, microphone input and speech
       waveform files cannot be used.  Use an external tool such
       as HCopy or wav2mfcc to create the appropriate feature
       parameter files.

SEARCH ALGORITHM
       The recognition algorithm of Julian is based on a two-
       pass strategy.  In the first pass, a high-speed approxi-
       mate search is performed using weaker constraints than
       the given grammar: an LR beam search using only inter-
       category constraints extracted from the grammar.  The
       second pass re-searches the input, using the original
       grammar rules and the intermediate results of the first
       pass, to obtain a high-precision result quickly.  In the
       second pass the optimal solution is theoretically guaran-
       teed by the A* search.

       When using context dependent phones (triphones), interword
       contexts are taken into consideration.   For  tied-mixture
       and  phonetic  tied-mixture  models,  high-speed  acoustic
       likelihood calculation is possible using gaussian pruning.

       For  more  details,  see  the related document or web site
       below.

OPTIONS
       The options below allow you to specify the models and set
       system parameters.  You can set these options on the com-
       mand line; however, it is recommended that you collect
       them in a "jconf settings file" and use the "-C" option
       to read it at run time.
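       For instance, a jconf file collecting typical Julian op-
       tions might look like the following sketch (all file
       names are placeholders):

```
# sample.jconf -- example Julian settings (file names are placeholders)
-dfa    sample.dfa       # finite state automaton grammar
-v      sample.dict      # word dictionary
-h      hmmdefs          # acoustic model (HTK format)
-hlist  triphones.list   # HMMList, required for triphone models
-input  mic              # live microphone input
```

       Julian would then be started with "julian -C sam-
       ple.jconf".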

       Most are the same as Julius.
       Options only in Julian: -dfa, -penalty1,  -penalty2,  -sp,
       -looktrellis
       Options  only  in  Julius:  -nlr,  -nrl,  -d, -lmp, -lmp2,
       -transp,  -silhead,  -siltail,  -spdur,  -sepnum,   -sepa-
       ratescore

       Below is an explanation of all the available options.

   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
               Select the speech data input source.  'rawfile'
               reads from a waveform file (the file name is
               specified after startup).  'mfcfile' is a feature
               vector file extracted by the HTK HCopy tool.
               'mic' means live microphone input, and 'adinnet'
               means receiving waveform data via a TCP/IP net-
               work from an adinnet client.  'stdin' means stan-
               dard input.

               The supported waveform file formats vary with the
               compile-time configuration.  To see which formats
               are actually supported, see the help message
               shown by the "-help" option.  (For stdin input,
               only WAV (no compression) and RAW (16bit, big-
               endian) are supported.)
               (default: mfcfile)

       -filelist file
              (with  -input  rawfile|mfcfile) perform recognition
              on all files contained within the target file.

       -adport portnum
              (with -input adinnet) adinnet port number (default:
              5530)

       -NA server:unit
              (with -input netaudio) set the server name and unit
              ID of the Datlink unit.

   Speech Detection
       -cutsilence

       -nocutsilence
              Force silence cutting (=speech  segment  detection)
              ON/OFF.  (default:  ON  for  mic/adinnet,  OFF  for
              files)

       -lv threslevel
              Amplitude threshold (0 - 32767).  If the  amplitude
              passes  this  threshold  it is considered to be the
              beginning of a speech segment, if  it  drops  below
              this  level  then  it is the end of the speech seg-
              ment. (default: 3000)

       -zc zerocrossnum
               Zero crossing threshold per second (default: 60)

       -headmargin msec
              Margin at the start of the speech segment  in  mil-
              liseconds. (default: 300)

       -tailmargin msec
              Margin  at  the  end  of the speech segment in mil-
              liseconds. (default: 400)

        -nostrip
               On some sound devices, invalid "0" samples may be
               recorded at the start and end of recording.  Ju-
               lian removes them automatically by default.  This
               option inhibits the automatic removal.

   Acoustic Analysis
        -smpFreq frequency
               Sampling frequency (Hz).
               (default: 16000Hz, i.e. a period of 625).

        -smpPeriod period
               Sampling period, in units of 100 nanoseconds
               (the HTK convention).
               (default: 625 = 16kHz).

       -fsize sample
              Analysis  window size (No. samples) (default: 400).

       -fshift sample
              Frame shift (No. samples) (default: 160).

       -delwin frame
              Delta window size (No. frames) (default: 2).

       -hipass frequency
              High-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -lopass frequency
              Low-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -sscalc
              Perform spectral subtraction using the head silence
              of files.  Valid only for rawfile input.

        -sscalclen msec
               Specify the length of the head silence in milli-
               seconds (default: 300)

       -ssload filename
              Perform spectral subtraction for speech input using
              pre-estimated  noise spectrum from file.  The noise
              spectrum data  should  be  computed  beforehand  by
              mkss.

        -ssalpha value
               Alpha coefficient for spectral subtraction.
               Larger values subtract the noise more strongly,
               but also make distortion of the resulting signal
               more noticeable.  (default: 2.0)

        -ssfloor value
               Flooring coefficient for spectral subtraction.
               Spectral components that fall below zero after
               subtraction are replaced by the source signal
               multiplied by this coefficient. (default: 0.5)

   Language Model (Finite State Grammar)
       -dfa dfa_filename
              finite state automaton grammar file. (required)

       -penalty1 float
              Word   insertion   penalty   for  the  first  pass.
              (default: 0.0)

        -penalty2 float
               Word  insertion  penalty  for  the  second  pass.
               (default: 0.0)

   Word Dictionary
       -v dictionary_file
              Word dictionary file (required)

        -sp {WORD|WORD[OUTSYM]|#num}
               Name of the short pause model as defined in the
               hmmdefs.  (default: "sp")

               Words that have this model as their pronunciation
               and are intended to match the short pauses be-
               tween words are handled specially by Julian to
               deal with short-pause insertion.  They can be de-
               fined as shown below.

                                       Example
           Word_name                     <s>
           Word_name[output_symbol]   <s>[silB]
           #Word_ID                      #14

            (Word_ID is the word position in the dictionary
             file starting from 0)

       -forcedict
              Disregard dictionary errors.  Word definitions with
              errors will be skipped on startup.

   Acoustic Model (HMM)
       -h hmmfilename
              HMM definition file to use. (required)

        -hlist HMMlistfilename
               HMMList file to use.  Required when using tri-
               phone based HMMs.  This file provides a mapping
               between the logical triphone names generated from
               the phonetic representation in the dictionary and
               the HMM definition names.

       -iwcd1 {max|avg}
              When  using a triphone model, select method to han-
              dle inter-word triphone context on  the  first  and
              last phone of a word in the first pass.

              max: use maximum likelihood of the same
                   context triphones (default)
              avg: use average likelihood of the same
                   context triphones

        -force_ccd / -no_ccd
               Normally Julian determines whether the specified
               hmmdefs is a context-dependent model from the
               model definition names, i.e., whether the model
               names contain the characters '+' and '-'.  In
               case the automatic detection fails, you can spec-
               ify it explicitly with these options, which over-
               ride the automatic detection result.

       -notypecheck
              Disable   check   of   the  input  parameter  type.
              (default: enabled)

   Acoustic Computation
        Gaussian Pruning is automatically enabled when using a
        tied-mixture based acoustic model.  Gaussian Selection
        requires a monophone model converted by mkgshmm to be
        activated.

       -tmix K
              With  Gaussian Pruning, specify the number of Gaus-
              sians to compute per codebook. (default: 2)

       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default: safe (setup=standard) beam (setup=fast))

        -gshmm hmmdefs
               Specify the monophone hmmdefs to use for Gaussian
               Mixture Selection.  The monophone model for GMS
               is generated from an ordinary monophone HMM model
               using mkgshmm.  This option is disabled by de-
               fault (no GMS applied).

        -gsnum N
               When using GMS, specify the number of monophone
               states to select from all the monophone states.
               (default: 24)

   Search Parameters (First Pass)
       -b beamwidth
              Beam width (Number of HMM nodes).   As  this  value
              increases  the  precision  also increases, however,
              processing time and memory usage also increase.

              default value: acoustic model dependent
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM, setup=v2.1)

       -1pass Only perform the first pass search.  This  mode  is
              automatically set when no 3-gram language model has
              been specified (-nlr).



       -realtime

       -norealtime
               Explicitly specify whether real-time (pipeline)
               processing is performed in the first pass.  For
               file input the default is OFF (-norealtime); for
               microphone, adinnet and NetAudio input the de-
               fault is ON (-realtime).  This option affects how
               CMN is performed: when OFF, CMN is calculated for
               each input independently; when ON, the previous 5
               seconds of input are always used.  Also refer to
               -progout.

        -cmnsave filename
               Save the last CMN parameters computed during
               recognition to the specified file.  The parame-
               ters are saved each time an input is recognized,
               so the file always holds the most recent CMN pa-
               rameters.  If the output file already exists, it
               will be overwritten.

       -cmnload filename
              Load  initial  CMN parameters previously saved in a
              file by "-cmnsave".  This option enables Julian  to
              recognize  the first utterance of a live microphone
              input or adinnet input with CMN.

   Search Parameters (Second Pass)
        -b2 hyponum
               Beam width (number of hypotheses) in the second
               pass.  If the number of word expansions at a
               given hypothesis length reaches this limit during
               the search, shorter hypotheses are not expanded
               further.  This prevents the search from falling
               into a breadth-first-like state that stalls at
               the same position, and reduces search failures.
               (default: 30)

        -n candidatenum
               The search continues until 'candidatenum' sen-
               tence hypotheses have been found.  The obtained
               sentence hypotheses are sorted by score, and the
               final result is displayed in that order (see also
               the "-output" option).

               The probability that the optimum hypothesis is
               found increases with this value, but the process-
               ing time also becomes longer.

              Default value depends on the  engine setup on  com-
              pilation time:
                10  (standard)
                 1  (fast, v2.1)

        -output N
               The top N sentence hypotheses will be output at
               the end of the search.  Use with the "-n" option.
               (default: 1)

        -sb score
               Score envelope width for enveloped scoring.  When
               calculating the score of each generated hypothe-
               sis, its trellis expansion and Viterbi operation
               are pruned partway through the input if the score
               at a frame falls below [current maximum score of
               the frame - width].  A small value lowers the
               computational cost of the second pass, but may
               introduce scoring errors.  (default: 80.0)

        -s stack_size
               The maximum number of hypotheses that can be
               stored on the stack during the search.  A larger
               value may give more stable results, but increases
               the amount of memory required. (default: 500)

        -m overflow_pop_times
               Number of expanded hypotheses at which the search
               is discontinued.  If the number of expanded hy-
               potheses exceeds this threshold, the search is
               discontinued at that point.  The larger this
               value, the longer the search may continue, but
               processing time for search failures also in-
               creases. (default: 2000)

        -lookuprange nframe
               When performing word expansion, this option sets
               the number of frames before and after in which to
               determine the next word hypotheses.  This pre-
               vents the omission of short words, but with a
               large value the number of expanded hypotheses in-
               creases and the system becomes slower. (default:
               5)

        -looktrellis
               Expand only the trellis words instead of all
               grammar-permitted words.  This option makes sec-
               ond pass decoding faster, but may increase dele-
               tion errors of short words. (default: disabled)

   Forced Alignment
        -walign
               Perform Viterbi alignment in word units on the
               recognition result.  The word boundary frames and
               the average acoustic scores per frame are calcu-
               lated.

        -palign
               Perform Viterbi alignment in phoneme (model)
               units on the recognition result.  The phoneme
               boundary frames and the average acoustic scores
               per frame are calculated.

        -salign
               Perform Viterbi alignment per HMM state on the
               recognition result.  The state boundary frames
               and the average acoustic scores per frame are
               calculated.

   Server Module Mode
        -module [port]
               Run Julian in "Server Module Mode".  After start-
               up, Julian waits for a TCP/IP connection from a
               client.  Once a connection is established, Julian
               communicates with the client, processing incoming
               commands and sending recognition results, input
               trigger information and other system status to
               the client.  Multi-grammar mode is supported only
               in this Server Module Mode.  The default port
               number is 10500.

        -outcode [W][L][P][S][w][l][p][s]
               (Only for Server Module Mode) Select which attri-
               butes of recognized words are sent to the client.
               Specify 'W' for the word symbol, 'L' for the
               grammar entry, 'P' for the phoneme sequence, and
               'S' for the score.  Capital letters refer to the
               second pass (final result), and small letters to
               the results of the first pass.  For example, to
               send only the output symbols and phoneme se-
               quences of a recognition result to a client,
               specify "-outcode WP".

   Message Output
        -quiet Omit the phoneme sequence and score, outputting
               only the best word sequence hypothesis.

       -progout
              Enable progressive output of the partial results on
              the first pass at regular intervals.

        -proginterval msec
               Set the output time interval of "-progout" in
               milliseconds.

       -demo  Equivalent to "-progout -quiet"

   OTHERS
       -debug (For  debug)  display  internal  status  and  debug
              information.

        -C jconffile
               Load the jconf file.  The options written in the
               file are read and expanded at that point.  This
               option can also be used within another jconf
               file.

       -check wchmm
              (For  debug) turn on interactive check mode of tree
              lexicon structure at startup.

       -check triphone
              (For debug) turn on interactive check mode of model
              mapping between Acoustic model, HMMList and dictio-
              nary at startup.

       -version
              Display version information and exit.

       -help  Display a brief description of all options.

EXAMPLES
       For examples of system usage, refer to the  tutorial  sec-
       tion in the Julian documents.
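
       As a minimal sketch complementing the tutorial (model
       and file names here are placeholders), typical invoca-
       tions look like:

```
# recognize waveform files listed in files.txt with a grammar
julian -C sample.jconf -input rawfile -filelist files.txt

# live microphone recognition in Server Module Mode on port 10500
julian -C sample.jconf -input mic -module 10500
```

       All of the options used above are described in the
       OPTIONS section.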

NOTICE
       Note  about path names in jconf files: relative paths in a
       jconf file are interpreted as relative to the  jconf  file
       itself, not to the current directory.

SEE ALSO
       julius(1), mkbingram(1), mkss(1), jcontrol(1), adinrec(1),
       adintool(1), mkdfa(1), mkgsmm(1), wav2mfcc(1)

       http://julius.sourceforge.jp/  (main)
       http://sourceforge.jp/projects/julius/ (development site)

DIAGNOSTICS
       Julian normally returns exit status 0.  If an error oc-
       curs, Julian exits abnormally with exit status 1.  If an
       input file cannot be found or cannot be loaded for some
       reason, Julian skips processing for that file.

BUGS
       There are some restrictions on the type and size of the
       models Julian can use.  For a detailed explanation refer
       to the Julius documentation.  For bug reports, inquiries
       and comments, please contact julius@kuis.kyoto-u.ac.jp or
       julius@is.aist-nara.ac.jp.

AUTHORS
       Rev.1.0 (1998/07/20)
              Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
              University)

       Rev.2.0 (1999/02/20)

       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.1 (2000/05/11)
              Development of above versions by Akinobu LEE (Kyoto
              University)

       Rev.3.2 (2001/08/15)

       Rev.3.3 (2002/09/11)
              Development of above versions by Akinobu LEE  (Nara
              Institute of Science and Technology)

THANKS TO
       Since Rev.3.2, Julian has been released through the "In-
       formation Processing Society, Continuous Speech Consor-
       tium".

       The Windows Microsoft Speech API  compatible  version  was
       developed by Takashi SUMIYOSHI (Kyoto University).



                              LOCAL                     JULIAN(1)
