alex_asr - Python documentation


class alex_asr.Decoder

Speech recognition decoder.

__init__(self, model_path)

Initialise recognizer with audio input stream parameters.

Parameters: model_path (str) – Path where the speech recognition models are stored.

accept_audio(self, bytes frame_str)

Insert given buffer of audio to the decoder for decoding.

The buffer is interpreted according to the bits configuration parameter of the loaded model. Usually bits=16, so the bytes are interpreted as an array of 16-bit little-endian signed integers. The sample width can be changed with set_bits_per_sample.

Parameters: frame_str (bytes) – Audio data.
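
As a minimal sketch of the expected buffer layout when bits=16, the following packs integer samples into 16-bit little-endian signed bytes using only the standard library. The helper name pcm16_le_bytes is hypothetical and not part of alex_asr.

```python
import struct


def pcm16_le_bytes(samples):
    """Pack ints in [-32768, 32767] as 16-bit little-endian signed PCM.

    Hypothetical helper for illustration; not part of alex_asr.
    """
    samples = list(samples)
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(samples), *samples)


frame = pcm16_le_bytes([0, 1, -1, 32767])
print(len(frame))  # 2 bytes per sample
```

A buffer built this way could then be passed to accept_audio on a decoder whose model uses 16-bit samples.
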
decode(self, max_frames=10)

Proceed with decoding the audio.

Parameters: max_frames (int) – Maximum number of frames to decode.
Returns: Number of decoded frames.
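
One way accept_audio and decode might be combined is sketched below: each chunk is handed to the decoder, then decode is called repeatedly until it reports no further frames. The function name feed_and_decode is hypothetical, and the loop assumes that decode returns 0 once the buffered audio is exhausted, which the text above does not state explicitly.

```python
def feed_and_decode(decoder, chunks, max_frames=10):
    """Feed audio chunks to a decoder and return the total frames decoded.

    Hypothetical helper; `decoder` is any object exposing accept_audio()
    and decode() as documented above.
    """
    total = 0
    for chunk in chunks:
        decoder.accept_audio(chunk)
        while True:
            decoded = decoder.decode(max_frames)
            # Assumption: 0 means no more buffered audio can be decoded.
            if decoded <= 0:
                break
            total += decoded
    return total
```

Keeping max_frames small bounds the time spent in each decode call, which may matter when audio arrives in real time.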

Has an endpoint been detected?

Configuration of endpointing is loaded from the model.

Returns: bool whether endpoint was detected

Finalize the decoding and prepare the internal representation for lattice extraction.


Get current 1-best decoding hypothesis.

Returns: (hypothesis likelihood, list of word ids)
Return type: tuple

Get number of bits each input sample has.

Returns: int number of bits

Get the relative cost of the final states for the decoding so far.

Returns: float cost

Get Ivector of the latest decoded frame.

Ivector extraction is specified in the model configuration. If the model does not use Ivectors, this function does not return anything.

Returns: list of floats with the ivector

Get word posterior lattice and its likelihood.

NOTE: This call may take 100 ms, so use it with care in timing-critical applications.

Returns: (lattice likelihood, lattice)
Return type: tuple

get_nbest(self, n=1)

Get n best decoding hypotheses (from word posterior lattice).

Parameters: n (int) – How many hypotheses to generate.
Returns: list of hypotheses; each hypothesis is a tuple (hypothesis probability, list of word ids)
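
Because get_nbest returns word ids rather than strings, a caller will usually map them through get_word. The sketch below shows one possible way to do that; the helper name nbest_as_text is hypothetical, and the decoder argument is assumed to expose get_nbest and get_word as documented here.

```python
def nbest_as_text(decoder, n=3):
    """Return [(probability, sentence), ...] for the n best hypotheses.

    Hypothetical helper; relies only on get_nbest() and get_word().
    """
    results = []
    for prob, word_ids in decoder.get_nbest(n):
        # Translate each word id to its string form and join into a sentence.
        words = [decoder.get_word(word_id) for word_id in word_ids]
        results.append((prob, " ".join(words)))
    return results
```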

Get number of frames decoded so far.

Returns: int number of frames decoded

Get the number of consecutive silence frames at the end of the utterance.

Returns: int number of consecutive frames at the end of the utterance that the best hypothesis decodes as silence

get_word(self, word_id)

Get the word string for a given word id.

Parameters: word_id (int) – Word id (e.g. as returned by get_best_path).
Returns: Word string (str).

Signal to the decoder that no more input will be added.


Reset the decoder for decoding a new utterance.

set_bits_per_sample(self, n_bits)

Set the number of bits each input sample will have.