alex_asr - Python documentation

Contents:

class alex_asr.Decoder

Speech recognition decoder.

__init__(self, model_path)

Initialise recognizer with audio input stream parameters.

Parameters: model_path (str) – Path where the speech recognition models are stored.
accept_audio(self, bytes frame_str)

Pass the given audio buffer to the decoder for decoding.

The buffer is interpreted according to the bits configuration parameter of the loaded model. Usually bits=16, so the buffer is interpreted as an array of 16-bit little-endian signed integers. The sample width can be changed with set_bits_per_sample.

Parameters: frame_str (bytes) – Audio data.
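As a minimal sketch of the 16-bit little-endian signed layout described above, the helper below packs a sequence of integer samples into a bytes object of that shape. The name pcm16_le_bytes is hypothetical and not part of alex_asr; it uses only the standard struct module.

```python
import struct


def pcm16_le_bytes(samples):
    """Pack integers in [-32768, 32767] as 16-bit little-endian signed PCM.

    Hypothetical helper, not part of alex_asr.
    """
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(samples), *samples)
```

The result could then be passed as frame_str, for example decoder.accept_audio(pcm16_le_bytes(samples)), assuming the loaded model uses bits=16.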
decode(self, max_frames=10)

Proceed with decoding the audio.

Parameters: max_frames (int) – Maximum number of frames to decode.
Returns: Number of decoded frames.
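Because decode processes at most max_frames frames per call, a caller will typically invoke it repeatedly. The sketch below assumes that decode returns 0 once no buffered audio is left to process; this is inferred from the return value description, not confirmed. The name decode_available is hypothetical.

```python
def decode_available(decoder, max_frames=10):
    """Call decode() until it reports no newly decoded frames.

    Hypothetical helper; `decoder` is assumed to be an alex_asr.Decoder.
    Assumes decode() returns 0 when nothing is left to process.
    """
    total = 0
    while True:
        n = decoder.decode(max_frames)
        if n <= 0:
            break
        total += n
    # Total number of frames decoded by this call.
    return total
```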
endpoint_detected(self)

Has an endpoint been detected?

Configuration of endpointing is loaded from the model.

Returns: bool, whether an endpoint was detected
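One way endpoint detection might be used in a streaming setting is sketched below: audio is fed chunk by chunk and feeding stops once the decoder reports an endpoint. The name stream_until_endpoint and the chunking scheme are assumptions for illustration, as is the assumption that decode returns 0 when no buffered audio remains.

```python
def stream_until_endpoint(decoder, chunks, max_frames=10):
    """Feed audio chunks until the decoder reports an endpoint.

    Hypothetical helper; `decoder` is assumed to be an alex_asr.Decoder
    and `chunks` an iterable of bytes objects.
    Returns (number of chunks consumed, whether an endpoint was detected).
    """
    consumed = 0
    for chunk in chunks:
        decoder.accept_audio(chunk)
        # Drain whatever the decoder can process from this chunk.
        while decoder.decode(max_frames) > 0:
            pass
        consumed += 1
        if decoder.endpoint_detected():
            return consumed, True
    return consumed, False
```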
finalize_decoding(self)

Finalize the decoding and prepare the internal representation for lattice extraction.

get_best_path(self)

Get current 1-best decoding hypothesis.

Returns: (hypothesis likelihood, list of word ids)
Return type: tuple
get_bits_per_sample(self)

Get number of bits each input sample has.

Returns: int, number of bits
get_final_relative_cost(self)

Get the relative cost of the final states for the decoding so far.

Returns: float, cost
get_ivector(self)

Get Ivector of the latest decoded frame.

Ivector extraction is specified in the model configuration. If the model does not use Ivectors, this function does not return anything.

Returns: list of floats with the ivector
get_lattice(self)

Get word posterior lattice and its likelihood.

NOTE: This call may take around 100 ms, so use it with care in timing-critical applications.

Returns: (lattice likelihood, lattice)
Return type: tuple
get_nbest(self, n=1)

Get n best decoding hypotheses (from word posterior lattice).

Parameters: n (int) – How many hypotheses to generate.
Returns: list of hypotheses; each hypothesis is a tuple (hypothesis probability, list of word ids)
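The word ids in each hypothesis can be turned into readable text with get_word. The following is a small sketch under that assumption; nbest_texts is a hypothetical helper name, and whether finalize_decoding must be called first is not stated here, so the sketch leaves that to the caller.

```python
def nbest_texts(decoder, n=5):
    """Return the n-best hypotheses as (probability, text) pairs.

    Hypothetical helper; `decoder` is assumed to be an alex_asr.Decoder.
    """
    results = []
    for prob, word_ids in decoder.get_nbest(n):
        # Map every word id to its string form and join with spaces.
        text = " ".join(decoder.get_word(w) for w in word_ids)
        results.append((prob, text))
    return results
```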
get_num_frames_decoded(self)

Get number of frames decoded so far.

Returns: int, number of frames decoded
get_trailing_silence_length(self)

Get the number of consecutive silence frames at the end of the utterance.

Returns: int, number of frames at the end of the utterance that the best hypothesis consecutively decodes as silence
get_word(self, word_id)

Get the string form of the word with the given word id.

Parameters: word_id (int) – Word id (e.g. as returned by get_best_path).
Returns: Word string (str).
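As an illustration of how get_word combines with get_best_path, the sketch below converts the current 1-best hypothesis into a plain string. The name best_path_text is hypothetical and not part of alex_asr.

```python
def best_path_text(decoder):
    """Return (likelihood, text) for the current 1-best hypothesis.

    Hypothetical helper; `decoder` is assumed to be an alex_asr.Decoder.
    """
    likelihood, word_ids = decoder.get_best_path()
    # Look up each word id and join the words with spaces.
    text = " ".join(decoder.get_word(w) for w in word_ids)
    return likelihood, text
```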
input_finished(self)

Signal to the decoder that no more input will be added.

reset(self)

Reset the decoder for decoding a new utterance.
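Putting the methods together, a whole-utterance workflow might look like the sketch below. The order of calls (accept_audio, decode, input_finished, get_best_path, reset) is an assumption inferred from the method descriptions, not a documented requirement, and the sketch also assumes decode returns 0 when nothing is left to process. The name recognize_utterance is hypothetical.

```python
def recognize_utterance(decoder, audio_bytes, max_frames=10):
    """Decode one complete utterance and return (likelihood, text).

    Hypothetical helper; `decoder` is assumed to be an alex_asr.Decoder.
    The call order below is an assumption, not a documented contract.
    """
    decoder.accept_audio(audio_bytes)
    while decoder.decode(max_frames) > 0:
        pass
    # Tell the decoder that the utterance is complete, then decode
    # any frames that became available as a result.
    decoder.input_finished()
    while decoder.decode(max_frames) > 0:
        pass
    likelihood, word_ids = decoder.get_best_path()
    text = " ".join(decoder.get_word(w) for w in word_ids)
    # Prepare the decoder for the next utterance.
    decoder.reset()
    return likelihood, text
```

Calling reset at the end lets the same Decoder instance be reused for the next utterance without reloading the model.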

set_bits_per_sample(self, n_bits)

Set number of bits each input sample will have.
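For example, when the audio comes from a WAV file, the sample width reported by the standard wave module could be used to configure the decoder. This is a sketch only: match_sample_width is a hypothetical helper, and it assumes the loaded model can work with the resulting bit width.

```python
def match_sample_width(decoder, wav_reader):
    """Set the decoder's bits per sample to match an open WAV reader.

    Hypothetical helper; `decoder` is assumed to be an alex_asr.Decoder
    and `wav_reader` an object returned by wave.open(..., "rb").
    """
    # getsampwidth() reports bytes per sample; the decoder expects bits.
    bits = 8 * wav_reader.getsampwidth()
    if decoder.get_bits_per_sample() != bits:
        decoder.set_bits_per_sample(bits)
    return bits
```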