alex_asr - Python documentation¶

class alex_asr.Decoder¶

Speech recognition decoder.

__init__(self, model_path)¶

Initialise the recognizer from the models stored at model_path.

Parameters:	model_path (str) – Path where the speech recognition models are stored.

accept_audio(self, bytes frame_str)¶

Pass the given audio buffer to the decoder for decoding.

The buffer is interpreted according to the bits configuration parameter of the loaded model. Usually bits=16, so the bytes are interpreted as an array of 16-bit little-endian signed integers. The sample width can be changed with set_bits_per_sample.

Parameters:	frame_str (bytes) – Audio data.
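
To illustrate the buffer layout described above, here is a minimal sketch that packs Python integers into 16-bit little-endian signed samples with the standard struct module. The helper name pack_samples_16bit is hypothetical, and the sketch assumes the loaded model uses bits=16.

```python
import struct


def pack_samples_16bit(samples):
    """Pack integers into 16-bit little-endian signed PCM bytes.

    Hypothetical helper, assuming the loaded model uses bits=16.
    """
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(samples), *samples)


frame = pack_samples_16bit([0, 1, -1, 32767, -32768])
print(len(frame))  # two bytes per sample
# The resulting bytes object could then be passed on, e.g.:
# decoder.accept_audio(frame)
```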

decode(self, max_frames=10)¶

Proceed with decoding the audio.

Parameters:	max_frames (int) – Maximum number of frames to decode.
Returns:	Number of decoded frames.
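
Because decode processes at most max_frames frames per call, callers may want to call it repeatedly after feeding audio. The sketch below shows one way to do that. The helper name drain_decoder is hypothetical, and it assumes that decode returns 0 once no buffered audio remains to be decoded, which this page does not state explicitly.

```python
def drain_decoder(decoder, max_frames=10):
    """Call decode() repeatedly and return the total number of decoded frames.

    Hypothetical helper; assumes decode() returns 0 when nothing is left.
    """
    total = 0
    while True:
        decoded = decoder.decode(max_frames)
        if decoded <= 0:
            break
        total += decoded
    return total


# Possible usage with an alex_asr.Decoder instance:
# decoder.accept_audio(frame)
# n_frames = drain_decoder(decoder)
```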

endpoint_detected(self)¶

Has an endpoint been detected?

Configuration of endpointing is loaded from the model.

Returns:	bool whether endpoint was detected
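
In a streaming setting, endpoint_detected can be polled after each chunk of audio. The following is a sketch only: stream_until_endpoint is a hypothetical helper, and calling decode once per chunk is an assumption about a reasonable call pattern, not something this page prescribes.

```python
def stream_until_endpoint(decoder, chunks, max_frames=10):
    """Feed audio chunks until the decoder reports an endpoint.

    Hypothetical helper. Returns the number of chunks consumed.
    """
    consumed = 0
    for chunk in chunks:
        decoder.accept_audio(chunk)
        decoder.decode(max_frames)
        consumed += 1
        # Stop feeding audio as soon as the model's endpointing rule fires.
        if decoder.endpoint_detected():
            break
    return consumed
```

Whether an endpoint fires depends entirely on the endpointing configuration loaded from the model, so the helper may also run through all chunks without stopping early.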

finalize_decoding(self)¶

Finalize the decoding and prepare the internal representation for lattice extraction.

get_best_path(self)¶

Get current 1-best decoding hypothesis.

Returns:	(hypothesis likelihood, list of word ids)
Return type:	tuple

get_bits_per_sample(self)¶

Get number of bits each input sample has.

Returns:	int number of bits

get_final_relative_cost(self)¶

Get the relative cost of the final states for the decoding so far.

Returns:	float cost

get_ivector(self)¶

Get the i-vector of the latest decoded frame.

I-vector extraction is specified in the model configuration. If the model does not use i-vectors, this function does not return anything.

Returns:	list of floats with the i-vector

get_lattice(self)¶

Get word posterior lattice and its likelihood.

NOTE: This call may take 100 ms, so use it with care in timing-critical applications.

Returns:	(lattice likelihood, lattice)
Return type:	tuple

get_nbest(self, n=1)¶

Get n best decoding hypotheses (from word posterior lattice).

Parameters:	n (int) – How many hypotheses to generate.
Returns:	list of hypotheses; each hypothesis is a tuple (hypothesis probability, list of word ids)
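
The word ids in each hypothesis can be turned into text with get_word. A minimal sketch, in which nbest_texts is a hypothetical helper name and joining words with a single space is an assumption about the desired output format:

```python
def nbest_texts(decoder, n=5):
    """Return a list of (probability, text) pairs for the n best hypotheses.

    Hypothetical helper built on get_nbest() and get_word().
    """
    results = []
    for probability, word_ids in decoder.get_nbest(n):
        text = " ".join(decoder.get_word(word_id) for word_id in word_ids)
        results.append((probability, text))
    return results
```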

get_num_frames_decoded(self)¶

Get number of frames decoded so far.

Returns:	int number of frames decoded

get_trailing_silence_length(self)¶

Get the number of consecutive silence frames at the end of the utterance.

Returns:	int number of consecutive frames at the end of the utterance that were decoded as silence in the best hypothesis

get_word(self, word_id)¶

Get the string form of a word given its word id.

Parameters:	word_id (int) – Word id (e.g. as returned by get_best_path).
Returns:	Word string (str).

input_finished(self)¶

Signal to the decoder that no more input will be added.

reset(self)¶

Reset the decoder for decoding a new utterance.
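
One decoder instance can be reused for several utterances by calling reset between them. The sketch below is an assumption-laden illustration rather than a prescribed workflow: recognize_utterances is a hypothetical helper, the order of calls (accept_audio, decode, input_finished, get_best_path, reset) is one plausible sequence, and decode is assumed to return 0 once the buffered audio is consumed.

```python
def recognize_utterances(decoder, utterances, max_frames=10):
    """Decode each audio buffer in turn and return one text per utterance.

    Hypothetical helper; the call order is an assumption, not a requirement.
    """
    texts = []
    for audio in utterances:
        decoder.accept_audio(audio)
        # Assumption: decode() returns 0 once the buffered audio is consumed.
        while decoder.decode(max_frames) > 0:
            pass
        decoder.input_finished()
        _likelihood, word_ids = decoder.get_best_path()
        texts.append(" ".join(decoder.get_word(w) for w in word_ids))
        # Clear the decoder state before the next utterance.
        decoder.reset()
    return texts
```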

set_bits_per_sample(self, n_bits)¶

Set number of bits each input sample will have.
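
When the audio comes from a WAV file, the sample width can be read from the file header and passed to set_bits_per_sample. This is a sketch under assumptions: match_bits_to_wav is a hypothetical helper, it relies on the standard wave module, and it does not check whether the loaded model actually supports the resulting sample width.

```python
import wave


def match_bits_to_wav(decoder, source):
    """Set the decoder's sample width to match a WAV file.

    Hypothetical helper. `source` is a file path or a binary file object.
    Returns the number of bits per sample found in the WAV header.
    """
    with wave.open(source, "rb") as wav:
        # getsampwidth() reports bytes per sample, so convert to bits.
        bits = wav.getsampwidth() * 8
    if decoder.get_bits_per_sample() != bits:
        decoder.set_bits_per_sample(bits)
    return bits
```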