alex_asr - Python documentation
-
class alex_asr.Decoder
Speech recognition decoder.
-
__init__(self, model_path)
Initialise recognizer with audio input stream parameters.
Parameters: model_path (str) – Path where the speech recognition models are stored.
-
accept_audio(self, bytes frame_str)
Insert the given audio buffer into the decoder for decoding.
The buffer is interpreted according to the bits configuration parameter of the loaded model. Usually bits=16, so the buffer is interpreted as an array of 16-bit little-endian signed integers. The sample width can be changed with set_bits_per_sample.
Parameters: frame_str (bytes) – Audio data.
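As a minimal sketch of the 16-bit case described above, the helper below packs a list of integer samples into little-endian signed 16-bit bytes with the standard struct module. The name pack_pcm16 is an illustrative assumption and not part of alex_asr; its result has the layout that accept_audio expects when bits=16.

```python
import struct


def pack_pcm16(samples):
    """Pack integer samples as 16-bit little-endian signed PCM bytes.

    Hypothetical helper, not part of alex_asr. Each sample must fit
    in the range -32768..32767, otherwise struct.error is raised.
    """
    return struct.pack("<%dh" % len(samples), *samples)


frame_str = pack_pcm16([0, 1, -1, 32767, -32768])
# frame_str now holds 2 bytes per sample and could be handed to
# Decoder.accept_audio(frame_str) on a model configured with bits=16.
```

Audio read from a 16-bit PCM WAV file is normally already in this layout, so explicit packing is mainly useful for synthesized or converted samples.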
-
decode(self, max_frames=10)
Proceed with decoding the audio.
Parameters: max_frames (int) – Maximum number of frames to decode.
Returns: Number of decoded frames.
-
endpoint_detected(self)
Has an endpoint been detected?
Configuration of endpointing is loaded from the model.
Returns: bool whether endpoint was detected
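The sketch below shows one way accept_audio, decode and endpoint_detected might be combined for streaming input. It rests on two assumptions: that decode returns 0 once the buffered audio has been consumed, and that endpoint_detected can be polled after each chunk. The function decode_until_endpoint is a hypothetical helper, not part of alex_asr.

```python
def decode_until_endpoint(decoder, chunks, max_frames=10):
    """Feed audio chunks to a decoder and stop once an endpoint is seen.

    `decoder` is assumed to be an alex_asr.Decoder (or any object with
    the same accept_audio / decode / endpoint_detected methods).
    `chunks` is an iterable of bytes objects. Returns the number of
    chunks consumed. Hypothetical helper, not part of alex_asr.
    """
    consumed = 0
    for chunk in chunks:
        decoder.accept_audio(chunk)
        consumed += 1
        # Assumption: decode() returns 0 once the buffered audio has
        # been processed, so loop until no further frames are decoded.
        while decoder.decode(max_frames) > 0:
            pass
        if decoder.endpoint_detected():
            break
    return consumed
```

Keeping max_frames small bounds the work done per decode call, which may help when decoding shares a thread with audio capture.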
-
finalize_decoding(self)
Finalize the decoding and prepare the internal representation for lattice extraction.
-
get_best_path(self)
Get the current 1-best decoding hypothesis.
Returns: (hypothesis likelihood, list of word ids)
Return type: tuple
-
get_bits_per_sample(self)
Get number of bits each input sample has.
Returns: int number of bits
-
get_final_relative_cost(self)
Get the relative cost of the final states for the decoding so far.
Returns: float cost
-
get_ivector(self)
Get the Ivector of the latest decoded frame.
Ivector extraction is specified in the model configuration. If the model does not use Ivectors, this function does not return anything.
Returns: list of floats with the ivector
-
get_lattice(self)
Get the word posterior lattice and its likelihood.
NOTE: This call may take around 100 ms, so use it with care in timing-critical applications.
Returns: (lattice likelihood, lattice)
Return type: tuple
-
get_nbest(self, n=1)
Get n best decoding hypotheses (from word posterior lattice).
Parameters: n (int) – How many hypotheses to generate.
Returns: list of hypotheses; each hypothesis is a tuple (hypothesis probability, list of word ids)
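To illustrate the shape of the n-best list, the following sketch picks the most probable hypothesis and computes its lead over the runner-up, which could serve as a rough confidence signal. The helper best_with_margin is hypothetical and works on plain data in the documented (probability, word ids) form; it sorts the list itself instead of assuming an order.

```python
def best_with_margin(nbest):
    """Pick the most probable hypothesis and its lead over the runner-up.

    `nbest` is a list of (probability, word_ids) tuples in the shape
    documented for get_nbest. The list is sorted here rather than
    assuming any particular order. Hypothetical helper, not part of
    alex_asr.
    """
    if not nbest:
        return None, 0.0
    ranked = sorted(nbest, key=lambda hyp: hyp[0], reverse=True)
    best = ranked[0]
    # With a single hypothesis there is no runner-up to compare against.
    runner_up_prob = ranked[1][0] if len(ranked) > 1 else 0.0
    return best, best[0] - runner_up_prob


# Example input in the documented shape; with a real decoder this
# would come from decoder.get_nbest(n=3).
example_nbest = [(0.125, [3]), (0.75, [1, 2]), (0.125, [4])]
best, margin = best_with_margin(example_nbest)
```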
-
get_num_frames_decoded(self)
Get number of frames decoded so far.
Returns: int number of frames decoded
-
get_trailing_silence_length(self)
Get the number of consecutive silence frames at the end of the utterance.
Returns: int number of consecutive frames at the end of the utterance that were decoded as silence in the best hypothesis
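Because the value is expressed in frames, applications usually need to convert it to a duration before comparing it with a timeout. The sketch below does this under the assumption that each decoded frame corresponds to a fixed frame shift. The 10 ms default is a common choice for speech features but is only an assumption here; the real value depends on the model's feature configuration. trailing_silence_seconds is a hypothetical helper.

```python
def trailing_silence_seconds(decoder, frame_shift_s=0.01):
    """Convert the trailing silence length from frames to seconds.

    Assumption: each decoded frame advances the audio by
    `frame_shift_s` seconds. 10 ms is a common feature frame shift,
    but the real value depends on the model's feature configuration.
    Hypothetical helper, not part of alex_asr.
    """
    return decoder.get_trailing_silence_length() * frame_shift_s
```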
-
get_word(self, word_id)
Get the word string for a given word id.
Parameters: word_id (int) – Word id (e.g. as returned by get_best_path).
Returns: Word string (str).
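A short sketch of how get_word might be combined with get_best_path to obtain readable text. The function best_path_text is a hypothetical helper; it only relies on the return shapes documented for the two methods.

```python
def best_path_text(decoder):
    """Return (likelihood, sentence) for the current 1-best hypothesis.

    `decoder` is assumed to behave like alex_asr.Decoder:
    get_best_path() returns (likelihood, word_ids) and get_word()
    maps each id to its string. Hypothetical helper, not part of
    alex_asr.
    """
    likelihood, word_ids = decoder.get_best_path()
    words = [decoder.get_word(word_id) for word_id in word_ids]
    return likelihood, " ".join(words)
```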
-
input_finished(self)
Signal to the decoder that no more input will be added.
-
reset(self)
Reset the decoder for decoding a new utterance.
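Putting the methods together, one complete utterance might be processed as sketched below. The call order (feed audio, decode, signal end of input, finalize, read the result, reset) is an assumption inferred from the method descriptions, as is the expectation that decode returns 0 when no audio is left. recognize_utterance is a hypothetical helper, not part of alex_asr.

```python
def recognize_utterance(decoder, audio, max_frames=10):
    """Decode one complete utterance and leave the decoder ready for the next.

    `decoder` is assumed to behave like alex_asr.Decoder and `audio`
    is a bytes buffer in the sample format the model expects.
    Returns the (likelihood, word_ids) tuple from get_best_path().
    Hypothetical helper, not part of alex_asr.
    """
    decoder.accept_audio(audio)
    # Assumption: decode() returns 0 once all buffered audio is processed.
    while decoder.decode(max_frames) > 0:
        pass
    decoder.input_finished()
    decoder.finalize_decoding()
    result = decoder.get_best_path()
    # Clear the decoder state so the same object can be reused.
    decoder.reset()
    return result


# With the real library this might be used as follows
# (the model directory is a placeholder):
#   from alex_asr import Decoder
#   decoder = Decoder("asr_model_dir/")
#   likelihood, word_ids = recognize_utterance(decoder, frame_str)
```

Reusing one decoder across utterances avoids reloading the models, which is the main reason to call reset instead of constructing a new Decoder.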
-
set_bits_per_sample(self, n_bits)
Set number of bits each input sample will have.
-