(Top) Training time (in seconds) required by each method on the InaSpoof-vl dataset. (Bottom) Inference time (in seconds) required by each method to predict the evaluation subset. The prefix ‘Raw-’ indicates that the method uses raw speech waveform features as input.