When it comes to language learning, audio input is my religion.
Text input is useful, if not necessary, but it shouldn’t be the main focus.
Learning a language is like learning how to walk. You need your hands at first, or crutches after an injury. But the end goal is to walk without those, with just your legs.
The legs are the audio input. The hands (or crutches) are the text input.
The end goal is to be able to communicate without relying on visual cues.