You told the software what to do, it responded, and nothing about the surface itself changed.
GenAI is breaking that model. The interface is no longer a static skin over a static app. It is becoming adaptive, multimodal, and, in its most advanced form, agentic: reading your context, anticipating your needs, and sometimes acting before you ask. The jump from static to adaptive was big. The leap to agentic is bigger still, because it moves the interface from passive listener to active collaborator. That leap will also demand trust, visible memory, and the ability to roll back actions.
Voice as the proof point
Alexa and Google Assistant taught consumers to talk to devices, but most flows still end with a generic response. In India, vernacular voice search for commerce has shown what happens when you close the loop: checkout conversion rates roughly double compared to typed English queries. The magic is not just in recognising the words. It is in remembering that you ordered turmeric last week, asking if you want it again, and placing the order in one flow.
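To make that loop concrete, here is a minimal TypeScript sketch of an action-closing reorder flow. Every name in it (PastOrder, proposeReorder, placeOrder) is illustrative, not any assistant's real API:

```ts
interface PastOrder {
  sku: string;
  label: string; // e.g. "turmeric, 200g"
  lastOrdered: Date;
}

interface Proposal {
  prompt: string; // what the assistant says back
  execute: () => Promise<void>;
}

// Match a transcribed query against recent orders and return a
// proposal that is one confirmation away from a placed order.
function proposeReorder(
  query: string,
  history: PastOrder[],
  placeOrder: (sku: string) => Promise<void>,
): Proposal | null {
  const q = query.toLowerCase();
  const hit = history.find((o) =>
    q.includes(o.label.split(",")[0].trim().toLowerCase()),
  );
  if (!hit) return null;
  const days = Math.round((Date.now() - hit.lastOrdered.getTime()) / 86_400_000);
  return {
    prompt: `You ordered ${hit.label} ${days} days ago. Order it again?`,
    execute: () => placeOrder(hit.sku), // a single "yes" closes the loop
  };
}
```

The shape matters more than the matching logic: the assistant returns a proposal whose execution is one confirmation away, not a generic answer.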
Emotion as a surface
Emotion is making its way into the interface. Affectiva can read micro-expressions. Hume adjusts tone in real time. In mental health companions, stress detection triggers shorter, simpler suggestions, cutting drop-off. Even in non-therapeutic contexts, emotion-aware error handling can keep people in the loop: a system that acknowledges frustration gets more retries than one that fails silently.
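A rough sketch of what that shaping could look like in code; the stress signal, the thresholds, and the copy are all assumptions for illustration, not Affectiva's or Hume's APIs:

```ts
interface AffectSignal {
  stress: number; // 0 (calm) to 1 (high stress), from a detector
}

// Trim suggestions as stress rises instead of showing the same
// dense list regardless of the user's state.
function shapeSuggestions(suggestions: string[], affect: AffectSignal): string[] {
  if (affect.stress > 0.7) return suggestions.slice(0, 1);
  if (affect.stress > 0.4) return suggestions.slice(0, 3);
  return suggestions;
}

// Emotion-aware error handling: acknowledge frustration rather than
// failing silently, to invite one more retry.
function errorMessage(affect: AffectSignal): string {
  return affect.stress > 0.5
    ? "That didn't work, and I know it's frustrating. One more try?"
    : "That didn't work. Want to try again?";
}
```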
Gaze as input
Gaze is emerging as both an accessibility win and a commerce driver. Tobii’s eye-tracking lets users navigate without touch. In AR retail pilots, dwell time on a product triggers an instant overlay with reviews and offers. It is powerful, but it underlines a gap: without a trust layer showing exactly what is being logged and why, adoption will stall.
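One way the dwell trigger and a user-visible capture log could fit together, sketched below. The types and the 800ms threshold are assumptions, not Tobii's SDK or any AR platform's API:

```ts
interface GazeSample { targetId: string; timestampMs: number; }
interface CaptureLogEntry { what: string; why: string; at: number; }

const DWELL_MS = 800; // assumed threshold before the overlay fires
const captureLog: CaptureLogEntry[] = []; // surfaced to the user, not hidden

let current: { targetId: string; since: number } | null = null;

function onGaze(sample: GazeSample, showOverlay: (id: string) => void): void {
  // Reset the dwell timer whenever the gaze target changes.
  if (!current || current.targetId !== sample.targetId) {
    current = { targetId: sample.targetId, since: sample.timestampMs };
    return;
  }
  if (sample.timestampMs - current.since >= DWELL_MS) {
    showOverlay(sample.targetId);
    // Trust layer: record exactly what was logged and why.
    captureLog.push({
      what: `dwell on ${sample.targetId}`,
      why: "triggered the reviews/offers overlay",
      at: sample.timestampMs,
    });
    current = null; // fire once per dwell
  }
}
```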
Gesture and scene understanding
Gesture, once limited to gaming demos, becomes much more valuable when combined with GenAI scene understanding. In surgical training, gesture-driven UIs adapt difficulty based on precision. In fitness and factory training, they adjust pacing to fatigue cues. The India-specific opening here is in cost: low-cost camera hardware plus on-device models could make sub-₹5k gesture-first devices viable for learning, sport, or work.
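A minimal sketch of precision-driven adaptation; the precision score, thresholds, and difficulty bounds are placeholders, not any training platform's actual logic:

```ts
interface GestureResult {
  precision: number; // 0 to 1, closeness to the target motion
}

// Nudge difficulty up when the trainee is precise and down when
// precision drops (a fatigue cue), within fixed bounds.
function nextDifficulty(level: number, recent: GestureResult[]): number {
  if (recent.length === 0) return level;
  const avg = recent.reduce((sum, r) => sum + r.precision, 0) / recent.length;
  if (avg > 0.85) return Math.min(level + 1, 10);
  if (avg < 0.6) return Math.max(level - 1, 1);
  return level;
}
```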
Multimodal surfaces as the real prize
Meta’s AR glasses, Humane’s AI Pin, and Apple’s Vision Pro are all trying to make voice, gaze, gesture, and text just different ways into the same shared state. In that world, the “app” is just the agent’s current context, wherever you meet it.
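In code, that shared state can be as small as an event log that every modality publishes to and every surface subscribes to. A sketch, with all the shapes assumed rather than taken from any shipping platform:

```ts
type Modality = "voice" | "gaze" | "gesture" | "text";

interface ContextEvent {
  modality: Modality;
  intent: string; // e.g. "inspect:product-42"
  at: number;
}

// The "app" reduces to this context plus whatever surface renders it.
class SharedContext {
  private events: ContextEvent[] = [];
  private listeners: Array<(e: ContextEvent) => void> = [];

  // Any modality can write; every surface sees the same state.
  publish(e: ContextEvent): void {
    this.events.push(e);
    this.listeners.forEach((fn) => fn(e));
  }

  subscribe(fn: (e: ContextEvent) => void): void {
    this.listeners.push(fn);
  }

  latestIntent(): string | undefined {
    return this.events.at(-1)?.intent;
  }
}
```

A voice command and a gaze dwell become two writes into the same log, and glasses, phone, and watch all read from it.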
What is working now
- Low-latency vernacular voice interfaces are producing measurable lifts in commerce.
- Adaptive pacing in learning platforms like Osmo and Supernova is driving higher completion rates.
- Health UIs like Oura and Whoop have stripped back complexity to deliver one clear decision metric a day.
- Cross-device continuity, picking up a task on your watch where you left it on your phone, is starting to feel natural.
What is still missing
- Few interfaces make an AI’s memory visible.
- You cannot easily see, edit, or delete what it knows about you in the moment.
- Inline explainability is rare; rollback is even rarer (a sketch of visible, reversible memory follows this list).
- Unified state across devices in low-connectivity environments is almost non-existent, yet critical for scale in markets like India.
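What visible, reversible memory could look like, at minimum: a store the user can list, edit, forget, and roll back. An in-memory sketch with illustrative names:

```ts
interface MemoryItem { key: string; value: string; learnedAt: number; }

class VisibleMemory {
  private items = new Map<string, MemoryItem>();
  private undoStack: Array<Map<string, MemoryItem>> = [];

  private snapshot(): void {
    this.undoStack.push(new Map(this.items));
  }

  // "What do you know about me?" answered in the interface itself.
  list(): MemoryItem[] {
    return [...this.items.values()];
  }

  set(key: string, value: string): void {
    this.snapshot();
    this.items.set(key, { key, value, learnedAt: Date.now() });
  }

  // Deletion is a first-class, in-the-moment action, not a settings page.
  forget(key: string): void {
    this.snapshot();
    this.items.delete(key);
  }

  // Rollback: restore the state before the last change.
  undo(): void {
    const prev = this.undoStack.pop();
    if (prev) this.items = prev;
  }
}
```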
Why India is the proving ground
Vernacular multimodal interaction is a must, not a novelty. GPU-light, edge-first design is essential for mid-tier hardware and offline use. Cultural UX hooks like astrology, cricket, and Bollywood can drive engagement in ways that do not translate from the West. The constraints here force better systems, and those systems travel.
Lessons for builders
- Design for action closure so every suggestion is one tap from execution.
- Put trust into the surface by showing reasoning and giving override.
- Optimise for speed: sub-500ms from input to feedback matters more than marginal gains in model intelligence (see the sketch after this list).
- Treat interface state as shared memory across devices.
- In India, build for vernacular and low-bandwidth first. Tier-2 adoption is the real scale story.
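On the speed point, one workable pattern is to paint feedback immediately and race the model against a fixed budget, falling back to an on-device draft when the network is slow. A sketch, with the function names and the fallback strategy assumed:

```ts
const BUDGET_MS = 500; // assumed input-to-feedback budget

async function respond(
  input: string,
  remoteModel: (q: string) => Promise<string>,
  localDraft: (q: string) => string,
  render: (text: string, final: boolean) => void,
): Promise<void> {
  render("…", false); // instant acknowledgment, before any model returns

  const remote = remoteModel(input); // start the slow path once
  const fallback = new Promise<string>((resolve) =>
    setTimeout(() => resolve(localDraft(input)), BUDGET_MS),
  );

  // Whichever answer lands inside the budget wins the first paint.
  render(await Promise.race([remote, fallback]), false);

  // The full remote answer still replaces the draft when it arrives.
  render(await remote, true);
}
```

The same pattern degrades gracefully in low-connectivity settings: the local draft simply wins more often.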
In AI-native products, the interface is no longer a wrapper around the product. It is the product. Static surfaces will be outcompeted by those that adapt, remember, and act. The teams that make these surfaces effortless, and earn trust while doing it, will own the next platform shift.
If you are building interface infrastructure, trust rails, or adaptive surfaces, we want the first call.
→ build@boundlessvc.com