Speed of Feedback: How can we keep our users engaged, if the AI keeps them waiting?
The Doherty threshold: only sub-seconds count.
You're sitting there, staring at the blinking cursor, waiting for ChatGPT to generate an answer to your question. But nothing seems to happen. Frustrated, you switch tabs and check your email while you wait. It's a common occurrence. What's going on? Are the servers drowning in requests again, or did the model not grasp your question? ChatGPT, like many AI systems, struggles to provide timely feedback.
When we engage in activities in the physical world, we usually receive feedback almost instantly. Most of the time, the speed at which we process that feedback is limited by our nervous system: how fast can it carry signals from the sensory receptors to the brain? When we interact with an AI, we face a different issue: we often have to wait.
In general, humans process information quickly, especially when it comes to basic sensory perception. As early as 1899, Woodworth concluded that the minimum time to process and react to visual feedback was 450ms; some later studies go as low as 50ms. Typically, we expect a processing time of 150 to 250 milliseconds for basic visual tasks.
In UX design, a different number is famous: the Doherty threshold. When a computer and its users interact at a pace below 400ms, productivity soars. Neither has to wait on the other; the two work in harmony. Our collaboration with AI is at unprecedented heights, but its pace hasn't caught up yet. LLMs in particular cannot answer within 400ms, and when they fail to, they lose the users' attention and productivity drops.
How can we maintain the users’ attention even though our AI isn’t fast enough?
We do not need to provide the actual response within 400ms. We only have to give feedback within that time:
If we are loading data (for example, recommendations for which videos to watch), we can show skeleton screens until the model has finished and the recommended videos are ready.
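As a sketch, a skeleton screen simply renders cheap placeholders within the first frame and swaps them out once the data arrives. The type and function names below are illustrative, not tied to any particular framework:

```typescript
// A recommendation item; in a real app this would carry more fields.
type Video = { title: string };

function renderSkeletons(count: number): string[] {
  // Grey boxes standing in for recommendations still being computed.
  // These can be shown immediately, well under the 400ms threshold.
  return Array.from({ length: count }, () => "▇▇▇▇▇ loading…");
}

function renderVideos(videos: Video[]): string[] {
  return videos.map((v) => v.title);
}

async function showRecommendations(
  fetchVideos: () => Promise<Video[]>,
): Promise<string[]> {
  let view = renderSkeletons(3);            // instant feedback for the user
  view = renderVideos(await fetchVideos()); // real content once the model is done
  return view;
}
```

The key point is that the placeholder render has no dependency on the model call, so the user sees a stable page layout instantly instead of a blank screen.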

When we expect a longer wait and do not know the form of the output (as in ChatGPT), we can show loading screens that approximate the time until the response is finished. We can even take inspiration from video games: show progress bars, give hints, and tell the user what is currently going on - is the model working, is the data being transferred? Imagine you are Uber Eats: keep the user informed about their order, where it is and when to expect it.
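A minimal sketch of such status feedback, mapping an assumed pipeline stage and an elapsed-time estimate to a user-facing message (the stage names and wording here are made up for illustration):

```typescript
// Stages of an assumed LLM request pipeline.
type Stage = "queued" | "generating" | "streaming";

// Video-game-style hints telling the user what is currently happening.
const hints: Record<Stage, string> = {
  queued: "Waiting for a free model worker…",
  generating: "The model is working on your question…",
  streaming: "Receiving the answer…",
};

function statusMessage(
  stage: Stage,
  elapsedMs: number,
  estimatedMs: number,
): string {
  // Cap at 99% so the bar never claims completion before the answer arrives.
  const pct = Math.min(99, Math.round((elapsedMs / estimatedMs) * 100));
  return `[${pct}%] ${hints[stage]}`;
}
```

A UI would call `statusMessage` on a short timer and re-render the result, giving the user a fresh signal every few hundred milliseconds even while the model itself is silent.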

We could even give an intermediate answer. Some ML models that process large amounts of data first return a response based on a cached table, then iteratively refine the answer as more and more data is loaded into memory.
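One way to sketch this coarse-then-refined pattern - the cache and the `refine` function are stand-ins, not a real model API:

```typescript
// Emit a cheap cached answer immediately, then the refined one when ready.
// `refine` stands in for the slow full-model computation.
function answerProgressively(
  query: string,
  cache: Map<string, string>,
  refine: (q: string) => string,
  onUpdate: (answer: string, final: boolean) => void,
): void {
  const cached = cache.get(query);
  if (cached !== undefined) {
    onUpdate(cached, false); // approximate answer within the feedback window
  }
  onUpdate(refine(query), true); // refined answer replaces it once computed
}
```

The UI subscribes via `onUpdate` and simply re-renders each time a better answer arrives, so the user is never staring at an empty screen.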
The specific approach matters less than keeping the users’ attention on your tool, so they can stay productive and avoid switching tasks.