As mentioned in my previous article, we have integrated ChatGPT into our Atlas product. This was both challenging and fun. The OpenAI API is well documented, as is the Azure OpenAI service. However, one thing that is not fully documented is the stream option. It allows you to call the API and receive the response as a stream of data, so you start getting data from the very first second instead of waiting for the entire response. Depending on your request, ChatGPT can take several seconds to respond, so from a UX point of view it is potentially a better experience to start showing data before the response is fully complete.
Let's see how we can do that.
First, when calling the OpenAI and Azure OpenAI APIs, we can use the “stream” parameter in the request body to tell the API that we want the data as a stream instead of waiting for the full response. The Azure OpenAI stream parameter is documented as:
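A minimal sketch of such a request body, assuming the chat completions endpoint; the API key value and the prompt below are placeholders:

```typescript
// Hypothetical request body for a chat completions call.
const requestBody = {
  messages: [{ role: "user", content: "Explain streaming in one sentence." }],
  stream: true, // ask the service to send the answer as server-sent events
};

// The fetch options for the POST request; the key is a placeholder.
const requestInit = {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "api-key": "<YOUR_AZURE_OPENAI_KEY>", // placeholder
  },
  body: JSON.stringify(requestBody),
};
```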
We have a bit more explanation in the OpenAI documentation:
As we can see, the data is sent as server-sent events. As the Mozilla documentation (Server-sent events - Web APIs | MDN (mozilla.org)) puts it: “With server-sent events, it's possible for a server to send new data to a web page at any time, by pushing messages to the web page. These incoming messages can be treated as Events + data inside the web page.”
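Concretely, each event the chat completions API pushes is a line starting with “data:”, whose payload is a JSON fragment carrying a small piece of the answer, until a final “data: [DONE]” line. The parsing below is a sketch; the sample lines are made up, but the choices[0].delta.content shape matches the streaming chat format:

```typescript
// Extract the partial text from one server-sent-events line, or null
// when the line carries no content (e.g. the final "[DONE]" marker).
function extractContent(sseLine: string): string | null {
  if (!sseLine.startsWith("data:")) return null;
  const payload = sseLine.slice("data:".length).trim();
  if (payload === "[DONE]") return null; // the server signals the end of the stream
  const parsed = JSON.parse(payload);
  return parsed.choices?.[0]?.delta?.content ?? null;
}

// Made-up sample of what the streamed lines look like.
const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "data: [DONE]",
];
const text = sample
  .map(extractContent)
  .filter((chunk) => chunk !== null)
  .join("");
```

Joining the chunks as they arrive is what lets the UI show the answer growing word by word.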
The same Mozilla documentation includes an example of how to request data using server-sent events and how to handle the response:
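That example boils down to something like the following sketch; the “/events” URL is illustrative, and the snippet is guarded so it only runs in a browser, where EventSource lives:

```typescript
// Pure helper: accumulate streamed chunks into a single string.
function appendChunk(current: string, chunk: string): string {
  return current + chunk;
}

// Browser-only: EventSource issues a GET request and fires a "message"
// event every time the server pushes data.
if (typeof window !== "undefined" && typeof EventSource !== "undefined") {
  const source = new EventSource("/events"); // illustrative endpoint
  let text = "";
  source.onmessage = (event) => {
    text = appendChunk(text, event.data); // grow the text with each push
  };
  source.onerror = () => source.close();
}
```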
Pretty simple, right? … Well, not really. The EventSource object (the standard API provided by the browser) only accepts GET requests, and we need a POST request with a number of parameters in the body. To achieve this, Microsoft provides a library called:
@MICROSOFT/FETCH-EVENT-SOURCE
With that library, we can do the POST request, and process the data like this:
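A sketch of what that call can look like, assuming a React-style callback for the partial text; the endpoint, key, and payload are placeholders, while the fetchEventSource option shape (method, headers, body, onmessage, onerror) comes from the library's documentation:

```typescript
// Stream a chat completion and report the growing answer via onPartial
// (e.g. a React setState), so the UI updates with each partial chunk.
async function streamCompletion(
  url: string,
  apiKey: string,
  prompt: string,
  onPartial: (text: string) => void,
): Promise<void> {
  // Dynamic import so this module also loads where the package is absent.
  // @ts-ignore: types are only available once the package is installed
  const { fetchEventSource } = await import("@microsoft/fetch-event-source");

  let text = "";
  await fetchEventSource(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "api-key": apiKey },
    body: JSON.stringify({
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
    onmessage(event) {
      if (event.data === "[DONE]") return; // end-of-stream marker
      const delta = JSON.parse(event.data).choices?.[0]?.delta?.content;
      if (delta) {
        text += delta;
        onPartial(text); // push the partial answer to the UI
      }
    },
    onerror(err) {
      throw err; // rethrowing stops the library's automatic retries
    },
  });
}
```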
We call the “fetchEventSource” function from the library, passing the entire HTTP request (method, headers, body…) and then the event handlers. The main one is “onmessage”, which is fired whenever the server sends a partial chunk of the response. In the snippet above, when the event fires, we take the partial response and update the React component state, so the UI is updated with that partial data. Here it is in action:
The screenshot below shows how it looks in DevTools, so you can see how the stream of data is sent back from the server:
Hope it helps!