Skip to main content

Digital Trends may earn a commission when you buy through links on our site. Why trust us?

All RTX GPUs now come with a local AI chatbot. Is it any good?

A window showing Nvidia's Chat with RTX.
Nvidia

It’s been difficult to justify packing dedicated AI hardware in a PC. Nvidia is trying to change that with Chat with RTX, which is a local AI chatbot that leverages the hardware on your Nvidia GPU to run an AI model.

It provides a few unique advantages over something like ChatGPT, but the tool still has some strange problems. There are the typical quirks you get with any AI chatbot here, but also larger issues that prove Chat with RTX needs some work.

Recommended Videos

Meet Chat with RTX

Here’s the most obvious question about Chat with RTX: How is this different from ChatGPT? Chat with RTX is a local large language model (LLM). It’s using TensorRT-LLM compatible models — Mistral and Llama 2 are included by default — and applying them to your local data. In addition, the actual computation is happening locally on your graphics card, rather than in the cloud. Chat with RTX requires an Nvidia RTX 30-series or 40-series GPU and at least 8GB of VRAM.

Get your weekly teardown of the tech behind PC gaming
Check your inbox!

A local model unlocks a few unique features. For starters, you load your own data into Chat with RTX. You can put together a folder full of documents, point Chat with RTX to it, and interact with the model based on that data. It offers a closer level of specificity, allowing the model to provide information on detailed documents rather than the more generic answers you see with something like Bing Chat or ChatGPT.

Chat with RTX answering a question about frame interpolation.
Jacob Roach / Digital Trends

And it works. I loaded up a folder with a range of research papers detailing Nvidia’s DLSS 3, AMD’s FSR 2, and Intel’s XeSS and asked some specific questions about how they’re different. Rather than scraping the internet and rewording an article explaining the differences — a common tactic for something like Bing Chat — Chat with RTX was able to provide detailed responses based on the actual research papers.

I wasn’t shocked that Chat with RTX was able to pull information out of some research papers, but I was shocked that it was able to distill that information so well. The documents I provided were, well, research papers, filled with academic speak, equations that will make your head spin, and references to details that aren’t explained in the paper itself. Despite that, Chat with RTX broke down the papers into information that was easy to understand.

You can also point Chat with RTX toward a YouTube video or playlist, and it will take down information from the transcripts. The pointed nature of the tool is what really shines, aa it allows you to focus the session in a single direction rather than ask questions about anything like you would with ChatGPT.

The other upside is that everything happens locally. You don’t have to send your queries to a server, or upload your documents and fear that they’ll be used to train the model further. It’s a streamlined approach to interacting with an AI model — you use your data, on your PC, and ask the questions you need to without any concerns about what’s happening on the other side of the model.

There are some downsides to the local approach of Chat with RTX, however. Most obviously, you’ll need a powerful PC packing a recent Nvidia GPU and at least 8GB of VRAM. In addition, you’ll need around 100GB of free space. Chat with RTX actually downloads the models it uses, so it takes up quite a bit of disk space.

Hallucinations

You didn’t think Chat with RTX would be free of issues, did you? As we’ve come to see with just about every AI tool, there’s a certain tolerance for flat-out incorrect responses from the AI, and Chat with RTX isn’t above that. Nvidia provides a sampling of recent Nvidia news articles with a new installation, and even then, the AI wasn’t always on the money.

Chat with RTX answering a question about Counter-Strike 2.
Jacob Roach / Digital Trends

For example, above you can see that the model said Counter-Strike 2 supports DLSS 3. It does not. I can only assume the model made some sort of connection between the DLSS 3.5 article it references and another article in the included dataset that mentions Counter-Strike 2. 

Chat with RTX answering a question about latency.
Jacob Roach / Digital Trends

The more pressing limitation is that Chat with RTX only has the sample data to go on. This leads to some weird situations where bias within the small dataset would lead to incorrect answers. For example, you can see above how the model says in one response that DLSS Frame Generation doesn’t introduce additional latency to gameplay, while in the next response, it says frame interpolation introduces additional latency to gameplay. DLSS Frame Generation uses frame interpolation.

Chat with RTX answering a question about Nvidia Reflex.
Jacob Roach / Digital Trends

In another response (above), Chat with RTX said that DLSS 3 doesn’t require Nvidia Reflex to work, and that’s not true. Once again, the model is going off of the data I provided, and it’s not perfect. It’s a reminder that an AI model can be incorrect with a straight face, even when it has a narrow focus like Chat with RTX allows.

I expected some of these oddities, but Chat with RTX still managed to surprise me. At various points in different sessions, I would ask a random question completely unrelated to the data I provided. In most situations, I would get a response noting that there’s not enough information for the model to go on to provide an answer. Makes sense.

Chat with RTX answering a question about tying shoelaces.
Jacob Roach / Digital Trends

Except in one situation, the model provided an answer. Using the default data, I asked it how to tie a shoelace, and the model provided step-by-step instructions and referenced an Nvidia blog post about ACE (Nvidia notes this prerelease version occasionally gets the reference files incorrect). When I asked again immediately after, it provided the same standard response about lacking context information.

I’m not sure what’s going on here. It could be that there’s something in the model that allows it to answer this question, or it might be pulling the details from somewhere else. Regardless, it’s clear Chat with RTX isn’t just using the data you provide to it. It has the capability, at least, to get information elsewhere. That became even more clear once I started asking about YouTube videos.

The YouTube incident

One of the interesting aspects of Chat with RTX is that it can read transcripts from YouTube videos. There are some limitations to this approach. The key is that the model only ever sees the transcript, not the actual video. If something happens within the video that’s not included in the transcript, the model never sees it. Even with that limitation, it’s a pretty unique feature.

I had a problem with it, though. Even when starting a completely new session with Chat with RTX, it would remember videos I had linked previously. That shouldn’t happen, as Chat with RTX isn’t supposed to remember the context of your current or previous conversations.

Chat with RTX answering a question about a YouTube video.
Jacob Roach / Digital Trends

I’ll walk through what happened because it can get a little hairy. In my first session, I linked to a video from the YouTube channel Commander at Home. It’s a Magic: the Gathering channel, and I wanted to see how Chat with RTX would respond to a complex topic that’s not explained in the video. It unsurprisingly didn’t do well, but that’s not what’s important.

I removed the old video and linked to an hourlong interview with Nvidia’s CEO Jensen Huang. After entering the link, I clicked the dedicated button to rebuild the database, basically telling Chat with RTX that we’re chatting about new data. I started this conversation out the same as I did in the previous one by asking, “what is this video about?” Instead of answering based on the Nvidia video I linked, it answered based on the previous Commander at Home video.

Chat with RTX answering a question about a YouTube video.
Note that the URL in this screenshot is different from the previous screenshot. Jacob Roach / Digital Trends

I tried rebuilding the database three more times, always with the same result. Eventually, I started a brand new session, completely exiting out of Chat with RTX and starting fresh. Once again, I linked the Nvidia video and downloaded the transcript, starting off with asking what the video was about. It again answered about the Commander at Home video.

Chat with RTX answering a question about a YouTube video.
Jacob Roach / Digital Trends

I was only able to get Chat with RTX to answer about the Nvidia video when I asked a specific question about that video. Even after chatting for a bit, any time I asked what the video was about, it’s answer would relate to the Commander at Home video. Remember, in this session, Chat with RTX never saw that video link.

Nvidia says this is because Chat with RTX reads all of the transcripts you’ve downloaded. They’re stored locally in a folder, and it will continue answering questions about all the videos you’ve entered, even when you start a new session. You have to delete the transcripts manually.

In addition, Chat with RTX will struggle with general questions if you have multiple video transcripts. When I asked what the video was about, Chat with RTX decided that I was asking about the Commander at Home video, so that was the video it referenced. It’s a little confusing, but you’ll need to manually choose the transcripts you want to chat about, especially if you’ve entered YouTube links previously.

You find the usefulness

If nothing else, Chat with RTX is a demonstration of how you can leverage local hardware to use an AI model, which is something that PCs have sorely lacked over the past year. It doesn’t require a complex setup, and you don’t need to have deep knowledge of AI models to get started. You install it, and as long as you’re using a recent Nvidia GPU, it works.

It’s hard to pin down how useful Chat with RTX is, though. In a lot of cases, a cloud-based tool like ChatGPT is strictly better due to the wide swath of information it can access. You have to find the usefulness with it. If you have a long list of documents to parse, or a stream of YouTube videos you don’t have the time to watch, Chat with RTX provides something you won’t find with a cloud-based tool — assuming you respect the quirks inherent to any AI chatbot.

This is just a demo, however. Through Chat with RTX, Nvidia is demonstrating what a local AI model can do, and hopefully it’s enough to gather interest from developers to explore local AI apps on their own.

Jacob Roach
Lead Reporter, PC Hardware
Jacob Roach is the lead reporter for PC hardware at Digital Trends. In addition to covering the latest PC components, from…
Nearly two years later, AMD’s RX 7000 GPUs don’t even make up 1% of Steam players
RX 7900 XTX and RX 7900 XT on a pink background.

AMD's latest RX 7000 GPUs may be some of the best graphics cards you can buy, but they aren't popular among gamers, at least according to the latest Steam hardware survey. Only one of AMD's RDNA 3 graphics cards even shows up on the survey, with the RX 7900 XTX occupying just 0.37% -- down by 0.03% compared to last month.

It's worth noting that Steam doesn't list every GPU represented in the hardware survey each month, but it at least lists every GPU that represents a decent chunk of players. For context, the lowest-ranking GPU on the list is AMD's RX 5500 XT at just 0.16% of players. Other RX 7000 GPUs like the excellent RX 7900 GRE are likely represented further down, though with a share of only one-tenth of 1% or less.

Read more
The RTX 5090 will reportedly require 600 watts of power
The back of the Nvidia RTX 4080 Super graphics card.

Rumors have been circulating about the RTX 5090 for a while, but we're finally getting a clearer picture of how Nvidia's flagship RTX 50-series GPU is shaping up. Well-known hardware leaker Kopite7kimi is claiming the RTX 5090, which seems destined to earn a spot among the best graphics cards, will come with an obscene power requirement of 600 watts.

The leaker shared specs for the RTX 5090 and RTX 5080 on X (formerly Twitter). We've seen vague claims from Kopite7kimi in the past concerning RTX 50-series graphics cards, but this is the first time the leaker has shared full specs. According to the leak, the RTX 5090 will require 600W of power and come with a staggering 32GB of GDDR7 memory across a 512-bit bus.

Read more
Z by HP Boost slices up GPUs to speed up AI workloads
The Nvidia GPU shown inside the HP Z2 G9.

HP is trying to make AI development, training, and deployment easier with Z by HP Boost. Instead of deploying several expensive workstations, or trying to share one full workstation across a team, Z by HP Boost gives direct, instant access to the GPU resources of a workstation.

We haven't heard a ton of details about the service yet. HP says it will be available in early 2025 for the U.S. and U.K., and it'll share pricing closer to when the service becomes available. It sounds like Z by HP Boost is using local GPU resources, however. Earlier this month, Lenovo announced its TruScale GPUaaS (GPUs as a service), which allows enterprise customers to scale up GPU resources on-demand through a cloud-based, pay-as-you-go model. Z by HP Boost sounds like something different.

Read more