Ready or not, gen AI is here, and it’s in your hands. ChatGPT took the world by storm and remains popular despite competition from heavy hitters such as Google, Samsung and Meta. AI tools are being built into web browsers including Microsoft Bing, phones such as the Galaxy S24 and even cars, including the VW Golf. If there’s a task you want done, chances are there’s an AI assistant to help.
And now there are CNET reviews to help you decide which AI to use and what to expect. Our editors are testing AI chatbots, image generators and other AI hands-on to figure out their strengths and weaknesses. Our goal: to help guide you as you decide which will work best for you.
To perform the testing, we use the generative AI chatbots, photo generators and other AI tools we’re reviewing, just as we use a phone to review it. But the reviews themselves, like CNET’s other hands-on reviews, are written by our human team of in-house experts. For more, check out CNET’s AI policy.
Current reviews of AI products and services on CNET are broken down into the following categories. As our reviews evolve, we plan to add more.
No matter the tool or service, our reviews try to answer the same basic question: How good is it relative to the competition and which purposes does it serve best? In any CNET review, we’ll report key information that you’ll need to know, including:
We score each AI we review on a scale of 1 to 10, with 10 being the best. We consider factors such as accuracy, creativity of responses, number of hallucinations, and response speed. This rating is based on our reviewer’s first-hand experience using the test methodology outlined below.
As “everything engines,” gen AI tools like ChatGPT don’t lend themselves to many quantitative, labs-based tests, like battery life for phones or brightness for TVs. Instead, our evaluations are largely based on hands-on experience: during testing, our reviewers pose questions and assign tasks to the AI, then judge both the responses and the process.
Our evaluations aim to answer the following questions:
Beyond getting a general sense of what it’s like to use the AI, we also test specific tasks and use cases. To account for accuracy or hallucinations, we spot-check facts and report any erroneous information we find. Our reviewers make sure to test the chatbots on topics that they personally know well. For example, one reviewer asked ChatGPT to suggest a recipe for chicken tikka masala — a dish he knows well from cooking and eating it over many years.
Test prompts may include, but aren’t limited to:
In the reviews, we report on specific prompts (what we input) and responses (what the AI outputs), but we also want to keep our tests relatively open-ended, evolve our methodology over time and prevent the AI from “learning” how we test it. For that reason we’re not listing specific prompts here.
Generative AI services can also take your written descriptions and use them to create images. As with chatbots, our reviews of these services are largely subjective and based on the reviewer’s hands-on experience. Our evaluations of AI text-to-image generators aim to answer the following questions:
As with our testing of chatbots, test prompts will be varied but might include things like:
For AI tools that are neither chatbots nor text-to-image generators, our testing will be tailored to suit the tool. We’ll strive to determine how good the AI is at performing the task it promises to assist with, and to call out how beneficial, or not, the AI is in helping complete that task.
A review of Otter AI, an audio transcription and note-taking service, focuses on how well features like gen AI chat and automatic meeting summaries work compared with conventional methods. Our review of Grammarly, a service designed to assist writers, evaluates how well it responds to prompts and whether its AI-suggested revisions, such as “shorten it” and “improve it,” actually help the writing process.
We can’t test everything, and we don’t try to. There are plenty of areas that lie outside the scope of our current AI tests. They include:
Resistance to abuse: We don’t perform tests designed to cause AIs to deliver illegal, harmful, abusive, discriminatory, biased or copyrighted information.
Current events: Because AIs are trained on large sets of data that aren’t necessarily recent, we don’t quiz all chatbots and other assistants on recent “in the news” events.
Outcomes for AI recommendations: As part of our review process, we don’t commit to evaluating all of an AI’s responses and suggestions in depth. We don’t cook and taste-test recipes, for example, nor do we take the trips suggested in an itinerary.
Multiple answers: In general, we rely on the first reply provided by an AI because that’s how most people use these tools. In some instances, we might run the same query multiple times to compare the results, but that’s not the norm.
Generative AI is still a new consumer product, so think of these reviews as version 1.0. In the last year, AI chatbots and other tools have evolved significantly, more options have entered the market and numerous models, sets of training data and AI-driven devices have debuted. We expect that evolution to continue and our AI reviews to grow and expand as well. As AI becomes more familiar and ingrained in our lives, humans at CNET will explain, review and rate these tools for other humans’ benefit.