
Hugging Face sets a health task baseline for generative AI.

Generative AI models are finding their way into healthcare settings, in some cases before those settings are ready for them. Early adopters are confident the technology will improve productivity and surface insights in data that would otherwise be missed. Critics, meanwhile, point to the flaws and biases in these models, which could contribute to worse health outcomes.

Is there a way to determine a model’s effectiveness or potential drawbacks when it comes to tasks like summarizing patient records or answering health-related questions?

Hugging Face, the AI startup, has introduced a possible solution: a newly released benchmark called Open Medical-LLM. Developed in collaboration with researchers from the nonprofit Open Life Science AI and the University of Edinburgh’s Natural Language Processing Group, Open Medical-LLM seeks to establish a standardized approach for assessing the performance of generative AI models across a range of medical tasks.

Open Medical-LLM is not built from scratch but is rather a compilation of existing test sets, including MedQA, PubMedQA, and MedMCQA, among others. It is designed to probe models for general medical knowledge and related fields such as anatomy, pharmacology, genetics, and clinical practice. The benchmark contains a range of questions that test a model’s medical reasoning and comprehension, drawn from U.S. and Indian medical licensing exams and from college biology question banks.
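Because the benchmark is stitched together from public test sets, a rough sense of how such an evaluation works can be had with the Hugging Face datasets library. The sketch below scores a stand-in "model" on a sample of MedMCQA multiple-choice questions. It is a minimal illustration, not the official evaluation harness: the dataset name and field names follow the public MedMCQA dataset card, and the ask_model stub is a hypothetical placeholder for a real model call.

```python
# Minimal sketch of multiple-choice medical QA scoring, in the spirit of
# Open Medical-LLM's component test sets. Assumptions: the MedMCQA data
# is on the Hugging Face Hub as "medmcqa", with fields question, opa-opd,
# and cop (the index of the correct option); ask_model is a hypothetical
# stub, not part of any official harness.
from datasets import load_dataset


def ask_model(question: str, options: list[str]) -> int:
    # Placeholder for a real model call that would return the index of
    # the option the model selects; here it naively picks the first.
    return 0


ds = load_dataset("medmcqa", split="validation")
sample = ds.select(range(200))  # score a small sample for illustration

correct = 0
for row in sample:
    options = [row["opa"], row["opb"], row["opc"], row["opd"]]
    correct += int(ask_model(row["question"], options) == row["cop"])

print(f"Accuracy on {len(sample)} questions: {correct / len(sample):.1%}")
```

The actual leaderboard reportedly runs models through a standardized evaluation harness across all of the component sets, so a hand-rolled loop like this is useful only as a quick local sanity check.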

“[Open Medical-LLM] empowers researchers and practitioners to assess the merits and drawbacks of various approaches, propel further progress in the field, and ultimately enhance patient care and outcomes,” stated Hugging Face in a blog post.

Hugging Face presents the benchmark as a comprehensive evaluation of healthcare-focused generative AI models. However, some medical experts on social media cautioned against putting too much stock in Open Medical-LLM, warning that it could lead to ill-informed deployments.

On X, Liam McCoy, a resident physician in neurology at the University of Alberta, highlighted the significant disparity between the artificial setting of medical question-answering and real-life clinical practice.

Clémentine Fourrier, a Hugging Face research scientist who also contributed to the blog post, concurred.

“These leaderboards can serve as an initial guide for determining which generative AI model to explore for a specific use case. However, it is important to conduct thorough testing to truly understand the model’s limitations and relevance in real-world conditions,” Fourrier responded on X. Medical models, she added, should never be used on their own by patients; instead, they should be trained to serve as support tools for medical professionals.

The situation brings to mind Google’s attempt to introduce an AI screening tool for diabetic retinopathy into healthcare systems in Thailand.

Google developed a deep learning system that analyzed eye images to detect signs of retinopathy, a major contributor to vision impairment. However, despite its high theoretical accuracy, the tool proved to be impractical during real-world testing. This led to frustration among both patients and nurses due to inconsistent results and a lack of alignment with on-the-ground practices.

Notably, of the 139 AI-powered medical devices the U.S. Food and Drug Administration has approved so far, none uses generative AI. Testing how a generative AI tool performs in a controlled environment is hard enough; the real question is how it will fare in settings like hospitals and outpatient clinics, and how its outcomes will hold up over time.

None of this is to say that Open Medical-LLM isn’t useful or informative. The results leaderboard is, if nothing else, a stark reminder of how poorly models answer even basic health questions. But neither Open Medical-LLM nor any other benchmark is a substitute for thorough real-world testing.
