K Health boosts its Artificial Intelligence physician using Gemma 3 and clinical data

K Health migrated its Artificial Intelligence physician to Gemma 3 on Vertex AI to create a more natural, clinically grounded intake chat while sharply cutting training and inference costs. Fine-tuning smaller Gemma models with decision-focused data delivered higher business scores than larger, domain-specific systems.

K Health is expanding its virtual primary care platform by upgrading an Artificial Intelligence physician that supports licensed clinicians across urgent care, chronic conditions, medical weight loss, and mental health. The organization moved its existing model to Gemma 3 hosted on Google Cloud’s Vertex AI, aiming for a more conversational, empathetic, and professional intake experience at a lower inference cost. The development strategy rested on the idea that a smaller, well-tuned model can outperform larger ones when it is trained to internalize decision-making logic rather than merely to generate content.

After evaluating Llama and other open models, the team selected Gemma 3 on Vertex AI as the best balance of computational performance and cost. Engineers set up a structured procurement flow for multi-node (16) H100 GPU clusters and built reusable scripts to streamline training and inference across the Gemma 3 4B, 12B, and 27B parameter variants, along with the MedGemma 27B model tailored for medical use. To prepare data for direct preference optimization (DPO), they generated 10 synthetic chats for each case and scored them on medical accuracy, conversational coherence, and clinical outcomes such as referrals, lab tests, or prescriptions. They then trained on a mix of the best and worst conversations to teach the model the logic behind effective patient interactions. Gemma 3 4B improved its business score from 0.48 to 0.76 after 10 epochs, Gemma 3 12B reached 0.81 after 20 epochs, and MedGemma 27B scored 0.71 at a higher inference cost.
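The data-preparation step described above can be sketched roughly as follows. This is a minimal illustration, not K Health's pipeline: the `ScoredChat` type, the `WEIGHTS` used to combine the three criteria, the `k` and `min_margin` parameters, and the function names are all assumptions made for the example. The output uses the prompt/chosen/rejected layout that is common for DPO datasets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChat:
    case_id: str             # patient case the synthetic chat was generated for
    text: str                # the synthetic conversation
    medical_accuracy: float  # all three scores are assumed to lie in [0, 1]
    coherence: float
    outcome_match: float     # agreement with the expected referral, lab test, or prescription


# Hypothetical weights: the article does not say how the criteria were combined.
WEIGHTS = (0.4, 0.2, 0.4)


def composite_score(chat: ScoredChat) -> float:
    """Collapse the three per-chat scores into one ranking value."""
    w_acc, w_coh, w_out = WEIGHTS
    return (w_acc * chat.medical_accuracy
            + w_coh * chat.coherence
            + w_out * chat.outcome_match)


def build_preference_pairs(chats, k=2, min_margin=0.1):
    """Pair the k best chats of each case with its k worst chats.

    Pairs whose score gap is below min_margin are dropped, since a
    near-tie carries little preference signal.
    """
    by_case = defaultdict(list)
    for chat in chats:
        by_case[chat.case_id].append(chat)

    pairs = []
    for case_id, group in by_case.items():
        ranked = sorted(group, key=composite_score, reverse=True)
        k_eff = min(k, len(ranked) // 2)  # never pair a chat with itself
        best = ranked[:k_eff]
        worst = ranked[len(ranked) - k_eff:]
        # Best is matched with worst, second best with second worst, and so on.
        for chosen, rejected in zip(best, reversed(worst)):
            if composite_score(chosen) - composite_score(rejected) >= min_margin:
                pairs.append({
                    "prompt": case_id,
                    "chosen": chosen.text,
                    "rejected": rejected.text,
                })
    return pairs


if __name__ == "__main__":
    # Ten synthetic chats for one case, with made-up scores.
    demo = [ScoredChat("case-1", f"chat-{i}", i / 10, i / 10, i / 10) for i in range(10)]
    for pair in build_preference_pairs(demo):
        print(pair)
```

In practice the prompt would be the case description or the opening of the conversation rather than a bare case identifier; the identifier stands in for it here to keep the sketch short.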

The team ultimately highlighted Gemma 3 4B as the most successful configuration, validating the hypothesis that a smaller, general-purpose model fine-tuned with high-quality decision data can surpass a larger, domain-specific model in this setting. They adopted Axolotl AI with Accelerate on a custom multi-node virtual machine as the optimal training stack, cutting training time by 66%, from 4.5 hours to 1.5 hours. Techniques such as gradient checkpointing and 8-bit precision helped control memory use and prevent overfitting, achieving 90-95% accuracy under the chosen configuration. A self-reflection mechanism let the model check its own outputs for factual consistency and conversational flow, which reduced the average number of API calls per chat from 100 to 60. Combined with Gemma’s lower inference costs, these gains produced substantial savings and yielded an intake system that K Health characterizes as significantly more natural, efficient, and conversational for clinical use.
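As a quick check on the efficiency figures quoted above, the relative reductions can be recomputed from the before and after values. The helper name `percent_reduction` is introduced here only for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) * 100 / before


# Training time: 4.5 hours down to 1.5 hours (reported as a 66% cut).
training_cut = percent_reduction(4.5, 1.5)

# Average API calls per chat: 100 down to 60.
api_call_cut = percent_reduction(100, 60)

print(f"training time reduced by {training_cut:.1f}%")
print(f"API calls per chat reduced by {api_call_cut:.1f}%")
```

The training figure works out to two thirds, which the article gives as 66%, and the drop in API calls corresponds to a 40% reduction per chat.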
