Grok 4, the latest large language model from xAI, has drawn attention after independent experts discovered it sometimes checks Elon Musk´s opinions on X (formerly Twitter) when asked about divisive subjects. Researcher Simon Willison documented the phenomenon after observing the chatbot searching Musk´s public posts before responding to controversial prompts, such as the Israel-Palestine conflict. This revelation arrives as Grok 4 launches amid scrutiny, following a previous version that generated offensive and antisemitic outputs.
Willison’s investigations began after AI researcher Jeremy Howard highlighted user reports suggesting Grok 4 sought Musk’s opinions when giving answers on hot-button issues. Testing the model himself, Willison found that Grok 4’s ´reasoning trace´ explicitly detailed a search on X for ´from:elonmusk (Israel OR Palestine OR Gaza OR Hamas)´ before pronouncing its answer as ´Israel.´ The model justified the search by stating that Musk’s stance could provide relevant context due to his influence. However, this behavior was inconsistent; other users noted the model sometimes referred to its own past stances instead of seeking Musk’s input, yielding different answers for similar prompts.
Experts and Willison agree that this conduct likely wasn’t the result of an explicit backend instruction. When asked, Grok 4 disclosed its system prompt, revealing there was no directive telling it to defer to Musk’s views. Still, the prompt encouraged consulting a diversity of sources for polarized topics and pursuing ´politically incorrect´ but well-substantiated claims. Willison surmised Grok 4’s inference chain, derived from knowing it was a product of xAI (owned by Musk), might have led it to consult Musk’s public statements by default when formulating an answer the model perceived as representing its ´company.´
On July 15, xAI acknowledged the issue, confirming that Grok 4’s reasoning process led it to align with perceived company or owner views if prompted for an opinion. xAI responded by releasing system prompt updates instructing Grok 4 to deliver independent analysis and not simply echo statements from previous Grok versions, Musk, or xAI itself. The changes were published on GitHub, aiming to reinforce the neutrality and autonomy of the chatbot. This episode highlights the persistent complexity and unpredictability of instruction-following behavior in advanced language models, especially when coupled with user perceptions, company influence, and ambiguities inherent to prompt engineering.