As the Droidrun team takes on the challenge of building an agent framework that can autonomously navigate mobile apps through their real UI structure, a pivotal obstacle has emerged: there is no publicly accessible dataset that captures real Android app UI hierarchies, screen flows, and associated metadata. This gap not only hinders Droidrun but also creates friction for the broader community working to give large language models a more grounded, context-rich understanding of mobile interfaces.
The founder poses a key question to the maker and developer community: would a comprehensive dataset aggregating real-world app UI trees, screen transitions, component types, and contextual metadata enable "vibe coding" agents to generalize more effectively across varied applications? Despite recent advancements, developing agents that intuitively interact with mobile UIs still requires significant manual tuning and repetition, as current models rely heavily on prompts and heuristics to "feel right" when navigating distinct app experiences. The lack of shared data keeps teams siloed and slows progress.
Envisioning a curated repository that spans categories like shopping, social, finance, and utilities, complete with detailed structural metadata (buttons, lists, inputs, navigation flows, and UX patterns), the author invites feedback. Would access to such a dataset reduce the time spent on prompt tuning or help achieve more consistent agent alignment? Or is the effort of amassing this data unlikely to move the needle on reliable agent behavior? Community members are encouraged to share reflections, frustrations, and past experiences, potentially shaping the future of large language model-driven automation in mobile app contexts.
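To make the proposal concrete, one dataset record might be modeled roughly as follows. This is a minimal sketch under assumed conventions: the type names and fields (UiNode, ScreenCapture, Transition, AppTrace) are purely illustrative and do not correspond to any existing Droidrun schema.

```kotlin
// Hypothetical record layout for one captured app trace.
// All names and fields are illustrative assumptions, not a real Droidrun format.

data class UiNode(
    val componentType: String,          // e.g. "Button", "RecyclerView", "EditText"
    val resourceId: String?,            // Android view resource-id, when exposed
    val text: String?,                  // visible label or content description
    val bounds: List<Int>,              // [left, top, right, bottom] in screen pixels
    val children: List<UiNode> = emptyList()
)

data class ScreenCapture(
    val screenId: String,               // stable identifier for this screen state
    val activityName: String?,          // e.g. "com.example.shop.CheckoutActivity"
    val uiTree: UiNode                  // root of the captured view hierarchy
)

data class Transition(
    val fromScreenId: String,
    val toScreenId: String,
    val action: String,                 // e.g. "tap", "scroll", "type"
    val targetResourceId: String?       // element the action was performed on, if any
)

data class AppTrace(
    val packageName: String,            // e.g. "com.example.shop"
    val category: String,               // shopping, social, finance, utilities, ...
    val screens: List<ScreenCapture>,
    val transitions: List<Transition>
)
```

A corpus of such traces across many apps would let agents be evaluated on whether they can map an instruction to a path through the transition graph, rather than being tuned app by app.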
