In this paper, we present a promising approach to systematically testing graphical user interfaces (GUIs) in a platform-independent manner. Our framework uses standard computer vision techniques, through a Python-based scripting language (Sikuli script), to identify key graphical elements on the screen and to interact with these elements automatically by simulating keypresses and pointer clicks. The resulting sequence of inputs and outputs is analyzed using grammatical inference techniques, which infer the likely internal states and transitions of the GUI from these observations. Our framework handles a wide variety of user interfaces, ranging from traditional pull-down menus to interfaces built for mobile platforms such as Android and iOS. Furthermore, the automaton inferred by our approach can be used to check the interface's internal state machine for potentially harmful patterns that can make the interface hard to use, such as design inconsistencies (e.g., a keypress that does not have the intended effect) and mode confusion. We describe an implementation of the framework and demonstrate it on a variety of interfaces, including the user interface of a safety-critical insulin infusion pump commonly used by patients with type 1 diabetes.
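The inference step can be illustrated with a minimal sketch. It assumes that each recorded session is a list of (input, observed screen) pairs. It uses the classic k-tails state-merging heuristic of Biermann and Feldman as a stand-in for whichever grammatical inference algorithm the framework actually uses. It then flags self-loops in the inferred automaton as candidate "no-effect" keypresses for a designer to review. The key names, screen labels, and function names are hypothetical, and the Sikuli-based capture of screens is not shown.

```python
# Hypothetical sketch: k-tails inference over (input, screen) traces.
# Not the paper's implementation; labels and names are illustrative.


def build_prefix_tree(traces):
    """Prefix tree of observed traces; node 0 is the initial GUI state.

    children[node] maps an (input, screen) label to the child node id.
    """
    children = [{}]
    for trace in traces:
        node = 0
        for label in trace:
            if label not in children[node]:
                children.append({})
                children[node][label] = len(children) - 1
            node = children[node][label]
    return children


def k_tails(children, node, k):
    """Set of all label sequences of length <= k observable from `node`."""
    if k == 0:
        return frozenset({()})
    tails = {()}
    for label, child in children[node].items():
        for tail in k_tails(children, child, k - 1):
            tails.add((label,) + tail)
    return frozenset(tails)


def infer_automaton(traces, k=1):
    """Merge prefix-tree nodes with identical k-tails into one state.

    Returns (initial_state, transitions), where each transition is a
    (src_state, input, screen, dst_state) tuple. The result may be
    nondeterministic, as is usual for k-tails.
    """
    children = build_prefix_tree(traces)
    class_of = {}  # k-tail set -> merged state id
    state = [class_of.setdefault(k_tails(children, n, k), len(class_of))
             for n in range(len(children))]
    transitions = set()
    for node, edges in enumerate(children):
        for (inp, screen), child in edges.items():
            transitions.add((state[node], inp, screen, state[child]))
    return state[0], transitions


def no_effect_inputs(transitions):
    """Inputs that leave the inferred state unchanged (self-loops)."""
    return sorted({(src, inp, screen)
                   for src, inp, screen, dst in transitions if src == dst})


# Hypothetical sessions with a pump-like menu: pressing UP on the
# "bolus" screen keeps showing "bolus".
traces = [
    [("MENU", "main"), ("UP", "bolus"), ("OK", "confirm")],
    [("MENU", "main"), ("UP", "bolus"), ("UP", "bolus"), ("OK", "confirm")],
    [("MENU", "main"), ("UP", "bolus"), ("UP", "bolus"), ("UP", "bolus"),
     ("OK", "confirm")],
]
initial, transitions = infer_automaton(traces, k=1)
print(no_effect_inputs(transitions))  # [(2, 'UP', 'bolus')]
```

Because k-tails only sees finite traces, states at the edge of the explored region can be split off spuriously. Here the deepest UP press leads to a separate state 4, which also makes state 2 nondeterministic on UP. More traces or a different k reduce such artifacts, so flagged patterns are candidates for review rather than confirmed defects.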