As the number and heterogeneity of appliances in smart buildings increase, identifying and controlling them becomes challenging. Existing methods each have drawbacks when deployed in large commercial buildings. For example, voice command assistants require users to memorize many control commands, and attaching Bluetooth dongles or QR codes to appliances introduces considerable deployment overhead. In comparison, identifying an appliance by simply pointing a smartphone camera at it and controlling it through a graphical overlay interface is more intuitive. We introduce SnapLink, a responsive and accurate vision-based system for mobile appliance identification and interaction using image localization. Compared to the image retrieval approaches used in previous vision-based appliance control systems, SnapLink exploits 3D models to improve identification accuracy, and it reduces deployment overhead through quick video captures and a simplified labeling process. We also introduce a feature sub-sampling mechanism to achieve low latency at the scale of a commercial building. To evaluate SnapLink, we collected training videos from 39 rooms to represent the scale of a modern commercial building. SnapLink correctly identifies the appliance in 94% of 1526 test images covering 179 appliances, while keeping average server processing time within 120 ms. Furthermore, we show that SnapLink is robust to differences in viewing angle and distance, illumination changes, and daily changes in the environment. We believe the SnapLink use case is not limited to appliance control: it has the potential to enable various new smart building applications.