Boris Katz: Telling Machines about the World, and Daniel Harari: Innate Mechanisms and Learning: Developing Complex Visual Concepts from Unlabeled Natural Dynamic Scenes
Topics: (Boris Katz) Limitations of recent AI successes (Google Goggles, Kinect, Watson, Siri); brief history of computer vision system performance; scene understanding tasks: object detection, verification, identification, categorization, recognition of activities or events, spatial and temporal relationships between objects, explanation (e.g., what past events caused the scene to look as it does?), prediction (e.g., what will happen next?), filling in gaps in objects and events; enhancing computer vision systems by combining vision and language processing (e.g., building a knowledge base about objects for the scene recognition system and testing performance with natural language questions); overview of the START system: syntactic analysis producing parse trees, semantic representation using ternary expressions, language generation, matching ternary expressions and transformational rules, replying to questions, the object-property-value data model, decomposition of complex questions into simpler questions (a toy sketch of ternary-expression matching follows below); recent progress on understanding and describing simple activities in video

(Daniel Harari) Supervised vs. unsupervised learning; the infeasibility of obtaining labeled training data for every visual concept; toward social understanding: hand recognition and following gaze direction; toward scene understanding: object segmentation and containment; detecting “mover” events as a pattern of interaction between a moving hand and an object (co-training by appearance and context); using mover events to generate training data for a kNN classifier that determines the direction of gaze (sketched below); a model for object segmentation based on common motion and motion discontinuity (sketched below); learning the concept of containment
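The outline mentions START's semantic representation using ternary expressions, its object-property-value data model, matching against stored expressions, and decomposition of complex questions into simpler ones. Below is a minimal, hypothetical Python sketch of those ideas; the Ternary class, the toy knowledge base, and the match function are illustrative assumptions and do not reflect START's actual implementation or its transformational rules.

```python
# Toy illustration of START-style ternary expressions (subject, relation, object)
# and naive matching of question patterns against a small knowledge base.
# All names and the matching scheme are simplified assumptions, not START's API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Ternary:
    subject: str
    relation: str
    obj: str

# Object-property-value style facts stored as ternary expressions.
KB = {
    Ternary("bird-1", "is-a", "bird"),
    Ternary("bird-1", "color", "red"),
    Ternary("bird-1", "perch-on", "branch-2"),
}

def match(pattern, kb):
    """Return variable bindings for every fact matching the pattern;
    slots that start with '?' are treated as variables."""
    results = []
    for fact in kb:
        bindings, ok = {}, True
        for slot in ("subject", "relation", "obj"):
            p, v = getattr(pattern, slot), getattr(fact, slot)
            if p.startswith("?"):
                bindings[p] = v
            elif p != v:
                ok = False
                break
        if ok:
            results.append(bindings)
    return results

# "What color is the bird?" reduced to one ternary pattern with one variable.
print(match(Ternary("bird-1", "color", "?x"), KB))        # -> [{'?x': 'red'}]

# A complex question ("What color is the thing perched on branch-2?") can be
# decomposed into simpler patterns whose bindings are chained:
step1 = match(Ternary("?thing", "perch-on", "branch-2"), KB)
step2 = [match(Ternary(b["?thing"], "color", "?x"), KB) for b in step1]
print(step2)                                              # -> [[{'?x': 'red'}]]
```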
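The gaze-direction idea in the Harari portion is that automatically detected mover events (a hand moving an object) provide labels for where a person is looking, which can then train a simple classifier without human annotation. The sketch below assumes placeholder appearance features and discretized gaze bins and uses scikit-learn's KNeighborsClassifier; the event detector, the feature extraction, and the binning scheme are stand-ins for illustration, not the actual pipeline from the talk.

```python
# Sketch: using automatically detected "mover" events as free supervision for a
# kNN gaze-direction classifier. Features and labels here are random stand-ins;
# only the training scheme is illustrated.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Pretend each detected mover event yields:
#  - a feature vector describing the observer's head/face appearance (64-d here)
#  - a gaze-direction label derived from the event location relative to the head,
#    discretized into 8 angular bins (0..7)
n_events = 500
head_features = rng.normal(size=(n_events, 64))   # placeholder appearance features
gaze_bins = rng.integers(0, 8, size=n_events)     # placeholder auto-generated labels

# Train on the automatically generated examples; no manual labeling involved.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(head_features, gaze_bins)

# At test time, predict a gaze-direction bin for a new head crop's features.
new_head = rng.normal(size=(1, 64))
print("predicted gaze bin:", clf.predict(new_head)[0])
```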
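For the object-segmentation topic, a rough NumPy sketch of the two cues named in the outline: common motion (pixels whose optical flow agrees with the dominant object motion are grouped together) and motion discontinuity (large spatial changes in the flow field mark a candidate object boundary). The synthetic flow field and the thresholds are assumptions for illustration only, not the model presented in the lecture.

```python
# Sketch: segmenting a moving object from a dense optical-flow field using
# common motion and motion discontinuity. The flow field is synthetic.
import numpy as np

H, W = 120, 160
flow = np.zeros((H, W, 2))
flow[40:80, 60:110] = (3.0, 0.5)   # a rectangular "object" translating rightward
flow += np.random.default_rng(1).normal(scale=0.05, size=flow.shape)  # noise

# Common motion: pixels whose flow is close to the representative object flow.
moving = np.linalg.norm(flow, axis=2) > 1.0      # coarse moving/static split
object_flow = flow[moving].mean(axis=0)          # representative object motion
common = np.linalg.norm(flow - object_flow, axis=2) < 0.5

# Motion discontinuity: a large spatial gradient of the flow marks the boundary.
du = np.linalg.norm(np.gradient(flow[..., 0]), axis=0)
dv = np.linalg.norm(np.gradient(flow[..., 1]), axis=0)
boundary = (du + dv) > 0.5

print("object pixels:", common.sum(), "boundary pixels:", boundary.sum())
```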