SCEM Seminar - Multimodal news article analysis
- Event Name
- SCEM Seminar - Multimodal news article analysis
- Date
- 8 September 2017
- Time
- 10:30 am - 11:30 am
- Location
- Campbelltown Campus; Penrith (Kingswood) Campus; Parramatta Campus
Address (Room): Presented from Parramatta (Room EB.1.32), accessible from Campbelltown (Room 26.1.50) and Penrith (Room Y.2.39)
- Description
- Abstract: Current approaches at the intersection of computer vision and NLP have achieved unprecedented breakthroughs in tasks like automatic captioning or image retrieval. Most of these methods, though, rely on training sets of images associated with annotations that specifically describe the visual content. This seminar proposes going a step further and explores more complex cases where textual descriptions are only loosely related to images. We focus on the particular domain of news. We introduce new deep learning methods that address source and popularity prediction, article illustration, and article geolocation. An adaptive CNN is proposed that shares most of its structure across all tasks and is suitable for multitask and transfer learning. Deep CCA is deployed for article illustration, and a new loss function based on Great Circle Distance is proposed for geolocation. Furthermore, we present BreakingNews, a novel dataset with approximately 100K news articles including images, text, and captions, enriched with heterogeneous metadata. BreakingNews allows exploring all of the aforementioned problems, for which we provide baseline performances using various CNN architectures and different representations of the textual and visual features. We report promising results and bring to light several limitations of the current state of the art, which we hope will help spur progress in the field.
Biography: Arnau Ramisa received the MSc degree in computer science (computer vision) from the Autonomous University of Barcelona (UAB) in 2006, and in 2009 completed a PhD at the Artificial Intelligence Research Institute (IIIA-CSIC) and the UAB. Between 2009 and 2011, he was a postdoctoral fellow at the LEAR team in INRIA Grenoble / Rhone-Alpes, and between 2011 and 2015 a research fellow at the Institut de Robòtica i Informàtica Industrial in Barcelona (IRI). Since 2015 he has been working as a computer vision researcher at Wide Eyes Tech. His research interests include object classification and detection, image retrieval, robot vision and natural language processing.
Speakers: Dr Arnau Ramisa
- Contact
- Name: Teresa Cheong
Phone: 9685 9408
School / Department: School of Computing, Engineering and Mathematics