---
layout: default
permalink: /publication/
---

Research & Services

Services

Served as a Program Committee Member / Invited Reviewer at some of the leading conferences in Machine Learning.

Publications

  • arXiv 2024

    • Improved content understanding with effective use of multi-task contrastive learning

      Akanksha Bindal, Sudarshan Ramanujam, Dave Golland, TJ Hazen, Tina Jiang, Fengyu Zhang, and Peng Yan. Improved Content Understanding with Effective Use of Multi-Task Contrastive Learning. arXiv, 2024.

    • Paper

  • arXiv 2022

    • Streaming on-device detection of device directed speech from voice and touch-based invocation

      Oggi Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, and Sachin Kajarekar. Streaming On-Device Detection of Device Directed Speech from Voice and Touch-Based Invocation. arXiv, 2022.

    • Paper

  • ICASSP 2021

    • Generating natural questions from images for multimodal assistants

      Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, and Jason Williams. Generating Natural Questions from Images for Multimodal Assistants. arXiv:2012.03678 [cs], Nov 2020.

    • Paper

Datasets

  • Apple Visual Question Generation Dataset [Released: May 2019]

    • Generating natural, diverse, and meaningful questions from images is an essential task for multimodal assistants, as it confirms whether they have understood the object and scene in the images properly. The research in visual question answering (VQA) and visual question generation (VQG) is a great step. However, this research does not capture questions that a visually-abled person would ask multimodal assistants. Recently published datasets such as KB-VQA, FVQA, and OK-VQA try to collect questions that look for external knowledge, which makes them appropriate for multimodal assistants. However, they still contain many obvious and common-sense questions that humans would not usually ask a digital assistant. In this paper, we provide a new benchmark dataset that contains questions generated by human annotators keeping in mind what they would ask multimodal digital assistants. Large-scale annotation for thousands of images is expensive and time-consuming, so we also present an effective way of automatically generating questions from unseen images. We present an approach for generating diverse and meaningful questions that consider image content and metadata of the image (e.g., location, associated keywords). We evaluate our approach using standard evaluation metrics such as BLEU, METEOR, ROUGE, and CIDEr to show the relevance of generated questions with human-provided questions. We also measure the diversity of generated questions using generative strength and inventiveness metrics. We report new state-of-the-art results on both the public dataset and our dataset. (A minimal, illustrative scoring sketch follows the citation below.)

    • Please cite this article if you use the dataset:

      @INPROCEEDINGS{9413599,
        author={Patel, Alkesh and Bindal, Akanksha and Kotek, Hadas and Klein, Christopher and Williams, Jason},
        booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
        title={Generating Natural Questions from Images for Multimodal Assistants},
        year={2021},
        volume={},
        number={},
        pages={2270-2274},
        keywords={Visualization;Annotations;Conferences;Signal processing;Benchmark testing;Knowledge discovery;Acoustics;Multimodal assistant;computer vision;visual question generation;long-short-term memory},
        doi={10.1109/ICASSP39728.2021.9413599}}
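The dataset description above mentions scoring generated questions against human-provided questions with metrics such as BLEU, METEOR, ROUGE, and CIDEr. As a minimal, illustrative sketch only (not the evaluation code used in the paper), the snippet below computes sentence-level BLEU for one hypothetical generated question against two made-up reference questions using NLTK:

```python
# Illustrative sketch only -- not the paper's evaluation pipeline.
# Scores one hypothetical generated question against made-up human
# reference questions using sentence-level BLEU from NLTK.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Hypothetical human-provided reference questions for a single image.
references = [
    "what breed is this dog".split(),
    "which breed does this dog belong to".split(),
]

# Hypothetical model-generated question for the same image.
hypothesis = "what breed is the dog".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap.
smoothing = SmoothingFunction().method1
score = sentence_bleu(references, hypothesis, smoothing_function=smoothing)
print(f"BLEU: {score:.3f}")
```

In practice, METEOR, ROUGE, and CIDEr are usually computed over the full test set with a captioning-evaluation toolkit rather than per sentence; this snippet is only meant to make the metric-based comparison concrete.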
    
    

Missing Citations