Published in 2015 International Conference on Computing and Network Communications (CoCoNet)
The Coordinate Rotational Digital Computer (CORDIC) algorithm allows computation of trigonometric, hyperbolic, natural log and square root functions. This iterative algorithm uses only shift and add operations to converge. Multiple fixed-radix variants of the algorithm have been implemented in hardware. These have demonstrated faster convergence at the expense of reduced accuracy. High-radix adaptive variants of CORDIC also exist in the literature. These allow for faster convergence without compromising the accuracy of the results, at the expense of hardware multipliers in the datapath. This paper proposes a 12-stage deep pipeline architecture to implement a high-radix adaptive CORDIC algorithm. It employs floating-point multipliers in place of the conventional shift-and-add architecture of fixed-radix CORDIC. This design has been synthesised on an FPGA board to act as a coprocessor. The paper also studies the power, latency and accuracy of this implementation.
Citation: S. S. Oza, A. P. Shah, T. Thokala and S. David, "Pipelined implementation of high radix adaptive CORDIC as a coprocessor," 2015 International Conference on Computing and Network Communications (CoCoNet), Trivandrum, 2015, pp. 333-342. [Paper Link]
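For readers unfamiliar with the shift-and-add iteration that the abstract above contrasts against, the following is a minimal sketch of the conventional fixed radix-2 CORDIC in rotation mode. It is not the paper's high-radix adaptive pipeline; the function name `cordic_sin_cos` and the default iteration count are illustrative assumptions, and floating-point arithmetic stands in for the fixed-point shifts a hardware datapath would use.

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Approximate (sin, cos) of theta in radians, for |theta| <= pi/2.

    Conventional radix-2 CORDIC in rotation mode (illustrative sketch only,
    not the high-radix adaptive variant described in the paper).
    """
    # Elementary rotation angles atan(2^-i), normally stored in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Aggregate gain correction: product of 1/sqrt(1 + 2^-2i) over all stages.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start on the x-axis, pre-scaled so the final vector has unit length.
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        # In hardware, multiplying by 2^-i is a right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x

if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, math.sin(0.5))
    print(c, math.cos(0.5))
```

Each iteration contributes roughly one additional bit of accuracy, which is why higher-radix variants that resolve several bits per step can converge in fewer stages.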
Experiments on DCASE Challenge 2016 Acoustic Scene Classification and Sound Event Detection in Real Life Recording
Published in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events.
In this paper we present our work on Task 1 Acoustic Scene Classification and Task 3 Sound Event Detection in Real Life Recordings. Our experiments cover low-level and high-level features, classifier optimization and other heuristics specific to each task. Our performance on both tasks improved on the DCASE baseline: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6%, and for Task 3 we achieved a Segment-Based Error Rate of 0.76 compared to the baseline of 0.91.
Citation: Elizalde, Benjamin, Anurag Kumar, Ankit Shah, Rohan Badlani, Emmanuel Vincent, Bhiksha Raj, and Ian Lane. "Experimentation on the DCASE challenge 2016: Task 1—Acoustic scene classification and task 3—Sound event detection in real life audio." IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events (2016). [Paper Link]
Published in 25th European Signal Processing Conference (EUSIPCO)
Audio Event Detection (AED) aims to recognize sounds within audio and video recordings. AED employs machine learning algorithms commonly trained and tested on annotated datasets. However, available datasets are limited in number of samples and hence it is difficult to model acoustic diversity. Therefore, we propose combining labeled audio from a dataset and unlabeled audio from the web to improve the sound models. The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube. Whenever the detectors recognized any of the known sounds with high confidence, the unlabeled audio was used to re-train the detectors. The performance of the re-trained detectors is compared to that of the original detectors using the annotated test set. Results showed an improvement in AED performance and uncovered challenges of using web audio from videos.
Citation: Ankit Shah, Rohan Badlani, Anurag Kumar, Benjamin Elizalde, Bhiksha Raj, "An Approach for Self-Training Audio Event Detectors Using Web Data", in 25th European Signal Processing Conference (EUSIPCO), 2017 [Paper Link]
Published in Detection and Classification of Acoustic Scenes and Events 2017 Workshop
The DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using a multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision-making process, as well as the evaluation of system output using task-specific metrics.
Citation: A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen, “DCASE 2017 challenge setup: Tasks, datasets and baseline system,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), November 2017 [Paper Link]
Published in IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. We propose a novel approach which encodes the audio into a vector representation using Siamese Neural Networks. The goal is to obtain similar encodings for files belonging to the same audio class, thus allowing retrieval of semantically similar audio. We used two similarity measures, cosine similarity and Euclidean distance, to show that our method is effective in retrieving files similar in audio content. Our results indicate that our neural network-based approach is able to retrieve files similar in content and semantics.
Citation: Manocha, Pranay, Rohan Badlani, Anurag Kumar, Ankit Shah, Benjamin Elizalde, and Bhiksha Raj. "Content-based Representations of audio using Siamese neural networks." arXiv preprint arXiv:1710.10974 (2017). [Paper Link]
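The retrieval step described in the abstract above, ranking stored encodings against a query by cosine similarity or Euclidean distance, can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors and file names are made up, the `retrieve` helper is hypothetical, and the Siamese network that would produce real encodings is not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors (lower means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database, k=2, measure="cosine"):
    """Return the names of the k stored encodings closest to the query."""
    if measure == "cosine":
        ranked = sorted(database,
                        key=lambda name: cosine_similarity(query, database[name]),
                        reverse=True)
    else:
        ranked = sorted(database,
                        key=lambda name: euclidean_distance(query, database[name]))
    return ranked[:k]

if __name__ == "__main__":
    # Hypothetical encodings; real ones would come from the trained network.
    database = {
        "dog_bark_1": [0.9, 0.1, 0.0],
        "dog_bark_2": [0.8, 0.2, 0.1],
        "siren_1": [0.0, 0.1, 0.95],
    }
    query = [0.85, 0.15, 0.05]
    print(retrieve(query, database, k=2, measure="cosine"))
    print(retrieve(query, database, k=2, measure="euclidean"))
```

The two measures can rank items differently when encodings vary in magnitude, since cosine similarity ignores vector length while Euclidean distance does not.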
Published in IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
The largest source of sound events is web videos. Most videos lack sound event labels at the segment level; however, a significant number of them do respond to text queries because the search engine finds a match in their metadata. In this paper we explore the extent to which a search query could be used as the true label for the presence of sound events in the videos. For this, we developed a framework for large-scale sound event recognition on web videos. The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets. The datasets are used to train three classifiers, which were then run on 3.7 million video segments. We evaluated performance using the search query as the true label and compared it (on a subset) with human labeling. Both types exhibited close performance, to within 10%, and similar performance trends as the number of evaluated segments increased. Hence, our experiments show potential for using the search query as a preliminary true label for sound events in web videos.
Citation: Badlani, Rohan, Ankit Shah, Benjamin Elizalde, Anurag Kumar, and Bhiksha Raj. "Framework for evaluation of sound event detection in web videos." arXiv preprint arXiv:1711.00804 (2017). [Paper Link]
Published in Neural Information Processing Systems (NIPS 2017)
Sounds are essential to how humans perceive and interact with the world. These sounds are captured in recordings and shared on the Internet on a minute-by-minute basis. These recordings, which are predominantly videos, constitute the largest archive of sounds we’ve ever seen. However, most of these recordings have undescribed content, making methods for automatic audio content analysis, indexing and retrieval necessary. These methods have to address multiple challenges, such as the relation between sounds and language, numerous and diverse sound classes, and large-scale evaluation. We propose a system that continuously learns relations between sounds and language from the web, improves sound recognition models over time and evaluates its learning competency at large scale without references. We introduce the Never-Ending Learner of Sounds (NELS), a project for continuously learning sounds and their associated knowledge, available online at nels.cs.cmu.edu.
Citation: Elizalde, Benjamin, Rohan Badlani, Ankit Shah, Anurag Kumar, and Bhiksha Raj. "NELS-Never-Ending Learner of Sounds." [Paper Link]
Published in Preprint and Under Review
Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and the availability of large-scale weakly labeled datasets have finally opened up the possibility of large-scale AED. However, a deeper understanding of how weak labels affect the learning for sound events is still missing from the literature. In this work, we first describe a CNN-based approach for weakly supervised training of audio events. The approach follows some basic design principles desirable in a learning method relying on weakly labeled audio. We then describe important characteristics which naturally arise in weakly supervised learning of sound events. We show how these aspects of weak labels affect the generalization of models. More specifically, we study how characteristics such as label density and corruption of labels affect weakly supervised training for audio events. We also study the feasibility of directly obtaining weakly labeled data from the web without any manual labeling and compare it with a dataset which has been manually labeled. The analysis and understanding of these factors should be taken into account in the development of future weak label learning methods. Audioset, a large-scale weakly labeled dataset for sound events, is used in our experiments.
Citation: Ankit Shah, Anurag Kumar, Alexander Hauptmann, Bhiksha Raj, "A Closer Look at Weak Label Learning for Audio Events", ArXiv e-prints, 2018 [Paper Link]
Published in IEEE International Workshop on Traffic and Street Surveillance for Safety and Security, 2018
This paper presents a novel dataset for traffic accident analysis. Our goal is to resolve the lack of public data for research on automatic spatio-temporal annotations for traffic safety on the roads. Through the analysis of the proposed dataset, we observed a significant degradation of object detection in the pedestrian category, due to the object sizes and complexity of the scenes. To this end, we propose to integrate contextual information into conventional Faster R-CNN using Context Mining (CM) and Augmented Context Mining (ACM) to improve the accuracy of small pedestrian detection. Our experiments indicate a considerable improvement in object detection accuracy: +8.51% for CM and +6.20% for ACM. Finally, we demonstrate the performance of accident forecasting on our dataset using Faster R-CNN and an Accident LSTM architecture. We achieved an average of 1.684 seconds in terms of the Time-To-Accident measure with an Average Precision of 47.25%.
Citation: Ankit Shah*, Jean Baptiste Lamare*, Tuan Nguyen Anh*, Alexander Hauptmann, "CADP: A Novel Dataset for CCTV Traffic Camera based Accident Analysis", International Workshop on Traffic and Street Surveillance for Safety and Security, Nov 2018. [Paper Link][Webpage]
Published in Detection and Classification of Acoustic Scenes and Events 2018
This paper presents DCASE 2018 Task 4. The task evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries). The target of the systems is to provide not only the event class but also the event time boundaries, given that multiple events can be present in an audio recording. Another challenge of the task is to explore the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance. The data are YouTube video excerpts from domestic contexts, which have many applications such as ambient assisted living. The domain was chosen due to the scientific challenges (wide variety of sounds, time-localized events, ...) and potential industrial applications.
Citation: Romain Serizel, Nicolas Turpault, Hamid Eghbal-Zadeh, Ankit Parag Shah. "Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments", Detection and Classification of Acoustic Scenes and Events 2018 [Paper Link][Webpage]
Published in 2nd Proceedings of Alexa Prize (Alexa Prize 2018).
This paper describes the Tartan conversational agent built for the 2018 Alexa Prize Competition. Tartan is a non-goal-oriented socialbot focused around providing users with an engaging and fluent casual conversation. Tartan's key features include an emphasis on structured conversation based on flexible finite-state models and an approach focused on understanding and using conversational acts. To provide engaging conversations, Tartan blends script-like yet dynamic responses with data-based generative and retrieval models. Unique to Tartan is that our dialog manager is modeled as a dynamic Finite State Machine. To our knowledge, no other conversational agent implementation has followed this specific structure.
Citation: George Larionov, Zachary Kaden, Hima Varsha Dureddy, Gabriel Bayomi T. Kalejaiye, Mihir Kale, Srividya Pranavi Potharaju, Ankit Parag Shah, Alexander I Rudnicky, "Tartan: A retrieval-based socialbot powered by a dynamic finite-state machine architecture", 2nd Proceedings of Alexa Prize (Alexa Prize 2018). [Paper Link]
Published in 2019 International Joint Conference on Artificial Intelligence
In the last couple of years, weakly labeled learning for sound events has turned out to be an exciting approach for audio event detection. In this work, we introduce webly labeled learning for sound events in which we aim to remove human supervision altogether from the learning process. We first develop a method of obtaining labeled audio data from the web (albeit noisy), in which no manual labeling is involved. We then describe deep learning methods to efficiently learn from these webly labeled audio recordings. In our proposed system, WeblyNet, two deep neural networks co-teach each other to robustly learn from webly labeled data, leading to around 17% relative improvement over the baseline method. The method also involves transfer learning to obtain efficient representations.
Published in Detection and Classification of Acoustic Scenes and Events 2019
This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge and provides a first analysis of the challenge results. The task is a follow-up to Task 4 of DCASE 2018, and involves training systems for large-scale detection of sound events using a combination of weakly labeled data, i.e. training labels without time boundaries, and strongly labeled synthesized data. We introduce the Domestic Environment Sound Event Detection (DESED) dataset, mixing a part of last year's dataset and an additional synthetic, strongly labeled dataset provided this year that we describe in more detail. We also report the performance of the submitted systems on the official evaluation (test) and development sets as well as several additional datasets. The best systems from this year outperform last year's winning system by about 10 percentage points in terms of F-measure.
Citation: Nicolas Turpault, Romain Serizel, Ankit Shah, Justin Salamon, "Sound event detection in domestic environments with weakly labeled data and soundscape synthesis", Detection and Classification of Acoustic Scenes and Events 2019 [Paper Link][Webpage]
Published in 21st ACM International Conference on Multimodal Interaction 2019
Suicide is one of the leading causes of death in the modern world. In this digital age, individuals are increasingly using social media to express themselves and often use these platforms to express suicidal intent. Various studies have inspected suicidal intent behavioral markers in controlled environments but it is still unexplored if such markers will generalize to suicidal intent expressed on social media. In this work, we set out to study multimodal behavioral markers related to suicidal intent when expressed on social media videos. We explore verbal, acoustic and visual behavioral markers in the context of identifying individuals at higher risk of suicidal attempt. Our analysis reveals a set of predominant multimodal behavioral markers indicative of suicidal intent on social media videos.
Citation: Ankit Shah*, Vasu Sharma*, Vaibhav Vaibhav*, Mahmoud Alismail*, Louis-Philippe Morency, "Multimodal Behavioral Markers Exploring Suicidal Intent in Social Media Videos", 21st ACM International Conference on Multimodal Interaction 2019 [Paper Link][Webpage]
Elements of Electronics and Communication - EC 110, National Institute of Technology Karnataka Surathkal, Department of Electronics and Communication, 2013
Teaching assistant for the course as part of the peer mentoring program at NITK. Taught Elements of Electronics and Communication to peers from 2013 to 2014.
Data Structures and Algorithms - EC 232, National Institute of Technology Karnataka Surathkal, Department of Electronics and Communication, 2014
Teaching assistant for the course as part of the peer mentoring program at NITK. Taught Data Structures and Algorithms to peers from 2014 to 2015.
Global STEM Alliance, ARM, 2017
- Unique opportunity to participate in a fast-paced programme to develop research-driven solutions addressing pressing challenges at a global scale.
- Mentored a young team of students on the wearables challenge, implementing an innovative water filtration system.
- Demonstrated and presented the idea with a working prototype, eventually winning the innovation challenge.
IEEE-DCASE 2017 challenge - Task 4 - Large Scale weakly supervised sound event detection for smart cars, Carnegie Mellon University, 2017
Organizer of Task 4 “Large-scale weakly supervised sound event detection for smart cars”. Accountable for code development, paper reviews and system submissions as well as providing technical support to participants.
IEEE-DCASE 2018 challenge - Task 4 - Large-scale weakly labeled semi-supervised sound event detection in domestic environments, Carnegie Mellon University, 2018
Organizer of Task 4 “Large-scale weakly labeled semi-supervised sound event detection in domestic environments”. Accountable for code development, dataset development, paper reviews and system submissions as well as providing technical support to participants via email and DCASE forum.
IEEE-DCASE 2019 challenge - Task 4 - Sound event detection in domestic environments, Carnegie Mellon University, 2019
Organizer of Task 4 “Sound event detection in domestic environments”. Accountable for code development, dataset development, paper reviews and system submissions as well as providing technical support to participants via email and DCASE forum.