Dr Yoshi Gotoh
PhD
School of Computer Science
Lecturer
Student Projects Officer
Foundation Year Tutor
Member of the Speech and Hearing (SpandH) research group
y.gotoh@sheffield.ac.uk
+44 114 222 1908
Regent Court (DCS)
Full contact details
Dr Yoshi Gotoh
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
Profile
Yoshi is a lecturer in the Department of Computer Science. He has a first degree in Engineering from the University of Tokyo and a PhD from Brown University.
Research interests
Yoshi has been working in the field of speech and spoken language processing for many years. His current interests include audio-visual processing, in particular video analysis and video information retrieval.
Publications
Journal articles
- . Machine Vision and Applications, 31.
- . Machine Vision and Applications, 28(3-4), 243-265.
- A statistical model for annotating videos with human actions. Pakistan Journal of Statistics, 32(2), 109-123.
- . Information Sciences, 303, 61-82.
- . Neurocomputing, 161, 56-64.
- . International Journal of Advanced Robotic Systems, 9.
- . Natural Language Engineering, 15, 193-213.
- Glasgow University at TRECVID 2009. 2009 TREC Video Retrieval Evaluation Notebook Papers.
- . IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 151-161.
- Information extraction from broadcast news. Philosophical Transactions of the Royal Society A, 358(1769), 1295-1309.
- Topic-based mixture language modelling. Natural Language Engineering, 5(4), 355-375.
- Efficient training algorithms for HMM's using incremental estimation. IEEE Transactions on Speech and Audio Processing, 6(6), 539-548.
- . Artificial Intelligence, 85(1-2), 45-57.
- . Artificial Intelligence, 84(1-2), 357-357.
- . IEEE Signal Processing Letters, 3(4), 103-106.
Conference proceedings papers
- University of Engineering & Technology, Lahore the University of Sheffield at TRECVID 2015: Instance search. 2015 TREC Video Retrieval Evaluation, TRECVID 2015
- Sheffield and University of Engineering & Technology, Lahore at TRECVID 2014: Instance search task. 2014 TREC Video Retrieval Evaluation, TRECVID 2014
- . ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 2367-2371). Brighton, 12 May 2019 - 17 May 2019.
- Graph-based correlated topic model for motion patterns analysis in crowded scenes from tracklets. British Machine Vision Conference 2018, BMVC 2018
- . IEEE Winter Conference on Applications of Computer Vision (pp 1029-1037), 12 March 2018 - 14 March 2018.
- . Medical Image Understanding and Analysis, Vol. 723 (pp 571-580)
- Natural language descriptions for human activities in video streams. INLG 2017 - 10th International Natural Language Generation Conference, Proceedings of the Conference (pp 85-94)
- Natural language descriptions of human activities scenes: corpus generation and analysis. 5th Workshop on Vision and Language. Berlin
- Analysis of visemes in the GRID corpus. Abstract of UKspeech
- Overlapped interest and the impact of visual and audio information in the human perception. Abstract of UKspeech
- Sheffield and University of Engineering & Technology, Lahore at TRECVID 2016: Video to text description task. 2016 TREC Video Retrieval Evaluation, TRECVID 2016
- Corpus generation and analysis: incorporating audio data towards curbing missing information. Proceedings of KDWEB
- Describing spatio-temporal relations between object volumes in video streams. AAAI Workshop - Technical Report, Vol. WS-15-14 (pp 2-8)
- . ICISP. Cherbourg, 30 June 2014.
- Alignment of nearly-repetitive contents in a video stream with manifold embedding. ICASSP. Firenze
- Video clip retrieval by graph matching. ECIR. Amsterdam
- Action recognition: spatio-temporal human body region tracking approach. CAIP - REACTS workshop. York
- Spatio-temporal manifold embedding for nearly-repetitive contents in a video stream. CAIP. York
- Spatio-temporal human body segmentation from video stream. CAIP. York
- Sheffield, Harbin Engineering University and University of Engineering & Technology, Lahore at TRECVID 2013: Instance search & semantic indexing. 2013 TREC Video Retrieval Evaluation, TRECVID 2013
- Sheffield and Harbin Engineering University at TRECVID 2012: Instance Search. TRECVID
- Human focused video description. ICCV - VECTaR workshop. Barcelona
- Video scene classification based on natural language description. ICCV - ARTEMIS workshop. Barcelona
- Towards coherent natural language description of video streams. ICCV - SIG workshop. Barcelona
- Nearly-repetitive video synchronisation using nonlinear manifold embedding. ICASSP. Dallas
- University of Sheffield at TRECVID 2008: Rushes Summarisation and Video Copy Detection. TRECVID
- Shot alignment in pre-production video. MLMI. Utrecht
- University of Sheffield at TRECVID 2007: Shot Boundary Detection and Rushes Summarisation. TRECVID
- Speaker Role Based Structural Classification of Broadcast News Stories. Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, Vols 1-4 (pp 141-144)
- Relative Evaluation of Informativeness in Machine Generated Summaries. Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, Vols 1-4 (pp 145-148)
- Multi-stage compaction approach to broadcast news summarisation. Interspeech. Lisbon
- On the subjectivity of human authored short summaries. ACL Workshop: Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor
- Maximum entropy segmentation of broadcast news. 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-5 (pp 1029-1032)
- Decremental feature-based compaction. DUC Workshop. Boston
- From text summarisation to style-specific summarisation for broadcast news. Advances in Information Retrieval, Proceedings, Vol. 2997 (pp 223-237)
- Are extractive text summarisation techniques portable to broadcast news? ASRU'03: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (pp 489-494)
- Exploring the style-technique interaction in extractive summarization of broadcast news. ASRU'03: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (pp 495-500)
- Statistical language modelling. Text- and Speech-Triggered Information Access, Vol. 2705 (pp 78-105)
- Punctuation Annotation Using Statistical Prosody Models. Proceedings of the ISCA Workshop on Prosody in Speech Recognition and Understanding (pp 35-40)
- Sentence boundary detection in broadcast speech transcripts. ISCA ASR Workshop. Paris
- Variable word rate n-grams. 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, Vols I-VI (pp 1591-1594)
- Integrated transcription and identification of named entities in broadcast speech. Eurospeech. Budapest
- Statistical annotation of named entities in spoken audio. ESCA Workshop: Accessing Information in Spoken Audio. Cambridge
- Named entity tagged language models. ICASSP '99: 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, Vols I-VI (pp 513-516)
- Document space models using latent semantic analysis. Eurospeech. Rhodes
- Microphone-array speech recognition via incremental MAP training. ICASSP. Atlanta
- Incremental ML estimation of HMM parameters for efficient training. ICASSP. Atlanta
- Incremental MAP estimation of HMMs for efficient training and improved performance. ICASSP. Detroit
- Using MAP estimated parameters to improve HMM speech recognition performance. ICASSP. Adelaide
- Improving audiovisual active speaker detection in egocentric recordings with the data-efficient image transformer. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2023). Taipei, Taiwan, 16 December 2023.
- Exploration of verbal descriptions and dynamic indoors environments for people with sight loss. Proceedings of ACM CHI 2023
- Natural language descriptions for video streams. V&L Net Workshop. Sheffield, December 2012.
- Spatio-temporal SIFT and its application to human action classification. ECCV - VECTaR workshop. Firenze, October 2012.
- Spatio-temporal video representation with locality-constrained linear coding. ECCV - ARTEMIS workshop. Firenze, October 2012.
- Generating coherent natural language annotations for video streams. ICIP. Orlando, September 2012.
- Natural language descriptions of visual scenes: corpus generation and analysis. EACL workshop. Avignon, April 2012.
- Describing video contents in natural language. EACL workshop. Avignon, April 2012.
- . Interspeech 2007
- . Interspeech 2007
Grants
Current Grants
- Multimedia Analysis for Unsupervised Dubbing In Entertainment (MAUDIE), Innovate UK, 04/2018 to 03/2021, £393,115, as Co-PI
Previous Grants
- , EPSRC, 12/2001 to 09/2005, £284,248, as Co-PI
Professional activities and memberships
Member of the Speech and Hearing (SpandH) research group