Dr Yoshi Gotoh

PhD

School of Computer Science

Lecturer

Student Projects Officer

Member of the Speech and Hearing (SpandH) research group

y.gotoh@sheffield.ac.uk

Regent Court (DCS)

Full contact details

Dr Yoshi Gotoh
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP

Profile: Yoshi is a lecturer in the Department of Computer Science. He has a first degree in Engineering from the University of Tokyo and a PhD from Brown University.

Research interests: Yoshi has been working in the field of speech and spoken language processing for years. His current interests include audio visual processing, in particular, video analysis and video information retrieval.

Publications

Journal articles

Al Ghamdi M & Gotoh Y (2020) . Machine Vision and Applications, 31.
Khan MUG & Gotoh Y (2017) . Machine Vision and Applications, 28(3-4), 243-265.
Khan MUG, Nasir A, Riaz O, Gotoh Y & Amiruddin M (2016) A statistical model for annotating videos with human actions. Pakistan Journal of Statistics, 32(2), 109-123.
Khan M, AlHarbi N & Gotoh Y (2015) . Information Sciences, 303, 61-82.
Al Harbi N & Gotoh Y (2015) . Neurocomputing, 161, 56-64.
Zhang L, Gotoh Y & Khan M (2012) . International Journal of Advanced Robotic Systems, 9.
Kolluru B & Gotoh Y (2009) . Natural Language Engineering, 15, 193-213.
Punitha P, Misra H, Ren R, Hannah D, Goyal A, Villa R & Jose JM (2009) Glasgow University at TRECVID 2009. 2009 TREC Video Retrieval Evaluation Notebook Papers.
Christensen H, Gotoh Y & Renals S (2008) . IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 151-161.
Gotoh Y & Renals S (2000) Information extraction from broadcast news. Philosophical Transactions of the Royal Society A, 358(1769), 1295-1309.
Gotoh Y & Renals S (1999) Topic-based mixture language modelling. Natural Language Engineering, 5(4), 355-375.
Gotoh Y, Hochberg MM & Silverman HF (1998) Efficient training algorithms for HMM's using incremental estimation. IEEE Transactions on Speech and Audio Processing, 6(6), 539-548.
Charniak E, Carroll G, Adcock J, Cassandra A, Gotoh Y, Katz J, Littman M & McCanna J (1996) . Artificial Intelligence, 85(1-2), 45-57.
Charniak E, Caroll G, Adcock J, Cassandra A, Gotoh Y, Katz J, Littman M & McCann J (1996) . Artificial Intelligence, 84(1-2), 357-357.
Mashao D, Gotoh Y & Silverman HF (1996) . IEEE Signal Processing Letters, 3(4), 103-106.

Conference proceedings papers

Alvi M, Khan MUG, Gotoh Y, Sadiq M & Aslam M (2020) University of Engineering & Technology, Lahore and the University of Sheffield at TRECVID 2015: Instance search. 2015 TREC Video Retrieval Evaluation, TRECVID 2015
Amanat S, Khan MUG, Nida N & Gotoh Y (2020) Sheffield and University of Engineering & Technology, Lahore at TRECVID 2014: Instance search task. 2014 TREC Video Retrieval Evaluation, TRECVID 2014
Algadhy R, Gotoh Y & Maddock S (2019) . ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 2367-2371). Brighton, 12 May 2019 - 17 May 2019.
Al Ghamdi M & Gotoh Y (2019) Graph-based correlated topic model for motion patterns analysis in crowded scenes from tracklets. British Machine Vision Conference 2018, BMVC 2018
Al Ghamdi M & Gotoh Y (2018) . IEEE Winter Conference on Applications of Computer Vision (pp 1029-1037), 12 March 2018 - 14 March 2018.
Khan MUG, Gotoh Y & Nida N (2017) . Medical Image Understanding and Analysis, Vol. 723 (pp 571-580)
Al Harbi N & Gotoh Y (2017) Natural language descriptions for human activities in video streams. INLG 2017 - 10th International Natural Language Generation Conference, Proceedings of the Conference (pp 85-94)
AlHarbi N & Gotoh Y (2016) Natural language descriptions of human activities scenes: corpus generation and analysis. 5th Workshop on Vision and Language. Berlin
Algadhy R, Gotoh Y & Maddock S (2016) Analysis of visemes in the GRID corpus. Abstract of UKspeech
Masrani A & Gotoh Y (2016) Overlapped interest and the impact of visual and audio information in the human perception. Abstract of UKspeech
Wahla SQ, Waqar S, Ghani Khan MU & Gotoh Y (2016) Sheffield and University of Engineering & Technology, Lahore at TRECVID 2016: Video to text description task. 2016 TREC Video Retrieval Evaluation, TRECVID 2016
Masrani A & Gotoh Y (2015) Corpus generation and analysis: incorporating audio data towards curbing missing information. Proceedings of KDWEB
Al Harbi N & Gotoh Y (2015) Describing spatio-temporal relations between object volumes in video streams. AAAI Workshop - Technical Report, Vol. WS-15-14 (pp 2-8)
Alvi M, Khan MUG, Gotoh Y, Sadiq M & Aslam M (2015) University of Engineering & Technology, Lahore and the University of Sheffield at TRECVID 2015: Instance search. 2015 TREC Video Retrieval Evaluation, TRECVID 2015
Al Ghamdi M & Gotoh Y (2014) . ICISP. Cherbourg, 30 June 2014.
Al Ghamdi M & Gotoh Y (2014) Alignment of nearly-repetitive contents in a video stream with manifold embedding. ICASSP. Firenze
Al Ghamdi M & Gotoh Y (2014) Video clip retrieval by graph matching. ECIR. Amsterdam
Amanat S, Khan MUG, Nida N & Gotoh Y (2014) Sheffield and University of Engineering & Technology, Lahore at TRECVID 2014: Instance search task. 2014 TREC Video Retrieval Evaluation, TRECVID 2014
Al Harbi N & Gotoh Y (2013) Action recognition: spatio-temporal human body region tracking approach. CAIP - REACTS workshop. York
Al Ghamdi M & Gotoh Y (2013) Spatio-temporal manifold embedding for nearly-repetitive contents in a video stream. CAIP. York
Al Harbi N & Gotoh Y (2013) Spatio-temporal human body segmentation from video stream. CAIP. York
Khan MUG, Bashir K, Shah AA, Zhang L, Gotoh Y, Khan PI & Amiruddin M (2013) Sheffield, Harbin Engineering University and University of Engineering & Technology, Lahore at TRECVID 2013: Instance search & semantic indexing. 2013 TREC Video Retrieval Evaluation, TRECVID 2013
Khan M, Bashir K, Shah A, Zhang L, Gotoh Y, Khan P & Amiruddin M (2013) Sheffield, Harbin Engineering University and University of Engineering & Technology, Lahore at TRECVID 2013: Instance Search & Semantic indexing. TRECVID
Al Ghamdi M, Khan M, Zhang L & Gotoh Y (2012) Sheffield and Harbin Engineering University at TRECVID 2012: Instance Search. TRECVID
Khan M, Zhang L & Gotoh Y (2011) Human focused video description. ICCV - VECTaR workshop. Barcelona
Zhang L, Khan M & Gotoh Y (2011) Video scene classification based on natural language description. ICCV - ARTEMIS workshop. Barcelona
Khan M, Zhang L & Gotoh Y (2011) Towards coherent natural language description of video streams. ICCV - SIG workshop. Barcelona
Chantamunee S & Gotoh Y (2010) Nearly-repetitive video synchronisation using nonlinear manifold embedding. ICASSP. Dallas
Chantamunee S & Gotoh Y (2008) University of Sheffield at TRECVID 2008: Rushes Summarisation and Video Copy Detection. TRECVID
Chantamunee S & Gotoh Y (2008) Shot alignment in pre-production video. MLMI. Utrecht
Chantamunee S & Gotoh Y (2007) University of Sheffield at TRECVID 2007: Shot Boundary Detection and Rushes Summarisation. TRECVID
Kolluru B & Gotoh Y (2007) Speaker Role Based Structural Classification of Broadcast News Stories. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 (pp 141-144)
Kolluru B & Gotoh Y (2007) Relative Evaluation of Informativeness in Machine Generated Summaries. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 (pp 145-148)
Kolluru B, Christensen H & Gotoh Y (2005) Multi-stage compaction approach to broadcast news summarisation. Interspeech. Lisbon
Kolluru B & Gotoh Y (2005) On the subjectivity of human authored short summaries. ACL Workshop: Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarizati. Ann Arbor
Christensen H, Kolluru BK, Gotoh Y & Renals S (2005) Maximum entropy segmentation of broadcast news. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 (pp 1029-1032)
Kolluru B, Christensen H & Gotoh Y (2004) Decremental feature-based compaction. DUC Workshop. Boston
Christensen H, Kolluru BK, Gotoh Y & Renals S (2004) From text summarisation to style-specific summarisation for broadcast news. ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, Vol. 2997 (pp 223-237)
Christensen H, Gotoh Y, Kolluru B & Renals S (2003) Are extractive text summarisation techniques portable to broadcast news?. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03 (pp 489-494)
Kolluru B, Christensen H, Gotoh Y & Renals S (2003) Exploring the style-technique interaction in extractive summarization of broadcast news. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03 (pp 495-500)
Gotoh Y & Renals S (2003) Statistical language modelling. TEXT- AND SPEECH-TRIGGERED INFORMATION ACCESS, Vol. 2705 (pp 78-105)
Christensen H, Gotoh Y & Renals S (2001) Punctuation Annotation Using Statistical Prosody Models. Proceedings of the ISCA Workshop on Prosody in Speech Recognition and Understanding (pp 35-40)
Gotoh Y & Renals S (2000) Sentence boundary detection in broadcast speech transcripts. ISCA ASR Workshop. Paris
Gotoh Y & Renals S (2000) Variable word rate n-grams. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI (pp 1591-1594)
Renals S & Gotoh Y (1999) Integrated transcription and identification of named entities in broadcast speech. Eurospeech. Budapest
Gotoh Y & Renals S (1999) Statistical annotation of named entities in spoken audio. ESCA Workshop: Accessing Information in Spoken Audio. Cambridge
Gotoh Y, Renals S & Williams G (1999) Named entity tagged language models. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI (pp 513-516)
Gotoh Y & Renals S (1997) Document space models using latent semantic analysis. Eurospeech. Rhodes
Adcock J, Gotoh Y, Mashao D & Silverman HF (1996) Microphone-array speech recognition via incremental MAP training. ICASSP. Atlanta
Gotoh Y & Silverman HF (1996) Incremental ML estimation of HMM parameters for efficient training. ICASSP. Atlanta
Gotoh Y, Hochberg MM, Mashao D & Silverman HF (1995) Incremental MAP estimation of HMMs for efficient training and improved performance. ICASSP. Detroit
Gotoh Y, Hochberg MM & Silverman HF (1994) Using MAP estimated parameters to improve HMM speech recognition performance. ICASSP. Adelaide
Clarke J, Gotoh Y & Goetze S () Speaker Embedding Informed Audiovisual Active Speaker Detection for Egocentric Recordings. Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing / sponsored by the Institute of Electrical and Electronics Engineers Signal Processing Society. ICASSP (Conference)
Clarke J, Gotoh Y & Goetze S () Improving audiovisual active speaker detection in egocentric recordings with the data-efficient image transformer. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2023). Taipei, Taiwan, 16 December 2023 - 16 December 2023.
Alrashidi A, Abhayaratne C, Cudd P & Gotoh Y () Exploration of verbal descriptions and dynamic indoors environments for people with sight loss. Proceedings of ACM CHI 2023
Khan M, Al Harbi N & Gotoh Y () Natural language descriptions for video streams. V&L Net Workshop. Sheffield, December 2012.
Al Ghamdi M, Zhang L & Gotoh Y () Spatio-temporal SIFT and its application to human action classification. ECCV - VECTaR workshop. Firenze, October 2012.
Al Ghamdi M, Al Harbi N & Gotoh Y () Spatio-temporal video representation with locality-constrained linear coding. ECCV - ARTEMIS workshop. Firenze, October 2012.
Khan M, Zhang L & Gotoh Y () Generating coherent natural language annotations for video streams. ICIP. Orlando, September 2012.
Khan M & Gotoh Y () Natural language descriptions of visual scenes: corpus generation and analysis. EACL workshop. Avignon, April 2012.
Khan M & Gotoh Y () Describing video contents in natural language. EACL workshop. Avignon, April 2012.
Kolluru B & Gotoh Y () . Interspeech 2007

Working papers

Urban J, Hilaire X, Hopfgartner F, Villa R, Jose JM, Chantamunee S & Gotoh Y (2006) Glasgow University at TRECVID 2006. TRECVID 2006 - Text REtrieval Conference TRECVid Workshop, 363-367.

Grants

Research Grants

Visual Understanding for Fake Imagery Detect, Innovate UK, 09/2021 - 03/2024, £218,226, as Co-PI
Multimedia Analysis for Unsupervised Dubbing In Entertainment (MAUDIE), Innovate UK, 04/2018 - 03/2021, £393,115, as Co-PI

S3L: Statistical Summarization of Spoken Language, EPSRC, 12/2001 - 09/2005, £284,248, as Co-PI

Professional activities and memberships: Member of the Speech and Hearing (SpandH) research group
