Toshiba’s Cambridge Research Laboratory (CRL) is excited to announce its acceleration of cutting-edge research into Embodied AI, a technology that combines physical presence with cooperative intelligence. This move reflects CRL’s commitment to advancing innovative research in the field of sustainable and human-centric AI. Our latest results are presented in two papers at the high-impact computer science conference, CVPR (Conference on Computer Vision and Pattern Recognition).
In today’s rapidly evolving landscape, AI is becoming the cornerstone of technological innovation. Conversational agents and virtual assistants are now commonplace, yet AI has not been brought effectively into physical domains or into every industry: fields such as logistics, maintenance and manufacturing cannot be fully addressed in cyberspace or with software alone. Embodied AI, for example, has an important role to play in reshaping retail, an industry defined by interaction in dynamic environments with ever-changing product offerings and customer demands.
CRL is at the forefront of research into Embodied AI, fuelled by a £15 million investment from Toshiba in the field over the next five years. In line with this vision, Toshiba is excited to announce that CRL’s innovative core technologies will soon enhance Toshiba’s existing AI catalogue. Our first industrial prototype of Embodied AI is planned for presentation in 2027, bringing us closer to a new era of intelligent collaboration between humans and machines.
The Essence of Embodied AI
CRL’s renewed focus on Embodied AI reflects Toshiba’s ongoing commitment to improving the world through innovation. Building on the results of the former Computer Vision Group and Speech Technology Group in 3D perception and human interaction, CRL’s new Vision & Learning Group (VLG) and Language & Interaction Group (LIG) will drive advances towards key objectives:
- Fast Adaptation: CRL’s research will explore methods for enabling rapid adaptation of AI systems to new environments. By interacting with both humans and the environment, these systems can be deployed with minimal effort and cost.
- Continuous Learning: Leveraging past experiences and multiple deployments, CRL’s technology will generalise knowledge into “common sense.” This continuous learning process enhances functionality and ensures ever-improving AI technologies across diverse scenarios.
CRL’s shift to Embodied AI aligns with Toshiba’s strategic plan for software-defined services. As part of Toshiba’s broader digital transformation, CRL envisions AI that can adapt to new hardware with minimal effort and facilitate the completion of long-horizon tasks by combining the collaborative strengths of humans and machines.
Toshiba CRL’s Technical Presentations at CVPR 2024
As part of its latest research achievements, Toshiba’s Vision & Learning Group (VLG) is set to present two papers on Embodied AI at CVPR, the largest and one of the most influential international conferences in the field. These papers address fundamental technologies for two core challenges of Embodied AI: simplified interaction and fast adaptation.
1. Simplified Interaction: Innovative Pose Estimation through Natural Language
Traditionally, setting up robotic systems has required expert knowledge; VLG’s technology aims to simplify this process and make it accessible to a wider audience. VLG’s Dialogue-Based Localization system pioneers the combination of natural-language interaction with geometric computer vision tasks. By reasoning about possible robot poses within novel environments, the system iteratively refines its pose estimates over the course of a dialogue. Key features include:
- Natural Language Reasoning: Our system leverages state-of-the-art machine learning models trained on very large language and vision datasets (so-called foundation models) to estimate poses from textual input, as sketched below.
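To make the idea concrete, here is a minimal, hypothetical sketch of an iterative, dialogue-based localization loop in Python. The candidate-pose representation, the `score_pose_against_answer` stand-in for a vision-and-language foundation model, and the question strategy are illustrative assumptions, not the published method.

```python
# Hypothetical sketch: iterative pose refinement driven by a dialogue.
# `score_pose_against_answer` stands in for a vision-and-language
# foundation model; here it returns a dummy score so the sketch runs.
from dataclasses import dataclass
import random

@dataclass
class Pose:
    x: float
    y: float
    heading_deg: float

def ask_user(question: str) -> str:
    """The dialogue channel, e.g. a chat interface with the operator."""
    return input(f"{question}\n> ")

def score_pose_against_answer(pose: Pose, answer: str) -> float:
    """Stand-in for a foundation model scoring how consistent a candidate
    pose is with the user's textual description (dummy value here)."""
    return 1.0

def dialogue_localization(n_candidates: int = 500, n_rounds: int = 5) -> Pose:
    # Start from candidate poses spread over the known map.
    candidates = [Pose(random.uniform(0, 10), random.uniform(0, 10),
                       random.uniform(0, 360)) for _ in range(n_candidates)]
    weights = [1.0 / n_candidates] * n_candidates

    for _ in range(n_rounds):
        # Ask a question intended to disambiguate the remaining candidates.
        answer = ask_user("What can you see around you?")
        # Re-weight each candidate by how well it explains the answer.
        weights = [w * score_pose_against_answer(p, answer)
                   for p, w in zip(candidates, weights)]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]

    # Return the most plausible pose after the dialogue.
    return max(zip(candidates, weights), key=lambda cw: cw[1])[0]
```

In this sketch only the textual answers are used to update the pose belief, which is consistent with the privacy point made below.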
Chao Zhang, VLG’s expert on multi-modal foundation models, emphasises: “Not only does this present a world first in combining a vision and language foundation model in an iterative setting; looking at it from a customer perspective, our technology is also privacy-preserving, as no sensitive image data is required for the localization task.”
2. Fast Adaptation: Introducing ReCoRe, an Efficient Training Framework for World Models
In our ever-changing world, robots must quickly adapt to new environments and tasks. VLG’s approach ensures efficient learning and generalization across diverse scenarios. ReCoRe (Regularized Contrastive Representation Learning) guides the training of world models in autonomous systems. These models represent a simplified internal environment abstraction, capturing essential aspects without unnecessary complexity. Our approach:
- Guided Learning: By incorporating task-specific auxiliary objectives based on expert knowledge, our model learns faster and more efficiently, with fewer samples and reduced computation, as illustrated in the sketch below.
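As a rough illustration of how such an objective could be composed, the following PyTorch sketch combines a latent-dynamics (world-model) loss with a contrastive regularizer and an expert-defined auxiliary head. The module sizes, loss weights and the InfoNCE regularizer are assumptions for illustration, not the published ReCoRe architecture.

```python
# Illustrative sketch in the spirit of ReCoRe (Regularized Contrastive
# Representation Learning): a contrastive regularizer and an auxiliary
# task guide the world-model encoder. All sizes and weights are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldModelSketch(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=32, action_dim=4, aux_dim=3):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)                   # observation -> latent state
        self.dynamics = nn.Linear(latent_dim + action_dim, latent_dim)  # latent transition model
        self.aux_head = nn.Linear(latent_dim, aux_dim)                  # expert-defined auxiliary task

    def forward(self, obs, action):
        z = self.encoder(obs)
        z_next_pred = self.dynamics(torch.cat([z, action], dim=-1))
        return z, z_next_pred, self.aux_head(z)

def info_nce(z_anchor, z_positive, temperature=0.1):
    """Standard InfoNCE loss: positives are augmented views of the same
    observation, other batch items act as negatives."""
    z_anchor = F.normalize(z_anchor, dim=-1)
    z_positive = F.normalize(z_positive, dim=-1)
    logits = z_anchor @ z_positive.t() / temperature
    labels = torch.arange(z_anchor.size(0))
    return F.cross_entropy(logits, labels)

def training_step(model, obs, obs_aug, action, obs_next, aux_target,
                  w_contrast=1.0, w_aux=0.5):
    z, z_next_pred, aux_pred = model(obs, action)
    z_next = model.encoder(obs_next)

    dynamics_loss = F.mse_loss(z_next_pred, z_next.detach())   # world-model prediction
    contrast_loss = info_nce(z, model.encoder(obs_aug))        # contrastive regularizer
    aux_loss = F.mse_loss(aux_pred, aux_target)                # expert-informed auxiliary task

    return dynamics_loss + w_contrast * contrast_loss + w_aux * aux_loss
```

The auxiliary head is where expert knowledge enters in this sketch: its targets encode quantities an expert considers task-relevant, so the encoder is steered towards signals that matter for the downstream task.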
This technology will be presented by Rudra Poudel, VLG’s lead scientist on World Models for Reinforcement Learning. Commenting on the results, he says: “World Models compress noisy sensor input, emphasizing task-relevant signals. They let robots ‘imagine’ future outcomes and choose optimal actions. Our ReCoRe framework leads in efficient world-model learning for reinforcement learning and domain adaptation.”