University of Cambridge - machine vision /taxonomy/subjects/machine-vision en Robot ‘chef’ learns to recreate recipes from watching food videos /research/news/robot-chef-learns-to-recreate-recipes-from-watching-food-videos <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/untitled-3_1.jpg?itok=RV53FI1P" alt="Robot arm reaching for a piece of broccoli" title="Credit: None" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>The researchers, from the University of Cambridge, programmed their robotic chef with a ‘cookbook’ of eight simple salad recipes. After watching a video of a human demonstrating one of the recipes, the robot was able to identify which recipe was being prepared and make it.</p>&#13; &#13; <p>In addition, the videos helped the robot incrementally add to its cookbook. At the end of the experiment, the robot came up with a ninth recipe on its own. Their <a href="https://ieeexplore.ieee.org/document/10124218">results</a>, reported in the journal <em>IEEE Access</em>, demonstrate how video content can be a valuable and rich source of data for automated food production, and could enable easier and cheaper deployment of robot chefs.</p>&#13; &#13; <p>Robotic chefs have been featured in science fiction for decades, but in reality, cooking is a challenging problem for a robot. 
Several commercial companies have built prototype robot chefs, although none of these are currently commercially available, and they lag well behind their human counterparts in terms of skill.</p>&#13; &#13; <p>Human cooks can learn new recipes through observation, whether that’s watching another person cook or watching a video on YouTube, but programming a robot to make a range of dishes is costly and time-consuming.</p>&#13; &#13; <p>“We wanted to see whether we could train a robot chef to learn in the same incremental way that humans can – by identifying the ingredients and how they go together in the dish,” said Grzegorz Sochacki from Cambridge’s Department of Engineering, the paper’s first author.</p>&#13; &#13; <p>Sochacki, a PhD candidate in Professor Fumiya Iida’s <a href="https://birlab.org/">Bio-Inspired Robotics Laboratory</a>, and his colleagues devised eight simple salad recipes and filmed themselves making them. They then used a publicly available neural network to train their robot chef. The neural network had already been programmed to identify a range of different objects, including the fruits and vegetables used in the eight salad recipes (broccoli, carrot, apple, banana and orange).</p>&#13; &#13; <p>Using computer vision techniques, the robot analysed each frame of video and was able to identify the different objects and features, such as a knife and the ingredients, as well as the human demonstrator’s arms, hands and face. Both the recipes and the videos were converted to vectors, and the robot performed mathematical operations on these vectors to determine the similarity between a demonstration and a recipe.</p>&#13; &#13; <p>By correctly identifying the ingredients and the actions of the human chef, the robot could determine which of the recipes was being prepared. 
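</p>&#13; &#13; <p>The vector comparison described above can be pictured in miniature. This is an illustrative reduction of the idea, not the paper's implementation: recipes and demonstrations become ingredient-count vectors, and a standard similarity measure (cosine similarity here) picks the closest cookbook entry. The recipe names and counts below are invented for the example.</p>&#13;

```python
import math

# Illustrative sketch only: the article says recipes and videos were
# converted to vectors and compared mathematically. Cosine similarity is
# a standard choice for this kind of matching; the recipes and counts
# below are invented example values.
INGREDIENTS = ["broccoli", "carrot", "apple", "banana", "orange"]

def to_vector(counts):
    """Map an ingredient -> count dict onto a fixed-order vector."""
    return [counts.get(name, 0) for name in INGREDIENTS]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def closest_recipe(demo_counts, cookbook):
    """Return the cookbook entry most similar to the observed demonstration."""
    demo = to_vector(demo_counts)
    return max(cookbook, key=lambda name: cosine_similarity(to_vector(cookbook[name]), demo))

cookbook = {
    "carrot and apple salad": {"carrot": 2, "apple": 2},
    "fruit salad": {"apple": 1, "banana": 1, "orange": 1},
}
# A double portion (3 of each instead of 2) still matches the same recipe,
# because cosine similarity ignores overall scale.
print(closest_recipe({"carrot": 3, "apple": 3}, cookbook))
```
&#13; <p>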
The robot could infer that if the human demonstrator was holding a knife in one hand and a carrot in the other, the carrot would then get chopped up.</p>&#13; &#13; <p>Of the 16 videos it watched, the robot recognised the correct recipe 93% of the time, even though it only detected 83% of the human chef’s actions. The robot was also able to recognise that slight variations in a recipe, such as a double portion or normal human error, were variations rather than a new recipe. The robot also correctly recognised the demonstration of a new, ninth salad, added it to its cookbook and made it.</p>&#13; &#13; <p>“It’s amazing how much nuance the robot was able to detect,” said Sochacki. “These recipes aren’t complex – they’re essentially chopped fruits and vegetables, but it was really effective at recognising, for example, that two chopped apples and two chopped carrots is the same recipe as three chopped apples and three chopped carrots.”</p>&#13; &#13; <p>The videos used to train the robot chef are not like the food videos made by some social media influencers, which are full of fast cuts and visual effects, and quickly move back and forth between the person preparing the food and the dish they’re preparing. For example, the robot would struggle to identify a carrot if the human demonstrator had their hand wrapped around it – for the robot to identify the carrot, the human demonstrator had to hold up the carrot so that the robot could see the whole vegetable.</p>&#13; &#13; <p>“Our robot isn’t interested in the sorts of food videos that go viral on social media – they’re simply too hard to follow,” said Sochacki. 
“But as these robot chefs get better and faster at identifying ingredients in food videos, they might be able to use sites like YouTube to learn a whole range of recipes.”</p>&#13; &#13; <p>The research was supported in part by Beko plc and the Engineering and Physical Sciences Research Council (EPSRC), part of UK Research and Innovation (UKRI).</p>&#13; &#13; <p><em><strong>Reference:</strong><br />&#13; Grzegorz Sochacki et al. ‘<a href="https://ieeexplore.ieee.org/document/10124218">Recognition of Human Chef’s Intentions for Incremental Learning of Cookbook by Robotic Salad Chef</a>.’ IEEE Access (2023). DOI: 10.1109/ACCESS.2023.3276234</em></p>&#13; </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>Researchers have trained a robotic ‘chef’ to watch and learn from cooking videos, and recreate the dish itself.</p>&#13; </p></div></div></div><div class="field field-name-field-content-quote field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even">We wanted to see whether we could train a robot chef to learn in the same incremental way that humans can – by identifying the ingredients and how they go together in the dish</div></div></div><div class="field field-name-field-content-quote-name field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Greg Sochacki</div></div></div><div class="field field-name-field-media field-type-file field-label-hidden"><div class="field-items"><div class="field-item even"><div id="file-208991" class="file file-video file-video-youtube"> <h2 class="element-invisible"><a href="/file/robot-chef-learns-to-recreate-recipes-from-watching-food-videos">Robot ‘chef’ learns to recreate recipes from watching food videos</a></h2> <div class="content"> <div class="cam-video-container media-youtube-video media-youtube-1 "> <iframe 
class="media-youtube-player" src="https://www.youtube-nocookie.com/embed/nx3k4XA3x4Q?wmode=opaque&controls=1&rel=0&autohide=0" frameborder="0" allowfullscreen></iframe> </div> </div> </div> </div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" rel="license"><img alt="Creative Commons License." src="/sites/www.cam.ac.uk/files/inner-images/cc-by-nc-sa-4-license.png" style="border-width: 0px; width: 88px; height: 31px;" /></a><br />&#13; The text in this work is licensed under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. Images, including our videos, are Copyright © University of Cambridge and licensors/contributors as identified.  All rights reserved. We make our image and video content available in a number of ways – as here, on our <a href="/">main website</a> under its <a href="/about-this-site/terms-and-conditions">Terms and conditions</a>, and on a <a href="/about-this-site/social-media/connect-with-us">range of channels including social media</a> that permit your use and sharing of our content under their respective Terms.</p>&#13; </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Mon, 05 Jun 2023 01:00:00 +0000 sc604 239811 at Phone-based measurements provide fast, accurate information about the health of forests /research/news/phone-based-measurements-provide-fast-accurate-information-about-the-health-of-forests <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" 
src="/sites/default/files/styles/content-580x288/public/news/research/news/gettyimages-1329369484-crop.jpg?itok=82uzxanr" alt="Treetops seen from a low angle" title="Treetops seen from a low angle, Credit: Baac3nes via Getty Images" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>The researchers, from the University of Cambridge, developed the algorithm, which gives an accurate measurement of tree diameter, an important measurement used by scientists to monitor forest health and levels of carbon sequestration.</p>&#13; &#13; <p>The algorithm uses low-cost, low-resolution LiDAR sensors that are incorporated into many mobile phones, and provides results that are just as accurate as, but much faster than, manual measurement techniques. The <a href="https://www.mdpi.com/2072-4292/15/3/772">results</a> are reported in the journal <em>Remote Sensing</em>.</p>&#13; &#13; <p>The primary manual measurement used in forest ecology is tree diameter at chest height. These measurements are used to make determinations about the health of trees and the wider forest ecosystem, as well as how much carbon is being sequestered.</p>&#13; &#13; <p>While this method is reliable, since the measurements are taken from the ground, tree by tree, the method is time-consuming. In addition, human error can lead to variations in measurements.</p>&#13; &#13; <p>“When you’re trying to figure out how much carbon a forest is sequestering, these ground-based measurements are hugely valuable, but also time-consuming,” said first author Amelia Holcomb from Cambridge’s <a href="https://www.cst.cam.ac.uk/">Department of Computer Science and Technology</a>. 
“We wanted to know whether we could automate this process.”</p>&#13; &#13; <p>Some aspects of forest measurement can be carried out using expensive special-purpose LiDAR sensors, but Holcomb and her colleagues wanted to determine whether these measurements could be taken using cheaper, lower-resolution sensors, of the type that are used in some mobile phones for augmented reality applications.</p>&#13; &#13; <p>Other researchers have carried out forest measurement studies using this type of sensor; however, these studies have focused on highly-managed forests where trees are straight, evenly spaced and undergrowth is regularly cleared. Holcomb and her colleagues wanted to test whether these sensors could return accurate results for non-managed forests quickly, automatically, and in a single image.</p>&#13; &#13; <p>“We wanted to develop an algorithm that could be used in more natural forests, and that could deal with things like low-hanging branches, or trees with natural irregularities,” said Holcomb.</p>&#13; &#13; <p>The researchers designed an algorithm that uses a smartphone LiDAR sensor to estimate trunk diameter automatically from a single image in realistic field conditions. The algorithm was incorporated into a custom-built app for an Android smartphone and is able to return results in near real time.</p>&#13; &#13; <p>To develop the algorithm, the researchers first collected their own dataset by measuring trees manually and taking pictures. Using image processing and computer vision techniques, they were able to train the algorithm to differentiate trunks from large branches, determine which direction trees were leaning in, and extract other information that could help it refine its measurements.</p>&#13; &#13; <p>The researchers tested the app in three different forests – one each in the UK, US and Canada – in spring, summer and autumn. 
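</p>&#13; &#13; <p>As a rough illustration of why a single depth image can be enough, consider simplified pinhole-camera geometry (this is a sketch of the general principle, not the published algorithm): a trunk whose edges appear w pixels apart at depth z metres subtends roughly z * w / f metres, where f is the camera's focal length in pixels. All numbers below are invented example values.</p>&#13;

```python
# Simplified sketch, not the paper's algorithm: a first-order trunk-diameter
# estimate from one LiDAR depth reading, using the pinhole camera model.
# The focal length and pixel measurements are invented example values.
def trunk_diameter_m(pixel_width, depth_m, focal_px):
    """Approximate width (metres) subtended by pixel_width pixels at depth_m."""
    return depth_m * pixel_width / focal_px

# Example: trunk edges 120 px apart, 2.5 m away, 500 px focal length.
print(trunk_diameter_m(120, 2.5, 500))  # 0.6
```
&#13; <p>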
The app was able to detect 100% of tree trunks and had a mean error rate of 8%, which is comparable to the error rate when measuring by hand. However, the app sped up the process significantly and was about four and a half times faster than measuring trees manually.</p>&#13; &#13; <p>“I was surprised the app works as well as it does,” said Holcomb. “Sometimes I like to challenge it with a particularly crowded bit of forest, or a particularly oddly-shaped tree, and I think there’s no way it will get it right, but it does.”</p>&#13; &#13; <p>Since their measurement tool requires no specialised training and uses sensors that are already incorporated into an increasing number of phones, the researchers say that it could be an accurate, low-cost tool for forest measurement, even in complex forest conditions.</p>&#13; &#13; <p>The researchers plan to make their app publicly available for Android phones later this spring.</p>&#13; &#13; <p>The research was supported in part by the David Cheriton Graduate Scholarship, the Canadian National Research Council, and the Harding Distinguished Postgraduate Scholarship.</p>&#13; &#13; <p><em><strong>Reference:</strong><br />&#13; Amelia Holcomb, Linzhe Tong, and Srinivasan Keshav. ‘<a href="https://www.mdpi.com/2072-4292/15/3/772">Robust Single-Image Tree Diameter Estimation with Mobile Phones</a>.’ Remote Sensing (2023). 
DOI: 10.3390/rs15030772</em></p>&#13; </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>Researchers have developed an algorithm that uses computer vision techniques to accurately measure trees almost five times faster than traditional, manual methods.</p>&#13; </p></div></div></div><div class="field field-name-field-content-quote field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even">Ground-based measurements are hugely valuable, but also time-consuming. We wanted to know whether we could automate this process.</div></div></div><div class="field field-name-field-content-quote-name field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Amelia Holcomb</div></div></div><div class="field field-name-field-image-credit field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/" target="_blank">Baac3nes via Getty Images</a></div></div></div><div class="field field-name-field-image-desctiprion field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Treetops seen from a low angle</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png" style="border-width:0" /></a><br />&#13; The text in this work is licensed under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>. Images, including our videos, are Copyright © University of Cambridge and licensors/contributors as identified.  All rights reserved. 
We make our image and video content available in a number of ways – as here, on our <a href="/">main website</a> under its <a href="/about-this-site/terms-and-conditions">Terms and conditions</a>, and on a <a href="/about-this-site/social-media/connect-with-us">range of channels including social media</a> that permit your use and sharing of our content under their respective Terms.</p>&#13; </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Tue, 07 Mar 2023 01:21:40 +0000 sc604 237431 at Researchers design AI system to assess pain levels in sheep /research/news/researchers-design-ai-system-to-assess-pain-levels-in-sheep <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/sheep-crop.jpg?itok=Ocg4TBVK" alt="Sheep" title="Sheep, Credit: Marwa Mahmoud" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>The researchers have developed an AI system which uses five different facial expressions to recognise whether a sheep is in pain, and estimate the severity of that pain. The results could be used to improve sheep welfare, and could be applied to other types of animals, such as rodents used in animal research, rabbits or horses.</p> <p>Building on earlier work which teaches computers to recognise emotions and expressions in human faces, the system is able to detect the distinct parts of a sheep’s face and compare them with a standardised measurement tool developed by veterinarians for diagnosing pain. 
Their <a href="http://www.cl.cam.ac.uk/~pr10/publications/fg17.pdf" target="_blank">results</a> will be presented today (1 June) at the 12th IEEE International Conference on Automatic Face and Gesture Recognition in Washington, DC.</p> <p>Severe pain in sheep is associated with conditions such as foot rot, an extremely painful and contagious condition which causes the foot to rot away; or mastitis, an inflammation of the udder in ewes caused by injury or bacterial infection. Both of these conditions are common in large flocks, and early detection will lead to faster treatment and pain relief. Reliable and efficient pain assessment would also help with early diagnosis.</p> <p>As with most animals, facial expressions in sheep can be used to assess pain. In 2016, Dr Krista McLennan, a former postdoctoral researcher at the University of Cambridge who is now a lecturer in animal behaviour at the University of Chester, developed the Sheep Pain Facial Expression Scale (SPFES). The SPFES is a tool to measure pain levels based on facial expressions of sheep, and has been shown to recognise pain with high accuracy. However, training people to use the tool can be time-consuming and individual bias can lead to inconsistent scores.</p> <p>In order to make the process of pain detection more accurate, the Cambridge researchers behind the current study used the SPFES as the basis of an AI system which uses machine learning techniques to estimate pain levels in sheep. Professor Peter Robinson, who led the research, normally focuses on teaching computers to recognise emotions in human faces, but a meeting with Dr McLennan got him interested in exploring whether a similar system could be developed for animals.</p> <p>“There’s been much more study over the years with people,” said Robinson, of Cambridge’s Computer Laboratory. 
“But a lot of the earlier work on the faces of animals was actually done by Darwin, who argued that all humans and many animals show emotion through remarkably similar behaviours, so we thought there would likely be crossover between animals and our work in human faces.”</p> <p>According to the SPFES, when a sheep is in pain, there are five main things which happen to their faces: their eyes narrow, their cheeks tighten, their ears fold forwards, their lips pull down and back, and their nostrils change from a U shape to a V shape. The SPFES then ranks these characteristics on a scale of one to 10 to measure the severity of the pain.</p> <p>“The interesting part is that you can see a clear analogy between these actions in the sheep’s faces and similar facial actions in humans when they are in pain – there is a similarity in terms of the muscles in their faces and in our faces,” said co-author Dr Marwa Mahmoud, a postdoctoral researcher in Robinson’s group. “However, it is difficult to ‘normalise’ a sheep’s face in a machine learning model. A sheep’s face is totally different in profile than looking straight on, and you can’t really tell a sheep how to pose.”</p> <p><img alt="" src="/sites/www.cam.ac.uk/files/inner-images/normalisation-crop.jpg" style="width: 590px; height: 244px; float: left;" /></p> <p>To train the model, the Cambridge researchers used a small dataset consisting of approximately 500 photographs of sheep, which had been gathered by veterinarians in the course of providing treatment. Yiting Lu, a Cambridge undergraduate in Engineering and co-author on the paper, trained the model by labelling the different parts of the sheep’s faces on each photograph and ranking their pain levels according to SPFES.</p> <p>Early tests of the model showed that it was able to estimate pain levels with about 80% accuracy, which means that the system is learning. 
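</p> <p>The scoring step can be pictured as follows. This is a deliberately naive sketch, not the trained model: each of the five facial actions is given an intensity, and the intensities are combined into a single severity estimate. The 0-to-2 intensity scale and the plain sum used here are invented conventions for illustration; the real system learns this mapping from the labelled photographs.</p>

```python
# Naive illustration of combining the five SPFES facial actions into one
# pain estimate. Each action is scored 0 (absent) to 2 (pronounced), so a
# plain sum lands on a 0-10 range like the SPFES scale; the real model
# learns this mapping rather than summing.
ACTIONS = ["eye_narrowing", "cheek_tightening", "ears_folded_forward",
           "lips_pulled_down", "nostrils_v_shaped"]

def pain_score(intensities):
    """Sum per-action intensities (each 0..2) into a 0..10 severity score."""
    return sum(intensities.get(action, 0) for action in ACTIONS)

observed = {"eye_narrowing": 2, "cheek_tightening": 1,
            "ears_folded_forward": 2, "lips_pulled_down": 1,
            "nostrils_v_shaped": 2}
print(pain_score(observed))  # 8
```
<p>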
While the results with still photographs have been successful, in order to make the system more robust, they require much larger datasets.</p> <p>The next plans for the system are to train it to detect and recognise sheep faces from moving images, and to train it to work when the sheep is in profile or not looking directly at the camera. Robinson says that if they are able to train the system well enough, a camera could be positioned at a water trough or other place where sheep congregate, and the system would be able to recognise any sheep which were in pain. The farmer would then be able to retrieve the affected sheep from the field and get it the necessary medical attention.</p> <p>“I do a lot of walking in the countryside, and after working on this project, I now often find myself stopping to talk to the sheep and make sure they’re happy,” said Robinson.</p> <p><strong><em>Reference</em></strong><br /> <em>Yiting Lu, Marwa Mahmoud and Peter Robinson. ‘<a href="http://www.cl.cam.ac.uk/~pr10/publications/fg17.pdf">Estimating sheep pain level using facial action unit detection</a>.’ Paper presented to the </em><em>IEEE International Conference on Automatic Face and Gesture Recognition, </em><em>Washington, DC. <em>30 May – 3 June, 2017. </em><a href="http://www.fg2017.org/">http://www.fg2017.org/</a>. </em></p> <p><em>Inset image: Left: Localised facial landmarks; Right: Normalised sheep face marked with feature bounding boxes. </em></p> </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>An artificial intelligence system designed by researchers at the University of Cambridge is able to detect pain levels in sheep, which could aid in early diagnosis and treatment of common, but painful, conditions in animals. 
</p> </p></div></div></div><div class="field field-name-field-content-quote field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even">You can see a clear analogy between these actions in the sheep’s faces and similar facial actions in humans when they are in pain.</div></div></div><div class="field field-name-field-content-quote-name field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Marwa Mahmoud</div></div></div><div class="field field-name-field-image-credit field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/" target="_blank">Marwa Mahmoud</a></div></div></div><div class="field field-name-field-image-desctiprion field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Sheep</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png" style="border-width:0" /></a><br /> The text in this work is licensed under a <a href="http://creativecommons.org/licenses/by/4.0/" rel="license">Creative Commons Attribution 4.0 International License</a>. 
For image use please see separate credits above.</p> </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Wed, 31 May 2017 23:02:29 +0000 sc604 189242 at Teaching machines to see: new smartphone-based system could accelerate development of driverless cars /research/news/teaching-machines-to-see-new-smartphone-based-system-could-accelerate-development-of-driverless-cars <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/segnet-crop.png?itok=4I4BnufE" alt="SegNet demonstration" title="SegNet demonstration, Credit: Alex Kendall" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>Two newly-developed systems for driverless cars can identify a user’s location and orientation in places where GPS does not function, and identify the various components of a road scene in real time on a regular camera or smartphone, performing the same job as sensors costing tens of thousands of pounds.</p>&#13; &#13; <p>The separate but complementary systems have been designed by researchers from the University of Cambridge and demonstrations are freely available online. Although the systems cannot currently control a driverless car, the ability to make a machine ‘see’ and accurately identify where it is and what it’s looking at is a vital part of developing autonomous vehicles and robotics.</p>&#13; &#13; <p>The first system, called SegNet, can take an image of a street scene it hasn’t seen before and classify it, sorting objects into 12 different categories – such as roads, street signs, pedestrians, buildings and cyclists – in real time. 
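</p>&#13; &#13; <p>The per-pixel labelling SegNet performs can be illustrated in miniature (an invented toy example, not the network itself): each class produces a score at every pixel, the highest-scoring class wins that pixel, and accuracy is the fraction of pixels labelled correctly, the same kind of figure as the pixel-accuracy numbers quoted for SegNet.</p>&#13;

```python
# Toy illustration of semantic segmentation output: per-class score maps,
# an argmax over classes at each pixel, and a pixel-accuracy metric.
# The classes and scores below are invented example values.
CLASSES = ["road", "pavement", "building"]

def label_pixels(score_maps):
    """score_maps[c][i] is class c's score at pixel i; return argmax labels."""
    n_pixels = len(score_maps[CLASSES[0]])
    return [max(CLASSES, key=lambda c: score_maps[c][i]) for i in range(n_pixels)]

def pixel_accuracy(predicted, ground_truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(predicted, ground_truth)) / len(ground_truth)

scores = {"road":     [0.90, 0.20, 0.10, 0.40],
          "pavement": [0.05, 0.70, 0.20, 0.50],
          "building": [0.05, 0.10, 0.70, 0.10]}
labels = label_pixels(scores)
print(labels)
print(pixel_accuracy(labels, ["road", "pavement", "building", "road"]))  # 0.75
```
&#13; <p>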
It can deal with light, shadow and night-time environments, and currently labels more than 90% of pixels correctly. Previous systems using expensive laser- or radar-based sensors have not been able to reach this level of accuracy while operating in real time.</p>&#13; &#13; <p>Users can visit the SegNet <a href="https://arxiv.org/abs/1511.00561/">website</a> and upload an image or search for any city or town in the world, and the system will label all the components of the road scene. The system has been successfully tested on both city roads and motorways.</p>&#13; &#13; <p>For the driverless cars currently in development, radar and laser-based sensors are expensive – in fact, they often cost more than the car itself. In contrast with expensive sensors, which recognise objects through a mixture of radar and LIDAR (a remote sensing technology), SegNet learns by example – it was ‘trained’ by an industrious group of Cambridge undergraduate students, who manually labelled every pixel in each of 5000 images, with each image taking about 30 minutes to complete. Once the labelling was finished, the researchers then took two days to ‘train’ the system before it was put into action.</p>&#13; &#13; <p>“It’s remarkably good at recognising things in an image, because it’s had so much practice,” said Alex Kendall, a PhD student in the Department of Engineering. 
“However, there are a million knobs that we can turn to fine-tune the system so that it keeps getting better.”</p>&#13; &#13; <p>SegNet was primarily trained in highway and urban environments, so it still has some learning to do for rural, snowy or desert environments – although it has performed well in initial tests for these environments.</p>&#13; &#13; <p>The system is not yet at the point where it can be used to control a car or truck, but it could be used as a warning system, similar to the anti-collision technologies currently available on some passenger cars.</p>&#13; &#13; <p>“Vision is our most powerful sense and driverless cars will also need to see,” said Professor Roberto Cipolla, who led the research. “But teaching a machine to see is far more difficult than it sounds.”</p>&#13; &#13; <p>As children, we learn to recognise objects through example – if we’re shown a toy car several times, we learn to recognise both that specific car and other similar cars as the same type of object. But with a machine, it’s not as simple as showing it a single car and then having it be able to recognise all different types of cars. Machines today learn under supervision: sometimes through thousands of labelled examples.</p>&#13; &#13; <p>There are three key technological questions that must be answered to design autonomous vehicles: where am I, what’s around me and what do I do next. SegNet addresses the second question, while a separate but complementary system answers the first by using images to determine both precise location and orientation.</p>&#13; &#13; <p>The localisation system designed by Kendall and Cipolla runs on a similar architecture to SegNet, and is able to localise a user and determine their orientation from a single colour image in a busy urban scene. 
The system is far more accurate than GPS and works in places where GPS does not, such as indoors, in tunnels, or in cities where a reliable GPS signal is not available.</p>&#13; &#13; <p>It has been tested along a kilometre-long stretch of King’s Parade in central Cambridge, and it is able to determine both location and orientation within a few metres and a few degrees, which is far more accurate than GPS – a vital consideration for driverless cars. Users can try out the system for themselves <a href="https://www.repository.cam.ac.uk/handle/1810/251342/">here</a>.</p>&#13; &#13; <p>The localisation system uses the geometry of a scene to learn its precise location, and is able to determine, for example, whether it is looking at the east or west side of a building, even if the two sides appear identical.</p>&#13; &#13; <p>“Work in the field of artificial intelligence and robotics has really taken off in the past few years,” said Kendall. “But what’s cool about our group is that we’ve developed technology that uses deep learning to determine where you are and what’s around you – this is the first time this has been done using deep learning.”</p>&#13; &#13; <p>“In the short term, we’re more likely to see this sort of system on a domestic robot – such as a robotic vacuum cleaner, for instance,” said Cipolla. 
“It will take time before drivers can fully trust an autonomous car, but the more effective and accurate we can make these technologies, the closer we are to the widespread adoption of driverless cars and other types of autonomous robotics.”</p>&#13; &#13; <p>The researchers are presenting <a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Kendall_PoseNet_A_Convolutional_ICCV_2015_paper.pdf">details</a> of the two technologies at the International Conference on Computer Vision in Santiago, Chile.</p>&#13; </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>Two technologies which use deep learning techniques to help machines to see and recognise their location and surroundings could be used for the development of driverless cars and autonomous robotics – and can be used on a regular camera or smartphone. </p>&#13; </p></div></div></div><div class="field field-name-field-content-quote field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even">Vision is our most powerful sense and driverless cars will also need to see, but teaching a machine to see is far more difficult than it sounds.</div></div></div><div class="field field-name-field-content-quote-name field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Roberto Cipolla</div></div></div><div class="field field-name-field-media field-type-file field-label-hidden"><div class="field-items"><div class="field-item even"><div id="file-96282" class="file file-video file-video-youtube"> <h2 class="element-invisible"><a href="/file/96282">Teaching machines to see</a></h2> <div class="content"> <div class="cam-video-container media-youtube-video media-youtube-2 "> <iframe class="media-youtube-player" src="https://www.youtube-nocookie.com/embed/MxximR-1ln4?wmode=opaque&controls=1&rel=0&autohide=0" 
frameborder="0" allowfullscreen></iframe> </div> </div> </div> </div></div></div><div class="field field-name-field-image-credit field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/" target="_blank">Alex Kendall</a></div></div></div><div class="field field-name-field-image-desctiprion field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">SegNet demonstration</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png" style="border-width:0" /></a><br />&#13; The text in this work is licensed under a <a href="http://creativecommons.org/licenses/by/4.0/" rel="license">Creative Commons Attribution 4.0 International License</a>. For image use please see separate credits above.</p>&#13; </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Mon, 21 Dec 2015 06:34:09 +0000 sc604 164412 at