Commanding a drone in natural language is not only user-friendly but also opens the door for emerging language agents to control the drone. Large language models (LLMs) provide a previously impossible opportunity to automatically translate a task description in natural language into a program that the drone can execute. However, powerful LLMs and their vision counterparts are limited in three important ways. First, they are available only as cloud-based services, and sending images to the cloud raises privacy concerns. Second, they are expensive, with cost proportional to the request size. Finally, without expensive fine-tuning, existing LLMs are quite limited in their ability to write programs for specialized systems like drones.
In this paper, we present a system called TypeFly that tackles the above three problems using a combination of edge-based vision intelligence, novel programming language design, and prompt engineering. Instead of the familiar Python, TypeFly gets a cloud-based LLM service to write a program in a small, custom language called MiniSpec, based on task and scene descriptions in English. Such MiniSpec programs are not only succinct (and therefore efficient) but also able to consult the LLM during their execution using a special skill called query. Using a set of increasingly challenging drone tasks, we show that design choices made by TypeFly can reduce both the cost of LLM service and the task execution time by more than 2×. More importantly, query and prompt engineering techniques contributed by TypeFly significantly improve the chance of success of complex tasks.
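To illustrate the idea of a program consulting the LLM while it runs, here is a minimal Python sketch of a query-style skill. It is not MiniSpec and not TypeFly's implementation: `make_query_skill`, `fake_llm`, and `fake_scene` are hypothetical names, and the remote LLM is replaced by a toy local function so the example runs on its own.

```python
from typing import Callable, List


def make_query_skill(
    llm: Callable[[str], str],
    scene: Callable[[], List[str]],
) -> Callable[[str], str]:
    """Build a 'query' skill that asks the LLM a question about the current scene.

    Hypothetical helper: the scene is sent as a short text description,
    not as raw images.
    """
    def query(question: str) -> str:
        prompt = f"Scene: {', '.join(scene())}\nQuestion: {question}"
        return llm(prompt)
    return query


def fake_llm(prompt: str) -> str:
    # Toy rule standing in for a remote LLM call (assumption for this sketch).
    return "apple" if "apple" in prompt else "none"


def fake_scene() -> List[str]:
    # Stand-in for object labels produced by an on-board or edge vision model.
    return ["chair", "apple", "bottle"]


query = make_query_skill(fake_llm, fake_scene)
answer = query("Which visible object is edible?")
print(answer)
```

In this sketch only a text description of the scene reaches the model, which mirrors the motivation stated above: keeping images off the cloud and keeping requests small.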
This is the demo for the task: "Can you find something for me to eat? If you can, go for it and return. Otherwise, find and go to something drinkable." It tests TypeFly's ability to handle complex tasks. The drone is able to find the food and drink based on conditional statements.
This is the demo for the task: "If you can see more than one chair behind you, then turn and go to the one with books on it." It tests TypeFly's ability to handle conditional statements. The drone is able to check the number of chairs and find the chair with books on it.
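The control flow behind the chair task can be sketched in Python as follows. This is an illustration under stated assumptions, not the program TypeFly generates: `FakeDrone`, `chair_task`, and `fake_query` are hypothetical, the drone is simulated, and the sketch turns the drone around first so that it can count what is behind it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeDrone:
    """Hypothetical simulated drone; views maps a heading in degrees to detected objects."""
    views: Dict[int, List[str]]
    heading: int = 0
    log: List[str] = field(default_factory=list)

    def turn(self, degrees: int) -> None:
        self.heading = (self.heading + degrees) % 360
        self.log.append(f"turn {degrees}")

    def visible(self) -> List[str]:
        return self.views.get(self.heading, [])

    def go_to(self, obj: str) -> None:
        self.log.append(f"go_to {obj}")


def chair_task(drone: FakeDrone, query: Callable[[str, List[str]], str]) -> bool:
    """Return True if the drone found and approached the chair with books."""
    drone.turn(180)  # face backwards to inspect the scene behind the drone
    chairs = [obj for obj in drone.visible() if obj.startswith("chair")]
    if len(chairs) <= 1:
        return False  # the condition "more than one chair" does not hold
    # Counting is done locally; only the semantic question goes to the model.
    target = query("Which chair has books on it?", chairs)
    if target not in chairs:
        return False
    drone.go_to(target)
    return True


def fake_query(question: str, options: List[str]) -> str:
    # Toy stand-in for an LLM answer (assumption for this sketch).
    return "chair_2"


drone = FakeDrone(views={180: ["chair_1", "chair_2", "table"]})
done = chair_task(drone, fake_query)
print(done, drone.log)
```

The split shown here, with simple checks done in the program and only the open-ended question passed to the model, is one plausible reading of why a query-style skill helps with conditional tasks.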
@misc{chen2023typefly,
author = {Guojun Chen and Xiaojing Yu and Lin Zhong},
title = {TypeFly: Flying Drones with Large Language Model},
eprint = {2312.14950},
archivePrefix = {arXiv},
year = {2023},
}