Technology

Only One LLM Can Successfully Pilot a Drone

Hacker News · 7h ago
3 min read

Key Facts

  • SnapBench is a new benchmark designed to test large language models on their ability to pilot drones using visual data.
  • GPT-4o was the only model tested that successfully completed the drone flight challenge.
  • The benchmark highlights a significant gap between AI's reasoning capabilities and its ability to perform physical tasks.
  • These findings suggest that current LLMs are not yet ready for widespread use in autonomous robotics applications.

The Drone Challenge

A new benchmark has revealed a surprising limitation in current artificial intelligence: only one large language model demonstrated the ability to pilot a drone successfully. The results come from SnapBench, a new testing framework designed to evaluate how well AI systems can interpret visual data and execute physical tasks.

The benchmark was recently shared on Hacker News, sparking discussion about AI's readiness for robotics applications. While LLMs have shown impressive capabilities in text generation and reasoning, their performance in the physical world remains a significant hurdle. This latest test provides concrete evidence of that gap.

Inside SnapBench

SnapBench represents a new frontier in AI evaluation, moving beyond traditional text-based benchmarks to test real-world applications. The framework presents models with a specific challenge: interpret visual snapshots and issue commands to navigate a drone through a course. This requires a combination of visual understanding, spatial reasoning, and precise instruction generation.

The test is designed to be rigorous, simulating the kind of dynamic decision-making required for autonomous robotics. Unlike static problems, drone flight demands continuous adaptation to changing conditions. The benchmark results indicate that most current models fail to bridge the gap between abstract knowledge and practical execution.

Key aspects of the benchmark include (see the sketch after this list):

  • Real-time visual processing requirements
  • Complex spatial navigation tasks
  • Continuous command generation
  • Safety and precision constraints
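
To make the snapshot-to-command loop concrete, here is a minimal sketch of how a single benchmark step might be wired up. Everything in it is an assumption for illustration: SnapBench's actual interfaces are not described in the article, and `ask_model`, `DroneCommand`, and the command format are hypothetical placeholders.

```python
# Hypothetical sketch of a SnapBench-style control step (not the benchmark's
# real code): capture a snapshot, ask a vision-capable LLM for one command,
# parse it, and forward it to a simulated drone.
import base64
from dataclasses import dataclass


@dataclass
class DroneCommand:
    action: str       # e.g. "forward", "yaw_left", "ascend", "hover"
    magnitude: float  # metres or degrees, depending on the action


def parse_command(reply: str) -> DroneCommand:
    """Parse a reply such as 'forward 1.5' into a structured command."""
    action, value = reply.strip().split()
    return DroneCommand(action=action, magnitude=float(value))


def control_step(snapshot_jpeg: bytes, ask_model, drone) -> None:
    """One iteration of the loop: image in, command out.

    `ask_model` is any callable taking (prompt, image_b64) and returning text;
    `drone` is any object exposing execute(DroneCommand). Both are placeholders
    for whatever the real harness wires in.
    """
    image_b64 = base64.b64encode(snapshot_jpeg).decode("ascii")
    prompt = (
        "You are piloting a drone through a course. "
        "Reply with exactly one command in the form '<action> <magnitude>'."
    )
    reply = ask_model(prompt, image_b64)
    drone.execute(parse_command(reply))
```

Repeating a step like this against a fresh snapshot after every action is what makes the task dynamic rather than a one-shot question.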

"Apenas 1 LLM pode pilotar um drone"

— SnapBench findings

The Lone Success Story

Among all the models tested, GPT-4o emerged as the only successful candidate. Its ability to process visual inputs and generate precise flight commands set it apart from its competitors. The achievement underscores the model's advanced multimodal understanding and its potential for integration with robotics.

The fact that only a single model succeeded underlines the difficulty of the task. While many LLMs excel at language tasks, translating that ability into physical action demands a deeper level of understanding. GPT-4o's performance suggests it has made significant strides in this area, even as its solitary success shows how challenging the domain remains.

Only 1 LLM can pilot a drone

The stark simplicity of that statement reflects the current state of AI in robotics. Progress is being made, but the path toward general-purpose autonomous AI agents in the physical world is still in its early stages.

Implications for AI

The SnapBench results carry significant implications for the future of robotic AI. They suggest that simply scaling language models may not be enough to solve complex physical tasks. Instead, new approaches that integrate visual, spatial, and motor-control capabilities may be required.

This finding is particularly relevant for industries exploring automation, from logistics to defense. AI that can operate drones reliably could transform many sectors, but the technology is not yet mature enough for widespread deployment. The benchmark serves as a reality check, tempering expectations while providing a clear metric for improvement.

Areas that will require focus include:

  • Improved visual-spatial reasoning
  • Integration of sensory feedback loops
  • Safety protocols for physical autonomy (see the sketch after this list)
  • Training on diverse real-world scenarios
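
As one hedged illustration of what a safety protocol might look like in practice, the snippet below clamps model-issued command magnitudes before they reach the drone. The limits and action names are purely hypothetical and are not taken from SnapBench.

```python
# Hypothetical safety layer (limits invented for illustration): clamp the
# magnitude of each model-issued command so a misread snapshot cannot turn
# into an unbounded maneuver.
MAX_TRANSLATION_M = 2.0    # assumed cap on any single translation step
MAX_ROTATION_DEG = 45.0    # assumed cap on any single rotation step

ROTATION_ACTIONS = {"yaw_left", "yaw_right"}


def clamp_magnitude(action: str, magnitude: float) -> float:
    """Return the magnitude clipped to the allowed range for this action."""
    limit = MAX_ROTATION_DEG if action in ROTATION_ACTIONS else MAX_TRANSLATION_M
    return max(-limit, min(limit, magnitude))
```

A check of this kind would sit between the model's output and the drone, alongside the parsing step shown earlier.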

The Path Forward

The conversation around SnapBench and drone-flight capabilities is part of a larger discussion about AI's limitations. As benchmarks like this become more common, developers will have better tools to measure progress and identify weaknesses. This iterative process is crucial for advancing the field.

While the current results may seem disappointing, they provide a valuable baseline. Future models can be designed with these specific challenges in mind, potentially leading to breakthroughs in how AI understands and interacts with the physical world. GPT-4o's success offers a glimpse of what is possible, while the failure of the others highlights the work that remains.

Key Takeaways

SnapBench's drone test shows that current AI technology has a long way to go before it can reliably handle complex physical tasks. Only one model, GPT-4o, managed to complete the challenge successfully, indicating that most LLMs lack the necessary integration of visual and motor skills.

For the robotics industry, this represents both a challenge and an opportunity. The clear performance gap gives direction to future research and development. As AI continues to evolve, benchmarks like SnapBench will be essential for tracking progress toward truly autonomous systems.

Frequently Asked Questions

What is the main finding of the SnapBench test?

The main finding is that only one large language model, GPT-4o, was able to pilot a drone successfully based on visual input. All other models tested failed to complete the task, revealing an important limitation in current AI technology.

Why is this significant for AI development?

It is significant because it shows that, while LLMs are good at language tasks, they struggle with the complex integration of visual data and physical execution that robotics requires. It highlights a critical area where AI needs improvement before it can be used reliably in real-world autonomous systems.

What does this mean for the future of AI in robotics?

The results suggest that new approaches are needed to bridge the gap between AI reasoning and physical action. Future development will likely focus on better integration of visual-spatial reasoning and motor control, using benchmarks like SnapBench to measure progress.
