OpenAI recently announced the release of Operator, an AI agent that promises to help us do much more using AI, with even greater ease. The company first made AI mainstream by introducing ChatGPT to the world, helping people realize AI’s incredible potential. Over the last few years, we have seen a massive increase in the availability and capability of AI tools. Even if we’re not using AI directly, the technologies we interact with are increasingly integrating it to deliver better experiences. AI agents represent the next leap forward in how we use technology and what we use it for. They promise to transform how we accomplish our goals by integrating helpful, intelligent assistants into our lives.
What Are AI Agents?
AI agents are software systems designed to execute tasks autonomously.
AI agents can make decisions, solve problems, and take action to achieve predefined goals on behalf of users. They can understand user inputs (instructions, goals, requirements, etc.). They can plan and execute the necessary tasks, interacting with external systems as needed. These agents can complete a broad range of tasks. As the underlying algorithms grow more powerful, these agents will be able to do much more. Therefore, the definition of what an agent is will continue to evolve.
Examples: OpenAI Operator, GitHub Copilot, Klarna AI Assistant, Intercom Fin
How Do AI Agents Work?
AI agents execute tasks by mimicking how people complete tasks.
AI agents can "think" about how best to complete tasks. Here’s a general overview of how they do this:
Perception: They collect information from their environment, which can include user inputs, user/system interactions, preset triggers, sensor data, etc.
Reasoning: They interpret context (goals, intent, constraints, etc.) and determine how to complete the task, planning and evaluating the necessary actions.
Action: They execute the steps needed to accomplish the user’s overall goal based on their understanding of what the user wants to do.
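The three stages above can be sketched as a simple loop. This is a toy illustration, not how any particular product is built: the ThermostatAgent class, its methods, and the room dictionary are all hypothetical names chosen for the example, and the "reasoning" here is a fixed rule rather than a learned model.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Hypothetical toy agent that keeps a room near a target temperature."""
    target: float
    log: list = field(default_factory=list)

    def perceive(self, environment: dict) -> float:
        # Perception: read the current state of the environment (sensor data).
        return environment["temperature"]

    def reason(self, temperature: float) -> str:
        # Reasoning: compare the observation with the goal and pick an action.
        if temperature < self.target - 0.5:
            return "heat"
        if temperature > self.target + 0.5:
            return "cool"
        return "idle"

    def act(self, action: str, environment: dict) -> None:
        # Action: change the environment and record what was done.
        if action == "heat":
            environment["temperature"] += 1.0
        elif action == "cool":
            environment["temperature"] -= 1.0
        self.log.append(action)


def run(agent: ThermostatAgent, environment: dict, max_steps: int = 10) -> dict:
    # Repeat perceive -> reason -> act until the goal is met or steps run out.
    for _ in range(max_steps):
        action = agent.reason(agent.perceive(environment))
        if action == "idle":
            break
        agent.act(action, environment)
    return environment


room = {"temperature": 18.0}
agent = ThermostatAgent(target=21.0)
run(agent, room)
print(room["temperature"], agent.log)
```

The loop stops on its own once the observation matches the goal, which is the part that separates an agent from a one-shot function call.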
AI agents are trained on datasets associated with specific tasks. Similar to how people identify the best way to do things through trial and error, they “learn” how to complete tasks from numerous examples (what steps were taken, what was the result, etc.). Eventually, these agents develop enough proficiency to complete these tasks as well as (if not better than) real people. They can also update their plans in real time as things change, and adapt with minimal human intervention.
Memory helps AI agents continuously enhance their performance. They can improve task execution by learning from past knowledge, conversations, challenges, etc. They can recall specific user details, preferences, and requirements, and use that information to inform their actions. They can also ask for user clarification and/or confirmation to ensure their actions are aligned with what the user wants. Furthermore, when they combine user feedback with memory, they can improve their reasoning and complete tasks with greater accuracy and efficiency.
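As a rough sketch of the idea, the snippet below shows an agent that asks for clarification when it lacks a detail and reuses the answer once it has been stored. The AgentMemory class and plan_dinner function are hypothetical names invented for this example, and a plain dictionary stands in for whatever storage a real agent would use.

```python
from typing import Optional


class AgentMemory:
    """Hypothetical per-user store for preferences an agent has learned."""

    def __init__(self) -> None:
        self._preferences: dict = {}

    def remember(self, user: str, key: str, value: str) -> None:
        # Save a detail so later tasks can use it without asking again.
        self._preferences.setdefault(user, {})[key] = value

    def recall(self, user: str, key: str) -> Optional[str]:
        # Return the stored detail, or None if nothing is known yet.
        return self._preferences.get(user, {}).get(key)


def plan_dinner(memory: AgentMemory, user: str) -> str:
    cuisine = memory.recall(user, "cuisine")
    if cuisine is None:
        # Nothing remembered: ask the user instead of guessing.
        return "Which cuisine would you like?"
    return f"Searching for {cuisine} restaurants"


memory = AgentMemory()
first = plan_dinner(memory, "sam")         # no preference stored yet
memory.remember("sam", "cuisine", "Thai")  # the user's answer is saved
second = plan_dinner(memory, "sam")        # the preference is reused
print(first)
print(second)
```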
AI Agents Vs. AI Tools
Agents and tools differ in what and how much they can accomplish.
AI tools are task-oriented, excelling at performing specific, predefined functions. When someone uses an AI tool, they are likely to use it for a specific part of their workflow. For example, if you are solving a customer problem, you might use AI tools for several connected tasks (drafting emails, summarizing feedback, etc.). However, you still need to think about how, when, and where to use each tool to achieve your overall goal of helping the customer. Furthermore, you have to provide specific instructions to use each tool effectively, and the quality of your prompts can greatly influence the outputs.
In contrast, AI agents are goal-oriented, meaning they work toward achieving a broader objective. They can independently complete the series of tasks associated with what the user is trying to accomplish. They can make decisions and take action without constant human intervention. For example, AI agents might communicate with your customers, understand their problems, and interact with your systems. They can manage a much bigger part of your workflow than individual AI tools.
The desire for agents has been clear since our first interaction with AI tools like ChatGPT. People have been building their own “agents” for a while now by linking AI tools and applications together using automation platforms like Zapier. However, the emergence of AI agents means that we can now accomplish much more using AI. The scope and complexity of the tasks AI agents can handle on our behalf are much greater than what individual AI tools can do. They are truly helpful, intelligent assistants that enhance what we can accomplish and how we accomplish our goals.
While an AI tool might help you plan your travel itinerary, you will still need to book flights and hotels yourself. However, an AI agent can create an itinerary and make the necessary bookings for you.
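The travel example can be sketched in a few lines to show where the difference lies. Everything here is hypothetical: the three tool functions are stubs that only return strings (no real bookings are made), and travel_agent is an illustrative name for the piece that decides which tools to call and in what order.

```python
def draft_itinerary(city: str, days: int) -> list:
    # Tool 1: handles one narrow task when the user asks for it.
    return [f"Day {d}: explore {city}" for d in range(1, days + 1)]


def book_flight(city: str) -> str:
    # Tool 2: stub that stands in for a flight booking step.
    return f"flight to {city} booked"


def book_hotel(city: str, nights: int) -> str:
    # Tool 3: stub that stands in for a hotel booking step.
    return f"hotel in {city} booked for {nights} nights"


def travel_agent(goal: dict) -> dict:
    # Agent: takes a single goal and chains the tools itself, so the user
    # does not have to decide how, when, and where to use each one.
    city, days = goal["city"], goal["days"]
    return {
        "itinerary": draft_itinerary(city, days),
        "bookings": [book_flight(city), book_hotel(city, days - 1)],
    }


# Using a tool: the user gets an itinerary and still has to book everything.
plan_only = draft_itinerary("Lisbon", 3)

# Using an agent: one goal in, itinerary and bookings out.
trip = travel_agent({"city": "Lisbon", "days": 3})
print(trip)
```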
OpenAI’s Operator
OpenAI’s Operator AI Agent uses advanced AI capabilities to transform workflows.
OpenAI’s Operator Announcement
Last Thursday, OpenAI announced its research preview of Operator, a general-purpose AI agent that can interact with web browsers to autonomously accomplish tasks. Similar to ChatGPT, you enter a prompt and Operator will try to execute the task to the best of its capability. Operator is powered by a Computer-Using Agent model, or CUA, allowing the agent to interact with a browser to complete a task in the same way a person would. It interacts with the front end of websites, without requiring API access. It can “see” the screen, control the keyboard and the mouse, search the internet, navigate web pages, click buttons, enter information, select options, etc.
Operator provides a summarized chain of thought and a screen recording to show users how it completed the task, including the buttons clicked, information entered, and options selected. This provides insight into its process. It also asks clarifying questions as necessary, and it asks for user input at critical steps (payment, confirmation, etc.) to ensure the task is completed as desired. At any point, the user can take control or give Operator instructions. When the user takes over, the browser experience acts much like a normal web browser.
Let’s say you ask Operator to make a restaurant reservation. It will open a browser window and navigate to a booking site. If you want a specific time, it will check if that time is available. If the time is not available, it will present alternatives. Finally, it will ask for your confirmation, complete the reservation process, and provide the final reservation details.
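The decision flow in that example can be sketched as follows. This is not Operator's implementation or API; reserve_table, choose, and confirm are hypothetical names, and the list of available times stands in for what the agent would read from a booking site.

```python
from typing import Callable, Optional


def reserve_table(
    requested: str,
    available: list,
    choose: Callable[[list], Optional[str]],
    confirm: Callable[[str], bool],
) -> Optional[dict]:
    # Step 1: check whether the requested time is available.
    if requested in available:
        slot = requested
    else:
        # Step 2: otherwise present the alternatives and let the user pick.
        slot = choose(available)
    # Step 3: pause for explicit user confirmation before the critical step.
    if slot is None or not confirm(slot):
        return None
    # Step 4: complete the reservation and return the final details.
    return {"time": slot, "status": "confirmed"}


# The requested time is taken, so the user picks the first alternative.
booked = reserve_table(
    "19:00",
    ["18:30", "19:30"],
    choose=lambda alternatives: alternatives[0],
    confirm=lambda slot: True,
)

# The user declines at the confirmation step, so nothing is booked.
declined = reserve_table(
    "19:30",
    ["18:30", "19:30"],
    choose=lambda alternatives: alternatives[0],
    confirm=lambda slot: False,
)
print(booked, declined)
```

Passing choose and confirm in as callbacks keeps the user in the loop at exactly the points the article describes: picking among alternatives and approving the final step.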
Conclusion
AI agents will fundamentally change our relationship with technology. As these agents become more sophisticated, they will not only enhance productivity but also redefine how we approach workflows. The challenge for product teams now is delivering experiences that help users achieve their bigger goals while meeting new, evolving expectations about what products should be able to do. The teams that successfully address the broader ecosystem of users’ needs around their goals will become incredibly valuable. As AI agents take away more tedious, repetitive work, people will be able to focus on higher-value work that is inherently human.
Thanks For Reading
I hope you have a great week!