I watch a lot of YouTube – so you don’t have to. In this series I’ll bring you some of the best of what I’m watching, and sometimes, as in this post, I’ll provide my own commentary.
YouTube is buzzing this week. Sam Altman is rumoured to be making a big announcement sometime soon. He may do it before I even get to publish this. As the video I included in the header indicates – the big rumour is that “everything is about to change.” Will it be Artificial General Intelligence (AGI) – the autonomous thinking and operation of AI?
The most likely answer is no. That said, AGI will come sooner than you think. I've heard estimates of the end of this decade, and they are very believable.
But long before AGI hits, there is another transformation coming – AI agents.
You might be wondering, “don’t we already HAVE AI agents?”
But what we have now are single-purpose, pre-programmed agents. While they are programmed with natural language, these agents are designed for specialized, narrow tasks. They operate independently and do not collaborate with other agents.
Again, you might say, I can now call multiple GPTs in a single conversation. You would be right. However, integrating two or more of these agents in a single conversation poses some real challenges.
The fact that you can call two agents doesn't mean they will work together. Unless you specifically designed them to work together, the results will be suboptimal at best; at worst, they will be unable to work together at all. Add a third agent to the picture and the complexity grows.
To make these independent agents truly collaborate takes a lot of work and a very structured approach. To get a reliable result, you need a high degree of coordination both to direct traffic to the right agent and to manage their collaboration.
In this world of AI-generated low-code and no-code solutions, this may not require programmers, but it does require a truly disciplined designer's mindset, plus the time to develop, implement, test and manage the result. Agents of any kind require a controller or application to manage their operations, and building one is a task the average person would find daunting, or even impossible.
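To make that concrete, here is a minimal sketch in Python of what such a hand-built controller has to do. Everything in it (the agent names, the keyword routing) is hypothetical, invented purely for illustration; the point is that a human designer must anticipate every hand-off in advance.

```python
# A minimal sketch of a hand-built agent controller. All names here are
# hypothetical, for illustration only -- real orchestration frameworks
# are far more involved, but carry the same design burden.

from typing import Callable, Dict

# Each "agent" is reduced to a function from a request to a result.
Agent = Callable[[str], str]

def flight_agent(request: str) -> str:
    return f"[flights] searched options for: {request}"

def hotel_agent(request: str) -> str:
    return f"[hotels] searched options for: {request}"

# The controller's routing table is fixed and pre-programmed. Add a third
# agent and every new interaction between agents must also be designed.
ROUTES: Dict[str, Agent] = {
    "flight": flight_agent,
    "hotel": hotel_agent,
}

def controller(request: str) -> str:
    """Direct traffic to the right agents, then stitch the results together."""
    results = [agent(request) for key, agent in ROUTES.items() if key in request]
    if not results:
        return "No agent knows how to handle this request."
    return "\n".join(results)

print(controller("book a flight and a hotel for next weekend"))
```

Notice that nothing in this controller learns: every route, and every way the agents' outputs combine, was decided by a person before the first request ever arrived.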
But what if there was a new type of controller? What if instead of being programmed, it could learn how to manage agents, programs and even physical devices the same way we humans do? That’s the next generation of agents – autonomous, intelligent and able to learn, adapt and collaborate.
It’s already here. Take a look at this next video.
The Rabbit r1 was a big hit at the CES show this year. Affordably priced, it provides direct access to an AI engine. However, the emphasis on the device might have overshadowed the real breakthrough introduced by Rabbit r1.
In the video, CEO Jesse Lyu teaches the AI to book a vacation, but he's not visiting the websites, nor is he clicking on any web pages. The "controller" that oversees the interaction of agents and websites is doing all the work behind the scenes.
But the really interesting part is that it's not just executing – it's learning from this interaction how to do the task on its own. After that first training, he is able to just say, "plan a trip," and it does the research and comes back with a well-thought-out plan for a full vacation experience with travel, lodging, meals and even places to visit.
There are systems that approximate this, but none that learn and execute autonomously from a single natural-language voice command. Lyu claims he has moved beyond the Large Language Model to a new concept: the Large Action Model.
This Large Action Model (LAM) has no need for interfaces or APIs, and no need for standalone or web apps. The LAM operates on its own, without its own computer or browser. Once trained on a similar task, the AI system (not the device) acts without intervention, taking all the necessary actions to achieve a specific requested outcome. Once it has learned a task, it can adapt and even modify it.
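Rabbit has not published how the LAM works internally, but the pattern Lyu demonstrates – record a human demonstration once, then generalize and replay it – can be sketched roughly like this. Every class and name below is an assumption invented for illustration; none of it is Rabbit's actual design.

```python
# A speculative sketch of a "learn from demonstration, then act" loop.
# All names are assumptions for illustration; this is not Rabbit's API.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Step:
    action: str      # e.g. "type" or "click"
    target: str      # e.g. "destination field"
    value: str = ""  # e.g. the text entered

@dataclass
class LearnedTask:
    name: str
    steps: List[Step] = field(default_factory=list)

class ActionModel:
    """Records a human demonstration once, then replays an adapted version."""

    def __init__(self) -> None:
        self.tasks: Dict[str, LearnedTask] = {}

    def record(self, name: str, demonstration: List[Step]) -> None:
        # In the video, this is Lyu walking the AI through a booking once.
        self.tasks[name] = LearnedTask(name, demonstration)

    def act(self, name: str, new_details: Dict[str, str]) -> None:
        # Replay the learned steps with new values substituted in --
        # no dedicated app, API, or human in the loop.
        for step in self.tasks[name].steps:
            value = new_details.get(step.target, step.value)
            print(f"{step.action} -> {step.target}: {value}")

model = ActionModel()
model.record("plan a trip", [
    Step("type", "destination field", "Rome"),
    Step("click", "search flights"),
])
model.act("plan a trip", {"destination field": "Lisbon"})
```

The essential shift is in the last line: once the steps are learned, a new request reuses them with new details, and the human never touches a website.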
It's an AI that goes beyond simple, single-purpose agents. It can coordinate, collaborate and take multiple complex actions without human intervention.
Where does this lead us? It leads us to a massive disruption. Why? Let’s look at the impact on three areas that are the foundation of our current app and device centric world.
Devices – Our lives are intertwined with gadgets such as phones, tablets and laptops. These devices are the tools required to translate human ideas into actions. They house the specialty apps and programs we need. The human is the intelligence: we are the controller that knows the overall task, selects the tools, provides the input and assesses the results.
Once you break that cycle and the AI or operating system can learn to do all this, why do you even need a phone or even a laptop?
By itself this represents a major disruption. But let’s go a level deeper.
Apps – Apple, Google, Samsung and others sell us computers and phones that are really just containers for our apps. Most apps serve a pretty simple purpose. Much of their complexity, and their success or failure, lies in how they interface with the device and with the person who plans and executes the actions. In a Large Action Model, all of this happens without the need for any specialized apps.
Earlier we showed you the Rabbit r1. It's not the only such device to hit the market. This one, from Humane, also eliminates the need for a phone or laptop, this time with the elegant design we are used to from Apple – no coincidence, since the founders came from Apple.
Whether you go for the small orange handheld or the elegance of a wearable pin, it becomes apparent that apps and devices are ripe for disruption. Finally, these devices, assisted by generative AI and autonomous agents, have another aspect – they will collect an incredible amount of data.
Data – If you think Facebook knows a lot about you, just think about what this new super-agent will know about you. If it books your travel, makes your purchases and researches information for you, it will know EVERYTHING.
There was a scandal a few years ago when a company called Cambridge Analytica collected Facebook data and claimed it could use likes and dislikes to predict everything from your voting preferences to your sexual preferences. That's going to seem "so 2016" when you consider that these LAMs and their agents no longer have to predict – they will know what you are going to do. OpenAI hinted at this just this week when it gave ChatGPT a "memory" to let it learn your individual preferences. The LAM will know your preferences and your actions.
That creates a huge issue in terms of privacy. Where will that data be stored? Who will "own" and control it? It is also a threat to the social media, search and even retail giants whose business models are built on their data profiles.
Could Metcalfe’s law be broken?
All of our lives since the turn of the century have been, if not controlled, at least largely managed by a handful of mega-companies. These companies grew up because of Moore’s law. Technology got more powerful and cheaper every year. The phone we have today is exponentially more powerful than the mainframes of the last century and anyone can afford it. That made the digital revolution, the internet, e-commerce and social media possible.
Now they hold onto their dominance because of another law – Metcalfe’s law.
For those not familiar with it, Metcalfe's law, simply stated, says that the value of a network is proportional to the square of the number of its nodes. To put it more plainly, once you get a critical mass of users on any platform, it becomes really hard for anyone else to compete.
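In symbols, the value $V$ of a network with $n$ connected users grows as

$$V \propto n^2$$

so a network with ten times the users is, on this model, roughly a hundred times more valuable – a brutal gap for any challenger starting from zero.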
No matter how we feel about LinkedIn or Facebook, sharing information with business contacts or friends means remaining on these platforms. Once a network has a critical mass, it’s hard – and perhaps almost impossible to leave it.
How strong is Metcalfe's law? It's so strong that even Elon Musk has not been able to kill X/Twitter outright. Millions have left for other platforms, but even so, few have actually deleted their Twitter accounts and moved on entirely.
Despite new startups trying to supplant Twitter, as yet, no one has. The value of their networks is still dwarfed by Twitter's. X/Twitter may lose enough money to destroy itself, but it hasn't lost enough people. That's the power of Metcalfe's law.
These companies dominate based on their critical mass, and they've divided us up by personas, uses and sometimes our digital tribes. Meta/Facebook owns your personal networking. LinkedIn has your corporate networking. Google owns your web interactions.
It even extends to products. Microsoft owns your desktop. Apple and Windows "tribes" divide up the laptop world. And while phones are formally divided by operating system, iOS or Android, the reality is that in terms of market share, Apple and Samsung are where your "tribe" gets its phones.
There are mavericks, even some niche players, but in the bigger scheme of things they are irrelevant. In a world dominated by Metcalfe’s law, the big get bigger, whether they deserve it or not.
Each of the companies I mentioned has a bigger market value than the Hong Kong stock exchange or the GDP of many nations. That gives them more than network power. It gives them incredible wealth.
That wealth, in combination with critical mass, allows them to make stupid mistakes and still recover. They can fail to innovate and still have the massive size and economic power to buy anyone whose innovation threatens their dominance.
Microsoft dropped a cool $10 billion or more for access to OpenAI's ChatGPT, and it's not the first time Microsoft has bought its way into innovation. Apple has made no great public moves in AI, but it has quietly been buying a large number of AI startups.
This power makes it almost impossible to dislodge these giants by any normal competitive approach. It would take a seismic shift to disrupt them.
These disruptions do occur, and some think the move to autonomous agent networks or LAMs may be just such an event. If the things that made them dominant are no longer valued, the big players are vulnerable.
Today they supply our access to the digital world with devices and apps. Today, they own our data. But what if we all "owned" our own data? What if our access to the global network was device- and "app"-independent? How would these companies protect their monopoly status then?
One might think that patents and intellectual property rights could be the key. In a world where the cost of AI development is prohibitive for smaller players, only the biggest can afford to develop their own models. But the courts have ruled against Meta using copyright to protect its AI model, stating that AI can't be protected because it is derived from other information. And when you have models that learn from your interactions, even the vast training data and resources needed to build a model may no longer be as relevant. The protections that might have worked for devices or standalone apps may no longer be useful.
The potential for disruption is clear. But none of these companies is going to go down without a fight. They still have enormous wealth and dominate enormous distribution markets. Equally, however, governments, particularly the European Union, are chipping away at the barriers these companies can use to retain their market position.
Disruption is no longer coming. It is on the doorstep. Watch for the strategies they will use to retain control.
But no matter who wins or loses, the move from Large Language Models to Large Action Models and autonomous agents changes everything. It even changes how we see the goal of Artificial General Intelligence (AGI).
Geoffrey Hinton, the godfather of modern AI, recently remarked that we may be mistaken about what we think AGI is. When it arrives, AGI may achieve the same results as human intelligence, but without following the same processes and structures.
For example, we seem to require that AGI have some form of consciousness like ours. When today's AI creates fictional answers, we say it is hallucinating. When it behaves in ways that avoid a negative outcome, answering one way in training and another in practice, we say it is deceiving us. But isn't this just projection on our part?
Even those who design the models don't really know how they work at maximum size and complexity. Which raises the question: if a model can act independently, learn and pursue objectives, does it need consciousness or self-awareness?
Or to put it in a more simple metaphor – if it walks like a duck and talks like a duck, does it have to be aware it is a duck?
Given what autonomous agents or LAMs can do, when we get to AGI may not be the question. The real question may be: what will be the impact of the transformation and disruption we face as we get there?
Disruption is coming more quickly than we might want
YouTube videos can be sensationalistic. But in the midst of the hype, and even though these videos are demos, there is clearly something big happening.
Buckle up. Here comes the future.