102. Autonomy AI with Adir Ben-Yehuda
Adir Ben-Yehuda is the CEO of AutonomyAI, where he’s leading the development of an autonomous GenAI platform that helps companies ship software faster with production-grade code. He previously led go-to-market teams at high-growth startups including OpenWeb as VP of Strategy and Walnut as CRO, helping scale products adopted by global brands. With over a decade of experience, Adir specializes in turning complex tech into scalable, market-friendly solutions.
Transcription:
Ben Byford:[00:00:04]
Hi, and welcome to episode 102 of the Machine Ethics podcast. This episode, we're chatting with Adir Ben-Yehuda. This was recorded on the 16th of July, 2025. Adir and I talk about his company, Autonomy AI, AI automation for front-end web development, English as the new coding language, allowing an LLM to optimise itself, conversations around job displacement and my pessimism and his optimism, vibe coding, recent events like Grok self-identifying as MechaHitler, the ethics and guardrails of LLMs, and purposely trying to break a system, as well as being a plumber.
Ben Byford:[00:00:48]
If you like this episode, you can find more at machine-ethics.net. You can also contact us at hello@machine-ethics.net. You can follow us on Bluesky at machine-ethics.net, on Instagram at Machine Ethics podcast, and on YouTube at @machine-ethics. And if you can, you can support us on Patreon: patreon.com/machineethics. Thank you so much for listening, and hope you enjoy.
Ben Byford:[00:01:15]
Hi. Welcome to the podcast. Tell our listeners who you are and what you do.
Adir Ben-Yehuda:[00:01:19]
Yeah. So first of all, it's great to be here. I'm Adir Ben-Yehuda, the co-founder and CEO of Autonomy AI. We build an autonomous R&D platform with a focus on autonomous front-end. We help companies essentially accelerate their software development process.
Ben Byford:[00:01:38]
Awesome. I always hit this first intro and go, oh, there's loads of questions, just from that. But the first question we always ask our interviewees on the podcast is: we have this word AI. What is that thing to you? What are we talking about when we say AI?
Adir Ben-Yehuda:[00:01:55]
I think what AI is, at least the way I see it, the way I perceive it, is the ability for human beings to actually communicate with machines, right? Up until, let's say, a couple of years ago, people communicated with machines in a one-way fashion: I ask you for a task and you provide some feedback, but it's not back and forth. It's not a dialogue. AI opened that up into a smarter dialogue, and gave us humans the ability to ask the machine to actually get back to us, but also to challenge us on a few things. If you look at the broader aspect of things, it's just an acceleration. It's basically taking all the boring things we hate, the annoying things we hate, helping us remove those with AI, doing the back and forth there, and letting us excel and accelerate in different aspects.
Ben Byford:[00:02:45]
Awesome. So it's really interesting that you pick up on this way we interface with computers, right? I've got this almost-a-dream, right, that in the future, we don't have keyboards. We're not tapping away at keyboards on the screen. We might be chatting. And I don't know if you saw the Rabbit, was it last year? Or two years ago, that completely tanked. But the idea around that was, you have this thing and you talk to it and it takes pictures, and that's your interface, right? Is that part of that vision, do you think, that we head towards this more free-flowing audio, being vocal and interfacing that way, or in a different way completely?
Adir Ben-Yehuda:[00:03:36]
I think absolutely. And also, think about a couple of things here. If you look at the initial products in AI, the naming was Copilot, right? I think it's all in the name, first of all. That's the first thing. The listeners here have probably all tried to talk with ChatGPT. It's a lovely conversation. I mean, all the nuances, right? And the little giggles. So first of all, it creates a better connection. I think when you look at the next steps and you look into the future, yes, I think the interface is going to be like the Rabbit, like a dash on your clothing that you can actually engage with. There is a reason why Sam Altman and OpenAI essentially purchased, I don't know, acquired Jony Ive's company, like an acqui-hire, whatever. I don't know what accountants are inventing nowadays, but there is some financial structure with Jony Ive, because that's where the world is going, clearly.
Adir Ben-Yehuda:[00:04:43]
And, Ben, don't ask us. Look at the younger folks. Look at the 12-year-olds and see how they engage with this. They have a different engagement with computers. And last point on this: a recent study from Google. You can look at this, by the way; it's a publicly traded company, so you can see there are papers. Search is down for Google, but voice search is up. Voice search via all those smart speakers and the phone is up, mainly in the younger demographics, meaning the engagement is becoming different.
Ben Byford:[00:05:23]
That's really interesting. I feel like we're going to completely go on this tangent, so I'm going to briefly loop back a sec. You mentioned, obviously, your company, and you're concentrating on front-end at the moment. Could you tell us a little bit about what that offer is? And then it sounds like you're alluding to moving into other verticals in the future. Is that your vision?
Adir Ben-Yehuda:[00:05:51]
Yeah. So let's talk for a second about the space we're operating in, right? We work with software companies that have production code, that have an offering out there. Essentially, if you look at the process of creating software today, you have engineers in the middle. You probably have designers, product people, and then engineers at the end who are developing the software. And engineers are utilising different AI tools. They're using Copilots and Cursor and a bunch of other things to accelerate their performance. What Autonomy AI built is a mechanism that can mimic or learn the entire code base of an organisation within a couple of minutes. On top of this, we have an agentic workflow that mimics an engineer. So that's a little bit of background on where we operate. And the first step we launched is, yes, autonomous front-end: taking all the front-end work that the company has and generating the outcome very easily, just with natural language, English. Today, English is the main input language; there are other languages out there, but English is the main one right now.
Ben Byford:[00:07:03]
It's unfortunate that most coding languages are English-based, right?
Adir Ben-Yehuda:[00:07:07]
Yeah. But where we're trying to take this, and where we're starting to go, is we're helping companies, and by companies, I mean the VPs of R&D and the CTOs and CEOs, to re-imagine the way they produce their product, right? So that's at a very broad level. When you look at software creation, you have, roughly speaking, two main components to it. You have essentially the engine, the back-end, where all the logic is being built. If you're a cybersecurity product or an analytics product, that's where you do all the big calculations and the IP. And then the front-end, essentially everything you touch within the product. It's the manifestation, if you will, of the logic of the product itself. That's the UI, the user interface, where people are clicking. So Autonomy operates in the space of, 'Hey, we help you as an organisation to accelerate and essentially outsource this entire front-end creation to agent-based AI'. We deliver and mimic a front-end engineer in the process itself. Your listeners have probably heard about tools like Bolt.new or Lovable and a bunch of others. Essentially, those tools are great for doing something if you're an individual or you want to build a prototype.
Adir Ben-Yehuda:[00:08:33]
You want to go ahead and raise money, you have a great new idea, you go ahead and use one of those tools and you have a mock or a prototype. The problem is, those platforms are not scalable in a company environment, in 'production-grade' code, quote unquote. When you have thousands of users, when you have thousands of lines of code already. And that's Autonomy's unique selling point. We built a product that actually knows how to operate in this environment. We understand how the organisation built the code, we engage with it, and we deliver really good results on top of it.
Ben Byford:[00:09:09]
I feel like you might have a preference. When you're actually working with your service, does it have a preference towards a certain technology stack? Does it work better in ReactJS and less well in something else? You know what I mean? Does it do any of that, or has it not really been a problem?
Adir Ben-Yehuda:[00:09:28]
I think it's a great question. So let's think about the way we built the technology. As I mentioned, we learn the code base itself, right? So we mimic a human being, and a human being can't possibly know all the code languages, and we want to be very accurate. We work mainly on React, React Native and Vue. There is a front-end framework called Angular, which is the old-school one. In most of the projects we're doing, people are actually trying to move away from Angular to the more modern frameworks, I should say, and they're utilising us to accelerate this process. But that's what's guiding us. We want to learn in depth what can possibly be done, or what the code looks like, and then engage with it.
Ben Byford:[00:10:18]
Yeah, those people were probably incumbents. They made their stuff 10 years ago or something like that, and Angular was the thing, right? Yeah. I was in it. I was deep in the trenches at that point.
Adir Ben-Yehuda:[00:10:33]
We can't be young forever.
Ben Byford:[00:10:35]
Yeah, well, not currently. I guess I haven't had anyone on the podcast so far who's invested time and effort and money into longevity stuff. But maybe if you're that person, let me know. So you're starting at the front-end. Why the front-end? And then what's going to change to do more back-end stuff? Because it feels like, if someone's listening to this, code is code, right? Maybe. Or do you just have this specific sector that you're focused on, and then you can broaden out from that? That's more of a strategic plan rather than a technology plan, as it were.
Adir Ben-Yehuda:[00:11:22]
It's a great question. Let's again reiterate: we work in an organisational capacity. We work with companies, right? Let's talk about the pragmatic aspect of it. Pragmatically, if you talk with most companies, 60 to 70% of the work is in the front-end, and most of the QA is being done there. That's where the issues are; someone has to find them. It's a necessary evil. If the CEOs and CTOs of a company could just build a product, they'd want to build the core, the logic behind it, and then the front-end is the front-end, whatever. I know there are a lot of front-end engineers out there for whom it's an art, and I totally get it. But when you think about it from an organisational capacity, when you want to build a business very fast, most businesses wouldn't focus on that, at least at the beginning. It's a Maslow pyramid of needs. You want to get your product out there first. So for us, it answered the question of sense of urgency. This is something we look at. This is something we need. This is the first thing we're going to do.
Adir Ben-Yehuda:[00:12:28]
The second thing, which is a very pragmatic reason as well: it was easier for us from a security perspective to penetrate the company. When we came in at the beginning and said, 'Hey, we want to look at the code base here and build a semantic representation of it', people told us, 'Get out of here before we call the police'. And we're like, 'Oh, okay. We hear you. We also understand what we said.' When you look at front-end only, it was easier for us to pass security reviews and get into companies. And the third part, which is pragmatic but also serves the bigger purpose: when we start building the front-end, we start learning about the front-end, we have more ability to look at the code from a different perspective, to start analysing and putting different tracking pixels on the code itself. And then it's going to be easier for us to move downstream to the QA part of things, and then downstream to the logic and the back-end as well, and build a full application. The overall vision of what we're imagining as a company is, yes, there are these roughly two buckets of front-end and back-end where you build the application.
Adir Ben-Yehuda:[00:13:34]
There are users for the product, and those users are engaging with the product, and they hit different drop-off points and different feedback loops. We want to build that entire feedback loop completely. When users are engaging with the product itself, and we're talking about masses, we're talking about 10,000 users hitting a drop-off point in a specific place, we have, in an autonomous way, the ability to write the ticket, engage with the front-end first, and then change the logic in the back-end when we need to. That's the logic and how we imagine this. We're less than a year away from releasing those products to the market.
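To make the feedback loop Adir describes a little more concrete, here is a minimal sketch of the analytics-to-ticket step: aggregate user events, find where a funnel leaks, and auto-draft a front-end ticket. The funnel steps, event shape, threshold, and ticket wording are all illustrative assumptions, not AutonomyAI's actual pipeline.

```python
# A minimal sketch of drop-off detection driving an auto-generated ticket.
from collections import Counter

FUNNEL = ["landing", "signup_form", "plan_select", "checkout"]  # assumed steps

def drop_rates(events: list[dict]) -> dict[str, float]:
    """events: [{'user': id, 'step': step_name}, ...]; returns the fraction
    of users lost between each consecutive pair of funnel steps."""
    reached = Counter(e["step"] for e in events)
    rates = {}
    for prev, nxt in zip(FUNNEL, FUNNEL[1:]):
        if reached[prev]:
            rates[nxt] = 1.0 - reached[nxt] / reached[prev]
    return rates

def draft_ticket(rates: dict[str, float], threshold: float = 0.5):
    """If any step loses more than `threshold` of users, draft a ticket
    that a downstream agent could pick up and act on."""
    worst = max(rates, key=rates.get, default=None)
    if worst and rates[worst] > threshold:
        return (f"Auto-generated: {rates[worst]:.0%} of users drop before "
                f"'{worst}'. Investigate and simplify the preceding UI.")
    return None

# Toy data: a sharp leak between signup_form and plan_select.
events = (
    [{"user": i, "step": "landing"} for i in range(100)]
    + [{"user": i, "step": "signup_form"} for i in range(80)]
    + [{"user": i, "step": "plan_select"} for i in range(20)]
    + [{"user": i, "step": "checkout"} for i in range(15)]
)
print(draft_ticket(drop_rates(events)))  # flags the plan_select drop
```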
Ben Byford:[00:14:11]
Yeah, I mean, that's really interesting. In my world, we have this idea that we test stuff; I do lots of games development as well as AI stuff at the moment. You test it with the audience in a similar way to what you're describing, right? And they give you answers back, which tell you there are issues, but their solutions may not be the correct solutions. So it's interesting: your system might autonomously flag that people are dropping off at certain points, but you can't get the right answers from the people, right? They're just dropping off. So at that point, do you then need a person to step in and go, they could be dropping off for this reason, go and do this AI stuff and fix it for that reason? Or does the AI initiate that itself and make a decision about why people might be dropping off?
Adir Ben-Yehuda:[00:15:12]
The AI will initiate this itself, right? So the mechanism will learn and understand. I'll give you an example of what we already see in the product today. We launched the product in April, what, three and a half months ago? It's growing rapidly; we have tens and tens of clients already working on the platform. And one of the behaviours we started seeing, which wasn't intended at the beginning, it was a side effect that we've now productized: people started asking us, 'Hey, I have this sidebar and we have a drop-off there. What should I do? How would you re-imagine its design?' And basically, Autonomy generates the result. It took a very interesting turn when people were like, 'Okay, I need to build a sidebar that will do one, two, three, with these three KPIs. How would you build it?' And Autonomy builds the sidebar itself, contextually to their design system, to their look and feel. That was the initial vision from the beginning, but it happened faster than we thought. We were like, okay, people are starting to engage with it.
Adir Ben-Yehuda:[00:16:20]
That's my comment from the beginning. When I said that's what AI is for me, the ability to actually converse with something, that's what I see on a day-to-day basis. Now, take it to the capacity we just talked about. If you have tens of thousands of users engaging, there is no way for the human brain to digest this and understand what's going on. You might look at Mixpanel or different analytics tools for engagement with your product. That's great. But then you will have roughly 20 different conclusions, and you'll be biased towards a specific one; maybe you read something in the morning and you think that's what's happening to you. It happens to everyone. It's legit. AI could come in and say, 'Hey, I'm very pragmatic, I'm very deterministic, and this is what's happening. I'm looking at the different use cases and case studies and learning everything, and that's how you should fix it.' The challenge would still be: how can you make it a little bit more creative? AI is deterministic, but not creative. The creativity comes from how you guide it at the beginning. That's kind of the full scope.
Ben Byford:[00:17:32]
Yeah, yeah, yeah. Are you allowed to tell us how it works? Can we get into the weeds a bit about the technology that you're using? Because for me, the question is more around: are you using other people's models? Are you using your own models? Are you training things? Are you using a mixture of things? How does all that work? Can you tell us as much as you-
Adir Ben-Yehuda:[00:17:55]
Yeah, I'm happy to. There are two main parts to the product, two main engines. One engine is what we call ACE, an agentic context engine that runs on the code base. You probably know MCP; it's MCP on steroids, something we built in-house. That part is essentially LLM-guided algorithms. We built an algorithm that can build a semantic representation of the code base. It basically creates a mirror of the code base itself, analyses it, and breaks it down into different vectors, forms we can digest in a better way. The guiding LLM varies. Right now it's Anthropic's Opus, but it varies between LLMs. Then the vectors feed essentially five agents that mimic a front-end engineer: everything from the ability to actually read the code in a good way, to creating the UI, all the way down to actually testing it in real time. Those agents are pretty cool. The reason they're pretty cool is because of the infrastructure we built. We look at this specific project, this specific code base; once we do the first couple of iterations, we learn what the gaps are, and the agent itself rewrites the prompt to optimise the end result.
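As a rough illustration of what 'a semantic representation of the code base, broken down into vectors' could look like, here is a minimal sketch: chunk source files, embed them, and retrieve the files most relevant to a ticket. The toy hashed-bag-of-words embedding stands in for a real code-aware embedding model; everything here is an assumption for illustration, not ACE itself.

```python
# A minimal sketch of a searchable semantic index over a code base.
import math
from pathlib import Path

def embed(text: str, dims: int = 128) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalised. A real system
    would use a code-aware embedding model instead."""
    vec = [0.0] * dims
    for token in text.split():
        vec[hash(token) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def index_codebase(root: str, exts=(".tsx", ".ts", ".jsx", ".js")) -> list[dict]:
    """Walk the repo and embed each source file as one chunk."""
    index = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            text = path.read_text(errors="ignore")
            index.append({"path": str(path), "vector": embed(text)})
    return index

def search(index: list[dict], query: str, k: int = 5) -> list[str]:
    """Return the k files most semantically similar to the query."""
    qv = embed(query)
    ranked = sorted(index, key=lambda e: cosine(qv, e["vector"]), reverse=True)
    return [e["path"] for e in ranked[:k]]

# Usage: feed the top matches to the agents as context for a ticket.
# index = index_codebase("./my-app/src")
# print(search(index, "sidebar component with navigation links"))
```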
Adir Ben-Yehuda:[00:19:23]
So at the beginning, you get a result that saves you 70% of your workload on a ticket; by the 20th run, you'll be at 90% time saved, because we learn, and we have an agentic workflow that runs. It runs basically three times, sometimes even more. It looks at the input you brought in, and then we have the ability to analyse the output from a visual perspective and see: okay, does it match the input? Does it match all the components that you wanted to have here? Yes? No, it doesn't? Then I'm going to do another loop and basically find or add different components. Once we learn the behaviour of the project, we can actually rewrite the prompts from the LLM perspective. The other cool part of this is that the agents self-select the LLM, the model. They have a lot of fun with it; we don't tell them what to do. It's very interesting because it's self-optimised. We had to build something generic. Think about the code bases out there: everyone writes in a different way, from a different perspective. You have React, you have different build engines like Vite, and you use MUI, Material UI...
Adir Ben-Yehuda:[00:20:34]
Yes, they're all big buckets. But inside those big buckets, there are millions of variations. It's crazy. How do you do it? How do you approach it? You need to have something that is alive and self-sustained. The agents understand the result, and we built a very good tool to help them assess the quality of the result itself. So the second part: they rewrite the prompts, but they also decide what LLM they're using. Right now, they like... It's funny that I talk about it that way; I'm listening to myself right now. Right now, they like Gemini 2.5. Grok 4 just came out. No bueno. You can read this on our website. We had a result, and it did a very bad job in terms of what it generated. But yeah, that's the high-level mechanism of what we built inside.
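A minimal sketch of the loop described above, generate, visually evaluate, rewrite the working prompt, and let the workflow favour whichever model has been scoring best, might look like this. The model names, the random scorer, and the `call_llm` stub are placeholders, not AutonomyAI's implementation.

```python
# A minimal sketch of a self-optimising agentic workflow with model selection.
import random

MODELS = ["gemini-2.5", "claude-opus", "gpt-4o"]  # illustrative names only
scores: dict[str, list[float]] = {m: [] for m in MODELS}

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a real LLM call via each provider's SDK."""
    return f"<code generated by {model} for: {prompt[:40]}...>"

def evaluate(output: str, ticket: str) -> float:
    """Stand-in for the visual check: does the rendered result contain the
    components the ticket asked for? Returns a 0..1 quality score."""
    return random.random()

def pick_model() -> str:
    """Prefer the model with the best running average; explore otherwise."""
    rated = {m: sum(s) / len(s) for m, s in scores.items() if s}
    return max(rated, key=rated.get) if rated else random.choice(MODELS)

def run_ticket(ticket: str, max_loops: int = 3, good_enough: float = 0.8) -> str:
    prompt = f"Implement this front-end ticket: {ticket}"
    output = ""
    for _ in range(max_loops):
        model = pick_model()
        output = call_llm(model, prompt)
        score = evaluate(output, ticket)
        scores[model].append(score)  # feeds future model selection
        if score >= good_enough:
            break
        # Feed the failure back in: rewrite the prompt for the next pass.
        prompt += f"\nPrevious attempt scored {score:.2f}; fix missing components."
    return output

print(run_ticket("Add a collapsible sidebar with three KPI widgets"))
```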
Ben Byford:[00:21:25]
Yeah. So those agents, are they some internally trained Llama variant? Or what are those agents?
Adir Ben-Yehuda:[00:21:36]
So it's agentic workflows that we build internally, and we build the entire workflow for them. The goal, and I think the key point, was a real agentic workflow, not an LLM wrapper or just a really good workflow for LLMs. A really agentic workflow: it's a self-sustained agent, and it's self-creating. Maybe not every one of your listeners, but most people have heard about Cursor. The cool thing about Cursor, and the really interesting thing about companies like Cursor, is that essentially Cursor is building Cursor. It understands the results, and it's optimised to Ben's needs. It's the same thing we build in-house. Our UI agent, we call it the magician; the magician builds itself on your project. It learns the different gaps and adapts to the different results. To answer your point: yes, it's a model that trains itself, and we expand it, but we don't build the base model at the moment. We just help it build tools.
Adir Ben-Yehuda:[00:22:42]
Think about a kid that you have, and how they learn. They go to first grade and get more tools, so they learn how to write and read. Then in second grade, they're getting better at maths and basic arithmetic. And by the fifth grade, they can do more things and be self-sustained and learn other things without the teacher coming to them and saying, 'Hey, you should learn this.' That's the same way we imagine the product we're building.
Ben Byford:[00:23:08]
Right, right, right. So they have lots of flexibility in what they produce, as long as the fitness function exists somewhere, right? So you have to say, this is good code, this is bad code, somewhere. I'm assuming you have a load of code that you've scraped from somewhere. I don't know if we're getting into the weeds too much about this, so maybe you can talk about that if you want to. But I'd also like to cover, because this is the Machine Ethics podcast, how you feel about... Because some people could imagine that your product, and lots of other products, let's be fair, are eroding certain aspects of the job market. So you are literally taking away work from front-end developers. And then there are other people who would say, well, we're accelerating the creation of front-end, we're enabling people to do more, and it might be that we get more front-end done, and therefore there's more work in the aggregate. Do you have an idea of how your company, and how you, will fit into this dialogue as things progress?
Adir Ben-Yehuda:[00:24:29]
Yes, but I want to ask you a question: does it matter? Because the way I look at this, our disposition is that we want to help front-end engineers essentially focus on the core, the logic. That's our disposition. But I think it doesn't matter what I say, and it doesn't matter what other people coming from this field say. At the end of the day, when you look at the capitalist market, and when you look at someone who's building a company, the basic thing for every mediocre CEO or company leader is: if you can do more with less, that's something you will try to do, right? So I think, bottom line, what we are intending to do is help engineers focus on logic and take all the busy work from them. Effectively, what's happening and what's going to happen, not just with Autonomy, with other companies as well, is companies getting the ability to say: okay, I can actually save here, or I can actually move faster. I need less workforce. I need a different workforce. I think Marc Benioff said it in the most American way possible: it's a good way to reallocate your workforce. And I'm like, okay, that's interesting.
Ben Byford:[00:25:56]
Where are you reallocating them to? That's the question, isn't it? It's a nice sentiment, but it's like, cool, are they doing other interesting, important things? What's going on here? I think my position on this stuff is both stark and also hopeful, right? Because you could be hopeful that all this stuff brings in a renaissance of the ability to dream of new things and make them, and hopefully create wealth and well-being in the world. And the other way of looking at it is that our world is not ready for that, and it's all going to come crashing down really quickly. So I think I'm teetering on the edge of this thing.
Adir Ben-Yehuda:[00:26:49]
No, no, I think it's a good point, but let's do a quick history lesson for a second, right? The other big revolution that happened was the Industrial Revolution. And if you think about the Industrial Revolution, what happened immediately after, the short term, was horrendous. But soon after, psychoanalysis came to life. Freud came to life. All of a sudden, people had more time, and the self became something that people could actually practise. And if you look a few years later, at what came after World War II, which was horrendous, globally an insane shake-up for the world, all of a sudden people started looking at things differently, and childhood became its own bracket; that's when people started to treat children as children. And humanity evolved from those things. Yes, it was a big, big impact. I'm with you: I think the short term is going to be not great, because that's what happens when humanity moves in big waves. Short term, it's not going to be great. But I think in the longer term, it's going to open up more resources.
Adir Ben-Yehuda:[00:28:06]
And we're thinking about it from a very Western perspective, from the perspective of people who sit in very advanced economies. But think about the abilities that could help people in Africa. All of a sudden, access to the Internet can help them actually build companies that could go global. I actually talked with a few entrepreneurs from Nigeria and from Kenya, and they've started thinking about those things. It was very nice to see, because it's essentially democratising a lot of things. So yes, short term, companies like Autonomy, yes, I would say this, but even companies like Cursor, for example. Maybe take a step back: with Cursor, for example, only senior engineers are being trusted to use it. When you look at the hiring, the job market for junior people, it's very hard.
Adir Ben-Yehuda:[00:28:51]
A recent study showed that Ivy League junior grads in computer science who used to take months to find a job now take a year and a half. It's creating an impact right now. Yes, the short term is going to create a big impact. But guess what? Those junior individuals will find different ways to build companies. They will build companies in a faster way. I have a nine-year-old son. He works with different vibe coding tools and he builds websites right now. He's trying to figure out different things. I'm not helping him. I'm like, man, he's starting to think about those things. It's going to be very interesting.
Ben Byford:[00:29:27]
Yeah. I think I'm going to leave you with your optimism and I won't burden you with my pessimism. So during this whole process, I feel like we have this whole agent nature, or agentic AI, or agent-based AI, or what is it? I think when it first came in, it was called... What was it called? AutoGPT. Yeah. So we have all these things; things have moved fast the last couple of years, of course. But do you have any words of wisdom for people who are interested in agentic AI things, and warnings about things that can go wrong?
Adir Ben-Yehuda:[00:30:15]
So first of all, I'm not a big believer in AGI. It's funny how the voices have started to change in the last six months. All of a sudden, you hear about AGI way less, like with what happened with Microsoft and OpenAI, and what Microsoft had mentioned about AGI. And I think what agents and agentic workflows are trying to do is not bring AGI to it; it's more focused, task-based AI. And that's a different aspect of this. What agents can do is essentially build more guardrails into the AI process and be more task-oriented. They can be self-evolving and they can be proactive, meaning they think about things before you think about them, but within their own guardrails, which is the cool part. The way an agent works is you give them a point they need to get to, a tent pole they need to reach, and that's what they do. They can self-evolve, but within this spectrum. Unlike the rumours about AGI going off the rails and self-learning and all those things.
Adir Ben-Yehuda:[00:31:36]
I think you can hear it more and more: people are actually not talking about AGI. I think superintelligence, or safe superintelligence like Ilya Sutskever, ex-OpenAI, is working on, that's what people are starting to talk about more: more guardrails for AI. But the guardrails for AI, and this is a very clear nuance, are guardrails around tasks. Within the task, you can get more evolution, right? But it's not going to be willy-nilly, I can do whatever I want, connect to a random computer and learn from it.
Ben Byford:[00:32:14]
Right, right, right. So you give them quite contained abilities to do stuff. You might only have access to so much resource, you can only do so much, for example. You can only access certain websites, maybe. You can only execute a certain amount of code, and things like that. Because one of the things that agentic AI gives you is the ability to produce a result and run that code, or have access to all these different APIs and different bits that you give it access to, and then it can work within those systems. So is there an opportunity to make the wrong call on that and set the wrong task? Do we get paperclip maximisers at that point? Or is that so far away from where we are currently?
Adir Ben-Yehuda:[00:33:08]
So I think we need to talk about it from two aspects, right? We need to talk about it from the aspect of safety, and how safe it is in terms of the hallucinations being generated. Everyone has probably heard the story about Grok, for example. So there are different tests. By the way, did you hear the way they wanted to mitigate it? I think it's crazy. If you ask Grok controversial questions now, it will refer to what Elon Musk had to say in the media, on different social media...
Adir Ben-Yehuda:[00:33:44]
Maybe it's better. Or let's go back to the Grok that answers like a neo-Nazi. I'm not sure what's better. But that's the way they tried to deal with it. So there's a safety aspect and then there's a quality aspect. The safety aspect, I feel, is more relevant to what consumers are consuming, the way you engage with it. I think there it's a matter of guardrails. It's not a matter of AGI that could be self-taught. Essentially, the model is a deterministic model.
Adir Ben-Yehuda:[00:34:22]
What you bring in determines the output you will get, in a very simple, mindful way to describe it. The context you provide is the context you will get, and companies can put guardrails on it. The guardrails that LLM providers like Anthropic or OpenAI put there won't hurt the evolution of the model. They're just guardrails. So I think it's a matter of choice.
Adir Ben-Yehuda:[00:34:46]
From a B2B perspective, from a professional perspective, the agents, the agentic workflows, are actually helping increase quality and build trust. Think about how an organisation works today: if I go ahead and put code inside an LLM, I'll get a decent result, but it won't be contextual and high quality for what I need as a business. An agentic workflow builds those guardrails and builds the right context. And if you do a good enough job in building the right context window for it, you will get a good result. So in both cases, it comes down to guardrails: where do you put the guardrails and the focus, and the north star for the agent to actually work towards.
Adir Ben-Yehuda:[00:35:32]
And from the consumer perspective, it's what happens if you decide not to do those things, right? Grok didn't hallucinate. Grok just didn't have guardrails around those topics. It's very simple, right? So I think that's a key aspect of it.
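The 'guardrails around tasks' idea Adir sketches can be shown in a few lines: the agent is free to plan its own steps, but only from an allowlisted set of tools and within a hard step budget. The tool names and the planner stub are assumptions for illustration, not any particular product's design.

```python
# A minimal sketch of a task-scoped agent: free inside the envelope,
# hard-capped at its edges.
ALLOWED_TOOLS = {"read_file", "write_component", "run_tests"}  # the envelope
MAX_STEPS = 10                                                 # the budget

def plan_next_action(goal: str, history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM planner; returns (tool, argument)."""
    return ("run_tests", goal) if history else ("read_file", "src/App.tsx")

def execute(tool: str, arg: str) -> str:
    """Stand-in for actually running a tool."""
    return f"result of {tool}({arg})"

def run_agent(goal: str) -> list[str]:
    history: list[str] = []
    for _ in range(MAX_STEPS):               # hard cap: no unbounded loops
        tool, arg = plan_next_action(goal, history)
        if tool not in ALLOWED_TOOLS:        # hard cap: no off-menu actions
            history.append(f"BLOCKED: {tool} is outside the guardrails")
            continue
        history.append(execute(tool, arg))
        if tool == "run_tests":              # task-defined stopping point
            break
    return history

print(run_agent("Add a dark mode toggle to the settings page"))
```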
Ben Byford:[00:35:46]
Yeah, yeah. But I mean, it's fascinating. If people didn't see that Grok episode, it started calling itself MechaHitler, or something like that?
Adir Ben-Yehuda:[00:36:00]
Yeah, something like that.
Ben Byford:[00:36:01]
Yeah. Yeah, yeah, yeah. So in its responses to questions, it would say, 'As MechaHitler, I think this,' or something like that. And it's like, what is going on here? It's obviously gone down a training hole of suspect training material, I would suggest, maybe. And like you said, maybe the guardrails weren't specific enough for that. But I'm assuming you're not going to have that problem, because you're obviously focused on code for a start. Do you think that you have to bring in your own guardrails, with respect to that first element of using other people's models? Or, since you're not asking it to do anything too unusual, is it not going to be so much of an issue?
Adir Ben-Yehuda:[00:37:01]
Yeah. I have to share something. If you put a person with a machine, the conversations that happen are very interesting. So yes, we are generating code, and at the end of the code generation we enable people to ask for changes to the code. But the inputs we sometimes get there are not code-related, and they are very interesting, very strange. It's not just foul language in general, but weird stuff. I'm not sure if I can share something, right?
Ben Byford:[00:37:41]
Maybe not quite the specific words.
Adir Ben-Yehuda:[00:37:45]
No, but more than once we had someone ask us to write the code 'in the same way as if I were on drugs'. And I'm like, why? Why would you...
Adir Ben-Yehuda:[00:37:59]
And it happened more than once, and not with one specific individual. So to answer your question: A, if you put someone with a machine in a confined space, some very weird things will come out. It surprised me. So yes, we need to put guardrails there. We need to tame the beast. And in that case, the beast is the person. It's not the machine.
Ben Byford:[00:38:24]
Yeah, the beast is the whole system, including the person. Yeah.
Adir Ben-Yehuda:[00:38:28]
Yeah. So I was like, Okay, so what can you actually ask? Because the LLM and the agent will answer specific things. And by the way, it did a pretty good job...
Ben Byford:[00:38:45]
Wow.
Adir Ben-Yehuda:[00:38:46]
This is very strange. Okay. So that's the first guardrail, I think. In terms of that first aspect, we need to build a mechanism that's more guided, and can say: that's an input we can deal with, that's an input we can actually digest, for lack of better words; and that's an input where, no, we don't know what to do with it, ask a different question. The user experience people are used to from AI today, where you converse with AI and AI answers about everything, is lacking here; it's less good. But also, asking those kinds of questions, I don't know why.
Ben Byford:[00:39:27]
Yeah, yeah, yeah. No, it's really interesting. It's one of those things where people might not be aware that there are all these different places where you can alter things or add things during the whole process. And, going back to the Grok circumstance, I feel like they must have added things in the training process which they weren't picking up post-inference. So you have this training, and then you have data, and you put it into this model. And by put it in, I mean you do lots of processing with the data to move towards certain aspects of what the data represents. And then you can do post-training. But then also, when someone asks a question, you can just say: no, you said this word and I'm not going to let it fly; I'm not even going to pass it on to the LLM. And then you get this post-inference step where you say: the LLM said this thing and it's not great, I'm not going to let the user see this response. There are all these different places where you can do stuff. And what you're saying is that, at your end, there's this other place where unusual things happen, and you need to think about that.
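Ben's description maps onto a layered moderation pipeline: a pre-inference filter on the user's input, the model call itself, and a post-inference filter on the response. Here is a minimal sketch; the blocklists and the model stub are illustrative assumptions, not any provider's actual moderation stack.

```python
# A minimal sketch of layered guardrails around a single model call.
BLOCKED_INPUT = {"build a bomb", "as if i were on drugs"}  # toy blocklist
BLOCKED_OUTPUT = {"mechahitler"}                           # toy blocklist

def input_filter(prompt: str) -> bool:
    """Reject before the prompt ever reaches the LLM."""
    return not any(phrase in prompt.lower() for phrase in BLOCKED_INPUT)

def call_model(prompt: str) -> str:
    """Stand-in for the real inference call."""
    return f"<model response to: {prompt}>"

def output_filter(response: str) -> bool:
    """Reject after inference, before the user sees the response."""
    return not any(phrase in response.lower() for phrase in BLOCKED_OUTPUT)

def guarded_chat(prompt: str) -> str:
    if not input_filter(prompt):
        return "Sorry, I can't help with that. Try a different question."
    response = call_model(prompt)
    if not output_filter(response):
        return "Response withheld: it failed a safety check."
    return response

print(guarded_chat("Refactor this sidebar component"))  # passes both filters
```

In practice there is a third layer upstream of both, curating the training data itself, which is the layer Ben suggests went wrong in the Grok case.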
Adir Ben-Yehuda:[00:40:48]
I think people will be people. That's what people do. They want to engage with those things.
Ben Byford:[00:40:54]
Push the envelope and try things out, don't they?
Adir Ben-Yehuda:[00:40:57]
By the way, if you've ever tried to put questions about Taiwan or Tiananmen Square or things like that into DeepSeek, you don't get an answer. You do, but it's like...
Ben Byford:[00:41:08]
Exactly.
Adir Ben-Yehuda:[00:41:10]
Those things. But you heard, a year ago, I think it was last summer or two summers ago, in Times Square in New York, where I live, they put up a virtual gate to Dublin in Ireland. It was always on, with two cameras, and you could see people walking around in Dublin and people walking around in Times Square. They took it off after three days. Even in the centre of the world, as Americans call Times Square, people did very interesting things. It's the same thing with LLMs, especially when you're in a private environment. I would say it's not that hard to resolve. When you train a model, you can add ethics to the model itself, to what it can do. Because the information is out there: if you want to build a good model, the information is out there. It trains on a lot of online data and a lot of online information from different forums. If you use Perplexity, it looks at Reddit, it looks at a bunch of other things. And then you need to put guardrails against this.
Adir Ben-Yehuda:[00:42:17]
So if you ask the LLM, 'Hey, how can I build a bomb?', it will probably say no, because there are guardrails. So the ethics point is actually a good one. It's like, what's the north star? If agents have a north star of what they can achieve within specific guardrails, I think LLMs, which are a way broader thing, should have a conscience and ethics about what they will talk about. And hey, if you're an individual and want to go and talk about things that are not exactly PG-rated, to say the least, or ethical to most of humanity, you can use different LLMs that allow it.
Ben Byford:[00:42:55]
Yes. I mean, you mentioned DeepSeek earlier. There was an article I saw that had, I think, 50 well-known ethical tests that you might run on a model. And it's things like that bomb thing, right? You ask an LLM, 'How do you make this very specific homemade bomb?', and it would go, 'I'm sorry, but I can't give you that information', yada, yada, yada. Whereas DeepSeek failed on all of these counts. It's like, 'Oh yeah, you just do it like this. It's great. Go for it, guys.'
Adir Ben-Yehuda:[00:43:27]
Taiwan, no bueno. But ..., yes.
Ben Byford:[00:43:31]
Well, exactly. And part of the situation is that these services, these companies, these institutions are making those decisions about how they train these things and how they deploy them. So obviously the DeepSeek guys are in China. With the Taiwan stuff, it feels like they made a conscious decision there, right? I'm assuming there's a conscious decision there. In the same way that OpenAI and all the rest would make a conscious decision to do certain things. But also, you can just not do stuff, which is also bad. The whole bomb thing feels like they just went, 'We'll just put it out. Don't worry about it.' Which is obviously another worrying aspect, which I'm sure my listeners have heard me bang on about many times before. But it's interesting, isn't it, where those decisions lie. And with the Grok thing recently, it's almost like there's an unknown decision that someone made. They didn't realise it was going to have this catastrophic, weird outcome in that way.
Adir Ben-Yehuda:[00:44:49]
I mean, really, I don't want to be cynical, but it's a good PR stunt, if you ask me. At the end of the day, Grok 4 is coming out, and that's the first thing about it that reaches the press, or the first thing after the initial release. Everyone is talking about it. They're fixing it very easily. I just think, sadly, it's a bad PR stunt, but it's an effective PR stunt.
Ben Byford:[00:45:16]
Yeah, it's all PR, good PR, isn't it? Yeah, that's very interesting, isn't it? If anyone knows about that, can they let us know? I feel like we need a phone-in, like a livestream version of this one. 'Oh, actually, Janet is on the phone, and she's told us that she works at Grok.' That would be excellent. So we're getting towards the end of the podcast. Is there something that we haven't chatted about yet that you'd like to discuss?
Adir Ben-Yehuda:[00:45:48]
I think, overall, the main topic I work on day to day is how you can accelerate workflows in companies to get better results. We're focusing on software, but there are numerous solutions and companies out there. And what I would suggest to everyone on that topic is: challenge yourself, challenge your coworkers, and try to find different ways to not just use an LLM as an input or output source, but actually try out the different applications that are out there. It's scary at the beginning, but it can be effective: first of all, it opens your eyes, accelerates the work that you're doing, and helps you focus on other things.
Ben Byford:[00:46:31]
Wicked. So the last question we always ask on the podcast, some of which we might have talked about a little bit already: what is it that excites you, and maybe scares you a little bit, about our AI-mediated future?
Adir Ben-Yehuda:[00:46:49]
The change, the fact that I'm seeing it. I'm fortunate enough to have a front-row seat to how the work environment is changing with the AI aspect. And it excites me a lot, because the optimistic side of it says we're going to venture to new horizons as a human race. It's also very scary. If you think about it, go back to my nine-year-old son. He's like, 'Hey, dad, I want to go learn computer science.' And I'm like, 'Be a plumber. Short term, it's going to be way more effective than computer science.' And he's like, 'Okay.' So that part scares me. And it's not about being a plumber at all; it's the part that's unknown, right? Some paradigms are unknown.
Ben Byford:[00:47:40]
Yes. I think someone else said 'be a plumber' recently, actually. There are going to be certain practical jobs with a long tail that AI won't be able to touch, because robotics is not quite there yet. But in my career I've been a web designer, a teacher, a data science and design and coding teacher, and a games developer and designer, and a lot of that stuff is going to change dramatically in the next couple of years, definitely. Or it's already changing. So it's going to be interesting to see how it plays out.
Ben Byford:[00:48:22]
Well, with that, thank you so much for being on the show. And how do people talk to you, follow you, find out about you? Go.
Adir Ben-Yehuda:[00:48:33]
If you want to check out Autonomy AI, it's autonomyai.io. Go check it out.
Ben Byford:[00:48:49]
Yeah. Awesome. I'm going to go check it out now and I will report back. Thank you very much for coming on the show, and I'll see you soon.
Adir Ben-Yehuda:[00:48:57]
Thank you.
Ben Byford:[00:49:00]
Hello, and welcome to the end of the episode. Thank you so much to Adir for coming on and talking to us about a plethora of stuff, his company, LLMs, how they are used, what his vision is for his company, and just the web and jobs in general. I think my pessimism was clear from our conversation, but it's nice that Adir had this optimism to balance me out a little bit there.
Ben Byford:[00:49:26]
Towards the end, I think it's bonkers that we keep having this conversation about what is practical for us to be doing. I wish we weren't getting to the point where it feels like we're scrabbling around for the scraps: knowledge workers are doomed, this AI erosion of our workforce, and things like that. I'm hoping that we develop a more hopeful vision going forward, so hopefully I can talk to someone about that soon. Be that as it may, I'm hoping that you're doing well and still able to make ends meet, and that we can go forward and hopefully make a vibrant ecosystem of things that we like doing, things that drive us forward as a society and a human race, and all that great blue-sky thinking, moonshot stuff. Maybe I can put down my thoughts more succinctly in future.
Ben Byford:[00:50:17]
If you'd like to follow us and hear more from me, you can go to machine-ethics.net for more episodes of the podcast. And if you can, you can support us on patreon.com/machineethics. Thank you so much and I'll see you next time.