© 2023 Techzi . All Rights Reserved.

My thoughts on the Issue of English-centric AI

Viktoriya Tigipko
Last updated: August 27, 2024 12:24 pm
Viktoriya Tigipko is one of the most recognized names in the Eastern European VC community and is a native of Ukraine.
She has run TA Ventures, a pre-seed and seed stage VC, since 2010. She also founded iClub (an angel network) and WTech (a community for women in tech), and is Chair of the Board at the Ukrainian Startup Fund.
Guest Author: Viktoriya Tigipko

Contents
  • Most of the information that AI is being trained on is in English
  • Products based on these AI models will also be more English-focused
  • The problem is that the business case is weak
  • So what can be done?
  • Wrapping up

I recently read a terrific article by fellow Ukrainian Artur Kulian and wanted to add my two cents. The article is “Why is AI English-centric, and why is it a very big problem?”

The gist of the article is that the English-centric nature of AI engines could have very harmful impacts on culture.

By ‘AI engine’ we are referring to things like ChatGPT (OpenAI), Llama (Meta), Gemini (Google), and Claude (Anthropic).

After reading it, I thought to myself: “Wow! He is right! As time goes on, this is going to become more and more problematic!”

Most of the information that AI is being trained on is in English

You see, AI is trained on loads and loads of information. And as Artur mentions, most of that information is written in English.

As of August 2024, roughly 50% of all websites are written in English, with Spanish a distant second at just 5.9%.

This means that AI will be a lot dumber on any topic that is mostly written about in languages other than English. And the reality is that many cultures around the world have accumulated a great deal of information and knowledge that is not written in English.

Beyond being ‘dumber’, in many cases the AI will simply spit out incorrect information. So people using these models in other languages will be at a pretty big disadvantage.

Products based on these AI models will also be more English-focused

Startups all over the world are building products on top of these models, including tools for consumers as well as business tools such as SaaS.

Just look at how many startups in the latest Y Combinator batches are AI-driven companies.

If a model performs much better in English, then of course the companies built on top of it will have a major advantage if they focus on the English-speaking market.

And since most of the top startups these days are leveraging AI, this raises some important questions.

For example, does this mean that tech startups focusing on the English-speaking market will have a major advantage for the foreseeable future?

And if so, does that mean we can expect an ever-higher share of the most innovative companies to come from markets that focus on English speakers?

The problem is that the business case is weak

One thing you might be thinking is: “Well, if the models are fed lots of information in English, then can’t you just translate all of it into every other language? Then the information available in those other languages would be equivalent.”

And you’d be right. You could hypothetically do this.

But there are a few problems with that.

First, you’d need to worry about the quality of the translation. Machine translation often loses nuances, idioms, and cultural references.

Second, there is the computational cost. Translating a massive dataset is very resource-intensive, and often the business case just isn’t there.

Plus, there are a number of other barriers. So in reality, many of the large English datasets these models are trained on are simply never translated into other languages.

So what can be done?

Well, there are probably folks out there with far more expertise in this area than me, but let me take a stab at some of the things that I think would make sense from my perspective.

First, we need to collect more large, high-quality datasets in other languages. For example, I am from Ukraine.

What can be done to ensure that some of the most valuable datasets that exist only in Ukrainian are included in the popular models?

And how do we then ensure that there is a business case for it? When things are ‘for profit’, they tend to happen faster and at a larger scale.

Second, I think open source is key. Llama is open source, whereas engines like ChatGPT and Gemini are not.

The teams behind these open source models should work with contributors around the world who have access to localized datasets so that those datasets get incorporated.

I think of this a bit like what Wikipedia has achieved. Wikipedia exists in 300+ languages, and in many of them it is very comprehensive.

How did it achieve this?

Simple. By incorporating lots of contributors. And by ‘lots’ I mean that there are currently over 47 million Wikipedia accounts, of which roughly 113k have made a contribution in the past month.

Wrapping up

What I do not want to see is certain countries falling behind because of this language disadvantage, because I can totally see how the problem will compound over time.

Rather, I’d love to see more action taken now to ensure a level playing field.

At TAV we invest in startups accelerated by AI. I hesitate to say ‘AI startups’ because at this point I think pretty much all startups can and should be accelerated by AI. And if they aren’t, they are at high risk of going out of business in the future.

We are particularly interested in verticals that can be accelerated by AI, such as healthcare and biotechnology, autonomous systems and robotics, natural language processing and conversational AI, and fintech (fraud detection, trading, etc.).

The key things we look for in these startups include solid adoption, data availability, and the potential to disrupt the market.

It is also important, in my view, that these startups keep a close eye on the challenges AI faces, such as data privacy, security, and ethical issues.

For example, we like to see that companies, even in the early stages, have and adhere to an ethical code of doing business.

If you’re interested in TA Ventures, please don’t hesitate to visit our website and reach out to the person most appropriate for your inquiry: https://taventures.vc/team/.
