
My thoughts on the Issue of English-centric AI

Viktoriya Tigipko
Last updated: August 27, 2024 12:24 pm
Viktoriya Tigipko is one of the most recognized names in the Eastern European VC community and is a native of Ukraine.
She has run TA Ventures, a pre-seed and seed-stage VC, since 2010. Additionally, she founded iClub (an angel network) and WTech (a community for women in tech), and is Chair of the Board at the Ukrainian Startup Fund.
Guest Author: Viktoriya Tigipko

Contents
  • Most of the information that AI is being trained on is in English
  • Products based on these AI models will also be more English-focused
  • The problem is that the business case is weak
  • So what can be done?
  • Wrapping up

I recently read a terrific article by fellow Ukrainian Artur Kulian and wanted to add my two cents. The article is “Why is AI English-centric, and why is it a very big problem?”

The gist of the article is that the English-centric nature of AI engines could have very harmful impacts on culture.

By ‘AI engine’ we are referring to things like ChatGPT (OpenAI), Llama (Meta), Gemini (Google), and Claude (Anthropic).

And after I read it I was thinking to myself… “Wow! He is right! As time goes on this is going to be more and more problematic!”

Most of the information that AI is being trained on is in English

You see, AI is trained with loads and loads of information. And as Artur mentions, most of that information is written in English.

As of August 2024, ~50% of all websites are written in English, with Spanish in a distant second place at just 5.9%.

This means that AI will be a lot dumber on any topic where the source material is not written in English. And the reality is that many cultures around the world have a lot of accumulated information and knowledge that is not written in English.

Beyond being ‘dumber’, in many cases the AI will simply spit out incorrect information. So people using these models in these other languages will be at a pretty big disadvantage.

Products based on these AI models will also be more English-focused

There are startups all over the place that are building products based on these models. Tools for both consumers and businesses, such as SaaS.

Just have a look at how many startups in the latest batches of Y Combinator are AI-driven companies.

If a model performs much better in English, then of course the companies that are built on top of it will have a major advantage if they are focused on the English-speaking market.

And since most of the top startups these days are leveraging AI, this brings about some important questions to ponder.

For example, does this mean that tech startups focusing on the English-speaking market will have a major advantage for the foreseeable future?

And if so, does that mean we can expect a higher and higher percentage of the most innovative companies being produced in markets that focus on English speakers?

The problem is that the business case is weak

One thing you might be thinking to yourself is… “well, if the models are fed lots of information in English, then can’t you just translate all of that into every other language? And therefore the information available in those other languages would be equivalent.”

And you’d be right. You could hypothetically do this.

But there are a few problems with that.

First, you’d need to worry about the quality of the translation. Machine translation can often lose nuances, idioms and cultural references.

Second, computational cost. Translating a massive dataset is very resource-intensive and often the business case just isn’t there.

Plus there are a number of other barriers. So the reality is that many of the large English-language datasets that are fed into these models are simply not translated into many of the other languages.

So what can be done?

Well, there are probably some folks out there with a lot more expertise in this area than me, but let me have a stab at some of the things that I think would make sense from my perspective.

First, collecting more large, high-quality datasets in other languages. For example, I am from Ukraine.

What can be done to ensure that some of the most valuable datasets that are only written in Ukrainian are included in the popular models?

And how do we then ensure that there is a business case around that? Because when things are ‘for profit’ they tend to happen faster and at a larger scale.

Second, I think open source is key. Llama is open source, whereas engines like ChatGPT and Gemini are not.

The teams behind these open source models should cooperate with contributors around the world who have access to these localized datasets so that the datasets can be incorporated.

I think of this a bit like what Wikipedia has achieved. Wikipedia is in 300+ languages and in many of those languages it is very comprehensive.

How did it achieve this?

Simple. By incorporating lots of contributors. And by ‘lots’ I mean that there are currently over 47 million Wikipedia accounts, of which ~113k have made a contribution in the past month.
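
To put those two numbers side by side, here is a quick back-of-the-envelope calculation. It is only a sketch that uses the rounded figures quoted above (the variable names are my own), so treat the result as approximate.

```python
# Rounded figures quoted in the article (approximate).
total_accounts = 47_000_000   # "over 47 million Wikipedia accounts"
active_last_month = 113_000   # "~113k" contributed in the past month

# Share of all accounts that were active in the past month.
active_share = active_last_month / total_accounts
print(f"Active share of accounts: {active_share:.2%}")  # prints 0.24%
```

In other words, roughly a quarter of one percent of accounts were active in a given month, which suggests that a relatively small core of regular contributors can sustain a very large multilingual resource.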

Wrapping up

What I do not want to see is certain countries beginning to lag behind because of this language disadvantage. Because I can totally see how this problem will compound over time.

Rather I’d love to see more action be taken now to ensure there is a level playing field.

At TAV we invest in startups accelerated by AI. I hesitate to say ‘AI startups’ because at this point I think pretty much all startups can and should be accelerated by AI. And if they aren’t, then they are at high risk of being out of business in the future.

We are particularly interested in verticals that can be accelerated by AI. Things like healthcare & biotechnology, autonomous systems and robotics, natural language processing and conversational AI, and fintech (fraud detection, trading, etc.).

The key elements within these startups that we look for are things like solid adoption, data availability, and potential to disrupt the market.

It is also important, in my view, that these startups keep a focused eye on some of the challenges that AI faces. Things like data privacy, security and ethical issues.

For example we like to see that companies, even in the early stages, have and adhere to an ethical code of doing business.

If you’re interested in TA Ventures then please don’t hesitate to visit our website and reach out to the person that you feel is most appropriate to your inquiry: https://taventures.vc/team/.
