Why Search Matters
The MYETV Search Engine is already one of the most sophisticated search engines on the web. Search is the heartbeat of any content platform. It’s the invisible bridge between what you’re thinking and what you’re trying to find. When you type “the fiuture of ai” into a search box, you shouldn’t get zero results just because of a typo. You should get exactly what you meant: “the future of AI.”
Today, we’re excited to share a major milestone in MYETV’s journey: a complete overhaul of our search engine powered by artificial intelligence, multilingual dictionaries, and advanced natural language processing. This isn’t just a technical upgrade, it’s a fundamental shift in how MYETV understands you.
Let’s dive into what we built, why it matters, and how it works behind the scenes.
The Problem: Traditional Search Is Too Rigid
Traditional search engines are unforgiving. They match words exactly as you type them. If you search for “miusic video,” most systems will either return nothing or show results for something completely unrelated. They don’t understand language, they just match patterns.
This becomes even more problematic on a multilingual, global platform like MYETV, where users speak English, Italian, Spanish, French, German, Portuguese, and many other languages. A user in Rome might search for “tutorail vidio” in English, while another in Madrid types “musica” without the accent instead of “música”. Traditional search engines struggle with these real-world scenarios.
We asked ourselves: What if our search engine could think like a human? What if it could correct your spelling, understand context, detect language automatically, and even suggest content based on semantic meaning rather than exact word matches?
That’s exactly what we set out to build.
The Solution: A Three-Layer AI-Powered Search System
Our new search engine combines three powerful technologies working in harmony:
1. Multilingual Dictionary Correction
2. BERT Semantic Understanding
3. Content-Aware Title Matching
Let’s break down each layer.

Layer 1: Multilingual Dictionary Correction
The Foundation: 50,000+ Words Across 20 Languages
At the core of our new search engine is a massive multilingual dictionary containing over 50,000 carefully curated words across 20 languages. These aren’t just random word lists—they’re frequency-weighted vocabularies that represent how people actually communicate online.

Here’s what we included:
- English: 10,000 most common words
- Italian: 10,000 most common words
- Spanish: 10,000 most common words
- French: 10,000 most common words
- German: 10,000 most common words
- Portuguese: 10,000 most common words
- Plus 14 additional languages: Dutch, Russian, Polish, Turkish, Arabic, Chinese (Simplified), Japanese, Korean, Hindi, Swedish, Norwegian, Danish, Finnish, and Greek
Each word is indexed with its frequency of use, meaning the system prioritizes common words over obscure ones when suggesting corrections.
Where Did This Data Come From?
We sourced our word lists from several open-source linguistic databases:
- Wiktionary frequency lists: Community-maintained lists of the most commonly used words in each language
- OpenSubtitles corpus: Real-world language usage from millions of movie and TV show subtitles
- Google Books Ngram data: Historical language patterns from billions of published books
- Universal Dependencies treebanks: Linguistically annotated text corpora
This combination gives us a realistic picture of how people actually write and search, not just formal dictionary definitions.
How Dictionary Correction Works
When you type a search query, our system immediately checks each word against our multilingual dictionary using a technique called Levenshtein distance, a mathematical way to measure how many single-character edits (insertions, deletions, substitutions) are needed to change one word into another.
For example:
- “fiuture” → “future” (distance: 1, one substitution)
- “tutorail” → “tutorial” (distance: 1, one substitution)
- “vidio” → “video” (distance: 1, one substitution)
The system calculates these distances in real time, searches our dictionary for words within a distance of 2, ranks them by frequency, and suggests the most likely correction.
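To make that concrete, here’s a minimal PHP sketch of the correction step. It’s an illustration, not our production code: the $dictionary array and its frequencies are made up, and PHP’s built-in levenshtein() compares bytes, so accented characters need extra handling in practice.

```php
<?php
// Minimal sketch of the Layer 1 correction step described above.
// Not MYETV production code: $dictionary is an illustrative
// ['word' => frequency] map.

function suggestCorrection(string $word, array $dictionary, int $maxDistance = 2): ?string
{
    $word = mb_strtolower($word);
    if (isset($dictionary[$word])) {
        return $word; // already a valid word, nothing to correct
    }

    $best = null;
    $bestDistance = PHP_INT_MAX;
    $bestFrequency = -1;

    foreach ($dictionary as $candidate => $frequency) {
        // Pre-filter by length (±2 characters) before the expensive distance call
        if (abs(strlen($candidate) - strlen($word)) > $maxDistance) {
            continue;
        }

        $distance = levenshtein($word, $candidate);
        if ($distance > $maxDistance) {
            continue;
        }

        // Rank by smallest distance first, then by highest frequency
        if ($distance < $bestDistance
            || ($distance === $bestDistance && $frequency > $bestFrequency)) {
            $best = $candidate;
            $bestDistance = $distance;
            $bestFrequency = $frequency;
        }
    }

    return $best; // null when nothing is within the allowed distance
}

// Example from this post: "fiuture" is one substitution away from "future"
$dictionary = ['the' => 10000, 'of' => 9800, 'music' => 920, 'future' => 850];
echo suggestCorrection('fiuture', $dictionary); // prints "future"
```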
But here’s the clever part: The system automatically detects which language you’re using by checking which dictionary has the most matches for your words. If you search “la musica italiana” (Italian), it won’t try to correct it to English; it recognizes Italian and validates against the Italian dictionary instead.
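The detection step itself boils down to counting hits per language. A hedged sketch, again with toy data standing in for the real per-language dictionaries:

```php
<?php
// Hedged sketch of the language-detection idea: count how many query
// words each per-language dictionary recognizes and pick the winner.
// $dictionaries uses toy data; the real system reads the dictionary tables.

function detectLanguage(array $queryWords, array $dictionaries): ?string
{
    $bestLang = null;
    $bestHits = 0;

    foreach ($dictionaries as $lang => $words) {
        $hits = 0;
        foreach ($queryWords as $word) {
            if (isset($words[mb_strtolower($word)])) {
                $hits++;
            }
        }
        if ($hits > $bestHits) {
            $bestHits = $hits;
            $bestLang = $lang;
        }
    }

    return $bestLang; // null when no dictionary recognizes any word
}

// "la musica italiana" matches more Italian entries than English ones
$dictionaries = [
    'en' => ['the' => 1, 'music' => 1, 'of' => 1],
    'it' => ['la' => 1, 'musica' => 1, 'italiana' => 1],
];
echo detectLanguage(['la', 'musica', 'italiana'], $dictionaries); // prints "it"
```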
Layer 2: BERT Semantic Understanding
What Is BERT?
BERT stands for Bidirectional Encoder Representations from Transformers. It’s a deep learning model developed by Google that revolutionized natural language processing in 2018. Unlike traditional search engines that match keywords, BERT actually understands the meaning of words in context.
Think of it this way: If you search for “apple,” a traditional search engine doesn’t know if you mean the fruit or the tech company. BERT understands context. If your previous searches were about “iPhone” and “MacBook,” BERT knows you probably mean Apple Inc.
The Transformers Library
We’re using the Transformers library by Hugging Face, which provides pre-trained AI models that can understand and generate human language. Specifically, we’re using:
- Model: bert-base-uncased (a compact but powerful English BERT model)
- Task: Fill-mask predictions (understanding context to suggest related words)
- Purpose: Semantic suggestions beyond spelling correction
How BERT Enhances Search
When you search for something and we don’t find exact matches, BERT steps in to suggest semantically related terms from our content database.
For example:
- You search: “electronic beats”
- BERT suggests: “music,” “techno,” “edm,” “electronica”
- You search: “cooking show”
- BERT suggests: “recipe,” “chef,” “culinary,” “kitchen”
This is possible because BERT has been trained on billions of words and understands relationships between concepts. It doesn’t just match letters, it matches meaning.
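Under the hood this is a fill-mask call. Here’s a rough sketch of what that looks like with the transformers-php library; the model identifier (Xenova/bert-base-uncased) and the exact keys of each prediction are assumptions based on the library’s documentation, so verify them before reusing this snippet.

```php
<?php
// Rough sketch of a fill-mask call through transformers-php
// (https://github.com/CodeWithKyrian/transformers-php).

require 'vendor/autoload.php';

use function Codewithkyrian\Transformers\Pipelines\pipeline;

// Load a fill-mask pipeline backed by a BERT base uncased checkpoint.
$unmasker = pipeline('fill-mask', 'Xenova/bert-base-uncased');

// Ask BERT which words fit the masked slot built from the user's query.
$predictions = $unmasker('I want to listen to some electronic [MASK].');

foreach ($predictions as $prediction) {
    // Each prediction typically carries the suggested token and a score.
    echo $prediction['token_str'] . ' (' . round($prediction['score'], 3) . ")\n";
}
```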
Layer 3: Content-Aware Title Matching
Searching Inside Titles, Not Just Tags
The third layer of our search engine looks directly into MYETV’s content library, specifically the titles and descriptions of content. This is where things get really smart.
When you type a multi-word query like “the future of music,” our system:
- Splits your query into individual words: [“the”, “future”, “of”, “music”]
- Searches our database for titles containing similar words
- Calculates a similarity score for each title based on how many of your words match (even with typos)
- Ranks and suggests the best-matching titles
For example, if we have a video titled “The Future of Electronic Music in 2026,” and you search “the fiuture of miusic,” the system will:
- Correct “fiuture” → “future”
- Correct “miusic” → “music”
- Match “the” and “of” exactly
- Recognize the title as a 100% match and suggest it
This layer is particularly powerful for long-tail searches—specific, multi-word queries that traditional keyword matching often misses.
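Here’s a simplified sketch of that scoring idea in PHP. It scores in memory against plain strings, whereas the real system queries the database, so treat the function and its threshold as illustrative.

```php
<?php
// Illustrative sketch of the Layer 3 scoring: the share of query words
// that appear in a title, allowing a small Levenshtein distance per word.

function titleMatchScore(string $query, string $title, int $maxDistance = 2): float
{
    $queryWords = preg_split('/\s+/', mb_strtolower(trim($query)));
    $titleWords = preg_split('/\s+/', mb_strtolower(trim($title)));

    $matched = 0;
    foreach ($queryWords as $qWord) {
        foreach ($titleWords as $tWord) {
            if ($qWord === $tWord || levenshtein($qWord, $tWord) <= $maxDistance) {
                $matched++;
                break; // count each query word at most once
            }
        }
    }

    return $matched / max(1, count($queryWords)); // 1.0 means every word matched
}

$score = titleMatchScore('the fiuture of miusic', 'The Future of Electronic Music in 2026');
echo round($score * 100) . '%'; // 100% - all four query words matched, two of them fuzzily
```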
Putting It All Together: The Search Experience
Let’s walk through a real example to see how all three layers work together.
Scenario: You search for “the fiuture of miusic”
Step 1: Dictionary Correction (Layer 1)
The system detects:
- “the” → valid English word ✓
- “fiuture” → invalid word, distance 1 from “future” → corrected
- “of” → valid English word ✓
- “miusic” → invalid word, distance 1 from “music” → corrected
Result: Your query is corrected to “the future of music”
Step 2: Semantic Understanding (Layer 2)
BERT analyzes “the future of music” and identifies related concepts in our content database:
- “music” (exact match)
- “electronic” (semantically related)
- “beats” (semantically related)
- “sound” (semantically related)
Result: Semantic keywords are generated for broader matching
Step 3: Content Matching (Layer 3)
The system searches our video titles and finds:
- “The Future of Electronic Music in 2026” (98% match)
- “Music Evolution: What’s Next?” (85% match)
- “The Future of Sound Technology” (80% match)
Result: These titles are suggested alongside corrected keywords
What You See:
🤖 AI Suggestions:
[the future of music] [music] [electronic]
🎬 Title Suggestions:
[The Future of Electronic Music in 2026]
[Music Evolution: What's Next?]
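Conceptually, the three layers chain together roughly like this. The sketch below reuses the illustrative helpers from earlier in this post (suggestCorrection, detectLanguage, titleMatchScore) plus a callable standing in for the BERT keywords, so read it as a map of the flow rather than our actual orchestration code.

```php
<?php
// Conceptual glue tying the three layers together; depends on the
// illustrative helper functions sketched earlier in this post.

function buildSuggestions(string $query, array $dictionaries, array $titles, callable $bertKeywords): array
{
    $words = preg_split('/\s+/', mb_strtolower(trim($query)));

    // Layer 1: detect the language, then correct each word against it.
    $lang = detectLanguage($words, $dictionaries) ?? 'en';
    $corrected = array_map(
        fn ($w) => suggestCorrection($w, $dictionaries[$lang]) ?? $w,
        $words
    );
    $correctedQuery = implode(' ', $corrected);

    // Layer 2: semantically related keywords (e.g. from the fill-mask pipeline).
    $keywords = $bertKeywords($correctedQuery);

    // Layer 3: rank known titles by how many corrected words they contain.
    $titleScores = [];
    foreach ($titles as $title) {
        $titleScores[$title] = titleMatchScore($correctedQuery, $title);
    }
    arsort($titleScores);

    return [
        'ai_suggestions'    => array_merge([$correctedQuery], $keywords),
        'title_suggestions' => array_keys(array_filter($titleScores, fn ($s) => $s >= 0.8)),
    ];
}
```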

Technical Implementation: How We Built This
The Stack
- Backend: PHP with custom classes for modular AI processing
- AI Library: transformers-php (PHP port of Hugging Face Transformers)
- Database: MySQL with a custom LEVENSHTEIN function for distance calculations
- Dictionary Storage: Dedicated database table with 850,000+ indexed entries
- Model: BERT base uncased (110M parameters, pre-trained for masked language modeling)
- Caching: In-memory cache for dictionary lookups and BERT predictions to optimize speed
Performance Optimization
AI models are computationally expensive. To ensure our search remains fast, we implemented several optimizations:
- Dictionary Pre-filtering: Before calculating Levenshtein distances, we filter words by length (±2 characters) to reduce search space by 90%
- Result Caching: Frequently searched terms and their corrections are cached in memory
- Batch Processing: Multiple suggestions are generated in a single BERT inference call
Average search time: < 200ms (including AI processing)
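Here’s a hedged PHP/PDO sketch of the first two optimizations; the search_dictionary table and its columns are illustrative names, not our actual schema.

```php
<?php
// Hedged sketch of two of the optimizations above: the ±2 character length
// pre-filter pushed into SQL, plus a per-request in-memory cache.

function candidateWords(PDO $db, string $word, string $lang, int $maxDistance = 2): array
{
    static $cache = [];                       // repeated lookups skip the database
    $key = $lang . ':' . $word;
    if (isset($cache[$key])) {
        return $cache[$key];
    }

    // Fetch only entries whose length is within ±2 characters of the query word;
    // Levenshtein then runs against this much smaller candidate set.
    $stmt = $db->prepare(
        'SELECT word, frequency FROM search_dictionary
          WHERE lang = :lang
            AND CHAR_LENGTH(word) BETWEEN :minLen AND :maxLen
          ORDER BY frequency DESC'
    );
    $stmt->execute([
        ':lang'   => $lang,
        ':minLen' => max(1, mb_strlen($word) - $maxDistance),
        ':maxLen' => mb_strlen($word) + $maxDistance,
    ]);

    return $cache[$key] = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}
```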
The Data Behind the AI
Dictionary Import Process
Building our multilingual dictionary wasn’t trivial. Here’s how we did it:
- Data Collection: Downloaded frequency lists from Wiktionary, OpenSubtitles, and Google Ngram for 20 languages
- Cleaning: Removed special characters, URLs, numbers, and profanity
- Frequency Weighting: Assigned importance scores based on usage frequency
- Normalization: Converted all words to lowercase for case-insensitive matching
- Database Import: Bulk-inserted 850,000+ entries into MySQL with indexed columns for lightning-fast lookups
Total database size: 45MB (highly compressed with indexes)
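As an illustration of the cleaning, normalization, and import steps, here’s a simplified PHP sketch; the tab-separated input format and the search_dictionary table are assumptions, and profanity filtering and true bulk inserts are omitted for brevity.

```php
<?php
// Simplified sketch of the cleaning, normalization, and import steps.
// Assumes each input line looks like: "word<TAB>frequency".

function importFrequencyList(PDO $db, string $path, string $lang): int
{
    $stmt = $db->prepare(
        'INSERT INTO search_dictionary (lang, word, frequency)
         VALUES (:lang, :word, :frequency)'
    );

    $imported = 0;
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        [$word, $frequency] = array_pad(explode("\t", $line, 2), 2, 0);

        $word = mb_strtolower(trim($word));           // normalization: lowercase
        if ($word === '' || !preg_match('/^\p{L}+$/u', $word)) {
            continue;                                 // cleaning: skip URLs, numbers, symbols
        }

        $stmt->execute([
            ':lang'      => $lang,
            ':word'      => $word,
            ':frequency' => (int) $frequency,
        ]);
        $imported++;
    }

    return $imported;
}
```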
BERT Model Training
We’re using the pre-trained bert-base-uncased model, which was trained by Google on:
- BooksCorpus: 800M words from 11,000 books
- English Wikipedia: 2,500M words from 13GB of text
This gives BERT deep contextual understanding of English language patterns, idioms, and semantic relationships.
Real-World Impact: Early Results
We’ve been testing the new search engine internally for the past two weeks. Here are some early metrics:
Improved Search Success Rate
- Before: 67% of searches returned relevant results
- After: 91% of searches returned relevant results
- +24 percentage points improvement
Typo Tolerance
- Before: Searches with 1+ typos had 23% success rate
- After: Searches with 1-2 typos have 89% success rate
- +66 percentage points improvement
Multilingual Detection
- Automatically detects and corrects queries in 20 languages
- Italian and Spanish searches saw 3x increase in successful corrections
User Feedback
Early beta testers reported:
- “It feels like the search actually understands what I’m looking for”
- “I can type quickly without worrying about spelling”
- “Finally! Searches in Italian work properly”
Behind the Code: Open Source and Transparency
We believe in transparency. While MYETV’s codebase is proprietary, we’re committed to sharing our learnings with the developer community. All the code used in this implementation is:
- 100% Open Source: deep dives into our implementation are available in our GitHub repositories
- Open Source AI: open-source snippets demonstrating dictionary matching and BERT integration in PHP
- Open Source library: the Hugging Face Transformers library for PHP (transformers-php) is open source on GitHub
Why This Matters for You
As a MYETV user, you might not care about BERT models or Levenshtein distances. But you will notice:
- Less frustration: Typos no longer break your search experience
- Faster discovery: Find what you’re looking for in fewer attempts
- Better recommendations: More relevant suggestions based on what you actually meant to search
- Multilingual support: Search in your native language without worrying about language barriers
Our goal is simple: Make search invisible. You shouldn’t have to think about how to search, you should just find what you’re looking for, instantly, every time.
Conclusion: Search That Understands You
Building an AI-powered search engine isn’t about showing off technology, it’s about removing friction from your experience. Every typo corrected, every semantic suggestion made, every multilingual query understood is a moment where technology gets out of your way and lets you enjoy content.
We’ve poured hundreds of hours into this project: curating dictionaries, training models, optimizing databases, and fine-tuning algorithms. But the real measure of success isn’t in the code, it’s in those moments when you search for something, get the perfect result, and never even notice that AI just worked behind the scenes to make it happen.
That’s the MYETV difference.
Welcome to the future of search.
Try It Yourself
The new search engine is live right now on MYETV. Try these searches to see AI in action:
- “the fiuture of tecnology” (intentional typos)
- “musica electronica” (Spanish with typos)
- “tutorail vidio” (double typo)
- Any multi-word phrase describing content you want to find
See how the AI corrects, suggests, and finds exactly what you’re looking for.
Happy searching!
— The MYETV Engineering Team
Technical References
- BERT: Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (2018)
- Transformers Library: https://github.com/huggingface/transformers
- PHP Transformers: https://github.com/CodeWithKyrian/transformers-php
- Levenshtein Distance: https://en.wikipedia.org/wiki/Levenshtein_distance
- Universal Dependencies: https://universaldependencies.org/


