This topic is locked

Custom Arabic Search Normalization (Handling Similar Letters, Diacritics, and Tatweel)

8/4/2025 1:34:25 PM
PHPRunner General questions
A
Abbas author

Hello,

I am working on an Arabic-language PHPRunner project using version 10.9 and MySQL. I want to make the search functionality more tolerant of variations in Arabic spelling, such as different forms of similar letters, diacritics (harakat), and the "tatweel" character (ـ).

For example, I want a search for احمد to return records containing أحمد, and vice versa. However, PHPRunner's default LIKE behavior in MySQL doesn't match these variants unless the user types the exact form.

Goal:

Make the search function normalize Arabic input so that different forms of similar letters (e.g., ا, أ, إ, آ) are treated as equivalent, and diacritics (ًٌٍَُِّْ) and tatweel (ـ) are ignored.

What I tried:

I followed the Search API and placed the following code in:

students → AfterTableInit event

Code:

function normalizeArabic($text) {
$text = preg_replace('/[أإآ]/u', 'ا', $text);
$text = str_replace(['ى', 'ي'], 'ي', $text);
$text = str_replace('ة', 'ه', $text);
$text = str_replace(['ؤ', 'ئ'], ['و', 'ي'], $text);
$text = str_replace(['لأ', 'لإ', 'لآ'], 'لا', $text);
$text = preg_replace('/[ًٌٍَُِّْـ]/u', '', $text); // Remove diacritics and tatweel
return $text;
}

$srchObj = SearchClause::getSearchObject("students");

$rawValue = $srchObj->getFieldValue("StudentName");

if( $rawValue !== null && $rawValue !== "" ) {
$normalized = normalizeArabic($rawValue);

$sqlCond = "
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(StudentName,
'أ','ا'),'إ','ا'),'آ','ا'),'ى','ي'),'ة','ه'),'ؤ','و'),'ئ','ي'),'ً',''),'ُ',''),'ّ',''
)
LIKE '%" . db_addslashes($normalized) . "%'
";

$srchObj->setSearchSQL("StudentName", $sqlCond);
}

The problem:

The normalization works when I run the SQL manually in phpMyAdmin, but it does not return any results inside PHPRunner's generated list page, even when I use Advanced Search.

The StudentName field is enabled for search.

The AfterTableInit event is being triggered correctly.

No syntax errors or fatal errors are shown.

What I suspect:

Maybe:

PHPRunner applies additional filtering that overrides or ignores setSearchSQL.

Or maybe setSearchSQL only works if the condition is added at the exact right time in the event lifecycle.

Possibly REPLACE() in the SQL isn't parsed properly due to field quoting or encoding.

My Questions:

Is AfterTableInit the correct place to override the search condition like this?

Is setSearchSQL() expected to work with complex SQL expressions like nested REPLACE() functions?

Is there a more appropriate way to intercept and modify search input before it reaches the final SQL query?

Would using a custom database function (e.g., normalize_arabic(StudentName)) be more compatible?

Thank you in advance! I appreciate any help or insight on how to get this working inside the PHPRunner interface.

ChatGPT 8/4/2025

It looks like you're on the right track with your approach to normalizing Arabic text for search functionality in PHPRunner. However, there are some considerations and best practices that may help you to troubleshoot and refine your implementation. Let’s address your questions and provide some suggestions:

1. Event Lifecycle and Correct Placement

Is AfterTableInit the correct place to override the search condition like this?

The AfterTableInit event is typically used for initializing data or setting up configurations for a table. However, for modifying search conditions, you might want to consider using the BeforeProcessList or BeforeSQLQuery events. These events are triggered after the search parameters are set but before the SQL query is executed, making them more suitable for modifying search conditions.

2. Using setSearchSQL with Complex SQL Expressions

Is setSearchSQL() expected to work with complex SQL expressions like nested REPLACE() functions?

While setSearchSQL() can be used to set complex SQL expressions, it’s important to ensure that the SQL syntax is valid and well-formed. You can simplify your normalization logic by encapsulating it in a stored procedure or a custom MySQL function, which might make your queries cleaner and easier to read. However, make sure that the changes you make are syntactically correct.

3. Intercepting and Modifying Search Input

Is there a more appropriate way to intercept and modify search input before it reaches the final SQL query?

Using the BeforeProcessList event is generally the most reliable way to intercept the search input and modify the SQL query. Here's a brief example of how you might implement that:

// In BeforeProcessList event
function BeforeProcessList(&$params) {
// Get the search object
$srchObj = SearchClause::getSearchObject("students");

// Get the raw value from search input
$rawValue = $srchObj->getFieldValue("StudentName");

// Check if the raw value is not empty
if ($rawValue !== null && $rawValue !== "") {
// Normalize the input
$normalized = normalizeArabic($rawValue);

// Create the SQL condition
$sqlCond = "
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(StudentName,
'أ','ا'),'إ','ا'),'آ','ا'),'ى','ي'),'ة','ه'),'ؤ','و'),'ئ','ي'),'ً',''),'ُ',''),'ّ',''
) LIKE '%" . db_addslashes($normalized) . "%'
";

// Set the new search SQL condition
$srchObj->setSearchSQL("StudentName", $sqlCond);
}
}

4. Using a Custom Database Function

Would using a custom database function (e.g., normalize_arabic(StudentName)) be more compatible?

Creating a custom database function to handle the normalization can be an effective approach. This allows you to encapsulate your normalization logic within the database, which can simplify your PHP code and improve maintainability.

You would define the function in MySQL, and then modify the SQL condition in your PHPRunner code to use this function:

SELECT * FROM students WHERE normalize_arabic(StudentName) LIKE '%normalized_input%'

This way, you offload the complexity of normalization to the database layer, which can handle it more efficiently.

Conclusion

To summarize, consider switching to the BeforeProcessList event to modify the search conditions, and explore the option of using a custom database function for normalization. These changes should help you achieve the desired search functionality while keeping your code clean and maintainable. If you continue to experience issues, reviewing the logs and debugging with simpler conditions can also help isolate the problem.

Sergey Kornilov admin 8/4/2025

The search string set via setSearchSQL() will be used as is. Yes, placing this code to the correct event is also a requirement.

If nothing helps, enable SQL debugging, see the exact SQL query being executed and this may point you in the right direction.