This library is a Unicode-aware alternative to PHP's native string handling API.
- Based on Hamid Sarfraz's work: http://pageconfig.com/attachments/portable-utf8.php
- Based on Nicolas Grekas's work: https://github.com/tchwork/utf8
- Based on Behat's work: https://github.com/Behat/Transliterator
- Based on Sebastián Grignoli's work: https://github.com/neitanod/forceutf8
- Based on Ivan Enderlin's work: https://github.com/hoaproject/Ustring
- Based on Paragon Initiative Enterprises's work: https://github.com/paragonie/random_compat
It is written in PHP and works without "mbstring", "iconv" or any other extra encoding library. The benefit of Portable UTF-8 is that it is easy to use and easy to bundle.
If you prefer a more object-oriented way to edit strings, take a look at voku/Stringy. It is a fork of "danielstjules/Stringy" that uses the "Portable UTF-8" class and adds some extra methods.
// Standard library
strtoupper('fòôbàř'); // 'FòôBàř'
strlen('fòôbàř'); // 10
// mbstring
// WARNING: if you don't use a polyfill like "Portable UTF-8", you need to install the php-extension "mbstring" on your server
mb_strtoupper('fòôbàř'); // 'FÒÔBÀŘ'
mb_strlen('fòôbàř'); // 6
// Portable UTF-8
use voku\helper\UTF8;
UTF8::strtoupper('fòôbàř'); // 'FÒÔBÀŘ'
UTF8::strlen('fòôbàř'); // 6
// voku/Stringy
use Stringy\Stringy as S;
$stringy = S::create('fòôbàř');
$stringy->toUpperCase(); // 'FÒÔBÀŘ'
$stringy->length(); // 6
composer require voku/portable-utf8
PHP 5 and earlier versions have no native Unicode support. PHP 6 or 7 [1], where Unicode support has been promised, may take years. To bridge the gap, several extensions exist, such as "mbstring", "iconv" and "intl".
The problem with "mbstring" and the others is that, most of the time, you cannot ensure that a specific one is present on a server. If you rely on one of them, your application is no longer portable. This problem becomes even more severe for open source applications that have to run on different servers with different configurations. With this in mind, I decided to write a library:
- No extensions are required to run this library. Portable UTF-8 only needs the PCRE library, which has been available by default since PHP 4.2.0 and cannot be disabled since PHP 5.3.0. Support for the "/u" modifier in PCRE for UTF-8 handling is not required.
- PHP 5.3 is the minimum requirement, and all later versions are fine with Portable UTF-8.
- To speed up string handling, it is recommended that you have "mbstring" or "iconv" available on your server, as well as the latest version of the PCRE library.
- Although Portable UTF-8 is easy to use, moving from the native API to Portable UTF-8 may not be straightforward for everyone. It is highly recommended that you do not update your scripts to include Portable UTF-8, or replace or change anything, before you understand the reason and the consequences. Most of the time, a native function may be all you need.
- There is also a shim for "mbstring", "iconv" and "intl", so you can also use it on shared web hosting.
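The fallback idea described above can be illustrated with a minimal sketch. It is not the library's implementation: `sketch_utf8_length()` is a hypothetical helper, it assumes PHP 7 or later (for the type declarations), and its last branch assumes PCRE was built with UTF-8 support so that the "u" modifier is available.

```php
<?php
// Hypothetical helper (not part of Portable UTF-8): counts Unicode code
// points, preferring a loaded extension and falling back to PCRE.
function sketch_utf8_length(string $text): int
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($text, 'UTF-8');
    }

    if (extension_loaded('iconv')) {
        $length = iconv_strlen($text, 'UTF-8');
        if ($length !== false) {
            return $length;
        }
    }

    // PCRE fallback: with the "u" modifier "." matches one whole code point,
    // and the "s" modifier lets it match newlines as well.
    $count = preg_match_all('/./su', $text);

    // preg_match_all() returns false on invalid UTF-8; fall back to bytes.
    return $count === false ? strlen($text) : $count;
}

echo sketch_utf8_length('fòôbàř'), "\n"; // 6 (source file saved as UTF-8)
```

Whichever branch runs, the caller sees the same result, which is the portability property the list above is about; the extensions only make the call faster.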
Example 1: UTF8::cleanup()
$cleanUTF8String = UTF8::cleanup($string);
// ... and then save to db
Example 2: UTF8::strlen()
$string = 'string <strong>with utf-8 chars åèä</strong> - doo-bee doo-bee dooh';
echo strlen($string) . "\n<br />";
echo UTF8::strlen($string) . "\n<br />";
// will output:
// 70
// 67
$string_test1 = strip_tags($string);
$string_test2 = UTF8::strip_tags($string);
echo strlen($string_test1) . "\n<br />";
echo UTF8::strlen($string_test2) . "\n<br />";
// will output:
// 53
// 50
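The difference of 3 between the byte count and the character count in Example 2 comes from the three characters "åèä", each of which occupies two bytes in UTF-8. The following sketch uses only native PHP and PCRE (assuming the "u" modifier is supported) to show the byte width of each character; it is an illustration, not code from the library.

```php
<?php
// Split the string into code points using only PCRE ("u" modifier),
// then report how many bytes each one occupies in UTF-8.
$chars = preg_split('//u', 'åèä', -1, PREG_SPLIT_NO_EMPTY);

$totalBytes = 0;
foreach ($chars as $char) {
    $bytes = strlen($char); // strlen() counts bytes, not characters
    $totalBytes += $bytes;
    echo $char, ' = ', $bytes, ' bytes (0x', strtoupper(bin2hex($char)), ")\n";
}

echo count($chars), ' characters, ', $totalBytes, " bytes\n";
// Last line printed: 3 characters, 6 bytes
```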
Example 3: UTF8::fix_utf8()
echo UTF8::fix_utf8('DÃ¼sseldorf');
echo UTF8::fix_utf8('Ã¤');
// will output:
// Düsseldorf
// ä
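Broken text of the kind shown in Example 3 typically arises when UTF-8 bytes are mistaken for ISO-8859-1 and encoded to UTF-8 a second time. The sketch below reproduces that mistake and reverses one round of it. Both functions are hypothetical helpers written for illustration; they are a simplification and not the algorithm used by `UTF8::fix_utf8()`. PHP 7 or later is assumed.

```php
<?php
// Re-encode every byte as if it were an ISO-8859-1 character. Applying this
// to text that is already UTF-8 is the mistake that causes double encoding.
function sketch_latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80
            ? chr($b)                                           // ASCII: unchanged
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));  // two-byte sequence
    }
    return $out;
}

// Undo one round of that mistake: map each two-byte sequence that encodes
// U+0080..U+00FF back to the single byte it came from.
function sketch_utf8_to_latin1(string $text): string
{
    return preg_replace_callback(
        '/[\xC2\xC3][\x80-\xBF]/',
        function (array $m): string {
            return chr(((ord($m[0][0]) & 0x03) << 6) | (ord($m[0][1]) & 0x3F));
        },
        $text
    );
}

$original = "D\xC3\xBCsseldorf";              // "Düsseldorf" as UTF-8 bytes
$broken   = sketch_latin1_to_utf8($original); // double-encoded
$repaired = sketch_utf8_to_latin1($broken);

var_dump(strlen($original), strlen($broken), $repaired === $original);
// int(11), int(13), bool(true)
```

Note that the reverse step is only safe on text known to be double-encoded: applied to correctly encoded UTF-8 it would turn "ü" into a lone byte 0xFC, which is invalid UTF-8. A real repair routine therefore has to check whether the input is actually broken before changing it.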
- Composer is a prerequisite for running the tests.
composer install
- The tests can be executed by running this command from the root directory:
./vendor/bin/phpunit
"Portable UTF8" is free software; you can redistribute it and/or modify it under the terms of the (at your option):
Unicode handling requires tedious work to implement and maintain in the long run. Contributions such as unit tests, bug reports, comments or patches licensed under both licenses are therefore very welcome.