
Truncating Chinese Strings in PHP: Methods Explained

2016-12-23 11:35, 801 views

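PHP's built-in substr function cuts strings by byte, so it works for English letters and digits but breaks strings that mix in Chinese characters. Converting between GB2312 and UTF-8 in PHP requires iconv: PHP 4 needs php_iconv.dll (the file ships with PHP 4), while PHP 5 has iconv support built in, which is more convenient. Whether you convert UTF-8 to GB2312 or GB2312 to UTF-8, the iconv function in PHP 4.3.1 and later handles it easily; the only thing you have to write yourself is a UTF-8 to Unicode conversion function. mb_substr(), the function for truncating Chinese strings, was introduced in PHP 4.0.6 as part of the mbstring extension. It handles characters in different encodings by itself, so some newer PHP frameworks already support mb_substr().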
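To make the difference concrete, here is a minimal sketch, assuming the script and its strings are UTF-8 and that the mbstring and iconv extensions are loaded. The sample string is made up for illustration. It shows substr() cutting through the middle of a character, mb_substr() cutting on a character boundary, and an iconv() round trip between UTF-8 and GB2312.

```php
<?php
// Assumption: this file is saved as UTF-8; mbstring and iconv are enabled.
$text = "PHP中文截取";   // 3 ASCII bytes + 4 Chinese characters of 3 bytes each

// substr() counts bytes: the 4th byte is only the first third of "中".
$byByte = substr($text, 0, 4);

// mb_substr() counts characters in the given encoding.
$byChar = mb_substr($text, 0, 4, 'UTF-8');

echo strlen($text), "\n";                       // 15 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n";           // 7 (characters)
echo $byChar, "\n";                             // PHP中
var_dump(mb_check_encoding($byByte, 'UTF-8'));  // bool(false): broken character

// iconv() converts between encodings; GB2312 stores each Chinese character in 2 bytes.
$gb = iconv('UTF-8', 'GB2312', '中文');
echo strlen($gb), "\n";                         // 4
echo iconv('GB2312', 'UTF-8', $gb), "\n";       // 中文
```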

Summary of helper functions, starting with method 1:


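function cutstr($string, $length, $dot = ' ...')
{
    // Truncate a string; handles GBK and UTF-8.
    // $length is a display width: a single-byte character counts 1, a multi-byte character counts 2.
    $charset = 'utf-8';

    if (strlen($string) <= $length) {
        // Boundary case: nothing to cut.
        return $string;
    }

    // Decode HTML entities first so that an entity is never cut in half.
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array('&', '"', '<', '>'), $string);

    $strcut = '';
    if (strtolower($charset) == 'utf-8') {
        // $n: byte offset, $tn: byte length of the last character, $noc: width used so far.
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            // The lead byte tells how many bytes the character occupies.
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                $tn = 1; $n++; $noc++;
            } elseif (194 <= $t && $t <= 223) {
                $tn = 2; $n += 2; $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                $tn = 3; $n += 3; $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                $tn = 4; $n += 4; $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                $tn = 5; $n += 5; $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                $tn = 6; $n += 6; $noc += 2;
            } else {
                $n++;
            }

            if ($noc >= $length) {
                break;
            }
        }
        if ($noc > $length) {
            // The last character overshot the limit, so drop it.
            $n -= $tn;
        }

        $strcut = substr($string, 0, $n);
    } else {
        // GBK and similar encodings: a byte above 127 starts a two-byte character.
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i].$string[++$i] : $string[$i];
        }
    }

    // Re-encode the HTML special characters.
    $strcut = str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);

    return $strcut.$dot;
}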
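The width rule used by cutstr() (one unit for a single-byte character, two for a multi-byte one) can also be sketched without walking the bytes by hand. This is an illustration under stated assumptions, not the function above rewritten: truncate_by_width() is a hypothetical name, the input is assumed to be valid UTF-8, and PCRE's /u modifier does the character splitting.

```php
<?php
// Hypothetical helper: keep at most $width display units of a UTF-8 string,
// counting single-byte characters as 1 and multi-byte characters as 2.
function truncate_by_width(string $text, int $width, string $dot = ' ...'): string
{
    // /u matches whole UTF-8 characters; /s lets "." match newlines too.
    if (preg_match_all('/./us', $text, $m) === false) {
        return '';  // invalid UTF-8 input
    }

    $used = 0;
    $out = '';
    foreach ($m[0] as $ch) {
        $w = strlen($ch) > 1 ? 2 : 1;
        if ($used + $w > $width) {
            // Something is left over, so mark the cut.
            return $out . $dot;
        }
        $out .= $ch;
        $used += $w;
    }
    return $out;  // the whole string fits
}

echo truncate_by_width("PHP中文截取", 7), "\n";  // PHP中文 ...
echo truncate_by_width("abc", 10), "\n";        // abc
```

When mbstring is loaded, mb_strimwidth() offers a similar width-based cut out of the box.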

Method 2:


function len($string, $sublen = 80, $etc = '...', $break_words = false, $middle = false)
{
    // $break_words and $middle are accepted but not used.
    $start = 0;
    $code = "UTF-8";

    if ($code == 'UTF-8') {
        // Each alternative in the pattern matches one complete UTF-8 character.
        $pa = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/";
        preg_match_all($pa, $string, $t_string);

        // Append the marker only when characters remain after the slice.
        if (count($t_string[0]) - $start > $sublen) {
            return join('', array_slice($t_string[0], $start, $sublen)).$etc;
        }
        return join('', array_slice($t_string[0], $start, $sublen));
    } else {
        // Double-byte encodings such as GBK: offsets and lengths are in bytes.
        $start = $start * 2;
        $sublen = $sublen * 2;
        $strlen = strlen($string);
        $tmpstr = '';

        for ($i = 0; $i < $strlen; $i++) {
            if ($i >= $start && $i < ($start + $sublen)) {
                // GBK lead bytes are 0x81-0xFE, so any byte above 128 starts a two-byte character.
                if (ord(substr($string, $i, 1)) > 128) {
                    $tmpstr .= substr($string, $i, 2);
                } else {
                    $tmpstr .= substr($string, $i, 1);
                }
            }
            // Skip the trail byte of a two-byte character.
            if (ord(substr($string, $i, 1)) > 128) $i++;
        }
        if (strlen($tmpstr) < $strlen) $tmpstr .= $etc;

        return $tmpstr;
    }
}
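The split-then-slice idea in method 2 can be sketched more compactly by letting PCRE's /u modifier find the character boundaries. This is only a sketch under assumptions: utf8_slice() is a hypothetical name, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: take $length characters starting at character $start.
// Assumes $text is valid UTF-8; preg_split() returns false on invalid input.
function utf8_slice(string $text, int $start, int $length, string $etc = '...'): string
{
    // An empty pattern with /u splits between every UTF-8 character.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }

    $slice = implode('', array_slice($chars, $start, $length));

    // Append the marker only when characters remain after the slice.
    return count($chars) - $start > $length ? $slice . $etc : $slice;
}

echo utf8_slice("你好，世界abc", 0, 4), "\n";  // 你好，世...
echo utf8_slice("你好", 0, 4), "\n";          // 你好
```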

Method 3 (uses mb_substr when available):


/**
 * Truncate a string; supports Chinese and other encodings.
 *
 * @static
 * @access public
 * @param string $str     the string to truncate
 * @param int    $start   start position, in characters
 * @param int    $length  number of characters to keep
 * @param string $charset character encoding
 * @param bool   $suffix  whether to append "..." to the result
 * @return string
 */
function msubstr($str, $start, $length, $charset="utf-8", $suffix=true)
{
    if (function_exists("mb_substr")) {
        // Preferred: the mbstring extension.
        $slice = mb_substr($str, $start, $length, $charset);
    } elseif (function_exists('iconv_substr')) {
        // Second choice: the iconv extension.
        $slice = iconv_substr($str, $start, $length, $charset);
        if (false === $slice) {
            $slice = '';
        }
    } else {
        // Fallback: split into characters with a per-encoding regular expression.
        $re['utf-8']  = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
        $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
        $re['gbk']    = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
        $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
        preg_match_all($re[$charset], $str, $match);
        $slice = join("", array_slice($match[0], $start, $length));
    }
    return $suffix ? $slice.'...' : $slice;
}
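Note that msubstr() appends "..." whenever $suffix is true, even if nothing was cut off. A possible variant, sketched here under assumptions, compares the character count first and adds the suffix only when text remains after the slice. excerpt() is a hypothetical name, and at least one of the mbstring or iconv extensions is assumed to be loaded.

```php
<?php
// Hypothetical variant of the fallback chain: the suffix is appended only
// when characters remain after the slice.
function excerpt(string $str, int $start, int $length, string $charset = 'utf-8', string $suffix = '...'): string
{
    if (function_exists('mb_substr')) {
        $total = mb_strlen($str, $charset);
        $slice = mb_substr($str, $start, $length, $charset);
    } elseif (function_exists('iconv_substr')) {
        $total = (int) iconv_strlen($str, $charset);
        $slice = iconv_substr($str, $start, $length, $charset);
        if ($slice === false) {
            $slice = '';
        }
    } else {
        throw new RuntimeException('mbstring or iconv is required');
    }

    // Text remains only if the slice ends before the last character.
    return ($start + $length < $total) ? $slice . $suffix : $slice;
}

echo excerpt("你好世界", 0, 2), "\n";  // 你好...
echo excerpt("你好", 0, 5), "\n";     // 你好
```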


Tags: utf-8, PHP, functions, string truncation
