Truncating strings that contain Chinese, English, and emoji (one Chinese character counts as two English characters) #413

Open
confidence68 opened this issue Jul 9, 2023 · 0 comments


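Preface

This post records a bit of string handling. The material is simple and is offered for reference. It is mainly about string truncation: if a Chinese character counts as two characters and an English letter or digit counts as one, how do you truncate a string? In particular, when a name is too long and has to be shown with a trailing ellipsis, how can the string be cut gracefully?

Simple approach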

const getByteVal = (val, max) => {
  let returnValue = ''
  let byteValLen = 0
  for (let i = 0; i < val.length; i++) {
    // Anything outside \x00-\xff (e.g. a Chinese character) counts as 2, everything else as 1
    if (/[^\x00-\xff]/.test(val[i])) byteValLen += 2
    else byteValLen += 1
    // Width budget exceeded: append an ellipsis and stop
    if (byteValLen > max) {
      returnValue = returnValue + '...'
      break
    }
    returnValue += val[i]
  }
  return returnValue
}

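The method above handles strings of Chinese and English text, with one Chinese character counting as two English letters or digits. However, if the string contains emoji, for example 😊😂🤣, this method can produce garbled output when truncating. Each of these emoji is stored as two UTF-16 code units (a surrogate pair), so its length is 2. The loop walks the string one code unit at a time, so it counts each half separately and can cut an emoji in the middle.

A more complete truncation method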
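To make the surrogate-pair point concrete, here is a small sketch (the variable names are illustrative, not from the original post) showing that an emoji such as 😊 has a `length` of 2 and that indexing into it returns only half of the character:

```javascript
// 😊 (U+1F60A) lies outside the Basic Multilingual Plane,
// so JavaScript stores it as a surrogate pair (two UTF-16 code units).
const emoji = '😊'

const codeUnitLength = emoji.length              // counts UTF-16 code units
const codePointLength = Array.from(emoji).length // counts code points

// Indexing yields one half of the pair, which is not a valid character on its own.
const firstHalf = emoji[0]
const firstUnit = firstHalf.charCodeAt(0)
const isHighSurrogate = firstUnit >= 0xd800 && firstUnit <= 0xdbff

console.log(codeUnitLength, codePointLength, isHighSurrogate) // 2 1 true
```

Any loop that advances with `i++` and appends `val[i]` can therefore stop between the two halves and leave a lone surrogate in the output.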

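const subStringEmoji = (substring, maxLen) => {
    // Default to a width budget of 10 when maxLen is omitted (or 0)
    maxLen = maxLen || 10
    if (substring) {
      // Use a primitive string; new String() would create a wrapper object
      let str_cut = ''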
      let str_length = 0

      for (var i = 0; i < substring.length; ) {
        var hs = substring.charCodeAt(i)
        let a = ''
        if (hs >= 0 && hs <= 128) {
          str_length += 1
          a = substring.charAt(i)
          i++
        } else if (0xd800 <= hs && hs <= 0xdbff) {
          // High surrogate: check whether a low surrogate follows it
          var ls = substring.charCodeAt(i + 1) // NaN at the end of the string
          if (0xdc00 <= ls && ls <= 0xdfff) {
            // Valid surrogate pair (emoji and other supplementary characters):
            // keep both code units together so the character is never split
            str_length += 2
            a = substring.substring(i, i + 2)
            i += 2
          } else {
            // Lone high surrogate: count it as one double-width unit
            str_length += 2
            a = substring.substring(i, i + 1)
            i++
          }
        } else {
          // Any other single code unit (Chinese characters, symbols, and
          // emoji inside the Basic Multilingual Plane such as © or ⭐)
          var ls = substring.charCodeAt(i + 1) // NaN at the end of the string
          if (ls == 0x20e3) {
            // Followed by the combining enclosing keycap: keep the two together
            str_length += 2
            a = substring.substring(i, i + 2)
            i += 2
          } else {
            str_length += 2
            a = substring.substring(i, i + 1)
            i++
          }
        }

        // Append the character, or stop with an ellipsis once the width budget is exceeded
        if (str_length > maxLen) {
          str_cut = str_cut.concat('...')
          break
        } else {
          str_cut = str_cut.concat(a)
        }
      }
      return str_cut
    }
    return ''
  }

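This approach truncates strings that mix Chinese, English, and emoji. It works by inspecting the Unicode (UTF-16) code unit values of each character.

Further reading

Emoji can also be detected with regular expressions. I covered this in an earlier post, "JavaScript RegExp 常用的手机和邮箱正则(常用正则)" (commonly used JavaScript regular expressions for phone numbers and email addresses), which includes regexes for emoji, special characters, and more.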
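For comparison, here is a shorter sketch of the same idea that leans on the language's built-in code point iteration instead of manual surrogate arithmetic. It is not the method from this post: `charWidth` and `truncateByWidth` are hypothetical names, and the width rule (ASCII counts 1, everything else counts 2) is an assumption based on the rule stated in the preface.

```javascript
// Hypothetical helper: width of one code point under the assumed rule
// (ASCII counts 1, everything else counts 2).
const charWidth = (ch) => (ch.codePointAt(0) < 0x80 ? 1 : 2)

// Hypothetical helper: truncate to maxWidth and append '...' when text is cut.
const truncateByWidth = (text, maxWidth = 10) => {
  let result = ''
  let width = 0
  // for...of walks the string by code point, so a surrogate pair is never split
  for (const ch of text) {
    width += charWidth(ch)
    if (width > maxWidth) return result + '...'
    result += ch
  }
  return result
}

console.log(truncateByWidth('你好world', 6)) // 你好wo...
console.log(truncateByWidth('a😊😂🤣b', 5))  // a😊😂...
console.log(truncateByWidth('hello', 10))   // hello
```

This still works one code point at a time, so emoji built from several code points (skin-tone modifiers, zero-width-joiner sequences, variation selectors) can be cut between their parts. Where that matters, segmenting by grapheme cluster, for instance with `Intl.Segmenter` in environments that provide it, may be worth considering.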
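As a rough illustration of the regex route (this is not the pattern from the earlier post), modern JavaScript engines support Unicode property escapes, so a pictographic-emoji check can be sketched as follows. The name `containsEmoji` is hypothetical, and the choice of `Extended_Pictographic` is an assumption about what should count as an emoji:

```javascript
// \p{Extended_Pictographic} is a Unicode property escape; it requires the `u` flag.
// No `g` flag is used, so the regex keeps no state between test() calls.
const pictographic = /\p{Extended_Pictographic}/u

// Hypothetical helper: true if the text contains at least one pictographic character.
const containsEmoji = (text) => pictographic.test(text)

console.log(containsEmoji('hello 😊'))   // true
console.log(containsEmoji('你好 hello')) // false
```

Note that this property also covers symbols such as © and ®, so the pattern may need narrowing depending on the use case.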
