Character problems with character encoding of iso-8859-1 sites #1543

darkcattz · 2018-04-05T19:53:05Z

Operating System: Mac OS
Cypress Version: Latest
Browser Version: Chrome and Electron

Hello,

My site is not yet in UTF-8, and I get character problems with accented characters (the page is declared as iso-8859-1) when I use Cypress. It's not really blocking, except when doing regular expression searches:

HTML HEADER CHARSET :

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">

Result :

Les sites ont bien ï¿½tï¿½ supprimï¿½s
ï¿½ Tous droits rï¿½servï¿½s ï¿½

Thanks
Regards
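
For readers trying to interpret the garbled output above: the sequence ï¿½ is what the three UTF-8 bytes of the Unicode replacement character (U+FFFD) look like when they are read as a single-byte Latin encoding. The sketch below reproduces that pattern in Node.js. It assumes, without confirmation from this thread, that somewhere in the pipeline the ISO-8859-1 bytes are decoded as UTF-8, re-encoded, and then displayed using the page's declared charset. The variable names are illustrative only.

```javascript
// "été" as an ISO-8859-1 server would send it: bytes E9 74 E9
const latin1Bytes = Buffer.from('\u00e9t\u00e9', 'latin1')

// Assumed step 1: the bytes are wrongly decoded as UTF-8.
// 0xE9 is not a valid UTF-8 sequence on its own, so each one becomes U+FFFD.
const wronglyDecoded = latin1Bytes.toString('utf8')

// Assumed step 2: the damaged string is re-encoded as UTF-8.
// Each U+FFFD becomes the three bytes EF BF BD.
const reEncoded = Buffer.from(wronglyDecoded, 'utf8')

// Assumed step 3: the browser reads those bytes with the declared single-byte charset.
// For bytes in the A0-FF range, ISO-8859-1 and windows-1252 agree, so 'latin1' is enough here.
const displayed = reEncoded.toString('latin1')

console.log(displayed) // same pattern as the "ï¿½tï¿½" in the report
```

If this is what happens, the original accented characters are already lost after step 1, which would explain why only pattern-based matching works as a workaround.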


victorjspinto · 2018-07-02T20:32:17Z

Same problem with me.

I'm trying to write some tests for an old platform that uses charset iso-8859-1.

Longtrainz · 2018-08-19T11:26:25Z

Same problem with windows-1251

cannibalcow · 2018-09-14T10:00:32Z

Why is this labeled as a feature? Is this not a bug?

joseasouza · 2019-01-14T16:56:01Z

I'm also facing this issue. I'm trying to write tests for an old app that uses ISO-8859-1. Is there any workaround?

jennifer-shehane · 2019-01-15T07:29:13Z

I'm having a hard time replicating this behavior with the example characters I've tried so far. Could any of you provide the exact html content that will print as ï¿½ within Cypress?

joseasouza · 2019-01-15T12:30:20Z

Hello @jennifer-shehane thanks for your time. I've uploaded the repository https://github.com/dudevictor/cypress-character-problem that shows this issue.

Besides the file needing to be encoded in iso-8859-1, I've also noticed that the server must return the content-type header with charset=iso-8859-1 for the issue to occur.

I noticed there is an app called 'runner' that shows the content inside the Cypress app. If the solution isn't too complex, you could give me some directions and I could try to solve it.

jennifer-shehane · 2019-01-15T16:28:52Z

@dudevictor Thank you! All of the characters I was trying previously were working, so this is extra helpful. I'd be happy to help with any directions you need working on the repo if you have any leads.

From everything I've read, the iso-8859-1 is meant to be parsed as windows-1252 per the spec.

I really thought this might be the problem. So I looked at document.characterSet, which reports the character encoding used to render the page. I printed it both inside Cypress (in the application under test) and in the application opened directly, and in both cases it prints the expected windows-1252.

cy.document().its('characterSet').should('include', 'windows-1252') // passes

The content-type also looks fine in the Network panel: content-type: text/html;charset=iso-8859-1

I suppose one thing that does stand out is the content-encoding in Cypress of Content-Encoding: gzip, which does not exist when visiting on localhost, but I don't think this should be related to the issue. And now I'm at a dead end.

jennifer-shehane · 2019-01-16T10:27:47Z

Some more thoughts on this. The prevailing theory now is that since we are gzipping and sending chunked content, the chunking may think it is of one charset when it should be set to another - this may be causing the content to be chunked at the incorrect byte size (since charsets have different byte sizes).

That may not be the greatest explanation, but basically we think there is something going wrong in the chunking.
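
To make the chunking theory concrete, here is a minimal Node.js sketch of the general class of problem: decoding each chunk on its own can corrupt a character whose bytes straddle a chunk boundary, while a stateful decoder does not. This only illustrates the idea; it is not taken from the Cypress code base, and nothing in this thread confirms that this is the actual cause. The chunk split is chosen by hand for the example.

```javascript
const { StringDecoder } = require('string_decoder')

// 'é' encoded as UTF-8 is two bytes: C3 A9
const whole = Buffer.from('\u00e9', 'utf8')

// Simulate a chunk boundary falling in the middle of the character
const chunk1 = whole.subarray(0, 1)
const chunk2 = whole.subarray(1)

// Naive approach: decode each chunk independently, then concatenate.
// Neither half is valid UTF-8 on its own, so replacement characters appear.
const naive = chunk1.toString('utf8') + chunk2.toString('utf8')

// Stateful approach: StringDecoder buffers the incomplete sequence across writes
const decoder = new StringDecoder('utf8')
const streamed = decoder.write(chunk1) + decoder.write(chunk2) + decoder.end()

console.log(JSON.stringify(naive), JSON.stringify(streamed))
```

Note that a single-byte charset such as iso-8859-1 cannot be split mid-character, so this kind of failure would only apply if the content were being treated as a multi-byte encoding at some point.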

opensas · 2019-03-17T04:16:18Z

I have a similar issue. I reported it here and added a very simple HTML page to reproduce the bug:

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <meta charset="windows-1252">
</head>
<body>
  <h1>Character encoding failing test: á é í ó ú ñ</h1>
</body>
</html>

If this is the case, can anybody point me to a workaround to sidestep this issue until it gets fixed?

I'm testing for the presence of the following text in a span like this:

    cy.get('div.flash_warning span')
      .should('have.text', 'El código de la aplicacion no puede estar vacío.')

Which is failing because of the broken encoding.

Is there some way to test for something like this?

    cy.get('div.flash_warning span')
      .should('have.text', 'El c?digo de la aplicacion no puede estar vac?o.')

That would allow me to work around this issue, and I could easily build a helper function that would replace the troublesome characters. I hope I made myself clear.

Update: this is the best workaround I could find so far. If anybody has a better alternative, I'd be grateful.

describe('playing with regular expressions', () => {
  it.only('should match by regular expression', () => {
    cy.visit('http://localhost/metaSSC/cypress/regexp.html')
    cy.get('div.flash_warning span')
      .should('have.text', 'El registro no ha podido ser dado de alta.')
    cy.get('div.flash_error span')
      .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/) // match span text by regexp
  })
})

also asked at SO

opensas · 2019-03-17T04:23:42Z

On the other hand, I noticed this issue is labeled stage: needs information. I would gladly help with this if anybody can tell me what information is missing.

jennifer-shehane · 2019-03-18T04:05:08Z

Hey @opensas - I do believe this issue is likely the same.

Honestly, this should be labeled as 'ready for work' on our side, since we do have a reproducible example. The cause is still unknown, though we had a theory.

The fact that this doesn't run correctly in Electron only is a helpful new piece of information.

I will close #3725 as a duplicate.

opensas · 2019-03-18T04:34:18Z

I will close #3725 as a duplicate.

Sure, go ahead, I do hope you can work it out. Please let me know if there's anything I can do to help.

BTW, can anybody give a clue on how to implement a custom extension to cy like this:

cy.get('div.flash_error span')
      .containsWithEncoding('El código de la aplicacion no puede estar vacío.')

It would just build a regular expression, replacing every problematic char with '.'

thanks a lot

jennifer-shehane · 2019-03-18T04:55:13Z

@opensas Look into our custom command documentation

opensas · 2019-03-20T06:02:57Z

thanks, just for the record, this is the workaround I developed:

Cypress.Commands.add('containsLike', {
  prevSubject: true
}, (subject, search, chars) => {

  chars = chars || 'áéíóúñÁÉÍÓÚÑ'
  if (!Array.isArray(chars)) chars = chars.toString().split('')

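  // Escape regex metacharacters first so literal text (e.g. the final '.') only matches itself
  search = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

  chars.forEach( char => {
    const repAllChars = new RegExp(char, 'g') // see: https://stackoverflow.com/a/17606289/47633
    search = search.replace(repAllChars, '.')
  })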

  const regExp = new RegExp('^' + search + '$')
  return cy.wrap(subject).contains(regExp)
})

and I use it like this:

describe('my first test', () => {
  it.only('should pass', () => {
    cy.visit('http://localhost/xxxx/yyy.asp')
      .get('div.flash_error span')
      .containsLike('El código de la aplicacion no puede estar vacío.')
// it runs .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/)
  })
})
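
One caveat with replacing each accented character by a single '.': in the original report each broken character shows up as three characters (ï¿½), not one. A slightly looser variant of the same idea is sketched below as a plain function that can be tried outside Cypress. The name looseTextRegExp is hypothetical, and treating every non-ASCII character as "possibly garbled" is an assumption that may be too permissive for some tests.

```javascript
// Escape regex metacharacters so the expected text is matched literally
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Build a RegExp where every non-ASCII character of the expected text
// may be rendered as one or more arbitrary characters
function looseTextRegExp(expected) {
  const pattern = escapeRegExp(expected).replace(/[^\x00-\x7F]/g, '.+?')
  return new RegExp('^' + pattern + '$')
}

const re = looseTextRegExp('El c\u00f3digo de la aplicacion no puede estar vac\u00edo.')

console.log(re.test('El c\uFFFDdigo de la aplicacion no puede estar vac\uFFFDo.'))
console.log(re.test('El c\u00ef\u00bf\u00bddigo de la aplicacion no puede estar vac\u00ef\u00bf\u00bdo.'))
```

Since .contains() accepts a regular expression, as in the workaround above, the result could be passed to it directly, for example cy.get('div.flash_error span').contains(looseTextRegExp('...')).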

HeatherFlux · 2019-03-26T18:03:33Z

Hey, this has to do with obstructive code. To fix the issue, set "modifyObstructiveCode": false in your configuration file. This should fix any issues with weird charsets.

opensas · 2019-03-27T04:30:20Z

I can confirm that setting modifyObstructiveCode to false does NOT fix the issue. This is my cypress.json:

{
  "modifyObstructiveCode": false,
  "browser": {
    "modifyObstructiveCode": false
  }
}

(I didn't know whether the setting goes at the root level or inside browser)

and I also tried starting cypress with:

cypress open --config modifyObstructiveCode=false

None of them seemed to work

HeatherFlux · 2019-03-27T12:14:32Z

Hmmm, sorry then. For me I was having issues with some chars not being translated properly when running my application. https://docs.cypress.io/guides/references/configuration.html#modifyObstructiveCode
This section was able to help me solve the issue I was having with ï¿½tï type of char in the translation of bsdatepicker.

My config was laid out as such:
{ "modifyObstructiveCode": false }

nagyzso94 · 2019-05-16T07:36:21Z

I have the same issue. Has anyone found a solution to this?
I tried setting modifyObstructiveCode to false in cypress.json, but that didn't help.

simonmeggle · 2019-07-11T09:47:28Z

Is there any news on this topic? I also have a website that is served as win1252; in Cypress the charset becomes utf8. The German umlauts (ä, ö, ü) on this site are all displayed incorrectly (e.g. ï¿½).
Setting modifyObstructiveCode to false also did not work for me.

cypress-bot · 2019-07-15T16:59:03Z

The code for this is done in cypress-io/cypress#4698, but has yet to be released.
We'll update this issue and reference the changelog when it's released.

simonmeggle · 2019-07-16T07:32:46Z

Great to hear that you are working on the charset issue. As far as I can see, #4698 does not cover the win-1252 charset. Is there any plan to cover it as well? Thanks...

flotwig · 2019-07-16T14:59:05Z

Hey @simonmeggle, it does also fix win-1252 charset, along with any other charset you're likely to experience on the web (full list: https://github.com/ashtuchkin/iconv-lite/wiki/Supported-Encodings). I'll update the issue comment to clarify :)
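
As a rough illustration of what "decoding with the declared charset" means, here is a small Node.js sketch. It uses the built-in TextDecoder rather than iconv-lite, so it is not the code from #4698, and it assumes a Node.js build with full ICU data (where the windows-1252 label is available). The byte values are written out by hand for the example.

```javascript
// "ä ö ü" as a windows-1252 server would send it: E4 20 F6 20 FC
const body = Uint8Array.from([0xe4, 0x20, 0xf6, 0x20, 0xfc])

// Decoding with the charset the server declared recovers the umlauts
const asDeclared = new TextDecoder('windows-1252').decode(body)

// Decoding the same bytes as UTF-8 yields replacement characters instead,
// because none of these high bytes form valid UTF-8 sequences here
const asUtf8 = new TextDecoder('utf-8').decode(body)

console.log(asDeclared)
console.log(JSON.stringify(asUtf8))
```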

cypress-bot · 2019-07-29T20:43:44Z

Released in 3.4.1.

jennifer-shehane added the type: feature New feature that does not currently exist label Apr 9, 2018

jennifer-shehane added the stage: proposal 💡 No work has been done of this issue label Oct 20, 2018

jennifer-shehane changed the title ~~Character problems with no utf8 site~~ Character problems with character encoding of iso-8859-1 sites Jan 15, 2019

jennifer-shehane added stage: needs information Not enough info to reproduce the issue type: unexpected behavior User expected result, but got another and removed stage: proposal 💡 No work has been done of this issue type: feature New feature that does not currently exist labels Jan 15, 2019

jennifer-shehane mentioned this issue Mar 18, 2019

cypress runner ignoring charset=“windows-1252” html tag #3725

Closed

jennifer-shehane added stage: ready for work The issue is reproducible and in scope type: bug and removed stage: needs information Not enough info to reproduce the issue labels Mar 18, 2019

flotwig self-assigned this Jul 11, 2019

flotwig mentioned this issue Jul 11, 2019

Fix a variety of character encoding issues #4698

Merged

8 tasks

cypress-bot bot added stage: work in progress stage: needs review The PR code is done & tested, needs review and removed stage: ready for work The issue is reproducible and in scope stage: work in progress stage: needs review The PR code is done & tested, needs review labels Jul 11, 2019

brian-mann closed this as completed in #4698 Jul 15, 2019

cypress-bot bot added stage: pending release and removed stage: needs review The PR code is done & tested, needs review labels Jul 15, 2019

cypress-bot bot removed the stage: pending release label Jul 29, 2019

snyk-bot mentioned this issue Nov 1, 2019

[Snyk] Upgrade cypress from 3.4.1 to 3.5.0 ngChile/ngx-devkit-cypress-builder#14

Merged
