Character problems with character encoding of iso-8859-1 sites #1543

Closed
darkcattz opened this issue Apr 5, 2018 · 23 comments · Fixed by #4698
Labels: type: bug; type: unexpected behavior (User expected result, but got another)

Comments

@darkcattz

darkcattz commented Apr 5, 2018

  • Operating System: Mac OS
  • Cypress Version: Last
  • Browser Version: Chrome and Electron

Hello,

My site is not yet in UTF-8, and accented characters are displayed incorrectly when I use Cypress (the page charset is iso-8859-1). It's not really blocking, except when doing regular expression searches:

HTML header charset:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">

Result :

Les sites ont bien �t� supprim�s
� Tous droits r�serv�s � 

Thanks
Regards

@jennifer-shehane jennifer-shehane added the type: feature New feature that does not currently exist label Apr 9, 2018
@victorjspinto

victorjspinto commented Jul 2, 2018

Same problem here.

I'm trying to write some tests for an old platform that uses charset iso-8859-1.

@Longtrainz

Longtrainz commented Aug 19, 2018

Same problem with windows-1251

@cannibalcow

Why is this labeled as a feature? Is this not a bug?

@jennifer-shehane jennifer-shehane added the stage: proposal 💡 No work has been done of this issue label Oct 20, 2018
@joseasouza

I'm also facing this issue. I'm trying to write tests for an old app that uses ISO-8859-1. Is there any workaround?

@jennifer-shehane jennifer-shehane changed the title from "Character problems with no utf8 site" to "Character problems with character encoding of iso-8859-1 sites" Jan 15, 2019
@jennifer-shehane
Member

I'm having a hard time replicating this behavior with the example characters I've tried so far. Could any of you provide the exact html content that will print as � within Cypress?

@jennifer-shehane jennifer-shehane added stage: needs information Not enough info to reproduce the issue type: unexpected behavior User expected result, but got another and removed stage: proposal 💡 No work has been done of this issue type: feature New feature that does not currently exist labels Jan 15, 2019
@joseasouza

joseasouza commented Jan 15, 2019

Hello @jennifer-shehane thanks for your time. I've uploaded the repository https://github.com/dudevictor/cypress-character-problem that shows this issue.

Besides the file needing to be encoded as iso-8859-1, I've also noticed that the server must return the Content-Type header with charset=iso-8859-1 for the issue to occur.

I noticed there is an app called 'runner' that shows the content inside the Cypress app. If the solution isn't too complex, you could give me some directions and I could try to solve it.

@jennifer-shehane
Member

@dudevictor Thank you! All of the characters I was trying previously were working, so this is extra helpful. I'd be happy to help with any directions you need working on the repo if you have any leads.

From everything I've read, the iso-8859-1 label is meant to be parsed as windows-1252 per the spec.

I really thought this might be the problem, so I looked at document.characterSet, which reports the character encoding used to render the page. I printed it both within Cypress (in the application under test) and in the application on its own, and both correctly report windows-1252.

cy.document().its('characterSet').should('include', 'windows-1252') // passes

The content-type also looks fine in the Network panel: content-type: text/html;charset=iso-8859-1

I suppose one thing that does stand out is the content-encoding in Cypress of Content-Encoding: gzip, which does not exist when visiting on localhost, but I don't think this should be related to the issue. And now I'm at a dead end.

@jennifer-shehane
Member

jennifer-shehane commented Jan 16, 2019

Some more thoughts on this. The prevailing theory now is that since we are gzipping and sending chunked content, the chunking may think it is of one charset when it should be set to another - this may be causing the content to be chunked at the incorrect byte size (since charsets have different byte sizes).

That may not be the greatest explanation, but basically we think there is something going wrong in the chunking.
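
As a point of reference, the symptom itself can be reproduced outside Cypress. The following is a minimal sketch, not Cypress's proxy code, and it neither confirms nor rules out the chunking theory: it only shows that when iso-8859-1 bytes are decoded as UTF-8 at any step, each accented character turns into U+FFFD (�). The sample string and variable names are illustrative assumptions; only Node's built-in Buffer is used.

```javascript
// Illustration only: this is not how Cypress processes responses.
// "été" written with escapes so the source file's own encoding does not matter.
const text = '\u00e9t\u00e9'

// Encoded as iso-8859-1 (latin1), the text is the three bytes E9 74 E9.
const latin1Bytes = Buffer.from(text, 'latin1')

// Decoding with the matching charset round-trips the text.
const decodedAsLatin1 = latin1Bytes.toString('latin1')

// Decoding the same bytes as UTF-8 fails: 0xE9 opens a multi-byte sequence
// that never completes, so the decoder substitutes U+FFFD for each one.
const decodedAsUtf8 = latin1Bytes.toString('utf8')

console.log(decodedAsLatin1) // été
console.log(decodedAsUtf8)   // �t�
```

The second output has the same shape as the "�t�" in the original report.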

@opensas

opensas commented Mar 17, 2019

I have a similar issue. I reported it here and added a very simple HTML page to reproduce the bug:

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <meta charset="windows-1252">
</head>
<body>
  <h1>Character encoding failing test: á é í ó ú ñ</h1>
</body>
</html>

If this is the case, can anybody point me to a workaround to sidestep this issue until it gets fixed?

I'm testing for the presence of the following text in a span like this:

    cy.get('div.flash_warning span')
      .should('have.text', 'El código de la aplicacion no puede estar vacío.')

Which is failing because of the broken encoding.

Is there some way to test for something like this?

    cy.get('div.flash_warning span')
      .should('have.text', 'El c?digo de la aplicacion no puede estar vac?o.')

That would allow me to work around this issue, and I could easily build a helper function that would replace the troublesome characters. I hope I made myself clear.


Update: this is the best workaround I could find so far. If anybody has a better alternative, I'd be grateful.

describe('playing with regular expressions', () => {
  it.only('should match by regular expression', () => {
    cy.visit('http://localhost/metaSSC/cypress/regexp.html')
    cy.get('div.flash_warning span')
      .should('have.text', 'El registro no ha podido ser dado de alta.')
    cy.get('div.flash_error span')
      .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/) // match span text by regexp
  })
})

I also asked on Stack Overflow.

@opensas

opensas commented Mar 17, 2019

On the other hand, I noticed this issue is labeled stage: needs information. I would gladly help with this if anybody can tell me what information is missing.

@jennifer-shehane
Member

Hey @opensas - I do believe this issue is likely the same.

Honestly, this should be labeled as 'ready for work' on our side, since we do have a reproducible example. The cause is still unknown, although we have a theory.

The fact that this doesn't run correctly in Electron only is a helpful new piece of information.

I will close #3725 as a duplicate.

@jennifer-shehane jennifer-shehane added stage: ready for work The issue is reproducible and in scope type: bug and removed stage: needs information Not enough info to reproduce the issue labels Mar 18, 2019
@opensas

opensas commented Mar 18, 2019

I will close #3725 as a duplicate.

Sure, go ahead, I do hope you can work it out. Please let me know if there's anything I can do to help.

BTW, can anybody give a clue on how to implement a custom extension to cy like this:

cy.get('div.flash_error span')
      .containsWithEncoding('El código de la aplicacion no puede estar vacío.')

It would just build a regular expression, replacing every problematic character with '.'

thanks a lot

@jennifer-shehane
Member

@opensas Look into our custom command documentation

@opensas

opensas commented Mar 20, 2019

thanks, just for the record, this is the workaround I developed:

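Cypress.Commands.add('containsLike', {
  prevSubject: true
}, (subject, search, chars) => {

  // Characters that may be mangled by the encoding bug; each one is matched loosely
  chars = chars || 'áéíóúñÁÉÍÓÚÑ'
  if (!Array.isArray(chars)) chars = chars.toString().split('')

  // Escape regex metacharacters first, so a literal '.' in the text only matches '.'
  search = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

  chars.forEach( char => {
    const repAllChars = new RegExp(char, 'g') // see: https://stackoverflow.com/a/17606289/47633
    search = search.replace(repAllChars, '.')
  })

  // Anchor the pattern so the whole element text has to match
  const regExp = new RegExp('^' + search + '$')
  return cy.wrap(subject).contains(regExp)
})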

and I use it like this:

describe('my first test', () => {
  it.only('should pass', () => {
    cy.visit('http://localhost/xxxx/yyy.asp')
      .get('div.flash_error span')
      .containsLike('El código de la aplicacion no puede estar vacío.')
// it runs .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/)
  })
})
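
The pattern-building step of this workaround can also be pulled out into a plain function and checked without a browser. This is a minimal sketch under stated assumptions: `buildTolerantRegExp` is a hypothetical helper name, not part of Cypress, and it escapes regex metacharacters so that the trailing '.' in the message is matched literally.

```javascript
// Hypothetical helper, not part of Cypress: it only builds the tolerant pattern.
function buildTolerantRegExp (text, chars = 'áéíóúñÁÉÍÓÚÑ') {
  // Escape regex metacharacters so literal punctuation stays literal.
  const escaped = text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

  // Replace each character listed in `chars` with '.', which matches any single character.
  const tolerant = Array.from(escaped)
    .map((ch) => (chars.includes(ch) ? '.' : ch))
    .join('')

  // Anchor both ends so the whole text has to match.
  return new RegExp('^' + tolerant + '$')
}

const pattern = buildTolerantRegExp('El código de la aplicacion no puede estar vacío.')

// Text as it appears when the accented characters were replaced by U+FFFD.
console.log(pattern.test('El c\uFFFDdigo de la aplicacion no puede estar vac\uFFFDo.')) // true
// A different final character does not match, because the '.' was escaped.
console.log(pattern.test('El codigo de la aplicacion no puede estar vacio!')) // false
```

Inside a custom command, the returned pattern could then be passed to .contains() in the same way as above.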

@HeatherFlux

Hey, this has to do with obstructive code. To fix the issue, set "modifyObstructiveCode": false in your configuration file. This should fix any issues with weird charsets.

@opensas

opensas commented Mar 27, 2019

I can confirm that setting modifyObstructiveCode to false does NOT fix the issue. This is my cypress.json:

{
  "modifyObstructiveCode": false,
  "browser": {
    "modifyObstructiveCode": false
  }
}

(I didn't know whether the setting goes at the root level or inside browser, so I tried both.)

and I also tried starting cypress with:

cypress open --config modifyObstructiveCode=false

None of them seemed to work

@HeatherFlux

Hmmm, sorry then. In my case, some characters were not being translated properly when running my application. This section of the docs helped me solve the issue I was having with �tï-style characters in the bsdatepicker translation: https://docs.cypress.io/guides/references/configuration.html#modifyObstructiveCode

My config was laid out as such:
{ "modifyObstructiveCode": false }

@nagyzso94

I have the same issue. Has anyone found a solution to this?
I tried setting modifyObstructiveCode to false in cypress.json, but that didn't help.

@simonmeggle

Is there any news on this topic? I also have a website that is served as win1252; the charset becomes utf8 in Cypress. The German umlauts (ä, ö, ü) on this site are all displayed incorrectly (e.g. �).
Setting modifyObstructiveCode to false did not work for me either.

@flotwig flotwig self-assigned this Jul 11, 2019
@cypress-bot cypress-bot bot added stage: work in progress stage: needs review The PR code is done & tested, needs review and removed stage: ready for work The issue is reproducible and in scope stage: work in progress stage: needs review The PR code is done & tested, needs review labels Jul 11, 2019
@cypress-bot cypress-bot bot added stage: pending release and removed stage: needs review The PR code is done & tested, needs review labels Jul 15, 2019
@cypress-bot
Contributor

cypress-bot bot commented Jul 15, 2019

The code for this is done in cypress-io/cypress#4698, but has yet to be released.
We'll update this issue and reference the changelog when it's released.

@simonmeggle

Great to hear that you are working on the charset issue. As far as I can see, #4698 does not cover the win-1252 charset. Is there any plan to cover it as well? Thanks...

@flotwig
Contributor

flotwig commented Jul 16, 2019

Hey @simonmeggle, it does also fix win-1252 charset, along with any other charset you're likely to experience on the web (full list: https://github.com/ashtuchkin/iconv-lite/wiki/Supported-Encodings). I'll update the issue comment to clarify :)
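
The general idea of decoding a response body with the charset the server declares, rather than assuming UTF-8, can be sketched as follows. This is an illustration under stated assumptions, not the code from #4698: `charsetFromContentType`, `decodeBody`, and `BUFFER_ENCODINGS` are hypothetical names, and only Node's built-in Buffer is used, which understands just a handful of encodings. Note also that Buffer's 'latin1' is strict ISO-8859-1, whereas browsers treat the iso-8859-1 label as windows-1252 (the two differ in the 0x80 to 0x9F range), so a library with wider coverage, such as the iconv-lite list linked above, would be needed for anything beyond this toy case.

```javascript
// Hypothetical helpers for illustration; not the implementation merged in #4698.

// Extract the charset parameter from a Content-Type header value, lowercased.
function charsetFromContentType (contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '')
  return match ? match[1].toLowerCase() : null
}

// Map a few charset labels onto encodings that Buffer understands natively.
const BUFFER_ENCODINGS = {
  'utf-8': 'utf8',
  'utf8': 'utf8',
  'iso-8859-1': 'latin1',
  'latin1': 'latin1',
}

// Decode raw body bytes using the declared charset, falling back to UTF-8.
function decodeBody (bodyBytes, contentType) {
  const charset = charsetFromContentType(contentType)
  const encoding = BUFFER_ENCODINGS[charset] || 'utf8'
  return bodyBytes.toString(encoding)
}

// "réservés" encoded as iso-8859-1.
const body = Buffer.from([0x72, 0xe9, 0x73, 0x65, 0x72, 0x76, 0xe9, 0x73])

console.log(decodeBody(body, 'text/html;charset=iso-8859-1')) // réservés
console.log(decodeBody(body, 'text/html'))                    // r�serv�s
```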

@cypress-bot
Contributor

cypress-bot bot commented Jul 29, 2019

Released in 3.4.1.
