Move sub-pages of "Layout Tests" from Google Sites to Markdown.
The page on LayoutTests in dev.chromium.org [1] has already been moved
to Markdown. This CL moves its sub-pages to Markdown as well.

[1] https://sites.google.com/a/chromium.org/dev/developers/testing/webkit-layout-tests

BUG=

Review-Url: https://codereview.chromium.org/2488463004
Cr-Commit-Position: refs/heads/master@{#430981}
pwnall authored and Commit bot committed Nov 9, 2016
1 parent 7774315 commit d8a2507
Showing 5 changed files with 573 additions and 7 deletions.
docs/testing/identifying_tests_that_depend_on_order.md (new file, 80 additions)

# Fixing layout test flakiness

We'd like to stamp out all the tests that have ordering dependencies. This helps
make the tests more reliable and, eventually, will make it so we can run tests
in a random order and avoid new ordering dependencies being introduced. To get
there, we need to weed out and fix all the existing ordering dependencies.

## Diagnosing test ordering flakiness

These are steps for diagnosing ordering flakiness once you have a test that you
believe depends on an earlier test running.

### Bisect test ordering

1. Run the tests such that the test in question fails.
2. Run `./Tools/Scripts/print-test-ordering` and save the output to a file. This
outputs the tests run in the order they were run on each content_shell
instance.
3. Create a file that contains only the tests run on the worker that ran the
failing test, in the same order as in your saved output file. The last line
in the file should be the failing test.
4. Run
`./Tools/Scripts/bisect-test-ordering --test-list=path/to/file/from/step/3`

At the end, the bisect-test-ordering script should spit out the list of tests
that causes the test in question to fail.
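
Taken together, the flow might look like the following sketch. The output file
names (`test-ordering.txt`, `tests-on-worker.txt`) are placeholders, and step 3
is still a manual edit:

```bash
# Step 1: reproduce the failure (run the suite however it failed for you).
run-webkit-tests

# Step 2: save the order in which each content_shell instance ran its tests.
./Tools/Scripts/print-test-ordering > test-ordering.txt

# Step 3: by hand, copy the tests run on the failing worker, in order, into
# their own file, ending with the failing test (here: tests-on-worker.txt).

# Step 4: bisect that list down to the tests that trigger the failure.
./Tools/Scripts/bisect-test-ordering --test-list=tests-on-worker.txt
```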

*** promo
At the moment bisect-test-ordering only allows you to find tests that fail due
to a previous test running. It's a small change to the script to make it work
for tests that pass due to a previous test running (i.e. to figure out which
test it depends on running before it). Contact ojan@chromium.org if you're
interested in adding that feature to the script.
***

### Manual bisect

Instead of running `bisect-test-ordering`, you can manually do the work of step
4 above.

1. `run-webkit-tests --child-processes=1 --order=none --test-list=path/to/file/from/step/3`
2. If the test doesn't fail here, then the test itself is probably just flaky.
If it does fail, remove some lines from the file and repeat step 1, continuing
until you've found the dependency (see the sketch after this list). If the
test fails when run by itself but passes on the bots, it depends on another
test running before it in order to pass. In that case, generate the list of
tests run by `run-webkit-tests --order=natural` and repeat this process to
find which test causes the test in question to *pass* (e.g.
[crbug.com/262793](https://crbug.com/262793)).
3. File a bug and give it the
[LayoutTestOrdering](https://crbug.com/?q=label:LayoutTestOrdering) label,
e.g. [crbug.com/262787](https://crbug.com/262787) or
[crbug.com/262791](https://crbug.com/262791).
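
In shell terms, the manual loop is roughly the following sketch, again assuming
the candidate list from step 3 is saved as `tests-on-worker.txt` (a placeholder
name):

```bash
# Run the candidate tests in the recorded order on a single worker.
run-webkit-tests --child-processes=1 --order=none \
    --test-list=tests-on-worker.txt

# If the failure reproduces, remove some lines from tests-on-worker.txt
# (keeping the failing test last) and rerun the command above, repeating
# until only the dependency and the failing test are left.
```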

### Finding test ordering flakiness

#### Run tests in a random order and diagnose failures

1. Run `run-webkit-tests --order=random --no-retry`
2. Run `./Tools/Scripts/print-test-ordering` and save the output to a file. This
outputs the tests run in the order they were run on each content_shell
instance.
3. Run the diagnosing steps from above to figure out which tests conflict.

#### Run tests in isolation and diagnose failures

Run `run-webkit-tests --run-singly --no-retry`. This starts up a new
content_shell instance for each test. Tests that fail when run in isolation but
pass when run as part of the full test suite represent some state that we're
not properly resetting between test runs, or some state that we're not properly
setting when starting up content_shell. You might want to run with
`--time-out-ms=60000` to weed out tests that time out while waiting on
content_shell startup.
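
A sketch of both runs, using only the flags named above (the output file name
is a placeholder):

```bash
# Run the suite in a random order without retries, then record the ordering.
run-webkit-tests --order=random --no-retry
./Tools/Scripts/print-test-ordering > random-ordering.txt

# Run each test in its own content_shell instance; the longer timeout helps
# separate real failures from slow content_shell startup.
run-webkit-tests --run-singly --no-retry --time-out-ms=60000
```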

#### Diagnose especially flaky tests

1. Load
https://test-results.appspot.com/dashboards/overview.html#group=%40ToT%20Blink&flipCount=12
2. Tweak the flakiness threshold to the desired level of flakiness.
3. Click on *webkit_tests* to get the list of flaky tests.
4. Diagnose the source of flakiness for that test.
docs/testing/layout_test_expectations.md (new file, 298 additions)

# Layout Test Expectations and Baselines


The primary function of the LayoutTests is as a regression test suite; this
means that, while we care about whether a page is being rendered correctly, we
care more about whether the page is being rendered the way we expect it to. In
other words, we look more for changes in behavior than we do for correctness.

[TOC]

All layout tests have "expected results", or "baselines", which may be one of
several forms. The test may produce one or more of:

* A text file containing JavaScript log messages.
* A text rendering of the Render Tree.
* A screen capture of the rendered page as a PNG file.
* WAV files of the audio output, for WebAudio tests.

For any of these types of tests, there are files checked into the LayoutTests
directory named `-expected.{txt,png,wav}`. Lastly, we also support the concept
of "reference tests", which check that two pages are rendered identically
(pixel-by-pixel). As long as the two renderings match, the test passes. For
more on reference tests, see
[Writing ref tests](https://trac.webkit.org/wiki/Writing%20Reftests).
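
For example, a hypothetical test at `LayoutTests/fast/example/foo.html` might be
checked in next to baselines like these (which of them exist depends on what the
test produces):

```
LayoutTests/fast/example/foo.html
LayoutTests/fast/example/foo-expected.txt
LayoutTests/fast/example/foo-expected.png
LayoutTests/fast/example/foo-expected.wav
```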

## Failing tests

When the output doesn't match, there are two potential reasons for it:

* The port is performing "correctly", but the output simply won't match the
generic version. The usual reason for this is things like form controls,
which are rendered differently on each platform.
* The port is performing "incorrectly" (i.e., the test is failing).

In both cases, the convention is to check in a new baseline (aka rebaseline),
even though that file may be codifying errors. This helps us maintain test
coverage for all the other things the test is testing while we resolve the bug.

*** promo
If a test can be rebaselined, it should always be rebaselined instead of adding
lines to TestExpectations.
***

Bugs at [crbug.com](https://crbug.com) should track fixing incorrect behavior,
not lines in
[TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations). If a
test is never supposed to pass (e.g. it's testing Windows-specific behavior, so
can't ever pass on Linux/Mac), move it to the
[NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests) file. That
gets it out of the way of the rest of the project.

There are some cases where you can't rebaseline and, unfortunately, we don't
have a better solution than either:

1. Reverting the patch that caused the failure, or
2. Adding a line to TestExpectations and fixing the bug later.

Of these two options, **reverting the patch is strongly preferred**.

These are the cases where you can't rebaseline:

* The test is a reference test.
* The test gives different output in release and debug; in this case, generate a
baseline with the release build, and mark the debug build as expected to fail.
* The test is flaky, crashes or times out.
* The test is for a feature that hasn't shipped on some platforms yet, but
will shortly.

## Handling flaky tests

The
[flakiness dashboard](https://test-results.appspot.com/dashboards/flakiness_dashboard.html)
is a tool for understanding a test’s behavior over time.
Originally designed for managing flaky tests, the dashboard shows a timeline
view of the test’s behavior over time. The tool may be overwhelming at first,
but
[the documentation](https://dev.chromium.org/developers/testing/flakiness-dashboard)
should help. Once you decide that a test is truly flaky, you can suppress it
using the TestExpectations file, as described below.

We do not generally expect Chromium sheriffs to spend time trying to address
flakiness, though.

## How to rebaseline

Since baselines themselves are often platform-specific, updating baselines in
general requires fetching new test results after running the test on multiple
platforms.

### Rebaselining using try jobs

The recommended way to rebaseline for a currently-in-progress CL is to use
results from try jobs. To do this:

1. Upload a CL with changes in Blink source code or layout tests.
2. Trigger Blink try jobs. The bots to use are the release builders on
[tryserver.blink](https://build.chromium.org/p/tryserver.blink/builders).
This can be done via the code review Web UI or via `git cl try`.
3. Wait for all try jobs to finish.
4. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl` to fetch
new baselines.
5. Commit the new baselines and upload a new patch.

This way, the new baselines can be reviewed along with the changes, which helps
the reviewer verify that the new baselines are correct. It also means that there
is no period of time when the layout test results are ignored.

The set of tests for which `webkit-patch rebaseline-cl` tries to download new
baselines depends on its arguments (examples are sketched after the list below).

* By default, it tries to download all baselines for tests that failed in the
try jobs.
* If you pass `--only-changed-tests`, then only tests modified in the CL will be
considered.
* You can also explicitly pass a list of test names, and then just those tests
will be rebaselined.
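
For illustration, the three modes above might look like this sketch (the
explicit test name reuses an example from later in this page and is just a
placeholder):

```bash
# Rebaseline every test that failed in the try jobs.
third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl

# Only consider tests that this CL modifies.
third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl --only-changed-tests

# Rebaseline just the named tests.
third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl fast/html/keygen.html
```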

### Rebaselining with rebaseline-o-matic

If the test is not already listed in
[TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations), you
can mark it as `[ NeedsRebaseline ]`. The
[rebaseline-o-matic bot](https://build.chromium.org/p/chromium.infra.cron/builders/rebaseline-o-matic)
will automatically detect when the bots have cycled (by looking at the blame on
the file) and do the rebaseline for you. As long as the test doesn't time out or
crash, it won't turn the bots red if it has a `NeedsRebaseline` expectation.
When all of the continuous builders on the waterfall have cycled, the
rebaseline-o-matic bot will commit a patch which includes the new baselines and
removes the `[ NeedsRebaseline ]` entry from TestExpectations.
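
As a sketch, such an entry in TestExpectations might look like the following
(the bug number and test name are placeholders):

```
crbug.com/12345 fast/html/keygen.html [ NeedsRebaseline ]
```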

### Rebaselining manually

1. If the test is already listed in TestExpectations as flaky, mark the test
`NeedsManualRebaseline` and comment out the flaky line so that your patch can
land without turning the tree red (see the sketch after this list). If the
test is not in TestExpectations, you can add a `[ Rebaseline ]` line to
TestExpectations.
2. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-expectations`
3. Post the patch created in step 2 for review.
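
For example, the TestExpectations edit in step 1 might look like this sketch
(the bug number and test name are placeholders):

```
# Flaky line commented out while the rebaseline is in flight:
# crbug.com/12345 [ Mac ] fast/html/keygen.html [ Failure Pass ]
crbug.com/12345 [ Mac ] fast/html/keygen.html [ NeedsManualRebaseline ]
```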

## Kinds of expectations files

* [TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations): The
main test failure suppression file. In theory, this should be used for flaky
lines and `NeedsRebaseline`/`NeedsManualRebaseline` lines.
* [ASANExpectations](../../third_party/WebKit/LayoutTests/ASANExpectations):
Tests that fail under ASAN.
* [LeakExpectations](../../third_party/WebKit/LayoutTests/LeakExpectations):
Tests that have memory leaks under the leak checker.
* [MSANExpectations](../../third_party/WebKit/LayoutTests/MSANExpectations):
Tests that fail under MSAN.
* [NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests): Tests
that we never intend to fix (e.g. a test for Windows-specific behavior will
never be fixed on Linux/Mac). Tests that will never pass on any platform
should just be deleted, though.
* [SlowTests](../../third_party/WebKit/LayoutTests/SlowTests): Tests that take
longer than the usual timeout to run. Slow tests are given 5x the usual
timeout.
* [SmokeTests](../../third_party/WebKit/LayoutTests/SmokeTests): A small subset
of tests that we run on the Android bot.
* [StaleTestExpectations](../../third_party/WebKit/LayoutTests/StaleTestExpectations):
Platform-specific lines that have been in TestExpectations for many months.
They're moved here to get them out of the way of people doing rebaselines
since they're clearly not getting fixed anytime soon.
* [W3CImportExpectations](../../third_party/WebKit/LayoutTests/W3CImportExpectations):
A record of which W3C tests should be imported or skipped.
* [WPTServeExpectations](../../third_party/WebKit/LayoutTests/WPTServeExpectations):
Expectations for tests that fail differently when run under the W3C's wptserve
HTTP server with the `--enable-wptserve` flag. This is an experimental feature
at this time.


### Flag-specific expectations files

It is possible to handle tests that only fail when run with a particular flag
being passed to `content_shell`. See
[LayoutTests/FlagExpectations/README.txt](../../third_party/WebKit/LayoutTests/FlagExpectations/README.txt)
for more.

## Updating the expectations files

### Ordering

The file is not ordered. If you put new changes somewhere in the middle of the
file, this will reduce the chance of merge conflicts when landing your patch.

### Syntax

The syntax of the file is roughly one expectation per line. An expectation can
apply either to a directory of tests or to a specific test. Lines prefixed with
`# ` are treated as comments, and blank lines are allowed as well.

The syntax of a line is roughly:

```
[ bugs ] [ "[" modifiers "]" ] test_name [ "[" expectations "]" ]
```

* Tokens are separated by whitespace.
* **The brackets delimiting the modifiers and expectations from the bugs and the
test_name are not optional**; however, the modifiers component is optional. In
other words, if you want to specify modifiers or expectations, you must
enclose them in brackets.
* Lines are expected to have one or more bug identifiers, and the linter will
complain about lines missing them. Bug identifiers are of the form
`crbug.com/12345`, `code.google.com/p/v8/issues/detail?id=12345` or
`Bug(username)`.
* If no modifiers are specified, the test applies to all of the configurations
applicable to that file.
* Modifiers can be one or more of `Mac`, `Mac10.9`, `Mac10.10`, `Mac10.11`,
`Retina`, `Win`, `Win7`, `Win10`, `Linux`, `Linux32`, `Precise`, `Trusty`,
`Android`, `Release`, `Debug`.
* Some modifiers are meta keywords, e.g. `Win` represents both `Win7` and
`Win10`. See the `CONFIGURATION_SPECIFIER_MACROS` dictionary in
[third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py](../../third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/base.py)
for the meta keywords and which modifiers they represent.
* Expectations can be one or more of `Crash`, `Failure`, `Pass`, `Rebaseline`,
`Slow`, `Skip`, `Timeout`, `WontFix`, `Missing`, `NeedsRebaseline`,
`NeedsManualRebaseline`. If multiple expectations are listed, the test is
considered "flaky" and any of those results will be considered as expected.

For example:

```
crbug.com/12345 [ Win Debug ] fast/html/keygen.html [ Crash ]
```

which indicates that the "fast/html/keygen.html" test file is expected to crash
when run in the Debug configuration on Windows, and the tracking bug for this
crash is bug \#12345 in the [Chromium issue tracker](https://crbug.com). Note
that the test will still be run, so that we can notice if it doesn't actually
crash.

Assuming you're running a debug build on Mac 10.9, the following lines are all
equivalent (in terms of whether the test is performed and its expected outcome):

```
fast/html/keygen.html [ Skip ]
fast/html/keygen.html [ WontFix ]
Bug(darin) [ Mac10.9 Debug ] fast/html/keygen.html [ Skip ]
```

### Semantics

* `WontFix` implies `Skip` and also indicates that we don't have any plans to
make the test pass.
* `WontFix` lines always go in the
[NeverFixTests file](../../third_party/WebKit/LayoutTests/NeverFixTests) as
we never intend to fix them. These are just for tests that only apply to some
subset of the platforms we support.
* `WontFix` and `Skip` must be used by themselves and cannot be specified
alongside `Crash` or another expectation keyword.
* `Slow` causes the test runner to give the test 5x the usual time limit to run.
`Slow` lines go in the
[SlowTests file](../../third_party/WebKit/LayoutTests/SlowTests). A given line
cannot have both `Slow` and `Timeout`. Example entries for both files are
sketched below.
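
As a sketch, the corresponding entries might look like this (the test names are
placeholders):

```
# In NeverFixTests: a Windows-only feature that will never pass elsewhere.
Bug(darin) [ Linux Mac ] fast/forms/windows-specific-control.html [ WontFix ]

# In SlowTests: a test that needs the 5x timeout.
crbug.com/12345 fast/dom/giant-tree.html [ Slow ]
```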

Also, when parsing the file, we use two rules to figure out if an expectation
line applies to the current run:

1. If the configuration parameters don't match the configuration of the current
run, the expectation is ignored.
2. Expectations that match more of a test name are used before expectations that
match less of a test name.

For example, if you had the following lines in your file, and you were running a
debug build on `Mac10.10`:

```
crbug.com/12345 [ Mac10.10 ] fast/html [ Failure ]
crbug.com/12345 [ Mac10.10 ] fast/html/keygen.html [ Pass ]
crbug.com/12345 [ Win7 ] fast/forms/submit.html [ Failure ]
crbug.com/12345 fast/html/section-element.html [ Failure Crash ]
```

You would expect:

* `fast/html/article-element.html` to fail with a text diff (since it is in the
fast/html directory).
* `fast/html/keygen.html` to pass (since the exact match on the test name takes
precedence over the directory match).
* `fast/forms/submit.html` to pass (since the configuration parameters don't
match).
* `fast/html/section-element.html` to either crash or produce a text (or image
and text) failure, but not time out or pass.

*** promo
Duplicate expectations are not allowed within the file and will generate
warnings.
***

You can verify that any changes you've made to an expectations file are correct
by running:

```bash
third_party/WebKit/Tools/Scripts/lint-test-expectations
```

which will cycle through all of the possible combinations of configurations
looking for problems.