Introduced in the 0.9.12 release, the API is meant to supply an abstraction
for using the PyMarkdown application from within another application. Currently
at interface version 1, the API seeks to provide a useful interface
to the PyMarkdown application with low friction.
The PyDoc3 application is used to generate the Markdown API documentation as part of the build process. It was chosen for its simplicity and its ability to generate decent Markdown that properly documents the PyMarkdown API.
Note that in addition to this document, the various test files under the test/api directory are a useful source of code snippets. As much as possible, we have tried to connect each API test function with any corresponding test function for the same scenario in the non-API part of the project. If you believe we have missed a scenario test function or have ideas on how to improve our scenario tests, please let us know.
- A Quick Word on Executing These Snippets
- Basics
- PyMarkdown API Exceptions
- Positive Scan Results
- Introducing Scan Failures
- Introducing Pragma Failures
- Alternatives To Scan_Path
- Future Documentation
Usually, we are big fans of VSCode and its Terminal window, but for these examples that is not the case. Because we often develop and test code with the help of the Terminal window, we are aware that it caches any packages once they are imported. If you are using the Terminal window and change versions of PyMarkdown or other packages to test out a new version, please keep in mind that you will probably have to restart VSCode to clear the cache so that the new packages are applied properly.
The basic code to perform scanning on a given markdown path is as follows:
from pymarkdown.api import PyMarkdownApi

source_path = "some-manner-of-path"
PyMarkdownApi().scan_path(source_path)

We tried to keep the starting scenario as simple as possible, so that code
snippet is the minimum code needed to execute the scanner on a given path. In
this case, the path is specified as some-manner-of-path, which could be the path
to either a file or a directory. Note that if some-manner-of-path specifies a
file name, it will be rejected because the filename does not end with .md. This
function can also take globbed arguments, such as *.md to specify all the
Markdown files in the current directory.
To make the API easy to use, we focused on supplying a simple, bare-bones function
that we expect our users to use most of the time. As such, we believe that the
PyMarkdownApi object can be quickly instantiated and that the scan_path function
is clearly named. We hope that this will allow developers to have an easy time
integrating the API into their applications.
While the base invocation of the scan_path function is simple, there are two
things that are not yet represented in our examples: we are not collecting any
information about issues with the Markdown text, and we are not handling any
errors that occur when scanning the non-existent path some-manner-of-path.
In this specific scenario, if you execute that code snippet as is, you will see
error text that looks like:
WARNING:pymarkdown.main:Provided path 'some-manner-of-path' does not exist.
WARNING:pymarkdown.main:No matching files found.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<some-path>\pymarkdown\api.py", line 296, in scan_path
    return self.__handle_scan_results(return_code, this_presentation)
  File "<some-path>\pymarkdown\api.py", line 308, in __handle_scan_results
    self.__generate_exception(this_presentation)
  File "<some-path>\pymarkdown\api.py", line 351, in __generate_exception
    raise PyMarkdownApiNoFilesFoundException(second_last_error_text)
pymarkdown.api.PyMarkdownApiNoFilesFoundException: Provided path 'some-manner-of-path' does not exist.
Note that the first two lines are reporting log messages from the PyMarkdown application.
As most logging defaults to a log level of Warning and output to the console,
you should expect to see the two lines of log messages. However, the remaining
lines talking about an exception that was raised are messy and do not help us
any.
To address that problem, we need to talk about catching exceptions.
When something goes wrong with the PyMarkdownApi object, the API raises a
PyMarkdownApiException exception. Therefore, to make the above sample handle
scan errors, we must change the sample slightly to add the needed support
for those exceptions:
from pymarkdown.api import PyMarkdownApi, PyMarkdownApiException

source_path = "some-manner-of-path"
try:
    PyMarkdownApi().scan_path(source_path)
except PyMarkdownApiException:
    pass

By including a try/except block around the scan_path API call, the sample now
properly handles the exception and the output from executing the above code
snippet should now be:
WARNING:pymarkdown.main:Provided path 'some-manner-of-path' does not exist.
WARNING:pymarkdown.main:No matching files found.
This looks better as we can control the log file using the Python logging package
with ease. But as we are simply using a pass statement, we are not doing anything
useful with the exception. A better handling of the exception would be:
import sys

from pymarkdown.api import PyMarkdownApi, PyMarkdownApiException

source_path = "some-manner-of-path"
try:
    PyMarkdownApi().scan_path(source_path)
except PyMarkdownApiException as this_exception:
    # Report the failure on stderr and stop with a non-zero exit code.
    print(f"API Exception: {this_exception}", file=sys.stderr)
    sys.exit(1)

This example is better than the last one because it does something specific with the raised error: it prints the information to stderr and exits the program, which is a typical approach to handling the error within a simple Python script. If the API is being called by a more complex application, that exception handling should be replaced with something in keeping with the rest of the calling application.
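For a larger application, one possible shape is to translate the API exception into an error type that the application already understands. The following is only a sketch: the DocumentCheckError class, the scan_or_raise helper, and the my_application logger name are hypothetical and not part of PyMarkdown. Only PyMarkdownApi, scan_path, and PyMarkdownApiException come from the API itself.

```python
import logging

LOGGER = logging.getLogger("my_application")  # hypothetical application logger


class DocumentCheckError(Exception):
    """Hypothetical application-level error; not part of PyMarkdown."""


def scan_or_raise(source_path, scan_function, api_exception_type):
    """Call scan_function and translate API exceptions into DocumentCheckError."""
    try:
        return scan_function(source_path)
    except api_exception_type as this_exception:
        LOGGER.error("Scan of '%s' failed: %s", source_path, this_exception)
        # Chain the original exception so its details are not lost.
        raise DocumentCheckError(f"Unable to scan '{source_path}'.") from this_exception


def scan_with_pymarkdown(source_path):
    """Wire the helper to the real API (requires PyMarkdown to be installed)."""
    # Imported here so the helper above can be exercised without PyMarkdown.
    from pymarkdown.api import PyMarkdownApi, PyMarkdownApiException

    return scan_or_raise(source_path, PyMarkdownApi().scan_path, PyMarkdownApiException)
```

Passing the scan function and exception type as parameters keeps the translation logic testable with simple stand-ins.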
This is the point where all the planning and examples for the API start to pay
off! Before continuing any further, we must write a trivial Markdown example
that should easily pass PyMarkdown's inspection. Create a new file called sample.md
in the local directory and set its content to the following Markdown:
# This is a title

This is a document
When copying the content into the sample.md file, please make sure that the new
file concludes with a single newline. Once that is done, execute the following
code snippet:
import sys

from pymarkdown.api import PyMarkdownApi, PyMarkdownApiException

source_path = "sample.md"
try:
    scan_result = PyMarkdownApi().scan_path(source_path)
except PyMarkdownApiException as this_exception:
    print(f"API Exception: {this_exception}", file=sys.stderr)
    sys.exit(1)

# Both properties are lists; empty lists mean that no issues were found.
print(scan_result.scan_failures)
print(scan_result.pragma_errors)

If everything is working properly, you should see the following output:
[]
[]
That is because both the scan_failures property and the pragma_errors property
of the returned PyMarkdownScanPathResult instance are clear of any failures. This means
that the application could not find any issues with the sample.md file.
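A script that gates a build on a clean scan could reduce the two lists to a single exit code. This is a minimal sketch: the exit_code_for helper is hypothetical, and the choice of 0 for clean and 1 for any reported issue is an assumption about what the calling script wants, not something the API prescribes.

```python
def exit_code_for(scan_result):
    """Map a scan result onto a conventional process exit code.

    Assumes scan_result exposes the scan_failures and pragma_errors
    lists described above.
    """
    has_issues = bool(scan_result.scan_failures) or bool(scan_result.pragma_errors)
    return 1 if has_issues else 0
```

With the earlier snippet, the final two print calls could then be followed by sys.exit(exit_code_for(scan_result)).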
Now that we have a clean scan of a file, try making one of these modifications to the
sample.md file that you created in the last section:
- remove the last line of the document, making `This is a document` the last line
- after the text `This is a document`, insert a single space character
- change the text `# This is a title` to `## This almost a title`
- change the text `# This is a title` to `This is not a title`
- remove the blank line between `# This is a title` and `This is a document`
In each of these cases, the resultant modifications produce acceptable Markdown for parsers, but Markdown which breaks one of the rules of PyMarkdown. Using the first suggested modification as an example, executing the code snippet will produce the following output:
[PyMarkdownScanFailure(scan_file='sample.md', line_number=3, column_number=18,
rule_id='MD047', rule_name='single-trailing-newline',
rule_description='Each file should end with a single newline character.',
extra_error_information='')]
[]
While the output is rather crude, it gives us a good amount of information. The
most important point is that the Markdown snippet raises one issue when scanned by
PyMarkdown. By looking at the output along with the documentation on the PyMarkdownScanFailure
object in the API document, we can deduce the following:
- scan_file: the issue was found in the file sample.md
- line_number and column_number: the issue was on line 3, column 18
- rule_id and rule_name: the issue has ids MD047 and single-trailing-newline
- rule_description: this issue was raised as it expected a single newline character at the end of the file
- extra_error_information: no extra information was provided
For a good example that includes the extra_error_information field, revert the first
modification, apply the second modification, and rescan the file. The results
should be:
[PyMarkdownScanFailure(scan_file='sample.md', line_number=3, column_number=19,
rule_id='MD009', rule_name='no-trailing-spaces', rule_description='Trailing spaces',
extra_error_information=' [Expected: 0 or 2; Actual: 1]')]
[]
This reported issue looks like the earlier issue, but the extra_error_information
field now holds the information [Expected: 0 or 2; Actual: 1]. By reading the issue
and looking at the sample.md file, it is reasonable to assume that rule id MD009
(with name no-trailing-spaces) expects each line to end with 0 space characters
or 2 space characters. As stated by the issue, it encountered 1 trailing space
character, so it triggered the failure.
With simple Python programming, a developer using the PyMarkdown API can create
their own handling of the PyMarkdownScanFailure instances, customized to their
own needs.
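As one illustration of such custom handling, the sketch below renders a failure in a compact, compiler-style line. The format_scan_failure helper and its output layout are hypothetical; the attribute names are the ones visible in the PyMarkdownScanFailure output shown above.

```python
def format_scan_failure(failure):
    """Render one failure as 'file:line:column: rule_id: description (rule_name)'."""
    text = (
        f"{failure.scan_file}:{failure.line_number}:{failure.column_number}: "
        f"{failure.rule_id}: {failure.rule_description}"
    )
    # The extra information may be empty or carry leading whitespace.
    extra = failure.extra_error_information.strip()
    if extra:
        text += f" {extra}"
    return f"{text} ({failure.rule_name})"
```

A caller could then report every issue with a loop such as: for failure in scan_result.scan_failures: print(format_scan_failure(failure)).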
According to Wikipedia, which had the most comprehensive information on pragmas, a pragma is "a language construct that specifies how a... translator should process its input." For the PyMarkdown project, pragmas allow for the suppression of rules that would otherwise be triggered by PyMarkdown itself.
Reusing the sample.md file from the last section, if we change its content to the
following text:
This is a title
and rescan the file, PyMarkdown produces the following output:
[PyMarkdownScanFailure(scan_file='sample.md', line_number=1, column_number=1,
rule_id='MD041', rule_name='first-line-heading,first-line-h1',
rule_description='First line in file should be a top level heading', extra_error_information='')]
[]
If we do not want to suppress this failure for every reporting of this issue, a pragma can be added to an individual file to just disable that one instance of the failure being reported. To fix up the above text snippet, we would change that snippet to:
<!-- pyml disable-next-line first-line-heading -->
This is a title
which will result in the PyMarkdown API returning no scan failures and no pragma failures.
But, as with any tool, there are error cases. That is where the handling of pragma
failures comes in. Once the <!-- pyml text or <!--- pyml text is detected, the
pragma is extracted from the token stream for later processing. When the document
is finished, the pragmas are parsed to see if they are validly formed and specify
that the reporting of a scan failure should be ignored. PyMarkdown considers any
invalidly formed pragmas to be failures in the same class as scan failures. That
is to say that the failures are reported, but do not stop the parsing and linting
of the Markdown files.
To see an example of such a failure, change the contents of sample.md to the
following text and rescan the file:
<!-- pyml disable-next-line invalid-->
This is a title
When the file is scanned, the following results should be reported:
[PyMarkdownScanFailure(scan_file='sample.md', line_number=2, column_number=1,
rule_id='MD041', rule_name='first-line-heading,first-line-h1',
rule_description='First line in file should be a top level heading', extra_error_information='')]
[PyMarkdownPragmaError(file_path='sample.md', line_number=1,
pragma_error="Inline configuration command 'disable-next-line' specified a plugin with a blank id.")]
Because the pragma was not constructed properly, it correctly reported that a blank
plugin id was parsed. Furthermore, as the pragma was not properly constructed, it
did not suppress rule MD041 from reporting a failure and the failure was therefore
reported.
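Because pragma errors arrive in their own list, a caller may want to report them separately from scan failures. The sketch below assumes only the three attributes visible in the PyMarkdownPragmaError output above (file_path, line_number, and pragma_error); the describe_pragma_errors helper is hypothetical.

```python
def describe_pragma_errors(pragma_errors):
    """Return one 'file:line: message' string per pragma error, in file/line order."""
    ordered = sorted(pragma_errors, key=lambda error: (error.file_path, error.line_number))
    return [
        f"{error.file_path}:{error.line_number}: {error.pragma_error}"
        for error in ordered
    ]
```

For example, the lines returned by describe_pragma_errors(scan_result.pragma_errors) could be printed to stderr ahead of the scan failures, since a malformed pragma often explains why a failure was not suppressed.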
The above sections all refer to the scan_path function that is used to scan one
or more paths that exist within the operating system. As some of our own future
scenarios include being able to scan an in-memory string object, we added support
for that usage with the scan_string function.
The scan_string function takes a single parameter: the actual Markdown to scan,
instead of a path to one or more files to scan. Aside from that difference, the
rest of the functionality for that function is identical to the scan_path function.
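A small sketch of scanning in-memory text follows. It assumes, based on the statement above that scan_string otherwise behaves like scan_path, that the returned object exposes the same scan_failures list. The triggered_rule_ids helper and its optional api parameter are hypothetical conveniences, and a PyMarkdownApiException may still be raised and should be handled as shown earlier.

```python
def triggered_rule_ids(markdown_text, api=None):
    """Return the sorted, de-duplicated rule ids reported for markdown_text."""
    if api is None:
        # Imported lazily so a stand-in object can be supplied instead.
        from pymarkdown.api import PyMarkdownApi

        api = PyMarkdownApi()
    scan_result = api.scan_string(markdown_text)
    return sorted({failure.rule_id for failure in scan_result.scan_failures})
```

Called as triggered_rule_ids("This is a title\n"), this would be expected to include MD041, matching the earlier path-based example, although the exact set depends on the installed rules and configuration.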
In addition to the scan_string function, there are certain times during debugging
where we want to verify that PyMarkdown is scanning the files that we believe we
specified in the path argument. To that end, we constructed the list_path function
to be essentially equivalent to the scan --list-files arguments for PyMarkdown.
While this function (more completely documented under List Files To Scan)
may not seem useful at first glance, it has saved us time during debugging sessions
on more than one occasion. As no parsing of any found Markdown documents occurs,
the list_path function returns an instance of the PyMarkdownListPathResult
object that only holds the paths of any files that are eligible to scan.
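A debugging helper built on list_path might look like the sketch below. The attribute name matching_files on PyMarkdownListPathResult is an assumption that should be checked against the generated API documentation, and the eligible_files helper itself is hypothetical.

```python
def eligible_files(path_to_scan, api=None):
    """Return the sorted paths that PyMarkdown would scan for path_to_scan."""
    if api is None:
        # Imported lazily so a stand-in object can be supplied instead.
        from pymarkdown.api import PyMarkdownApi

        api = PyMarkdownApi()
    list_result = api.list_path(path_to_scan)
    # ASSUMPTION: the result exposes its paths as `matching_files`.
    return sorted(list_result.matching_files)
```

Printing the returned list before calling scan_path is a quick way to confirm that a glob or directory argument selects the files that were intended.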
We had planned to complete the documentation for the remaining functions, but we were eager to get the basic API and documentation out so our users could start using the API. The majority of our API mirrors functionality that is available on the command line, and hopefully does not need much explanation (yet).
- version APIs for the PyMarkdownApi object
  - application_version is equivalent to pymarkdown version
  - interface_version is new and is currently 1
- Advanced Configuration - Configuration
  - configuration_file_path is the equivalent of the --config {file} argument
  - enable_strict_configuration is the equivalent of the --strict-config argument and is more thoroughly covered in Specifying Strict Configuration Mode
  - set_property is the equivalent of the --set {key}={value} argument
  - set_boolean_property, set_integer_property, and set_string_property are functions that automatically take care of translating these types of properties according to their Configuration Property Types
- Logging
  - log is the equivalent of the --log-level argument
  - log_*_and_above are shortcuts to using the log function, with the log-level already in place
    - i.e. log_info_and_above is equivalent to log("INFO")
  - log_to_file is the equivalent of the --log-file argument
  - enable_stack_trace is the equivalent of the --stack-trace argument
- Advanced Configuration - Plugins
  - add_plugin_path is the equivalent of the --add-plugin argument
  - disable_rule_by_identifier is the equivalent of the --disable-rules argument
  - enable_rule_by_identifier is the equivalent of the --enable-rules argument
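Until those functions are documented in full, the sketch below shows one way they might be combined. It rests on several assumptions: that log_info_and_above takes no arguments, that disable_rule_by_identifier takes a single rule identifier string, and that both configure the PyMarkdownApi instance in place. The build_configured_api helper and its api_factory parameter are hypothetical.

```python
def build_configured_api(disabled_rule_ids, api_factory=None):
    """Create an API object with INFO logging and the given rules disabled."""
    if api_factory is None:
        # Imported lazily so a stand-in factory can be supplied instead.
        from pymarkdown.api import PyMarkdownApi

        api_factory = PyMarkdownApi
    api = api_factory()
    # ASSUMPTION: no arguments, per the log("INFO") equivalence listed above.
    api.log_info_and_above()
    for rule_id in disabled_rule_ids:
        # ASSUMPTION: one identifier per call, mirroring --disable-rules.
        api.disable_rule_by_identifier(rule_id)
    return api
```

A call such as build_configured_api(["MD041"]).scan_path("sample.md") would then be a rough programmatic counterpart to passing --disable-rules on the command line.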