This package is a lightweight version of ParTools, requiring no third-party dependencies. It includes a reworked logging feature, as well as string and file handling features. You can mainly use it for:
- Logging information in a file
- Loading / saving text files
- Listing files
- Loading / saving csv files (lighter and more performant than the built-in csv module)
- Creating hash strings and random strings, and comparing strings with a wildcard character
- Comparing files and lists
- Listing and removing duplicates from a list
```
pip install parutils
```
You can start by testing the logger with the following code:
```python
import parutils as u

u.Logger()  # initializes a log file
u.log('Hello World')  # logs something in the console and in the log file
```
You should then see something like this in the console:
```
Log file initialised (c:\Dev\ParUtils\log\20221213_072132.txt)
CWD: c:\Dev\ParUtils
Python interpreter path: C:\Python\python.exe
Python version: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
ParUtils version: 1.0.8
07:21:32 - Hello World
```
ParUtils provides generic functions meant to be reused by external packages. In this section, a few of these functions are listed. For an exhaustive list, you can check out parutils/__init__.py.
Manipulating files
- save_list: saves a list into a text file
- load_txt: loads a text file into a string or a list
- load_csv: loads a csv file into a list of lists
- save_csv: saves a list of lists into a csv file
- list_files: lists files in a given directory (with the option to recurse into subdirectories)
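As a quick sketch of how these can be combined (file names are illustrative and exact signatures may differ, so check parutils/__init__.py):

```python
import parutils as u

u.save_list(['alpha', 'beta'], 'out.txt')                # one element per line
txt = u.load_txt('out.txt')                              # file content as a string (or a list, see doc)
u.save_csv([['id', 'name'], ['1', 'alpha']], 'out.csv')  # one csv line per sub-list
rows = u.load_csv('out.csv')                             # back to a list of lists
files = u.list_files('.')                                # files in the current directory
```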
Manipulating strings
- like: behaves as the LIKE of Oracle SQL (you can match strings with the wildcard character '*'). It returns a re.match object giving you access to the matched wildcard strings. Example: `m = like('Hello World', 'He*o w*d')`, then `m.group(1)` => 'll'
- like_list / like_dict: apply the like function directly to lists or dictionaries (see doc)
- big_number: converts a potentially big number into a readable string. Example: `big_number(10000000)` => '10 000 000'
- get_duration_string: outputs a string representing the time elapsed since the input start_time. Example: `get_duration_string(0, end_time=200)` => '3 minutes and 20 seconds'
- hash512: creates a non-randomised hash string from a string
- gen_random_string: generates a random string
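In code, the examples above look like this (assuming these functions are exported at the package level, which parutils/__init__.py should confirm):

```python
import parutils as u

m = u.like('Hello World', 'He*o w*d')          # Oracle-style LIKE with '*' wildcards
print(m.group(1))                              # => 'll'
print(u.big_number(10000000))                  # => '10 000 000'
print(u.get_duration_string(0, end_time=200))  # => '3 minutes and 20 seconds'
print(u.hash512('Hello World'))                # deterministic (non-randomised) hash string
print(u.gen_random_string())                   # random string
```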
Data quality features:
- diff_list: compares two lists
- file_match: compares two files
- find_dup_list: finds duplicates in a list
- del_dup_list: removes duplicates from a list
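For illustration (input lists and file names are made up; see the doc for the exact return values):

```python
import parutils as u

u.diff_list(['a', 'b', 'c'], ['a', 'c', 'd'])  # compares the two lists
u.file_match('f1.txt', 'f2.txt')               # compares the two files
u.find_dup_list([1, 2, 2, 3])                  # duplicates found in the list
u.del_dup_list([1, 2, 2, 3])                   # the list with duplicates removed
```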
The log function and the Logger class are directly available from the parutils package. So you can do:
```python
import parutils as u

u.Logger()
u.log('Hello World')
```
Note: if you want the log function to actually write to a log file, you have to create a Logger object before using it; otherwise it will just print the log info to the console.
The relevant parameters, such as the log directory or the log format, can be specified when initializing the Logger object. The default log_format is '%H:%M:%S -', and a default log line looks like this:
```
19:45:04 - This line has been generated by the parutils.log function
```
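For example, a minimal sketch assuming log_format is the keyword argument name accepted by the Logger constructor (as the paragraph above suggests):

```python
import parutils as u

# Assumed: log_format follows time.strftime codes, as the default '%H:%M:%S -' suggests
u.Logger(log_format='%Y-%m-%d %H:%M:%S -')
u.log('Hello World')
```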
Note that the default constants for the logging subpackage are stored in parutils.logging.const. So, for example, if you want to override the default value for the logging directory, you can do:
```python
import parutils as u

u.logging.const.DEFAULT_DIR = '<my_custom_dir>'
```
The step_log function allows you to log information only when the input counter is a multiple of the input step. Thus, step_log is meant to be used in loops to track the progress of long processes, such as reading or writing millions of lines in a file. The what input expects a description of what is being counted. Its default value is 'lines written'.
In order to correctly measure the elapsed time for the first log line, the step_log function has to be initialised by running init_sl_timer().
So for example, if you input step=500 and don't input any what value, you should get something like this:
```
19:45:04 - 500 lines written in 3 ms. 500 lines written in total.
19:45:04 - 500 lines written in 2 ms. 1 000 lines written in total.
19:45:04 - 500 lines written in 2 ms. 1 500 lines written in total.
```
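A typical loop looks like the sketch below (this assumes step_log and init_sl_timer are exposed at the package level like log; if not, import them from the parutils logging subpackage):

```python
import parutils as u

u.Logger()
u.init_sl_timer()           # start the timer so the first step_log duration is correct
for i in range(1500):
    # ... write one line to a file ...
    u.step_log(i + 1, 500)  # logs every 500 iterations; 'what' defaults to 'lines written'
```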
Check out the test_logging.py file in tests/logging for simple examples of use.
If the logger is initialized, the log information is written to a file each time you call the log function. This file writing can fail, especially if your log file is located on a network drive.
ParUtils has a built-in retry mechanism for these log file writing failures. Here is how it works:
Whenever a log write fails, a warning like the following is printed:

```
Warning: the following message couldn't be logged because of <error>: <logged message>
```
This warning is stored in a buffer, and the logger will try to write this buffer to the log file the next time it has something to log. If the next try also fails, another warning is printed and added to the buffer, and so on, up to a certain limit. When the size of the buffer exceeds 10, the logger stops trying to write to the file and just prints the logs to the console.
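Reduced to a rough standalone sketch, the mechanism looks like this (illustrative code only, not the actual parutils implementation; the limit of 10 comes from the paragraph above):

```python
buffer = []  # warnings that could not be written yet

def write_line(path, msg):
    if len(buffer) > 10:
        print(msg)  # too many failures: console only from now on
        return
    try:
        with open(path, 'a') as f:
            for pending in buffer:  # flush previously failed messages first
                f.write(pending + '\n')
            f.write(msg + '\n')
        buffer.clear()
    except OSError as e:
        warning = f"Warning: the following message couldn't be logged because of {e}: {msg}"
        print(warning)
        buffer.append(warning)
```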
Because each logged message implies opening a file, writing to it, and closing it, logging can severely affect performance, especially when logging to a network location or, more generally, to a slow drive. The log_every argument of the Logger constructor mitigates this by writing to the file only every <log_every> log messages, using a buffer mechanism. By default, log_every is set to 1, so the log file is written for every log message. If you set log_every to 10, for example, the log file is only written every 10 log entries, while every log entry is still printed live to the console.
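For example:

```python
import parutils as u

u.Logger(log_every=10)  # flush to the log file only every 10 messages
for i in range(100):
    u.log(f'line {i}')  # printed to the console immediately; file written on every 10th call
```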