Module strings

Utilities for working with strings.

New in version 0.2.0.

Overview

BINARY_PREFIXES

List with binary prefixes.

BOOLEAN_STATES

Dictionary with mappings from strings to boolean values.

DECIMAL_PREFIXES

List with decimal prefixes.

NO_PREFIX

No prefix

Prefix

Class for prefixes with fields name, symbol, factor.

TranslationTable

Translation table for str.translate() that rejects unexpected characters.

bin_prefix

Get an appropriate binary prefix for an integer number.

dec_prefix

Get an appropriate decimal prefix for a number.

find_bin_prefix

Find binary prefix for name or symbol.

find_dec_prefix

Find decimal prefix for name or symbol.

format_bin_prefix

Format a number with a binary prefix.

format_dec_prefix

Format a number with a decimal prefix.

format_timedelta

Format a time delta.

insert_separator

Insert separators in a string.

int2str

Convert an integer to a string.

is_hexdigit

Check if all characters are hexadecimal digits.

parse_timedelta

Parse a string as a time delta according to a format.

purge

Purge characters from a string.

shorten

Shorten the text to fit in the given width.

slugify

Slugify a string.

split_host_port

Split a string into host and port.

str2bool

Convert a string to a boolean value.

str2port

Convert a string to a network port number.

str2tuple

Convert a string to a tuple.

walign

Align a string correctly even if some characters occupy zero or two columns in a terminal.

wlen

Determine the number of columns a string actually occupies in a terminal.

wshorten

Shorten the text to fit in the given width even if some characters occupy zero or two columns in a terminal.

salmagundi.strings.BOOLEAN_STATES

Dictionary with mappings from strings to boolean values.

Used by the function str2bool().

This dictionary can be modified. The default values are:
  • True: '1', 'yes', 'true', 'on'

  • False: '0', 'no', 'false', 'off'

These are the same as in configparser.ConfigParser.BOOLEAN_STATES.

salmagundi.strings.NO_PREFIX

No prefix

salmagundi.strings.BINARY_PREFIXES

List with binary prefixes.

The entries in this list are of type Prefix.

Name     Symbol   Factor
yobi     Yi       2^80
zebi     Zi       2^70
exbi     Ei       2^60
pebi     Pi       2^50
tebi     Ti       2^40
gibi     Gi       2^30
mebi     Mi       2^20
kibi     Ki       2^10
(none)   (none)   1

salmagundi.strings.DECIMAL_PREFIXES

List with decimal prefixes.

The entries in this list are of type Prefix.

Name     Symbol   Factor
yotta    Y        10^24
zetta    Z        10^21
exa      E        10^18
peta     P        10^15
tera     T        10^12
giga     G        10^9
mega     M        10^6
kilo     k        10^3
hecto    h        10^2
deca     da       10^1
(none)   (none)   1
deci     d        10^-1
centi    c        10^-2
milli    m        10^-3
micro    µ        10^-6
nano     n        10^-9
pico     p        10^-12
femto    f        10^-15
atto     a        10^-18
zepto    z        10^-21
yocto    y        10^-24

class salmagundi.strings.Prefix(name, symbol, factor)

Class for prefixes with fields name, symbol, factor.

Used by the *_prefix() functions.

class salmagundi.strings.TranslationTable(mapped_chars, unmapped_chars='', delete_chars='')

Translation table for str.translate() that rejects unexpected characters.

This class is for use with str.translate(). If a character is not in mapped_chars, unmapped_chars, or delete_chars, a ValueError is raised. See also: str.maketrans().

Parameters
  • mapped_chars (dict(str, str)) – mapping from character to replacement string

  • unmapped_chars (str) – string with characters that will not be replaced

  • delete_chars (str) – string with characters that will be deleted

Returns

the resulting string

Return type

str

Raises

ValueError – if a character is not allowed

New in version 0.9.0.
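str.translate() accepts any object with __getitem__(): returning a string maps a character, returning None deletes it, and raising LookupError leaves it unchanged, while any other exception propagates to the caller. That is enough to build a table that rejects unexpected characters. StrictTable below is a hypothetical illustration of the idea, not salmagundi's implementation.

```python
class StrictTable:
    """Hypothetical sketch of a translation table that rejects unknown characters."""

    def __init__(self, mapped_chars, unmapped_chars='', delete_chars=''):
        # str.translate() indexes the table with code points (ints), not characters
        self._table = {ord(k): v for k, v in mapped_chars.items()}
        self._table.update((ord(c), c) for c in unmapped_chars)
        self._table.update((ord(c), None) for c in delete_chars)

    def __getitem__(self, codepoint):
        try:
            return self._table[codepoint]
        except KeyError:
            # ValueError is not a LookupError, so str.translate() does not swallow it
            raise ValueError(f'character not allowed: {chr(codepoint)!r}') from None


table = StrictTable({'ä': 'ae', 'ö': 'oe'}, unmapped_chars='abcr', delete_chars='-')
print('bä-r'.translate(table))  # baer
```

Any character outside the three groups, such as 'x' here, makes 'x'.translate(table) raise ValueError instead of passing through silently.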

salmagundi.strings.bin_prefix(value)

Get an appropriate binary prefix for an integer number.

Parameters

value (int) – the number

Returns

binary prefix

Return type

Prefix

Raises

TypeError – if value is not an integer
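The docs do not define "appropriate". The format_bin_prefix() example (1024**2+1024 is shown in Mi) is consistent with picking the largest prefix whose factor does not exceed the value, and the sketch below assumes that rule. Prefix, BIN_PREFIXES, and pick_bin_prefix are hypothetical stand-ins, not the library's code; using abs() for negative numbers is also an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for salmagundi.strings.Prefix (field names from the docs)
Prefix = namedtuple('Prefix', 'name symbol factor')

# Entries of BINARY_PREFIXES, largest first, followed by a "no prefix" entry
BIN_PREFIXES = [Prefix(name, symbol, 2 ** exp) for name, symbol, exp in [
    ('yobi', 'Yi', 80), ('zebi', 'Zi', 70), ('exbi', 'Ei', 60),
    ('pebi', 'Pi', 50), ('tebi', 'Ti', 40), ('gibi', 'Gi', 30),
    ('mebi', 'Mi', 20), ('kibi', 'Ki', 10),
]] + [Prefix('', '', 1)]


def pick_bin_prefix(value):
    """Return the largest prefix whose factor does not exceed abs(value)."""
    if not isinstance(value, int):
        raise TypeError('value must be an integer')
    for prefix in BIN_PREFIXES:
        if abs(value) >= prefix.factor:
            return prefix
    return BIN_PREFIXES[-1]  # only reached for 0


print(pick_bin_prefix(1024**2 + 1024).symbol)  # Mi
```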

salmagundi.strings.dec_prefix(value, restricted=True)

Get an appropriate decimal prefix for a number.

Parameters
  • value (int or float) – the number

  • restricted (bool) – if True, only integer powers of 1000 are used, i.e. hecto, deca, deci, and centi are skipped

Returns

decimal prefix

Return type

Prefix

Raises

TypeError – if value is not of type int or float
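Assuming the same "largest factor not exceeding the value" rule, restricted=True simply drops the prefixes whose exponent is not a multiple of 3. The sketch reproduces the choices visible in the format_dec_prefix() examples (0.012 gives milli when restricted and centi otherwise). The names are hypothetical, and falling back to yocto for values below 10^-24 is an assumption.

```python
from collections import namedtuple

Prefix = namedtuple('Prefix', 'name symbol factor')  # hypothetical stand-in

# (name, symbol, exponent) for DECIMAL_PREFIXES, largest first
_DEC = [('yotta', 'Y', 24), ('zetta', 'Z', 21), ('exa', 'E', 18),
        ('peta', 'P', 15), ('tera', 'T', 12), ('giga', 'G', 9),
        ('mega', 'M', 6), ('kilo', 'k', 3), ('hecto', 'h', 2),
        ('deca', 'da', 1), ('', '', 0), ('deci', 'd', -1),
        ('centi', 'c', -2), ('milli', 'm', -3), ('micro', 'µ', -6),
        ('nano', 'n', -9), ('pico', 'p', -12), ('femto', 'f', -15),
        ('atto', 'a', -18), ('zepto', 'z', -21), ('yocto', 'y', -24)]


def pick_dec_prefix(value, restricted=True):
    """Return the largest prefix whose factor does not exceed abs(value)."""
    if not isinstance(value, (int, float)):
        raise TypeError('value must be int or float')
    # restricted: keep only exponents divisible by 3 (integer powers of 1000)
    candidates = [Prefix(n, s, 10 ** e) for n, s, e in _DEC
                  if not restricted or e % 3 == 0]
    if value == 0:
        return Prefix('', '', 1)
    for prefix in candidates:
        if abs(value) >= prefix.factor:
            return prefix
    return candidates[-1]  # smaller than every factor: use the smallest prefix


print(pick_dec_prefix(0.012).symbol, pick_dec_prefix(0.012, restricted=False).symbol)  # m c
```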

salmagundi.strings.find_bin_prefix(s)

Find binary prefix for name or symbol.

Parameters

s (str) – name (case-insensitive) or symbol (case-sensitive)

Returns

binary prefix or None if not found

Return type

Prefix or None

salmagundi.strings.find_dec_prefix(s)

Find decimal prefix for name or symbol.

Parameters

s (str) – name (case-insensitive) or symbol (case-sensitive); instead of the symbol µ the letter u can be used

Returns

decimal prefix or None if not found

Return type

Prefix or None
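Symbols must be matched case-sensitively because several decimal symbols differ only in case (M/m, P/p, E/a is not such a pair, but Z/z and Y/y are). One way to apply the documented matching rules is sketched below; lookup and the three-entry TABLE are hypothetical, and find_bin_prefix() is described with the same rules minus the 'u' alias.

```python
from collections import namedtuple

Prefix = namedtuple('Prefix', 'name symbol factor')  # hypothetical stand-in

# Excerpt of DECIMAL_PREFIXES: the symbols 'M' and 'm' differ only in case
TABLE = [Prefix('mega', 'M', 10 ** 6), Prefix('milli', 'm', 10 ** -3),
         Prefix('micro', 'µ', 10 ** -6)]


def lookup(s, table=TABLE):
    """Match s against names (case-insensitive) or symbols (case-sensitive)."""
    if s == 'u':
        s = 'µ'  # ASCII substitute for the micro sign
    for prefix in table:
        if s == prefix.symbol or s.lower() == prefix.name:
            return prefix
    return None


print(lookup('M').name, lookup('m').name)  # mega milli
```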

salmagundi.strings.format_bin_prefix(num_frmt, value, prefix=None)

Format a number with a binary prefix.

>>> format_bin_prefix('.3f', 1024**2+1024)
'1.001 Mi'
>>> format_bin_prefix('.3f', 1024**2+1024, prefix='Gi')
'0.001 Gi'

Parameters
  • num_frmt (str) – number format string as used with format()

  • value (int) – the number

  • prefix (Prefix or str) – can be a binary prefix object, name, or symbol

Returns

the result of value / prefix.factor formatted according to num_frmt with a space character and prefix.symbol appended

Return type

str

Raises

TypeError – if value is not an integer
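The Returns description is a plain formula. Applied by hand with factors from the prefix tables, it reproduces both format_bin_prefix() examples, and with decimal factors the format_dec_prefix() examples as well. format_with_factor is a hypothetical helper that skips the prefix lookup and takes the factor and symbol directly.

```python
def format_with_factor(num_frmt, value, factor, symbol):
    """value / factor, formatted with format(), then a space and the symbol."""
    return f'{format(value / factor, num_frmt)} {symbol}'


print(format_with_factor('.3f', 1024**2 + 1024, 2**20, 'Mi'))  # 1.001 Mi
print(format_with_factor('.3f', 1024**2 + 1024, 2**30, 'Gi'))  # 0.001 Gi
```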

salmagundi.strings.format_dec_prefix(num_frmt, value, prefix=None, restricted=True)

Format a number with a decimal prefix.

>>> format_dec_prefix('.1f', 0.012)
'12.0 m'
>>> format_dec_prefix('.1f', 0.012, restricted=False)
'1.2 c'

Parameters
  • num_frmt (str) – number format string as used with format()

  • value (int or float) – the number

  • prefix (Prefix or str) – can be a decimal prefix object, name, or symbol; instead of the symbol µ the letter u can be used

  • restricted (bool) – if True, only integer powers of 1000 are used, i.e. hecto, deca, deci, and centi are skipped. Ignored if prefix is set.

Returns

the result of value / prefix.factor formatted according to num_frmt with a space character and prefix.symbol appended

Return type

str

Raises

TypeError – if value is not of type int or float

salmagundi.strings.format_timedelta(fmt_str, delta)

Format a time delta.

The time delta is the difference between two points in time, e.g. a duration.

The delta can be given in seconds as an int or float, or as a datetime.timedelta. It must be >= 0.

A format specifier consists of a '%' character, an optional flag, an optional minimum field width, and a format code.

Supported flags:

Flag   Meaning
'0'    zero-padded
' '    space-padded (default except for 's')

Supported format codes:

Code   Meaning
'D'    days as a decimal number
'H'    hours as a decimal number
'M'    minutes as a decimal number
'S'    seconds as a decimal number
's'    microseconds as a decimal number
'X'    equal to %02H:%02M:%02S
'Y'    equal to %02H:%02M
'Z'    equal to %02M:%02S

The codes 'X', 'Y', 'Z' cannot be used with flags and field widths.

If the field width for microseconds (code 's') is less than 6, the digits are counted from the left, so '%3s' yields milliseconds.

>>> format_timedelta('%03H:%M:%S.%06s', 90078.012345678)
'025:1:18.012346'
>>> format_timedelta('%03H:%M:%S.%05s', 90078.012345678)
'025:1:18.01235'
>>> format_timedelta('%03H:%M:%S.%04s', 90078.012345678)
'025:1:18.0123'
>>> format_timedelta('%Z.%03s', 3678.0123)
'61:18.012'
>>> format_timedelta('%M min %S sec %3s ms', 3678.0123)
'61 min 18 sec 12 ms'
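
Parameters
  • fmt_str (str) – the format string

  • delta (int, float, or datetime.timedelta) – the time delta (in seconds if int or float)

Returns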

the formatted time delta

Return type

str

Raises
  • TypeError – if the given delta is not int, float or timedelta

  • ValueError – if the given delta is negative or width < 1
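The examples suggest that a field absorbs everything above it when no larger unit appears in the format: %03H prints 025 for 90078 s, and %M prints 61 for 3678 s. Assuming that reading, the field values can be computed with divmod on whole microseconds. split_delta is a hypothetical helper that only computes the values; padding, widths, and the rounding of shortened '%s' fields are not covered.

```python
import datetime

# Units in microseconds, largest first
_UNITS = [('D', 86_400_000_000), ('H', 3_600_000_000),
          ('M', 60_000_000), ('S', 1_000_000), ('s', 1)]


def split_delta(delta, codes):
    """Split delta into the units named in codes (a subset of 'DHMSs').

    The largest requested unit absorbs everything above it, e.g. with
    codes='HMS' a delta of 25 hours gives H=25 rather than D=1, H=1.
    """
    if isinstance(delta, (int, float)):
        # timedelta rounds fractional seconds to whole microseconds
        delta = datetime.timedelta(seconds=delta)
    elif not isinstance(delta, datetime.timedelta):
        raise TypeError('delta must be int, float or timedelta')
    if delta < datetime.timedelta(0):
        raise ValueError('delta must be >= 0')
    rest = delta // datetime.timedelta(microseconds=1)  # total microseconds
    fields = {}
    for code, size in _UNITS:
        if code in codes:
            fields[code], rest = divmod(rest, size)
    return fields


print(split_delta(90078.012345678, 'HMSs'))
# {'H': 25, 'M': 1, 'S': 18, 's': 12346}
```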

salmagundi.strings.insert_separator(s, sep, group_size, reverse=False)

Insert separators in a string.

>>> insert_separator('008041aefd7e', ':', 2)
'00:80:41:ae:fd:7e'
>>> insert_separator('aaabbbcccd', ':', 3)
'aaa:bbb:ccc:d'
>>> insert_separator('aaabbbcccd', ':', 3, True)
'a:aab:bbc:ccd'
>>> insert_separator('9783161484100', '-', (3, 1, 2, 6))
'978-3-16-148410-0'

Parameters
  • s (str) – the string

  • sep (str) – the separator character(s)

  • group_size (int or sequence of ints) – the size of each group

  • reverse (bool) – if True, group from right to left instead of from left to right

Returns

string with separators

Return type

str

Raises

ValueError – if group_size < 1

New in version 0.5.0.

Changed in version 0.6.0: Add parameter reverse

Changed in version 0.9.0: Groups can be of different sizes
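The four examples pin down the grouping rules fairly well: a single int is repeated, a sequence of sizes is applied in order with any leftover characters forming a final group, and reverse=True groups from the right. insert_sep is a hypothetical sketch of those rules; how reverse combines with a sequence of sizes (here: sizes applied from the right) is an assumption.

```python
def insert_sep(s, sep, group_size, reverse=False):
    """Split s into groups and join them with sep (hypothetical sketch)."""
    if isinstance(group_size, int):
        sizes = None  # the same size is repeated for the whole string
        if group_size < 1:
            raise ValueError('group_size must be >= 1')
    else:
        sizes = list(group_size)
        if any(n < 1 for n in sizes):
            raise ValueError('group sizes must be >= 1')
    # Grouping from the right is grouping the reversed string from the left
    text = s[::-1] if reverse else s
    groups, i = [], 0
    while i < len(text):
        if sizes is None:
            n = group_size
        elif len(groups) < len(sizes):
            n = sizes[len(groups)]
        else:
            n = len(text) - i  # leftover characters form the last group
        groups.append(text[i:i + n])
        i += n
    if reverse:
        # Restore the original order of the groups and of their characters
        groups = [g[::-1] for g in reversed(groups)]
    return sep.join(groups)


print(insert_sep('9783161484100', '-', (3, 1, 2, 6)))  # 978-3-16-148410-0
```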

salmagundi.strings.int2str(n, base)

Convert an integer to a string.

For base > 10, lowercase letters are used as digits above 9.

See also the built-in functions bin(), oct(), hex().

Parameters
  • n (int) – the integer

  • base (int) – the base (2 <= base <= 36)

Returns

converted integer

Return type

str

Raises
  • TypeError – if n or base is not an integer

  • ValueError – if base is outside the allowed range

New in version 0.7.0.
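Converting to an arbitrary base is the classic repeated-division algorithm: divmod() yields the least significant digit first, so the digits are reversed at the end. to_base is a hypothetical sketch; the leading '-' for negative numbers is an assumption, since the docs do not say how negatives are handled. The built-in int(text, base) performs the inverse conversion.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # '0'-'9' then 'a'-'z'


def to_base(n, base):
    """Convert n to a string in the given base by repeated division."""
    if not isinstance(n, int) or not isinstance(base, int):
        raise TypeError('n and base must be integers')
    if not 2 <= base <= 36:
        raise ValueError('base must be in [2..36]')
    if n == 0:
        return '0'
    sign, n = ('-', -n) if n < 0 else ('', n)  # assumption: leading '-'
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return sign + ''.join(reversed(digits))


print(to_base(255, 16), to_base(-35, 36))  # ff -z
```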

salmagundi.strings.is_hexdigit(s)

Check if all characters are hexadecimal digits.

Parameters

s (str) – the string

Returns

True if all characters in the string are hexadecimal digits and there is at least one character

Return type

bool

New in version 0.5.0.
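Python's str type has no hexadecimal counterpart to isdigit(). A set-membership test over string.hexdigits is one simple way to express the same check; the explicit emptiness test matters because all() returns True for an empty string. all_hex is a hypothetical name, not the library's code.

```python
import string

_HEX = frozenset(string.hexdigits)  # '0123456789abcdefABCDEF'


def all_hex(s):
    """True if s is non-empty and every character is a hexadecimal digit."""
    return bool(s) and all(c in _HEX for c in s)


print(all_hex('00fF'), all_hex(''))  # True False
```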

salmagundi.strings.parse_timedelta(string, fmt_str)

Parse a string as a time delta according to a format.

The time delta is the difference between two points in time, e.g. a duration.

Note

The format specifier '%s' for microseconds can only be used for the fractional part of a second: the string '1' means 100000 µs, whereas '000001' means 1 µs.

>>> parse_timedelta('03:21.001', '%M:%02S.%s')
datetime.timedelta(seconds=201, microseconds=1000)
>>> str(_)
'0:03:21.001000'

Parameters
  • string (str) – the string to parse

  • fmt_str (str) – the format string

Returns

timedelta object

Return type

datetime.timedelta

Raises

ValueError – if the string cannot be parsed
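The note on '%s' amounts to padding the digits on the right to six places before reading them as microseconds. frac_to_us is a hypothetical helper; rejecting more than six digits is an assumption the docs do not state.

```python
import datetime


def frac_to_us(frac):
    """Interpret up to six ASCII digits as the fractional part of a second."""
    if not (frac.isascii() and frac.isdigit()) or len(frac) > 6:
        raise ValueError(f'not a fraction of a second: {frac!r}')
    # Pad on the right: '1' means .1 s, '001' means .001 s
    return int(frac.ljust(6, '0'))


delta = datetime.timedelta(minutes=3, seconds=21, microseconds=frac_to_us('001'))
print(delta)  # 0:03:21.001000
```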

salmagundi.strings.purge(s, chars=None, negate=False)

Purge characters from a string.

Each character in chars will be eliminated from the string.

If chars is None or '', each run of consecutive whitespace characters is replaced by a single space.

If negate=True, all characters not in chars are purged instead (this only applies if chars is not empty).

>>> purge('00:80:41:ae:fd:7e', ':')
'008041aefd7e'

Parameters
  • s (str) – the string

  • chars (str) – the characters to purge (or to keep if negate=True)

  • negate (bool) – if True, purge all characters that are not in chars

Returns

the purged string

Return type

str

New in version 0.5.0.
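The three modes map onto standard tools: re.sub() for collapsing whitespace, a generator filter for negate=True, and str.maketrans() with a deletion string otherwise. purge_sketch is hypothetical; that leading and trailing whitespace is collapsed rather than stripped is an assumption.

```python
import re


def purge_sketch(s, chars=None, negate=False):
    """Hypothetical re-implementation of the rules described for purge()."""
    if not chars:
        # None or '': collapse every whitespace run into a single space
        return re.sub(r'\s+', ' ', s)
    if negate:
        # keep only the characters listed in chars
        return ''.join(c for c in s if c in chars)
    # delete every character listed in chars
    return s.translate(str.maketrans('', '', chars))


print(purge_sketch('00:80:41:ae:fd:7e', ':'))  # 008041aefd7e
```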

salmagundi.strings.shorten(text, width=80, placeholder='…', pos='right')

Shorten the text to fit in the given width.

If len(text) <= width, the text is returned unchanged.

>>> text = 'Lorem ipsum dolor sit amet'
>>> shorten(text, width=15)
'Lorem ipsum do…'
>>> shorten(text, width=15, placeholder=' ... ', pos='middle')
'Lorem ...  amet'
>>> shorten(text, width=15, pos='left')
'…dolor sit amet'

Parameters
  • text (str) – the text

  • width (int) – the width

  • placeholder (str) – the placeholder

  • pos (str) – position ('left', 'middle', 'right') of placeholder in text

Returns

the shortened text

Return type

str

Raises

ValueError – if width < len(placeholder) or pos is unknown
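The three placeholder positions reduce to slicing: width - len(placeholder) characters of the text survive, taken from the start, the end, or both. shorten_sketch is hypothetical; validating width and pos before the length check, and giving an odd leftover character to the head in 'middle' mode, are assumptions.

```python
def shorten_sketch(text, width=80, placeholder='…', pos='right'):
    """Cut text to width characters and mark the cut with placeholder."""
    if pos not in ('left', 'middle', 'right'):
        raise ValueError(f'unknown pos: {pos!r}')
    if width < len(placeholder):
        raise ValueError('width < len(placeholder)')
    if len(text) <= width:
        return text
    keep = width - len(placeholder)  # characters of text that survive
    if pos == 'right':
        return text[:keep] + placeholder
    if pos == 'left':
        return placeholder + text[len(text) - keep:]
    head = (keep + 1) // 2  # assumption: an odd character goes to the head
    return text[:head] + placeholder + text[len(text) - keep + head:]


print(shorten_sketch('Lorem ipsum dolor sit amet', width=15, pos='left'))  # …dolor sit amet
```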

salmagundi.strings.slugify(s)

Slugify a string.

In the resulting slug, whitespace characters are replaced by dashes (-), and runs of consecutive dashes are collapsed into a single dash.

Parameters

s (str) – the string

Returns

slugified string

Return type

str

New in version 0.9.0.
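The two documented steps can be expressed as two regular-expression substitutions. slug_sketch is hypothetical and covers only what the description states; whether the library also changes case or removes other characters is not documented here.

```python
import re


def slug_sketch(s):
    """Replace each whitespace character by '-', then collapse runs of dashes."""
    return re.sub(r'-{2,}', '-', re.sub(r'\s', '-', s))


print(slug_sketch('hello  big - world'))  # hello-big-world
```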

salmagundi.strings.split_host_port(s, port=None)

Split a string into host and port.

>>> split_host_port('example.com:42', 21)
('example.com', 42)
>>> split_host_port('example.com', 21)
('example.com', 21)

Parameters
  • s (str) – the string to split

  • port (int) – port that is used if there is none in the string

Returns

host and port number

Return type

str, int

Raises

ValueError – if port or the port number in s is not in [0..65535], or if neither is given

New in version 0.5.0.
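Splitting at the last colon with str.rpartition() and falling back to the port argument reproduces both examples. split_hp is a hypothetical sketch; IPv6 addresses are not handled (a bare '::1' would be mis-split), and the library's behavior for them is not documented here.

```python
def split_hp(s, port=None):
    """Split 'host[:port]'; use port when s contains none (sketch)."""
    host, colon, port_str = s.rpartition(':')
    if not colon:
        host, port_str = s, ''
    # int() itself raises ValueError for non-numeric port strings
    number = int(port_str) if port_str else port
    if number is None:
        raise ValueError('no port given')
    if not 0 <= number <= 65535:
        raise ValueError(f'port out of range: {number}')
    return host, number


print(split_hp('example.com:42', 21))  # ('example.com', 42)
```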

salmagundi.strings.str2bool(s)

Convert a string to a boolean value.

The string is converted to lowercase before it is looked up in the BOOLEAN_STATES dictionary.

Parameters

s (str) – the string

Returns

a boolean value

Return type

bool

Raises

ValueError – if the string is not in BOOLEAN_STATES
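The lookup is a lowercase dictionary access that turns KeyError into ValueError. The sketch keeps its own copy of the default table (the same as configparser.ConfigParser.BOOLEAN_STATES); to_bool and STATES are hypothetical names. Because the input is lowercased, any keys added to the table must be lowercase too.

```python
STATES = {'1': True, 'yes': True, 'true': True, 'on': True,
          '0': False, 'no': False, 'false': False, 'off': False}


def to_bool(s, states=STATES):
    """Look up s.lower() in states; unknown strings raise ValueError."""
    try:
        return states[s.lower()]
    except KeyError:
        raise ValueError(f'not a boolean: {s!r}') from None


# Extending the table, similar to modifying BOOLEAN_STATES
extended = {**STATES, 'y': True, 'n': False}

print(to_bool('YES'), to_bool('Off'), to_bool('Y', extended))  # True False True
```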

salmagundi.strings.str2port(s)

Convert a string to a network port number.

Parameters

s (str) – the string to convert

Returns

port number

Return type

int

Raises

ValueError – if s cannot be converted to a number in [0..65535]

New in version 0.5.0.

salmagundi.strings.str2tuple(s, sep=',', converter=None, *, maxsplit=-1)

Convert a string to a tuple.

If converter is given and not None, it must be a callable that takes a string and returns an object of the required type; otherwise, a tuple of strings is returned.

>>> str2tuple('1, 2, 3,4', converter=int)
(1, 2, 3, 4)
>>> str2tuple('on, off, no, true, YES')
('on', 'off', 'no', 'true', 'YES')
>>> str2tuple('on, off, no, true, YES', converter=str2bool)
(True, False, False, True, True)
>>> str2tuple('a, b, , d')
('a', 'b', '', 'd')

Parameters
  • s (str) – the string

  • sep (str) – the separator (whitespace around sep will be ignored)

  • converter (callable(str)) – the converter function

  • maxsplit (int) – maximum number of splits (-1 means no limit)

Returns

tuple with elements of the required type

Return type

tuple

Changed in version 0.14.0: Add parameter maxsplit
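The behavior shown in the examples follows from str.split(sep, maxsplit) plus stripping each part. to_tuple is a hypothetical sketch; stripping each part also removes whitespace at the very start and end of s, which is an assumption about how "whitespace around sep" is treated.

```python
def to_tuple(s, sep=',', converter=None, *, maxsplit=-1):
    """Split s at sep, strip whitespace around each part, optionally convert."""
    parts = [part.strip() for part in s.split(sep, maxsplit)]
    if converter is not None:
        parts = [converter(part) for part in parts]
    return tuple(parts)


print(to_tuple('1, 2, 3,4', converter=int))  # (1, 2, 3, 4)
```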

salmagundi.strings.walign(text, width, fill=' ', align='center')

Align a string correctly even if some characters occupy zero or two columns in a terminal.

The string methods center(), ljust(), and rjust() only consider the number of characters when aligning a string. If a string contains characters that occupy zero or two columns (see wlen()), the result is wrong when displayed in a terminal:

s = 'abcde'
w = 'は日本競馬'
wlen(s) -> 5
wlen(w) -> 10
s.center(12, '.') -> 12 columns (correct)
w.center(12, '.') -> 17 columns (wrong)
walign(s, 12, '.', 'center') -> 12 columns (correct)
walign(w, 12, '.', 'center') -> 12 columns (correct)

Parameters
  • text (str) – the string

  • width (int) – number of columns wherein to align the string

  • fill (str) – fill character

  • align (str) – alignment (one of 'center', 'left', 'right')

Returns

aligned string (if wlen(text) >= width the original string is returned)

Raises

ValueError – if length could not be determined, align is unknown or wlen(fill) != 1

New in version 0.15.0.

salmagundi.strings.wlen(text)

Determine the number of columns a string actually occupies in a terminal.

Most characters occupy one column, but some occupy zero (e.g. the null character) or two (e.g. Japanese characters). The len() function, by contrast, counts characters, not columns.

Examples:

>>> s1 = 'a\0b'
>>> s2 = 'は日本'
>>> len(s1)
3
>>> len(s2)
3
>>> print(s1)
ab
>>> print(s2)
は日本
>>> wlen(s1)
2
>>> wlen(s2)
6

Parameters

text (str) – the string

Returns

number of columns

Return type

int

Raises

ValueError – if length could not be determined

New in version 0.15.0.
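How the library measures columns is not described here. As a rough, standard-library-only approximation, widths can be estimated with unicodedata.east_asian_width() and unicodedata.combining(); alignment as described for walign() then pads with width - wlen(text) fill characters. approx_wlen and approx_walign are hypothetical names; other control characters and emoji sequences are not handled, and putting an odd extra fill character on the right when centering is an assumption.

```python
import unicodedata


def approx_wlen(text):
    """Approximate terminal columns: 0 for NUL and combining marks,
    2 for East Asian wide/fullwidth characters, 1 otherwise."""
    width = 0
    for ch in text:
        if ch == '\0' or unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width


def approx_walign(text, width, fill=' ', align='center'):
    """Pad text with fill until it occupies width columns."""
    if align not in ('left', 'center', 'right'):
        raise ValueError(f'unknown align: {align!r}')
    if approx_wlen(fill) != 1:
        raise ValueError('fill must occupy exactly one column')
    pad = width - approx_wlen(text)
    if pad <= 0:
        return text
    if align == 'left':
        return text + fill * pad
    if align == 'right':
        return fill * pad + text
    left = pad // 2  # assumption: an odd extra fill character goes right
    return fill * left + text + fill * (pad - left)


print(approx_wlen('は日本'), approx_walign('は日本競馬', 12, '.'))  # 6 .は日本競馬.
```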

salmagundi.strings.wshorten(text, width=80, placeholder='…', pos='right')

Shorten the text to fit in the given width even if some characters occupy zero or two columns in a terminal.

Parameters
  • text (str) – the text

  • width (int) – the width

  • placeholder (str) – the placeholder

  • pos (str) – position ('left', 'middle', 'right') of placeholder in text

Returns

the shortened text (may be one character less than width)

Return type

str

Raises

ValueError – if length could not be determined, width < wlen(placeholder) or pos is unknown

New in version 0.15.0.