Just enough Python 3.4+ for Sysadmin.
A typical sysadmin job requires daily use of:
- cat
- cut
- grep
- uniq
- sort
- head
- tail
- sed
- awk
- tr
- find
- bash, to glue all of these together with some loops and if/else conditions
Did I miss something?
Those are all de facto/beloved/popular text manipulation tools that every sysadmin must be proficient with.
But a Python script that calls these tools:
- creates unnecessary subprocesses, which cost more resources
- is less portable, because it depends on each tool's version and features. A script that runs on Linux might not run on your colleague's shiny MacBook, and it gets worse when it needs to run on Windows, too.
- is un-pythonic: it makes you look like you don't really know Python
So, let's write all of those in Python. It means putting aside what you have already learned and used for years on the CLI, but I promise it is simpler than it was back when you learned those UNIX tools.
NOTE: this tutorial does not aim to "reinvent" UNIX tools or to replace them for daily usage. It is a guide to achieving their features as part of your bigger Python application.
Most of the time, cat is used on a file just to feed it as input to another command,
and most of the time that is not the right way to do it.
$ cat /etc/passwd | grep root
root:x:0:0:root:/root:/bin/bash
cat is unnecessary as grep (and most UNIX commands) accepts files as input
$ grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
In Python, to cat is to open a file, iterate over its lines, and print each one.
# cat.py
import sys
# sys.argv[1:] is the list of passed args, like "$@" in bash
for filename in sys.argv[1:]:
    with open(filename) as f:
        for line in f:
            # end='' stops print from adding a newline; each line read from the file already ends with one.
            print(line, end='')
Run it:
$ python3 cat.py /etc/passwd | grep root
root:x:0:0:root:/root:/bin/bash
grep is the most commonly used tool for searching text in files. It has many
options, so cloning all of grep in Python is not a trivial task. But just searching
for lines that contain a string/pattern is simple.
$ grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
# grep.py
import sys
pattern = sys.argv[1]
files = sys.argv[2:]
for filename in files:
    with open(filename) as f:
        # enumerate(..., start=1) numbers lines from 1, like grep -n
        for count, line in enumerate(f, start=1):
            if pattern in line:
                print(count, line, end='')
The output is almost the same:
$ python3 grep.py root /etc/passwd
1 root:x:0:0:root:/root:/bin/bash
Most UNIX commands support reading from stdin, and with the bash pipe |, a command can
use the output of other commands as its input, which makes for a super powerful way to
process text.
$ cat /etc/passwd | grep root
root:x:0:0:root:/root:/bin/bash
# grepstdin.py
import sys
pattern = sys.argv[1]
# sys.stdin can be iterated line by line, just like an open file
for line in sys.stdin:
    if pattern in line:
        print(line, end='')
Run it:
$ cat /etc/passwd | python3 grepstdin.py root
root:x:0:0:root:/root:/bin/bash
grep is called grep because it comes from g/re/p (globally search
a regular expression and print).
You can skip this subsection if you don't know what regex is, and maybe come back later.
Use -E to tell grep that the pattern is an extended regular expression, and
-o to display only what matched (not the whole line). Let's show IPs:
$ ifconfig | grep -E '([0-9]{1,3}\.){3}[0-9]{1,3}' -o
127.0.0.1
255.0.0.0
10.192.122.2
10.192.122.2
255.255.255.255
10.246.114.252
10.246.115.255
255.255.252.0
Python has more powerful, Perl-style regex support in its standard re library.
grep's default regular expressions are less powerful than Python's - see man 1 grep:
grep understands three different versions of regular expression syntax: “basic” (BRE), “extended” (ERE) and “perl” (PCRE). In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl-compatible regular expressions give additional functionality, and are documented in pcresyntax(3) and pcrepattern(3), but work only if PCRE is available in the system.
# grepregex.py
import re
import sys
for line in sys.stdin:
    # r'pattern' is a raw string, so backslashes reach the regex engine unchanged
    for matched in re.finditer(r'([0-9]{1,3}\.){3}[0-9]{1,3}', line):
        # group() with no argument returns the whole match, like grep -o
        print(matched.group())
Output:
$ ifconfig | python3 grepregex.py
127.0.0.1
255.0.0.0
10.192.122.2
10.192.122.2
255.255.255.255
192.168.1.117
192.168.1.255
255.255.255.0
A big note: though using a regular expression works here, it has flaws:
- this looks for number.number.number.number, so 999.999.999.999 still matches, even though it is not a valid IPv4 address.
- regex is hard to understand/write/debug.
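The first flaw above can be handled without writing a more complex regex. A minimal sketch, assuming the standard ipaddress module (in the standard library since Python 3.3) is acceptable: keep the same loose pattern to find candidates, then let ipaddress.IPv4Address reject the invalid ones. The function name find_ipv4 and the sample text are made up for this example.

```python
import ipaddress
import re

# Same loose pattern as before: it only checks the number.number.number.number shape.
CANDIDATE = re.compile(r'([0-9]{1,3}\.){3}[0-9]{1,3}')


def find_ipv4(text):
    """Return the candidates found in text that are valid IPv4 addresses."""
    found = []
    for matched in CANDIDATE.finditer(text):
        candidate = matched.group()
        try:
            # IPv4Address raises a ValueError subclass for out-of-range octets such as 999
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        found.append(candidate)
    return found


print(find_ipv4('inet 127.0.0.1 netmask 255.0.0.0 bogus 999.999.999.999'))
# prints ['127.0.0.1', '255.0.0.0']
```

To use it like grepregex.py, loop over sys.stdin and call find_ipv4 on each line.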
The re library can help replace almost every UNIX tool that mainly works with regex:
- grep - to find a pattern
- sed - to replace a pattern
- awk - to find, process and replace a pattern (it is actually a programming language, but that is its general usage).
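As a rough illustration of the sed and awk entries in this list, here is a sketch that uses re.sub for a sed-style substitution and str.split for awk-style field selection. The function names sed_substitute and awk_fields and the sample lines are made up for this example, and real sed and awk offer far more than this.

```python
import re


def sed_substitute(pattern, replacement, lines):
    """Rough equivalent of: sed -E 's/pattern/replacement/g'"""
    for line in lines:
        # re.sub replaces every match in the line, like the g flag in sed
        yield re.sub(pattern, replacement, line)


def awk_fields(lines, separator, *indexes):
    """Rough equivalent of: awk -F: '{print $1, $3}' (indexes are 1-based, like awk)"""
    for line in lines:
        fields = line.rstrip('\n').split(separator)
        yield ' '.join(fields[i - 1] for i in indexes)


sample = [
    'root:x:0:0:root:/root:/bin/bash\n',
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n',
]

# replace the login shell, only when it is at the end of the line
for line in sed_substitute(r'/bin/bash$', '/bin/zsh', sample):
    print(line, end='')

# print the user name and UID columns
for line in awk_fields(sample, ':', 1, 3):
    print(line)
```

Both helpers are generators, so they can also be fed sys.stdin or an open file without loading everything into memory.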
uniq - report or omit repeated lines - is easy to do in Python:
$ echo """abc
> abc
> abc
> def
> gh
> ghk
> abc""" | uniq
abc
def
gh
ghk
abc
The consecutive duplicate "abc" lines are collapsed into one; the final "abc" stays because it is not adjacent to the others.
import sys
last_line = None
for line in sys.stdin:
    # print a line only when it differs from the one right before it
    if line != last_line:
        print(line, end='')
    last_line = line
One commonly used option is -c, which counts the number of occurrences. Implementing it in
Python by iterating over each line and counting can be tricky (do try it), but
the standard library itertools makes it trivial:
import itertools
import sys
# groupby groups consecutive identical lines, which is exactly what uniq works on
for line, group in itertools.groupby(sys.stdin):
    dup_count = 0
    for _ in group:
        dup_count = dup_count + 1
    print(dup_count, line, end="")
The counting part does not use len, to avoid creating an auxiliary list and
thus save memory.
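For comparison, the manual line-by-line counting mentioned above might look like the sketch below. The tricky part is that a count can only be reported once a run of identical lines has ended, so the last run has to be flushed after the loop. The function name count_runs and the sample data are made up for this example.

```python
def count_runs(lines):
    """Yield (count, line) for each run of identical consecutive lines, like uniq -c."""
    last_line = None
    count = 0
    for line in lines:
        if line == last_line:
            count += 1
            continue
        if last_line is not None:
            # the previous run just ended, so its count is final
            yield count, last_line
        last_line = line
        count = 1
    if last_line is not None:
        # flush the final run, which the loop never gets to emit
        yield count, last_line


sample = ['abc\n', 'abc\n', 'abc\n', 'def\n', 'gh\n', 'ghk\n', 'abc\n']
for count, line in count_runs(sample):
    print(count, line, end='')
```

Passing sys.stdin instead of sample gives the same behaviour as the itertools version, with noticeably more bookkeeping.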
sort sorts lines in numerical or lexical order.
$ sort -n -t: -k3 /etc/passwd | head -3
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
This shows user info ordered by the 3rd column (the UID number). Or sort by name (column 1), in reverse:
$ sort -t: -k1,1 -r /etc/passwd | head -3
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
whoopsie:x:109:116::/nonexistent:/bin/false
uuidd:x:107:111::/run/uuidd:/bin/false
# sort.py
import sys
SEPARATOR = ':'
def first_field(line):
    # sort key: the first column (the user name)
    return line.split(SEPARATOR)[0]
files = sys.argv[1:]
for filename in files:
    with open(filename) as f:
        # reverse=True matches sort -r
        for line in sorted(f, key=first_field, reverse=True):
            print(line, end='')
$ python3 sort.py /etc/passwd | head -3
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
whoopsie:x:109:116::/nonexistent:/bin/false
uuidd:x:107:111::/run/uuidd:/bin/false
- head
- tail
- sed
- awk
- tr
- find
- argparse
- requests & JSON
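As a small taste of the first topic on this list, the earlier pipeline sort -n -t: -k3 /etc/passwd | head -3 can be sketched in Python by sorting on an integer key and stopping after three items. This is a sketch under assumptions: every line is assumed to have a numeric third field, and the helper name uid_field and the sample data are made up for this example.

```python
import itertools

SEPARATOR = ':'


def uid_field(line):
    # third column, converted to int so that 10 sorts after 9 (like sort -n)
    return int(line.split(SEPARATOR)[2])


sample = [
    'bin:x:2:2:bin:/bin:/usr/sbin/nologin\n',
    'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n',
    'root:x:0:0:root:/root:/bin/bash\n',
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n',
]

# islice stops after 3 items, which is the "head -3" part
for line in itertools.islice(sorted(sample, key=uid_field), 3):
    print(line, end='')
```

With a real file, pass the open file object to sorted in place of sample, as sort.py does.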