capturer: Easily capture stdout/stderr of the current process and subprocesses¶
The capturer package makes it easy to capture the stdout and stderr streams of the current process and subprocesses. Output can be relayed to the terminal in real time but is also available to the Python program for additional processing. It’s currently tested on CPython 2.6, 2.7, 3.4, 3.5, 3.6 and PyPy (2.7). It’s tested on Linux and Mac OS X and may work on other UNIX systems, but it definitely won’t work on Windows (due to the use of the platform-dependent pty module). For usage instructions please refer to the documentation.
Status¶
The capturer package was developed as a proof of concept over the course of a weekend, because I was curious to see if it could be done (reliably). After a weekend of extensive testing it seems to work fairly well, so I’m publishing the initial release as version 1.0. However, I still consider this a proof of concept because I don’t have extensive “production” experience using it yet. Here’s hoping it works as well in practice as it did during my testing :-).
Installation¶
The capturer package is available on PyPI, which means installation should be as simple as:
$ pip install capturer
There’s actually a multitude of ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you then read up on your options before returning to these instructions ;-).
Getting started¶
The easiest way to capture output is to use a context manager:
import subprocess
from capturer import CaptureOutput
with CaptureOutput() as capturer:
    # Generate some output from Python (print() with a single string
    # argument behaves the same on Python 2 and Python 3).
    print("Output from Python")
    # Generate output from a subprocess.
    subprocess.call(["echo", "Output from a subprocess"])
    # Get the output in each of the supported formats.
    assert capturer.get_bytes() == b'Output from Python\r\nOutput from a subprocess\r\n'
    assert capturer.get_lines() == [u'Output from Python', u'Output from a subprocess']
    assert capturer.get_text() == u'Output from Python\nOutput from a subprocess'
The use of a context manager (the with statement) ensures that output capturing is enabled and disabled at the appropriate time, regardless of whether exceptions interrupt the normal flow of processing.
Note that the first call to get_bytes(), get_lines() or get_text() will stop the capturing of output by default. This is intended as a sane default to prevent partial reads (which can be confusing as hell when you don’t have experience with them). So we could have simply used print to show the results without causing a recursive “captured output is printed and then captured again” loop. There’s an optional partial=True keyword argument that can be used to disable this behavior (please refer to the documentation for details).
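One reason partial reads are confusing is that a read can stop in the middle of a multi byte character, which is what the warnings about decoding errors in the API documentation refer to. The following standalone sketch doesn’t use capturer at all; it truncates some UTF-8 encoded bytes by hand (the sample text and the variable names are made up for illustration):

```python
# Encode a short line that ends in a two byte UTF-8 character.
data = u"caf\u00e9\n".encode("UTF-8")

# Pretend a partial read returned only the first four bytes, which cuts
# the encoding of the last letter in half.
partial = data[:4]

try:
    partial.decode("UTF-8")
    decoded_cleanly = True
except UnicodeDecodeError:
    decoded_cleanly = False

# Decoding with the "replace" error handler substitutes U+FFFD for the
# incomplete character instead of raising an exception.
tolerant = partial.decode("UTF-8", "replace")
```

Waiting until capturing has finished (the default) avoids this situation because the complete output is decoded in one go.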
Design choices¶
There are existing solutions out there to capture the stdout and stderr streams of (Python) processes. The capturer package was created for a very specific use case that wasn’t catered for by existing solutions (that I could find). This section documents the design choices that guided the development of the capturer package:
Intercepts writes to low level file descriptors¶
Libraries like capture and iocapture change Python’s sys.stdout and sys.stderr file objects to fake file objects (using StringIO). This enables capturing of (most) output written to the stdout and stderr streams from the same Python process, however any output from subprocesses is unaffected by the redirection and not captured.
The capturer package instead intercepts writes to low level file descriptors (similar to and inspired by how pytest does it). This enables capturing of output written to the standard output and error streams from the same Python process as well as any subprocesses.
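The general technique of redirecting a file descriptor can be sketched with nothing but the standard library. This is a simplified illustration built on os.dup() and os.dup2(), not capturer’s actual implementation; the capture_fd() helper and the use of a temporary file are assumptions made for the example:

```python
import os
import subprocess
import tempfile


def capture_fd(fd, action):
    """Run action() while file descriptor `fd` points at a temporary file."""
    saved_fd = os.dup(fd)  # remember the original destination
    with tempfile.TemporaryFile() as handle:
        os.dup2(handle.fileno(), fd)  # point the descriptor at the file
        try:
            action()
        finally:
            os.dup2(saved_fd, fd)  # restore the original destination
            os.close(saved_fd)
        handle.seek(0)
        return handle.read()


def demo():
    # Write directly to file descriptor 1 so the example doesn't depend on
    # how (or whether) sys.stdout is buffered or replaced.
    os.write(1, b"from the current process\n")
    # The subprocess inherits file descriptor 1, so its output is captured too.
    subprocess.call(["echo", "from a subprocess"])


captured = capture_fd(1, demo)
```

Because the redirection happens below the level of Python’s file objects, the subprocess needs no special treatment: it simply inherits the redirected descriptor.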
Uses a pseudo terminal to emulate a real terminal¶
The capturer package uses a pseudo terminal created using pty.openpty() to capture output. This means subprocesses will use ANSI escape sequences because they think they’re connected to a terminal. In the current implementation you can’t opt out of this, but feel free to submit a feature request to change this :-). This does have some drawbacks:
The use of pty.openpty() means you need to be running in a UNIX like environment for capturer to work (Windows definitely isn’t supported).
All output captured is relayed on the stderr stream by default, so capturing changes the semantics of your programs. How much this matters obviously depends on your use case. For the use cases that triggered me to create capturer it doesn’t matter, which explains why this is the default mode.
There is experimental support for capturing stdout and stderr separately and relaying captured output to the appropriate original stream. Basically you call CaptureOutput(merged=False) and then you use the stdout and stderr attributes of the CaptureOutput object to get at the output captured on each stream.
I say experimental because this method of capturing can unintentionally change the order in which captured output is emitted, in order to avoid interleaving output emitted on the stdout and stderr streams (which would most likely result in incomprehensible output). Basically output is relayed on each stream separately after each line break. This means interactive prompts that block on reading from standard input without emitting a line break won’t show up (until it’s too late ;-).
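To see what “they think they’re connected to a terminal” means in practice, here is a small standalone sketch that uses pty.openpty() directly (it doesn’t use capturer, and it only works in a UNIX like environment). The child command and the variable names are made up for the example:

```python
import os
import pty
import subprocess
import sys

# Allocate a pseudo terminal: the child writes to the slave end and the
# parent reads from the master end.
master_fd, slave_fd = pty.openpty()
try:
    # Ask a child interpreter whether its stdout looks like a terminal.
    subprocess.check_call(
        [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"],
        stdout=slave_fd,
    )
    # The child has exited, so its output is waiting on the master end.
    output = os.read(master_fd, 1024)
finally:
    os.close(slave_fd)
    os.close(master_fd)

child_saw_terminal = (output.strip() == b"True")
```

The terminal layer typically translates line feeds into carriage return plus line feed pairs, which is consistent with the \r\n line endings in the get_bytes() result shown in the getting started example.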
Relays output to the terminal in real time¶
The main use case of capturer is to capture all output of a snippet of Python code (including any output by subprocesses) but also relay the output to the terminal in real time. This has a couple of useful properties:
- Long running operations can provide the operator with real time feedback by emitting output on the terminal. This sounds obvious (and it is!) but it is non-trivial to implement (an understatement :-) when you also want to capture the output.
- Programs like gpg and ssh that use interactive password prompts will render their password prompt on the terminal in real time. This avoids the awkward interaction where a password prompt is silenced but the program still hangs, waiting for input on stdin.
Contact¶
The latest version of capturer is available on PyPI and GitHub. The documentation is hosted on Read the Docs. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at peter@peterodding.com.
License¶
This software is licensed under the MIT license.
© 2017 Peter Odding.
A big thanks goes out to the pytest developers because pytest’s mechanism for capturing the output of subprocesses provided inspiration for the capturer package. No code was copied, but both projects are MIT licensed anyway, so it’s not like it’s very relevant :-).
API documentation¶
The following documentation is based on the source code of version 2.4 of the capturer package.
Easily capture stdout/stderr of the current process and subprocesses.
- capturer.interpret_carriage_returns(text)¶
  Alias to humanfriendly.terminal.clean_terminal_output().
  In capturer version 2.1.2 the interpret_carriage_returns() function was obsoleted by humanfriendly.terminal.clean_terminal_output(). This alias remains for backwards compatibility.
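As a rough illustration of what interpreting carriage returns involves, the sketch below treats a carriage return as “move the cursor back to the start of the line”, so that later text overwrites earlier text. This is a simplified stand-in written for this example, not the humanfriendly implementation (which may handle more cases); the function name interpret_cr_sketch() is made up:

```python
def interpret_cr_sketch(text):
    """Return the lines a terminal would show (simplified illustration)."""
    lines = []
    for raw_line in text.replace("\r\n", "\n").split("\n"):
        current = ""
        for segment in raw_line.split("\r"):
            # A carriage return moves the cursor to column zero, so the
            # next segment overwrites the start of the current line.
            current = segment + current[len(segment):]
        lines.append(current)
    # Drop empty trailing lines (e.g. the one after a final line break).
    while lines and not lines[-1]:
        lines.pop()
    return lines


progress = interpret_cr_sketch("Progress: 10%\rProgress: 100%\r\nDone\r\n")
overwrite = interpret_cr_sketch("abcdef\rxy")
```

This is why a progress indicator that keeps rewriting one line ends up as a single line in the processed output.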
- capturer.DEFAULT_TEXT_ENCODING = 'UTF-8'¶
  The name of the default character encoding used to convert captured output to Unicode text (a string).
- capturer.GRACEFUL_SHUTDOWN_SIGNAL = <Signals.SIGUSR1: 10>¶
  The number of the UNIX signal used to communicate graceful shutdown requests from the main process to the output relay process (an integer). See also enable_graceful_shutdown().
- capturer.TERMINATION_DELAY = 0.01¶
  The number of seconds to wait before terminating the output relay process (a floating point number).
- capturer.PARTIAL_DEFAULT = False¶
  Whether partial reads are enabled or disabled by default (a boolean).
- capturer.STDOUT_FD = 1¶
  The number of the file descriptor that refers to the standard output stream (an integer).
- capturer.STDERR_FD = 2¶
  The number of the file descriptor that refers to the standard error stream (an integer).
- capturer.enable_old_api()¶
  Enable backwards compatibility with the old API.
  This function is called when the capturer module is imported. It modifies the CaptureOutput class to install method proxies for get_handle(), get_bytes(), get_lines(), get_text(), save_to_handle() and save_to_path().
- capturer.create_proxy_method(name)¶
  Create a proxy method for use by enable_old_api().
  Parameters: name – The name of the PseudoTerminal method to call when the proxy method is called.
  Returns: A proxy method (a callable) to be installed on the CaptureOutput class.
- class capturer.MultiProcessHelper¶
  Helper to spawn and manipulate child processes using multiprocessing.
  This class serves as a base class for CaptureOutput and PseudoTerminal because both classes need the same child process handling logic.
  - __init__()¶
    Initialize a MultiProcessHelper object.
  - start_child(target)¶
    Start a child process using multiprocessing.Process.
    Parameters: target – The callable to run in the child process. Expected to take a single argument which is a multiprocessing.Event to be set when the child process has finished initialization.
  - stop_children()¶
    Gracefully shut down all child processes.
    Child processes are expected to call enable_graceful_shutdown() during initialization.
  - enable_graceful_shutdown()¶
    Register a signal handler that converts GRACEFUL_SHUTDOWN_SIGNAL to an exception.
    Used by capture_loop() to gracefully interrupt the blocking os.read() call when the capture loop needs to be terminated (this is required for coverage collection).
  - raise_shutdown_request(signum, frame)¶
    Raise ShutdownRequested when GRACEFUL_SHUTDOWN_SIGNAL is received.
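The idea of converting a signal into an exception can be sketched with the standard library alone. The example below is a minimal illustration, not capturer’s code: the exception class and handler are local stand-ins that merely reuse the documented names, the process sends the signal to itself to stay self-contained, and time.sleep() stands in for a blocking os.read() call. It must run in the main thread on a UNIX like system:

```python
import os
import signal
import time


class ShutdownRequested(Exception):
    """Local stand-in for the exception of the same name."""


def raise_shutdown_request(signum, frame):
    # Turn the asynchronous signal into an ordinary Python exception.
    raise ShutdownRequested()


# Register the handler and remember the previous one so it can be restored.
previous_handler = signal.signal(signal.SIGUSR1, raise_shutdown_request)
try:
    # Normally another process would send the signal; here the process
    # signals itself to keep the example self-contained.
    os.kill(os.getpid(), signal.SIGUSR1)
    # Stand-in for a blocking call such as os.read().
    time.sleep(1)
    outcome = "not interrupted"
except ShutdownRequested:
    outcome = "shutdown requested"
finally:
    signal.signal(signal.SIGUSR1, previous_handler)
```

Raising an exception from the handler lets the loop unwind through its normal try/finally cleanup instead of being killed outright.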
- class capturer.CaptureOutput(merged=True, encoding='UTF-8', termination_delay=0.01, chunk_size=1024, relay=True)¶
  Context manager to capture the standard output and error streams.
  - __init__(merged=True, encoding='UTF-8', termination_delay=0.01, chunk_size=1024, relay=True)¶
    Initialize a CaptureOutput object.
    Parameters:
    - merged – Whether to capture and relay the standard output and standard error streams as one stream (a boolean, defaults to True). When this is False the stdout and stderr attributes of the CaptureOutput object are PseudoTerminal objects that can be used to get at the output captured from each stream separately.
    - encoding – The name of the character encoding used to decode the captured output (a string, defaults to DEFAULT_TEXT_ENCODING).
    - termination_delay – The number of seconds to wait before terminating the output relay process (a floating point number, defaults to TERMINATION_DELAY).
    - chunk_size – The maximum number of bytes to read from the captured streams on each call to os.read() (an integer).
    - relay – If this is True (the default) then captured output is relayed to the terminal or parent process, if it’s False the captured output is hidden (swallowed).
  - initialize_stream(file_obj, expected_fd)¶
    Initialize one or more Stream objects to capture a standard stream.
    Parameters:
    - file_obj – A file-like object with a fileno() method.
    - expected_fd – The expected file descriptor of the file-like object.
    Returns: The Stream connected to the file descriptor of the file-like object.
    By default this method just initializes a Stream object connected to the given file-like object and its underlying file descriptor (a simple one-liner).
    If however the file descriptor of the file-like object doesn’t have the expected value (expected_fd) two Stream objects will be created instead: One of the stream objects will be connected to the file descriptor of the file-like object and the other stream object will be connected to the file descriptor that was expected (expected_fd).
    This approach is intended to make sure that “nested” output capturing works as expected: Output from the current Python process is captured from the file descriptor of the file-like object while output from subprocesses is captured from the file descriptor given by expected_fd (because the operating system defines special semantics for the file descriptors with the numbers one and two that we can’t just ignore).
    For more details refer to issue 2 on GitHub.
  - __enter__()¶
    Automatically call start_capture() when entering a with block.
  - __exit__(exc_type=None, exc_value=None, traceback=None)¶
    Automatically call finish_capture() when leaving a with block.
  - is_capturing¶
    True if output is being captured, False otherwise.
  - start_capture()¶
    Start capturing the standard output and error streams.
    Raises: TypeError when output is already being captured.
    This method is called automatically when using the capture object as a context manager. It’s provided under a separate name in case someone wants to extend CaptureOutput and build their own context manager on top of it.
  - finish_capture()¶
    Stop capturing the standard output and error streams.
    This method is called automatically when using the capture object as a context manager. It’s provided under a separate name in case someone wants to extend CaptureOutput and build their own context manager on top of it.
  - allocate_pty(relay_fd=None, output_queue=None, queue_token=None)¶
    Allocate a pseudo terminal.
    Internal shortcut for start_capture() to allocate multiple pseudo terminals without code duplication.
  - merge_loop(started_event)¶
    Merge and relay output in a child process.
    This internal method is used when standard output and standard error are being captured separately. It’s responsible for emitting each captured line on the appropriate stream without interleaving text within lines.
  - get_bytes(partial=False)¶
    Get the captured output as binary data.
    Parameters: partial – Refer to get_handle() for details.
    Returns: The captured output as a binary string.
    Note: This method is a proxy for the get_bytes() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.
  - get_handle(partial=False)¶
    Get the captured output as a Python file object.
    Parameters: partial – If True (not the default) the partial output captured so far is returned, otherwise (so by default) the relay process is terminated and output capturing is disabled before returning the captured output (the default is intended to protect unsuspecting users against partial reads).
    Returns: The captured output as a Python file object. The file object’s current position is reset to zero before this function returns.
    This method is useful when you’re dealing with arbitrary amounts of captured data that you don’t want to load into memory just so you can save it to a file again. In fact, in that case you might want to take a look at save_to_path() and/or save_to_handle() :-).
    Warning: Two caveats about the use of this method:
    - If partial is True (not the default) the output can end in a partial line, possibly in the middle of an ANSI escape sequence or a multi byte character.
    - If you close this file handle you just lost your last chance to get at the captured output! (calling this method again will not give you a new file handle)
    Note: This method is a proxy for the get_handle() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.
  - get_lines(interpreted=True, partial=False)¶
    Get the captured output split into lines.
    Parameters:
    - interpreted – If True (the default) captured output is processed using interpret_carriage_returns().
    - partial – Refer to get_handle() for details.
    Returns: The captured output as a list of Unicode strings.
    Warning: If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).
    Note: This method is a proxy for the get_lines() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.
  - get_text(interpreted=True, partial=False)¶
    Get the captured output as a single string.
    Parameters:
    - interpreted – If True (the default) captured output is processed using interpret_carriage_returns().
    - partial – Refer to get_handle() for details.
    Returns: The captured output as a Unicode string.
    Warning: If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).
    Note: This method is a proxy for the get_text() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.
  - save_to_handle(handle, partial=False)¶
    Save the captured output to an open file handle.
    Parameters:
    - handle – A writable file-like object.
    - partial – Refer to get_handle() for details.
    Note: This method is a proxy for the save_to_handle() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.
  - save_to_path(filename, partial=False)¶
    Save the captured output to a file.
    Parameters:
    - filename – The pathname of the file where the captured output should be written to (a string).
    - partial – Refer to get_handle() for details.
    Note: This method is a proxy for the save_to_path() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.
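The decision described under initialize_stream() (one stream in the simple case, two when the file object’s descriptor differs from the expected one) can be sketched as follows. This only illustrates the documented rule; the helper descriptors_to_capture() and the FakeStdout class are hypothetical names invented for the example and are not part of capturer:

```python
import tempfile


def descriptors_to_capture(file_obj, expected_fd):
    """Return the file descriptors a capture would need to redirect."""
    actual_fd = file_obj.fileno()
    if actual_fd == expected_fd:
        # The simple case: the file object writes to the expected descriptor.
        return [actual_fd]
    # The file object points somewhere else (for example because of an outer
    # capture), so cover both the descriptor Python writes to and the
    # descriptor that subprocesses inherit.
    return [actual_fd, expected_fd]


class FakeStdout(object):
    """Stand-in for a sys.stdout that is still connected to descriptor 1."""

    def fileno(self):
        return 1


simple = descriptors_to_capture(FakeStdout(), 1)

with tempfile.TemporaryFile() as replaced_stdout:
    # Stand-in for a sys.stdout that has been replaced by another file.
    nested_fd = replaced_stdout.fileno()
    nested = descriptors_to_capture(replaced_stdout, 1)
```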
- class capturer.OutputBuffer(fd)¶
  Helper for CaptureOutput.merge_loop().
  Buffers captured output and flushes to the appropriate stream after each line break.
  - __init__(fd)¶
    Initialize an OutputBuffer object.
    Parameters: fd – The number of the file descriptor where output should be flushed (an integer).
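Flushing only after each line break can be sketched as shown below. This is an illustrative stand-in and not the OutputBuffer implementation: the LineBuffer class, its add() and flush() methods and the pipe used to observe the result are all assumptions made for the example:

```python
import os


class LineBuffer(object):
    """Hypothetical stand-in: collect bytes, write out only complete lines."""

    def __init__(self, fd):
        self.fd = fd
        self.pending = b""

    def add(self, chunk):
        self.pending += chunk
        # Everything up to and including the last line break is complete.
        head, newline, tail = self.pending.rpartition(b"\n")
        if newline:
            os.write(self.fd, head + newline)
            self.pending = tail

    def flush(self):
        # Write out whatever is left, even without a line break.
        if self.pending:
            os.write(self.fd, self.pending)
            self.pending = b""


read_fd, write_fd = os.pipe()
line_buffer = LineBuffer(write_fd)
line_buffer.add(b"Password: ")          # no line break yet, nothing is written
line_buffer.add(b"first\nsecond\nthi")  # two line breaks arrive
relayed = os.read(read_fd, 1024)
line_buffer.flush()
remainder = os.read(read_fd, 1024)
os.close(write_fd)
os.close(read_fd)
```

The first add() call shows the drawback mentioned earlier: a prompt without a trailing line break stays in the buffer until a later line break (or an explicit flush) pushes it out.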
- class capturer.PseudoTerminal(encoding, termination_delay, chunk_size, relay_fd, output_queue, queue_token)¶
  Helper for CaptureOutput.
  Manages capturing of output and exposing the captured output.
  - __init__(encoding, termination_delay, chunk_size, relay_fd, output_queue, queue_token)¶
    Initialize a PseudoTerminal object.
    Parameters:
    - encoding – The name of the character encoding used to decode the captured output (a string, defaults to DEFAULT_TEXT_ENCODING).
    - termination_delay – The number of seconds to wait before terminating the output relay process (a floating point number, defaults to TERMINATION_DELAY).
    - chunk_size – The maximum number of bytes to read from the captured stream(s) on each call to os.read() (an integer).
    - relay_fd – The number of the file descriptor where captured output should be relayed to (an integer or None if output_queue and queue_token are given).
    - output_queue – The multiprocessing queue where captured output chunks should be written to (a multiprocessing.Queue object or None if relay_fd is given).
    - queue_token – A unique identifier added to each output chunk written to the queue (any value or None if relay_fd is given).
  - attach(stream)¶
    Attach a stream to the pseudo terminal.
    Parameters: stream – A Stream object.
  - get_handle(partial=False)¶
    Get the captured output as a Python file object.
    Parameters: partial – If True (not the default) the partial output captured so far is returned, otherwise (so by default) the relay process is terminated and output capturing is disabled before returning the captured output (the default is intended to protect unsuspecting users against partial reads).
    Returns: The captured output as a Python file object. The file object’s current position is reset to zero before this function returns.
    This method is useful when you’re dealing with arbitrary amounts of captured data that you don’t want to load into memory just so you can save it to a file again. In fact, in that case you might want to take a look at save_to_path() and/or save_to_handle() :-).
    Warning: Two caveats about the use of this method:
    - If partial is True (not the default) the output can end in a partial line, possibly in the middle of an ANSI escape sequence or a multi byte character.
    - If you close this file handle you just lost your last chance to get at the captured output! (calling this method again will not give you a new file handle)
  - get_bytes(partial=False)¶
    Get the captured output as binary data.
    Parameters: partial – Refer to get_handle() for details.
    Returns: The captured output as a binary string.
  - get_lines(interpreted=True, partial=False)¶
    Get the captured output split into lines.
    Parameters:
    - interpreted – If True (the default) captured output is processed using interpret_carriage_returns().
    - partial – Refer to get_handle() for details.
    Returns: The captured output as a list of Unicode strings.
    Warning: If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).
  - get_text(interpreted=True, partial=False)¶
    Get the captured output as a single string.
    Parameters:
    - interpreted – If True (the default) captured output is processed using interpret_carriage_returns().
    - partial – Refer to get_handle() for details.
    Returns: The captured output as a Unicode string.
    Warning: If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).
  - save_to_handle(handle, partial=False)¶
    Save the captured output to an open file handle.
    Parameters:
    - handle – A writable file-like object.
    - partial – Refer to get_handle() for details.
  - save_to_path(filename, partial=False)¶
    Save the captured output to a file.
    Parameters:
    - filename – The pathname of the file where the captured output should be written to (a string).
    - partial – Refer to get_handle() for details.
  - capture_loop(started_event)¶
    Continuously read from the master end of the pseudo terminal and relay the output.
    This function is run in the background by start_capture() using the multiprocessing module. Its role is to read output emitted on the master end of the pseudo terminal and relay this output to the real terminal (so the operator can see what’s happening in real time) as well as a temporary file (for additional processing by the caller).
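The shape of such a relay loop can be sketched in a few lines. The example below is an assumption-laden illustration rather than the capture_loop() implementation: relay_loop() is a made-up name, an ordinary pipe stands in for the master end of the pseudo terminal, and a second pipe stands in for the real terminal so the result can be inspected:

```python
import os
import tempfile


def relay_loop(source_fd, relay_fd, output_file, chunk_size=1024):
    """Copy chunks from source_fd to relay_fd and output_file until EOF."""
    while True:
        try:
            chunk = os.read(source_fd, chunk_size)
        except OSError:
            # A pseudo terminal master can raise an error instead of
            # returning an empty read once the other end has been closed.
            break
        if not chunk:
            break
        os.write(relay_fd, chunk)  # what the operator would see
        output_file.write(chunk)   # what the caller can process later


source_read, source_write = os.pipe()
relay_read, relay_write = os.pipe()
os.write(source_write, b"line one\nline two\n")
os.close(source_write)  # closing the write end signals the end of the output

with tempfile.TemporaryFile() as output_file:
    # A tiny chunk size forces several iterations of the loop.
    relay_loop(source_read, relay_write, output_file, chunk_size=4)
    output_file.seek(0)
    saved = output_file.read()

os.close(relay_write)
relayed = os.read(relay_read, 1024)
os.close(relay_read)
os.close(source_read)
```

Writing every chunk to both destinations as soon as it arrives is what makes the output visible in real time while still keeping a complete copy for later processing.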
- class capturer.Stream(fd)¶
  Container for standard stream redirection logic.
  Used by CaptureOutput to temporarily redirect the standard output and standard error streams.
  - is_redirected¶
    True once redirect() has been called, False when redirect() hasn’t been called yet or restore() has since been called.
  - __init__(fd)¶
    Initialize a Stream object.
    Parameters: fd – The file descriptor to be redirected (an integer).
  - redirect(target_fd)¶
    Redirect output written to the file descriptor to another file descriptor.
    Parameters: target_fd – The file descriptor that should receive the output written to the file descriptor given to the Stream constructor (an integer).
    Raises: TypeError when the file descriptor is already being redirected.
- exception capturer.ShutdownRequested¶
  Raised by raise_shutdown_request() to signal graceful termination requests (in capture_loop()).