Justin C. Fisher – Research – ObjArray

 

ObjArray: Better Integrating Python Objects with NumPy Arrays

There are many well-known benefits of Object Oriented Programming (OOP), including allowing hierarchically classed objects to possess and inherit diverse attributes and methods. There are also well-known benefits of Numeric Python (NumPy) arrays, including speed, a vast library of powerful functions, and flexible indexing. Until now, it has been quite cumbersome for Python users to enjoy both sorts of benefits at the same time. Python’s objects offer excellent OOP, but cannot easily be used with NumPy. NumPy’s structured and record arrays offer NumPy's virtues but only a pale approximation of OOP. NumPy also allows arrays of Python objects, but these provide only fancy indexing, not straightforward access to OOP attributes or methods, nor direct access to the vast library of fast NumPy functions. Instead, coders who want to enjoy both OOP and NumPy have needed to write many manual loops, mappings, and comprehensions to copy arrayed objects’ attributes into NumPy arrays and vice versa. My free library, ObjArray, offers a solution to this problem, providing NumPy users clean, natural access to OOP attributes and methods – for example, people[ people.age > 18 ].height does exactly what you would intuitively expect, namely return a NumPy array of the height attributes of all the people over age 18 – all while providing full access to the speed, power, and flexible indexing of NumPy.

Documentation (v0.0.3):

 


# Copyright (c) 2016, Justin C. Fisher
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#     * Redistributions of source code must retain the above copyright
#       notice, this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * The name of Justin C. Fisher may not be used to endorse or promote products
#       derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDER BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#----------------------------------------------------------------------------------------------
#  Motivations for ObjArray
#----------------------------------------------------------------------------------------------

# There are many well-known benefits of Object Oriented Programming (OOP), including allowing
# hierarchically classed objects to possess and inherit diverse attributes and methods.
# There are also well-known benefits of Numeric Python (NumPy) arrays, including speed, a vast
# library of powerful functions, and flexible indexing.  Before now, it was quite cumbersome
# for Python users to enjoy both sorts of benefits at the same time.  Python’s objects offer
# excellent OOP, but could not easily be used with NumPy.  NumPy’s structured and record arrays
# offer NumPy's virtues but only a pale approximation of OOP.  NumPy also allows arrays of Python
# objects, but these provide only fancy indexing, not straightforward access to OOP attributes
# or methods, nor direct access to the vast library of fast NumPy functions.  Instead, coders
# who want to enjoy both OOP and NumPy have needed to write many manual loops, mappings, and/or
# comprehensions to copy arrayed objects’ attributes into NumPy arrays and vice versa.
# ObjArray offers a solution to this problem, providing NumPy users clean natural
# access to OOP attributes and methods – for example, people[ people.age > 18 ].height does
# exactly what you would intuitively expect it should do, namely return a NumPy array of the
# height attributes of all the people over age 18 – all while providing full access to the
# speed, power, and flexible indexing of NumPy.

#----------------------------------------------------------------------------------------------
#  General Overview
#----------------------------------------------------------------------------------------------

# This library contains the ObjArray class and associated functions.  ObjArray is a subclass
# of numpy.ndarray (always with dtype object) and provides convenient access to attributes
# and methods of the Python objects an ObjArray holds.  This access can either be in a slow
# flexible "ad hoc" way, or by faster, less flexible "coupling".

# ARRAY CREATION.  ObjArrays will typically be created by OA = ObjArray( A ). This returns a
# view (if possible, otherwise a new copy) of argument A (if given) as an array of type
# ObjArray.  Typically A should be an arraylike structure of Python objects.
#
# This accepts the same arguments and keywords as numpy.asarray.
#
# In addition, this can take a shape keyword, in which case it will first create an ObjArray
# with that shape, and then broadcast/***squeeze the given argument to fill it.  This can be
# especially useful to overcome ambiguities in Numpy's semantics. E.g.,
# np.array( ((0,1),(2,3)), dtype = object) could plausibly be interpreted as requesting a
# 1D object-array consisting of 2 python tuples (which is what Numpy would give if the inner
# tuples had different lengths), or as requesting a 2x2 array of python integers (which is
# what numpy gives when the tuples have the same length). This behavior can be quite annoying,
# especially when you wanted the former behavior and didn't think to check if each of the
# nested tuples happened to be the same length.  You can easily force whichever reading you
# want using the shape keyword when calling ObjArray().
#
# This also accepts the keyword attributes=attlist, which provides a convenient way of
# declaring what attributes of member objects you would like to be writeable from an ObjArray.
# (Explicit declaration isn't always necessary -- see the section on Ad Hoc Writing below.)
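#
# As a hedged illustration of the ambiguity described above, the sketch below uses only plain
# NumPy (it does not call ObjArray or its shape keyword).  Pre-allocating an object array and
# filling it one cell at a time is one common way to force the "1D array of tuples" reading.
# The variable names are illustrative.

```python
import numpy as np

pairs = ((0, 1), (2, 3))

# Equal-length inner tuples: NumPy reads this as a 2x2 array of Python integers.
as_grid = np.array(pairs, dtype=object)

# To get a 1D array holding the two tuples themselves, allocate the outer shape
# first and then assign each tuple to its own cell.
as_tuples = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    as_tuples[i] = pair

print(as_grid.shape)    # (2, 2)
print(as_tuples.shape)  # (2,)
print(as_tuples[1])     # (2, 3)
```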

# AD HOC READING. values = OA.x will read the values of attribute .x for each object in OA, and
# return a new array whose initial dimensions match OA and whose contents match the corresponding
# attribute-instances for the objects in OA (including whatever dimensions those instances have).
# The shape and type of attribute instances are determined by sampling a member of OA, or you can
# manually specify these with keyword arguments, via OA.read_attr(attr_name, shape=None, dtype=None).
#
# Ad hoc read operations support indexing in two ways.  First, you can slice OA itself before
# reading, e.g., values = OA[0:2].x .  Second, you can of course slice the resulting array,
# though remember its initial dimensions will match those of whatever array/slice you read the
# attribute from. So, for example, if OA is a 5x5 ObjArray, and each .x instance is a 3x3 array
# of integers, then OA.x would return a 5x5x3x3 array of integers.  OA[:,0].x[0:2,-1,:] would
# first take OA[:,0] (the 0th column of OA, hence a 1D array of 5 objects), then read off an
# array of its .x entries (a 5x3x3 array of ints), then slice the first 2 (out of 5) entries
# from the first dimension (corresponding to the first 2 objects in the column OA[:,0]) and
# return the last row of the values of those two instances (yielding a 2x3 array of ints).
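#
# The shape bookkeeping in the 5x5 example above can be checked with a plain-NumPy sketch.
# This is not ObjArray's implementation: read_attr_sketch and the Cell class are hypothetical
# stand-ins that mimic an ad hoc read by sampling one object and then looping over the rest.

```python
import numpy as np

class Cell:
    """Hypothetical object whose .x attribute is a 3x3 integer array."""
    def __init__(self, x):
        self.x = x

def read_attr_sketch(objs, name):
    """Gather one attribute from every object in an object array (illustrative only)."""
    # Sample the first object to learn the shape and dtype of one attribute instance.
    sample = np.asarray(getattr(objs.flat[0], name))
    out = np.empty(objs.shape + sample.shape, dtype=sample.dtype)
    for idx in np.ndindex(objs.shape):
        out[idx] = getattr(objs[idx], name)
    return out

# Build a 5x5 object array; the object at (i, j) holds a 3x3 array filled with 5*i + j.
grid = np.empty((5, 5), dtype=object)
for idx in np.ndindex(grid.shape):
    grid[idx] = Cell(np.full((3, 3), 5 * idx[0] + idx[1]))

all_x = read_attr_sketch(grid, "x")            # like OA.x
column_x = read_attr_sketch(grid[:, 0], "x")   # like OA[:,0].x
last_rows = column_x[0:2, -1, :]               # like OA[:,0].x[0:2,-1,:]

print(all_x.shape)      # (5, 5, 3, 3)
print(column_x.shape)   # (5, 3, 3)
print(last_rows.shape)  # (2, 3)
```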

# AD HOC WRITING.  OA.x = values is the converse ad hoc write operation.  It broadcasts(***)
# values to match the shape of OA, and then stores those values as the .x attribute instances
# of the corresponding objects of OA.  To use this brief notation, you must first declare the
# existence of the .x attribute in one of 5 ways:
#    (1) by doing a read operation from it,
#    (2) by coupling it to a buffer (see below),
#    (3) by declaring it in OA's creation, OA = ObjArray( A, attributes=(name1, name2, ...) ),
#    (4) *** by using OA.declare_attr(name1, name2,...), or
#    (5) by using the more verbose OA.write_attr(name, shape=None), which can also be
#        useful in cases where the relevant broadcasting (***) does not work automatically.
#
# Note: Writeable attributes are declared for the whole class of ObjArrays (or for some
# subclass of it, if you create and employ one), so, e.g., declaring a height attribute
# for one ObjArray will automatically make height be a writeable attribute for all
# ObjArrays, not just the one where you declared it.  If you create a subclass
# of ObjArray, and declare a writeable attribute for some member of that subclass,
# that attribute will automatically become writeable for other members of the subclass,
# but not for other ObjArrays that aren't members of that subclass.
#
# WARNING:  If you try to employ attribute names that collide with Numpy's own array
# attribute names like 'shape' or 'size', things may not work well!
# ObjArray attempts not to step on Numpy's toes, so, e.g., OA.shape will return the Numpy
# shape of OA, rather than ad-hoc reading the .shape attributes of OA's member objects.
# OA's members can still have .shape attributes -- you just won't be able to batch-access
# them with simple dot-notation like OA.shape.  You can still use more verbose commands
# like OA.read_attr('shape') and OA.write_attr('shape') to gain batch access to these.
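#
# What an ad hoc write has to do can be sketched in plain NumPy for the simple case of scalar
# attributes.  This is an illustration under assumptions, not ObjArray's own code:
# write_attr_sketch is a hypothetical helper that broadcasts the values to the shape of the
# object array and then sets the attribute on each object in a Python loop.

```python
import numpy as np

class Person:
    def __init__(self, height):
        self.height = height

def write_attr_sketch(objs, name, values):
    """Broadcast values to objs.shape, then set one scalar attribute per object."""
    values = np.broadcast_to(np.asarray(values), objs.shape)
    for idx in np.ndindex(objs.shape):
        # .item() converts the NumPy scalar back to a plain Python number.
        setattr(objs[idx], name, values[idx].item())

people = np.empty(3, dtype=object)
for i, h in enumerate((75, 62, 22)):
    people[i] = Person(h)

write_attr_sketch(people, "height", 50)            # one value broadcast to everyone
write_attr_sketch(people[:2], "height", [76, 63])  # one value per object in a slice

print([p.height for p in people])  # [76, 63, 50]
```

# The per-object loop is what makes ad hoc access slow by numpy standards.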

# COUPLED BUFFERS.  The above ad hoc operations are fairly slow (by numpy standards) because
# each must loop through all the objects in OA.  This may not be a big deal if you only rarely
# need to transfer info from the various objects to a compact numpy array, or vice versa.
# In cases where you'll frequently want to access and update info from both object-side and
# numpy-array side, a coupled buffer can provide much faster performance.
#
# Like the output of the ad hoc read operation above, a "buffer" is an array that stores the
# attribute values for all the different objects contiguously in the same buffer, so fast Numpy
# operations can operate on them all at once.  "Coupling" sets up each particular object so
# that, rather than storing that attribute value somewhere on its own, it will instead store
# to and retrieve from the relevant part of the buffer.  This means that ordinary python
# operations on the individual objects will proceed as normal with their attribute references
# automatically reading from or writing to the buffer, while fast numpy array operations on
# the buffer can effectively change the attributes of all the objects at the same time, without
# any slowdown or any extra steps from the coder.
#
# The easiest way to create such a coupling is by B = OA.new_coupled_buffer(attr_name).
# This will examine a sample object in OA to determine the shape that attribute's instances should have.
# It will then create a "buffer", an array whose initial dimensions match those of OA itself
# and whose latter dimensions will be just enough to hold the sample attribute instance.  The existing
# values of each object's attribute instance will be copied into the corresponding part of the buffer.
# (Ideally each object in OA would already have that attribute defined, and all instances would have
# the same shape and dtype.  Where this is not the case, this will attempt to mold whatever is there,
# if present, into the same shape and dtype as the sample had.)  Then each object's attribute instance
# will be replaced with a view of the appropriate part of the buffer.
#
# The result is that changing objects' attributes will automatically update the coupled portion
# of the buffer, and similarly changes in the buffer will then automatically be accessible from
# the corresponding objects' attributes.  In effect, this will have relocated all attribute
# instances to be nicely contiguous in memory in the buffer, which allows them to participate
# in all sorts of fast numpy operations. Subsequent retrievals of OA.attr_name will return the
# buffer, so you needn't keep track of it yourself.  Also, the __set__ method of OA.attr_name
# will be modified to ensure the linkage won't be broken from that end. Setting
# OA.attr_name = new_value will broadcast that new_value into the existing buffer, rather than
# trying to couple all the separate instances to some new array (which would be comparatively
# slow).  (If you ever do want to couple to an existing array, see OA.couple_to_array() below.)

# EXAMPLE USAGE:
#    class Person(object):  # A simple object class for us to build an ObjArray out of
#        def __init__(self, height): self.height = height
#
#    people = [Person(h) for h in (75, 62, 22, 47, 28)] # list of people with heights I made up
#    Adam = people[0] # the first person, we'll check back to see what has happened to him later
#    print("Adam's beginning height:", Adam.height ) # we'll watch how this changes
#    OA = ObjArray( people ) # an ObjArray containing all the people
#
#    print("The first height recorded in an ad hoc reading of all people's heights:", OA.height[0])
#    adultA = OA[ OA.height > 54 ] # Ad hoc reading ObjArrays allows nice numpy-style fancy indexing
#    adultA.height = adultA.height + 1 # Ad hoc write new heights for each of the adults
#    print("Writing new heights to array of adults made Adam's height be", Adam.height )
#
#    buffer = OA.new_coupled_buffer('height') # this buffer now stores the heights for each person
#    buffer += 1  # make each person grow an inch by modifying the buffer
#    print("Altering the buffer made Adam be", Adam.height ) # confirm this changed objects' heights
#    Adam.height += 1  # make Adam himself grow another inch
#    print("Growing Adam made the 1st entry in buffer be", buffer[0] ) # confirm this changed buffer

# MANUAL COUPLING. Attribute instances can also be manually coupled to an existing array A via the
# method OA.couple_to_array(attr_name, A, overwrite_array = True).  A's initial dimensions
# should match those of OA; any subsequent dimensions will be present in attribute instances.
# If overwrite_array is True, then, as above, the existing value of each attribute instance
# (where present) will be copied into the coupled part of A. If set to False, the existing
# values will be lost, and instead those attribute instances will effectively have been set
# to have whatever values were already in A.
#
# Manual linkage can be especially useful (a) if you want to force a particular shape or dtype
# for all instances, and can't rely on the simple sampling method described above to get that
# right, (b) if you already have the values that you want for an attribute, esp. one that
# doesn't exist yet in some or all of the objects, and/or (c) if you want to ensure that the
# buffer for some OA.x will be located contiguously in memory with some other data, e.g., the
# buffer for some other OA.y, as such contiguity can be useful for some numpy operations.
# For this latter usage, you would typically first allocate a double-sized buffer, then
# manually couple half of it to one attribute, and the other half to the other attribute.
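#
# The "double-sized buffer" idea can be sketched with plain NumPy views.  The code below only
# shows the allocation and the two halves; in real use each half would then be passed to
# OA.couple_to_array for its attribute.  The names block, x_buffer and y_buffer are
# illustrative.

```python
import numpy as np

n_objects = 4

# One allocation, twice the size of a single attribute buffer.
block = np.zeros((2, n_objects))
x_buffer = block[0]   # view of the first half, intended for the .x attribute
y_buffer = block[1]   # view of the second half, intended for the .y attribute

x_buffer[:] = [1.0, 2.0, 3.0, 4.0]
y_buffer[:] = [10.0, 20.0, 30.0, 40.0]

# Because both halves live in one block, a single operation can touch both.
block *= 2

print(np.shares_memory(x_buffer, block))  # True
print(block.sum(axis=0))                  # [22. 44. 66. 88.]
```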
#
# WARNING: each attribute of each object can be coupled to only one buffer at once!  To be
# coupled to a buffer, an object's attribute would need to have its value placed contiguously to
# that of the other objects from the same ObjArray, and that is (typically) incompatible with
# also placing its data contiguously to that of other objects in some other ObjArray.
# It's fine to have the same object be a member of multiple ObjArrays, but, for each attribute,
# no more than one of those ObjArrays should be coupled.  (When you slice out a subarray of a
# coupled ObjArray, the original coupling of object attributes will be maintained, but you
# shouldn't re-couple attributes of the subarray to anything else! Or if you do, you will
# break the nice coupling that you once had for the larger array.)
# ***I'm not sure whether I can even catch errors of this sort???
#
# Best practice will usually be to find the one batch of objects you'll most often want to do
# vectorized numpy operations on (or on slices from it) and couple that batch, and then use
# ad hoc operations to read from or write to other ObjArrays involving those objects.
# OA.x automatically returns a coupled buffer if it exists, and otherwise returns an array
# produced by ad hoc reading, so best practice will often involve using such read operations,
# which remain agnostic regarding which particular couplings you may have set up, with little
# loss of efficiency. Writing is trickier, as modifications to coupled arrays don't require
# any further "write" operation to percolate the results out to the objects themselves,
# whereas operations on uncoupled arrays don't automatically percolate out to objects in this
# way.  A good coding practice can be to conclude modifications of any array that you expect
# to be a buffer, but aren't 100% certain, with OA.x = modified_array. In the case where
# modified_array really was just the buffer that OA.x was already coupled to, this will be
# detected and no significant time will be wasted, whereas in the case where OA.x wasn't
# coupled to modified_array, this will copy the values in modified_array over to the relevant
# attribute instances.
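#
# The write-back check described above might look roughly like the following.  This is a
# guess at the shape of the logic, not ObjArray's code: write_back_sketch is a hypothetical
# stand-in for OA.x = modified_array on a coupled attribute, and it uses a simple identity
# test to decide whether a copy is needed.

```python
import numpy as np

def write_back_sketch(buffer, modified):
    """Return True if values had to be copied into the buffer, False if nothing was needed."""
    if modified is buffer:
        return False           # already the coupled buffer: nothing to do
    buffer[...] = modified     # otherwise broadcast the new values into the buffer
    return True

buffer = np.array([75.0, 62.0, 22.0])

same = buffer
same += 1                      # in-place update: still the very same array
copied_in_place = write_back_sketch(buffer, same)

detached = buffer * 2          # arithmetic without "=" on the left builds a new array
copied_detached = write_back_sketch(buffer, detached)

print(copied_in_place, copied_detached)  # False True
print(buffer)                            # [152. 126.  46.]
```

# Note that x = x + 1 creates a new array while x += 1 modifies the existing one, which is
# one reason it can be hard to be sure whether an array is still the coupled buffer.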
#
# WARNING: coupling an object's attribute to a buffer creates a property with the same name
# in that object's class. This property is needed to ensure that attempts to set a coupled
# field instance won't break the coupling, and to allow attribute retrieval to yield a scalar
# value (rather than a tiny subarray containing the scalar, which is technically what scalar
# attributes actually get coupled to). This new property will attempt to be as invisible as
# possible.  However, this precludes coupling "attributes" that were already properties.
# (Typically you wouldn't want to do this anyway, because properties are useful for
# redirecting attempts to get/set them to third parties, whereas the point of coupling is
# instead to redirect such attempts to the buffer.  Also, the main advantage of coupling is
# that fast numpy operations can alter the buffer and thereby effectively alter
# object attributes without needing to call anything like a __set__ method for each object,
# whereas settable properties are intended to call a __set__ method whenever an object's
# property is set to a new value, which would be so slow as to defeat the purpose of coupling.)
# For similar reasons the property will block inheritance of that attribute from
# super-class definitions (e.g., in the rare case where you want attempted readings of an
# object's attribute to return the current value of some superclass's attribute, which itself
# sometimes changes -- again the point of coupling is to redirect attribute references to the
# buffer, which is incompatible with redirecting up an inheritance chain).
# *** I think it might be possible to figure out how to pass uncoupled references up a super() chain???
#
# Coupled arrays experience no slowdown at all for numpy operations on the buffers, which are
# usually the operations for which speed will be most crucial.  Technically, coupling may cause
# slight slowdown accessing or modifying attributes from the object side though, because these
# operations are redirected, by the created properties, to the relevant portion of the buffer.
# This slowdown would be significant by numpy standards, but not by the standards usually
# applicable for Python code that operates on objects one at a time.  This slowdown arises
# from two sources: (1) the new property provides safety and convenience features, which
# require quick checks whenever an object's coupled attribute is read or set, and (2) in the
# case of scalar attributes, getting or setting these attributes is redirected through a
# numpy view of a cell in the buffer, and the processing of that takes a bit more time than
# just setting/retrieving a Python scalar.  In the case where your attribute values were
# already (views of) numpy arrays, there will be no slowdown at all, but in cases where they
# weren't, this slowdown will be unavoidable.  In cases where slowdown (1) matters, it may be
# diminished by employing the "dangerous mode" discussed below.
#
# Importantly, the created properties slightly slow attempts to set or retrieve coupled
# attribute values from *every* member of the object's class, including members that have
# not themselves been coupled to a buffer! If you have a class with many members, only a few
# of which you want to have participate in couplings, it may be more efficient to create a
# subclass just for the members you want to have participate in couplings, so the other
# members won't be slowed at all (and won't have their inheritance of attributes from
# superclasses blocked at all).
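#
# The subclass suggestion relies on ordinary Python behaviour: a property lives on a class,
# so it affects every instance of that class and its subclasses, but not the parent class.
# The sketch below illustrates this with a hand-written property; CoupledPerson and
# _height_view are hypothetical names, and ObjArray's generated property may differ.

```python
import numpy as np

class Person:
    def __init__(self, height):
        self.height = height   # ordinary instance attribute, no property involved

class CoupledPerson(Person):
    """Hypothetical subclass reserved for the objects that take part in a coupling."""

    def __init__(self, height):
        self._height_view = np.empty(1, dtype=float)  # stand-in for a slice of a buffer
        super().__init__(height)  # this assignment is routed through the setter below

    @property
    def height(self):
        return self._height_view[0].item()

    @height.setter
    def height(self, value):
        self._height_view[0] = value

plain = Person(70)
coupled = CoupledPerson(70)

print("height" in vars(Person))         # False: the parent class has no property
print("height" in vars(CoupledPerson))  # True: the property lives on the subclass
print("height" in vars(plain))          # True: stored directly on the instance
print("height" in vars(coupled))        # False: redirected to the array instead
```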
#
# ***Alternatively, I may eventually support a "dangerous" mode. Creation of a class property is needed
# primarily for safety (ensuring couplings won't accidentally be broken) and convenience (returning
# scalars rather than tiny arrays of scalars, whereas coupling technically always links an attribute
# to a subarray of the buffer).  Dangerous mode would opt out of these helpful features,
# but would avoid the slight slowdown of having the __get__ and __set__ methods that provide these
# features be called whenever any class member has that attribute gotten or set.


Last updated 2006.04.12, by Justin C. Fisher. fisher@smu.edu