I’ve talked a lot to fellow developers about making PySAL objects more than containers for the results of a statistical procedure.
One way I think we can do this is to focus on methods like predict, find, update, or reclassify.
So, here, I’ll show the way I’ve implemented a simple API to update map classifiers by defining their __call__ method.
import pysal as ps
The patch I applied to mapclassify should be in this GitHub branch. To get it, you’ll need to git fetch my repository and check out the reclassify branch. Alternatively, what I added to Map_Classifier is small enough to show here:
First, I added a __call__ method:
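def __call__(self, *args, **kwargs):
    """
    This will allow the classifier to be called like a
    function *after* instantiation
    """
    # the module needs `import copy` for the non-mutating branch
    # inplace defaults to False; new data is the optional first positional argument
    inplace = kwargs.pop('inplace', False)
    new_data = args[0] if args else kwargs.pop('new_data', None)
    if inplace:
        self._update(new_data, **kwargs)
    else:
        new = copy.deepcopy(self)
        new._update(new_data, **kwargs)
        return new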
This will allow us to do something like:
classifier = ps.Quantiles(data)
classifier(k=4)
classifier(k=9)
classifier(new_data, inplace=True)
and proceed to interact with the classifier object over and over again. Since there’s an inplace toggle (False by default), users can choose when to mutate or when to copy.
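To make the copy-versus-mutate behaviour concrete without needing the patched branch, here is a minimal, standard-library sketch. ToyQuantiles, its bin rule, and the choice to carry k over between calls are all assumptions made up for illustration; none of this is PySAL code.

```python
import copy


class ToyQuantiles:
    """Hypothetical stand-in for a map classifier; not PySAL code."""

    def __init__(self, y, k=5):
        self.y = sorted(y)
        self.k = k
        n = len(self.y)
        # Upper bound of class i: the value sitting at the i/k mark of the sorted data
        self.bins = [self.y[max(0, (i * n) // k - 1)] for i in range(1, k + 1)]

    def __call__(self, new_data=None, inplace=False, **kwargs):
        # Work on self when inplace is requested, otherwise on a deep copy
        target = self if inplace else copy.deepcopy(self)
        data = target.y if new_data is None else list(new_data) + target.y
        # Assumption: reuse the current k unless the caller overrides it
        kwargs.setdefault("k", target.k)
        target.__init__(data, **kwargs)
        return None if inplace else target


clf = ToyQuantiles(range(1, 101), k=5)
four = clf(k=4)          # returns a reclassified copy; clf still has k=5
clf(k=10, inplace=True)  # mutates clf and returns None
```

Returning None from the in-place branch mirrors how mutating methods on built-in Python containers behave, which makes it harder to confuse the two modes.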
In theory, the __call__ method can support all of the different __init__ signatures possible. I’ve defined it this way because most of the mapclassify methods I can think of take a mandatory data argument and optional keyword arguments. The only one that differs is User_Defined, whose __call__ I overrode so that it is handled correctly.
The main point here is that this enables users to quickly reclassify and view new classifications using the object they created! Thus, a common use case might be something like this:
df = ps.pdio.read_files(ps.examples.get_path('south.dbf'))
df.head()
data = df['HR60'].values
classifier = ps.Quantiles(data)
classifier
Once estimated, the user can reclassify based on the same API as the constructor:
classifier(k=3)
classifier(k=9)
It doesn’t mutate the object unless inplace is provided and is true:
classifier
classifier(k=6, inplace=True)
classifier
This also enables users to add new data to the classifier.
Now, I bet there are better updating equations for the different classifiers than reestimating the entire classifier, like there are for running median problems. I anticipated extending this work with more sophisticated updaters than just reclassifying the entire set. This is why I split the __call__ method from what really does the updating:
def _update(self, data, *args, **kwargs):
    if data is not None:
        # pool the new observations with the ones already classified
        data = np.append(np.asarray(data).flatten(), self.y)
    else:
        data = self.y
    self.__init__(data, *args, **kwargs)  # this is the most naive updater
As the comment denotes, this is the most universally acceptable updater, hence its definition in the Map_Classifier base class. Fortunately, this means that any new classifier defined as a subclass of this gets a very naive in-place reclassification method for free.
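For a sense of what a less naive updater could look like, the running median mentioned above is the textbook case: two heaps let each new observation update the median in logarithmic time instead of re-sorting everything. The sketch below is only an illustration of that idea using the standard library; the RunningMedian class is hypothetical and nothing like it exists in mapclassify as far as this post is concerned.

```python
import heapq


class RunningMedian:
    """Streaming median via two heaps; illustrative, not part of PySAL."""

    def __init__(self):
        self._low = []   # max-heap (values stored negated) for the smaller half
        self._high = []  # min-heap for the larger half

    def add(self, value):
        # Push onto the lower half, then hand its largest item to the upper half
        heapq.heappush(self._low, -value)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the lower half is never smaller than the upper half
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    @property
    def median(self):
        # Assumes at least one value has been added
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2


rm = RunningMedian()
for v in [5, 1, 9, 3]:
    rm.add(v)
```

A classifier-specific _update could follow the same shape: keep a small amount of state around and adjust it per observation, falling back to the naive re-initialisation when no such shortcut exists.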
Thus, you can do stuff like:
new_data = df['HR90'].values
classifier(new_data)
classifier(new_data, k=14)
classifier(new_data, k=6, inplace=True)
classifier
So, this is what I mean by “responsive” classes. They should:
- support updating/reuse w/ new data
- support augmentation of initial/init-time options/parameters
- provide __call__ methods that consistently either update or use.
In map classification, I think __call__ would be better suited to find_bin than update_bins.
In spatial regression, I think __call__ would be better suited to predict than something else.
__call__ should never alias summary() methods, which probably belong in __repr__, anyway.
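As a rough picture of the "use" flavour of __call__ for map classification, here is a small sketch in which calling the object looks values up in already-estimated bins rather than refitting. BinFinder and its find_bin are hypothetical names for this example only, and treating each bin as an inclusive upper bound is an assumption.

```python
from bisect import bisect_left


class BinFinder:
    """Hypothetical classifier whose __call__ looks values up instead of refitting."""

    def __init__(self, bins):
        self.bins = sorted(bins)  # upper bound of each class

    def find_bin(self, value):
        # Index of the first class whose upper bound is >= value;
        # values above the last bound get len(self.bins), i.e. "out of range"
        return bisect_left(self.bins, value)

    def __call__(self, values):
        return [self.find_bin(v) for v in values]


finder = BinFinder([10, 20, 30])
labels = finder([3, 10, 11, 30])
```

Under this design, updating the bins would live in an explicitly named method, so calling the object never changes it.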
Originally posted on yetanothergeographer.tumblr.com.