I’ve talked a lot to fellow developers about making PySAL objects more than containers for the results of a statistical procedure.

One way I think we can do this is to focus on methods like predict, find, update, or reclassify.

So, here, I’ll show the way I’ve implemented a simple API to update map classifiers by defining their __call__ method.

import pysal as ps

The patch I applied to mapclassify is in the reclassify branch of my repository on GitHub. To get it, git fetch my repository and check out the reclassify branch. Alternatively, what I added to Map_Classifier is so small that it’s easy to show here:

First, I added a __call__ method:

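import copy

def __call__(self, *args, **kwargs):
    """
    This will allow the classifier to be called like a
    function *after* instantiation
    """
    # pop the toggle so it is never forwarded to __init__
    inplace = kwargs.pop('inplace', False)
    if inplace:
        self._update(*args, **kwargs)
    else:
        new = copy.deepcopy(self)  # leave the caller's object untouched
        new._update(*args, **kwargs)
        return new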

This will allow us to do something like:

classifier = ps.Quantiles(data)
classifier(k=4)
classifier(k=9)
classifier(new_data, inplace=True)

and proceed to interact with the classifier object over and over again. Since there’s an inplace toggle (False by default), users can choose when to mutate or when to copy.
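You can try the copy-versus-mutate behavior without a PySAL checkout. What follows is a minimal, self-contained sketch of the same pattern. It is not the mapclassify code. ToyClassifier and ToyQuantiles are hypothetical stand-ins for Map_Classifier and Quantiles. The breaks come from np.percentile at evenly spaced percentiles, which is an assumption on my part and not necessarily PySAL's exact quantile rule.

```python
import copy

import numpy as np


class ToyClassifier:
    """Hypothetical stand-in for Map_Classifier: only the __call__/_update logic."""

    def __call__(self, *args, **kwargs):
        # Remove the toggle so it is never forwarded to __init__.
        inplace = kwargs.pop("inplace", False)
        if inplace:
            self._update(*args, **kwargs)
            return None
        new = copy.deepcopy(self)  # leave the caller's object untouched
        new._update(*args, **kwargs)
        return new

    def _update(self, data=None, *args, **kwargs):
        # Naive updater: append any new data, then re-run the constructor.
        if data is not None:
            data = np.append(np.asarray(data).flatten(), self.y)
        else:
            data = self.y
        self.__init__(data, *args, **kwargs)


class ToyQuantiles(ToyClassifier):
    def __init__(self, y, k=5):
        self.y = np.asarray(y, dtype=float).flatten()
        self.k = k
        # Upper bounds at k evenly spaced percentiles (numpy's default rule).
        self.bins = np.percentile(self.y, np.linspace(100 / k, 100, k))
        # Class index per observation: bins[i-1] < y <= bins[i] -> i.
        self.yb = np.searchsorted(self.bins, self.y, side="left")
        self.counts = np.bincount(self.yb, minlength=k)


rng = np.random.default_rng(0)
q = ToyQuantiles(rng.gamma(2.0, 3.0, size=1000))
q9 = q(k=9)               # returns a new classifier; q keeps k=5 here
q(k=6, inplace=True)      # mutates q and returns None
q_more = q(np.ones(10))   # copy with 10 appended values; k reverts to 5
```

The last line shows one consequence of this design. The naive updater re-runs __init__, so any option you don't pass again falls back to its default. The same thing happens further down, where classifier(new_data) reports five classes even after classifier(k=6, inplace=True).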

In theory, the __call__ method can support every __init__ signature the classifiers use. I’ve defined it this way because most of the mapclassify classifiers I can think of take a mandatory data argument plus optional keyword arguments. The only one that differs is User_Defined, which I overrode to handle correctly.
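Here is a sketch of why a classifier like User_Defined needs its own handling. I'm assuming, for illustration, that its constructor takes a required bins argument with no default. ToyUserDefined is hypothetical. It overrides _update so that calling it with only new data carries the current bins forward, instead of re-running __init__ without them.

```python
import copy

import numpy as np


class ToyClassifier:
    """Hypothetical base with the same __call__/_update split as Map_Classifier."""

    def __call__(self, *args, **kwargs):
        inplace = kwargs.pop("inplace", False)
        if inplace:
            self._update(*args, **kwargs)
            return None
        new = copy.deepcopy(self)
        new._update(*args, **kwargs)
        return new

    def _update(self, data=None, *args, **kwargs):
        if data is not None:
            data = np.append(np.asarray(data).flatten(), self.y)
        else:
            data = self.y
        self.__init__(data, *args, **kwargs)


class ToyUserDefined(ToyClassifier):
    """Hypothetical analogue of User_Defined: bins are required, with no default."""

    def __init__(self, y, bins):
        self.y = np.asarray(y, dtype=float).flatten()
        bins = np.asarray(bins, dtype=float)
        if self.y.max() > bins[-1]:
            # Add a top class so the largest observation is covered.
            bins = np.append(bins, self.y.max())
        self.bins = bins
        self.yb = np.searchsorted(self.bins, self.y, side="left")
        self.counts = np.bincount(self.yb, minlength=len(self.bins))

    def _update(self, data=None, bins=None):
        # The base updater would call __init__(data) without bins and fail,
        # so carry the current bins forward unless new ones are given.
        if bins is None:
            bins = self.bins
        super()._update(data, bins=bins)


ud = ToyUserDefined([1, 2, 3, 4, 5, 6], bins=[2, 4])  # bins become [2, 4, 6]
wider = ud([10, 12])       # keeps [2, 4, 6] and adds a top class at 12
regrouped = ud(bins=[3])   # same data, bins become [3, 6]
```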

The main point is that __call__ lets users quickly reclassify, and view the new classification, using the object they already created! A common use case might look like this:

df = ps.pdio.read_files(ps.examples.get_path('south.dbf'))

df.head()

data = df['HR60'].values

classifier = ps.Quantiles(data)

classifier

            Quantiles
  Lower            Upper    Count
          x[i] <=  2.497      283
  2.497 < x[i] <=  5.104      282
  5.104 < x[i] <=  7.621      282
  7.621 < x[i] <= 10.981      282
 10.981 < x[i] <= 92.937      283

Once estimated, the user can reclassify based on the same API as the constructor:

classifier(k=3)

            Quantiles
  Lower            Upper    Count
          x[i] <=  4.265      471
  4.265 < x[i] <=  8.679      470
  8.679 < x[i] <= 92.937      471

classifier(k=9)

            Quantiles
  Lower            Upper    Count
          x[i] <=  0.000      180
  0.000 < x[i] <=  2.836      134
  2.836 < x[i] <=  4.265      157
  4.265 < x[i] <=  5.628      157
  5.628 < x[i] <=  7.137      156
  7.137 < x[i] <=  8.679      157
  8.679 < x[i] <= 10.600      157
 10.600 < x[i] <= 13.924      157
 13.924 < x[i] <= 92.937      157

It doesn’t mutate the object unless inplace=True is passed:

classifier

            Quantiles
  Lower            Upper    Count
          x[i] <=  2.497      283
  2.497 < x[i] <=  5.104      282
  5.104 < x[i] <=  7.621      282
  7.621 < x[i] <= 10.981      282
 10.981 < x[i] <= 92.937      283

classifier(k=6, inplace=True)

classifier

            Quantiles
  Lower            Upper    Count
          x[i] <=  1.993      236
  1.993 < x[i] <=  4.265      235
  4.265 < x[i] <=  6.245      235
  6.245 < x[i] <=  8.679      235
  8.679 < x[i] <= 11.850      235
 11.850 < x[i] <= 92.937      236

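This also lets users add new data to the classifier. When new data is passed as the first argument, it is appended to the data the classifier already holds, and the combined set is reclassified.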

Now, I suspect some classifiers have cheaper updating equations than re-estimating the whole classifier, much as there are for running-median problems. I expect to extend this work with updaters that are more sophisticated than reclassifying the entire set. That is why I split the __call__ method from the method that actually does the updating:

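import numpy as np

def _update(self, data=None, *args, **kwargs):
    # data=None lets callers change only the options, e.g. classifier(k=3)
    if data is not None:
        # append the new observations to the ones already held in self.y
        data = np.append(np.asarray(data).flatten(), self.y)
    else:
        data = self.y
    self.__init__(data, *args, **kwargs)  # this is the most naive updater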

As the comment notes, this is the most universally acceptable updater, hence its definition in the Map_Classifier base class. Conveniently, this means any new classifier defined as a subclass of it gets a very naive reclassification method for free.
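Below is a sketch of both halves of that split, using hypothetical classes built on the same pattern. ToyPercentiles defines only __init__ and inherits the naive updater unchanged. ToyEqualInterval overrides _update with a slightly smarter path. For equal intervals the breaks depend only on the minimum, the maximum, and k, so the new extremes can be found by scanning just the new values. The savings are modest, because the class counts still need a pass over every observation. Still, the sketch shows where a cleverer updater would plug in, and its last lines compare the incremental result with a full re-estimation.

```python
import copy

import numpy as np


class ToyClassifier:
    """Hypothetical stand-in for Map_Classifier with the __call__/_update split."""

    def __call__(self, *args, **kwargs):
        inplace = kwargs.pop("inplace", False)
        if inplace:
            self._update(*args, **kwargs)
            return None
        new = copy.deepcopy(self)
        new._update(*args, **kwargs)
        return new

    def _update(self, data=None, *args, **kwargs):
        # Naive updater: append new data and re-run the subclass constructor.
        if data is not None:
            data = np.append(np.asarray(data).flatten(), self.y)
        else:
            data = self.y
        self.__init__(data, *args, **kwargs)


class ToyPercentiles(ToyClassifier):
    """Defines only __init__, so it inherits the naive updater for free."""

    def __init__(self, y, pct=(1, 10, 50, 90, 99, 100)):
        self.y = np.asarray(y, dtype=float).flatten()
        self.pct = pct  # should end at 100 so every value lands in a class
        self.bins = np.percentile(self.y, pct)
        self.yb = np.searchsorted(self.bins, self.y, side="left")
        self.counts = np.bincount(self.yb, minlength=len(self.bins))


class ToyEqualInterval(ToyClassifier):
    """Overrides _update: only the new values are scanned for new extremes."""

    def __init__(self, y, k=5):
        self.y = np.asarray(y, dtype=float).flatten()
        self.k = k
        self._lo, self._hi = self.y.min(), self.y.max()
        self._classify()

    def _classify(self):
        width = (self._hi - self._lo) / self.k
        self.bins = self._lo + width * np.arange(1, self.k + 1)
        self.bins[-1] = self._hi  # avoid floating-point drift at the top bound
        self.yb = np.searchsorted(self.bins, self.y, side="left")
        self.counts = np.bincount(self.yb, minlength=self.k)

    def _update(self, data=None, k=None):
        if k is not None:
            self.k = k
        if data is not None:
            new = np.asarray(data, dtype=float).flatten()
            # Combined extremes from the old extremes plus the new values only.
            self._lo = min(self._lo, new.min())
            self._hi = max(self._hi, new.max())
            self.y = np.append(new, self.y)
        # The counts still need one pass over all observations.
        self._classify()


grid = np.arange(1, 101, dtype=float)
quartiles = ToyPercentiles(grid)(pct=(25, 50, 75, 100))
halves = ToyPercentiles(grid)(np.arange(101, 201, dtype=float), pct=(50, 100))

rng = np.random.default_rng(1)
a, b = rng.normal(size=500), rng.normal(loc=3.0, size=200)
fast = ToyEqualInterval(a)(b, k=7)              # incremental updater
slow = ToyEqualInterval(np.append(b, a), k=7)   # full re-estimation
```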

So, with Quantiles, you can append new data and reclassify in a single call:

new_data = df['HR90'].values

classifier(new_data)

            Quantiles
  Lower            Upper    Count
          x[i] <=  3.228      565
  3.228 < x[i] <=  5.912      565
  5.912 < x[i] <=  8.710      564
  8.710 < x[i] <= 12.735      565
 12.735 < x[i] <= 92.937      565

classifier(new_data, k=14)

            Quantiles
  Lower            Upper    Count
          x[i] <=  0.000      296
  0.000 < x[i] <=  2.200      108
  2.200 < x[i] <=  3.469      201
  3.469 < x[i] <=  4.483      202
  4.483 < x[i] <=  5.394      202
  5.394 < x[i] <=  6.282      201
  6.282 < x[i] <=  7.297      202
  7.297 < x[i] <=  8.266      202
  8.266 < x[i] <=  9.348      201
  9.348 < x[i] <= 10.628      202
 10.628 < x[i] <= 12.217      202
 12.217 < x[i] <= 14.603      201
 14.603 < x[i] <= 18.544      202
 18.544 < x[i] <= 92.937      202

classifier(new_data, k=6, inplace=True)

classifier

            Quantiles
  Lower            Upper    Count
          x[i] <=  2.691      471
  2.691 < x[i] <=  5.069      471
  5.069 < x[i] <=  7.297      470
  7.297 < x[i] <=  9.736      471
  9.736 < x[i] <= 13.736      470
 13.736 < x[i] <= 92.937      471

So, this is what I mean by “responsive” classes. They should:

support updating and reuse with new data
support changing or augmenting the options/parameters given at init time
provide a __call__ method that consistently does one thing: either update the object or use it.

In map classification, I think __call__ would be better suited to find_bin than to update_bins. In spatial regression, I think __call__ would be best suited to predict.
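Here is a minimal sketch of the find_bin idea, assuming a quantile-style fit. Once the object is fitted, calling it only assigns new values to classes using the stored upper bounds, and it never changes them. BinLookup and find_bin are hypothetical names used for illustration, not PySAL API.

```python
import numpy as np


class BinLookup:
    """Hypothetical classifier whose __call__ assigns classes instead of updating."""

    def __init__(self, y, k=5):
        self.y = np.asarray(y, dtype=float).flatten()
        # Upper bounds at k evenly spaced percentiles of the fitted data.
        self.bins = np.percentile(self.y, np.linspace(100 / k, 100, k))

    def find_bin(self, x):
        # Class i holds values with bins[i-1] < x <= bins[i]; values above the
        # top bound are clipped into the last class rather than raising.
        idx = np.searchsorted(self.bins, np.asarray(x, dtype=float), side="left")
        return np.minimum(idx, len(self.bins) - 1)

    # Calling the fitted object "uses" it: it classifies and never mutates.
    __call__ = find_bin


c = BinLookup(np.arange(1, 101, dtype=float), k=4)  # bins: 25.75, 50.5, 75.25, 100
labels = c([1, 30, 60, 99, 500])
```

Clipping out-of-range values into the last class is one choice among several. Raising an error or reporting a new open-ended class would be just as defensible.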

__call__ should never alias summary() methods, which probably belong in __repr__, anyway.

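For reference, here is the output of df.head() from the example above: the first five rows of south.dbf, with the middle columns elided.

	FIPSNO	NAME	STATE_NAME	STATE_FIPS	CNTY_FIPS	FIPS	STFIPS	COFIPS	SOUTH	HR60	…	BLK90	GI59	GI69	GI79	GI89	FH60	FH70	FH80	FH90	geometry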
0	54029	Hancock	West Virginia	54	029	54029	54	29	1	1.682864	…	2.557262	0.223645	0.295377	0.332251	0.363934	9.981297	7.8	9.785797	12.604552	<pysal.cg.shapes.Polygon object at 0x7fc5495eb…
1	54009	Brooke	West Virginia	54	009	54009	54	9	1	4.607233	…	0.748370	0.220407	0.318453	0.314165	0.350569	10.929337	8.0	10.214990	11.242293	<pysal.cg.shapes.Polygon object at 0x7fc5495eb…
2	54069	Ohio	West Virginia	54	069	54069	54	69	1	0.974132	…	3.310334	0.272398	0.358454	0.376963	0.390534	15.621643	12.9	14.716681	17.574021	<pysal.cg.shapes.Polygon object at 0x7fc5495eb…
3	54051	Marshall	West Virginia	54	051	54051	54	51	1	0.876248	…	0.546097	0.227647	0.319580	0.320953	0.377346	11.962834	8.8	8.803253	13.564159	<pysal.cg.shapes.Polygon object at 0x7fc549565…
4	10003	New Castle	Delaware	10	003	10003	10	3	1	4.228385	…	16.480294	0.256106	0.329678	0.365830	0.332703	12.035714	10.7	15.169480	16.380903	<pysal.cg.shapes.Polygon object at 0x7fc549565…

Bringing Classifiers Alive in PySAL
