[Question] Cross-matching in python?

Aldo asked:

"I am quite new to python and trying to convert my old IDL scripts. In particular I'm interested in cross-correlating source catalogs: is anything already publicly available on this problem?"

To get this started, I'll comment that I think the poster is looking for cross-correlation based on positional separation (match within 1 arcsec).   I do not know of a full fledged script but among the coordinate resources listed at astropython.org, I was able to easily calculate angular separations using the astLib and coords packages.

# using astLib 

from astLib.astCoords import calcAngSepDeg as gcirc

theta1c = [83.81860444,-5.38969611]
theta1b = [83.817270,-5.3853220]

print '\nastLib.astCoords'
print gcirc(*theta1c+theta1b),'degrees'
print gcirc(*theta1c+theta1b)*3600.,'arcsec'
print gcirc(*theta1c+theta1b)*3600. < 20.

# Results:
# astLib.astCoords
# 0.00457141885429 degrees
# 16.4571078754 arcsec
# True

# using coords

import coords

print '\ncoords'
ptheta1c = coords.Position(tuple(theta1c))
ptheta1b = coords.Position(tuple(theta1b))

print ptheta1c.angsep(ptheta1b)
print ptheta1c.angsep(ptheta1b).arcsec(),'arcsec'
print ptheta1c.within(ptheta1b,20,units='arcsec')

# Results
# coords
# 0.004571 degrees
# 16.4571078405 arcsec
# True

print '\ncompare'
print gcirc(*theta1c+theta1b)*3600.,ptheta1c.angsep(ptheta1b).arcsec()
print 'delta: ', gcirc(*theta1c+theta1b)*3600.-ptheta1c.angsep(ptheta1b).arcsec()

# Results
# compare
# 16.4571078754 16.4571078405
# delta: 3.49091848761e-08

Posted by virtualastronomer
Add a comment...


  1. Re: Article by Moritz (2010-04-29)

    Last week, I also wanted to merge several datasets, matching them by position. I did not find any module to do that in python, so I started writing one. Unfortunately, it's still at a very early stage, but I'll post it when the basic matching works.
  2. Re: Article by Ivan (2010-04-29)

    Typically, cross-match is a functionality where you compare some lists of coordinates not bothering with loops and calculation of the distance between all points in a lists. Low-level user programming of this stuff is not desired as you won't make it efficiently even, say, at 100k points level. In the meantime, with the new surveys coming, there's a growing evidence many people need to match quite a big lists at their workstations. AFAIK, there's no good mid-scale cross-matching module in Python. I ended with os.system() of STILTS (check http://www.star.bris.ac.uk/~mbt/stilts/sun256/match.html for its x-match docs) but of course I'm not happy with calling java stuff from python and reading/writing tables for that...

  3. Re: Article by Jonathan Foster (2010-04-29)

    The IDL cross-match routine (srcor) is not efficient for large lists, but is sufficient for many tasks. Something like it for python would be quite useful.

    For larger lists I agree with Ivan that you would want to do something more intelligent -- like a kd-tree (http://www.scipy.org/Cookbook/KDTree).
  4. Re: Article by Gus (2010-04-29)

    I would completely agree with the assertions of @Ivan and @Jonathan that STILTS can scale to the informatics scale while low level routines for small data sets are actually very very useful.

    Yet, I think Ivan is picking up that angular separation matching is assumed to be a boolean assertion captured by simple, low level looping over lists.  Positional matching is actually a probabilistic assertion. For very large datasets simple angsep arguments are kind of silly since dealing with large datasets requires methodologies that can dig into covariance (such as localized astrometric warps).

    A python module that builds off coords (because it can objectify object-object relations in spherical coordinates) to provide probability based matching with flexible parameters would be a very powerful tool and prob not that hard to code. 

  5. New Resource by Gus (2010-04-29)

    A new resource has been posted to astropython.org that might provide or lead to a solution to the problem of cross-matching in python.  Ryan Scranton kindly alerted us to his STOMP C++ library for which he has added a python SWIG wrapper.  I look forward to trying this out.
  6. Re: Article by Neil (2010-06-17)

    I've written a function to do this that you might want to try - the only dependency is Numpy, and it's fast enough for arrays of up to a few tens of thousands of coordinates.

    You can copy the code from here:


    The two necessary functions are match() and angsep().
  7. Re: Article by Erik Tollerud (2013-03-30)

    I've implemented a function to do this  that uses scipy's cKDTree.  Hence, it is quite fast and scales as n * log(n).  It also does *not* use the small-angle approximation, and is hence valid for large distances on the sky.

    It's available for anyone to use - see https://gist.github.com/eteq/4599814
Enter Your Comment