## [Question] Cross-matching in python?

"I am quite new to python and trying to convert my old IDL scripts. In particular I'm interested in cross-correlating source catalogs: is anything already publicly available on this problem?"

To get this started, I'll comment that I think the poster is looking for cross-correlation based on positional separation (match within 1 arcsec).   I do not know of a full fledged script but among the coordinate resources listed at astropython.org, I was able to easily calculate angular separations using the astLib and coords packages.

`# using astLib from astLib.astCoords import calcAngSepDeg as gcirctheta1c = [83.81860444,-5.38969611]theta1b = [83.817270,-5.3853220]print '\nastLib.astCoords'print gcirc(*theta1c+theta1b),'degrees'print gcirc(*theta1c+theta1b)*3600.,'arcsec'print gcirc(*theta1c+theta1b)*3600. < 20.# Results:# astLib.astCoords# 0.00457141885429 degrees# 16.4571078754 arcsec# True# using coordsimport coordsprint '\ncoords'ptheta1c = coords.Position(tuple(theta1c))ptheta1b = coords.Position(tuple(theta1b))print ptheta1c.angsep(ptheta1b)print ptheta1c.angsep(ptheta1b).arcsec(),'arcsec'print ptheta1c.within(ptheta1b,20,units='arcsec')# Results# coords# 0.004571 degrees# 16.4571078405 arcsec# Trueprint '\ncompare'print gcirc(*theta1c+theta1b)*3600.,ptheta1c.angsep(ptheta1b).arcsec()print 'delta: ', gcirc(*theta1c+theta1b)*3600.-ptheta1c.angsep(ptheta1b).arcsec()# Results# compare# 16.4571078754 16.4571078405# delta: 3.49091848761e-08`

Posted by virtualastronomer

1. Re: Article by Moritz (2010-04-29)

Last week, I also wanted to merge several datasets, matching them by position. I did not find any module to do that in python, so I started writing one. Unfortunately, it's still at a very early stage, but I'll post it when the basic matching works.
2. Re: Article by Ivan (2010-04-29)

Typically, cross-match is a functionality where you compare some lists of coordinates not bothering with loops and calculation of the distance between all points in a lists. Low-level user programming of this stuff is not desired as you won't make it efficiently even, say, at 100k points level. In the meantime, with the new surveys coming, there's a growing evidence many people need to match quite a big lists at their workstations. AFAIK, there's no good mid-scale cross-matching module in Python. I ended with os.system() of STILTS (check http://www.star.bris.ac.uk/~mbt/stilts/sun256/match.html for its x-match docs) but of course I'm not happy with calling java stuff from python and reading/writing tables for that...

3. Re: Article by Jonathan Foster (2010-04-29)

The IDL cross-match routine (srcor) is not efficient for large lists, but is sufficient for many tasks. Something like it for python would be quite useful.

For larger lists I agree with Ivan that you would want to do something more intelligent -- like a kd-tree (http://www.scipy.org/Cookbook/KDTree).
4. Re: Article by Gus (2010-04-29)

I would completely agree with the assertions of @Ivan and @Jonathan that STILTS can scale to the informatics scale while low level routines for small data sets are actually very very useful.

Yet, I think Ivan is picking up that angular separation matching is assumed to be a boolean assertion captured by simple, low level looping over lists.  Positional matching is actually a probabilistic assertion. For very large datasets simple angsep arguments are kind of silly since dealing with large datasets requires methodologies that can dig into covariance (such as localized astrometric warps).

A python module that builds off coords (because it can objectify object-object relations in spherical coordinates) to provide probability based matching with flexible parameters would be a very powerful tool and prob not that hard to code.

5. New Resource by Gus (2010-04-29)

A new resource has been posted to astropython.org that might provide or lead to a solution to the problem of cross-matching in python.  Ryan Scranton kindly alerted us to his STOMP C++ library for which he has added a python SWIG wrapper.  I look forward to trying this out.
6. Re: Article by Neil (2010-06-17)

I've written a function to do this that you might want to try - the only dependency is Numpy, and it's fast enough for arrays of up to a few tens of thousands of coordinates.

You can copy the code from here:

http://bitbucket.org/nhmc/pyserpens/src/tip/coord.py

The two necessary functions are match() and angsep().
7. Re: Article by Erik Tollerud (2013-03-30)

I've implemented a function to do this  that uses scipy's cKDTree.  Hence, it is quite fast and scales as n * log(n).  It also does *not* use the small-angle approximation, and is hence valid for large distances on the sky.

It's available for anyone to use - see https://gist.github.com/eteq/4599814