|
WSA - Questions and Answers page
Before contacting wsa-support@roe.ac.uk if you have a query about accessing the WSA please see
if it has already been covered in the Q&A listed below. (note: please do not contact individuals in the Wide-Field Astronomy Unit, use wsa-support@roe.ac.uk).
General look-ups between tables *Source and *Detection.
In general when querying the *Source table, you may find that certain catalogue quantities
that you need are not present. This is because the table design in the WSA is to
write only a subset of the most useful attributes available from *Detection into *Source (for
efficiency reasons). For example, CASU standard source extraction produces 13 aperture
magnitudes for every detection in every passband and these are of course present in
*Detection, but in *Source you will only see a subset of 3 per passband - aperMags 1, 3
and 4 for example. How do you query if you are interested in an aperture magnitude
other than 1, 3 or 4 - say aperMag2 in the J band for a GPS query?
The answer is to do a join query between *Source and *Detection,
noting that each row of *Source is a member of a frame set the details of
which are stored in *MergeLog (hence the join has to include *MergeLog as well).
The SQL syntax is best illustrated by an example:
SELECT ra, dec, jd.aperMag2 as jAperMag2
FROM gpsSource AS s, gpsMergeLog AS l, gpsDetection as jd
WHERE
/* Join source to merge log for the frame set: */
s.frameSetID=l.frameSetID AND
/* Join merge log to detection for the J frame: */
l.jmfID = jd.multiframeID AND l.jeNum = jd.extNum AND
/* Join source to detection to look up the detection required: */
s.jSeqNum = jd.seqNum
Note that the final predicate could be done via objID:
... AND s.jobjID = jd.objID but this is most inadvisable for
efficiency reasons (the attribute combination multiframe number,
extension number, sequence number is the primary key in
*Detection so look-ups are very fast on that combination) and
furthermore for the more subtle reasons detailed
below.
Default detection rows and objID or (multiframeID,extNum,seqNum) for look-ups between *Source and *Detection.
In the example *Source-to-*Detection join query given
above,
one might be tempted to use objID as the look-up into *Detection in place
of the more long-winded predicate combination involving multiframeID,
extNum and seqNum since both the former attribute and the latter
3-attribute combination are unique. Users should be aware, however, that there
are default rows in the *Detection table to allow joins between *Source
and *Detection that allow selection of *Source rows that have incomplete
passband coverage when querying that table in conjunction with a join with
*Detection. Since there is now (i.e. from DR4 onwards) a default row in *Detection
for every (non-default) multiframe/extension number combination, looking up a row in
*Detection via objID alone when that attribute is a default produces many matches
for all those default rows in *Detection having that same default value in that table.
Those rows could be excluded by an addition WHERE ... AND objID > 0 but this is
very inefficient and will be inappropriate if you are trying to select source rows that
may have non-detections (and therefore default objIDs) in some passbands. Hence,
ALWAYS use the correct relational association of the combination
of multiframeID, extNum and seqNum when joining *Source
to *Detection, as in the example query given above.
Trade-off between completeness and reliability in point source sample selections
There is always a trade-off between completeness and reliability when using
catalogue data, since no image analysis software is perfect. The
morphological classification attributes class and classStat
allow you to make a choice about how to maximise completeness, or
conversely minimise contamination, at query time - for examples of this,
consult the SQL scripts defined under "views" in the schema browser. Note
that although the integer classification scheme is based on the N(0,1) measure
of stellarness (classStat) it has several overrides built in to attempt to
make it more reliable. ClassStat is equivalent to a straight likelihood
measure based directly on the curve-of-growth of the aperture fluxes.
But because the statistic is not strictly speaking a Gaussian distribution
and has much broader tails, the Class parameter has assorted overrides
invoked based on saturation, ellipticity as a function of magnitude and so
on. The most reliable way of getting a reasonably complete stellar-like sample
is to require class of -1 and -2 on the bands in question and then if you
are concerned about completeness add in any extra sources with ClassStat
in the range 0 to +n-sigma; where n is between 3 and 5 depending on how
non-stellar you want to push things.
Why is my CMD/two-colour diagram so messy?
When selecting samples of merged sources and displaying in colour-magnitude or
multi-colour space, simply selecting all sources from the merged source tables
will not yield clean diagrams. As always with survey datasets, there is
a balance to be struck between reliability and completeness, and
that balance can only be made by the end-user for a given science application.
Hence, for example, the pairing radius used in the source merging is very
generous in terms of the RMS astrometric precision of WFCAM detections (this is
to pull in high proper motion stars, and very low s/n objects with poorly
defined centroids). In general, you should always refine sample selections from
the merged source lists appropriately for your science usage. So if, for example,
you are interested in a highly reliable set of point sources from lasSource
that you know are not moving (e.g. QSOs), then you should filter your selection
by using the pair proximity offset attributes *Xi and *Eta in
your WHERE clause. Examples of this are given in the SQL queries that define
the reliable source "views" in the schema browser.
Issues relating to the use of the ZPN World Co-ordinate System
CASU pipeline processed images from WFCAM come with accurate (good to
around 0.1 arcsec) astrometry based on 2MASS secondary astrometric
standards. The astrometric information is encoded in the FITS file as
World Co-ordinate System (WCS) keywords: the WCS type used is the
zenithal polynomial (ZPN) projection. Note that some older image
display utilities may not compute correctly cursor co-ordinates for
this projection, so if there is any suspicion that this may be the
case, do not rely on the real-time cursor readout from such software.
Information on the fix for ZPN in IRAF
Terapix Software ZPN howto
Compressed image format (CFITSIO RICE compression)
CASU pipeline processed images are supplied in a compact, compressed format
to reduce online storage and network transfer overheads. The format employed
is known as
Rice
tile compression, and is implemented in the CFITSIO library
that is used by the core processing software. By default, images are
delivered from the WSA in this format: network transfer will be a
factor 3 to 4 times quicker, and end-user storage requirements a factor
3 to 4 times smaller, if you download the compressed image files (the
compression ratio depends on the noise properties of the image in question;
these figures are typical for short WFCAM exposures, as in the UKIDSS
Large Area Survey). Hence
it makes a great deal of sense to stick to using Rice compressed
products as opposed to selecting standard decompressed images (which are
also available). The only disadvantage is that the Rice compressd format
is not universally supported in applications software at present; note however that it is easy
to download and install
an image copy utility (imcopy)
that can decompress to standard FITS (the Rice
compressed images are actually stored as FITS binary tables and not
standard FITS images - this is why some image display utilities will not
handle them correctly). On a system with Starlink you can compile imcopy using
gcc -I/star/include -L/star/lib -lm imcopy.c -lcfitsio -o imcopy
The WSA imcopy page gives a step by step guide to
downloading and installation.
CFITSIO Quick Start Guide includes additional informational on
installation.
When trying to view or uncompress our FITS files, in some rare cases you
may come across one or both of the following error messages:
FITSIO status = 1: non-CFITSIO program error
decompression error: hit end of compressed byte stream
This is because our FITS file images have been compressed using a newer
version of fpack & CFITSIO. The only solution is to first uncompress the
FITS file using a version of CFITSIO >= 3.1 (fpack >= 1.0) prior to
viewing.
Querying date/time quantities
All dates and time stamps used in the data flow system (i.e. from the
telescope/instrument, through the pipeline processing and at the archive
end) are universal time co-ordinate (UTC) quantities; the
SQL data type for these quantities in the archive is the MS SQL Server
datetime type. To query a datetime type, e.g. compare a datetime
quantity against a literal constant, simply express that constant as
a string in double quotes in the following format:
SELECT count(*)
FROM Multiframe
WHERE dateObs > "2005-Apr-18 14:00:00.0"
will return a count of the number of frames in the archive that have
observation dates and times since 14:00:00.0 18th April 2005 (UTC)
Querying string quantities
String quantities (SQL data types char or varchar) can be
queried in the WHERE clause of an SQL statement and compared using
certain comparators (see the Cookbook) in the same way as numerical
quantities. Perhaps the most useful comparator, however, is LIKE.
For example, you can select all archived FITS file observations from a
given date string in the filename (say 1st April 2005) by the following
statement input into the freeform SQL box:
SELECT fileName
FROM Multiframe
WHERE fileName LIKE "%20050401%"
Note the use of double quotes for string literals, and the use of the
wildcard percent character to match any string.
Querying cross-matched catalogue data (e.g. WFCAM v. SDSS)
WSA catalogue tables (i.e. those created for the UKIDSS surveys) are
automatically cross-matched to a number of external survey
catalogue datasets held locally, e.g. SDSS Data Releases; FIRST, IRAS, ROSAT
and 2MASS catalogues; and legacy photographic catalogues like USNO-B and
the SuperCOSMOS Sky Surveys held within the
SuperCOSMOS Science Archive (for
a complete list, click on "Browser" on a navigation bar).
Rather than prejoining any two datasets to create a static, merged table of
what are assumed to be associated sources on the basis of a fixed joining
algorithm, the WSA philosophy is to create a table of pointers between
any two matched datasets. This means that any externally catalogued source
co-incident or nearby a WFCAM source is readily available, out to some
predefined maximum angular search radius, and all sorts of science usages
are possible because the exact matching criteria (e.g. maximum allowable
displacement, consistency in morphological parameters or classification,
or even the number of possible external matches for a given WFCAM source)
can be tuned at query time by simple expressions in SQL. Furthermore, all
attributes of both datasets are available to querying, because no decision
has been made as to which attributes to propagate into a static merged set.
The flip side to this flexibility is the rather strange overall syntax for
querying cross-matched data in SQL: instead of querying a single merged
table, e.g. SELECT * FROM MergedTable WHERE ...,
in general you have to query three tables: the two cross-matched
tables and the table of pointers, e.g.
SELECT *
FROM Table1, ExternalDB..Table2, CrossNeighboursTable2
WHERE ...
A complete tutorial, and example queries employing cross-matched data,
are available in Section 4 of the WSA Cookbook.
Coordinates - decimal degrees, sexagesimal, radians:
Coordinates in the WSA are primarily stored as decimal degrees. The region
search page does accept sexagesimal format but the results are returned
as decimal degrees. Elsewhere e.g. entering constraints in the menu query or freeform SQL pages values must be entered as decimal degrees.
Note: RA and Dec values are returned as decimal degrees except when written
to FITS files where radians are used .
Viewing the ingest history:
All ingest and curation events that occur during operation of the WFCAM
Science Archive are logged in database table ArchiveCurationHistory.
If you wish to see, for example, the ingest time stamps and number of images
ingested at those times, use the following free-form SQL query:
SELECT timeStamp, framesIngested
FROM ArchiveCurationHistory as A, (SELECT cuEventID, COUNT(*) AS framesIngested FROM Multiframe GROUP BY cuEventID) AS B
WHERE A.cuEventID=B.cuEventID
ORDER BY timeStamp
Note that you will be querying the ingest history of a static, released
database product, not the database that is accumulating on a day-to-day
basis as a result of routine WSA operations. Hence, for example, data
that has been recently transfered from the pipeline processing centre at
CASU does not appear immediately in the online accessible database.
Obtaining observation dates for catalogue selections
Observation dates are available for each image in table Multiframe
in attribute dateObs. Since every single passband detection originates
in an image and each merged source originates from a set of single
passband detections, and furthermore the relationship between images,
detections and sources is preserved in the relational arrangement of data
in the archive, observation dates for catalogue selections are readily
available via simple table join expressions in SQL. For example, to select the
observation date as an output attribute when querying the lasDetection
table, use the following statement:
SELECT ra,dec,dateObs,...
FROM lasDetection AS d, Multiframe AS m
WHERE d.multiframeID=m.multiframeID
The required SQL is a little more complicated in the case of obtaining the
observation dates for merged sources, since each source is made up from
several detections that have, in general, different observations times (and
possibly dates). For example, consider lasSource: the survey design for the
UKIDSS LAS is that merged sources are made up from YJHK observations
(ignoring the complicating factor of the second epoch J observations for
the sake of simplicity). The table lasMergeLog stores information
about the image that each individual passband detection comes from in the
merged source, and the source table and this merge log are connected via
the frameSetID attribute in both.
Suppose we are interested in the Z and J observation dates
for an LAS source selection consisting of merged sources detected in at least
the J and Y passbands - the following SQL will do the trick:
SELECT s.ra,s.dec,my.dateObs AS yDateObs, mj.dateObs AS jDateObs,...
FROM lasSource AS s, lasMergeLog AS l, Multiframe AS mj, Multiframe AS my
WHERE s.frameSetID=l.frameSetID AND ymfID=my.multiframeID AND j_1mfID=mj.multiframeID AND ymfID > 0 AND j_1mfID > 0
Furthermore,
you can select a sample based on date differences (for example, you may
want to exclude the possibility of variability screwing up colour selection in
a sample by insisting on near-simultaneity of the times of observation in the
different filters) using the MS SQL Server DATEDIFF function, which
has arguments time units, date1 and date2:
SELECT COUNT(*)
FROM lasSource AS s, lasMergeLog AS l, Multiframe AS mj, Multiframe AS my
WHERE s.frameSetID=l.frameSetID AND ymfID=my.multiframeID AND j_1mfID=mj.multiframeID AND ymfID > 0 AND j_1mfID > 0
AND DATEDIFF(mi, mj.dateObs, my.dateObs) BETWEEN -60 AND +60
This query counts the number of merged LAS sources where the Y and J
observations were taken within one hour (60 minutes) of each other.
How to compute the median in SQL:
Microsoft Transact-SQL provides useful extensions to the SQL standard that
enable easy computation of the median value of an attribute: one simply
selects the middle value from an ordered list, generated by a nested
query, of the attribute in question as in the following example which
selects the median ellipticty of all detector frames:
SELECT MAX(T.avStellarEll) FROM
(SELECT TOP 50 PERCENT avStellarEll FROM MultiframeDetector
WHERE avStellarEll > -0.9E9 ORDER BY avStellarEll ASC) AS T
Note the exclusion of any default values of the attribute in the query. The
query can be generalised easily to produce the Nth percentile by
changing the number in "TOP 50 PERCENT".
Connection problems - port 8080 (wireless network):
WSA access is carried on port 8080. Please make
sure your network/firewall has this port open (this was first reported
by a user trying to access the SSA from a wireless network.
Home |
Overview |
Browser |
Access |
Login |
Cookbook |
nonSurvey
Listing |
Region |
MenuQuery |
FreeSQL
Links |
Credits
WFAU, Institute for Astronomy,
Royal Observatory, Blackford Hill
Edinburgh, EH9 3HJ, UK
wsa-support@roe.ac.uk
24/1/2014
|