
The Segment object

rizac edited this page Aug 26, 2021 · 18 revisions

Stream2segment offers a simple interface to interact with the database. There is no need to dig into the complexity of Object-Relational Mapping or SQL syntax: the user just needs to know that:

  • Each waveform segment is exposed as a simple Python object called Segment with a rich set of attributes and methods

  • Many segment attributes can also be used to perform flexible and complex selections (nearly impossible with classical file-system storage), thanks to a simplified syntax based on plain string expressions


Attributes and methods

The list of all Segment attributes and methods can be printed with a few lines of code in your Notebook, as shown below (for the list of methods, scroll to the bottom of the table). The list also includes attributes of the so-called related objects (e.g. segment.event, segment.station, segment.channel). Attributes are "selectable" in that they can be used to perform powerful custom selections (see next section).

from stream2segment.process import get_segment_help
from IPython.display import display, HTML
display(HTML(get_segment_help(format='html')))
Selectable attributes Type and optional description
id int: segment (unique) db id
has_data bool: whether the segment waveform data is not empty, i.e. it has at least 1 byte of data saved. This attribute or has_valid_data is often necessary in segment selection, e.g.:
has_data: 'true'
Empty segments are those for which the server returned no data; they are stored anyway to collect statistics and to let users customize what should be re-downloaded in subsequent attempts
has_valid_data bool: whether the segment waveform data is not empty and could be successfully read as miniSEED during download. Often necessary in segment selection, e.g.:
has_valid_data: 'true'
event_distance_deg float: distance between the segment station and the event, in degrees
event_distance_km float: distance between the segment station and the event, in km, assuming a perfectly spherical earth with a radius of 6371 km
start_time datetime.datetime: waveform start time
arrival_time datetime.datetime: waveform arrival time (value between 'start_time' and 'end_time')
end_time datetime.datetime: waveform end time
request_start datetime.datetime: waveform requested start time
request_end datetime.datetime: waveform requested end time
duration_sec float: waveform data duration, in seconds
missing_data_sec float: number of seconds of missing data, relative to the requested time window. It might also be negative (more data received than requested). Useful in segment selection: e.g., if we requested 5 minutes of data and we want to process segments with at least 4 minutes of downloaded data, then: missing_data_sec: '<= 60'
missing_data_ratio float: portion of missing data, as ratio of the requested time window. It might also be negative (more data received than requested). Useful in segment selection: e.g., to process segments whose time window is at least 90% of the requested one: missing_data_ratio: '<= 0.1'
sample_rate float: waveform sample rate. It might differ from the segment channel sample_rate
maxgap_numsamples float: maximum gap/overlap (G/O) found in the waveform, in number of points. If
0: segment has no G/O
>=1: segment has gaps
<=-1: segment has overlaps.
Values in (-1, 1) are difficult to interpret: a rule of thumb is to consider no G/O if values are between -0.5 and 0.5. Useful in segment selection: e.g., to process segments with no gaps/overlaps:
maxgap_numsamples: '(-0.5, 0.5)'
seed_id str: the seed identifier in the typical format [Network].[Station].[Location].[Channel]. For segments with waveform data, data_seed_id (see below) might be faster to fetch.
data_seed_id str: same as 'segment.seed_id', but faster to get because it reads the value stored in the waveform data. The drawback is that this value is null for segments with no waveform data
classlabels_count int: the number of class labels assigned to this segment
data bytes: the waveform (raw) data. Used by segment.stream()
queryauth bool: if the segment download required authentication (data is restricted)
event.id int
event.event_id str: the id returned by the web service or catalog
event.time datetime.datetime
event.latitude float
event.longitude float
event.depth_km float
event.author str
event.catalog str
event.contributor str
event.contributor_id str
event.mag_type str
event.magnitude float
event.mag_author str
event.event_location_name str
event.event_type str: the event type (e.g. "earthquake")
channel.id int
channel.location str
channel.channel str
channel.depth float
channel.azimuth float
channel.dip float
channel.sensor_description str
channel.scale float
channel.scale_freq float
channel.scale_units str
channel.sample_rate float
channel.band_code str: the first letter of channel.channel
channel.instrument_code str: the second letter of channel.channel
channel.orientation_code str: the third letter of channel.channel
channel.band_instrument_code str: the first two letters of channel.channel
station.id int
station.network str: the station's network code, e.g. 'AZ'
station.station str: the station code, e.g. 'NHZR'
station.netsta_code str: the network + station code, concatenated with the dot, e.g.: 'AZ.NHZR'
station.latitude float
station.longitude float
station.elevation float
station.site_name str
station.start_time datetime.datetime
station.end_time datetime.datetime
station.has_inventory bool: tells if the segment's station inventory has data saved (at least one byte of data). Useful in segment selection. E.g., to process only segments with inventory downloaded:
station.has_inventory: 'true'
station.datacenter object (same as segment.datacenter, see below)
datacenter.id int
datacenter.station_url str
datacenter.dataselect_url str
datacenter.organization_name str
download.id int
download.run_time datetime.datetime
classes.id int: the id(s) of the class labels assigned to the segment
classes.label str: the unique name(s) of the class labels assigned to the segment
classes.description str: the description(s) of the class labels assigned to the segment
Standard attributes or methods Description
stream(reload=False) Return the ObsPy Stream object representing the segment waveform data

Parameter reload: bool. Optional (default: False). Force reloading the Stream
object from the downloaded waveform data (bytes sequence), discarding
any ObsPy in-place operation that might have modified the Stream
inventory(reload=False) Return the inventory of the segment Station as ObsPy Inventory object

Parameter reload: bool. Optional (default: False). Force reloading the Inventory
object from the downloaded station inventory data (bytes sequence). In most cases
you can ignore this parameter, as an Inventory object is usually never modified
but used as a read-only object
url Return the full URL that can be used to (re)download the Segment
waveform data in miniSEED format (For details, see GET request here:
https://www.fdsn.org/webservices/fdsnws-dataselect-1.1.pdf)
sds_path(root='.') Return a string representing the SeisComP data structure
which can be used as path to store the segment miniSEED:

root/EID/Year/NET/STA/CHAN.D/NET.STA.LOC.CHAN.TYPE.YEAR.DAY

where root is the optional argument, EID is the database unique id of
the event (integer), and all other fields are defined here:
https://www.seiscomp.de/seiscomp3/doc/applications/slarchive/SDS.html

Parameter root: Optional (defaults to '.' when missing). The root path of this
segment file (first argument of os.path.join)
dbsession Return the database session to which this object is attached. Use with care:
the session is for advanced users who need full freedom to interact with
the database.
For an introduction, see: https://docs.sqlalchemy.org/en/latest/orm/session.html
add_classlabel(*class_ids_or_labels, commit=True, empty_first=False, annotator=None) Add class label(s) to this segment

Parameter class_ids_or_labels: variable-length argument of the unique IDs
(int) or labels (str) of the classes to be added to this segment.
Classes already assigned to this segment will be ignored,
as well as IDs or labels not matching any database class

Parameter commit: boolean (default: True) denoting if any change
should be saved to the database (flush pending changes and commit
the current transaction).
Advanced users can set this parameter to False to manage the
transaction manually and call segment.dbsession.commit()
when needed

Parameter annotator: (str, default: None). The annotator assigning the labelling.
A None annotator should mean that the label assignment is the result of a
classifier prediction and not human inspection: providing an annotator
(not None) will set the is_hand_labelled property of the Class labelling
to True

Parameter empty_first: boolean (default False) telling if all existing class
labels associated to this segment should be removed first, before
adding new class labels

Raises sqlalchemy.exc.SQLAlchemyError if a commit error occurs.
For info see:
https://docs.sqlalchemy.org/en/latest/orm/session_basics.html
del_classlabel(*class_ids_or_labels, commit=True) Delete class labels previously associated to this segment

Parameter class_ids_or_labels: variable-length argument of the unique IDs
(int) or labels (str) of the classes to be removed from this
segment. When NO IDs or labels are provided, ALL class labels
associated with this segment will be deleted, e.g.: segment.del_classlabel()

Parameter commit: boolean (default: True) denoting if any change
should be saved to the database (flush pending changes and commit
the current transaction).
Advanced users can set this parameter to False to manage the
transaction manually and call segment.dbsession.commit()
when needed

Raises sqlalchemy.exc.SQLAlchemyError if a commit error occurs.
For info see:
https://docs.sqlalchemy.org/en/latest/orm/session_basics.html
siblings(*matching_attributes, include_self=False) Return an iterable of all Segment objects that are equal to this
segment in the given matching attribute(s). By default, these
are the segments recording the same event on the other channel
components / orientations.

E.g., given a segment object seg, seg.siblings() is equivalent to:

seg.siblings(
    'event.id',
    'station.id',
    'channel.location',
    'channel.band_instrument_code'
)
(channel.band_instrument_code is the channel code without the last
letter denoting the channel orientation).

In general, any segment selectable attribute can be given. For instance,
to get all segments from the same event, channel, station or network:
seg.siblings('event.id')
seg.siblings('channel.id')
seg.siblings('station.id')
seg.siblings('station.network')
Note that a station closed and reopened with a different start time is
not considered the same. To get all segments of the same station
identified only by its network and station code, you have two options:
seg.siblings('station.network', 'station.station')
seg.siblings('station.netsta_code')
If multiple attributes are given, they will be concatenated with a
logical "and", e.g., to get all segments from the same seismic event
and recorded by the same instrument (e.g., if seg is an accelerometer,
yield all accelerometers recording the same event as seg):
seg.siblings('channel.instrument_code', 'event.id')
Note: The returned iterable is technically a SQLAlchemy Query object
that can be customized by advanced users (for further details, see
https://docs.sqlalchemy.org/en/latest/orm/tutorial.html#querying).
For instance, in case of huge collections, consider loading only the
desired attributes, e.g.:
from stream2segment.process import Segment
from sqlalchemy.orm import load_only
# load only the id column of each sibling (faster for huge collections)
for sibling in seg.siblings('event.id').options(load_only(Segment.id)):
    seg_id = sibling.id
    ...
Parameter matching_attributes: variable-length argument of strings denoting
the attributes used to find a match: a segment will be yielded if
it equals this segment in all matching attributes.
Attributes of related objects should be typed with the dot as
separator, e.g. 'station.id', 'event.magnitude'.
When empty, the matching attributes will default to the tuple:
'station.id', 'channel.location', 'channel.band_instrument_code',
'event.id'.

Parameter include_self: boolean (default: False). Whether to include this
segment among the yielded siblings
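Several selectable attributes in the table above are derived from other quantities. The sketch below, written for this page and not taken from stream2segment, shows how some of them relate under stated assumptions: event_distance_km from event_distance_deg on a sphere of radius 6371 km, the missing-data values from the requested and downloaded durations (these formulas are an assumption consistent with the descriptions, not the library's code), the channel sub-codes from the channel code, and the rule of thumb for maxgap_numsamples. All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-earth radius given for event_distance_km


def deg2km(distance_deg):
    """Great-circle distance in degrees -> km on a sphere of radius 6371 km."""
    return math.radians(distance_deg) * EARTH_RADIUS_KM


def missing_data(requested_sec, duration_sec):
    """Hypothetical: return (missing_data_sec, missing_data_ratio).

    Assumes both are computed from the requested window length and the
    downloaded duration; negative values mean more data than requested.
    """
    missing_sec = requested_sec - duration_sec
    return missing_sec, missing_sec / requested_sec


def channel_subcodes(channel):
    """Split a 3-letter channel code (e.g. 'HHZ') as described in the table."""
    return {
        "band_code": channel[0],
        "instrument_code": channel[1],
        "orientation_code": channel[2],
        "band_instrument_code": channel[:2],
    }


def has_no_gaps_overlaps(maxgap_numsamples):
    """Rule of thumb: values in (-0.5, 0.5) mean no gaps/overlaps."""
    return -0.5 < maxgap_numsamples < 0.5


# 5 minutes requested, 4 minutes received: 60 s missing, ratio 0.2
print(missing_data(300.0, 240.0))  # (60.0, 0.2)
print(round(deg2km(1.0), 2))       # 111.19
print(channel_subcodes("HHZ"))
print(has_no_gaps_overlaps(0.3))   # True
```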

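The matching rule of siblings and the path layout of sds_path can also be illustrated without a database. In the following sketch, segments are plain dicts keyed by dotted attribute names (a made-up layout with made-up values); siblings yields the segments equal to the reference one in all matching attributes, and sds_path builds the layout root/EID/Year/NET/STA/CHAN.D/NET.STA.LOC.CHAN.TYPE.YEAR.DAY, assuming that TYPE is 'D' and that year and day of year are taken from the segment start time. Both functions are hypothetical stand-ins: the real siblings method returns a SQLAlchemy query, and the real sds_path may differ in details.

```python
import os
from datetime import datetime

# Default matching attributes of siblings(), as listed in the table
DEFAULT_MATCH = ("station.id", "channel.location",
                 "channel.band_instrument_code", "event.id")


def siblings(seg, all_segments, *matching_attributes, include_self=False):
    """Hypothetical in-memory equivalent of Segment.siblings()."""
    attrs = matching_attributes or DEFAULT_MATCH
    for other in all_segments:
        if not include_self and other["id"] == seg["id"]:
            continue
        # logical "and": every matching attribute must be equal
        if all(other[a] == seg[a] for a in attrs):
            yield other


def sds_path(seg, root="."):
    """Hypothetical re-creation of the sds_path layout (TYPE assumed 'D')."""
    net, sta, loc, cha = seg["seed_id"].split(".")
    t = seg["start_time"]
    year, day = f"{t.year:04d}", f"{t.timetuple().tm_yday:03d}"
    filename = ".".join([net, sta, loc, cha, "D", year, day])
    return os.path.join(root, str(seg["event.id"]), year, net, sta,
                        cha + ".D", filename)


# Made-up segments: 1 and 2 are two components of the same HH sensor
segs = [
    {"id": 1, "event.id": 7, "station.id": 3, "channel.location": "",
     "channel.band_instrument_code": "HH", "seed_id": "AZ.NHZR..HHZ",
     "start_time": datetime(2021, 2, 1, 10, 30)},
    {"id": 2, "event.id": 7, "station.id": 3, "channel.location": "",
     "channel.band_instrument_code": "HH", "seed_id": "AZ.NHZR..HHN",
     "start_time": datetime(2021, 2, 1, 10, 30)},
    {"id": 3, "event.id": 7, "station.id": 5, "channel.location": "",
     "channel.band_instrument_code": "HN", "seed_id": "AZ.BZN..HNZ",
     "start_time": datetime(2021, 2, 1, 10, 31)},
]
print([s["id"] for s in siblings(segs[0], segs)])              # [2]
print([s["id"] for s in siblings(segs[0], segs, "event.id")])  # [2, 3]
print([s["id"] for s in siblings(segs[0], segs, include_self=True)])  # [1, 2]
print(sds_path(segs[0]))  # ./7/2021/AZ/NHZR/HHZ.D/AZ.NHZR..HHZ.D.2021.032
```

The expected path in the last comment assumes a POSIX system, where os.path.join uses "/" as separator.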
Segments selection

The selection of suitable segments for processing (e.g., in the functions imap or process, or in the YAML configuration file of the command s2s process) is performed by creating a dict that maps one or more Segment attributes to a selection expression for that attribute:

Syntax

segments_selection: {
  "[attribute]": "[expression]",
  "[attribute]": "[expression]",
  ...
}

Or, in YAML syntax (if you are implementing your own processing config):

segments_selection: 
  [attribute]: "[expression]"
  [attribute]: "[expression]"
  ...

(the variable name segments_selection is mandatory only in a YAML configuration file to be passed to the command s2s process ...)
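For instance, several of the example expressions given below can be collected in a single Python dict. The hypothetical helper in this sketch (not part of stream2segment) prints the same dict in the YAML layout shown above, relying on the fact that JSON double-quoted strings are also valid YAML double-quoted scalars for simple values like these.

```python
import json

segments_selection = {
    "has_valid_data": "true",
    "maxgap_numsamples": "(-0.5, 0.5)",
    "event.magnitude": ">4.2",
    "channel.sensor_description": "'GURALP CMG-40T-30S'",
}


def to_yaml(name, selection):
    """Hypothetical helper: render a flat selection dict as YAML text."""
    lines = [f"{name}:"]
    for attr, expr in selection.items():
        # json.dumps wraps the expression in double quotes, escaping if needed
        lines.append(f"  {attr}: {json.dumps(expr)}")
    return "\n".join(lines)


print(to_yaml("segments_selection", segments_selection))
```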

Examples:

  1. Not all segments in the database have waveform data. This might happen for several reasons (server error, connection error and so on). As such, in most cases you will probably want to work with segments with waveform data ("has_data": "true") or, to be even more strict, with valid data:
{"has_valid_data": "true"}
  2. To select and work on segments of stations activated in 2017 only:
{"station.start_time": "[2017-01-01, 2018-01-01T00:00:00)"}

(brackets denote intervals: square brackets include endpoints, round brackets exclude them)

  3. To select segments with specific ids, e.g. 1, 4, 342, 67 (e.g., the ids of segments that raised errors during a previous run, which were logged and might need inspection in the GUI):
{"id": "1 4 342 67"}
  4. To select segments whose event magnitude is greater than 4.2:
{"event.magnitude": ">4.2"}

(the operators =, >=, <=, <, != work the same way)

  5. To select segments with a particular channel sensor description:
{"channel.sensor_description": "'GURALP CMG-40T-30S'"}

(note: for string values containing spaces, the expression must be quoted twice, as otherwise "GURALP CMG-40T-30S" would select segments whose value is 'GURALP' or 'CMG-40T-30S', but not the whole string. See the list of selectable attributes)
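The expression syntax used in the examples above (booleans, comparison operators, intervals, space-separated values and quoted strings) can be illustrated with a toy evaluator that checks a single in-memory value instead of querying the database. This is a simplified sketch written for this page, not stream2segment's parser: it assumes the attribute value is already a Python bool, number, datetime or str, and converts each token of the expression accordingly (numbers are compared as floats).

```python
import operator
import re
import shlex
from datetime import datetime

# Two-character operators must be tried before their one-character prefixes
OPERATORS = {"<=": operator.le, ">=": operator.ge, "!=": operator.ne,
             "<": operator.lt, ">": operator.gt, "=": operator.eq}
INTERVAL = re.compile(r"([\[(])\s*(.+?)\s*,\s*(.+?)\s*([\])])")


def _convert(token, like):
    """Convert an expression token to the type of the attribute value."""
    if isinstance(like, bool):  # check bool first: bool is a subclass of int
        return token.lower() == "true"
    if isinstance(like, datetime):
        return datetime.fromisoformat(token)
    if isinstance(like, (int, float)):
        return float(token)
    return token


def matches(value, expression):
    """Toy check of one value against a selection expression."""
    expr = expression.strip()
    m = INTERVAL.fullmatch(expr)
    if m:  # interval: '[' / ']' include endpoints, '(' / ')' exclude them
        lo, hi = _convert(m[2], value), _convert(m[3], value)
        above = value >= lo if m[1] == "[" else value > lo
        below = value <= hi if m[4] == "]" else value < hi
        return above and below
    for symbol, compare in OPERATORS.items():
        if expr.startswith(symbol):
            return compare(value, _convert(expr[len(symbol):].strip(), value))
    # otherwise: space-separated values (quotes keep spaces), match any
    return any(value == _convert(tok, value) for tok in shlex.split(expr))


print(matches(True, "true"))                # True
print(matches(4.5, ">4.2"))                 # True
print(matches(342, "1 4 342 67"))           # True
print(matches(0.7, "(-0.5, 0.5)"))          # False
print(matches(datetime(2017, 6, 1), "[2017-01-01, 2018-01-01T00:00:00)"))  # True
print(matches("GURALP CMG-40T-30S", "'GURALP CMG-40T-30S'"))  # True
print(matches("GURALP CMG-40T-30S", "GURALP CMG-40T-30S"))    # False
```

The last two calls show why strings containing spaces must be quoted twice: without the inner quotes, the expression is split into two separate values, neither of which equals the whole string.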
