The Segment object

Stream2segment offers a simple interface to interact with the database. Without digging into the complexity of Object-Relational Mapping or SQL syntax, the user just needs to know that:

Each waveform segment is exposed as a simple Python object called Segment with a rich set of attributes and methods
Many segment attributes can also be used to perform flexible and complex selections (nearly impossible with classical file system storage), thanks to a simplified syntax based on simple string expressions

Attributes and methods

The list of all Segment attributes and methods can be printed with a few lines of code in your Notebook, as shown below (for the list of methods, scroll to the bottom of the table). The list also includes attributes of the so-called related objects (e.g. segment.event, segment.station, segment.channel). Attributes are "selectable" in that they can be used to perform powerful custom selections (see next section).

from stream2segment.process import get_segment_help
from IPython.display import display, HTML
display(HTML(get_segment_help(format='html')))

Selectable attributes	Type and optional description
id	int: segment (unique) db id
has_data	bool: if the segment waveform data is not empty, i.e. it has at least 1 byte of data saved. This parameter or `has_valid_data` is often necessary in segment selection, e.g.: has_data: 'true'. Empty segments are those for which the server did not return any data: they are stored anyway to collect statistics and to let users customize what should be re-downloaded in further attempts
has_valid_data	bool: if the segment waveform data is not empty and could be successfully read as miniSEED during download. Often necessary in segment selection, e.g.: has_valid_data: 'true'
event_distance_deg	float: distance between the segment station and the event, in degrees
event_distance_km	float: distance between the segment station and the event, in km, assuming a perfectly spherical earth with a radius of 6371 km
start_time	datetime.datetime: waveform start time
arrival_time	datetime.datetime: waveform arrival time (value between 'start_time' and 'end_time')
end_time	datetime.datetime: waveform end time
request_start	datetime.datetime: waveform requested start time
request_end	datetime.datetime: waveform requested end time
duration_sec	float: waveform data duration, in seconds
missing_data_sec	float: number of seconds of missing data, with respect to the requested time window. It might also be negative (more data received than requested). Useful in segment selection: e.g., if we requested 5 minutes of data and we want to process segments with at least 4 minutes of downloaded data, then: missing_data_sec: '< 60'
missing_data_ratio	float: portion of missing data, as ratio of the requested time window. It might also be negative (more data received than requested). Useful in segment selection: e.g., to process segments whose time window is at least 90% of the requested one: missing_data_ratio: '< 0.1'
sample_rate	float: waveform sample rate. It might differ from the segment channel sample_rate
maxgap_numsamples	float: maximum gap/overlap (G/O) found in the waveform, in number of points. 0: the segment has no G/O; >= 1: the segment has gaps; <= -1: the segment has overlaps. Values in (-1, 1) are difficult to interpret: a rule of thumb is to consider no G/O if values are within -0.5 and 0.5. Useful in segment selection: e.g., to process segments with no gaps/overlaps: maxgap_numsamples: '(-0.5, 0.5)'
seed_id	str: the seed identifier in the typical format [Network].[Station].[Location].[Channel]. For segments with waveform data, `data_seed_id` (see below) might be faster to fetch.
data_seed_id	str: same as 'segment.seed_id', but faster to get because it reads the value stored in the waveform data. The drawback is that this value is null for segments with no waveform data
classlabels_count	int: the number of class labels assigned to this segment
data	bytes: the waveform (raw) data. Used by `segment.stream()`
queryauth	bool: if the segment download required authentication (data is restricted)
event.id	int
event.event_id	str: the id returned by the web service or catalog
event.time	datetime.datetime
event.latitude	float
event.longitude	float
event.depth_km	float
event.author	str
event.catalog	str
event.contributor	str
event.contributor_id	str
event.mag_type	str
event.magnitude	float
event.mag_author	str
event.event_location_name	str
event.event_type	str: the event type (e.g. "earthquake")
channel.id	int
channel.location	str
channel.channel	str
channel.depth	float
channel.azimuth	float
channel.dip	float
channel.sensor_description	str
channel.scale	float
channel.scale_freq	float
channel.scale_units	str
channel.sample_rate	float
channel.band_code	str: the first letter of channel.channel
channel.instrument_code	str: the second letter of channel.channel
channel.orientation_code	str: the third letter of channel.channel
channel.band_instrument_code	str: the first two letters of channel.channel
station.id	int
station.network	str: the station's network code, e.g. 'AZ'
station.station	str: the station code, e.g. 'NHZR'
station.netsta_code	str: the network + station code, concatenated with the dot, e.g.: 'AZ.NHZR'
station.latitude	float
station.longitude	float
station.elevation	float
station.site_name	str
station.start_time	datetime.datetime
station.end_time	datetime.datetime
station.has_inventory	bool: tells if the segment's station inventory has data saved (at least one byte of data). Useful in segment selection. E.g., to process only segments with inventory downloaded: station.has_inventory: 'true'
station.datacenter	object (same as segment.datacenter, see below)
datacenter.id	int
datacenter.station_url	str
datacenter.dataselect_url	str
datacenter.organization_name	str
download.id	int
download.run_time	datetime.datetime
classes.id	int: the id(s) of the class labels assigned to the segment
classes.label	str: the unique name(s) of the class labels assigned to the segment
classes.description	str: the description(s) of the class labels assigned to the segment
Standard attributes or methods	Description
stream(reload=False)	Return the ObsPy Stream object representing the segment waveform data Parameter reload: bool. Optional (default: False). Force reloading the Stream object from the downloaded waveform data (bytes sequence), discarding any ObsPy in-place operation that might have modified the Stream
inventory(reload=False)	Return the inventory of the segment Station as ObsPy Response object Parameter reload: bool. Optional (default: False). Force reloading the Response object from the downloaded waveform data (bytes sequence). In most cases you can ignore this parameter as a Response object is usually never modified but used as read-only object
url	Return the full URL that can be used to (re)download the Segment waveform data in miniSEED format (For details, see GET request here: https://www.fdsn.org/webservices/fdsnws-dataselect-1.1.pdf)
sds_path(root='.')	Return a string representing the SeisComP data structure which can be used as path to store the segment miniSEED: `root/EID/Year/NET/STA/CHAN.D/NET.STA.LOC.CHAN.TYPE.YEAR.DAY` where `root` is the optional argument, EID is the database unique id of the event (integer), and all other fields are defined here: https://www.seiscomp.de/seiscomp3/doc/applications/slarchive/SDS.html Parameter root: Optional (defaults to '.' when missing). The root path of this segment file (first argument of `os.path.join`)
dbsession	Return the database session to which this object is attached. Use with care: the session is for advanced users who need full freedom to interact with the database. For an introduction, see: https://docs.sqlalchemy.org/en/latest/orm/session.html
add_classlabel(*class_ids_or_labels, commit=True, empty_first=False, annotator=None)	Add class label(s) to this segment. Parameter class_ids_or_labels: variable-length argument of the unique IDs (int) or labels (str) of the classes to be added to this segment. Classes already assigned to this segment will be ignored, as well as IDs or labels not matching any database class. Parameter commit: boolean (default: True) denoting if any change should be saved to the database (flush pending changes and commit the current transaction). Advanced users can set this parameter to False to manage the transaction manually and call `segment.dbsession.commit()` later, when needed. Parameter annotator: (str, default: None). The annotator assigning the labelling. A None annotator should mean that the label assignment is the result of a classifier prediction and not human inspection: providing an annotator (not None) will set the `is_hand_labelled` property of the Class labelling to True. Parameter empty_first: boolean (default False) telling if all existing class labels associated to this segment should be removed first, before adding new class labels. Raises :class:`sqlalchemy.exc.SQLAlchemyError` if a commit error occurs. For info see: https://docs.sqlalchemy.org/en/latest/orm/session_basics.html
del_classlabel(*class_ids_or_labels, commit=True)	Delete class labels previously associated to this segment. Parameter class_ids_or_labels: variable-length argument of the unique IDs (int) or labels (str) of the classes to be removed from this segment. When NO ids or labels are provided, ALL class labels associated to this segment will be deleted. E.g.: `segment.del_classlabel()`. Parameter commit: boolean (default: True) denoting if any change should be saved to the database (flush pending changes and commit the current transaction). Advanced users can set this parameter to False to manage the transaction manually and call `segment.dbsession.commit()` later, when needed. Raises :class:`sqlalchemy.exc.SQLAlchemyError` if a commit error occurs. For info see: https://docs.sqlalchemy.org/en/latest/orm/session_basics.html
siblings(*matching_attributes, include_self=False)	Return an iterable of all Segment objects that are equal to this segment in the given matching attribute(s). By default, these are the segments recording the same event on the other channel components / orientations. E.g., given a segment object `seg`, `seg.siblings()` is equivalent to: `seg.siblings('event.id', 'station.id', 'channel.location', 'channel.band_instrument_code')` (`channel.band_instrument_code` is the channel code without the last letter denoting the channel orientation). In general, any segment selectable attribute can be given. For instance, to get all segments from the same event, channel, station or network: `seg.siblings('event.id') seg.siblings('channel.id') seg.siblings('station.id') seg.siblings('station.network')` Note that a station closed and reopened with a different start time is not considered the same. To get all segments of the same station identified only by its network and station code, you have two options: `seg.siblings('station.network', 'station.station') seg.siblings('station.netsta_code')` If multiple attributes are given, they will be concatenated with a logical "and", e.g., to get all segments from the same seismic event and recorded by the same instrument (e.g., if `seg` is an accelerometer, yield all accelerometers recording the same event of `seg`): `seg.siblings('channel.instrument_code', 'event.id')` Note: The returned iterable is technically a SQLAlchemy Query object that can be customized by advanced users (for further details, see https://docs.sqlalchemy.org/en/latest/orm/tutorial.html#querying). 
For instance, in case of huge collections, consider loading only the desired attributes, e.g.: `from stream2segment.process import Segment from sqlalchemy.orm import load_only for sibling in seg.siblings('event.id').options(load_only(Segment.id)): seg_id = sibling.id ...` Parameter matching_attributes: variable-length argument of strings denoting the attributes used to find a match: a segment will be yielded if it equals this segment in all matching attributes. Attributes of related objects should be typed with the dot as separator, e.g. 'station.id', 'event.magnitude'. When empty, the matching attributes will default to the tuple: 'station.id', 'channel.location', 'channel.band_instrument_code', 'event.id'. Parameter include_self: boolean (default: False). Whether to include this segment among the yielded siblings
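
The path layout documented for `sds_path` in the table above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `sds_path_sketch`, its arguments, the sample values (event id 12, channel 'HHZ') and the choice of the data type 'D' are assumptions, as is the zero-padded day of the year used for DAY.

```python
import os
from datetime import datetime


def sds_path_sketch(root, event_db_id, net, sta, loc, cha, time, data_type="D"):
    """Build root/EID/Year/NET/STA/CHAN.D/NET.STA.LOC.CHAN.TYPE.YEAR.DAY.

    Hypothetical helper written for illustration, not part of Stream2segment.
    """
    year = str(time.year)
    # Assumption: DAY is the day of the year (1-366), zero-padded to 3 digits
    day = f"{time.timetuple().tm_yday:03d}"
    filename = ".".join((net, sta, loc, cha, data_type, year, day))
    return os.path.join(root, str(event_db_id), year, net, sta,
                        f"{cha}.{data_type}", filename)


# Sample values: an empty location code yields two consecutive dots
path = sds_path_sketch(".", 12, "AZ", "NHZR", "", "HHZ", datetime(2017, 2, 1))
print(path)
```

With a real segment, `segment.sds_path(root)` returns the string directly, so a helper like this is only useful to understand how the path is composed.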

Segments selection

The selection of suitable segments for processing (e.g., functions imap or process, or YAML configuration file for the command s2s process) is performed by creating a dict mapping one or more Segment attributes to a selection expression for that attribute:

Syntax

segments_selection: {
  "[attribute]": "[expression]",
  "[attribute]": "[expression]",
  ...
}

Or, in YAML syntax (if you are implementing your own processing config):

segments_selection: 
  [attribute]: "[expression]"
  [attribute]: "[expression]"
  ...

(the variable name segments_selection is mandatory only in a YAML configuration file to be passed to the command s2s process ...)

Examples:

Not all segments in the database have waveform data. This might happen for several reasons (server error, connection error and so on). As such, in most cases you will want to work with segments with waveform data ("has_data": "true") or, to be even more strict, with valid data:

{"has_valid_data": "true"}

To select and work on segments of stations activated in 2017 only:

{"station.start_time": "[2017-01-01, 2018-01-01T00:00:00)"}

(brackets denote intervals. Square brackets include end-points, round brackets exclude endpoints)
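
The end-point convention can be made concrete with a small, self-contained sketch. It does not reproduce how Stream2segment parses expressions (the actual selection is performed on the database): the functions `parse_interval` and `in_interval` are hypothetical and only handle intervals of ISO-formatted date-times.

```python
from datetime import datetime


def parse_interval(expr):
    """Split an interval expression into (lower, upper, incl_lower, incl_upper)."""
    expr = expr.strip()
    incl_lower = expr[0] == "["   # "(" means the lower end-point is excluded
    incl_upper = expr[-1] == "]"  # ")" means the upper end-point is excluded
    lower, upper = (datetime.fromisoformat(part.strip())
                    for part in expr[1:-1].split(","))
    return lower, upper, incl_lower, incl_upper


def in_interval(value, expr):
    """Tell if `value` falls in the interval denoted by `expr`."""
    lower, upper, incl_lower, incl_upper = parse_interval(expr)
    above = value >= lower if incl_lower else value > lower
    below = value <= upper if incl_upper else value < upper
    return above and below


expr = "[2017-01-01, 2018-01-01T00:00:00)"
print(in_interval(datetime(2017, 1, 1), expr))  # lower end-point: included
print(in_interval(datetime(2018, 1, 1), expr))  # upper end-point: excluded
```

A station activated exactly at midnight of 2017-01-01 is therefore selected, whereas one activated exactly at midnight of 2018-01-01 is not.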

To select segments from specified ids, e.g. 1, 4, 342, 67 (e.g., ids which raised errors during a previous run, were logged, and might need inspection in the GUI):

{"id": "1 4 342 67"}

To select segments whose event magnitude is greater than 4.2:

{"event.magnitude": ">4.2"}

(the operators =, >=, <=, <, != work the same way)
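
As a rough illustration of what such comparison expressions mean, the sketch below evaluates them against a plain number. It is an assumption-laden stand-in, not the library's code: the function `matches` is hypothetical, handles numeric operands only, and runs in memory instead of on the database.

```python
import operator
import re

# The comparison operators listed above, mapped to their Python equivalents
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(value, expr):
    """Tell if the number `value` satisfies an expression such as '>4.2'."""
    # Two-character operators come first, so that '>=' is not read as '>'
    match = re.fullmatch(r"\s*(>=|<=|!=|=|>|<)\s*(\S+)\s*", expr)
    if match is None:
        raise ValueError(f"Unsupported expression: {expr!r}")
    symbol, operand = match.groups()
    return OPERATORS[symbol](value, float(operand))


print(matches(5.0, ">4.2"))   # magnitude 5.0 is selected
print(matches(4.2, ">4.2"))   # magnitude 4.2 is not
print(matches(4.2, ">=4.2"))  # but it is with '>='
```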

To select segments with a particular channel sensor description:

{"channel.sensor_description": "'GURALP CMG-40T-30S'"}

(note: for attributes with str values and spaces, we need to quote twice, as otherwise "GURALP CMG-40T-30S" would match 'GURALP' and 'CMG-40T-30S', but not the whole string. See list of selectable attributes)
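
The effect of the inner quotes can be illustrated by analogy with Python's standard `shlex` module, which also splits a string on spaces unless a portion is quoted. This is only an analogy: how Stream2segment tokenizes expressions internally is not shown on this page.

```python
import shlex

# Without inner quotes, the space separates two distinct values
unquoted = shlex.split("GURALP CMG-40T-30S")
print(unquoted)

# With inner quotes, the whole sensor description is a single value
quoted = shlex.split("'GURALP CMG-40T-30S'")
print(quoted)

# The same splitting is what makes "1 4 342 67" a list of four ids
ids = shlex.split("1 4 342 67")
print(ids)
```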

This program is released under the GNU GENERAL PUBLIC LICENSE Version 3

Citations:

Research article: Riccardo Zaccarelli, Dino Bindi, Angelo Strollo, Javier Quinteros and Fabrice Cotton. Stream2segment: An Open‐Source Tool for Downloading, Processing, and Visualizing Massive Event‐Based Seismic Waveform Datasets. Seismological Research Letters (2019) https://doi.org/10.1785/0220180314
Software: Zaccarelli, Riccardo (2018): Stream2segment: a tool to download, process and visualize event-based seismic waveform data. V. 2.7.3. GFZ Data Services. http://doi.org/10.5880/GFZ.2.4.2019.002
