The Segment object
Stream2segment offers a simple interface to interact with the database. There is no need to dig into the complexity of Object-Relational Mapping or SQL syntax. The user just needs to know that:
- Each waveform segment is exposed as a simple Python object called Segment with a rich set of attributes and methods
- Many segment attributes can also be used to perform flexible and complex selections (nearly impossible with a classical file system storage), thanks to a simplified syntax based on simple string expressions
The list of all Segment attributes and methods can be printed with a few lines of code in your Notebook as shown below (for a list of methods, scroll to the bottom of the table). The list also includes attributes of the so-called related objects (e.g. segment.event, segment.station, segment.channel). Attributes are "selectable" in that they can be used to perform powerful custom selections (see next section).
from stream2segment.process import get_segment_help
from IPython.display import display, HTML
display(HTML(get_segment_help(format='html')))
Selectable attributes | Type and optional description |
id | int: segment (unique) db id |
has_data | bool: if the segment waveform data is not empty, i.e. it has at least 1 byte of data saved. This parameter or has_valid_data are often necessary in segment selection, e.g.: has_data: 'true'. Empty segments are those for which the server did not return any data; they are stored anyway to collect statistics and to customize what should be re-downloaded in further attempts |
has_valid_data | bool: if the segment waveform data is not empty and could be successfully read as miniSEED during download. Often necessary in segment selection, e.g.: has_valid_data: 'true' |
event_distance_deg | float: distance between the segment station and the event, in degrees |
event_distance_km | float: distance between the segment station and the event, in km, assuming a perfectly spherical earth with a radius of 6371 km |
start_time | datetime.datetime: waveform start time |
arrival_time | datetime.datetime: waveform arrival time (value between 'start_time' and 'end_time') |
end_time | datetime.datetime: waveform end time |
request_start | datetime.datetime: waveform requested start time |
request_end | datetime.datetime: waveform requested end time |
duration_sec | float: waveform data duration, in seconds |
missing_data_sec | float: number of seconds of missing data, with respect to the requested time window. It might also be negative (more data received than requested). Useful in segment selection: e.g., if we requested 5 minutes of data and we want to process segments with at least 4 minutes of downloaded data, then: missing_data_sec: '< 60' |
missing_data_ratio | float: portion of missing data, as ratio of the requested time window. It might also be negative (more data received than requested). Useful in segment selection: e.g., to process segments whose time window is at least 90% of the requested one: missing_data_ratio: '< 0.1' |
sample_rate | float: waveform sample rate. It might differ from the segment channel sample_rate |
maxgap_numsamples | float: maximum gap/overlap (G/O) found in the waveform, in number of points. If 0: segment has no G/O; if >=1: segment has gaps; if <=-1: segment has overlaps. Values in (-1, 1) are difficult to interpret: a rule of thumb is to consider no G/O if values are within -0.5 and 0.5. Useful in segment selection: e.g., to process segments with no gaps/overlaps: maxgap_numsamples: '(-0.5, 0.5)' |
seed_id | str: the seed identifier in the typical format [Network].[Station].[Location].[Channel]. For segments with waveform data, data_seed_id (see below) might be faster to fetch. |
data_seed_id | str: same as 'segment.seed_id', but faster to get because it reads the value stored in the waveform data. The drawback is that this value is null for segments with no waveform data |
classlabels_count | int: the number of class labels assigned to this segment |
data | bytes: the waveform (raw) data. Used by segment.stream() |
queryauth | bool: if the segment download required authentication (data is restricted) |
event.id | int |
event.event_id | str: the id returned by the web service or catalog |
event.time | datetime.datetime |
event.latitude | float |
event.longitude | float |
event.depth_km | float |
event.author | str |
event.catalog | str |
event.contributor | str |
event.contributor_id | str |
event.mag_type | str |
event.magnitude | float |
event.mag_author | str |
event.event_location_name | str |
event.event_type | str: the event type (e.g. "earthquake") |
channel.id | int |
channel.location | str |
channel.channel | str |
channel.depth | float |
channel.azimuth | float |
channel.dip | float |
channel.sensor_description | str |
channel.scale | float |
channel.scale_freq | float |
channel.scale_units | str |
channel.sample_rate | float |
channel.band_code | str: the first letter of channel.channel |
channel.instrument_code | str: the second letter of channel.channel |
channel.orientation_code | str: the third letter of channel.channel |
channel.band_instrument_code | str: the first two letters of channel.channel |
station.id | int |
station.network | str: the station's network code, e.g. 'AZ' |
station.station | str: the station code, e.g. 'NHZR' |
station.netsta_code | str: the network + station code, concatenated with the dot, e.g.: 'AZ.NHZR' |
station.latitude | float |
station.longitude | float |
station.elevation | float |
station.site_name | str |
station.start_time | datetime.datetime |
station.end_time | datetime.datetime |
station.has_inventory | bool: tells if the segment's station inventory has data saved (at least one byte of data). Useful in segment selection. E.g., to process only segments with inventory downloaded: station.has_inventory: 'true' |
station.datacenter | object (same as segment.datacenter, see below) |
datacenter.id | int |
datacenter.station_url | str |
datacenter.dataselect_url | str |
datacenter.organization_name | str |
download.id | int |
download.run_time | datetime.datetime |
classes.id | int: the id(s) of the class labels assigned to the segment |
classes.label | str: the unique name(s) of the class labels assigned to the segment |
classes.description | str: the description(s) of the class labels assigned to the segment |
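The data-quality attributes above (missing_data_sec, missing_data_ratio, maxgap_numsamples) can be pictured with a small stand-alone sketch. It does not use Stream2segment: the function names are hypothetical, and the formulas are an assumption inferred from the table descriptions (missing seconds = requested duration minus received duration; ratio = missing seconds divided by requested duration), not taken from the library source.

```python
from datetime import datetime


def missing_data(request_start, request_end, start_time, end_time):
    """Return (missing_data_sec, missing_data_ratio) for one time window.

    Assumed definitions (inferred from the attribute table, not from
    the library source): the missing seconds are the requested duration
    minus the received one, and the ratio divides that by the requested
    duration. Both are negative when more data than requested arrived.
    """
    requested = (request_end - request_start).total_seconds()
    received = (end_time - start_time).total_seconds()
    missing_sec = requested - received
    return missing_sec, missing_sec / requested


def gap_status(maxgap_numsamples, tolerance=0.5):
    """Apply the rule of thumb from the table: values strictly inside
    (-tolerance, tolerance) are treated as "no gaps/overlaps"."""
    if maxgap_numsamples >= tolerance:
        return "gaps"
    if maxgap_numsamples <= -tolerance:
        return "overlaps"
    return "none"


# 5 minutes requested, 4 minutes and 30 seconds received
sec, ratio = missing_data(
    datetime(2017, 1, 1, 0, 0, 0), datetime(2017, 1, 1, 0, 5, 0),
    datetime(2017, 1, 1, 0, 0, 0), datetime(2017, 1, 1, 0, 4, 30),
)
print(sec, ratio, gap_status(0.2))
```

Under these assumptions, the example window would pass the selection missing_data_sec: '< 60' (30 seconds are missing) but not missing_data_ratio: '< 0.1', since its ratio sits exactly on the excluded boundary.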
Standard attributes or methods | Description |
stream(reload=False) | Return the ObsPy Stream object representing the segment waveform data. Parameter reload: bool. Optional (default: False). Force reloading the Stream object from the downloaded waveform data (bytes sequence), discarding any ObsPy in-place operation that might have modified the Stream |
inventory(reload=False) | Return the inventory of the segment Station as ObsPy Response object. Parameter reload: bool. Optional (default: False). Force reloading the Response object from the downloaded waveform data (bytes sequence). In most cases you can ignore this parameter, as a Response object is usually never modified but used as a read-only object |
url | Return the full URL that can be used to (re)download the Segment waveform data in miniSEED format (For details, see GET request here: https://www.fdsn.org/webservices/fdsnws-dataselect-1.1.pdf) |
sds_path(root='.') | Return a string representing the SeisComP data structure which can be used as path to store the segment miniSEED: root/EID/Year/NET/STA/CHAN.D/NET.STA.LOC.CHAN.TYPE.YEAR.DAY where root is the optional argument, EID is the database unique id of the event (integer), and all other fields are defined here: https://www.seiscomp.de/seiscomp3/doc/applications/slarchive/SDS.html Parameter root: Optional (defaults to '.' when missing). The root path of this segment file (first argument of os.path.join) |
dbsession | Return the database session to which this object is attached. Use with care: the session is for advanced users who need full freedom to interact with the database. For an introduction, see: https://docs.sqlalchemy.org/en/latest/orm/session.html |
add_classlabel(*class_ids_or_labels, commit=True, empty_first=False, annotator=None) | Add class label(s) to this segment. Parameter class_ids_or_labels: variable-length argument of the unique IDs (int) or labels (str) of the classes to be added to this segment. Classes already assigned to this segment will be ignored, as well as IDs or labels not matching any database class. Parameter commit: boolean (default: True) denoting if any change should be saved to the database (flush pending changes and commit the current transaction). Advanced users can set this parameter to False to manage the transaction manually and eventually call segment.dbsession.commit() when needed. Parameter annotator: (str, default: None). The annotator assigning the labelling. A None annotator should mean that the label assignment is the result of a classifier prediction and not human inspection: providing an annotator (not None) will set the is_hand_labelled property of the Class labelling to True. Parameter empty_first: boolean (default False) telling if all existing class labels associated to this segment should be removed first, before adding new class labels. Raises :class: sqlalchemy.exc.SQLAlchemy if a commit error occurs. For info see: https://docs.sqlalchemy.org/en/latest/orm/session_basics.html |
del_classlabel(*class_ids_or_labels, commit=True) | Delete class labels previously associated to this segment. Parameter class_ids_or_labels: variable-length argument of the unique IDs (int) or labels (str) of the classes to be removed from this segment. When NO ids or labels are provided, ALL class labels associated to this segment will be deleted. E.g.: segment.del_classlabel(). Parameter commit: boolean (default: True) denoting if any change should be saved to the database (flush pending changes and commit the current transaction). Advanced users can set this parameter to False to manage the transaction manually and eventually call segment.dbsession.commit() when needed. Raises :class: sqlalchemy.exc.SQLAlchemy if a commit error occurs. For info see: https://docs.sqlalchemy.org/en/latest/orm/session_basics.html |
siblings(*matching_attributes, include_self=False) | Return an iterable of all Segment objects that are equal to this segment in the given matching attribute(s). By default, these are the segments recording the same event on the other channel components / orientations. E.g., given a segment object seg, seg.siblings() is equivalent to seg.siblings('station.id', 'channel.location', 'channel.band_instrument_code', 'event.id')
(channel.band_instrument_code is the channel code without the last letter denoting the channel orientation). In general, any segment selectable attribute can be given. For instance, to get all segments from the same event, channel, station or network:
not considered the same. To get all segments of the same station identified only by its network and station code, you have two options:
logical "and", e.g., to get all segments from the same seismic event and recorded by the same instrument (e.g., if seg is an accelerometer, yield all accelerometers recording the same event of seg):
that can be customized by advanced users (for further details, see https://docs.sqlalchemy.org/en/latest/orm/tutorial.html#querying). For instance, in case of huge collections, consider loading only the desired attributes, e.g.:
the attributes used to find a match: a segment will be yielded if it equals this segment in all matching attributes. Attributes of related objects should be typed with the dot as separator, e.g. 'station.id', 'event.magnitude'. When empty, the matching attributes will default to the tuple: 'station.id', 'channel.location', 'channel.band_instrument_code', 'event.id'. Parameter include_self: boolean (default: False). Whether to include this segment among the yielded siblings |
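The directory layout described for sds_path can be illustrated with a stand-alone sketch. It does not call Stream2segment: the function name and its arguments are hypothetical, and taking the year and day-of-year from the waveform start time, with the data type fixed to "D", is an assumption based on the SDS definition linked in the table.

```python
import os
from datetime import datetime


def sds_path_sketch(root, event_id, net, sta, loc, cha, start_time,
                    data_type="D"):
    """Build root/EID/Year/NET/STA/CHAN.D/NET.STA.LOC.CHAN.TYPE.YEAR.DAY.

    Hypothetical helper: it only mirrors the layout described in the
    table and is not the library implementation.
    """
    year = str(start_time.year)
    # SDS uses a zero-padded, three-digit day of the year
    day = "%03d" % start_time.timetuple().tm_yday
    filename = ".".join([net, sta, loc, cha, data_type, year, day])
    return os.path.join(root, str(event_id), year, net, sta,
                        cha + "." + data_type, filename)


# Example with an empty location code and a made-up event id
path = sds_path_sketch(".", 12, "AZ", "NHZR", "", "HHZ",
                       datetime(2017, 2, 1))
print(path)
```

Because EID is part of the path, two segments of the same channel recorded for different events end up in different directories, even when their time windows overlap.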
The selection of suitable segments for processing (e.g., in the functions imap or process, or in the YAML configuration file for the command s2s process) is performed by creating a dict mapping one or more Segment attributes to a selection expression for that attribute:
segments_selection = {
"[attribute]": "[expression]",
"[attribute]": "[expression]",
...
}
Or, in YAML syntax (if you are implementing your own processing config):
segments_selection:
[attribute]: "[expression]"
[attribute]: "[expression]"
...
(the variable name segments_selection is mandatory only in a YAML configuration file to be passed to the command s2s process ...)
Examples:
- Not all segments on the database have waveform data. This might happen for several reasons (server error, connection error and so on). As such, in most cases you will probably want to work with segments with waveform data ("has_data": "true") or, to be even more strict, with valid data:
{"has_valid_data": "true"}
- To select and work on segments of stations activated in 2017 only:
{"station.start_time": "[2017-01-01, 2018-01-01T00:00:00)"}
(brackets denote intervals. Square brackets include end-points, round brackets exclude endpoints)
- To select segments from specified ids, e.g. 1, 4, 342, 67 (e.g., segments which raised errors during a previous run and whose ids were logged might need inspection in the GUI):
{"id": "1 4 342 67"}
- To select segments whose event magnitude is greater than 4.2:
{"event.magnitude": ">4.2"}
(the operators =, >=, <=, <, != work the same way)
- To select segments with a particular channel sensor description:
{"channel.sensor_description": "'GURALP CMG-40T-30S'"}
(note: for attributes with str values and spaces, we need to quote twice, as otherwise "GURALP CMG-40T-30S" would match 'GURALP' and 'CMG-40T-30S', but not the whole string. See list of selectable attributes)
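To make the expression syntax in the examples above more concrete, here is a minimal stand-alone sketch of how such expressions could be evaluated against a single numeric value. It is not the Stream2segment parser: the function matches and its helpers are hypothetical, and only numbers are handled (no datetimes, strings or quoting).

```python
import operator
import re

# Comparison operators, longest symbols first so "<=" wins over "<"
_OPERATORS = [
    ("<=", operator.le), (">=", operator.ge), ("!=", operator.ne),
    ("<", operator.lt), (">", operator.gt), ("=", operator.eq),
]
# Interval such as "[1, 2)": square brackets include the end-point,
# round brackets exclude it
_INTERVAL = re.compile(
    r"^([\[(])\s*([^\s,]+)\s*,\s*([^\s,\])]+)\s*([\])])$"
)


def matches(value, expression):
    """Return True if the numeric value satisfies the expression."""
    expr = expression.strip()
    interval = _INTERVAL.match(expr)
    if interval:
        left, low, high, right = interval.groups()
        low, high = float(low), float(high)
        above = value >= low if left == "[" else value > low
        below = value <= high if right == "]" else value < high
        return above and below
    for symbol, compare in _OPERATORS:
        if expr.startswith(symbol):
            return compare(value, float(expr[len(symbol):]))
    # No operator: a space-separated list of accepted values
    return any(value == float(token) for token in expr.split())


print(matches(4.5, ">4.2"))          # event.magnitude: ">4.2"
print(matches(342, "1 4 342 67"))    # id: "1 4 342 67"
print(matches(0.5, "(-0.5, 0.5)"))   # maxgap_numsamples: "(-0.5, 0.5)"
```

The first two calls return True and the third returns False, because the round bracket excludes the end-point 0.5.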
This program is released under the GNU GENERAL PUBLIC LICENSE Version 3
- Research article: Riccardo Zaccarelli, Dino Bindi, Angelo Strollo, Javier Quinteros and Fabrice Cotton. Stream2segment: An Open‐Source Tool for Downloading, Processing, and Visualizing Massive Event‐Based Seismic Waveform Datasets. Seismological Research Letters (2019) https://doi.org/10.1785/0220180314
- Software: Zaccarelli, Riccardo (2018): Stream2segment: a tool to download, process and visualize event-based seismic waveform data. V. 2.7.3. GFZ Data Services. http://doi.org/10.5880/GFZ.2.4.2019.002