ServicesExplained
The Metadata Service Toolkit (MST) is a platform that takes a set of records (i.e., a repository) as input and outputs another set of records (i.e., another repository). A record is an XML document of a specific type. A metadata service is the process by which one input record produces 0..N output records of the same or a different type. Records are pulled into a service using the OAI-PMH protocol. All services process records sequentially, one by one: OAI-PMH might fetch 5,000 records at a time, but the service still processes only one at a time. By the end of processing an incoming record (XML document), the service will have decided whether to add any output records to its repository. The MST platform handles all of the functionality common to this process, so that individual services can focus entirely on processing and outputting records in the way that is unique to each service.
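The processing model described above can be sketched in Java. This is a minimal sketch under stated assumptions: `XmlRecord`, `MetadataService`, and `runBatch` are hypothetical names, not the actual MST API, and the OAI-PMH harvest is reduced to a plain list standing in for one harvested batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration; not the actual MST service API.
final class XmlRecord {
    final String oaiId;
    final String type; // e.g. "marc21" or "xc"
    final String xml;

    XmlRecord(String oaiId, String type, String xml) {
        this.oaiId = oaiId;
        this.type = type;
        this.xml = xml;
    }
}

interface MetadataService {
    // One input record yields 0..N output records of the same or a different type.
    List<XmlRecord> process(XmlRecord input);
}

class ServiceRunner {
    // A harvested batch (e.g. one OAI-PMH response) is still handed to the
    // service strictly one record at a time, in order.
    static List<XmlRecord> runBatch(MetadataService service, List<XmlRecord> batch) {
        List<XmlRecord> outputRepository = new ArrayList<>();
        for (XmlRecord input : batch) {
            outputRepository.addAll(service.process(input));
        }
        return outputRepository;
    }

    public static void main(String[] args) {
        // Toy service: drops records of type "skip", passes the rest through.
        MetadataService filter = in ->
                in.type.equals("skip") ? List.of() : List.of(in);
        List<XmlRecord> batch = List.of(
                new XmlRecord("oai:example:1", "marc21", "<record/>"),
                new XmlRecord("oai:example:2", "skip", "<record/>"));
        System.out.println(runBatch(filter, batch).size()); // prints 1
    }
}
```

Because the runner owns the loop, a service in this sketch only implements the per-record step, which mirrors the division of labor the paragraph describes.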
PLEASE NOTE:
The MST is currently designed to process services in the following order:
- MARC Normalization
- Aggregation Service
- MARC-XC Transformation
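Why the order matters can be illustrated with a small chain in which each service consumes the previous service's output repository. This is a sketch under assumptions: each service is modeled as a plain function from one record to 0..N records, and the string "transformations" are placeholders, not what the real services do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ServiceChain {
    // Runs each stage over the full output of the previous stage.
    // Stages here are placeholders standing in for real services.
    static List<String> run(List<String> records,
                            List<Function<String, List<String>>> stages) {
        List<String> current = records;
        for (Function<String, List<String>> stage : stages) {
            List<String> next = new ArrayList<>();
            for (String r : current) {
                next.addAll(stage.apply(r)); // 0..N outputs per input
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        List<Function<String, List<String>>> stages = List.of(
                r -> List.of("normalized(" + r + ")"), // MARC Normalization
                r -> List.of("aggregated(" + r + ")"), // Aggregation Service
                r -> List.of("xc(" + r + ")"));        // MARC-XC Transformation
        System.out.println(run(List.of("bib1"), stages));
        // prints [xc(aggregated(normalized(bib1)))]
    }
}
```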
DL - I propose making all of the arrows point in the direction of record movement rather than the "request direction". In other words, I would reverse the arrowheads on both the red and the yellow arrows in the picture.
BA - That’s how I originally had it way back when I created the presentation for code4lib. At Dave’s request I changed it to the way it is now. I can see it both ways.
The MST uses MySQL (a popular, fast, open-source relational database) to store records in repositories. It loads some of these tables into memory prior to harvesting and service processing to allow for higher throughput. Many of these in-memory data structures use Trove (a high-performance collections library for Java). Some tables, however, are queried in real time; whether a table is preloaded or queried depends on the estimated frequency of such queries. The main focus of optimization is the initial loading of data: since the goal is to process records at a pace of about 1 ms/record, even a fast DB query would be significant.
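To make the 1 ms/record budget concrete, here is the arithmetic under a purely hypothetical cost of 0.5 ms per real-time query (an assumed figure for illustration, not a measurement):

```latex
\begin{aligned}
t_{\text{record}} &= 1\ \text{ms} &&\Rightarrow\ \frac{1000\ \text{ms/s}}{1\ \text{ms/record}} = 1000\ \text{records/s}\\
t'_{\text{record}} &= 1\ \text{ms} + 0.5\ \text{ms} = 1.5\ \text{ms} &&\Rightarrow\ \frac{1000}{1.5} \approx 667\ \text{records/s}\\
10^{6}\ \text{records:}\quad & 10^{6} \times 1\ \text{ms} = 1000\ \text{s} \approx 16.7\ \text{min} &&\text{vs.}\quad 10^{6} \times 1.5\ \text{ms} = 1500\ \text{s} = 25\ \text{min}
\end{aligned}
```

Under that assumption, a single per-record query raises the time per record by 50% and cuts throughput by a third, which is why frequently needed tables are loaded into memory up front.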
The top section of this diagram is roughly equivalent to a class diagram, and the bottom part is an ERD. These diagrams aren't exhaustive, but they give a good idea of how the platform works and how a service implementer can make use of the data provided by the MST platform. A more complete picture can be found in the actual SQL files.
In-Memory Structures
oai_id_2_record_id map
- description: This cache determines whether this particular record has previously been processed by the service. It also keeps track of what the record's status was at that point. It is an instance of the DynMap class, which allows both alphabetic and numeric identifiers; numeric identifiers take up considerably less memory. This is the purpose of the harvest.redundantToken property in the properties file: its value is a comma-separated list of redundant characters to strip out of an OAI ID. That functionality is handled by the Util.getNonRedundantOaiId method. In the future, this method could perhaps replace the redundant portion with a number instead of simply stripping it out; that way, uniqueness could be preserved across multiple repositories.
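The idea behind redundant-token stripping and a DynMap-style cache can be sketched as follows. This is a hypothetical re-creation for illustration only: `nonRedundantOaiId`, `isCompactNumeric`, and `DynMapSketch` are made-up names, not the real Util.getNonRedundantOaiId or DynMap, and the OAI identifier and token list are invented examples. A real implementation would use a primitive-keyed map (Trove provides these) to avoid boxing; `HashMap` keeps the sketch dependency-free.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the real Util.getNonRedundantOaiId or DynMap.
class OaiIdSketch {
    // Removes every comma-separated redundant token (in the spirit of the
    // harvest.redundantToken property) from an OAI identifier.
    static String nonRedundantOaiId(String oaiId, String redundantTokens) {
        String id = oaiId;
        for (String token : redundantTokens.split(",")) {
            String t = token.trim();
            if (!t.isEmpty()) {
                id = id.replace(t, "");
            }
        }
        return id;
    }

    // Digits only, no leading zero (so "007" and "7" stay distinct),
    // and at most 18 digits so the value always fits in a long.
    static boolean isCompactNumeric(String id) {
        if (id.isEmpty() || id.length() > 18) return false;
        if (id.length() > 1 && id.charAt(0) == '0') return false;
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    // DynMap-like idea: numeric ids become long keys, anything else stays
    // a String key.
    static final class DynMapSketch {
        private final Map<Long, Long> numeric = new HashMap<>();
        private final Map<String, Long> alpha = new HashMap<>();

        void put(String id, long recordId) {
            if (isCompactNumeric(id)) numeric.put(Long.parseLong(id), recordId);
            else alpha.put(id, recordId);
        }

        Long get(String id) {
            return isCompactNumeric(id) ? numeric.get(Long.parseLong(id)) : alpha.get(id);
        }

        int numericSize() { return numeric.size(); }
        int alphaSize() { return alpha.size(); }
    }

    public static void main(String[] args) {
        String tokens = "oai:library.example.edu:,MST/"; // hypothetical property value
        String id = nonRedundantOaiId("oai:library.example.edu:MST/123456", tokens);
        System.out.println(id); // prints 123456

        DynMapSketch map = new DynMapSketch();
        map.put(id, 42L);
        map.put("oai:other.example.org:abc", 43L);
        System.out.println(map.get("123456") + " " + map.numericSize() + " " + map.alphaSize());
        // prints 42 1 1
    }
}
```

The leading-zero check is a design choice of this sketch: without it, two distinct identifiers such as "007" and "7" would collapse to the same numeric key.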
previous statuses map
- description: This cache determines if this particular record has previously been processed by the service. It also keeps track of what the record's status was at that point.
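A previous-statuses cache of this kind might look like the sketch below. It is hypothetical: the class name, the classification labels, and the status codes 'A' (active) and 'D' (deleted) are assumptions, not the MST's actual values. In the platform, such a map would be populated from MySQL before processing starts, so each per-record check is answered from memory rather than by a query.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "previous statuses" cache; the status codes
// below are assumptions, not the MST's actual values.
class PreviousStatuses {
    static final char ACTIVE = 'A';
    static final char DELETED = 'D';

    // recordId -> status the record had when the service last processed it.
    private final Map<Long, Character> statusById = new HashMap<>();

    void remember(long recordId, char status) {
        statusById.put(recordId, status);
    }

    // Answered from memory, so no per-record database query is needed.
    String classify(long recordId) {
        Character previous = statusById.get(recordId);
        if (previous == null) return "new";
        return previous == DELETED ? "previously-deleted" : "update";
    }

    public static void main(String[] args) {
        PreviousStatuses cache = new PreviousStatuses();
        cache.remember(1L, ACTIVE);
        cache.remember(2L, DELETED);
        System.out.println(cache.classify(1L) + " " + cache.classify(2L) + " " + cache.classify(3L));
        // prints update previously-deleted new
    }
}
```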

