Add support for external Data Mapping File #182
@clepple, @balooloo, @zykh: specs pushed to the wiki:
Arno asked me to look for a small JSON parser library, and "jsmn" seems like a good fit for this use-case. It is small, so it might be embedded into the NUT codebase rather than expected as an OS package (or, as many projects do, the specific or newest variant could be picked between the OS package and a private copy during the "./configure" phase), and it is MIT-licensed. It is also extremely fast for both small and large documents, compared to the others.

To start with, I was advised to look at the detailed Russian-language blog post from Lev Walkin (aka lionet) at http://lionet.livejournal.com/118853.html on C JSON parsers. Note that it is from September 2012, so the projects reviewed have probably moved on and improved since then. In the first pass the author goes over a dozen parsers and projects suggested on the http://json.org website. Apparently he looked carefully into the code and related files of each project, since he forms an opinion about the coding quality and test coverage (basic with valid data, invalid data, fuzzing, or none of these) and picks a few for the second pass, the practical comparison. The parsers that remained in the queue for his testing are: yajl, json-c, json-parser, jsonsl, jsmn, jansson and cson. All or most of these drew some critical comments about their drawbacks, and for some the blog author hesitated whether to test them at all. Those that made the finals are MIT, BSD, ISC or public-domain licensed (none of the (L)GPL ones made it through pass 1). Ultimately, his test was a benchmark posted at http://github.com/vlm/vlm-json-test which:
The reviewed parsers, along with their licenses, notable features and performance results, are summarized in the table at the end of the post. The JSMN parser initially showed some anomalous behavior depending on the size of the input; after some coordination with its author this was revised and fixed (see http://lionet.livejournal.com/119755.html), and it is now consistently 2x faster than the nearest competition in this bunch. The blogger's notes about it include:

Summary of other reviews of JSMN:

* According to a back-and-forth discussion in the second blog post (on the revision of JSMN's inconsistent results), it seems better suited for embedded projects with a predefined schema: it sacrifices convenience (e.g. iteration and counts of items) for projects with extensible data definitions in favor of performance, which is the best in the larger class of projects that other commenters had experience with. For some of the commenters the raw performance was a reason to migrate, or to consider migrating, onto this less programmer-friendly (or, in their view, less functional out-of-the-box) engine.

From the project's own description this looks like a good choice, and I am pleased that an outside benchmark confirmed it. I guess it will take some experimentation to see whether its possible constraints are of any consequence for NUT. Since we care about approximately one predefined format in this case, we can care a bit less about universal applicability and extensibility. Possibly, some (additional?) tool to schema-validate the input markup might be useful to rule out broken files, e.g. as part of a NUT driver startup script.

While following more links on JSMN (checking whether the low-level inconveniences noted in blogs from a couple of years ago have already been wrapped into something higher-level), I found some examples that could help adopt it:

* Blog http://alisdair.mcdiarmid.org/2012/08/14/jsmn-example.html and code https://github.com/alisdair/jsmn-example with examples
As another option, if needed, there are also mixed reviews for cJSON, available from http://sourceforge.net/projects/cjson/ (BSD-licensed). Some praise it for better functionality, portability and usability while still being a compact project aimed at embedded platforms. The blog post from Lev Walkin referenced earlier discarded this project (it was not usable for him two years ago) due to some memory leaks and other logic and/or coding issues, which might have been fixed since.
Wiki page URL seems to have changed to https://github.com/networkupstools/nut/wiki/Data-Mapping-File-(DMF)

I think we need to also consider how this will be rolled out. JSON is certainly a well-known, flexible format-- but one that we do not currently parse in NUT (not counting the HCL, since the browser does the parsing on the user's side).

This will also require some thought about when things get parsed-- if usbhid-ups parses the JSON file every poll interval, users are going to be upset at the added CPU usage. (It should be sufficient to reconfigure on a SIGHUP, or even by killing and restarting the driver.)

What about a parser built with flex/bison? That only requires two additional well-tested dependencies on the developer side, and no additional runtime support. The resulting parser should be much more efficient than anything based on a generic format.

If JSON ends up being the best choice, we should probably take the JSON conversion as an opportunity to expand the C struct arrays into key:value pairs. Positional tuples would be confusing for end users to edit, since the positions are almost never defined nearby: https://github.com/networkupstools/nut/wiki/Data-Mapping-File-(DMF)/d8c2c87b607792a8d78c5e2fe250f21ec2979daf#example-snmp-1 Proper parsing of the data files into fixed C structures would eliminate the key:value storage overhead after the file is parsed once.

I see a DMF version number in the spec. I voiced some concerns about the NUT Network Protocol Version number, and the same concerns hold here. Using version numbers for anything other than informational purposes is being lazy about defining the specification, IMHO. In particular, you would need to define semantics for comparing version numbers. There is also the problem of assigning version numbers on branches.
A much more descriptive approach is to define named capabilities, and have the DMF use something like

The introduction states "As a corollary, holding these declarations in the driver binary implies that these drivers are growing (in size) very fast. So moving these information out of the driver binary will additionally improve the driver footprint and efficiency, which is desirable."-- I agree the drivers are growing, but adding a JSON interpreter will make them grow as well, and we don't have any overall efficiency numbers to know where the break-even point is. (I have not fully read Jim's JSON library notes, but a fair comparison would involve NUT 2.current versus NUT+JSON, and we don't have that information yet.)

I would also argue against another one of the points in the introduction: recompiling drivers may be painful for users, but it also helps make the NUT developers aware of changes in hardware. If users can just tweak a file and go on with life, that solves their problem, but it ensures that the next person with that hardware will have to go through the same process. I think our time would be better spent making it easier for users to recompile their distribution's packages.

In a similar vein, we need a way to know what is going on when users submit bug reports with a data-file-defined driver. At the very least, a flag to say "this file was different than the one shipped with NUT", or we could do something more clever with CRCs or hashes.

Also, since the spec raises the idea of data files under licenses other than NUT's license, the data files should have a license field that is displayed prominently in the NUT variable list. I want to be able to play the Linux kernel card, and if I am busy with other work, immediately bail out if I see a proprietary license in a bug report. (Hopefully the other licenses will be somewhat open-source in nature.)

On the subject of licenses: extensions that link directly to NUT will be GPLv2+ or compatible, by definition.
I hope there is no misunderstanding on that point.

I think we have identified a few cases where a simple language to scale and shift values might be useful (and flex/bison could be used to parse that language, regardless of the rest of the data file format), but I would not want to unconditionally pull in all of Python or even Lua to get that capability. On an embedded platform, I would want the ability to compile without complex interpreters at runtime, and still have a functional driver.

A formatting aside: GitHub's wikis are too narrow for paragraph-style text in tables. Please consider reformatting the table in this section: https://github.com/networkupstools/nut/wiki/Data-Mapping-File-(DMF)/d8c2c87b607792a8d78c5e2fe250f21ec2979daf#general-format-specification (the other tables are much more readable due to their short line lengths)
On Mar 5, 2015, at 9:12 AM, Jim Klimov wrote:
Jim, if we were to use JSON, I think we would be best served by a generic JSON tokenizer, with a parser that builds NUT-specific C structures. After all, we can afford a little extra startup time to optimize driver performance in the main event loop. I think that either the HID or the SNMP driver would take a report/packet from the device, split it into components, and use the DMF to look up each of those components. Having to search a DOM-style tree for each device event would be slower. So something between JSMN and a DOM-style JSON parser would probably be best.
Charles, I think your last comment coincides with my gut feeling: since with this DMF quest we have a fixed format (as opposed to "generic parsing of anything") and essentially an application-defined schema (some C structures into which that external data would ultimately be mapped for efficient programmatic consumption), a fast tokenizer plus a custom parser of those tokens into structures is an adequate choice. That parser might also take care of conversions (escape characters or Unicode characters in strings, various number representations, etc.) and input validation, since it "knows" by virtue of its coding what data types we expect to receive.

As for the previous comment on when/how the DMF file should be re-read... I think SIGHUP is a good option, as well as checking the file timestamp, with possible re-parsing, upon each poll (inode metadata from recent reads or changes should be cached, so these checks likely won't end up as slow disk IOs), somewhat like crontab or milter-greylist do it. Possibly, subscribing to inotify or an equivalent on those OSes and filesystems that support the feature is also a way to do this efficiently.

For reusability, testability and atomic switch-overs, it makes sense to store such structures and related data in instances of a context. So when we have a new file, we create a new context, parse the file and fill out the structures; when it all completes (and if it completes successfully) we near-instantly change the "current" context pointer onto the newly spawned one (and/or pass the pointer to the context to the interested routines), and somehow schedule the obsolete context for de-allocation (when its usage counter reaches zero, if threading is involved?). This is better than taking nontrivial time to overwrite live configuration data, causing some conflicts while we parse stuff, and then finding that the input was invalid ;)
Interesting, I hadn't considered that.
Threads? That's a dirty word around here ;-)
Again, that is certainly a good point, but it is almost an argument against having the first implementation handle reloading of DMF files. I think it would be best to make the driver parse the file cleanly at startup (using a single context to test that code), and make sure that all of the memory allocation is handled properly before adding the dynamic reloading feature. Users are also going to need notification of parse errors, and driver startup is probably the best time to do that (although many distribution init scripts suppress the output of upsdrvctl, but that's another story).
@clepple, @jimklimov: indeed, the aim is to parse the DMF at startup time, and push the content into our current C structs, possibly augmented for smarter behavior. That wasn't so obvious reading back the Implementation Notes chapter, but that's the intent of the provided mapping tables (i.e. to make it easy to load into these C structs). @clepple: fully agreed on doing the initial work on loading prior to any attempt at reloading.
@clepple: thanks for your excellent and constructive comments, as usual :)

On the DMF version number: you're right, and this is again something I neglected. Taking the point...

On the fast-growing size and improving the driver footprint and efficiency: this is something we've started to see boldly with snmp-ups, not so much with usbhid-ups. However, we expect to add more SNMP mappings, and we've already been at a limit for some time. It's not so much about performance yet, but the footprint is going way beyond what a NUT driver is supposed to be. Although I agree that performance should be the topic of a benchmark, this was not directly the point. I'll be happy to see some, however :)

On the DMF modification and reporting: right, and this is still a pain point. It's a balance between allowing users to fix issues and ensuring that the fixes can benefit the wider community. CRC / hash checks, along with mechanisms to easily submit modifications and new DMFs, are something we'll have to look into.

On the DMF licensing:
On the simple language to extend the data processing / publication:

On the formatting aside: fully agree, and scheduled... for a bit later. It took me too long to translate (manually... ahem) from Confluence to GH quickly. I'm investigating automated ways for the next iteration and the next specs to be pushed...
In regard to lexers (yacc/bison/flex/...) as well as switchable-context

Typos courtesy of GMail on my Samsung Android
On Mar 5, 2015, at 12:11 PM, Arnaud Quette wrote:
Proprietary extensions need to be combined at arm's length, like the GIMP external program plugin approach, so as not to fall under the scope of the NUT license: http://www.gnu.org/licenses/gpl-faq.html#MereAggregation
I see added value in considering DMF files with a layering capability, in order to be able to override configuration.
Along those lines, I propose a layered config, to be able to override the DMF at multiple levels:
With such overriding, a user could override a value, add a new entry mapped to an existing NUT namespace variable or to a new variable outside the NUT namespace for specific usage, or remove an entry. The ability to override the DMF for all devices (driver level) or for one specific device (instance level) would probably be nice and useful.

For the licensing (or vanilla) DMF problem, we can verify that distribution-wide files have not been modified, and warn the user that their configuration overrides the distribution-wide config. As soon as we split driver data definitions out from the driver source code, we have the problem of reporting modifications in device specifications. But I think we can mitigate this risk. Indeed, one of the greatest added values of NUT is that it is widely distributed with these specifications. To use NUT on an almost-standard system, you have to use the standard DMFs. If you do not, you must repackage them in a non-community, non-standard package, which will not be widely distributed. So IMHO vendors will share their modifications to the DMFs (or something to help the NUT community modify them).

A way to technically address this layering in an optimized fashion could be, on the first read, to parse all DMF files, generate an internal optimized hierarchical structure annotated with a CRC and the DMF source files' modification dates, dump it into a dedicated file, and then use that. When NUT needs to read them again, we can read the optimized file and compare dates instead of reading the DMF files and processing them again. The two advantages are being able to use an optimized structure from a computer-readable file with all the added and preprocessed necessary information, and easily verifying whether some DMFs have been changed.
Another point I want to discuss here and now is for the long term: driver state.
To achieve that, one way would be to add a categorization to each data item, in the form of an additional property specifying groups or tags. Tag (or group) names could be globally defined (global NUT semantics and behavior) or specifically defined (for the needs of the driver).
Just a few observations:
Plus, although I can see the advantages, I confess I too am a bit scared about the consequences of the added 'freedom' given to the user with the adoption of such a thing.
Update: I'm working on a v1.1 (considering the initial proposal as v1).
FYI, our internal team did work in this area up to the point that "it works for us" (on one target OS and, by chance, a few others). In the end we chose an XML format parsed by LibNEON, which is already among NUT's dependencies. For LTDL-capable systems, the library is only loaded when we parse the external files (into the same structures as were provided statically) and unloaded afterwards, so the runtime memory overhead should be negligible. Even better, you can load only one or a few mapping tables, rather than having them all built in with just one used for a device, as is the case with the current upstream snmp-ups driver.

There were a few PRs posted to show our progress, but the most comprehensive one to date is PR #305 (and the preliminary accepted bits accumulate in the upstream nut/DMF branch). The internal project's focus did not include compatibility with all the OSes NUT might run on, so I volunteered to spend my spare time getting at least 'make distcheck' to pass on NUT's diverse buildbot farm (whether building these DMF-related bits or refusing to do so at the configure stage, but not crashing during the 'make' phase). After this foundation is merged to at least nut/DMF, it might be the community members' burden to tweak the autoconf and/or make recipes for their environments not covered by the buildbots. Or it might "just work" for them ;)

There are also a few things I wanted to refactor for this codebase to be more future-proof, but that might get my timeslots later, perhaps in some other PR.
NUT drivers (SNMP, XML/PDC, USB/HID, ...) hold hard-coded textual data that maps NUT variables to the protocols' native ones. For example, each interesting SNMP OID (data point) has an entry (line) in the snmp-ups driver that tells it how to retrieve and process this data:
https://github.com/networkupstools/nut/blob/master/drivers/eaton-mib.c#L253
https://github.com/networkupstools/nut/blob/master/drivers/apc-mib.c#L153
https://github.com/networkupstools/nut/blob/master/drivers/mge-xml.c#L903
These data should be moved out of the drivers' source code into text files, so that they can be created and edited without the need to recompile NUT.
A specification is being written to describe the Data Mapping File (DMF / .dmf):
https://github.com/networkupstools/nut/wiki/Data-Mapping-File-(DMF)