Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expanding this WG to all Diagnostics for Node #46

Closed
joshgav opened this issue Apr 13, 2016 · 43 comments
Closed

Expanding this WG to all Diagnostics for Node #46

joshgav opened this issue Apr 13, 2016 · 43 comments

Comments

@joshgav
Copy link
Contributor

joshgav commented Apr 13, 2016

Hello all! I've talked to several of you over the past month about invigorating this working group and creating a greater Diagnostics team for Node. To be helpful, I wanted to come with a tentative plan and hammer out details with everyone.

To that end, following is a summary of the diagnostic domains which we can help Node cover and around which we can orient our work. I propose creating a wiki-like structure in this repo oriented around these domains and then using PRs to add new domains, interfaces and APIs, tools, samples and docs.

Note: "Post-mortem" topics are listed here under "Heap and Memory Analysis" for completeness; that work should continue in the nodejs/post-mortem team.

I also propose we restart our monthly meeting to discuss items from this list. In our next meeting, for example, we should probably discuss AsyncWrap issues and how to support the coming Chrome Debugging Protocol in V8.

Could you provide feedback on the proposed structure? If we have consensus I'll go ahead and work with @pmuellr to organize this repo accordingly.


Goals

  • Node surfaces a set of comprehensive, documented, extensible diagnostic protocols, formats, and APIs for use by tooling clients and VM vendors.
  • Enable many tool vendors to provide reliable diagnostic tools for Node.

Plans

  • Collect, understand, and document existing diagnostic capabilities and entry-points throughout Node, V8, and other components.
  • Collect and document projects and products providing diagnostics for Node with brief description of their technical architecture and sponsoring organizations.
  • Identify and explore opportunities and gaps, feature requests, and conflicts in existing diagnostics capabilities.

Domains

  • Tracing
  • Profiling
  • Heap and Memory Analysis
  • Step Debugging
  • Protocol and API

Tracing

  • nodejs/tracing-wg
  • System: dtrace, lttng, etw, systemtap
  • V8: TRACE_EVENT
    • Requires implementation of v8::Platform::AddTraceEvent
  • Node: AsyncWrap, Domains, Zones
  • JavaScript: Domains, Zones

Profiling

  • V8 CPU Profiler: v8::Isolate::GetCpuProfiler (include/v8-profiler.h)
  • V8 Heap Profiler: v8::Isolate::GetHeapProfiler (include/v8-profiler.h)
  • Intel vTune (./configure --enable-vtune-profiling)

Heap and Memory Analysis

Step Debugging

Protocol and API

@joshgav
Copy link
Contributor Author

joshgav commented Apr 13, 2016

/cc @pmuellr @trevnorris @ofrobots @yunong @nodejs/post-mortem @nodejs/tracing

@AndreasMadsen
Copy link
Member

Step Debugging, Protocol and API, Domains, Zones, Heap and obviously Memory Analysis is not something I recall being discussed in this working group. It may belong here, but the work is typically done by voluntary interest. If it has never been discussed there may not be enough resources to support those areas.

Your proposal appears to be very much driven by the planning phase of classic project management. You are creating work breakdown structures, defining activities, planning communication, etc.. I have no objections to project management, but I can also see that you don't have many open source contributions on your github page (apologies if I'm mistaken). In my experience classic project management is best suited for being in the (cost, time) part of the iron triangle (the barycentric coordinate system with (cost, quality, time) as dimensions, though the terms vary), where open source is typically in the (cost, quality) part. This is not to say that part of project management don't apply, but in my (maybe limited) experience one also has to take a very different approach.

PS: it looks like something went wrong in your cc'ing.
/cc @nodejs/tracing @nodejs/post-mortem

@Jeff-Lewis
Copy link
Contributor

IMHO Step Debugging, Protocol and API, Domains, Zones, Heap and Memory Analysis should be included or at least considered as an impact point. joshgav's contribution to the WG should be welcomed. As you know this WG is responsible for such additions to Node that will make it more mature and enterprise friendly.

@AndreasMadsen
Copy link
Member

@Jeff-Lewis If all this can be done it would be amazing. I'm simply concerned that it can't, given the resources.

@joshgav are you planning to create all these detailed documents yourself?

@joshgav
Copy link
Contributor Author

joshgav commented Apr 13, 2016

@AndreasMadsen I'd say my goal is to organize and facilitate diagnostics work for Node rather than manage it. Goals of such organization include discoverability for new contributors and users, transparency for contributors and tool makers, opportunity for those who have new ideas, and smooth communication between all groups and stakeholders. Execution will continue at its own pace, though I hope all this will build momentum and encourage more contributions.

And yes, I anticipate starting a lot of the work described myself over the coming weeks and months, but wanted to build general consensus as it begins. I have a nascent 80/20 theory for open source work - do 80% of the work then come to the community to make adjustments and finish the last 20%; accordingly, I view my proposal above as about 80% complete :)

@AndreasMadsen
Copy link
Member

@joshgav Thanks, then I have no more major concerns. But other members should comment on this too.

I have a nascent 80/20 theory for open source work - do 80% of the work then come to the community to make adjustments and finish the last 20%

This was also my belief a some time ago. However I have found it much more productive to do the work as best as I can and then let the community comment on how it can be done better. Maybe that is what you meant.

edit: But feel free to do whatever work you are comfortable with, this is just my experience.

@pmuellr
Copy link
Contributor

pmuellr commented Apr 13, 2016

@joshgav thanks for doing this!

I have to admit that I have a bit more interest / history / knowledge on some of the "additional" items you listed, esp debugging, heap and cpu profiling. I was kinda hoping to either expand this WG out a bit more, or maybe start a new one on some of those topics.

I don't have a sense for if these end up making the WG a little too "wide" or not - guessing they might. But for lack of a better place to put them, I think there's no problem using this WG, for now, to house some of those bits.

Let's get another meeting scheduled! I've been hoping to come out of "swamped mode" for the last couple of weeks, but .. no luck. Next week is different - should have some time to schedule it then - but don't wait for me if you want to get it going ...

@pmuellr
Copy link
Contributor

pmuellr commented Apr 14, 2016

I'd say my goal is to organize and facilitate diagnostics work for Node rather than manage it.

Any attempts at organization, and starting to structure bits, is great. I just barely had time to organize the calls, much less anything else!

I've certainly been burned over the years by projects which attempt to over-organize things from the get-go, and end up stifling contribution. Almost always better to start with a small number of artifacts, split them up organically, as needed, into more specialized artifacts.

@joshgav Wanna draw up a "picture" (ascii indented list or whatever fine) of our current structure, and a proposed structure, and we'll take that as the action item for this issue?

I have a doc, with not a ton of info it, but at least a basic start, with some node debug context info, I've shared with other folk - https://github.com/pmuellr/node-debug-context - assume I want that in this repo, where would it go?

@mhdawson
Copy link
Member

I'm + 1 on a putting together an overall picture of what we think is important. I think its a good way help people understand the areas that are being worked and/or the areas the community thinks are important but need more contributors to help out. @pmuellr can you include me in the doodle when we schedule the next meeting.

@pmuellr
Copy link
Contributor

pmuellr commented Apr 14, 2016

@mhdawson suggest you watch the repo, you'll see the next meeting scheduling exercise as a new issue ; I'll cc the team/group/whatever as well ...

@joshgav
Copy link
Contributor Author

joshgav commented Apr 14, 2016

Awesome, thanks @pmuellr. How about I submit a PR with structure, current content and some new content and we hammer on that? Give me a few days, I'll try to have something early next week.

Seems like the Debug Context info belongs under "Step Debugging", by which I mean stopping execution on breakpoints and events/exceptions, reflecting on current state, and jumping around (stepping) to other states. Actually, ever since I started looking into Node diagnostics I've been wondering what the AsyncTaskEvent and PromiseEvent V8 debug events are and if they'd be useful for tools :)

I'm happy to set up our next meeting, do we have an agenda? Perhaps I'll start a thread soliciting agenda items and good times.

@pmuellr
Copy link
Contributor

pmuellr commented Apr 15, 2016

Sounds good.

Re: agenda - typically we talk about AsyncWrap and v8 trace API - get status on where those are - and then anything else. I typically create a Google Doc - I think - to let people add agenda items. Might want to look at some of the artifacts from previous meetings.

@joshgav
Copy link
Contributor Author

joshgav commented Apr 18, 2016

Please see #47 for concrete implementation of expansion.

Working on meeting invite now.

@joshgav
Copy link
Contributor Author

joshgav commented May 9, 2016

Hi all - Quick update. I've tarried on scheduling a meeting in order to have more time to prepare code and components to generate discussion, such as a small trace subsystem I'm now working on. Also, now that @trevnorris has opened an EP for AsyncWrap, perhaps that wouldn't be on our agenda here.

If everyone wants a meeting now I'll put one together. Otherwise, I'll continue to work on prototypes and implementations to generate discussion, and continue filling in the docs in #47. We can then arrange a meeting as these mature.

What do you think? Thanks!

@pmuellr
Copy link
Contributor

pmuellr commented May 18, 2016

Sorry for the delayed response.

Let's go ahead and get a meeting together. Still want to catch up with the chrome/v8 tracing work, which I don't think we've heard about in months. And the work you're doing. :-)

@joshgav
Copy link
Contributor Author

joshgav commented May 19, 2016

@pmuellr

Let's go ahead and get a meeting together.

Done - #49. Talk to you soon!

@joshgav joshgav changed the title Diagnostics work for Node Expanding this WG to all Diagnostics for Node May 31, 2016
@joshgav
Copy link
Contributor Author

joshgav commented Jun 2, 2016

We ran out of time and weren't able to discuss expansion of scope of this WG to all diagnostics in yesterday's call (#49), so let's continue the discussion here (I'd suggest closing #51 and referring here too; @thealphanerd what do you think?).

It seems most discussions about Node and CDP or other diagnostics domains should take place through GitHub issues. It also seems it would be best to collect such discussions in one place rather than distributing them across repos.

There will of course be individuals or teams focused on specific domains, such as post-mortem, tracing, or CDP; but these can still all be housed in this same repo/WG, perhaps with GitHub labels. A unified Diagnostics WG/repo would facilitate discoverability (anyone will be able to easily find all diagnostics-related discussions related to Node) and cooperation (we'll all be more aware of what others are working on). It will also make management of the WG and coordination with [nodejs/node] easier.

The source code repo itself is mostly for documentation, samples, and prototypes, and we could divide it as proposed in #47 to include a subdir for each domain.

The last steps would be to change the name of the repo to 'diagnostics' and to integrate nodejs/post-mortem here.

Thoughts?

/cc @rvagg

@pmuellr
Copy link
Contributor

pmuellr commented Jun 2, 2016

+1

I think post-mortem ultimately belongs in "diagnostics", but if they're
making good headway currently, don't want to derail them with administrivia
...

On Thu, Jun 2, 2016 at 4:31 PM, Josh Gavant notifications@github.com
wrote:

We ran out of time and weren't able to discuss expansion of scope of this
WG to all diagnostics in yesterday's call (#49
#49), so let's continue the
discussion here (I'd suggest closing #51
#51 and referring here too;
@thealphanerd https://github.com/TheAlphaNerd what do you think?).

It seems most discussions about Node and CDP or other diagnostics domains
should take place through GitHub issues. It also seems it would be best to
collect such discussions in one place rather than distributing them across
repos.

There will of course be individuals or teams focused on specific domains,
such as post-mortem, tracing, or CDP; but these can still all be housed in
this same repo/WG, perhaps with GitHub labels. A unified Diagnostics
WG/repo would facilitate discoverability (anyone will be able to easily
find all diagnostics-related discussions related to Node) and
cooperation (we'll all be more aware of what others are working on). It
will also make management of the WG and coordination with [nodejs/node]
easier.

The source code repo itself is mostly for documentation, samples, and
prototypes, and we could divide it as proposed in #47
#47 to include a subdir for
each domain.

The last steps would be to change the name of the repo to 'diagnostics'
and to integrate nodejs/post-mortem
https://github.com/nodejs/post-mortem here.

Thoughts?

/cc @rvagg https://github.com/rvagg


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#46 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABiHYZ5aEQUoJi89ikkXbvinUKY95rpks5qHz2cgaJpZM4IGAoq
.

Patrick Mueller
http://muellerware.org

@jclulow
Copy link

jclulow commented Jun 2, 2016

I'd rather not conflate the work we're doing in the Post Mortem group, which is a relatively specific area, with any of the other things you're talking about. The meetings we have are already long enough, and I currently only receive mail about PMWG-related things.

If you are after better "discoverability", I would suggest instead making sure that all of the working groups are well-described and categorised in some central part of the main Node repository. Presumably that's where people look first when they're exploring the Node ecosystem.

@pmuellr
Copy link
Contributor

pmuellr commented Jun 2, 2016

I think I could see things like "post mortem", "tracing", "debug" as being
sub-groups of "diagnostics", but as mentioned don't want to derail the
existing post mortem work, so ... let's leave that one alone for now.

Suggest we lump all things not already in some existing group, pertaining
to diagnostics, into this newly renamed diagnostics group. We may well see
some topics like "debug" having to branch out on their own or something.
But until that happens, we can try to foster that work here ...

On Thu, Jun 2, 2016 at 5:37 PM, Joshua M. Clulow notifications@github.com
wrote:

I'd rather not conflate the work we're doing in the Post Mortem group,
which is a relatively specific area, with any of the other things you're
talking about. The meetings we have are already long enough, and I
currently only receive mail about PMWG-related things.

If you are after better "discoverability", I would suggest instead making
sure that all of the working groups are well-described and categorised
in some central part of the main Node repository. Presumably that's where
people look first when they're exploring the Node ecosystem.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#46 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABiHZOCpxz9EOe_mHuGBJ4xRG8KWay1ks5qH00vgaJpZM4IGAoq
.

Patrick Mueller
http://muellerware.org

@rvagg
Copy link
Member

rvagg commented Jun 3, 2016

Agreed, lets leave PM stuff out of this for now, it's the debugging story that's not covered and that would presumably be the primary reason to expand scope. PM can be picked up later in some form if there's a clear reason to do so, otherwise if they functioning fine on their own then let them at it.

The biggest challenge I see here is getting a broad enough spread of stakeholders involved, both consumers (tools) and producers (VMs). How do we do that?

@joshgav
Copy link
Contributor Author

joshgav commented Jun 7, 2016

@pmuellr, @AndreasMadsen, @Qard, @nodejs/CTC and other stakeholders here - does anyone disagree with expanding this group into a Diagnostics group and merging #47?

If all agree, @pmuellr can you merge that commit and add me to the WG/repo owners? @rvagg can you rename the repo and alias to nodejs/diagnostics?

Then we can pick up work on nodejs/node#7182 and other tasks; and start reviewing membership and recruiting more stakeholders.

@rvagg

getting a broad enough spread of stakeholders involved, both consumers (tools) and producers (VMs). How do we do that?

First we build it, then we recruit them to come. :)

@AndreasMadsen
Copy link
Member

I'm not sure this working group is a good place to discuss Domains and Zones. In my opinion they are more related to error handling and not tracing, monitoring or diagnostics which more about understanding what caused the error.

Besides that I have no objections.

On the PM front I would suggests that we separate ours meetings, such they don't get too long and to avoid the hangout limit problem. For example I don't think AsyncWrap and Chrome Debugging Protocol has that many stakeholders in common.

@AndreasMadsen
Copy link
Member

@rvagg besides adding @joshgav to @nodejs/tracing, you should know that @domenic, @lykkin, @matthewloring, @jeffolfert are on the README list but not in the @nodejs/tracing team.

@joshgav
Copy link
Contributor Author

joshgav commented Jun 7, 2016

@AndreasMadsen

I would suggest that we separate our meetings

I agree, would be good to designate meetings for specific topics, it's unhelpful to lump everything together. Let's think about how to most efficiently do that.

@joshgav
Copy link
Contributor Author

joshgav commented Jun 7, 2016

@AndreasMadsen so error handling (and Domains and Zones) should be managed by nodejs/node. Fine with me, I'll remove the line in #47 which mentions it. We can always add later if needed.

@AndreasMadsen
Copy link
Member

@joshgav fantastic

@Jeff-Lewis
Copy link
Contributor

tl;dr - Dear Node Foundation: Fine, move Domains and Zones again to another working group, but please don't kick its can down the road any more and put effort into providing a solution to Node's lack of execution context.

With the recent large uptick in discussions and involvement around debugging, moving Domains & Zones from the tracing group makes sense, but it does add to my fear of Node continuing to lack support for a good execution context. I was originally very exciting to see this working group formed primarily for interest in async-wrap and it's potential to provide the foundation for a future for Domains and Zones.

With all of the talk about node being ready for the Enterprise, I feel like I must be missing something. How do enterprises or SAAS web applications handle execution/security/user context across asynchronous boundaries with Node? Or don't they by just avoiding middle-tier middleware that often does not have an execution context? Domains are deprecated, StrongLoop is deprecating its `getCurrentContext()' and using cls has a very large set of technical debt that comes along with it.

One solution to providing context is to build context info into all of the applications API's. Well, then it's no longer 'context' and just additional args. This doesn't seem so bad and works well with micro-services architectures. Yet, here I feel developers are writing code to handle what other frameworks do natively and what experienced developers expect from an 'Enterprise' framework.

Is this a common evolutionary path of a Node.js developer?

I would guess that the majority of enterprise and non-enterprise developer's first (couple) node projects are NOT written for micro-services architecture and planning execution context into the user-land design but rather with Express, a Mean stack or StrongLoop, all of which fail to provide context beyond the request object. For those developers that do go down this path and needing a dependable execution context, they can be left disappointed with a load of technical dept.

It's also my assumption that many enterprise developers are coming to Node.js from other run-times such as .NET and Java where the execution context is so natural and part of the framework that it's taken for granted and assumed that Node can provide the same. (Well maybe I'm projecting too much now...)

Anyway, I see Node's lack of support for execution context as one that breaks the Principle of least astonishment. I feel it's the elephant in the room because it's deeply rooted into Node and V8 and most Node contributors don't want to touch Domains with a ten foot pole.

I know there are large technical challenges supporting context with the event loop, dependencies on V8, and new specs from ES coupled with organizational & management challenges with open source developers and vested organizations, but I hope that we continue to pursue it and finally commit to solving it. It's worth pushing for more collaboration with Google/V8 or what ever the Node Foundation feels is right. Please use their movement back to the Node group as opportunity to kick start its work. Maybe a new team around @trevnorris? Involvement from the two big corps with $ who want to see Node in the Enterprise? Whatever it is, let's please act on it and push it forward.

@rvagg
Copy link
Member

rvagg commented Jun 7, 2016

@Jeff-Lewis coming to Node from Java or .NET and expecting to find everything the way you want it to exist is the wrong way to migrate and inevitably leads to very poor programming practices via heavy reliance on accepted Node anti-patterns and usually complete failure because you end up trying to make Node applications look like their old style counterparts and doing a terrible job at it. We've seen more than enough examples of this to reinforce the recommendation that migration to Node also requires a migration of programming styles, both at the organisational level and the technical level and that this reorganisation comes with its own benefits as well (not least of which is breaking apart monolithic application building practices which Java and .NET lend themselves so well to).

"Context" is provided perfectly well via the tools we currently have available to us in Node. Common web frameworks such as Express and Hapi do a fine job of providing what's needed for doing this. Moving to continuation-passing simply means changing the way you think about context. There's nothing new that you need for this unless you're going to overengineer, in which case you're setting yourself up for failure from the begining.

Even if Zones become first-class in JavaScript, relying on them for these kinds of concerns is also going to lead to tears. It's likely that their use will best applied as an exception rather than a norm and leaning on them to do what can be achieved perfectly well using existing patterns is going to overcomplicate and give you with terrible enough performance that you'll be begging for a JVM in short order.

There are very good reasons why "most Node contributors don't want to touch Domains with a ten foot pole" and refusing to listen to any of the reasons behind this is not going to lead to a happy place. Hiding behind the pretense that "Enterprise needs this" doesn't help either, a significant number of Node contributors either work in Enterprise environments or make their living from dealing with Enterprise deployments of Node in some form. That kind of argument doesn't work as well as it might have a few years ago.

@jkrems
Copy link
Contributor

jkrems commented Jun 8, 2016

Hiding behind the pretense that "Enterprise needs this" doesn't help either, a significant number of Node contributors either work in Enterprise environments or make their living from dealing with Enterprise deployments of Node in some form. That kind of argument doesn't work as well as it might have a few years ago.

That might be a bit too dismissive. I think "Enterprise" is a meaningless phrase here because it catches a wide variety of organizations with little in common. But I'd definitely put us (Groupon) on the side of "heavily using something like context and very interested in seeing it as an official feature". Continuation passing is fine but the amount of support and trouble shooting we as a platform team had to provide to different teams working with node went down considerably after switching from continuation passing to context. We tried and for us what express offered was not enough.

I understand that it's a hard and potentially impossible problem. So I'm not saying "this should take priority over everything else". But I do believe that if this would be figured out, it would provide real value. At least to some parts of the community.

@joshgav
Copy link
Contributor Author

joshgav commented Jun 8, 2016

Hi all - this thread is to discuss and now confirm expansion to a Diagnostics WG, and I think we're all agreed on that in principle now. Could we hash out specific topics like this one in their own issues/PRs, could be in this repo/WG? Thanks!

@Jeff-Lewis
Copy link
Contributor

Jeff-Lewis commented Jun 8, 2016

@joshgav please know that my request for the Node community to address Domains/context was from the result of the 'expansion' of the Diagnostics WG which seemingly is breaking apart the existing Tracing Working group where by Domains are no longer a part of.

I've branched my questions about domains and execution context here.

@rvagg
Copy link
Member

rvagg commented Jun 8, 2016

Repo has been renamed to nodejs/diagnostics and the team name is @nodejs/diagnostics. Congrats all!

I've added the missing members to the @nodejs/diagnostics that are on the README, @lykkin and @jeffolfert were not org members so have been invited. The remaining discrepancy is that @yunong is in the team but not on the README.

@gergelyke
Copy link

would love to help out as well! could you add me?

@pmuellr
Copy link
Contributor

pmuellr commented Jun 8, 2016

Could someone with appropriate powers fix the GH repo "description" from "Tracing Working Group" to "Node.js Diagnostics Working Group"? @rvagg?

Looks like the PR from @joshgav - https://github.com/nodejs/diagnostics/pull/47/files - has the rename in the base README.md ...

Thanks, all!

@bnoordhuis
Copy link
Member

Could someone with appropriate powers fix the GH repo "description" from "Tracing Working Group" to "Node.js Diagnostics Working Group"?

Done.

@samccone
Copy link

samccone commented Jun 8, 2016

Very interested in being a part of this 😁🙋

@joshgav
Copy link
Contributor Author

joshgav commented Jun 8, 2016

@Jeff-Lewis

where by Domains are no longer a part of

Fair complaint, we didn't intend to drop anything previously handled here. For now I just removed one placeholder line from the docs, no problem continuing discussion in #57.

@joshgav
Copy link
Contributor Author

joshgav commented Jun 8, 2016

@samccone @gergelyke @thealphanerd certainly we welcome and need all the help we can get :).

Two areas you might consider:

@joshgav
Copy link
Contributor Author

joshgav commented Jun 8, 2016

@rvagg - I'm not in the current list but am added in #47, could you add me? Thanks!

@pmuellr - Are we ready to merge #47? There's certainly more to be added, but I think it will be easier to do so once it's merged and we have a structure. It doesn't remove anything, just reorganizes.

@pmuellr
Copy link
Contributor

pmuellr commented Jun 8, 2016

@joshgav LGTM, merge away!

@rvagg
Copy link
Member

rvagg commented Jun 9, 2016

added @joshgav to the team, sorry about the oversight, #47's looking good to me fwiw!

@joshgav
Copy link
Contributor Author

joshgav commented Jun 9, 2016

Thanks for the helpful discussion everyone! Now let the real work begin ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests