-
Notifications
You must be signed in to change notification settings - Fork 71
Adds tee and cat commands and a supporting DataStore service #1101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
def init(nil), | ||
do: raise "Unable to start #{__MODULE__}: Data path not configured" | ||
def init(base_path) do | ||
state = %__MODULE__{base_path: base_path} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to verify this path exists, is a directory, can be written to, etc? Is that done somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NestedFile
module will try to create it if it does not exist the first time it is attempted to be written to and return an error if it is unable to. That felt like the right thing in this case since not everyone is likely to use the commands that make use of it and I wouldn't want to keep Cog from working. It also defaults to be nested under the default data directory just like logs and audit_logs so it feels pretty safe to me.
Happy to hear arguments to the contrary if you think we should do something else.
opts["insert"] -> | ||
handle_transform(:insert, req, data, state) | ||
opts["append"] -> | ||
{:reply, req.reply_to, List.wrap(data) ++ List.wrap(req.cog_env), state} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this actually "prepend"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's appending the current pipeline output to the saved pipeline output.
Example:
@ash seed '{ "foo": "value1" }' | tee foo
[
{
"foo": "value1"
}
]
@ash seed '{ "foo": "value2" }' | cat -a foo
[
{
"foo": "value1"
},
{
"foo": "value2"
}
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh, you're right... I got a tad mixed up.
end | ||
end | ||
|
||
defp transform_data(action, [prev], curr, opts), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to name this something like transform_map_data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Done.
do: GenServer.call(__MODULE__, {:delete, namespace, key}) | ||
|
||
def init(nil), | ||
do: raise "Unable to start #{__MODULE__}: Data path not configured" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the convention for this kind of thing is to return {:stop, reason}
. That way start_link
will return the correct {:error, reason}
response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derp, yeah. Thanks.
# must be the same one that data was written into. | ||
@data_namespace [ "commands", "tee" ] | ||
|
||
@description "Save and pass through pipeline data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might have been a copy-pasta'd and not updated, unless cat
actually "saves" stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
assert File.exists?(path) | ||
|
||
assert :ok = NestedFile.delete(base_paths, key) | ||
assert !File.exists?(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but refute
is handy for this kind of thing.
response = send_message(user, "@bot: seed '{\"foo\":\"fooval6\"}' | cat -a test") | ||
assert [%{foo: "fooval5"},%{foo: "fooval6"}] = response | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels weird to add a test that is skipped. Really want we need to do is to test the commands in isolation but we don't have any test helpers setup to do that yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to pass and doesn't seem to have anything provider specific (since there are no templates, etc) so it's not clear to me why it needs to be skipped. I was just following the pattern for the rest of the command tests.
I think it's worth having even if it's skipped since it describes the expected behavior if nothing else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can run, lets run it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good; just left some minor comments.
|
||
MemoryClient.delete(root, token, memory_key) | ||
|
||
case DataStore.replace(@data_namespace, key, data) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth calling out that existing keys are replaced in the desc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the security implications here? If someone tee
's sensitive information, anybody can cat
it if they know the name it was saved under, even if they couldn't have run the commands that generated the data in the first place.
Related: how do you actually delete data once you're done with it?
@@ -12,8 +11,10 @@ defmodule Cog.Command.Service.Supervisor do | |||
token_monitor_table = :ets.new(:token_monitor_table, [:public]) | |||
memory_table = :ets.new(:memory_table, [:public]) | |||
memory_monitor_table = :ets.new(:memory_monitor_table, [:public]) | |||
data_path = Application.get_env(:cog, Cog.Command.Service, [])[:data_path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we're defining this in config/config.exs
, and that the thing won't even start if this is nil
, how about using Application.fetch_env!/2
here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it to use:
Application.fetch_env!(:cog, Cog.Command.Service)
|> Keyword.fetch!(:data_path)
Which should raise if either Cog.Command.Service
or the data_path
key under it are missing from the Cog application configuration.
opts["insert"] -> | ||
handle_transform(:insert, req, data, state) | ||
opts["append"] -> | ||
{:reply, req.reply_to, List.wrap(data) ++ List.wrap(req.cog_env), state} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh, you're right... I got a tad mixed up.
""" | ||
require Logger | ||
|
||
def fetch(base_paths, key, ext \\ "data") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to specify an extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was mostly there because I didn't know if we'd want to use NestedFile
in other places in Cog since it's fairly generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the case of simultaneous writes to the same key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last one wins. They're serialized through the DataStore service's GenServer.
|
||
Note: The key is sanitized to remove dangerous characters, but the | ||
base paths and extension are not. Make sure not to use user entered | ||
values for these unsafe arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just sanitize everything here, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't really, since "everything" includes the base path which will obviously have things like slashes in it. We could sanitize the contents of the namespace list that is passed to DataStore but it felt unnecessary since this is an internal API that's not exposed outside of Cog core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the key
the only thing that's user-specified? The wording of the comment led me to believe otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the key is the only thing that is user specified. That note is there for future-me who wants to reuse this for something else so that I don't make a mistake an let user-specified data into the namespace for instance.
foo1 = %{"foo1" => "fooval"} | ||
DataStore.replace(@namespace, "foo1", foo1) | ||
assert {:ok, ^foo1} = DataStore.fetch(@namespace, "foo1") | ||
DataStore.delete(@namespace, "foo1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this delete call (and others like it in these tests) really needed? If for some reason the assert failed, this line wouldn't be executed.
If it's important, an on_exit
callback should be used. If it's not important, we can just chuck it 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly there to keep cruft from accumulating after tests since files get written to the filesystem and we don't have transactions to roll them back. I can use an on_exit
instead.
response = send_message(user, "@bot: seed '{\"foo\":\"fooval6\"}' | cat -a test") | ||
assert [%{foo: "fooval5"},%{foo: "fooval6"}] = response | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can run, lets run it.
Yep, anything that anyone saves with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo question about behavior w/concurrent writes and/or deletes.
""" | ||
require Logger | ||
|
||
def fetch(base_paths, key, ext \\ "data") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the case of simultaneous writes to the same key?
I commented the command test back out. They pass locally if I run it in |
add support for option descriptions to embedded bundle command options
e864e91
to
b199ff7
Compare
DataStore
service based heavily on theMemory
service, but which writes files to disk.service_data
directory under the top-level Cogdata
directory. This is configurable inconfig.exs
.NestedFile
module which creates a multi-level directory hierarchy in order to reduce number of files which will be written to a directory for performance reasons.tee
command to save pipeline output using the DataStore service for later usecat
command to retrieve saved pipeline outputtee
data with the current pipeline output in the following ways:Note that this does include tests for the
cat
andtee
commands which will pass but which are currently configured with@moduletag :skip
to align with the other command tests.