Skip to content

Created sample for 'ApplyWordEmbedding' API. #3142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 1, 2019

Conversation

zeahmed
Copy link
Contributor

@zeahmed zeahmed commented Mar 29, 2019

Related to #1209.

@codecov
Copy link

codecov bot commented Mar 29, 2019

Codecov Report

Merging #3142 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3142      +/-   ##
==========================================
+ Coverage   72.52%   72.53%   +0.01%     
==========================================
  Files         808      808              
  Lines      144665   144740      +75     
  Branches    16198    16202       +4     
==========================================
+ Hits       104912   104981      +69     
- Misses      35342    35348       +6     
  Partials     4411     4411
Flag Coverage Δ
#Debug 72.53% <ø> (+0.01%) ⬆️
#production 68.12% <ø> (ø) ⬆️
#test 88.82% <ø> (+0.01%) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Transforms/Text/TextCatalog.cs 41.66% <ø> (ø) ⬆️
.../Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs 97.26% <0%> (-2.74%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...rc/Microsoft.ML.StaticPipe/SdcaStaticExtensions.cs 81.72% <0%> (-0.61%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...oft.ML.StandardTrainers/Standard/SdcaRegression.cs 95.83% <0%> (ø) ⬆️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 72.95% <0%> (+0.17%) ⬆️
...StandardTrainers/Standard/LinearModelParameters.cs 60.31% <0%> (+0.26%) ⬆️
...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs 24.46% <0%> (+0.73%) ⬆️
...StandardTrainers/Standard/StochasticTrainerBase.cs 93.02% <0%> (+4.65%) ⬆️

var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData, TransformedTextData>(textTransformer);

// Call the prediction API to convert the text into embedding vector.
var data = new TextData() { Text = "This is a greate product. I would like to buy it again." };
Copy link

@shmoradims shmoradims Mar 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

greate [](start = 58, length = 6)

typo: great #Resolved

{

file.WriteLine("This is custom file for 4 words with 3 dimensional word embedding vector. This first line in this file does not confirm to the '<word> <float> <float> <float>' pattern, and is therefore ignored");
file.WriteLine("greate" + " " + string.Join(" ", 1.0f, 2.0f, 3.0f));
Copy link

@shmoradims shmoradims Mar 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

greate [](start = 32, length = 6)

typo #Resolved

{

file.WriteLine("This is custom file for 4 words with 3 dimensional word embedding vector. This first line in this file does not confirm to the '<word> <float> <float> <float>' pattern, and is therefore ignored");
file.WriteLine("greate" + " " + string.Join(" ", 1.0f, 2.0f, 3.0f));
Copy link

@shmoradims shmoradims Mar 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"greate" + " " + string.Join(" ", 1.0f, 2.0f, 3.0f) [](start = 31, length = 51)

i think it's more readable to write this as "great 1.0 2.0 3.0"
and just add a comment above this line that the following three lines are in format.

It would also be cleaner to turn the line above to comments:
// Write a custom 3-dimensional word embedding vector with 4 words.
// The vectors follow ' ' pattern.
#Resolved

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -125,7 +125,7 @@ public static class TextCatalog
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[FeaturizeText](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/WordEmbeddingTransform.cs)]
Copy link
Member

@abgoswam abgoswam Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WordEmbeddingTransform [](start = 101, length = 22)

Should we delete this file WordEmbeddingTransform.cs which already has a sample for ApplyWordEmbedding ?

Or did we retain this file because its used for some other API sample ? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be deleted now to avoid mess.


In reply to: 270972391 [](ancestors = 270972391)


namespace Microsoft.ML.Samples.Dynamic
{
public static class ApplyCustomWordEmbedding
Copy link
Member

@abgoswam abgoswam Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplyCustomWordEmbedding [](start = 24, length = 24)

is this sample used anywhere ? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the overloaded version of ApplyWordEmbedding method in TextCatalog. It is referenced as an example there similar to other methods.


In reply to: 270986934 [](ancestors = 270986934)

// Write a custom 3-dimensional word embedding model with 4 words.
// Each line follows '<word> <float> <float> <float>' pattern.
// Lines that do not confirm to the pattern are ignored.
var pathToCustomModel = @".\custommodel.txt";
Copy link
Member

@abgoswam abgoswam Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@".\custommodel.txt"; [](start = 36, length = 21)

would user reading the documentation get access to this file ? #Resolved

Copy link
Contributor Author

@zeahmed zeahmed Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get you properly. This file is created once the sample executes. It is needed to pass on to the ApplyWordEmbedding method.


In reply to: 270987259 [](ancestors = 270987259)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@abgoswam abgoswam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zeahmed zeahmed merged commit 21b5bb4 into dotnet:master Apr 1, 2019
@zeahmed
Copy link
Contributor Author

zeahmed commented Apr 1, 2019

Thanks!

zeahmed added a commit to zeahmed/machinelearning that referenced this pull request Apr 8, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants