Created sample for 'TokenizeIntoCharactersAsKeys' API. #3123

zeahmed · 2019-03-27T23:03:38Z

Related to #1209.

Ivanidzo4ka · 2019-03-28T00:18:58Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/TokenizeIntoCharacters.cs

+            //  Expected output:
+            //   Number of tokens: 112
+            //   Character Tokens: M,L,.,N,E,T,',s,<?>,T,o,k,e,n,i,z,e,I,n,t,o,C,h,a,r,a,c,t,e,r,s,A,s,K,e,y,s,<?>,A,P,I,<?>,
+            //                     s,p,l,i,t,s,<?>,t,e,x,t,/,s,t,r,i,n,g,<?>,i,n,t,o,<?>,c,h,a,r,a,c,t,e,r,s,.



Do we really represent a space as <?>? #Resolved

It's a unit separator special character.

machinelearning/src/Microsoft.ML.Transforms/Text/TokenizingByCharacters.cs

Line 90 in 3663320

private const ushort UnitSeparator = 0x1f;

#Resolved

Sorry! This is the control character used instead of spaces. Please disregard my previous comments.

machinelearning/src/Microsoft.ML.Transforms/Text/TokenizingByCharacters.cs

Line 275 in 3663320

bldr.Append((char)(c + '\u2400'));

#Resolved
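For readers wondering where the <?> tokens come from, here is a minimal sketch of the character shift quoted from line 275. It assumes, without confirmation from this thread, that the shift applies to every character from U+0000 to U+0020 (control characters plus space) and that the result is wrapped in angle brackets, as the expected output suggests. The class ControlPictureSketch and its ToDisplayString method are hypothetical names, not part of ML.NET.

```csharp
using System.Text;

// Hypothetical sketch, not the ML.NET implementation: shows how the
// (char)(c + '\u2400') shift turns an invisible character into a printable
// symbol from the Unicode "Control Pictures" block (U+2400..U+2420).
public static class ControlPictureSketch
{
    // Value quoted from TokenizingByCharacters.cs (line 90 in 3663320).
    public const ushort UnitSeparator = 0x1f;

    // Assumptions: the shift applies to U+0000..U+0020, and the shifted
    // symbol is wrapped in '<' and '>' as the sample's expected output suggests.
    public static string ToDisplayString(char c)
    {
        var bldr = new StringBuilder();
        if (c <= 0x20)
        {
            bldr.Append('<');
            // ' ' (0x20) -> U+2420 SYMBOL FOR SPACE; 0x1F -> U+241F SYMBOL FOR UNIT SEPARATOR.
            bldr.Append((char)(c + '\u2400'));
            bldr.Append('>');
        }
        else
        {
            // Printable characters are kept as-is.
            bldr.Append(c);
        }
        return bldr.ToString();
    }
}
```

For example, ToDisplayString(' ') returns "<\u2420>" (<␠>) and ToDisplayString((char)ControlPictureSketch.UnitSeparator) returns "<\u241F>" (<␟>). A console whose output encoding cannot represent these symbols (for example, a legacy code page) typically substitutes '?', which is one plausible explanation for the <?> tokens in the expected output.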

It would be nice to leave a comment about that.


shmoradims · 2019-03-28T21:33:31Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/TokenizeIntoCharacters.cs

+
+namespace Microsoft.ML.Samples.Dynamic
+{
+    public static class TokenizeIntoCharacters


TokenizeIntoCharacters

Let's name the file and class the same as the API. Please make sure to update the name in the XML reference when you rename it. #Resolved

shmoradims · 2019-03-28T21:39:58Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/TokenizeIntoCharacters.cs

+            // Create an empty data sample list. The 'TokenizeIntoCharactersAsKeys' does not require training data as
+            // the estimator ('TokenizingByCharactersEstimator') created by 'TokenizeIntoCharactersAsKeys' API is not a trainable estimator.
+            // The empty list is only needed to pass input schema to the pipeline.
+            var samples = new List<TextData>();


samples

Let's call this emptySamples to match the comments above. #Resolved

shmoradims · 2019-03-28T21:40:09Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/TokenizeIntoCharacters.cs

+            var samples = new List<TextData>();
+
+            // Convert sample list to an empty IDataView.
+            var dataview = mlContext.Data.LoadFromEnumerable(samples);


dataview

Also rename this to emptyDataview. #Resolved


codecov · 2019-03-28T21:59:36Z

Codecov Report

Merging #3123 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3123      +/-   ##
==========================================
- Coverage   72.52%   72.51%   -0.01%     
==========================================
  Files         808      808              
  Lines      144665   144665              
  Branches    16198    16198              
==========================================
- Hits       104913   104903      -10     
- Misses      35342    35349       +7     
- Partials     4410     4413       +3

Flag	Coverage Δ
#Debug	`72.51% <ø> (-0.01%)`	⬇️
#production	`68.11% <ø> (-0.01%)`	⬇️
#test	`88.81% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
src/Microsoft.ML.Transforms/Text/TextCatalog.cs	`41.66% <ø> (ø)`	⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs	`79.48% <0%> (-20.52%)`	⬇️
...StandardTrainers/Standard/LinearModelParameters.cs	`60.05% <0%> (-0.27%)`	⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs	`86.1% <0%> (-0.16%)`	⬇️
...oft.ML.StandardTrainers/StandardTrainersCatalog.cs	`89.07% <0%> (ø)`	⬆️

zeahmed · 2019-03-28T23:06:57Z

Thanks!

zeahmed added 2 commits March 27, 2019 15:59

Created sample for 'TokenizeIntoCharactersAsKeys' API.

19b0d1b

Updated the catalog.

f4166a7

zeahmed requested review from shmoradims, sfilipi and rogancarr March 27, 2019 23:04

sfilipi mentioned this pull request Mar 27, 2019

API reference - Samples for Transforms #1209

Closed

Ivanidzo4ka reviewed Mar 28, 2019


Addressed reviewers' comments.

428d833

shmoradims reviewed Mar 28, 2019


shmoradims approved these changes Mar 28, 2019


Ivanidzo4ka approved these changes Mar 28, 2019


Addressed reviewers' comments.

04f92d5

zeahmed merged commit ee5fbe0 into dotnet:master Mar 28, 2019

zeahmed added a commit to zeahmed/machinelearning that referenced this pull request Apr 8, 2019

Created sample for 'TokenizeIntoCharactersAsKeys' API. (dotnet#3123)

e4edfdc

zeahmed mentioned this pull request Apr 8, 2019

Cherry pick for samples (Text) #3240

Closed

ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
