Updates docs build #229

Merged
merged 18 commits into from
Aug 21, 2019

Conversation

@Shazwazza (Contributor) commented Aug 12, 2019

  • Fixes build of csproj doc tools to not include the strong name signing
  • Fixes some namespaces issues in the JavaDocToMarkdownConverter
  • Re-executes the JavaDocToMarkdownConverter and commits updated files (including whitespace changes so on the next run there are no changes)
  • Fixes UI auto-expanding quirk with docfx and the docs (hamburger menu)
  • Fixes some links from the docs home page and removes the ICU to be completed note

I have uploaded the result of this to our temporary docs site https://lucenenetdocs.azurewebsites.net/

@NightOwl888 just regarding your comments here #206 (comment)

home page/package names

Can you elaborate a bit more on this:

It would probably be easier to understand if we updated the names on the home page to reflect the package names

... I've moved the Kuromoji and SmartCn headings outside of the big Analysis heading since they are separate packages and linked to them properly from the home page, is this the type of thing you mean? I also fixed the ICU link on the home page.

(there's currently still an issue with docfx when there are overlapping namespaces between packages, i haven't yet researched how to fix this but i can, the docfx team are very responsive)

build/build times

A full clean build takes about 20 mins on my machine so should be ok for the build server. On the build server you'd want to run the PowerShell script:

./websites/apidocs/docs.ps1 0 1

which is shorthand for

./websites/apidocs/docs.ps1 --ServeDocs 0 --Clean 1

The output website is in ./websites/apidocs/_site

versioning

I know that in the JavaDocToMarkdownConverter there's a TODO for passing in a tag/version which is for the method RepoLinkReplacer ... but this method is looking for links in files like overview.md with the src-html syntax, and as far as i can see there are only 2 places in all of the source that contain these types of links, both in the Lucene.Net/overview.md file which is supposed to link to some demos. For this method it would prob just be easier to fix this file, unless there are more inline links i'm unsure of.

Apart from that is there another area where we need to have the version number/tag injected in places?

update: just noticed a version link here https://lucenenetdocs.azurewebsites.net/index.html#reference-documents ... so we'd need to pass a version into the build for that one, do you know of others?

hosting

The docs are currently just hosted on my azure subscription which is also why it has that temporary dns name. I'm fine to leave it there for any length of time but we should look at getting these hosted properly, it's just static files so nothing special.

@NightOwl888
Contributor

Can you elaborate a bit more on this:

It would probably be easier to understand if we updated the names on the home page to reflect the package names

What I meant was the docs (on the home page here) don't reflect the actual names of the NuGet packages/assemblies. We should probably sync them up to make it less confusing. Here is how they should be mapped:

  • core > Lucene.Net
  • analyzers-common > Lucene.Net.Analysis.Common (note that the Thai analyzer functionality was moved to Lucene.Net.ICU)
  • analyzers-icu > Lucene.Net.ICU (Should probably order this package alphabetically instead of sticking it here)
  • analyzers-kuromoji > Lucene.Net.Analysis.Kuromoji
  • analyzers-morfologik > Does not exist, probably never will because it is obscure and has a huge dependency
  • analyzers-phonetic > Lucene.Net.Analysis.Phonetic
  • analyzers-smartcn > Lucene.Net.Analysis.SmartCn
  • analyzers-stempel > Lucene.Net.Analysis.Stempel
  • analyzers-uima > Lucene.Net.Analysis.UIMA (Ported, but also depends on a huge library. The idea was to use IKVM to convert the dependency, which would only give .NET Framework support. Low priority, might not happen.)
  • benchmark > Lucene.Net.Benchmark
  • classification > Lucene.Net.Classification
  • codecs > Lucene.Net.Codecs
  • demo > Not a NuGet package. Runnable through lucene-cli, where the source code can also be viewed/exported. End users should also be made aware they can view the source directly on GitHub. It looks like some of the docs there are relevant (not the part about setting a classpath), but it might make sense to roll the "Location of the source", "IndexFiles" and "Searching Files" into the lucene-cli docs for the demos. The facet commands for lucene-cli should have "location of the source" as well.
  • expressions > Lucene.Net.Expressions
  • facet > Lucene.Net.Facet
  • grouping > Lucene.Net.Grouping
  • highlighter > Lucene.Net.Highlighter (Note that the Postings Highlighter and the BreakIteratorBoundaryScanner of Vector Highlighter were moved to Lucene.Net.ICU.)
  • join > Lucene.Net.Join
  • memory > Lucene.Net.Memory
  • misc > Lucene.Net.Misc
  • queries > Lucene.Net.Queries
  • queryparser > Lucene.Net.QueryParser
  • replicator > Lucene.Net.Replicator
  • sandbox > Lucene.Net.Sandbox
  • spatial > Lucene.Net.Spatial
  • suggest > Lucene.Net.Suggest
  • test-framework > (I started chipping away at this, needs lots of API work, bug fixes, doc updates, and a review to make sure it is all there. See: LUCENENET-614. Eventually, the main packages users install will be Lucene.Net.TestFramework.NUnit, Lucene.Net.TestFramework.xUnit, and Lucene.Net.TestFramework.MSTest. Will probably focus on this for the next release.)

We also have lucene-cli, which is also a NuGet package, but contains a dotnet tool. It actually wraps functionality from

  1. Lucene.Net
  2. Lucene.Net.Analysis.Kuromoji
  3. Lucene.Net.Analysis.Stempel
  4. Lucene.Net.Benchmark
  5. Lucene.Net.Demo

and makes it runnable from the command line. In Java, the commands can be run directly on the packages that contain them; lucene-cli was created as a way to fill the gap in functionality between Java and .NET.

Note that the functionality that was wrapped into Lucene.Net.ICU exists in its original namespace, but in a different assembly than the original. Only the stuff that depends directly on ICU4N was moved (well, some of the classes from Postings Highlighter didn't depend on it, but they couldn't be used without the rest of the classes that had been transplanted to Lucene.Net.ICU, so all of them were).

The name Lucene.Net.ICU was chosen to be less specific than Lucene.Net.Analysis.ICU because it contains not just analysis functionality, but also functionality from highlighter.

As for your other questions, let me review and try massaging my memory muscles...

@Shazwazza
Contributor Author

@NightOwl888 ok great, i can certainly make the landing page listing reflect the actual package names, that's easy to do.

As for the interlinking namespaces between packages, i'll have to investigate the best way to deal with this. I'll try to get the site updated again this week and we can review from there.

@NightOwl888
Contributor

versioning

I know that in the JavaDocToMarkdownConverter there's a TODO for passing in a tag/version which is for the method RepoLinkReplacer ... but this method is looking for links in files like overview.md with the src-html syntax, and as far as i can see there are only 2 places in all of the source that contain these types of links, both in the Lucene.Net/overview.md file which is supposed to link to some demos. For this method it would prob just be easier to fix this file, unless there are more inline links i'm unsure of.

Apart from that is there another area where we need to have the version number/tag injected in places?

update: just noticed a version link here https://lucenenetdocs.azurewebsites.net/index.html#reference-documents ... so we'd need to pass a version into the build for that one, do you know of others?

I did a search using Notepad++'s "Find in Files" feature and here is the entire list:

Search "src-html" (7 hits in 2 files)
  \\Boggle\F\Projects\_Test\lucene-solr-4.8.1\lucene\core\src\java\overview.html (2 hits)
	Line 147: &nbsp;<a href="../demo/src-html/org/apache/lucene/demo/IndexFiles.html">IndexFiles.java</a> creates an
	Line 151: &nbsp;<a href="../demo/src-html/org/apache/lucene/demo/SearchFiles.html">SearchFiles.java</a> prompts for
  \\Boggle\F\Projects\_Test\lucene-solr-4.8.1\lucene\demo\src\java\overview.html (5 hits)
	Line 101:      <li><a href="src-html/org/apache/lucene/demo/IndexFiles.html">IndexFiles.java</a>: code to create a Lucene index.
	Line 102:      <li><a href="src-html/org/apache/lucene/demo/SearchFiles.html">SearchFiles.java</a>: code to search a Lucene index.
	Line 110: "src-html/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a> class creates
	Line 181: "src-html/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a> class is
	Line 186: "src-html/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a> class as well)

My thought on the versioning was to pretty much copy what Lucene did. You can access any version by simply changing the version number in the URL.

This should include beta versions. We don't want to remove documentation that might be relevant only to a specific version if someone is still depending on that version. Especially if there have been breaking API changes between them.

Each version of the docs should point only to its own version of the source (using the tag)

This ensures the docs for a specific version stay static and point to the right code for that version even though the code is changing and new versions of docs are being released over time. We don't want to point to the head of the repository, because by the time the reader clicks the link, the doc could be years behind the code.

Perhaps there should even be an index page/directory listing at the root that shows all of the versions that are available (at https://lucenenet.somewhere.com/). Also, it might make sense to make a copy (or redirect) of the latest version at https://lucenenet.somewhere.com/latest/ so we can have links that never need to drift in some places.

The fact that you are hosting them in a temporary location is fine, but they shouldn't be at the top level of the site, they should be in a directory with the version number on it (or at least one that is escaped in a way that works in the URL).

building

I am having issues getting this working.

  1. The first obstacle I ran into was that it prompted for the credentials for the NuGet feeds I have referenced that aren't public. I think the appropriate way to fix this is to add a NuGet.config file to the appropriate folder to temporarily override what is configured on the machine. It looks like you are trying to restore both Lucene.Net.sln and LuceneDocsPlugins.sln? I was able to work around this by disabling those feeds on my machine.
  2. After that, it seems that vswhere isn't correctly identifying the location of MSBuild on my machine. I only have VS2019 Community and VS2017 Community installed. I tried downloading and installing the VS2015 build tools as per the comments, opening a new instance of Powershell and running again, but get the same result.
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

PS C:\Users\shad> f:
PS F:\> cd projects/lucenenet
PS F:\projects\lucenenet> ./websites/apidocs/docs.ps1 0 1


    Directory: F:\projects\lucenenet\websites\apidocs


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----        8/13/2019   5:28 PM                tools
Cleaning tools...


    Directory: F:\projects\lucenenet\websites\apidocs\tools


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----        8/13/2019   5:43 PM                tmp
d-----        8/13/2019   5:43 PM                docfx
Retrieving docfx...
d-----        8/13/2019   5:44 PM                nuget
Download NuGet...
d-----        8/13/2019   5:44 PM                vswhere
Download VsWhere...
Feeds used:
  https://api.nuget.org/v3/index.json
  C:\Program Files (x86)\Microsoft SDKs\NuGetPackages\

Installing package 'vswhere' to 'F:\projects\lucenenet\websites\apidocs\tools\tmp'.
  GET https://api.nuget.org/v3/registration3-gz-semver2/vswhere/index.json
  OK https://api.nuget.org/v3/registration3-gz-semver2/vswhere/index.json 867ms


Attempting to gather dependency information for package 'vswhere.2.7.1' with respect to project 'F:\projects\lucenenet\websites\apidocs\tools\tmp', targeting 'Any,Version=v0.0'
Gathering dependency information took 25.38 ms
Attempting to resolve dependencies for package 'vswhere.2.7.1' with DependencyBehavior 'Lowest'
Resolving dependency information took 0 ms
Resolving actions to install package 'vswhere.2.7.1'
Resolved actions to install package 'vswhere.2.7.1'
Retrieving package 'vswhere 2.7.1' from 'nuget.org'.
Adding package 'vswhere.2.7.1' to folder 'F:\projects\lucenenet\websites\apidocs\tools\tmp'
Added package 'vswhere.2.7.1' to folder 'F:\projects\lucenenet\websites\apidocs\tools\tmp'
Successfully installed 'vswhere 2.7.1' to F:\projects\lucenenet\websites\apidocs\tools\tmp
Executing nuget actions took 200.65 ms
Cleaning...
MSBuild path = C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
MSBuild not found!
At F:\projects\lucenenet\websites\apidocs\docs.ps1:112 char:2
+     throw "MSBuild not found!"
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (MSBuild not found!:String) [], RuntimeException
    + FullyQualifiedErrorId : MSBuild not found!

Not sure where to go from here.

Do we need to use MSBuild? Can this be done using dotnet.exe?

code samples

I had a thought about how we might automate the code samples more easily and reliably than a code converter, since the lack of using blocks will make what a code converter gives us irrelevant anyway. If the Java code sample block can be isolated as a single block of text, we could generate a hash for it and put that hash into a text (markdown?) file along with the sample, for example:

<hash>C34406A7F4070BC61B9256F6239E2B251CE691F83C2F4A6DD1ADC846FC9847A2</hash>
<code language="java">
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead:
    //Directory directory = FSDirectory.open("/tmp/testindex");
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc);
    iwriter.close();
    
    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
    Query query = parser.parse("text");
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    ireader.close();
    directory.close();
</code>

Then these text files can be manually converted to c# and vb (the latter using the roslyn online code converter) and can be committed to the lucenenet repo. A converted file may look something like:

<hash>C34406A7F4070BC61B9256F6239E2B251CE691F83C2F4A6DD1ADC846FC9847A2</hash>
<code language="c#">
    Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_CURRENT);

    // Store the index in memory:
    using (Directory directory = new RAMDirectory())
    // To store an index on disk, use this instead:
    //using (Directory directory = FSDirectory.Open("/tmp/testindex"))
    {
        IndexWriterConfig config = new IndexWriterConfig(LuceneVersion.LUCENE_CURRENT, analyzer);
        using (IndexWriter iwriter = new IndexWriter(directory, config))
        {
            Document doc = new Document();
            string text = "This is the text to be indexed.";
            doc.Add(new Field("fieldname", text, TextField.TYPE_STORED));
            iwriter.AddDocument(doc);
        }

        // Now search the index:
        using (DirectoryReader ireader = DirectoryReader.Open(directory))
        {
            IndexSearcher isearcher = new IndexSearcher(ireader);

            // Parse a simple query that searches for "text":
            QueryParser parser = new QueryParser(LuceneVersion.LUCENE_CURRENT, "fieldname", analyzer);
            Query query = parser.Parse("text");
            ScoreDoc[] hits = isearcher.Search(query, null, 1000).ScoreDocs;
            Assert.AreEqual(1, hits.Length);
            // Iterate through the results:
            for (int i = 0; i < hits.Length; i++)
            {
                Document hitDoc = isearcher.Doc(hits[i].Doc);
                Assert.AreEqual("This is the text to be indexed.", hitDoc.Get("fieldname"));
            }
        }
    }
</code>
<code language="vb">
    Dim analyzer As Analyzer = New StandardAnalyzer(LuceneVersion.LUCENE_CURRENT)

    Using directory As Directory = New RAMDirectory()
        Dim config As IndexWriterConfig = New IndexWriterConfig(LuceneVersion.LUCENE_CURRENT, analyzer)

        Using iwriter As IndexWriter = New IndexWriter(directory, config)
            Dim doc As Document = New Document()
            Dim text As String = "This is the text to be indexed."
            doc.Add(New Field("fieldname", text, TextField.TYPE_STORED))
            iwriter.AddDocument(doc)
        End Using

        Using ireader As DirectoryReader = DirectoryReader.Open(directory)
            Dim isearcher As IndexSearcher = New IndexSearcher(ireader)
            Dim parser As QueryParser = New QueryParser(LuceneVersion.LUCENE_CURRENT, "fieldname", analyzer)
            Dim query As Query = parser.Parse("text")
            Dim hits As ScoreDoc() = isearcher.Search(query, Nothing, 1000).ScoreDocs
            Assert.AreEqual(1, hits.Length)

            For i As Integer = 0 To hits.Length - 1
                Dim hitDoc As Document = isearcher.Doc(hits(i).Doc)
                Assert.AreEqual("This is the text to be indexed.", hitDoc.[Get]("fieldname"))
            Next
        End Using
    End Using
</code>

During doc generation, the hash can be re-generated based off of the Java code and checked against this file, and if it has not changed, no change will be made and the code in the converted text file will be used in the documentation. If it has changed, then the java code block can be inserted/appended to the text file, the hash updated to the new value, and a warning/log generated so the code changes can be manually propagated to the c# and vb code blocks, then we can manually remove the java code block.
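The hash check described above could be sketched like this (hypothetical code; the SampleHasher name is an illustrative assumption, not part of the existing converter):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch of the hash check described above. SampleHasher is an
// illustrative name, not an existing tool in the docs build.
public static class SampleHasher
{
    // Hash the raw Java sample text so changes can be detected between builds.
    public static string ComputeHash(string javaSample)
    {
        using (var sha = SHA256.Create())
        {
            byte[] bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(javaSample));
            return BitConverter.ToString(bytes).Replace("-", string.Empty);
        }
    }

    // True if the stored <hash> value still matches the current Java source,
    // meaning the committed C#/VB translations are still valid.
    public static bool IsTranslationCurrent(string javaSample, string storedHash)
        => string.Equals(ComputeHash(javaSample), storedHash, StringComparison.OrdinalIgnoreCase);
}
```

A mismatch is what would trigger the warning/log and the re-append of the Java block described here.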

We will need to do 2 passes to generate the docs if the original Java code changes, but since that will only happen if we upgrade to target a new version of Lucene it won't be a common case. We should have the build process write warning messages to stdout and also to a log that gets uploaded as a build artifact just to ensure we don't miss this during deployment. We could do the second stage offline:

  1. Download the build artifacts from the automated generation.
  2. If we have any code changes, update the files manually and commit those files to the lucenenet official repo.
  3. Regenerate the docs locally based on the changes.
  4. Deploy.

And of course, if there were no code changes, then we can just skip 2 & 3 and deploy.

We don't necessarily have to use <hash> and <code> elements, I am just giving that as an example. Whatever is easiest to integrate into the doc generator would be fine.

Of course, it would be best if the end user had some way to switch between the VB and C# code sample in the generated documents, but for now we should focus on C# if VB is going to be too difficult or time consuming to deal with.

The original doc could then have some specially constructed token that the doc generator knows how to use to grab the code sample from the text file and insert it into the right place in the generated HTML. Some more thought will need to be put into the exact layout and number of text files in relation to the number of code samples per generated document, but I am sure you can work that out. Maybe using a GUID as both a filename and a placeholder in the documentation is the appropriate way to go, but we don't want to use the hash because that may change over time.

@Shazwazza
Contributor Author

For the building we can't use dotnet because the plugin system for docfx is netframework only. I've tried updating the project to use PackageReference but it still fails and i guess netframework projects are just not supported by dotnet.

But the problem is that I can see you are using an older revision. I fixed a bunch of the build stuff a couple of days ago. Your log output currently has

MSBuild path = ....

which is no longer logged. If you update to latest it hopefully should work.

@NightOwl888
Contributor

Yes, you are right I was using an older version. I had just realized that when I got your reply. I will try the latest from this branch and see if that helps.

@NightOwl888
Contributor

I just submitted a PR to your branch. There were some issues with some of the arguments showing up in the lucene-cli markdown docs as well as some that were missing altogether.

I also added a job to azure-pipelines.yml to build the docs in the PR. They are disabled by default, until the variable GenerateDocs is defined and set to true.

The docs seem to build fine locally, but when I pushed to Azure DevOps there was a problem:

Build failed.
[19-08-15 03:47:36.933]Error:Referenced TOC file D:\a\1\s\websites\apidocs\obj\docfx\api\Lucene.Net\toc.yml does not exist.
	0 Warning(s)
	1 Error(s)

You can view the full logs at: https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=154

Hopefully you can work out what is going on - perhaps a missing dependency?

Also, I noticed that the markdown for the benchmark index.md has a strange blurb at the top:

---
uid: Lucene.Net.Cli.Benchmark
summary: *content
---

Is this something that is required by the doc generator, or an errant commit?

@Shazwazza
Contributor Author

Nice, just got that merged.

Regarding the strange blurb: this is a common format for metadata in markdown which uses YAML in its header; i think the term is "YAML front matter". Each document can contain a uid which is how the xref links work. For example, the benchmark doc is linked to from the docs home page using the format [benchmark](xref:Lucene.Net.Cli.Benchmark): System for benchmarking Lucene. The summary: *content metadata is a special value for docfx to support 'overrides', which is used when clicking the "Improve this doc" button (but in this example overrides aren't a thing since that is only for api doc overrides, so the summary: *content could be removed from this md file). The reason the other CLI md docs might not have this metadata is just that we aren't currently linking to them, so they don't really need a uid. The docs converter automatically adds uid metadata to all generated .md files based on the lucene overview html files so that we can xref them.

For the build issues on the server, i think docfx fails right away, the first logs when docfx runs is:

Building metadata...
[19-08-15 03:47:25.597]Info:[ExtractMetadata]Environment variable VSINSTALLDIR is set to C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools, it is used as the inner compiler.
[19-08-15 03:47:26.114]Info:[ExtractMetadata]Loading projects...
[19-08-15 03:47:28.354]Warning:[ExtractMetadata](D:/a/1/s/src/Lucene.Net/Lucene.Net.csproj)Workspace failed with: [Failure] Msbuild failed when processing the file 'D:\a\1\s\src\Lucene.Net\Lucene.Net.csproj' with message: The SDK 'Microsoft.NET.Sdk' specified could not be found.  D:\a\1\s\src\Lucene.Net\Lucene.Net.csproj

AFAIK this is because the version of docfx being used requires VS 2017 build tools installed.... BUT I've just run some tests locally and now for whatever reason it works by passing in the 2019 msbuild version to the env variable.

I've also realized that docfx builds the sln in the background anyways so the build script doesn't actually need to build the lucene sln at all which will save some time.

I've pushed changes for that and will check on the build.

@Shazwazza
Contributor Author

Looks like the build server doesn't execute with changes based on PRs so you might have to give it a nudge and see how it goes.

@NightOwl888
Contributor

Looks like the build server doesn't execute with changes based on PRs so you might have to give it a nudge and see how it goes.

That would be because I haven't investigated setting up the PR options yet.

I pulled down your changes and pushed them up to Azure DevOps. This time it failed much quicker: https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=155

BTW - You could potentially set up your own Azure DevOps account, add a Lucene.NET project to it pointing to your GitHub fork, and use the azure-pipelines.yml to run the build. This will allow you to check whether it works and iterate more quickly. On the last step of the "Create build pipeline" workflow, you see the "run" button but it doesn't give you any options to enter variables. After you click it, you can cancel the first build, then enter variables. They have hidden them pretty well, though.

  1. Navigate to <your organization> > <your project> > Pipelines > Build
  2. Select the pipeline on the left, then click the "Edit" button on the right
  3. If you set up your project as "public", you will see a "Variables" button at the top right of the page. If it is private, you need to click the ellipsis in the corner and choose "Triggers", then once that page opens there will be a "Variables" tab.

You will probably want to set:

  1. GenerateDocs = true
  2. RunTests = false

We will probably end up changing the exact location the doc generation happens in the pipeline, but for now this will get you started so you can work out how to get it running.

Once it runs all the way through, the files will be zipped and added as a docs build artifact on the Summary tab of the build, which you can then download and verify.

@Shazwazza
Contributor Author

Cool, i have some investigation to do. I 'think' it's because the VS 2017 build tools are required to be installed with this docfx version. The latest docfx version apparently doesn't have this requirement, but i ran into some trouble getting it to work on the latest version, so first I need to figure out that issue and then go from there.

I'll see what i can do on monday - but i'm on holidays from next wed until beginning of sept so just a heads up on that :)

@NightOwl888
Contributor

No problem. Let's try to get this PR merged before then.

@Shazwazza
Contributor Author

@NightOwl888 I've commented on a currently logged issue on DocFx here dotnet/docfx#4869

Any chance you can configure the build server to run VS 2017 for this particular build step and see if that works?

@NightOwl888
Contributor

NightOwl888 commented Aug 19, 2019

Actually, it is pretty easy. Just change the azure-pipelines.yml file which looks like:

  - job: Docs
    pool:
      vmImage: 'windows-2019'

to:

  - job: Docs
    pool:
      vmImage: 'vs2017-win2016'

@NightOwl888
Contributor

Oops. Submitted that by accident before it was done. See my edit above.

@NightOwl888
Contributor

While going through and cleaning up the code in the test framework, I was reminded of an important piece of C# syntax that is not demonstrated on the home page (https://lucenenet.apache.org) but probably should be.

// Add to the index

var source = new
{
    Name = "Kermit the Frog",
    FavouritePhrase = "The quick brown fox jumps over the lazy dog"
};
var doc = new Document();
// StringField indexes but doesn't tokenise
doc.Add(new StringField("name", source.Name, Field.Store.YES));

doc.Add(new TextField("favouritePhrase", source.FavouritePhrase, Field.Store.YES));

writer.AddDocument(doc);
writer.Flush(triggerMerge: false, applyAllDeletes: false);

can be simplified to:

// Add to the index

var source = new
{
    Name = "Kermit the Frog",
    FavoritePhrase = "The quick brown fox jumps over the lazy dog"
};
Document doc = new Document
{
    // StringField indexes but doesn't tokenize
    new StringField("name", source.Name, Field.Store.YES),
    new TextField("favoritePhrase", source.FavoritePhrase, Field.Store.YES)
};

writer.AddDocument(doc);
writer.Flush(triggerMerge: false, applyAllDeletes: false);

Not urgent, but could we make sure this is updated eventually?

VS2019 is smart enough to give you a hint to do this, but it would be great if the main samples showed how much we have bent the Java-ness toward .NET.

Note to self: Now that I think about it, there is probably a way to add an extension method to IndexWriter so the whole document initialization and addition can be done in a single operation. Food for thought...
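A minimal sketch of that idea, using stand-in types since the real IndexWriter and Document live in Lucene.NET (this is not an existing API, just the shape such an extension method could take):

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for the Lucene.NET types so the sketch is self-contained; the real
// ones are Lucene.Net.Documents.Document and Lucene.Net.Index.IndexWriter.
public class Document : List<string> { }

public class IndexWriterStub
{
    public readonly List<Document> Added = new List<Document>();
    public void AddDocument(Document doc) => Added.Add(doc);
}

public static class IndexWriterExtensions
{
    // Build the document and add it to the writer in a single operation.
    public static Document AddDocument(this IndexWriterStub writer, params string[] fields)
    {
        var doc = new Document();
        doc.AddRange(fields);
        writer.AddDocument(doc);
        return doc;
    }
}
```

Against the real types the parameter would presumably be params IIndexableField[], so the whole collection-initializer sample could collapse into a single writer.AddDocument(new StringField(...), new TextField(...)) call.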

Side note: could we normalize it to US English?

  • favorite instead of favourite
  • tokenize instead of tokenise

Or, just change the samples so they are culture neutral instead, then there doesn't need to be a debate about it.

@Shazwazza
Contributor Author

🎉 that worked on my personal azure pipelines build. Have pushed a change to the yaml file. Just FYI, you will notice a boat load of warnings during the build (approx 665) and that's because some of the cross links in the docs and in the other md files have some issues. I'll have to go through and fix them up, prob on a case by case basis with our converter tool. In some cases it's probably just namespace imports. In many other cases it's because the docs are cross linking types that are in the .NET types which we don't link to, so i think we can just ignore those but i'll see if i can suppress the warnings for those at some stage.

  • I'll look at updating the main page with the package links now.
  • I can update the home page example, that's just in md.

... will push an update for this soon.

I won't have time before my hols to do anything with the version number stuff but can look into that when i'm back (2nd week of sept).

Re: the code samples idea: DocFx has a nifty way to replace or extend parts of docs with external files. I sort of mentioned this above with the *content syntax. I think we might be able to leverage that for the code samples stuff too but i'll need to look into that when I'm back.

@NightOwl888
Contributor

NightOwl888 commented Aug 20, 2019

Alright. Just a heads up I have made lots of changes to the docs in test framework (but nowhere else). So be sure to skip that module. I am trying to get to a point where I can merge it so we can sync up.

Some of the issues with the warnings are due to the fact some types don't exist in .NET Standard 1.6.

I found a workaround that we might be able to use for ConcurrentMergeScheduler. In the test framework, some of the docs refer to NUnit-specific types, such as InconclusiveException, so I did this:

// LUCENENET NOTE: These are primarily here because they are referred to
// in the XML documentation. Be sure to add a new option if a new test framework
// is being supported.
#if TESTFRAMEWORK_MSTEST
// using AssumptionViolatedException = <the equivalent MSTest type>;
#elif TESTFRAMEWORK_XUNIT
// using AssumptionViolatedException = <the equivalent xUnit type>;
#else // TESTFRAMEWORK_NUNIT
using AssumptionViolatedException = NUnit.Framework.InconclusiveException;
#endif

Unfortunately, that will only work for classes. Some of the warnings are due to methods that don't exist in .NET Standard 1.6, and I am not sure what to do in that case. It may be that duplicating the entire documentation that refers to those methods is the only choice.

@Shazwazza
Contributor Author

@NightOwl888 I've pushed a bunch of changes:

  • Updates to the docs converter to deal with more cases (still plenty more to do but ... baby steps)
  • Pushes changes made from the docs converter
    • some dealing with the Demo stuff - but there's probably quite a bit to change in that doc, which will need to be part of the docs converter in some way, or else done via a docfx 'override'
    • removes unnecessary surrounding sections
    • fixes up named anchors (in some cases, not all yet)
  • I've changed the docs home page to list the packages; I've categorized this by "Packages" and "Tools", where Demo + CLI are under Tools. I'm unsure about the UIMA package since I don't know what project it belongs to, so it's not actually part of the docs build right now, which means it doesn't currently link anywhere
  • Have updated the website home page example code and sent a PR for that change to the site repo; we should automate this someday too. See: Simplifies home page example code lucenenet-site#4
  • I've pushed up these changes to the docs site https://lucenenetdocs.azurewebsites.net/

@NightOwl888
Contributor

Looking good.

I've changed the docs home page to list the packages; I've categorized this by "Packages" and "Tools", where Demo + CLI are under Tools. I'm unsure about the UIMA package since I don't know what project it belongs to, so it's not actually part of the docs build right now, which means it doesn't currently link anywhere

Packages is sort of a Java-ism. I think it would make more sense to call them libraries, or the more generic "modules". Libraries is probably the most specific and indicates they are not executable.

@Shazwazza
Contributor Author

Sounds good, will update and push.

@NightOwl888
Contributor

NightOwl888 commented Aug 20, 2019

Sorry, once again my message was posted before I was done with it. What happened to ENTER always meaning CRLF instead of submit?

I've changed the docs home page to list the packages; I've categorized this by "Packages" and "Tools", where Demo + CLI are under Tools. I'm unsure about the UIMA package since I don't know what project it belongs to, so it's not actually part of the docs build right now, which means it doesn't currently link anywhere

UIMA - probably not a thing. If it is possible, just leave it there, but make it generate HTML comments so it is not visible.

Demo - I thought of it, but forgot to mention that these are all included in Lucene-CLI. What I'd like to do is pull some of the documentation from that module into the CLI docs (maybe I could just do that manually). I was also thinking that it would be simpler if, instead of linking to the code in the repo, we just grab it and put it directly into the docs (MSDN style). Each file is a standalone console app and would just need to be pasted inside the docs.

Come to think of it, I already put in some special tokens so only the code sample is grabbed and none of the surrounding junk that isn't important for the sample. They are used by the CLI tool itself to display the code on screen.

So, basically instead of both Demo and CLI, we should just have CLI. But no problem keeping the Tools category in case it grows.

@NightOwl888
Contributor

Come to think of it, I already put in some special tokens so only the code sample is grabbed and none of the surrounding junk that isn't important for the sample. They are used by the CLI tool itself to display the code on screen.

Scratch that, it looks like the files don't have any tokens. The entire contents can be grabbed and inserted, except for maybe the license header (I think the CLI supports exclusion tokens, but I ended up not using them). Is there some kind of placeholder we need in the doc files so you can grab the latest code from the demo (looks like the demos could use some updates to the cleaner APIs)? The demos map one-to-one to each of the files in Lucene.Net.Demo. Would those "blurbs" happen to look exactly like the one in the benchmark docs, and if different, could you go through and link them? Then I can go through the docs and pull out what is needed and arrange the text around the code as appropriate.

@Shazwazza
Contributor Author

I've pushed another update: hides UIMA and removes the API Docs header so we just have Libraries and Tools.

Since Benchmark isn't a lib and it's part of the CLI, I was going to move that under Tools too?

I'm trying to get my head around exactly what you want, but TBH I'm not quite sure :)

I understand we have the CLI demo stuff here, for example: https://lucenenetdocs.azurewebsites.net/cli/demo/simple-facets.html
and the demos more or less map 1:1 with the C# demo files: https://lucenenetdocs.azurewebsites.net/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html and I get that the C# demo code is actually in source code, not in displayable docs files.

Based on your comments above I'm just unsure what the end result is that you are after?

I don't have much more time today, and then I'm overseas for 3 weeks, but I'll see what I can do in the next hour or two.

@NightOwl888
Contributor

NightOwl888 commented Aug 20, 2019

Sorry if I am a bit scatterbrained :).

Since Benchmark isn't a lib and it's part of the CLI, I was going to move that under Tools too?

Actually, it is both a library and a tool. The library is available if someone wants to extend it to do customized benchmarks. So, looks good as-is.

I understand we have the CLI Demo stuff here for example: https://lucenenetdocs.azurewebsites.net/cli/demo/simple-facets.html
and the demo's more or less map 1:1 with the c# demo files: https://lucenenetdocs.azurewebsites.net/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html and I get that the c# demo code is actually in source code not in displayable docs files.
Based on your comments above I'm just unsure what the end result is that you are after?

Basically, I am hoping to get some of the documentation from the original JavaDocs into markdown (which I can do manually), and get the code into the same markdown files (which I am hoping you can achieve). So, the end user will basically have 4 ways to get the demo code:

  1. View it on screen with the CLI
  2. Export it using the CLI
  3. View it on the markdown doc (1:1 - I am hoping to have you automate getting the code from the .cs files in the repo into the markdown so they stay up to date)
  4. They can view the code directly on GitHub

The on-screen and export from the CLI embed the code as resources, so it is always the same version as the lucene-cli tool. I am hoping to do something similar so the code samples in the docs stay up to date as well.

Once that is achieved, we really have no need for the original Demo documentation and can remove the link/generation for it.

Is that clear?

@Shazwazza
Contributor Author

So, the end user will basically have 4 ways to get the demo code:

View it on screen with the CLI
Export it using the CLI
View it on the markdown doc (1:1 - I am hoping to have you automate getting the code from the .cs files in the repo into the markdown so they stay up to date)
They can view the code directly on GitHub

Gotcha. So what I'll need to do is: take all of the *.cs files in the Lucene.Net.Demo project and automatically output the code into .md files that we can use to directly display this code in our docs. For example, on this page: https://lucenenetdocs.azurewebsites.net/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html ... we'd want to have the actual C# code shown there. The user could then still click on "View Source" to navigate to the source file on GitHub.

Correct?

If so, we can definitely do that by using the DocFx "overwrite" feature.
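For reference, a rough sketch of what such a DocFx overwrite file might look like, assuming the standard DocFx overwrite conventions (a YAML header whose `uid` matches the generated API page, and the `[!code-csharp[...]]` snippet-include syntax; the relative source path here is a hypothetical example):

```md
---
uid: Lucene.Net.Demo.Facet.SimpleFacetsExample
remarks: *content
---
The complete example source, pulled directly from the repository:

[!code-csharp[SimpleFacetsExample](../../src/Lucene.Net.Demo/Facet/SimpleFacetsExample.cs)]
```

With this file registered in the `overwrite` section of docfx.json, the generated SimpleFacetsExample page would render the embedded code in its remarks, and the snippet include keeps it in sync with the .cs file on each build.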

@NightOwl888
Contributor

So, the end user will basically have 4 ways to get the demo code:
View it on screen with the CLI
Export it using the CLI
View it on the markdown doc (1:1 - I am hoping to have you automate getting the code from the .cs files in the repo into the markdown so they stay up to date)
They can view the code directly on GitHub

Gotcha. So what I'll need to do is: take all of the *.cs files in the Lucene.Net.Demo project and automatically output the code into .md files that we can use to directly display this code in our docs. For example, on this page: https://lucenenetdocs.azurewebsites.net/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html ... we'd want to have the actual C# code shown there. The user could then still click on "View Source" to navigate to the source file on GitHub.
Correct?

Yes, that is correct.

@NightOwl888
Contributor

You don't have to do this today, but one thing that I think is crucial for the .NET ecosystem is to put the books that have been published about Lucene and Lucene.NET on the home page. A few people have asked for a "user manual" of sorts on the user and dev mailing lists, and I think that is about as close as our team can get to answering that request. There is even a "Lucene 4 cookbook" that I think should go front and center. Not sure if they have updated these books since then.

https://www.amazon.com/s?k=lucene&ref=nb_sb_noss_1
