
Investigate and reconcile differences in how different languages encode URLs and the C SDK #686

Closed

Description

Currently, we URL encode (using percent encoding) every character except the unreserved characters defined in RFC 3986, which are left un-encoded (https://tools.ietf.org/html/rfc3986#section-2.3):

AZ_NODISCARD AZ_INLINE bool _az_span_url_should_encode(uint8_t c)
{
  switch (c)
  {
    case '-':
    case '_':
    case '.':
    case '~':
      return false;
    default:
      return !(('0' <= c && c <= '9') || ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'));
  }
}
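
To make the current behavior concrete, here is a minimal sketch of an encoding loop built around that predicate. The names `should_encode` and `url_encode` are hypothetical stand-ins for illustration, not SDK APIs, and the sketch works on NUL-terminated strings rather than `az_span`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same rule as _az_span_url_should_encode: only RFC 3986 unreserved
   characters are left as-is. */
bool should_encode(uint8_t c)
{
  switch (c)
  {
    case '-':
    case '_':
    case '.':
    case '~':
      return false;
    default:
      return !(('0' <= c && c <= '9') || ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'));
  }
}

/* Hypothetical helper (not an SDK API): percent-encodes the NUL-terminated
   string src into dst. Returns the encoded length (excluding the NUL),
   or -1 if dst is too small. */
int url_encode(char const* src, char* dst, size_t dst_size)
{
  static char const hex[] = "0123456789ABCDEF";
  size_t out = 0;

  assert(src != NULL && dst != NULL);
  if (dst_size == 0)
  {
    return -1;
  }

  for (; *src != '\0'; ++src)
  {
    uint8_t const c = (uint8_t)*src;
    size_t const needed = should_encode(c) ? 3 : 1;

    if (out + needed >= dst_size) /* keep room for the terminating NUL */
    {
      return -1;
    }

    if (needed == 3)
    {
      dst[out++] = '%';
      dst[out++] = hex[c >> 4];
      dst[out++] = hex[c & 0x0F];
    }
    else
    {
      dst[out++] = (char)c;
    }
  }

  dst[out] = '\0';
  return (int)out;
}
```

Under this rule, the UTF-8 bytes of the sample string "Hellö Wörld@Some lang$!*()+;" used in the comparisons in this issue encode to Hell%C3%B6%20W%C3%B6rld%40Some%20lang%24%21%2A%28%29%2B%3B.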

Other languages behave differently. We should either have specific reasons for the discrepancy, or mimic what other languages do once we have figured out which precedent to follow (there isn't clear consensus).

Things to consider:

C#: https://dotnetfiddle.net/TH7z7h

using System;
using System.Text.Encodings.Web;
					
public class Program
{
	public static void Main()
	{
		string query = "Hellö Wörld@Some lang$!*()+;";
		string encodedQuery = UrlEncoder.Default.Encode(query);
		// Hell%C3%B6%20W%C3%B6rld@Some%20lang$!*()%2B;
		Console.WriteLine(encodedQuery);
		
		// Hell%C3%B6%20W%C3%B6rld%40Some%20lang%24%21%2A%28%29%2B%3B
		Console.WriteLine(Uri.EscapeDataString(query));
	}
}

https://source.dot.net/#System.Text.Encodings.Web/System/Text/Encodings/Web/UrlEncoder.cs,123
https://docs.microsoft.com/en-us/dotnet/api/system.uri.escapedatastring?view=netcore-3.1
https://docs.microsoft.com/en-us/dotnet/api/system.net.webutility.urlencode?view=netcore-3.1#remarks
Azure/azure-sdk-for-net#11657

Golang: https://play.golang.org/p/BMrXEtoFR_K

package main

import (
	"fmt"
	"net/url"
)

func main() {
	query := "Hellö Wörld@Some lang$!*()+;"
	// Hell%C3%B6+W%C3%B6rld%40Some+lang%24%21%2A%28%29%2B%3B
	fmt.Println(url.QueryEscape(query))
}

Java: https://repl.it/repls/VapidMealyKeygens

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.io.UnsupportedEncodingException;

class Main {

    // Encodes a string value using the UTF-8 encoding scheme.
    private static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.toString());
        } catch (UnsupportedEncodingException ex) {
            // Wrap the exception itself; ex.getCause() may be null.
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        String query = "Hellö Wörld@Some lang$!*()+;";
        String encodedQuery = encodeValue(query);
        // Hell%C3%B6+W%C3%B6rld%40Some+lang%24%21*%28%29%2B%3B
        System.out.println(encodedQuery);
    }
}

https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html
Azure/azure-sdk-for-java#10273
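
The differences between the sample outputs above come down to two knobs: which extra characters are left un-encoded, and whether a space becomes `+` or `%20`. The following C sketch parameterizes the encoder on those two knobs. The `url_encode_policy` type and `url_encode_with_policy` function are hypothetical names introduced here for illustration, not SDK APIs, and the policies listed in the comment are derived only from the sample outputs in this issue; they are not complete models of the respective libraries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of an encoding policy; not an SDK type.
   Policies that reproduce the sample outputs in this issue (assumption:
   they cover only the characters present in the sample string):
     C SDK, Uri.EscapeDataString : { "",        false }
     UrlEncoder.Default          : { "@$!*();", false }
     Go url.QueryEscape          : { "",        true  }
     Java URLEncoder.encode      : { "*",       true  }  */
typedef struct
{
  char const* extra_unescaped; /* ASCII characters kept besides RFC 3986 unreserved */
  bool space_as_plus;          /* true: ' ' becomes '+'; false: ' ' becomes "%20" */
} url_encode_policy;

bool is_unreserved(uint8_t c)
{
  return ('0' <= c && c <= '9') || ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')
      || c == '-' || c == '_' || c == '.' || c == '~';
}

/* Encodes the NUL-terminated string src into dst according to policy.
   Returns the encoded length (excluding the NUL), or -1 if dst is too small. */
int url_encode_with_policy(
    url_encode_policy policy,
    char const* src,
    char* dst,
    size_t dst_size)
{
  static char const hex[] = "0123456789ABCDEF";
  size_t out = 0;

  assert(src != NULL && dst != NULL && policy.extra_unescaped != NULL);
  if (dst_size == 0)
  {
    return -1;
  }

  for (; *src != '\0'; ++src)
  {
    uint8_t const c = (uint8_t)*src;
    bool const plus = (c == ' ' && policy.space_as_plus);
    bool const keep = is_unreserved(c) || strchr(policy.extra_unescaped, c) != NULL;
    size_t const needed = (plus || keep) ? 1 : 3;

    if (out + needed >= dst_size) /* keep room for the terminating NUL */
    {
      return -1;
    }

    if (plus)
    {
      dst[out++] = '+';
    }
    else if (keep)
    {
      dst[out++] = (char)c;
    }
    else
    {
      dst[out++] = '%';
      dst[out++] = hex[c >> 4];
      dst[out++] = hex[c & 0x0F];
    }
  }

  dst[out] = '\0';
  return (int)out;
}
```

Framing the choice this way makes the decision explicit: following one precedent or another amounts to picking a set of extra un-encoded characters and a space representation, while the percent-encoding of non-ASCII UTF-8 bytes is the same in all four samples.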

cc @antkmsft, @JeffreyRichter, @gilbertw, @alzimmermsft


Labels: Client (This issue points to a problem in the data-plane of the library.)
