.json and .csv exports fail to apply base_url #1091

Closed
simonw opened this issue Nov 12, 2020 · 22 comments

simonw commented Nov 12, 2020

Just tested with the latest Docker image, and it works pretty much everywhere! THANK YOU!

I did notice that if I try to export json or csv, the base is not applied. Not sure if I should reopen this issue or open a new one.

To see this, go here: https://corpora.tika.apache.org/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES

Click/hover over json or CSV and you'll see that the 'datasette' base is not included.

Originally posted by @tballison in #865 (comment)

simonw commented Nov 12, 2020

Hmm... it's not just the .csv and .json export links - it's the column headings (which can be clicked to change the sort order) as well. Here's an extract of the HTML from that page:

<p class="export-links">This data as 
  <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.json">json</a>, 
  <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.csv?_size=max">CSV</a> (
  <a href="#export">advanced</a>)
</p>
<div class="table-wrapper">
  <table class="rows-and-columns">
    <thead>
      <tr>
        <th class="col-Link" scope="col" data-column="Link" data-column-type="" data-column-not-null="0" data-is-pk="0">
          Link
        </th>
        <th class="col-rowid" scope="col" data-column="rowid" data-column-type="integer" data-column-not-null="0" data-is-pk="1">
          <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES?_sort_desc=rowid" rel="nofollow">rowid&nbsp;▼</a>
        </th>
        <th class="col-PARSE_EXCEPTION_ID" scope="col" data-column="PARSE_EXCEPTION_ID" data-column-type="INTEGER" data-column-not-null="0" data-is-pk="0">
          <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES?_sort=PARSE_EXCEPTION_ID" rel="nofollow">PARSE_EXCEPTION_ID</a>
        </th>
        <th class="col-PARSE_EXCEPTION_DESCRIPTION" scope="col" data-column="PARSE_EXCEPTION_DESCRIPTION" data-column-type="VARCHAR(128)" data-column-not-null="0" data-is-pk="0">
          <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES?_sort=PARSE_EXCEPTION_DESCRIPTION" rel="nofollow">PARSE_EXCEPTION_DESCRIPTION</a>
        </th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td class="col-Link type-pk">
          <a href="/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES/1">1</a>
        </td>
        <td class="col-rowid type-int">1</td>
        <td class="col-PARSE_EXCEPTION_ID type-int">0</td>
        <td class="col-PARSE_EXCEPTION_DESCRIPTION type-str">RUNTIME</td>
      </tr>
      <tr>
        <td class="col-Link type-pk">
          <a href="/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES/2">2</a>
        </td>
        <td class="col-rowid type-int">2</td>
        <td class="col-PARSE_EXCEPTION_ID type-int">1</td>
        <td class="col-PARSE_EXCEPTION_DESCRIPTION type-str">ENCRYPTION</td>
      </tr>
      <tr>
        <td class="col-Link type-pk">
          <a href="/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES/3">3</a>
        </td>
        <td class="col-rowid type-int">3</td>
        <td class="col-PARSE_EXCEPTION_ID type-int">2</td>
        <td class="col-PARSE_EXCEPTION_DESCRIPTION type-str">ACCESS_PERMISSION</td>
      </tr>
      <tr>
        <td class="col-Link type-pk">
          <a href="/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES/4">4</a>
        </td>
        <td class="col-rowid type-int">4</td>
        <td class="col-PARSE_EXCEPTION_ID type-int">3</td>
        <td class="col-PARSE_EXCEPTION_DESCRIPTION type-str">UNSUPPORTED_VERSION</td>
      </tr>
    </tbody>
  </table>
</div>
<div id="export" class="advanced-export">
  <h3>Advanced export</h3>
  <p>JSON shape:
    <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.json">default</a>,
    <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.json?_shape=array">array</a>,
    <a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.json?_shape=array&amp;_nl=on">newline-delimited</a>
  </p>
  <form action="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.csv" method="get">
    <p>
      CSV options:
      <label>
        <input type="checkbox" name="_dl"> download file
      </label>
      <input type="submit" value="Export CSV">
      <input type="hidden" name="_size" value="max">
    </p>
  </form>
</div>

But here's something really weird - the links to the individual rows DO include the /datasette/ prefix:

<td class="col-Link type-pk">
    <a href="/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES/2">2</a>
</td>

The navigation bar on that page is correct too:

<p class="crumbs">
        <a href="/datasette/">home</a> /
        <a href="/datasette/corpora-metadata">corpora-metadata</a>
</p>

I've also been unable to replicate this in my own local environment, running datasette fixtures.db --config base_url:/datasette/.
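
The mismatch described above can also be checked mechanically rather than by reading the HTML. The following is a minimal sketch, not part of Datasette: `HrefCollector` and `urls_missing_prefix` are hypothetical names, and the sample markup is a few lines copied from the extract above. It lists every site-absolute link or form action that lacks the expected prefix.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag and the action of every <form> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Pick the attribute that carries the URL for this tag, if any.
        wanted = {"a": "href", "form": "action"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def urls_missing_prefix(html, prefix):
    """Return site-absolute URLs in `html` that do not start with `prefix`."""
    collector = HrefCollector()
    collector.feed(html)
    return [
        url for url in collector.urls
        if url.startswith("/") and not url.startswith(prefix)
    ]


# A few lines taken from the HTML extract in this thread.
sample = """
<a href="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.json">json</a>
<a href="#export">advanced</a>
<a href="/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES/2">2</a>
<form action="/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.csv" method="get"></form>
"""

for url in urls_missing_prefix(sample, "/datasette/"):
    print(url)
```

Fragment links such as `#export` are skipped because they do not start with `/`, so only the export link and the form action are reported for this sample.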

simonw commented Nov 12, 2020

@tballison is there any chance you're running any custom templates in that installation? I'm really confused as to why I can't replicate the bug.

simonw commented Nov 12, 2020

The sort headers are generated by this template code:

{% if column.name == sort %}
<a href="{{ path_with_replaced_args(request, {'_sort_desc': column.name, '_sort': None, '_next': None}) }}" rel="nofollow">{{ column.name }}&nbsp;▼</a>
{% else %}
<a href="{{ path_with_replaced_args(request, {'_sort': column.name, '_sort_desc': None, '_next': None}) }}" rel="nofollow">{{ column.name }}{% if column.name == sort_desc %}&nbsp;▲{% endif %}</a>
{% endif %}
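
To make the role of the request in those sort links concrete, here is a rough sketch of what a helper like `path_with_replaced_args` might do. This is not Datasette's implementation: `replace_args` is a hypothetical function, and it assumes the link is rebuilt from the path the server received plus the edited query string.

```python
from urllib.parse import parse_qsl, urlencode


def replace_args(path, query_string, changes):
    """Hypothetical helper: set or drop query-string keys on the current path.

    A value of None removes the key; any other value replaces it.
    """
    # Keep existing pairs whose keys are not being changed.
    pairs = [
        (key, value)
        for key, value in parse_qsl(query_string, keep_blank_values=True)
        if key not in changes
    ]
    # Append the new values, skipping keys marked for removal.
    pairs.extend(
        (key, value) for key, value in changes.items() if value is not None
    )
    return path + ("?" + urlencode(pairs) if pairs else "")


print(
    replace_args(
        "/corpora-metadata/REF_PARSE_EXCEPTION_TYPES",
        "",
        {"_sort_desc": "rowid", "_sort": None, "_next": None},
    )
)
```

Under that assumption the generated link can only be as correct as the path that was passed in, which is why the path the application sees matters for these links.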

The export links use this code:

<p class="export-links">This data as {% for name, url in renderers.items() %}<a href="{{ url }}">{{ name }}</a>{{ ", " if not loop.last }}{% endfor %}{% if display_rows %}, <a href="{{ url_csv }}">CSV</a> (<a href="#export">advanced</a>){% endif %}</p>

<div id="export" class="advanced-export">
<h3>Advanced export</h3>
<p>JSON shape:
<a href="{{ renderers['json'] }}">default</a>,
<a href="{{ append_querystring(renderers['json'], '_shape=array') }}">array</a>,
<a href="{{ append_querystring(renderers['json'], '_shape=array&_nl=on') }}">newline-delimited</a>{% if primary_keys %},
<a href="{{ append_querystring(renderers['json'], '_shape=object') }}">object</a>
{% endif %}
</p>
<form action="{{ url_csv_path }}" method="get">
<p>
CSV options:
<label><input type="checkbox" name="_dl"> download file</label>
{% if expandable_columns %}<label><input type="checkbox" name="_labels" checked> expand labels</label>{% endif %}
{% if next_url and config.allow_csv_stream %}<label><input type="checkbox" name="_stream"> stream all rows</label>{% endif %}
<input type="submit" value="Export CSV">
{% for key, value in url_csv_hidden_args %}
<input type="hidden" name="{{ key }}" value="{{ value }}">
{% endfor %}
</p>
</form>
</div>

simonw commented Nov 13, 2020

Here's where url_csv comes from:

url_csv = path_with_format(
    request=request, format="csv", extra_qs=url_csv_args
)
url_csv_path = url_csv.split("?")[0]
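
As an illustration of how that URL ends up in the export form, the sketch below splits a CSV URL into the form action and the hidden fields. It is only a sketch: `split_csv_url` is a hypothetical name, and deriving the hidden fields from the query string is an assumption based on the rendered form shown earlier in this thread, not on Datasette's source.

```python
from urllib.parse import parse_qsl, urlsplit


def split_csv_url(url_csv):
    """Split a CSV export URL into a form action path and hidden form fields."""
    parts = urlsplit(url_csv)
    # Each query-string pair becomes one <input type="hidden"> in the form.
    hidden_args = parse_qsl(parts.query, keep_blank_values=True)
    return parts.path, hidden_args


path, hidden = split_csv_url(
    "/corpora-metadata/REF_PARSE_EXCEPTION_TYPES.csv?_size=max"
)
print(path)
print(hidden)
```

Both pieces inherit their path from `url_csv`, so if that URL lacks the prefix, the form action lacks it too.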

@tballison

I'm starting this with docker like so:

docker run --name datasette -d -p 8001:8001 -v "$(pwd)":/mnt datasetteproject/datasette datasette -p 8001 -h 0.0.0.0 /mnt/file_profiles.db --config sql_time_limit_ms:120000 --config max_returned_rows:100000 --config base_url:/datasette/ --config cache_size_kb:50000

I'm not doing any templating or anything else custom.

Apropos of nothing, I swapped out a simpler db, so this query should now work:

https://corpora.tika.apache.org/datasette/file_profiles?sql=select%0D%0A++*%0D%0Afrom%0D%0A++file_profiles+fp%0D%0Alimit%0D%0A++10

@tballison

My headers aren't clickable/sortable with custom SQL, but I think that's by design.

In the default view, https://corpora.tika.apache.org/datasette/file_profiles/file_profiles, I now see that the headers should be sortable, and you're right that base_url is not applied to them.

base_url works with "View and Edit SQL" and with "(advanced)".

As you point out, it does not work with the CSV, JSON, or other export links, or with the "Next page" navigation button at the bottom.

simonw commented Nov 14, 2020

@tballison could I see the section of your Apache config that configures the proxying to /datasette/?

simonw commented Nov 16, 2020

I have a hunch that there may be some extra configuration in play here - could Apache itself be rewriting some of the links using mod_proxy_html?

@tballison

I don't think we are, but I'll check with Maruan.

I think this is the relevant part of our config?

  Alias "/base/" "/usr/share/corpora/"
  <Directory "/usr/share/corpora/">
    Options +Indexes -Multiviews
    AllowOverride None
  </Directory>

  ProxyPreserveHost On

  ProxyPass /datasette http://0.0.0.0:8001
  ProxyPassReverse /datasette http://0.0.0.0:8001

</VirtualHost>

@tballison

We're using mod_proxy.

@tballison

Anything we can do to help debug this? Thank you, again!

simonw commented Dec 9, 2020

Could you try removing the ProxyPassReverse /datasette http://0.0.0.0:8001 line?

My hunch is that ProxyPassReverse is rewriting some of the links in the HTML (or maybe in the HTTP headers) in a way that breaks things.

Normally you would need ProxyPassReverse to compensate for the underlying application being unable to rewrite its links - but Datasette's base_url setting causes Datasette to rewrite all of the links for you, so ProxyPassReverse should be unnecessary.

tballison commented Dec 9, 2020

I don't think this fixes it:

grep -R datasette .
./sites-available/000-default.conf:        ProxyPass /datasette http://127.0.0.1:8001/
./sites-available/000-default.conf:        #ProxyPassReverse /datasette http://127.0.0.1:8001/
./sites-available/corpora-le-ssl.conf:  ProxyPass /datasette http://0.0.0.0:8001
./sites-available/corpora-le-ssl.conf:  #ProxyPassReverse /datasette http://0.0.0.0:8001
./sites-enabled/corpora-le-ssl.conf:  ProxyPass /datasette http://0.0.0.0:8001
./sites-enabled/corpora-le-ssl.conf:  #ProxyPassReverse /datasette http://0.0.0.0:8001

And I confirmed that I actually restarted the server. 🤣

https://corpora.tika.apache.org/datasette/file_profiles

tballison commented Dec 9, 2020

I can't imagine this helps (esp. given your point about potential rewrites), but you can see that /datasette/ was correctly added to the sql form, but not to the "export-links"

[Screenshot dated 2020-12-09; image not preserved]

simonw commented Dec 9, 2020

OK that is really weird. I'll have another go at replicating this locally.

henry501 commented Jan 7, 2021

I found this issue while troubleshooting the same behavior with an nginx reverse proxy. The solution was to make sure I set:

proxy_pass http://server:8001/baseurl/

instead of just:

proxy_pass http://server:8001

The custom SQL query and header links are now correct.

simonw commented Jan 7, 2021

@tballison I think that's the solution! It looks like you need to use this in your config:

ProxyPass /datasette http://127.0.0.1:8001/datasette

Instead of this:

ProxyPass /datasette http://127.0.0.1:8001/

Give that a go and let me know if it fixes it.
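
Why the target path matters can be sketched with a simplified model of the proxy's prefix substitution. This is an illustration under stated assumptions, not Apache's or Datasette's code: `forwarded_path` and `export_link` are hypothetical names, and `export_link` stands in for any link that is built from the path the application received.

```python
def forwarded_path(public_path, match_prefix, target_path):
    """Simplified model of prefix substitution by a reverse proxy.

    `match_prefix` is the first ProxyPass argument; `target_path` is the path
    part of the backend URL ("" when the URL is just scheme://host:port).
    """
    if not public_path.startswith(match_prefix):
        raise ValueError("request is not handled by this proxy rule")
    return target_path + public_path[len(match_prefix):]


def export_link(backend_path, fmt):
    """Hypothetical stand-in for a link built from the path the app received."""
    return f"{backend_path}.{fmt}"


public = "/datasette/corpora-metadata/REF_PARSE_EXCEPTION_TYPES"

# ProxyPass /datasette http://0.0.0.0:8001 : the prefix is stripped.
stripped = forwarded_path(public, "/datasette", "")
# ProxyPass /datasette http://127.0.0.1:8001/datasette : the prefix is kept.
kept = forwarded_path(public, "/datasette", "/datasette")

print(export_link(stripped, "json"))
print(export_link(kept, "json"))
```

In this model the first configuration yields an export link without `/datasette`, matching the HTML quoted earlier in the thread, while links assembled directly from the base_url setting would be unaffected either way.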

simonw commented Jan 7, 2021

@henry501 it looks like you spotted a bug in the documentation - I just addressed that, the fix is now live here: https://docs.datasette.io/en/latest/deploying.html#running-datasette-behind-a-proxy

simonw closed this as completed Jan 9, 2021

@tballison

+1

Yep! Fixes it. If I navigate to https://corpora.tika.apache.org/datasette, I get a 404 (database not found: datasette), but if I navigate to https://corpora.tika.apache.org/datasette/file_profiles/, everything WORKS!

Thank you!

simonw commented Jan 11, 2021

Fantastic!

@henry501

Great, really happy I could help! Reverse proxies get tricky.

@tballison

Yes, thank you to both of you!

simonw added a commit that referenced this issue Jan 19, 2021
simonw added this to the Datasette 0.54 milestone Jan 24, 2021
This was referenced Jan 25, 2021
simonw added a commit that referenced this issue Jan 25, 2021