
Download to PDF: HTTP 500 error on some wikis for some users
Closed, Resolved | Public | BUG REPORT

Description

"Download to PDF" on en.wv is returning error: { "name": "HTTPError", "message": "500", "status": 500, "detail": "Internal Server Error" }

--- Client-Side workaround ---

As many users have been referred here, a potential workaround some clients may use is:

  1. Install a virtual printer that creates PDF documents on your client
  2. Use your browser Print function
  3. Select the virtual printer from (1) above
  4. Use the virtual printer save function

Event Timeline

Koavf's link works for me.

Even when you click on "Download"? The *page* renders properly, but the *PDF* gives the HTTP error.

Yes. Here's the PDF it gave me.

@Jtneill: Thanks for reporting this. For future reference, please use the bug report form (linked from the top of the task creation page) to create a bug report, and fill in all the sections in the template instead of deleting them, to avoid followup questions for more information and examples. Thanks.

If this is about "Download as PDF" on https://en.wikiversity.org/wiki/Motivation_and_emotion/Book/2024/Dopamine_and_social_behaviour, then it works for me...

Aklapper renamed this task from "Download to PDF - Error on en.wv" to "Download to PDF: HTTP 500 error on en.wikiversity for some folks". Oct 4 2024, 6:31 AM

@Aklapper Thanks for testing and clarifying about bug reporting.

To confirm, I tested 3 browsers on a Windows 10 device and each gave an HTTP 500 error when clicking "Download" from https://en.wikiversity.org/w/index.php?title=Special:DownloadAsPdf&page=Motivation_and_emotion%2FBook%2F2024%2FDopamine_and_social_behaviour&action=show-download-screen :

  • Google Chrome
  • Mozilla Firefox
  • Microsoft Edge

However, the PDF download works using Chrome on an Android device.

A reader reported a similar problem on en.wikipedia in ticket 2024100510006742:

"
{ "name": "HTTPError", "message": "500", "status":500, "detail": "Internal Server Error" }

I'm using macOS Sequoia 15.0 on my 2019 MacBook Pro. I get the same results using either Google Chrome browser, Safari browser or Arc browser.
"

I was unable to duplicate.

I tried to download the same article PDF on zhwiki through several VPN IP addresses. I found that I could download it from some regions (e.g. Germany) but not from others (e.g. Hong Kong). Even after I switched to using another browser, I still couldn't download the PDF.

I can share which IPs could or couldn't download the PDF privately if anyone needs that information. Perhaps data centers in some regions have the issue?

SCP-2000 renamed this task from "Download to PDF: HTTP 500 error on en.wikiversity for some folks" to "Download to PDF: HTTP 500 error on some wikis for some folks". Oct 8 2024, 8:38 AM

Users are continuing to report this problem:

Screenshot_2024-10-11_104037_Wiki_Unable2Download.png (341×1 px, 16 KB)
Xaosflux renamed this task from "Download to PDF: HTTP 500 error on some wikis for some folks" to "Download to PDF: HTTP 500 error on some wikis for some users". Oct 11 2024, 6:15 PM
TheDJ subscribed.

I also experience it for https://en.wikiversity.org/api/rest_v1/page/pdf/Motivation_and_emotion%2FBook%2F2024%2FDopamine_and_social_behaviour

I don't see anything obvious in https://grafana.wikimedia.org/d/U4TuF-lMk/proton?orgId=1

The response headers list x-cache: cp3069 miss, cp3069 miss
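For anyone who wants to reproduce this outside the browser, here is a minimal sketch that builds the REST URL quoted above and reports the HTTP status together with the x-cache header. The function names, the User-Agent string, and the timeout are assumptions made for illustration; only the URL shape and the header name come from this thread.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen


def pdf_url(host, title):
    """Build the REST PDF URL for a page title (hypothetical helper).

    safe='' also percent-encodes '/', matching the URL quoted in this task.
    """
    return f"https://{host}/api/rest_v1/page/pdf/{quote(title, safe='')}"


def probe(host, title, timeout=60):
    """Return (status, x-cache header) for one PDF request.

    The User-Agent value is a placeholder; replace it with something
    that identifies you before running this against production.
    """
    req = Request(pdf_url(host, title),
                  headers={"User-Agent": "pdf-probe-sketch/0.1 (example)"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("x-cache")
    except HTTPError as err:
        # A 500 arrives here; the error object still carries the headers.
        return err.code, err.headers.get("x-cache")
```

Calling probe("en.wikiversity.org", "Motivation_and_emotion/Book/2024/Dopamine_and_social_behaviour") a few times from different networks should show whether the failures correlate with a particular cache host.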

Logstash has lots of spawn process errors:

Error: Failed to launch the browser process!
/usr/bin/chromium: 5: /etc/chromium.d/extensions: Cannot fork
/usr/bin/chromium: 121: Cannot fork
/usr/bin/chromium: 123: Cannot fork
[826540:826540:1015/085235.180776:FATAL:spawn_subprocess.cc(237)] posix_spawn /usr/lib/chromium/chrome_crashpad_handler: Resource temporarily unavailable (11)

https://logstash.wikimedia.org/goto/ff20ff6a1109dc85f8a0fcd7cc2d9c0b

Screenshot 2024-10-15 at 10.53.45.png (152×1 px, 21 KB)

This appears to be a rerun of T375521; the temporary fix last time was a roll restart, but there's clearly a deeper issue.

I suspect that the issue is that we don't close browser instances, or that we somehow end up in a situation with stale browser instances. Given the level of traffic/support of the PDF service, would it be enough to just restart the service?
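To illustrate the "stale instances" hypothesis, here is a minimal sketch of the cleanup pattern that avoids it: the child process is always terminated in a finally block, even when the work fails. This is not the service's actual code; run_with_cleanup, work, and grace are hypothetical names, and the sketch only reaps the direct child, not helper processes that the child itself spawns.

```python
import subprocess


def run_with_cleanup(cmd, work, grace=5.0):
    """Start cmd, run work(proc), and always stop the child afterwards.

    Hypothetical helper: `work` is any callable that uses the running
    process. The finally block runs on success and on failure alike.
    """
    proc = subprocess.Popen(cmd)
    try:
        return work(proc)
    finally:
        if proc.poll() is None:          # still running, ask it to exit
            proc.terminate()
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()              # did not exit in time, force it
                proc.wait()
```

If the leaked processes are grandchildren rather than the browser itself, this pattern alone would not be enough and the process group would have to be handled as well.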

Users continue to report this problem, such as in otrs ticket 2024101810000205. I was also able to reproduce this issue myself.

I suspect that the issue is that we don't close browser instances, or that we somehow end up in a situation with stale browser instances. Given the level of traffic/support of the PDF service, would it be enough to just restart the service?

Works for me - the service can be restarted using the standard roll_restart pattern. We might see this come back soon, as this also happened last month.

Chromium is leaking processes, most likely leaving chromium_crashpads lying around after a failure:

root@wikikube-worker2070:/home/hnowlan# ps uax| grep chrome_crashpad | wc -l
115357
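The same count can be taken from a script, for example to log it periodically. The sketch below is an assumption about how one might do that, not an existing tool; count_matching and count_crashpad_handlers are hypothetical names. Because it calls ps directly instead of piping through grep, the grep process itself is not included in the count.

```python
import subprocess


def count_matching(ps_output, needle="chrome_crashpad"):
    """Count lines of ps output that mention the given process name."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def count_crashpad_handlers():
    """Count running processes whose command line mentions chrome_crashpad.

    Uses the POSIX form `ps -A -o args` to list every process's command line.
    """
    out = subprocess.run(["ps", "-A", "-o", "args"],
                         capture_output=True, text=True, check=True).stdout
    return count_matching(out)
```

A steadily growing value between restarts would be consistent with the leak described above.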

Done, I will keep an eye on the logs.

For what it's worth, I also updated the dashboard at https://grafana-rw.wikimedia.org/d/U4TuF-lMk/proton?orgId=1 to allow querying both DCs at once as well as selectively, and fixed the per-container saturation panels that were broken. I also removed the nodejs garbage collection panels, as those metrics aren't being emitted anymore (support has been dropped at the service runner level, IIRC).

Here are some people with similar experiences. A suggestion was to run with --single-process --disable-crashpad-for-testing (because crashpad disabling does not propagate to subprocesses).

Maybe this is related? https://github.com/GoogleChromeLabs/chrome-for-testing/issues/114
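As a rough sketch of what that suggestion looks like on a command line, the helper below assembles an argument list with the two flags named above. It is illustrative only: build_chromium_args is a hypothetical name, the binary path is taken from the log excerpt in this task, and --headless and --print-to-pdf are added on the assumption that a standalone Chromium is being driven directly, which is not how the service launches it.

```python
def build_chromium_args(url, pdf_path, binary="/usr/bin/chromium"):
    """Return a Chromium command line for rendering one URL to PDF.

    Hypothetical helper; the flag set beyond the two suggested in this
    task is an assumption for a direct command-line invocation.
    """
    return [
        binary,
        "--headless",
        "--single-process",                # suggested in this task
        "--disable-crashpad-for-testing",  # suggested in this task
        f"--print-to-pdf={pdf_path}",
        url,
    ]
```

The list can be passed to subprocess.run as-is, which avoids shell quoting problems with titles that contain spaces or punctuation.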

Errors are clearly spiking again

https://grafana.wikimedia.org/d/U4TuF-lMk/proton?orgId=1&from=1730305791822&to=1730910591822

IMG_2006.png (2×1 px, 515 KB)

(I’d open a separate ticket, but I’m on my phone)

TheDJ triaged this task as Unbreak Now! priority. Nov 6 2024, 4:31 PM

Looks like the same crashpad flood issue again. The service needs a restart, and I think we should implement the flags @TheDJ has mentioned.

I ran a rolling restart in k8s. Regarding the chromium parameters, we already pass --single-process. I am taking a look at what the other parameter does in detail.

Change #1088271 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[operations/deployment-charts@master] chromium-render: Add cli flag to avoid flooding with crashpad processes

https://gerrit.wikimedia.org/r/1088271

Hmm, there are still some failures occurring above the average.

Screenshot 2024-11-08 at 10.14.42.png (536×1 px, 208 KB)

The error rate is quickly increasing again:

Screenshot 2024-11-19 at 14.37.52.png (554×1 px, 207 KB)

Change #1088271 merged by jenkins-bot:

[operations/deployment-charts@master] chromium-render: Add cli flag to avoid flooding with crashpad processes

https://gerrit.wikimedia.org/r/1088271

I got this on enwiki w/ Chromium v126 (Falklands War)

image.png (768×1 px, 21 KB)

Reloading https://en.wikipedia.org/api/rest_v1/page/pdf/Falklands_War seemed to work.

@ihurbain just deployed the crashpad flag flip patch and (at least for now) Proton looks happier.

There have not been significant 5xx errors for 7+ days now. Calling this fixed until proven otherwise. Thanks everyone!