RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis
Open, HighPublic

Assigned To
None
Authored By
MZMcBride
Jun 27 2012, 10:00 PM

Description

Wikimedia wikis have supported uploading SVG files since 2005. At the time, web browsers had particularly poor support for rendering SVG. The ability to use SVG files in HTML/wiki pages (inline) was added by converting them to bitmaps server-side. The rasterizer chosen for that was librsvg, which takes the SVG code and generates PNGs. librsvg has been upgraded several times since, and as of 2024 it remains Wikimedia's only SVG rasterizer (see librsvg bugs for more history).

Despite the enormous progress achieved with these upgrades, rasterization with librsvg remains unreliable. The resulting bitmaps are frequently not faithful, as explained in librsvg bugs. And it is hard to blame that on upstream given how it describes its mission. From its README:

Goals of librsvg

Librsvg aims to be a low-footprint library for rendering SVG1.1 and SVG2 images. It is used primarily in the GNOME project to render SVG icons and vector images that appear on the desktop. It is also used in Wikimedia to render the SVG images that appear in Wikipedia, so that even old web browsers can display them. Many projects which casually need to render static SVG images use librsvg.

We aim to be a "render this SVG for me, quickly, and with a minimal API" kind of library.
Feature additions will be considered on a case-by-case basis.
You can read about librsvg's supported SVG and CSS features in the development guide.

Non-goals of librsvg

We don't aim to:
Implement every single SVG feature that is in the spec.
Implement scripting or external access to the SVG's DOM.
Implement support for CSS-based animations (but if you can think of a nice API to do this, we would be glad to know!)
Replace the industrial-strength SVG rendering machinery in modern web browsers.

In short, librsvg was never designed for encyclopedic usage with arbitrary files, but one major reason why it was chosen is its performance. This aspect still matters today, since Wikimedia performs on average more than 200 rasterizations per minute (October 2024). As Gilles Dubuc explained:

The problem isn't as much the amount of SVGs we get per day, than the fact that we render thumbnails on demand when they're for a file/size combination never requested before. Any extra rendering time is a penalty for that viewer. The issue compounds if they request a lot of new thumbnails at once, making them more likely to run into throttling limits, resulting in erroring images. That can easily happen on galleries that get visited very rarely. But some people's workflows get them to visit those a lot and their overall experience becomes terrible.

We prerender the most common sizes at upload time, but there's a very long tail of more exotic thumbnail sizes requested because editors customised the sizes they wanted with wikitext, or the wiki itself has different defaults, etc.

thumbor's Grafana dashboard provides details about load in its Engines section. "qps" means "Queries per second".


During the 19 years since librsvg was chosen, other renderers have improved a lot, some even more than librsvg. Two developments in particular seem to allow reducing our reliance on librsvg and should be evaluated:

  1. The appearance of resvg, which would be the natural successor to librsvg insofar as it's also a free software performant server-side rasterizer written in Rust which implements a large part of SVG
  2. Browser support has become ubiquitous, and its quality is on par with librsvg.

resvg

resvg appeared in late 2017 and has quickly become a leader, thanks to tireless work by its creator, @RazrFalcon . Its major difference with librsvg is that it has a goal of supporting the whole SVG specification (although it concedes it has no plans to support animations). Tests run by @JoKalliauer in 2021 suggest resvg 0.14 had already surpassed librsvg at that time, both in quality and performance.

User agents

In principle, client-side rendering should be best, saving not only computation time but bandwidth, which should increase performance and quality while minimizing costs. All of the main browser engines (Blink, Gecko and WebKit) use their own renderer, but their quality appears to be comparable. Since they support animations, client-side rendering should be the preferred mechanism.

But one big concern is the variability of support between browser engines. For example, SVGator documents some differences with animation support. Allowing client-side rendering (basically) loses our ability to hack problematic files to ensure they are supported by the renderer.
This variability would however not be entirely new: when uploading an SVG file, the preview displayed is already rendered client-side. That can be particularly confusing when, after confirmation, the same browser then shows a broken rendering of the same file because librsvg does not render it properly.

Transitional issues

Even if librsvg were deemed to offer 85% quality versus 90% for resvg, switching renderers needs careful consideration, since the (say) 10% of files which resvg does not render well may include files which librsvg renders well. With something on the order of 1 million SVG files on Wikimedia Commons, even regressions in 2% of cases could affect tens of thousands of files. Keeping librsvg next to resvg to avoid regressions is a possibility, but identifying which files it handles better would be challenging. To switch confidently, we need to consider the impact on new files, on existing files, and anticipate how each renderer will evolve in the future.
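
To make the overlap argument concrete, here is a rough back-of-the-envelope sketch. The 85% and 90% figures are the hypothetical ones used above, and the 1 million file count is only an order of magnitude, so the results are bounds under those assumptions, not measurements.

```python
def regression_bounds(total_files, old_ok, new_ok):
    """Bounds on files the old renderer handles well but the new one does not.

    old_ok and new_ok are the fractions of files each renderer handles well.
    Nothing is assumed about how the two sets of failures overlap.
    """
    new_bad = 1.0 - new_ok
    # Best case: every file the new renderer fails on was already failing.
    lowest = max(0.0, old_ok - new_ok)
    # Worst case: every file the new renderer fails on used to render well.
    highest = min(old_ok, new_bad)
    return round(total_files * lowest), round(total_files * highest)


low, high = regression_bounds(1_000_000, 0.85, 0.90)
```

With these inputs the range runs from 0 to 100,000 files, which is why the headline quality percentages alone cannot settle the question.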

Even evaluating a renderer's current quality is far from trivial. W3C's test suite was abandoned a long time ago, so that the most useful test suite appears to be resvg's as of 2024. This potentially biases our comparison.

Downsides of resvg

Unlike librsvg, resvg:

  1. is not in Debian (except for an excessively old version)
  2. is not a stock option for $wgSVGConverter
  3. does not claim to be ready (it remains at version 0 as of October 2024), but this should have limited impact on Wikimedia, since it reflects (up to now) constant API breaks, but Wikimedia. Yevhenii Reizner "would say that resvg is finished" as an SVG 1 renderer.
  4. still has no article on either the English or the French Wikipedia.

None of these are major problems, and they could be solved in the near future, but resvg's loss of Yevhenii Reizner, who had been doing about 9/10 of the work from the beginning, is a serious threat to its future, decimating its chances of overcoming downstream inertia. The bus factor of librsvg (which was revived in 2016) appears to be a lot less bad than resvg's. resvg has a limited collaboration infrastructure and apparently no business model.

See Also

SVG test suites
SVG image support
T120746: Improve SVG rendering
T10901: [DO NOT USE] SVG rasterisation and management on Wikimedia sites (tracking)

Details

Reference
bz38010

Event Timeline

librsvg repo has been disabled and doesn't support node v12+ (https://github.com/2gis/node-rsvg/tree/0.7.0). I see we could switch to puppeteer. E.g. https://github.com/etienne-martin/svg-to-img as a replacement?

(Debian bullseye uses nodejs 12.22.5).

Even the repo it says to use hasn't received an update since 2019...

We don't use that. Thumbor is written in Python (2, we know), but we shell out to rsvg-convert anyway. Librsvg is written mostly in Rust now, but the version currently in production is still C. Upstream is https://gitlab.gnome.org/GNOME/librsvg, packaged as https://packages.debian.org/stretch/librsvg2-bin.

Mathoid uses it though. I presumed that's what this task was about.

librsvg repo has been disabled

No it has not been. That's an unrelated repo.

Mathoid uses it though. I presumed that's what this task was about.

Indeed. Please file a different task, though, tagged Mathoid and Platform Engineering, and subscribe @Physikerwelt as well. If a dependency is abandoned, a replacement needs to be found or written. Otherwise we'll eventually need to disable that functionality.

T247697: Rethink mathoids SVG to PNG conversion already exists as a subtask of this one, which may have been the cause of the confusion. I'm not sure it really should be, as Mathoid is self-contained (producing both the SVG and PNG output) and doesn't have to use the general-purpose renderer.

I see, my mistake; I missed that. Thanks for pointing it out. I did leave some comments on that task. Let's not hijack this task any further for Mathoid's SVG functionality.

@JoKalliauer

I've basically read through all the discussion here. Is the current problem that librsvg has not been upgraded to the latest (Rust) version?

If you need a Chrome-like rendering effect, I would suggest trying skr-canvas, which runs in Node.js and uses skia to render SVG.

With skr-canvas + resvg it should solve most of your problems in SVG_test_suites [1], and you might also consider adding skr-canvas to SVG_test_suites for test results.

[1]: https://commons.wikimedia.org/wiki/User:JoKalliauer/SVG_test_suites/resvg_Issues_details

@Yisibl The better is the enemy of the good. Thanks for that comment; however, skr-canvas says of itself: "This project is in pre-release stage. And there may some bugs existed." I don't see any benchmarks. resvg had optional Skia backend support in earlier versions, but @RazrFalcon dropped it, so I'm doubtful that skr-canvas is at a comparable development level to resvg.

Making timing benchmarks is pretty straightforward, but comparing features is tricky, because you often have to read and interpret the SVG 1.1 spec to know who is right and who is wrong, and many cases are not defined. However, you are free to add it to User:JoKalliauer/SVG_test_suites.

As long as WMF developers do not have any time to make progress on this, I don't see any sense in adding another renderer to the benchmark. If WMF needs a benchmark to decide, I will check skr-canvas; otherwise it is imho a waste of time.

Improve SVG Rendering is currently in 5th place in the Community Wishlist Survey 2022. In each of the last two years, the top 5 projects were taken on. If you would like to support the project, your vote might be essential. You can vote until 11 February, 18:00 UTC (so vote early to avoid time zone problems).

My opinion is to let the browser do the rendering. We just have to strip the JavaScript within the SVG on upload to prevent XSS attacks.

About the SVG language, all browsers currently support it. https://caniuse.com/?search=systemLanguage

I get that you say to allow the user to switch the language depending on which wiki they are. This could be solved with javascript. First we can get the current browser language by navigator.languages, then we search the DOM using Javascript's querySelector() and do the changes in each <switch> tag.

Just my two cents.

@Arthurfragoso, the only issue is that font rendering can be wildly inconsistent between devices and browsers. It can cause labels to not line up correctly, text to overlap, or even certain characters not to show up. Sure, best practice may be to convert raw text to paths, but there are lots of cases where that isn't practical or desirable (especially if an SVG file needs frequent edits or updates).

My opinion is to let the browser do the rendering. We just have to strip the JavaScript within the SVG on upload to prevent XSS attacks.

No, there are several more things to consider. E.g. embedding an external SVG into the SVG. But I'm not an expert on this.

I listed all the illegal SVG content I found at https://commons.wikimedia.org/wiki/User:JoKalliauer/IllegalSVGPattern. JavaScript is point 5. (So it already gets blocked during upload, even though some files still exist which were uploaded before the filter was introduced.)

About the SVG language, all browsers currently support it. https://caniuse.com/?search=systemLanguage

Yes and no. Yes, all common browsers support simple cases; however, systemLanguage is handled completely differently by different engines.
E.g. a systemLanguage without a switch, or systemLanguage="en_US" alongside systemLanguage="en".
Some things are rendered wrong, others are not even defined in the specification, so there is no single correct rendering.

In the end you have to check every SVG if it is supported or not, that's what I documented on User:JoKalliauer/SVG_test_suites/ReSVG-Test-suite for 1000s of files.
E.g. https://commons.wikimedia.org/wiki/File:Test_suite_resvg_a-systemLanguage-006.svg is rendered correctly by Wikimedia, but rendered wrong by Chrome, so e.g Chrome does not fully support systemLanguage, and Firefox is not better.

SVG compatibility in browsers is its own task: T134410 Evaluate SVG rendering compatibility in browsers

If an SVG file contains systemLanguage (e.g. File:Unicode_Geschlechtersymbole.svg) and the SVG should be rendered locally, Wikimedia should imho provide a single image for every language.

@Arthurfragoso, the only issue is that font rendering can be wildly inconsistent between devices and browsers. It can cause labels to not line up correctly, text to overlap, or even certain characters not to show up. Sure, best practice may be to convert raw text to paths, but there are lots of cases where that isn't practical or desirable (especially if an SVG file needs frequent edits or updates).

That is a very valid point. Most SVGs that are embedded directly have their text converted to paths. However, files on Commons should be editable for derivatives, so they should almost always contain real text. Users often think only about their own device and use copyright-protected fonts like Arial or Times, which are not available on e.g. default Linux systems. Since the image is rendered correctly on their PC, they don't care.
We imho need to ensure a unique rendering, so that correct display is the uploader's responsibility and not the reader's; otherwise we will end up with many broken files.

@Ahecht : Maybe using µsvg might be a solution. µsvg converts the SVG into the most simplistic SVG (text is converted to paths, CSS and use elements are resolved, ...). Those SVGs can generally be rendered correctly even by a simple SVG renderer. So the uploaded SVG could be complex, but the SVG for rendering by the client would be simple.

Another problem with browser rendering is imho that browsers mostly support the unreleased draft of SVG 2.0. Supporting the latest cutting-edge features is something everyone wants, but in 10 years rendering those files correctly might be a pain in the ass. (Okay, SVG 2.0 won't change that much, but you get the point.)

I would like WMF to directly serve SVG files. Today's browsers offer reasonable SVG support. Letting the browser render the image also allows for dynamic interaction.

The font issue is solvable. The brute force solution would have WMF serve webfonts.

The systemLanguage issue can be solved many ways. It can be localized at the server or it can be localized at the client. WMF could even choose to change the semantics of its webpages: the SVG image displays the image in the user's preferred language rather than that wiki's language. It would have little impact on most users. If I'm usually on the de.Wiki and my preferred language is de, then it does not impact my viewing. It's only strange when I visit the zh.Wiki and see German illustrations.
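
As an illustration of the "localized at the server" option, here is a minimal sketch that resolves every <switch> element before the file is served. The function names are hypothetical, and the matching rule (exact tag, or a prefix followed by a hyphen, case-insensitive) is a simplifying assumption; as noted earlier in this thread, engines disagree on the details.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def matches(system_language, lang):
    """Simplified matching: exact tag, or a listed tag that prefixes lang."""
    wanted = lang.lower()
    for tag in system_language.split(","):
        tag = tag.strip().lower()
        if tag and (wanted == tag or wanted.startswith(tag + "-")):
            return True
    return False


def localize(svg_text, lang):
    """Keep only the first applicable child of every <switch> element."""
    root = ET.fromstring(svg_text)
    # Copy the iterator into a list because the tree is modified below.
    for switch in list(root.iter(f"{{{SVG_NS}}}switch")):
        children = list(switch)
        chosen = None
        for child in children:
            attr = child.get("systemLanguage")
            # A child without systemLanguage acts as the fallback.
            if attr is None or matches(attr, lang):
                chosen = child
                break
        for child in children:
            if child is not chosen:
                switch.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Calling localize(svg_text, "de") on a file with German and fallback labels would leave only the German ones, so the browser never has to evaluate systemLanguage itself.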

I do not know WMF's constraints, but one of the advantages of its servers converting SVG to PNG and then serving PNG is lower average network bandwidth. That is a big deal for JPEG and PNG files, which I think make up the bulk of WMF's images. Instead of serving a full-size 3 MB JPEG, WMF can serve a small 40 kB thumbnail. My gut tells me that WMF needs to do that to be efficient. I also expect that cellphone users appreciate the smaller bandwidth and quicker page loads.

I believe SVG files are a small fraction of the images on Commons, so serving them at full size would not hurt as much as serving full-size JPEGs. I still see merit in thumbnailing SVGs into PNGs. One editor's measurements suggested the average SVG file is 700 kB. With GZip compression, the transferred size might be 200 kB. That's probably still 5 times larger than the expected PNG thumbnail.

Furthermore, thumbnailing protects users not only from XSS but also from malicious rendering. WMF puts a clock on rendering an image into PNG. It must render in a few seconds or it is abandoned. Even if the SVG took several seconds to render on a WMF server, the rendering of the resulting PNG on a browser at 1:1 will be fast (sub-second). Somebody can make an SVG file that is computationally expensive. Consider an image that has 8,192 Gaussian-blurred layers that are semi-opaque. If that image is not slow enough, then use a million layers. What happens when one views a wiki page with several such images with her browser?
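
The server-side clock described above can be sketched roughly as follows. This is not Thumbor's actual code; render_with_deadline is a hypothetical helper and the command passed to it is whatever renderer invocation is in use.

```python
import subprocess


def render_with_deadline(command, seconds):
    """Run a renderer command; return its stdout, or None if it is too slow."""
    try:
        result = subprocess.run(
            command, capture_output=True, timeout=seconds, check=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return result.stdout
```

A browser rendering the SVG directly has no equivalent per-image budget that the wiki can set, which is the asymmetry this comment points at.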

There are even more sophisticated issues. I can make an SVG file that blinks at 3 Hz. If that file is directly served, then it might trigger an epileptic seizure. I might also generate some SVG that flashes single frame subliminal messages.

I would like WMF to directly serve some SVG files, but there are both technical and security issues that arise.

JoKalliauer raised the priority of this task from Low to High.Jun 10 2022, 9:04 PM

According to https://www.mediawiki.org/wiki/User:JoKalliauer/phab/wikimedia-svg-rendering#table I think this task might be the most important one in Wikimedia-SVG-rendering. Almost all bugs reported in Wikimedia-SVG-rendering depend on the renderer.

Here's what I understand.

Directly serving SVG is not a short-term decision.

There are only two thumbnail renderers in contention: the Rust version of librsvg and the newly-minted resvg. Either should fix a huge backlog of SVG rendering problems, so either renderer will be a big improvement. Both have better CSS support than the current renderer. Developers for both renderers want to support WMF.

Both those renderers use Rust, a language that WMF's current version of Debian does not support.

The task of upgrading Debian will allow Rust-based renderers such as the new librsvg and resvg to be dropped in.

I do not know, but I expect both will consult the operating system for the list of fonts. Nothing special should be needed.

Both renderers should work for hyphenated systemLanguage langtags such as zh-Hant (the current WMF version of librsvg does not).

However, MW currently passes the language-to-render $lang variable in the $LANG environment variable. That is a type violation: $LANG should be a Unix locale string; it is not an IETF langtag. There is not a 1:1 mapping between IETF langtags and Unix locale strings (which are also supposed to be opaque!).

Consequently, MW / Thumbor must change how the $lang variable is passed to the renderer. The argument passing should not use locale environment variables.

As I understand it, both librsvg and resvg now take command line arguments. (Citations)

librsvg 2.52.x will have a new --accept-language parameter, which will allow to specify the user's preferred languages by passing the HTTP Accept-Language header to librsvg: https://gitlab.gnome.org/GNOME/librsvg/-/issues/356 (Not sure if it will get backported to the 2.50.x series)

guessing this is the Rust source for command line arguments...

For MW, that means a rewrite of rasterize().

That is, starting at line 343, the external command must also process a $lang pattern substitution. The code starting at line 355 should be deleted: $lang is not a Unix locale string.

The $lang pattern substitution could be involved if $lang is false, but I think that could be simplified. The current MW semantics render SVG files in English if a language is not specified. Therefore, if $lang is not specified, then set it to en. That should give the same semantics as before. In the past, if $lang was not specified, then the operating system's $LANG environment variable would come into play, and that variable would specify en.

Consequently upload URLs with .../300px... would be rendered in en English, and URLs with .../langzh-hant-300px... would be rendered in zh-hant Chinese.

For Thumbor, engine/svg/svg.py needs similar changes.
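
A rough sketch of the proposed flow, written in Python since Thumbor is: derive the language from the thumbnail name, default to en, and pass it as an argument instead of through $LANG. The name pattern and both helper names are assumptions made for illustration, and --accept-language is only available in librsvg 2.52 or later, per the issue quoted above.

```python
import re

# Hypothetical pattern for names such as "300px-..." or "langzh-hant-300px-...".
THUMB_RE = re.compile(r"^(?:lang(?P<lang>[A-Za-z0-9-]+?)-)?(?P<width>\d+)px")


def parse_thumb_name(name):
    """Return (language tag, width); the language defaults to "en"."""
    match = THUMB_RE.match(name)
    if match is None:
        raise ValueError(f"not a thumbnail name: {name!r}")
    return match.group("lang") or "en", int(match.group("width"))


def build_rsvg_command(svg_path, png_path, width, lang):
    """Build an argument list for rsvg-convert (no shell, no locale variables)."""
    return [
        "rsvg-convert",
        "--format", "png",
        "--width", str(width),
        "--accept-language", lang,  # an IETF langtag, not a Unix locale
        "--output", png_path,
        svg_path,
    ]


lang, width = parse_thumb_name("langzh-hant-300px-Example.svg.png")
command = build_rsvg_command("Example.svg", "out.png", width, lang)
```

Passing an argument list rather than a shell string also sidesteps quoting problems with unusual filenames. A resvg variant would need its own builder, since its options differ.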

See also T261192.

T308395 reveals a problem when the SVG filename contains strings such as $output.
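
One way such a problem can arise is sketched below; I have not verified that this is the exact mechanism behind T308395. If placeholders are substituted one after another, text inserted by an earlier step is rescanned by later steps. The template and both function names are hypothetical.

```python
import re


def substitute_naive(template, values):
    """Sequential replacement: inserted text is rescanned by later steps."""
    for key, value in values.items():
        template = template.replace(key, value)
    return template


def substitute_once(template, values):
    """Single pass: replacement text is never rescanned."""
    pattern = re.compile("|".join(re.escape(key) for key in values))
    return pattern.sub(lambda m: values[m.group(0)], template)


template = "rsvg-convert -o $output $input"
values = {"$input": "My$output.svg", "$output": "/tmp/out.png"}
broken = substitute_naive(template, values)
safe = substitute_once(template, values)
```

Here broken ends with "My/tmp/out.png.svg" because the filename itself was rewritten, while safe keeps "My$output.svg" intact. Shell quoting of the result is a separate concern.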

The code should be written so either librsvg or resvg can be used.

We should choose one of the renderers. JoKalliauer should have good insight into that choice.

Thanks for that roundup, @Glrx! There's a lot that has happened in the SVG world since I dared to call myself an expert on the subject.

Who is the expert on the SVG sanitation code in MediaWiki core these days, and when was the last time they made a serious change to that part of the code? My hunch/fear is that the answer is "no one has seriously looked at that code in a while", and I'm guessing that's the biggest bottleneck.

Your hunch/fear would be correct.

I would like to note that this can all easily be implemented for non-WMF wikis, if someone just spent some time adapting SVGHandler (or created an extension to override SVGHandler).

It just CANNOT easily go to WMF production any time soon because of security reviews, the Thumbor plugins that would have to be written, and the fact that the Thumbor install itself is stuck on old systems; updating all of those is something for which there is currently no WMF budget.

I am… getting impatient enough to ask: how hard is it to, really, just make our own statically-compiled rsvg-convert binary into a deb package and then deploy it? I mean:

  • Rust already builds binaries with rust stuff statically linked in.
  • rustup is available for getting us an installation of rust without going through debian, and without interfering with anything stored in a prefix. Only thing that could stop rustup is the glibc version, but even then we could just build it on a newer distro and do static-crt.
  • System C deps for librsvg feel… reasonably conservative? I am not ruling out the possibility that it’s too new though.
  • deb packages are easily assembled from a DESTDIR structure with dpkg-buildpackage.

We can get this as a stop-gap measure *while* we talk about what else to switch to. The surface for any security review would be minimal compared to anything that requires adding a layer of adaptation to the PHP side (hopefully we just do the language code change).

I am… getting impatient enough to ask: how hard is it to, really, just make our own statically-compiled rsvg-convert binary into a deb package and then deploy it?

Discussion on upgrading librsvg should go to T265549. It's too hard to follow this task when multiple proposals are discussed in parallel.

In a recent discussion in WikiProject Mathematics yet another rendering bug was encountered. Several users expressed the sentiment that SVG support in MediaWiki will never get fixed, and that it is better to give up on SVGs altogether and revert to PNGs. This is especially frustrating because browsers can render the SVG correctly, but MediaWiki insists on passing it through librsvg and serving the resulting garbage instead.

Mathoid stopped serving PNG images quite a while ago, and there were no complaints, even though the SVG images from Mathoid are quite special. Thus, we have one more data point that browsers' SVG support is quite good.

In a recent discussion in WikiProject Mathematics yet another rendering bug was encountered. Several users expressed the sentiment that SVG support in MediaWiki will never get fixed, and that it is better to give up on SVGs altogether and revert to PNGs. This is especially frustrating because browsers can render the SVG correctly, but MediaWiki insists on passing it through librsvg and serving the resulting garbage instead.

We have the same problem with music notation and the Score extension: I have people who remove my LilyPond snippets (which could be SVG, but currently produce bad PNGs) and replace them with some JPEG garbage from Sibelius or MuseScore.

My recollection of why we don't serve user-submitted SVGs directly as thumbnails is that the last time this was looked at there was no robust and up-to-date FLOSS SVG sanitiser that could ensure that the SVGs were safe to display directly in the browser.

XML is notoriously hard to sanitise and there are new tricks invented regularly to bypass sanitisation. Essentially, we don't want to deal with the possibility of a badly intentioned actor being able to inject a tracking URL inside an SVG that would let them collect IP addresses of anyone viewing that image in an article, run some arbitrary javascript, or worse, being able to leverage a browser security flaw in SVG parsing.

Furthermore, we would still need to have fallbacks for browsers that either don't render SVG natively or do a terrible job at it.

Thank you for this comment @Gilles , and thanks to @Glrx and others for their replies.

I don't think that browser security flaws in SVG parsing are something we should consider (if there are any, Wikimedia would merely increase the exposure of users).

As for malicious JavaScript and tracking URLs, if such risks exist, they are far from specific to Wikimedia, and I very much hope the Web platform allows including an SVG file in a way that prevents them (perhaps with a "restricted" profile which is a bit like disabling JavaScript, but just for the image). If such risks are unavoidable (without some indeed fragile sanitization/validation process), I encourage mentioning them in T5593, where that concern would be highly relevant (even if it is relevant here too).
Note that @Gilles's account is disabled, so I do not expect a follow-up on his part.

there was no robust and up-to-date FLOSS SVG sanitiser that could ensure that the SVGs were safe to display directly in the browser.

DOMPurify exists now and meets that criterion imho. However, that is actually beside the point, since SVGs in <img> tags do not execute JavaScript or load external resources, so they are safe (embedding in an iframe/object is more risky, but probably not any more than the status quo, and I don't think that is wanted anyway. The only really risky thing here would be to directly embed the SVG tags in the HTML page, which I don't think anyone is suggesting). Browser 0-days are of course always a risk, but I think it's one we should just accept in this context as too unlikely to worry about. [This is just my personal opinion. Obviously, for this to move forward, the Wikimedia security team would have to approve whatever the plan is.]

Imo, the real blocker here is twofold:

  • inconsistent rendering between different browsers (esp. fonts) and between browsers and what users expect from librsvg. To a very minor extent, our support for i18n SVGs might also be problematic.
  • some very large SVGs where there would be a performance impact (e.g. maps that are 25 MB big)

All of these seem solvable if the desire is there (probably in the form of a system that serves some SVGs directly and others as PNG depending on the nature of the file in question). It's just a matter of deciding which trade-offs to take.

there was no robust and up-to-date FLOSS SVG sanitiser that could ensure that the SVGs were safe to display directly in the browser.

DOMPurify exists now and meets that criterion imho. However, that is actually beside the point, since SVGs in <img> tags do not execute JavaScript or load external resources, so they are safe (embedding in an iframe/object is more risky, but probably not any more than the status quo, and I don't think that is wanted anyway. The only really risky thing here would be to directly embed the SVG tags in the HTML page, which I don't think anyone is suggesting). […]

Thank you very much Bryan, I had no idea. If you are effectively saying that an SVG rasterizer yields better results on files which contain JavaScript than client-side rendering of the same file via <img>, please highlight that significant concern in T5593.

Imo, the real blocker here is twofold:

  • inconsistent rendering between different browsers (esp. fonts) and between browsers and what users expect from librsvg. To a very minor extent, our support for i18n SVGs might also be problematic.
  • some very large SVGs where there would be a performance impact (e.g. maps that are 25 MB big)

All of these seem solvable if the desire is there (probably in the form of a system that serves some SVGs directly and others as PNG depending on the nature of the file in question). It's just a matter of deciding which trade-offs to take.

I think in practice migrating to client-side rendering will need to be gradual, so <img srcset> is unavoidable at least temporarily. And srcset largely answers the first issue, by letting users (usually via their browser vendor) decide whether they prefer SVG or bitmaps.

As for large files, it seems the browser would be again best positioned to decide. However, srcset visibly wasn't designed for SVG and can't express an SVG file's size. So the client would have to make at least 2 HEAD requests to compare options. The bandwidth required would be minimal, but it would add 1 round-trip of latency.

If you are effectively saying that an SVG rasterizer yields better results on files which contain JavaScript than client-side rendering of the same file via <img>, please highlight that significant concern in T5593.

No, it would be mostly the same. Javascript is disabled in both contexts.

I think in practice migrating to client-side rendering will need to be gradual, so <img srcset> is unavoidable at least temporarily. And srcset largely answers the first issue, by letting users (usually via their browser vendor) decide whether they prefer SVG or bitmaps.

Srcset doesn't really have a browser preference associated with it (beyond your zoom level). Normally in that context, srcset is about support for browsers that do not support SVG. All browsers since 2009 support SVG to some extent, so that is not super relevant in a modern context. I suppose srcset could still be used just in case.

As for large files, it seems the browser would be again best positioned to decide. However, srcset visibly wasn't designed for SVG and can't express an SVG file's size. So the client would have to make at least 2 HEAD requests to compare options. The bandwidth required would be minimal, but it would add 1 round-trip of latency.

If implemented, I suspect we would just implement it on the server side.

There are many issues with WMF's support of SVG.

Many people want some client side rendering of SVG. T5593 T208578 I would like to see small files (say < 40 kB) directly served.

I'm not an expert, but I think that change would be localized to Thumbor. If Thumbor is asked to rasterize an SVG file, it can notice the file is small and then serve it directly. If Thumbor sets the MIME type, then I think the img element will display it properly. But it also butchers the current semantics. A URL that formerly always gave a PNG file now might give an SVG file. Some OCR code I use will not take SVG but will take PNG; I use something like {{filepath:foo.svg|800}} to get a PNG. Maybe add something to the URL that requires a PNG or obey HTTP requests that ask only for a PNG MIME type.
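To make the idea above a bit more concrete, here is a rough sketch of the decision it describes. It is only an illustration: `choose_format`, `SMALL_SVG_LIMIT` and the `force_png` flag are made-up names, not anything that exists in Thumbor or MediaWiki, and the Accept header parsing is deliberately simplistic (it ignores q-values).

```python
# Hypothetical sketch only: none of these names exist in Thumbor or MediaWiki.

SMALL_SVG_LIMIT = 40 * 1000  # bytes; the "< 40 kB" figure suggested above


def choose_format(svg_size_bytes: int, accept_header: str = "*/*",
                  force_png: bool = False) -> str:
    """Return the MIME type to serve for a thumbnail request of an SVG file."""
    # Collect the media types named in the Accept header, dropping parameters.
    accepted = {part.split(";")[0].strip().lower()
                for part in accept_header.split(",")}
    client_takes_svg = bool(accepted & {"image/svg+xml", "image/*", "*/*"})

    # A caller that insists on PNG (URL flag or PNG-only Accept header)
    # keeps the current semantics.
    if force_png or not client_takes_svg:
        return "image/png"
    # Small files are served as-is, everything else is rasterized.
    if svg_size_bytes < SMALL_SVG_LIMIT:
        return "image/svg+xml"
    return "image/png"
```

With this shape, an OCR tool that sends `Accept: image/png` (or sets the hypothetical URL flag) would keep getting PNGs, while a browser sending `*/*` would get the original SVG for small files.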

In general, modern browsers have better SVG support than librsvg on WMF. Modern browsers support textPath but librsvg does not. Recent librsvg has better support for hyphenated langtags, but WMF does not engage that ability. See https://commons.wikimedia.org/w/index.php?lang=en-us&title=File%3ASystem_language_attribute_bug_demo.svg . IIRC, the current librsvg does not support textLength. Some browsers are weaker than librsvg; IIRC, Firefox still does not support the common idioms for sub- and superscripts. (For that matter, SVG has a poor concept of nesting graphics state; superscripts on superscripts do not work in SVG but work in HTML.)

WMF's language semantics is different from SVG language semantics. If I view a page on the Italian wiki, then WMF will render multilingual SVG files in Italian. A browser will render the multilingual SVG in the user's preferred language. (Sigh. Today I noticed that Chrome no longer implements language preferences correctly). WMF needs to either accept the semantic variation (not a big problem) or serve localized SVG files (add new code).

An astute Thumbor can notice the /langit-300px in a URL (Thumbor already extracts the lang variable) and serve a localized Italian SVG file. If it serves the SVG directly, it might issue a redirect, but that might cause some other troubles. I do not want a mobile user downloading a 50 MB image file unexpectedly.

In the long run, WMF should adopt a different approach for multilingual SVG files. AFAIK, the only graphic editor that can handle them is Inkscape. Even then, people need to be skilled in editing such files. I've seen multilingual files where graphics elements are embedded in what should be a text-only switch element.

I do not see the font substitution problem as a big issue. There are many issues. Adobe tools encode the font weight and style in the font family name. Many font substitution systems do not handle condensed fonts. Many designers do not use fallback fonts. I think all those issues ultimately fall to the graphic designer. Graphic designers who insist on using their favorite fonts should expect rendering surprises. I do not expect many machines to have a font for an ancient Indian script. That problem can be surmounted by asking users to download the required font. WebFonts are a solution, but they invite tracking and other headaches.

There are other subtle issues. An img tag turns off JavaScript and clicks, but it does not turn off animation. I see that as a good thing because then animated SVGs could replace animated GIFs. However, animations can trigger seizures, so some method of suppression is in order.

Another benefit of serving PNG instead of SVG is some protection against attack. I can make a small SVG file that consumes a lot of computational resources. The rasterizer gets around that with a clock time limit. I'm not sure that we can expect such limits in a browser. A good practice might be to not serve an SVG unless it passes a server-side rendering test.
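A server-side rendering test with a clock time limit could look roughly like the sketch below. This is an assumption about how such a check might be wired up, not a description of what Thumbor does today: `renders_within_limit` is a made-up helper, and the limit is wall-clock time rather than CPU time.

```python
import subprocess


def renders_within_limit(command: list[str], limit_seconds: float) -> bool:
    """Run a rasterizer command and report whether it succeeded in time."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False  # too expensive to render: do not serve as SVG
    except OSError:
        return False  # rasterizer not installed or not executable
    return result.returncode == 0
```

For example, `renders_within_limit(["rsvg-convert", "-o", "out.png", "file.svg"], 5.0)` would flag a file that librsvg cannot rasterize within five seconds (the file names and the five-second figure are placeholders). Passing this test says nothing about how a given browser will cope, but it would at least filter out the obvious resource bombs.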

All in all, I think WMF should serve small SVG files. The database should have a flag that prevents serving files that are identified as having issues.

If you are effectively saying that an SVG rasterizer yields better results on files which contain JavaScript than client-side rendering of the same file via <img>, please highlight that significant concern in T5593.

No, it would be mostly the same. Javascript is disabled in both contexts.

Ah, thanks, I see that according to @JoKalliauer uploads with JavaScript are actually disabled, which makes a lot more sense than ignoring it.

I think in practice migrating to client-side rendering will need to be gradual, so <img srcset> is unavoidable at least temporarily. And srcset largely answers the first issue, by letting users (usually via their browser vendor) decide whether they prefer SVG or bitmaps.

Srcset doesn't really have a browser preference associated with it (beyond your zoom level). Normally in that context, srcset is about support for browsers that do not support SVG. All browsers since 2009 support SVG to some extent, so that is not super relevant in a modern context. I suppose srcset could still be used just in case.

I'm sorry that my comment was unclear. What I meant to say is that srcset allows the client to choose between options. Indeed, I am not aware of any browser which lets users set how to deal with such a choice (at the very least without extensions). However, users can choose which browser they use, and they are technically able to make their own browser, so in theory, users can already decide. My point is that the client is best aware of the context (resolution / use case, user agent SVG rendering ability, vision accuracy and bandwidth cost) and we should not impose a choice server-side just for compatibility reasons. If the user knows their client can render the SVG perfectly, then it's best to get the SVG regardless of how other clients display it.

As for large files, it seems the browser would be again best positioned to decide. However, srcset visibly wasn't designed for SVG and can't express an SVG file's size. So the client would have to make at least 2 HEAD requests to compare options. The bandwidth required would be minimal, but it would add 1 round-trip of latency.

If implemented, I suspect we would just implement it on the server side.

Again, I was not suggesting we implement that either server-side or client-side. This is a general issue with the Web platform which is best addressed by user agents.

According to Grafana, eqiad and codfw each get an average of 0.8 queries for new SVGs per second, with spikes up to 4 qps. More than 75% of those requests are handled using 575ms of CPU time on average. For context, there are 8.4 requests per second to eqiad and codfw for filetypes handled by imagemagick, including SVGs, which use 2-4s of CPU time.

Thanks @AntiCompositeNumber but that link doesn't work anymore. I guess the same data can now be obtained from https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor?orgId=1&from=now-1h&to=now&refresh=1m&var-quantile=0.75&var-engine=svg, but units are not specified and I don't know how to interpret the graphs. Can you at least tell us which graphs convey that information?

Re glrx:

I'm not an expert, but I think that change would be localized to Thumbor. If Thumbor is asked to rasterize an SVG file, it can notice the file is small and then serve it directly. If Thumbor sets the MIME type, then I think the img element will display it properly. But it also butchers the current semantics. A URL that formerly always gave a PNG file now might give an SVG file. Some OCR code I use will not take SVG but will take PNG; I use something like {{filepath:foo.svg|800}} to get a PNG. Maybe add something to the URL that requires a PNG or obey HTTP requests that ask only for a PNG MIME type.

While this is an option, I actually think it's better to not have Thumbor involved at all. I think it would be better to just make files with systemLanguage attributes always be rasterized (at least in the beginning). For other files I think we should treat it similarly to JPEGs, where sometimes we thumbnail and sometimes we send the original asset.
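Detecting such files is cheap. A minimal sketch, assuming the SVG is already available as a string and using a made-up helper name (`uses_system_language`); a real implementation would want a parser hardened against malicious XML rather than plain ElementTree:

```python
import xml.etree.ElementTree as ET


def uses_system_language(svg_text: str) -> bool:
    """Return True if any element in the SVG carries a systemLanguage attribute."""
    root = ET.fromstring(svg_text)
    # systemLanguage is an un-namespaced attribute, so the plain name matches.
    return any("systemLanguage" in element.attrib for element in root.iter())
```

Files for which this returns True would keep going through the rasterizer, and everything else would be a candidate for being sent as the original asset.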

Another benefit of serving PNG instead of SVG is some protection against attack. I can make a small SVG file that consumes a lot of computational resources. The rasterizer gets around that with a clock time limit. I'm not sure that we can expect such limits in a browser.

I think some real-world testing might be in order, but I honestly wouldn't worry too much about that unless browser behavior is fully terrible. I see this more like ordinary vandalism: you can already sort of do this with large GIFs. I don't think it's that different from people putting attack images on a page.

While this is an option, I actually think it's better to not have Thumbor involved at all. I think it would be better to just make files with systemLanguage attributes always be rasterized (at least in the beginning). For other files I think we should treat it similarly to JPEGs, where sometimes we thumbnail and sometimes we send the original asset.

Agreed, it's much better for MediaWiki to make the decision about whether a SVG should be rendered client-side (either directly or through some sort of transformation mechanism for consistent language handling) or server-side.

According to Grafana, eqiad and codfw each get an average of 0.8 queries for new SVGs per second, with spikes up to 4 qps. More than 75% of those requests are handled using 575ms of CPU time on average. For context, there are 8.4 requests per second to eqiad and codfw for filetypes handled by imagemagick, including SVGs, which use 2-4s of CPU time.

Thanks @AntiCompositeNumber but that link doesn't work anymore. I guess the same data can now be obtained from https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor?orgId=1&from=now-1h&to=now&refresh=1m&var-quantile=0.75&var-engine=svg, but units are not specified and I don't know how to interpret the graphs. Can you at least tell us which graphs convey that information?

The qps codfw k8s and qps eqiad k8s graphs contain queries per second data for the current primary and secondary datacenter, respectively. Thumbor doesn't unconditionally echo requests from the primary to the secondary datacenter anymore, so the primary datacenter has a higher load. The CPU time for >75% of requests is recorded in the Processing CPU time graphs.
The numbers have all gone slightly up since 2020.

As a note, both the MediaWiki thumbnailing code and Thumbor are unstewarded and unmaintained. There are some developers willing to review simple patches, but large changes like the implementation of client-side arbitrary SVG rendering will require WMF support that currently does not exist.

Re glrx:

I'm not an expert, but I think that change would be localized to Thumbor. If Thumbor is asked to rasterize an SVG file, it can notice the file is small and then serve it directly. If Thumbor sets the MIME type, then I think the img element will display it properly. But it also butchers the current semantics. A URL that formerly always gave a PNG file now might give an SVG file. Some OCR code I use will not take SVG but will take PNG; I use something like {{filepath:foo.svg|800}} to get a PNG. Maybe add something to the URL that requires a PNG or obey HTTP requests that ask only for a PNG MIME type.

While this is an option, I actually think it's better to not have Thumbor involved at all. I think it would be better to just make files with systemLanguage attributes always be rasterized (at least in the beginning). For other files I think we should treat it similarly to JPEGs, where sometimes we thumbnail and sometimes we send the original asset.

Say the wiki page selects either the SVG or PNG rendering based on size. Say it is a small SVG file and SVG is selected. Now somebody comes along and uploads a 20 MB SVG image on top of the original, small SVG. That would mean all the pages that reference that SVG file need to be rebuilt even though the aspect ratio did not change. Alternatively, the fetch of the overweight SVG should be turned into a PNG fetch. Maybe page rebuilds are not expensive, but some SVG files are used on a lot of pages.

The qps codfw k8s and qps eqiad k8s graphs contain queries per second data for the current primary and secondary datacenter, respectively. Thumbor doesn't unconditionally echo requests from the primary to the secondary datacenter anymore, so the primary datacenter has a higher load. The CPU time for >75% of requests is recorded in the Processing CPU time graphs.
The numbers have all gone slightly up since 2020.

Oh, so "qps" means Queries per second... thanks AntiComposite, that was not particularly obvious.

So here are the graphs showing each server's SVG rasterization rate for the previous week:

I have no idea what Rendering Engine Quantile means, but the value doesn't seem to influence statistics. eqiad averages 1.41 SVG rasterizations/s over the previous week and codfw 2.18, for a total of 3.59 rasterizations per second.

Say the wiki page selects either the SVG or PNG rendering based on size. Say it is a small SVG file and SVG is selected. Now somebody comes along and uploads a 20 MB SVG image on top of the original, small SVG. That would mean all the pages that reference that SVG file need to be rebuilt even though the aspect ratio did not change. Alternatively, the fetch of the overweight SVG should be turned into a PNG fetch. Maybe page rebuilds are not expensive, but some SVG files are used on a lot of pages.

We already do this rebuild step anyway, even though it's not necessary currently (except in the InstantCommons case).

I note that some of the problems (file size and translations) with Native SVG rendering are already 'recognized' by:

which would keep some of them on png by thumbor

The outstanding issues I see are:

  • CSP for the original files (stuck on operations figuring out what they want)
  • differences due to client side (font) rendering (could be tackled by adding a render flag to the image syntax, similar to the lossy keyword for tiff).
  • user testing and rollout (the most expensive part most likely. People should not forget that even though there are many errors, introducing changes generally introduces tons of NEW errors. Dealing and or waiting for people to know and accept the limitations of the new solution is VERY expensive and requires a lot of support for a considerable time)
  • wmf making people available to work on things like this

I have no idea what Rendering Engine Quantile means, but the value doesn't seem to influence statistics. eqiad averages 1.41 SVG rasterizations/s over the previous week and codfw 2.18, for a total of 3.59 rasterizations per second.

It's a variable (with 4 set values) and it is used in only a subset of the panels in that dashboard. It isn't used in the ones you linked to directly, but it is used in the ones under the "Engines" row and the "Swift thumb storage" row. You can tell by the fact that those graphs have a (pXX) value in the title. The title changes as you choose different quantiles.

The specifics vary a bit by panel, but at a high level it answers a question like "If in a given time period we pick the response times of all requests and we linearly order them from lowest to highest, which value sits at the X% spot?" A different phrasing would be "What is the latency value by which X% of our requests will have completed?"

Again at a very high level (nothing specific to Thumbor, although it applies), the reason behind this is that overall min/max/avg doesn't tell us much for operating a service. Minimum and average response times can be great, but 5% of all requests (and thus users) might be having a horrible experience, with everything very slow for them. Max is also misleading because it represents pathological cases that are often outside of our control. Quantiles allow us to see that experience and figure out ways to improve if needed.
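For anyone who prefers to see it in code, here is a small illustration of the "order the values and pick the one at the X% spot" idea. The latencies are invented for the example and have nothing to do with real Thumbor data, and the nearest-rank rule used here is only one of several common quantile definitions (Grafana or Prometheus may interpolate differently).

```python
import math


def quantile(values, fraction):
    """Nearest-rank quantile: the smallest value with at least `fraction` of values at or below it."""
    if not values:
        raise ValueError("quantile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# Invented response times in milliseconds: mostly fast, two very slow requests.
latencies_ms = [120, 130, 150, 160, 180, 200, 240, 300, 900, 4000]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = quantile(latencies_ms, 0.50)
p75 = quantile(latencies_ms, 0.75)
p95 = quantile(latencies_ms, 0.95)
```

In this made-up sample the average is dragged up by the two slow requests and describes nobody's actual experience, while p50 and p75 show what most requests see and p95 exposes the slow tail.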

What are the exact criteria to evaluate against, in order to get this ticket fixed? Currently this sounds unfixable due to vagueness.

This question remains unanswered more than a decade later, and given that the reporter's account (@MZMcBride) is disabled, it will likely remain so. The request/task is still very vague.

Reevaluating tools/libraries as new options emerge is an ongoing task, for which a task ticket makes little sense. An actionable/useful request would need somewhat concrete proposals. Having gone through all of this ticket's history, I see no serious proposals besides substitution with resvg and/or client-side rendering (T5593).

Does someone oppose either:

  • reframing this ticket as a request to consider reducing (possibly stopping) usage of librsvg by favoring resvg and/or client-side rendering?
  • or just marking this request as declined?

BTW, thanks for your replies @TheDJ and @akosiaris

I went ahead and proceeded to reframe this as a more specific request. I must clarify this does not mean I oppose marking this as a declined request. Or even as a processed request, since if this is an RFC as the title claims, this has already managed to gather its fair share of comments.

My main goal in this update/modification was precisely to summarize all of these comments and the current situation so that contributors who stumble on this ticket don't have to spend hours reading comments. I am far from an SVG expert and don't claim to have properly compiled all of the relevant information, so be welcome to improve the description if you think I neglected something important.

I'd like to use this opportunity to thank everyone who commented here, and in particular @RazrFalcon , @JoKalliauer, @Aklapper and @Isarra.🙏

Do we have access to data showing the average time (CPU) librsvg consumes for each request?

but resvg's loss of Yevhenii Reizner, who had been doing about 9/10 of the work from the beginning, is a serious threat to its future

Funnily enough, when I started working on resvg around 2017, librsvg had been dead/abandoned for more than a decade, which was one of the reasons behind resvg's creation.
And like a year later, 2018ish, librsvg authors started a rewrite to Rust, which revitalized the library.

But yeah, there will be no commits from me in the near future. Maybe ever.

is not in Debian (except for an excessively old version)

I've heard there is some work being done here. Who knows, maybe it will be in Debian one day. It doesn't depend on me.

does not claim to be ready (it remains at version 0 as of October 2024)

This version doesn't mean anything. It is as ready as it can be. In fact, almost all SVG 1 features are already implemented and we have a vast test suite to guarantee correctness and stability.
What this version means is that there are constant API breaks, but I doubt it will affect you.

Honestly, I would say that resvg, as a SVG 1 renderer, is finished. As for SVG 2 - no one supports it anyways. Some bits here and there, but it's still a minefield if you try using it across browsers. Heck, Safari doesn't even support mix-blend-mode, which is like one of the most basic SVG 2 features. No one supports new SVG 2 text layout as well. And so on. SVG 2 support would take forever, including browsers. It's basically Chrome or nothing.

As for serving SVG as is, I'm not a web-dev. I don't know anything about security risks and so on. All I can say is that SVG works best in Chrome. Meaning that if you think you can serve SVG as is and everyone would see the same "image" - this would not happen. Especially if text is involved. On the other hand, 90+% of the users probably use Chrome anyway.

but resvg's loss of Yevhenii Reizner, who had been doing about 9/10 of the work from the beginning, is a serious threat to its future

Funnily enough, when I started working on resvg around 2017, librsvg had been dead/abandoned for more than a decade, which was one of the reasons behind resvg's creation.
And like a year later, 2018ish, librsvg authors started a rewrite to Rust, which revitalized the library.

That's an optimistic way to put it. I would see it as a sad coincidence, without which the right course of action would surely be obvious.😞 To be exact, librsvg was adopted in 2015 and the rewrite started in October 2016.

But yeah, there will be no commits from me in the near future. Maybe ever.

Thank you very much for your highly valuable transparency

is not in Debian (except for an excessively old version)

I've heard there is some work being done here. Who knows, maybe it will be in Debian one day. It doesn't depend on me.

A request to fully remove it was filed a month ago. But thanks, I can see that Jonas Smedegaard, a veteran Debian developer, indeed adopted resvg a few weeks ago.😀

does not claim to be ready (it remains at version 0 as of October 2024)

This version doesn't mean anything. It is as ready as it can be. In fact, almost all SVG 1 features are already implemented and we have a vast test suite to guarantee correctness and stability.
What this version means is that there are constant API breaks, but I doubt it will affect you.

Honestly, I would say that resvg, as a SVG 1 renderer, is finished. As for SVG 2 - no one supports it anyways. Some bits here and there, but it's still a minefield if you try using it across browsers. Heck, Safari doesn't even support mix-blend-mode, which is like one of the most basic SVG 2 features. No one supports new SVG 2 text layout as well. And so on. SVG 2 support would take forever, including browsers. It's basically Chrome or nothing.

Thank you very much

As for serving SVG as is, I'm not a web-dev. I don't know anything about security risks and so on. All I can say is that SVG works best in Chrome. Meaning that if you think you can serve SVG as is and everyone would see the same "image" - this would not happen. Especially if text is involved. On the other hand, 90+% of the users probably use Chrome anyway.

According to StatCounter, the market share of non-Blink-based browsers is between 16 and 18% on the desktop. Less than 1% (total) of these are old versions deprecated in favor of Blink-based browsers, but I am not aware of any plans for Firefox or Safari to switch to Blink.

I also wonder if we're trying to do too much. It's already evident we don't have enough engineering resources to move some of these tickets. If the goal is to serve SVG files to be rendered in the client, then we should probably stick to SVG 1, and if there are weird text issues, then it is incumbent on the uploader to fix them: use a common font, convert text to shapes, etc. I'm just a punter, not an employee, so maybe I'm wrong, but it seems to me that the overall scope is ultimately to provide vector diagrams to illustrate an encyclopædia. Maybe we don't need to get too fancy :) It sure would be nice to have SVG music snippets (T49578), for example, some time this decade.