FileRepo::getInfo() information is very different depending on the subclass in use for each repo
Open, Low, Public

Description

For example, ForeignAPIRepo includes articlepath and server while other foreign repos don't, and these URLs may be useful when interacting with the foreign repositories.

Event Timeline

XZise raised the priority of this task from to Needs Triage.
XZise updated the task description.
XZise changed Security from none to None.
XZise subscribed.

I see no evidence that any properties have been removed from meta=filerepoinfo. The 'articlepath' and 'server' properties are still present for ForeignAPIRepo repos, and do not appear to ever have been present for any other repo types.

See

http://mapping.referata.com/w/api.php?action=query&format=jsonfm&meta=filerepoinfo&friprop=name%7Cdisplayname|rootUrl|local|apiurl|articlepath|server

{
    "query": {
        "repos": [
            {
                "name": "shared",
                "displayname": "a shared repository",
                "rootUrl": "/w/images",
                "local": false,
                "apiurl": "http://commons.wikimedia.org/w/api.php",
                "articlepath": "/wiki/$1",
                "server": "//commons.wikimedia.org"
            },
            {
                "name": "local",
                "displayname": null,
                "rootUrl": "/w/images",
                "local": true
            }
        ]
    }
}

vs

http://en.wikipedia.org/w/api.php?action=query&meta=filerepoinfo&friprop=name%7Cdisplayname|rootUrl|local|apiurl|articlepath|server

{
    "warnings": {
        "filerepoinfo": {
            "*": "Unrecognized values for parameter 'friprop': apiurl, articlepath, server"
        },
        "query": {
            "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
        }
    },
    "query": {
        "repos": [
            {
                "name": "shared",
                "displayname": "Wikimedia Commons",
                "rootUrl": "//upload.wikimedia.org/wikipedia/commons",
                "local": false
            },
            {
                "name": "local",
                "displayname": "Wikipedia",
                "rootUrl": "//upload.wikimedia.org/wikipedia/en",
                "local": true
            }
        ]
    }
}

Maybe it is something special about WMF projects that is causing filerepoinfo to not work correctly?

Well articlepath and server are not recognized (as seen in @jayvdb's comment above). And apart from that it's hard to find a commonality. For example the rootUrl on the wikipedia server is //upload.wikimedia.org/wikipedia/commons which can be worked with, but on referata.com it's /w/images. And well another example is wikitravel which uses /upload/en.

"repos": [
    {
        "name": "shared",
        "displayname": "a shared repository",
        "rootUrl": "http://wikitravel.org/upload/shared/",
        "local": false
    },
    {
        "name": "wikimediacommons",
        "displayname": "'''[https://commons.wikimedia.org/wiki/Main_Page Wikimedia Commons]'''",
        "rootUrl": "/upload/en",
        "local": false,
        "apiurl": "http://commons.wikimedia.org/w/api.php",
        "articlepath": "/wiki/$1",
        "server": "//commons.wikimedia.org"
    },
    {
        "name": "local",
        "displayname": null,
        "rootUrl": "/upload/en",
        "local": true
    }
]

Then articlepath and server aren't always present on non-local repositories, as seen in wikitravel's example for the repo “shared”. This is even worse: except for rootUrl there is no distinctive feature which would allow a bot to identify the repo and detect whether another wiki is using it. The only other properties aren't unique either, as seen in the examples here: the name “shared” is Commons on Wikipedia, so it is not unique, and displayname is “Wikimedia Commons” on enwp but “a shared repository” on referata.

> Maybe it is something special about WMF projects that is causing filerepoinfo to not work correctly?

filerepoinfo is working correctly. The difference is that the referata.com site is using ForeignAPIRepo while WMF wikis use ForeignDBViaLBRepo, and the two classes supply different information.
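As a rough illustration of that difference from a client's point of view, here is a minimal sketch that guesses the kind of repo from the keys present in a filerepoinfo entry. The helper name `guess_repo_kind` and its return labels are hypothetical; the sample entries are copied from the output quoted above, and the sketch assumes `local` arrives as a JSON boolean as shown there. It cannot tell ForeignDBRepo from ForeignDBViaLBRepo.

```python
# Repo entries copied from the filerepoinfo responses quoted in this task.
REFERATA_SHARED = {
    "name": "shared",
    "displayname": "a shared repository",
    "rootUrl": "/w/images",
    "local": False,
    "apiurl": "http://commons.wikimedia.org/w/api.php",
    "articlepath": "/wiki/$1",
    "server": "//commons.wikimedia.org",
}
ENWP_SHARED = {
    "name": "shared",
    "displayname": "Wikimedia Commons",
    "rootUrl": "//upload.wikimedia.org/wikipedia/commons",
    "local": False,
}
ENWP_LOCAL = {
    "name": "local",
    "displayname": "Wikipedia",
    "rootUrl": "//upload.wikimedia.org/wikipedia/en",
    "local": True,
}


def guess_repo_kind(repo):
    """Heuristic guess at the repo type (hypothetical helper, not an API)."""
    if repo.get("local"):
        return "local"
    # Only the API-based foreign repo reports apiurl in the quoted examples.
    if "apiurl" in repo:
        return "foreign-api"
    # Anything else: probably a shared-DB foreign repo, but that is a guess.
    return "foreign-other"


for entry in (REFERATA_SHARED, ENWP_SHARED, ENWP_LOCAL):
    print(entry["name"], entry["displayname"], guess_repo_kind(entry))
```

Note that both foreign entries are named "shared", so the name alone says nothing about which class is behind them.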

> For example the rootUrl on the wikipedia server is //upload.wikimedia.org/wikipedia/commons which can be worked with, but on referata.com it's /w/images. And well another example is wikitravel which uses /upload/en.

That depends on how the repo is configured on each wiki.

> Then articlepath and server aren't always present on non-local repositories as seen in wikitravel's example for the repo “shared”. This is even worse:

Presumably the "shared" repo there is using ForeignDBRepo or ForeignDBViaLBRepo, while the "wikimediacommons" repo is using ForeignAPIRepo.

> The only other properties aren't unique either, as seen in the examples here: the name “shared” is Commons on Wikipedia, so it is not unique, and displayname is “Wikimedia Commons” on enwp but “a shared repository” on referata.

The name used also entirely depends on how the repos are configured on each wiki.

Instead of focusing on the fact that things are different without any context, you'd probably be better served by trying to work out which bits of information are possible to standardize across the different FileRepo subclasses based on the configuration options each receives.

Anomie renamed this task from "Removal of 'articlepath' and 'server' from filerepoinfo" to "FileRepo::getInfo() information is very different depending on the subclass in use for each repo". Dec 23 2014, 2:52 PM
Anomie triaged this task as Low priority.
Anomie updated the task description.

Hmm, okay. From a bot standpoint it's pretty hard to determine which site a repo is actually using. Knowing that would be helpful, for example, to determine a username and to get information about the file there.

I know it's probably hard, because how would you identify such a site? But it would save quite a bit of manual management on our side, as we wouldn't need to record that 'shared' on en.wikipedia.org is actually 'commons'.

So to rephrase the task in positive terms, you want to be able to retrieve an API endpoint for the site to which the file repo is local?

@Tgr, yes, that is about it.

pywikibot currently has two main site concepts: 'BaseSite' and subclass 'APISite'. We have a FilePage class, and instances may be local or shared files. As an absolute minimum, we need to be able to:

  1. determine if a FilePage is for a local or non-local file, and
  2. get the URL of the original file, or at least fetch the original file

However it would be much nicer if we could obtain an APISite object for the site which hosts the actual file and a FilePage. e.g. if the FilePage for a shared file needs to be edited, it likely needs to be edited on the shared host, not on the local site.

So ideally we have a method like FilePage(enwp, 'File:Logo.png').get_shared() that returns a FilePage(commons, 'File:Logo.png'). And we need to do the same for any foreign repo.
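To make the intended behaviour concrete, here is a minimal sketch of such a method. The `Site` and `FilePage` classes below are hypothetical stand-ins, not pywikibot's real classes, and the sketch simply assumes the hosting site is already known (`shared_host`) and that locality is given as a flag; working out that host is exactly the open problem in this task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical stand-in for a pywikibot site; not the real APISite."""
    name: str
    # Site hosting this wiki's shared files, assumed to be known already.
    shared_host: Optional["Site"] = None


@dataclass(frozen=True)
class FilePage:
    """Hypothetical stand-in for pywikibot's FilePage."""
    site: Site
    title: str
    is_local: bool = True

    def get_shared(self) -> "FilePage":
        """Return the FilePage on the site that actually hosts the file."""
        if self.is_local:
            # A local file is already on its hosting site.
            return self
        if self.site.shared_host is None:
            raise LookupError(f"no known host for shared files of {self.site.name}")
        return FilePage(self.site.shared_host, self.title)


commons = Site("commons")
enwp = Site("enwp", shared_host=commons)
page = FilePage(enwp, "File:Logo.png", is_local=False)
print(page.get_shared())
```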

See T74847 - fileIsShared only works with Wikimedia and Wikitravel shared repository
and https://gerrit.wikimedia.org/r/#/c/181416/ - Get shared FilePage and make checks semi-dynamic

> [...] As an absolute minimum, we need to be able to:
>
>   1. determine if a FilePage is for a local or non-local file, and
>
> [...]
> and https://gerrit.wikimedia.org/r/#/c/181416/ - Get shared FilePage and make checks semi-dynamic

Well that is already possible (as seen in that patch): All repos return at least if they are local or not so that is dynamic.

The obvious problem is how to determine the site the file is on. Now on wikitravel the api returns server and articlepath which Pywikibot can already interpret because it needs to determine a site from the interwiki map. But of course there might be a better way.

Now one question I have is whether it is possible to use shared file repositories which aren't actually wiki sites. In that case our site implementation wouldn't understand those servers. There are, at least, many ways to add a remote file repository.

> Now one question I have is whether it is possible to use shared file repositories which aren't actually wiki sites. In that case our site implementation wouldn't understand those servers. There are, at least, many ways to add a remote file repository.

Theoretically yes, although I'm not aware of any implementations. All it would require would be someone implementing FileRepo and File subclasses that access the backend site via whatever interface it provides, following the model of ForeignAPIRepo and ForeignAPIFile.

> Theoretically yes, although I'm not aware of any implementations. All it would require would be someone implementing FileRepo and File subclasses that access the backend site via whatever interface it provides, following the model of ForeignAPIRepo and ForeignAPIFile.

In theory you could point a ForeignDBRepo to a DB + file backend which is not assigned to any wiki as a local repo, and for most purposes it would work (not sure if file description pages would), but in practice it is fairly pointless as you have no way of uploading files.

The wiki to which the repo is local could be private or have its API disabled though, that's perfectly legal; and from the point of view of the bot, it still raises the same problem.

Okay, but for each repo which uses normal pages, doesn't the server need to know how to get to the file (aka //commons.wikimedia.org/wiki/File:$1)? So at least all repo classes which can do that should be able to use the same property, which can be retrieved via JSON.

> Okay, but for each repo which uses normal pages, doesn't the server need to know how to get to the file (aka //commons.wikimedia.org/wiki/File:$1)? So at least all repo classes which can do that should be able to use the same property, which can be retrieved via JSON.

For local repos, you can use articlepath from the siteinfo API. For API-based foreign repos, articlepath and server are included in the filerepoinfo API output. For shared-DB foreign repos, you can use descBaseUrl.
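The three cases could be combined on the client roughly as follows. This is only a sketch under assumptions: the function name `description_page_url` is hypothetical, `general` stands for the `general` block of a siteinfo response (which carries `server` and `articlepath`), the file namespace is assumed to be called "File", and descBaseUrl is assumed to already end in the file namespace prefix (e.g. `.../wiki/File:`). Titles are not URL-encoded here.

```python
def description_page_url(repo, filename, general=None, file_ns="File"):
    """Build the description page URL for `filename` (no namespace prefix).

    Hypothetical helper: `repo` is one entry from meta=filerepoinfo and
    `general` is the siteinfo "general" block of the local wiki.
    Returns None when the entry does not carry enough information.
    """
    title = f"{file_ns}:{filename}"
    if repo.get("local"):
        # Local repo: articlepath comes from the siteinfo API.
        if general is None:
            return None
        return general["server"] + general["articlepath"].replace("$1", title)
    if "server" in repo and "articlepath" in repo:
        # API-based foreign repo: both values are in the filerepoinfo output.
        return repo["server"] + repo["articlepath"].replace("$1", title)
    if "descBaseUrl" in repo:
        # Shared-DB foreign repo: assumed to end in the namespace prefix.
        return repo["descBaseUrl"] + filename
    return None


api_repo = {"local": False, "server": "//commons.wikimedia.org",
            "articlepath": "/wiki/$1"}
print(description_page_url(api_repo, "Logo.png"))
```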

See http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&meta=filerepoinfo for an example which has all three kinds of repos and uses code from master.

It would be nicer to have a property which has the same name and semantics for all three, not sure if that's important enough to create duplicate properties though.

In T85153#947827, @Tgr wrote:

> The wiki to which the repo is local could be private or have its API disabled though, that's perfectly legal; and from the point of view of the bot, it still raises the same problem.

From a client perspective it is a different problem. Our problem is how to identify the wiki hosting the file and its real wiki page. Once we are able to identify the real host, we know what further options are available, e.g. whether we have configuration settings relevant to that wiki, such as usernames.

If that API URL is not accessible, or the user doesn't have the relevant permissions, they will be shown a reasonable error about that.

In T85153#964947, @Tgr wrote:

> For local repos, you can use articlepath from the siteinfo API. For API-based foreign repos, articlepath and server are included in the filerepoinfo API output. For shared-DB foreign repos, you can use descBaseUrl.
>
> See http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&meta=filerepoinfo for an example which has all three kinds of repos and uses code from master.
>
> It would be nicer to have a property which has the same name and semantics for all three, not sure if that's important enough to create duplicate properties though.

It looks like we could also use scriptDirUrl + '/api.php' for all foreign repos to obtain the API endpoint. That would be better, as it doesn't require stripping 'File:' from the URL, which may be difficult if File: was translated, or if the MW version was pre-1.14 and used Image:.
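A minimal sketch of that idea, assuming scriptDirUrl is present in the repo entry returned by filerepoinfo. The helper name `api_endpoint` and the sample scriptDirUrl value are hypothetical; falling back to apiurl when scriptDirUrl is missing is an added assumption, not something proposed above.

```python
def api_endpoint(repo):
    """Guess the API endpoint of the wiki behind a filerepoinfo entry.

    Hypothetical helper. Returns None if neither key is available.
    """
    script_dir = repo.get("scriptDirUrl")
    if script_dir:
        # Avoid a double slash if the configured value ends with "/".
        return script_dir.rstrip("/") + "/api.php"
    # Assumed fallback: API-based foreign repos report apiurl directly.
    return repo.get("apiurl")


# Sample value for illustration only; not taken from a real response.
print(api_endpoint({"scriptDirUrl": "//commons.wikimedia.org/w"}))
```

The endpoint may still be unreachable or private, as noted earlier in the discussion, so a client would need to handle that error separately.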