IA Upload
Component · Active · Public

Members (3)

Uzume (Uzume)
User
Samwilson (Sam Wilson)
Software Engineer (Community Tech) & volunteer
Tpt (Tpt)
User

Watchers (5)

Uzume (Uzume)
User
Hsarrazin (Hélène)
User
Inductiveload (Inductiveload)
User
Tpt (Tpt)
User
Samwilson (Sam Wilson)
Software Engineer (Community Tech) & volunteer

Details

Source Repo: https://github.com/wikisource/ia-upload/

Description

All issues pertaining to Wikisource's IA Upload tool:

Tool homepage: https://wikitech.wikimedia.org/wiki/Tool:IA_Upload
Tool: https://ia-upload.wmcloud.org/
Source code: https://github.com/wikisource/ia-upload/

Recent Activity

Thu, Nov 14

Uzume updated the task description for T336651: If Archive URL is collection of files it gets only 1 file.

Thu, Nov 14, 3:44 PM · Internet-Archive, IA Upload

Uzume added a comment to T336651: If Archive URL is collection of files it gets only 1 file.

I do not think we should be supporting Internet Archive collections per se; however, judging by the IA identifier, this is really about multiple scanned objects available at a single IA identifier. I am not sure we should support uploading more than a single item at a time to Commons, but we should provide a means to upload each scanned sub-object available at a single IA identifier.

Thu, Nov 14, 3:22 PM · Internet-Archive, IA Upload

Samwilson added a comment to T379402: Unable to retrieve IA metadata for any item.

Jason Scott has suggested that the IA could be blocking us, so I've emailed info@archive.org to see if there's anything that can be done. I wouldn't be surprised if the Toolforge IPs have been blocked, considering they must see somewhat higher traffic from them. It sounds like IA is still in recovery mode, so we should be patient.

Thu, Nov 14, 1:01 AM · IA Upload

Wed, Nov 13

Marnanel created T379803: IA Upload can't fetch metadata.

Wed, Nov 13, 6:53 PM · IA Upload

Uzume added a comment to T379402: Unable to retrieve IA metadata for any item.

In T379402#10318391, @Chlod wrote:

FWIW, one of my tools which relies on the Internet Archive also always times out. Perhaps the Internet Archive has temporarily(?) blocked some WMCS IPs following the outage?

Wed, Nov 13, 6:31 PM · IA Upload

Chlod added a comment to T379402: Unable to retrieve IA metadata for any item.

FWIW, one of my tools which relies on the Internet Archive also always times out. Perhaps the Internet Archive has temporarily(?) blocked some WMCS IPs following the outage?

Wed, Nov 13, 6:18 PM · IA Upload

Uzume added a comment to T379402: Unable to retrieve IA metadata for any item.

In T379402#10308031, @Samwilson wrote:
Sorry, ignore me! ia-upload is not running on Toolforge any more, it's got its own VPS. :-/

However, the issue is the same:
$ curl -I https://archive.org/details/20231002_20231002_0537?output=json
curl: (28) Failed to connect to archive.org port 443 after 129880 ms: Couldn't connect to server
$ curl -I https://archive.org/details/history-of-telegraphy-wa-3?output=json
curl: (28) Failed to connect to archive.org port 443 after 130678 ms: Couldn't connect to server
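The same check can be scripted with a much shorter timeout than the roughly 130 seconds curl waited above. This is a minimal sketch only: the helper names `details_json_url` and `is_reachable` are hypothetical, the 10-second default is an arbitrary assumption, and the URL pattern is simply the one used in the curl commands.

```python
import urllib.error
import urllib.parse
import urllib.request


def details_json_url(identifier):
    """Build the same URL the curl commands above request (hypothetical helper)."""
    return "https://archive.org/details/%s?output=json" % urllib.parse.quote(identifier)


def is_reachable(url, timeout=10.0):
    """Send a HEAD request (like `curl -I`) and report whether the server answered.

    The 10-second default timeout is an assumption, not a value from the task.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the connection itself works.
        return True
    except OSError:
        # Covers connection failures and timeouts, the case curl reports as (28).
        return False
```

Calling `is_reachable(details_json_url("history-of-telegraphy-wa-3"))` from the affected VPS and from another network would help separate an IP block from a general outage.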

Wed, Nov 13, 5:45 PM · IA Upload

Uzume added a comment to T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers.

T178197 seems to discuss programmatic methods to detect at least some (perhaps all?) of the issues mentioned here.

Wed, Nov 13, 4:58 PM · Internet-Archive, IA Upload

Uzume renamed T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers from Text is offset by one page to DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers.

Wed, Nov 13, 4:48 PM · Internet-Archive, IA Upload

Uzume added a comment to T178197: IA Uploader: random corrupted text structure into bult djvu files.

Wed, Nov 13, 4:08 PM · IA Upload

Uzume added a comment to T178197: IA Uploader: random corrupted text structure into bult djvu files.

Since this is related to the integration of the text layer using _djvu.xml, and seems to happen when there is a mismatch in the number of pages, this is likely related to T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers (and the numerous other tickets merged/closed as duplicates of that).

Wed, Nov 13, 4:08 PM · IA Upload

Uzume added a comment to T379402: Unable to retrieve IA metadata for any item.

There haven't been any new uploads in over 30 days so you won't find anything in commons:Special:RecentChanges (e.g., the recent-uploads link at the top of the tool page).

Wed, Nov 13, 2:31 PM · IA Upload

Tue, Nov 12

Samwilson merged T379581: Upload failure into T379402: Unable to retrieve IA metadata for any item.

Tue, Nov 12, 1:07 AM · IA Upload

Samwilson merged task T379581: Upload failure into T379402: Unable to retrieve IA metadata for any item.

Tue, Nov 12, 1:06 AM · IA Upload

Mon, Nov 11

ShakespeareFan00 created T379581: Upload failure.

Mon, Nov 11, 8:47 PM · IA Upload

Samwilson merged T379512: failure to recognise Internet Archive identifier into T379402: Unable to retrieve IA metadata for any item.

Mon, Nov 11, 7:07 AM · IA Upload

Samwilson renamed T379402: Unable to retrieve IA metadata for any item from '20231002_20231002_0537' is not a valid Internet Archive identifier to Unable to retrieve IA metadata for any item.

Mon, Nov 11, 7:06 AM · IA Upload

Samwilson merged task T379512: failure to recognise Internet Archive identifier into T379402: Unable to retrieve IA metadata for any item.

Mon, Nov 11, 7:06 AM · IA Upload

HLHJ created T379512: failure to recognise Internet Archive identifier.

Mon, Nov 11, 2:54 AM · IA Upload

Samwilson added a comment to T379402: Unable to retrieve IA metadata for any item.

Sorry, ignore me! ia-upload is not running on Toolforge any more, it's got its own VPS. :-/

Mon, Nov 11, 1:49 AM · IA Upload

Samwilson added a comment to T370133: Upgrade PHP dependencies.

There was a little bit more to be done: https://github.com/wikisource/ia-upload/pull/63 (finished now I think).

Mon, Nov 11, 1:37 AM · IA Upload

Samwilson added a comment to T379402: Unable to retrieve IA metadata for any item.

Are any IA items working?

Mon, Nov 11, 1:13 AM · IA Upload

Fri, Nov 8

HareGovorittKrishna created T379402: Unable to retrieve IA metadata for any item.

Fri, Nov 8, 6:26 PM · IA Upload

Mon, Oct 28

Uzume placed T370133: Upgrade PHP dependencies up for grabs.

Mon, Oct 28, 7:51 PM · IA Upload

Uzume closed T370133: Upgrade PHP dependencies as Resolved.

@Samwilson: I am closing this as resolved based upon your aforementioned PR being merged on 2024-07-16:

Mon, Oct 28, 7:51 PM · IA Upload

Oct 16 2024

Pppery added a project to T314882: ia-upload not working -An error occurred: Client error: `GET resulted in a `404 Not Found` response:: IA Upload.

Oct 16 2024, 4:38 PM · IA Upload, Internet-Archive

Oct 4 2024

Arcorann added a comment to T375838: Uploading backlog stucked.

For Wikisource, use the DjVu option "from original scans (JP2)" instead. This is currently preferred to uploading as PDF due to the various issues mentioned by me in T363619.

Oct 4 2024, 2:00 AM · IA Upload

Sep 27 2024

Samwilson added a comment to T375838: Uploading backlog stucked.

Converting from PDF to DjVu is no longer supported, sorry. We've not yet removed the option from the tool (that'll be done in T363619).

Sep 27 2024, 7:00 AM · IA Upload

AgusDamanik created T375838: Uploading backlog stucked.

Sep 27 2024, 2:24 AM · IA Upload

Aug 26 2024

Arcorann added a comment to T363619: Remove option for PDF → DjVu conversion (phetools).

On the comment "the original PDFs can be uploaded directly", currently there are enough issues with our handling of PDFs (notably bad text layer extraction -- see T242169 -- and bad thumbnail generation -- see e.g. T224355 and linked issues, also note the related issue T339845) that DjVu is still being recommended over PDF on enWS.

Aug 26 2024, 12:18 AM · IA Upload

Aug 13 2024

Yodin added a subtask for T367491: Extra pages after upload: T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers.

Aug 13 2024, 7:17 PM · Internet-Archive, IA Upload

Yodin merged tasks T364778: IA Upload: embedded text is offset by one page when generating new DJVU from page scans, T300761: Text layer mismatch with page images on DjVu created from original scans (JP2) into T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers.

Aug 13 2024, 7:17 PM · Internet-Archive, IA Upload

Yodin added a parent task for T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers: T367491: Extra pages after upload.

Aug 13 2024, 7:17 PM · Internet-Archive, IA Upload

Yodin merged task T300761: Text layer mismatch with page images on DjVu created from original scans (JP2) into T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers.

Aug 13 2024, 7:16 PM · IA Upload, Commons

Yodin merged task T364778: IA Upload: embedded text is offset by one page when generating new DJVU from page scans into T194861: DjVu construction from original scans (JP2) selects which pages to build incorrectly resulting in misintegration of djvu.xml based text layers.

Aug 13 2024, 7:15 PM · Internet-Archive, IA Upload

Yodin added a comment to T367491: Extra pages after upload.

This is caused by ia-upload adding pages that are marked as <addToAccessFormats>false</addToAccessFormats> in the scandata XML file, as mentioned by InductiveLoad in this comment.
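The filtering described in this comment could look roughly like the sketch below. It is not ia-upload's code (the tool itself is written in PHP). The sample XML is a simplified, hypothetical scandata file, the function name `access_leaf_numbers` is made up for illustration, and treating a missing flag as "include the page" is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

# Simplified, hypothetical scandata: real files carry many more fields per page.
SAMPLE_SCANDATA = """\
<book>
  <pageData>
    <page leafNum="0">
      <pageType>Color Card</pageType>
      <addToAccessFormats>false</addToAccessFormats>
    </page>
    <page leafNum="1">
      <pageType>Title</pageType>
      <addToAccessFormats>true</addToAccessFormats>
    </page>
    <page leafNum="2">
      <pageType>Normal</pageType>
      <addToAccessFormats>true</addToAccessFormats>
    </page>
    <page leafNum="3">
      <pageType>White Card</pageType>
      <addToAccessFormats>false</addToAccessFormats>
    </page>
  </pageData>
</book>
"""


def access_leaf_numbers(scandata_xml: str) -> List[int]:
    """Return the leaf numbers of pages not marked addToAccessFormats=false."""
    root = ET.fromstring(scandata_xml)
    leaves = []
    for page in root.iter("page"):
        # Assumption: a page without the flag is kept.
        flag = (page.findtext("addToAccessFormats") or "true").strip().lower()
        if flag != "false":
            leaves.append(int(page.get("leafNum")))
    return leaves


print(access_leaf_numbers(SAMPLE_SCANDATA))
```

Building the DjVu only from the returned leaves would skip the color card and white card pages in this sample, which should keep the page count aligned with the pages covered by the text layer.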

Aug 13 2024, 7:13 PM · Internet-Archive, IA Upload

Aug 12 2024

Uzume added a comment to T331411: Allow uploading newer version of an existing file through IA upload.

But those are different files scanned from different sources. পথের পাঁচালী.djvu is not even originally from Internet Archive.

Aug 12 2024, 11:23 AM · Internet-Archive, IA Upload

Aug 11 2024

Uzume added a comment to T363619: Remove option for PDF → DjVu conversion (phetools).

I agree that in general there is little advantage to creating DjVus from PDFs but sometimes people prefer such formats. PDF technology has now subsumed most of the advantages DjVu previously had. Unfortunately this now means PDF is a very large and complex set of specifications and it is hard to know how any single PDF is constructed without analysis by digital tools.

Aug 11 2024, 12:29 PM · IA Upload

Uzume added a comment to T364445: DJVU file generated is apparently 0x0 pixels.

I do not believe this is an IA Upload issue, as it is specific neither to IA Upload nor to DjVu; it happens with PDFs too. This is a common issue with Commons in general. The workaround is to purge the file on Commons (and sometimes make a null edit too) to reset its media metadata. Sometimes the same has to be done on a local wiki that uses Commons (e.g., on a Wikisource site).

Aug 11 2024, 12:05 PM · Internet-Archive, IA Upload

Uzume added a watcher for IA Upload: Uzume.

Aug 11 2024, 11:54 AM

Uzume added a member for IA Upload: Uzume.

Aug 11 2024, 11:54 AM

Uzume added a comment to T268246: IA Uploader fails to recognize the first page of a book.

I too am looking forward to scandata.xml addToAccessFormats page filtering. That would get rid of the irritating color card and white card pages often included at the end of many scans (but I have seen them in the middle of book scans too).

Aug 11 2024, 11:53 AM · IA Upload

Jul 22 2024

Samwilson closed T369881: Upgrade IA Upload VPS to Bookworm as Resolved.

Deleted ia-upload-prod.

Jul 22 2024, 11:33 PM · Community-Tech (Darwin's Fox (July 15-26, 2024)), IA Upload, Cloud-VPS (Debian Buster Deprecation)

Samwilson closed T367564: Cloud VPS "wikisource" project Buster deprecation as Resolved.

Deleted! Sorry for the delay.

Jul 22 2024, 11:32 PM · Community-Tech, IA Upload, Wikimedia OCR, Cloud-VPS (Debian Buster Deprecation)

Samwilson closed T369881: Upgrade IA Upload VPS to Bookworm, a subtask of T367564: Cloud VPS "wikisource" project Buster deprecation, as Resolved.

Jul 22 2024, 11:31 PM · Community-Tech, IA Upload, Wikimedia OCR, Cloud-VPS (Debian Buster Deprecation)

Andrew added a comment to T367564: Cloud VPS "wikisource" project Buster deprecation.

Please delete :)

Jul 22 2024, 5:49 PM · Community-Tech, IA Upload, Wikimedia OCR, Cloud-VPS (Debian Buster Deprecation)

Jul 18 2024

Samwilson added a comment to T367564: Cloud VPS "wikisource" project Buster deprecation.

Yep, nearly. As noted in T369881 I'm just waiting another day or so before deleting it. I had a couple of reports yesterday of things not going right with uploading to Commons, but probably everything's fine. It'll be gone by the end of the week.

Jul 18 2024, 12:12 AM · Community-Tech, IA Upload, Wikimedia OCR, Cloud-VPS (Debian Buster Deprecation)

Andrew added a comment to T367564: Cloud VPS "wikisource" project Buster deprecation.

The only remaining buster host in this project is ia-upload-prod.wikisource.eqiad1.wikimedia.cloud, which is currently shut down. Can it be deleted?

Jul 18 2024, 12:03 AM · Community-Tech, IA Upload, Wikimedia OCR, Cloud-VPS (Debian Buster Deprecation)