Wikipedia:Link rot/URL change requests
This page is for requesting modifications to URLs, such as marking dead or changing to a new domain. Some bots are designed to fix link rot; they can be notified here. These bots include InternetArchiveBot and WaybackMedic. This page can be monitored by bot operators from other language wikis since URL changes are universally applicable.
finlex.fi
This section is pinned and will not be automatically archived.
Finlex.fi URLs aren't dead but for some reason InternetArchiveBot keeps adding archived URLs for them. This was brought up at meta:User_talk:InternetArchiveBot#Finlex.fi_URLs_aren't_dead a month ago: Bot's edits: [1], [2], [3]. Some URLs it tagged as dead but are actually working: [4], [5], [6].
Those finlex.fi URLs that now have both a working URL and an archive URL should be tagged with the |url-status=live tag, and could someone try to tell IABot that Finlex is live? Thanks. 2001:14BA:9C94:9A00:E866:DADA:1085:E3D9 (talk) 09:28, 17 March 2024 (UTC)
- Just noticed that this same issue is being discussed at fi.wikipedia: fi:Wikipedia:Kahvihuone_(tekniikka)#Botti_hakee_arkistosta_kumottuja_lakeja 2001:14BA:9C94:9A00:E866:DADA:1085:E3D9 (talk) 09:41, 17 March 2024 (UTC)
- The site has a "Are you human?" check box (CloudFlare). This is causing the bot to think it's a dead site. I logged into iabot.org and changed the domain to "Subscription" status and that will cause the bot to avoid this domain, it won't set live or dead. My bot WaybackMedic has capabilities to bypass CloudFlare. I can try to process this domain and see what happens. My bot also has a feature "make live" ie. convert a citation from dead to live state. Unfortunately my bot only works on English Wikipedia. I'll let you know what happens. -- GreenC 15:13, 17 March 2024 (UTC)
- Unfortunately, this site has maximum security enabled, none of my tools can get through. It started happening in late January 2024. I don't know what to do because no bot is able to determine if a link is live or dead. And no archive service such as WaybackMachine is able to archive a page. Only humans can get through, and they need to solve a captcha. It might be worthwhile waiting to see if they relax security in the future, since this is a recent development. -- GreenC 00:40, 19 March 2024 (UTC)
- @GreenC: Before this section gets archived and if it's easy/fast to check, can you check if this is still the case, i.e. that the site still has the maximum security enabled and no tool/bot can get through? Thank you. 85.76.109.152 (talk) 06:21, 2 June 2024 (UTC)
- When going to [7] it still asks "Are you human?" with the CloudFlare security tag at the bottom. This is a feature of CloudFlare service, clients have the option to enable, it's the highest level of security. I'm not aware of a tool that can bypass. What I will do is set a reminder in 6 months to check again and post the results here. I use W-Ping which posts a reminder in the watchlist at whatever time in the future with a custom message. -- GreenC 16:06, 2 June 2024 (UTC)
- Still on CloudFlare. -- GreenC 03:21, 2 December 2024 (UTC)
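The failure mode described in this thread (a bot challenge being read as a dead link) can be illustrated with a small three-state classifier. This is only a sketch under stated assumptions, not how IABot or WaybackMedic work: the function name `classify` is hypothetical, and it assumes a challenge response carries a `cf-mitigated: challenge` response header, which is not something confirmed in the discussion above.

```python
from typing import Mapping


def classify(status: int, headers: Mapping[str, str]) -> str:
    """Return 'live', 'dead' or 'unknown' for one HTTP response.

    Assumption: a bot challenge is marked by a 'cf-mitigated: challenge'
    header. Such a response says nothing about the page itself, so it is
    reported as 'unknown' rather than 'dead'.
    """
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return "unknown"
    if 200 <= status < 300:
        return "live"
    if status in (404, 410):
        return "dead"
    # Anything else (other 403s, 5xx, ...) is left undecided.
    return "unknown"
```

Treating the challenge as "unknown" mirrors the effect of the "Subscription" setting mentioned above: the link is neither marked live nor dead.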
Can this be run in Tewiki?
@User:GreenC, In Tewiki, we have more than 10,400 pages in the category CS1 errors: archive-url. Almost 99% of these are "timestamp mismatch" errors. Can you please run WaybackMedic_2.5 to correct the error in these pages? Thank you. __ Chaduvari (talk) 15:59, 31 July 2024 (UTC)
- Ahh. I'd like to, but I am not set up for other wikis; it is very difficult. The CS1 error: archive-url is across most wikis. Let me think about it because it's a growing problem. It might be I can process, but only some English-language templates like {{cite web}} that use English-language parameters like |archive-url=. GreenC 19:25, 31 July 2024 (UTC)
- Hi GreenC, in tewiki, this template, like many others, uses English parameters and templates only. This policy was kept to ensure future compatibility. Thanks. __ Chaduvari (talk) 09:29, 12 August 2024 (UTC)
- User:Chaduvari, I could try some tests for Telugu Wiki. Can you help me get bot flag permissions for User:GreenC bot? I don't know where to start to ask permission. -- GreenC 18:19, 12 August 2024 (UTC)
- @GreenC, you can raise the request at te:వికీపీడియా:Bot/Requests for approvals.__ Chaduvari (talk) 23:40, 12 August 2024 (UTC)
- I made a request for approval. -- GreenC 02:30, 13 August 2024 (UTC)
- User:Chaduvari, I have not forgotten about this. Have many other projects. Can you tell me what kinds of date formats might exist (date month year, periods or slashes etc) and what the Telugu-language month names are? Some examples would help. -- GreenC 16:56, 26 September 2024 (UTC)
- @GreenC, you have been quick in responding to our request. In fact, we delayed in giving the bot flag.
- The date formats conform to those in enwiki. 2024-09-27 and 27 September 2024 are the most widely used ones. The month names are:
January జనవరి February ఫిబ్రవరి March మార్చి April ఏప్రిల్ May మే June జూన్ July జూలై August ఆగస్టు September సెప్టెంబరు October అక్టోబరు November నవంబరు December డిసెంబరు
- Please look for ref: "Ayodhyaverdict" at page:te:అయోధ్య వివాదంపై 2019 సుప్రీంకోర్టు తీర్పు. The archive date was incorrect in this citation. In the error message, the given Suggestion has the month name in Telugu. (Please look for the text -"మత సామరస్యాన్ని కాపాడాలని ప్రధాన మంత్రి బహిరంగ అభ్యర్థన చేసారు." I am referring to the first citation [10] after this sentence).
- Thank you __ Chaduvari (talk) 00:26, 27 September 2024 (UTC)
- OK. I can't see the red error message in the Wikitext, but it should be possible to scrape it from the HTML. Will investigate. Thank you. -- GreenC 01:14, 27 September 2024 (UTC)
- The easiest way for me is to convert to ISO eg. |archive-date=2024-09-24. Most of the problems will probably be archive.today and webcitation.org (if any) so I would check every citation template with one of these archives and then reset the archive-date to ISO format, based on the value in the URL. -- GreenC 16:56, 26 September 2024 (UTC)
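Resetting |archive-date= from the archive URL can be sketched as follows. This is a minimal illustration, not the bot's actual code: the helper name `archive_date_iso` is hypothetical, and it assumes the snapshot timestamp appears in the URL path as 14 digits (YYYYMMDDhhmmss). Archive URLs that carry no such timestamp are returned as None and would need another approach.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: the snapshot time is a 14-digit path segment, YYYYMMDDhhmmss.
_TIMESTAMP = re.compile(r"/(\d{14})/")


def archive_date_iso(archive_url: str) -> Optional[str]:
    """Return the snapshot date as YYYY-MM-DD, or None if it cannot be read."""
    match = _TIMESTAMP.search(archive_url)
    if match is None:
        return None
    try:
        snapshot = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # 14 digits that do not form a valid date and time.
        return None
    return snapshot.date().isoformat()
```

For example, `archive_date_iso("https://web.archive.org/web/20240924101500/https://example.com/")` returns `"2024-09-24"`, which can then be written to |archive-date= regardless of the month names used locally.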
User:Chaduvari, the tracking category was reduced from 10,400 to 664 for a 94% reduction. The bot I wrote only fixes mismatches in dates. There are other types of errors tracked in that category that bot does not fix. For example citations with an |archive-date= but no |archive-url= (or other way around). Or citations with |archive-url= but no |url=. These are more complex to automatically fix. -- GreenC 04:03, 2 October 2024 (UTC)
- Wow! Fantastic! @GreenC, thanks for eliminating so many errors.
- Now that the errors are brought down by 94% (My estimate fell short by 5% :-)), we will take care of the |archive-url= and other errors manually.
- Thank you very much. __ Chaduvari (talk) 04:53, 2 October 2024 (UTC)
- In fact the number is brought down to 596! __ Chaduvari (talk) 04:54, 2 October 2024 (UTC)
- User:Chaduvari: You are welcome. It can run automatically, every month or so, to keep the category in check. If you see problems it missed, that it should have caught, let me know. -- GreenC 05:17, 2 October 2024 (UTC)
- Sure, GreenC ! Chaduvari (talk) 05:25, 2 October 2024 (UTC)
- OK it will run each month, on the 2nd day. -- GreenC 02:35, 3 October 2024 (UTC)
bcsportshalloffame.com
Hello. I was looking through the Judi list and saw that bcsportshalloffame.com is there. These links could be converted over to their new url at bcsportshall.com/honoured_member/ Here are examples:
- Generally, any individual inductee's url is the person's first and last name. Here is now there for Andrea Neil.
- Last names at birth are included in the urls. Here is now there for Claire Lovett.
- (M) is ignored for team inductees. For instance, here is now there for Mark Evans (rower).
Just over 100 links. If any don't convert over, let me know and I'll fix them. Thanks! MrLinkinPark333 (talk) 19:50, 16 August 2024 (UTC)
- Hmm.. I've never done something like this. It will require de-usurping, like parsing and removing {{usurped}}. It's an inevitable situation as old usurped domains are migrated to a new working domain. It's probably more complicated than it seems. Will take a look. -- GreenC 16:53, 17 August 2024 (UTC)
I was able to fix cites on 85 pages. The pages it was unable to edit:
Jack Whent Lorne Loomer Charles Edward Pratt 1934 Women's World Games Burnaby Lake Rowing Club Karen Magnussen Lillian Palmer (athlete) Greg Ion Joe Watson (ice hockey) Donald Arnold Archibald MacKinnon Richard McClure Hugh Fisher (canoeist) Sven Habermann Shea Weber Terry Fox 1934 Women's World Games Mary Frizzell
-- GreenC 02:42, 18 August 2024 (UTC)
- Not bad! It makes sense some of it didn't work as the reference titles for Karen Magnussen and Jack Whent were adjusted a bit. I'll tweak the remaining 13 later. Thanks! MrLinkinPark333 (talk) 02:56, 18 August 2024 (UTC)
- OK. Added five more. This search found them. -- GreenC 04:40, 18 August 2024 (UTC)
- I've updated the 18 that the bot couldn't fix. MrLinkinPark333 (talk) 22:56, 13 September 2024 (UTC)
- User:MrLinkinPark333, great! (just saw). -- GreenC 22:22, 18 November 2024 (UTC)
Done -- GreenC 14:31, 26 August 2024 (UTC)
timesonline.co.uk
Old URLs for The Times don't work. While some of these have new URLs at thetimes.com, they can't be easily converted. For example, this is now here for Adele. Unfortunately, I think all of these links and the subdomains (entertainment.timesonline.co.uk, business.timesonline.co.uk, etc.) will need archives. It might be easier to do the subdomains first. Some articles already have archived links added like at Premier League. 15,000+ articles altogether. Thank you! MrLinkinPark333 (talk) 19:34, 12 September 2024 (UTC)
This is a difficult project due to a large number of soft-404s within archives:
soft404 rules for archives:
if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk":
if url ~ "login=false": return "Check 6.131"
if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk/[st][to][lo]/[?]CMP=": return "Check 6.132"
if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk/[st][to][lo]/news/?([?](token=null|id=[a-zA-Z0-9]{2,10}$))?": return "Check 6.137"
if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk/[st][to][lo]/(news|news/world|tv-radio|business|travel|arts|arts/(film/reviews|tv-radio))/?$": return "Check 6.135"
if url ~ "the-tls[.]co[.]uk/tls/?$": return "Check 6.136"
gsubs("://", "__T__", url)
if url ~ "//": return "Check 6.133"
gsubs("__T__", "://", url)
if url ~ "obituaries/?$": return "Check 6.134"
..where "url" is the redirected URL the page was saved from, as indicated on the archive page ie. not the URL on wiki or the live redirect (if any).
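Two of the rules above (Check 6.133 and Check 6.134) can be sketched in Python to show the scheme-masking step: "://" is hidden first so that a later "//" can only come from the path. This is an illustrative translation, not the bot's code; the function names are hypothetical, and the remaining rules are left out because their nesting is not recoverable from the listing.

```python
import re
from typing import Optional


def has_double_slash_in_path(url: str) -> bool:
    """True if '//' occurs anywhere other than as part of '://'."""
    masked = url.replace("://", "__T__")  # hide the scheme separator
    return "//" in masked


def soft404_check(url: str) -> Optional[str]:
    """Apply the double-slash and obituaries rules, in the listed order."""
    if has_double_slash_in_path(url):
        return "Check 6.133"
    if re.search(r"obituaries/?$", url):
        return "Check 6.134"
    return None
```

Working on a masked copy avoids the second substitution that the original listing uses to restore the URL.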
Enwiki
- Checked 15,686 pages and edited 13,589 pages. Moved 275 links to a new URL. Resolved 20,115 soft-404s. Removed 4 {{dead link}}. Added 6,721 {{dead link}}. Switched 28 |url-status=dead to live. Switched 1,736 |url-status=live to dead. Added 8,624 archive URLs (7,156 Wayback). Changed 593 citation metadata.
- Explanation: the bot analyzed about 20,000 URLs - all dead and presenting as soft-404. Of those, about 17,000 the bot added an archive URL, dead link template or switched url-status to dead. The other 3,000 are uncertain but probably already have an archive URL and url-status=dead ie. nothing to do. The large number 6,721 {{dead link}} is unfortunate, it represents the problem noted above of archives containing soft-404. -- GreenC 19:21, 26 September 2024 (UTC)
- That's too bad about the large number of dead links. If the new URLs were easy to convert, we could have swapped them over. Thank you for working on this! MrLinkinPark333 (talk) 19:25, 26 September 2024 (UTC)
- Yeah this domain needed help because it was marked "Subscription" in the IABot DB (ie. skip processing), so most of them were dead with no archives. Normally I would "done" at this point, but I want to try a new experimental method for finding the live URL (it has a low probability of success) - I won't be able to start until next week. -- GreenC 13:23, 27 September 2024 (UTC)
- Experimental method not working. -- GreenC 16:25, 30 September 2024 (UTC)
IABot DB
- Checked and edited about 28,000 links which will propagate to 300+ wikis
Done -- GreenC 16:25, 30 September 2024 (UTC)
foxnews.com/story
Old URLs for foxnews.com with numeric IDs either redirect to new URLs, redirect to the wrong page or are broken. Working URLs are mainly at www.foxnews.com/story/article-name
- Redirects
- Working redirects: this goes here for Molly Henneberg.
- Soft redirects (redirects to /category/): this soft redirects to here. The working URL for this Timeline of the 2003 invasion of Iraq article is here.
- Redirects to 404 pages: this link doesn't work for British debate over veils. The working URL is here.
- Redirects to wrong pages: I found redirects pointing to the wrong article at Adam Levine (press aide), Catherine Herridge and Wood Green ricin plot. They do not follow the foxnews.com/story/article-name format. They instead point to articles at /entertainment/, media/ and /science/. Since these are wrong, I recommend the following URL changes below.
- URL Changes:
- With the above links, the numeric value is changed to the article title. Any punctuation marks are removed from the URL and all letters are lowercase.
- For redirects that do not point to articles using /story/, I request trying to convert them using /story/article-name first. If that doesn't work, then I recommend archive URLs.
~3,200 articles.
Thank you! MrLinkinPark333 (talk) 20:48, 12 September 2024 (UTC)
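The URL change described above (numeric ID replaced by the article title, punctuation removed, all lowercase) might be sketched like this. It is a hypothetical helper, not the bot's method: the hyphen separator is inferred from the "article-name" pattern, the https scheme is assumed, and the title in the usage note is made up for illustration.

```python
import re

BASE = "https://www.foxnews.com/story/"  # scheme is an assumption


def story_url(title: str) -> str:
    """Build a candidate /story/ URL from a citation title."""
    slug = title.lower()
    # Drop punctuation; keep ASCII letters, digits, whitespace and hyphens.
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)
    # Collapse runs of whitespace or hyphens into single hyphens.
    slug = re.sub(r"[\s-]+", "-", slug).strip("-")
    return BASE + slug
```

With a made-up title, `story_url("Timeline: U.S. Troops in Iraq")` gives `https://www.foxnews.com/story/timeline-us-troops-in-iraq`. A candidate built this way still has to be checked against the live site before it replaces the old link; if it does not resolve, the request above falls back to archive URLs.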
Enwiki
- Checked 3,248 pages and edited 2,346 pages. Moved 2,601 links to a new URL. Resolved 66 ghost redirects. Resolved 233 soft-404s. Removed 4 {{dead link}}. Added 6 {{dead link}}. Switched 900 |url-status=dead to live. Switched 10 |url-status=live to dead. Added 240 archive URLs (198 Wayback). Changed 175 citation metadata.
- Analysis: converted about 3,500 to live URLs per the above rules (2,601 + 900). Another 250 or so added archive URLs. -- GreenC 18:07, 30 September 2024 (UTC)
- Not bad at all! How successful was fixing the redirects to wrong pages? MrLinkinPark333 (talk) 18:10, 30 September 2024 (UTC)
- It seems successful. A spot check of Disappearance of Natalee Holloway saw some. -- GreenC 21:25, 30 September 2024 (UTC)
IABot DB
- Checked and updated about 5,700 links that propagate to 300+ wikis.
Done -- GreenC 04:25, 2 October 2024 (UTC)
location.teamname.mlb.com
Each of the 30 MLB teams has a dead subdomain of the form <location>.<teamname>.mlb.com that should be archived, for example losangeles.angels.mlb.com. These now redirect to sites of the form mlb.com/<teamname>, and all content in the subdomains seems to be dead.
I combined the searches into 6 batches of 5 teams each, as combining all teams into one regex expression timed out the search and I didn't want to individually list the results for all 30 teams. I hope it isn't too difficult to process 30 different subdomains?
(Also, for some reason the searches counted a few pages where the text happened to contain <teamname>|mlb.com instead of <teamname>.mlb.com.)
- (a regex "." means match any character thus it matched on "|" or whatever character; to search on a literal dot use "[.]" or "\." to escape the regex meaning of dot) -- GreenC 00:18, 3 October 2024 (UTC)
diamondbacks, braves, orioles, redsox, cubs: 1,305 pages.
whitesox, reds, indians, rockies, tigers: 1,181 pages.
astros, royals, angels, dodgers, marlins: 1,134 pages.
brewers, twins, mets, yankees, athletics: 1,118 pages.
phillies, pirates, padres, giants, mariners: 1,304 pages.
cardinals, rays, devilrays (both are subdomains for the same team), rangers, bluejays, nationals: 1,260 pages. Helpful Raccoon (talk) 05:16, 14 September 2024 (UTC)
- Should be OK to combine into a single project since they use the same root domain, problems like soft-404s will be the same. Thanks for creating the separate searches. I saw one for "m.cubs.mlb.com" which is the mobile link for the Cubs. It is a soft-404, so looks like "*.cubs.mlb.com" need to be checked. -- GreenC 15:54, 14 September 2024 (UTC)
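The note above about the regex dot can be checked with a short example. Python's `re` module is used here only for illustration; the on-wiki search engine is a different implementation, though the meaning of an unescaped dot is the same. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Unescaped dot: matches any character, including "|".
loose = r"angels.mlb.com"
# Literal dot, written either as [.] or as \.
strict_class = r"angels[.]mlb[.]com"
strict_escape = r"angels\.mlb\.com"

print(matches(loose, "angels|mlb.com"))         # True (false positive)
print(matches(strict_class, "angels|mlb.com"))  # False
print(matches(strict_escape, "losangeles.angels.mlb.com"))  # True
```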
Enwiki
- Checked 5,505 pages and edited 4,080 pages. Moved 4 links to a new URL. Added 4,124 {{dead link}}. Switched 1,160 |url-status=live to dead. Added 5,495 archive URLs (5,431 Wayback). Changed 721 citation metadata.
- Comment: high number of {{dead link}} -- GreenC 21:27, 3 October 2024 (UTC)
- Looks like WaybackMachine performance has been poor, creating timeouts that result in false negatives, thus the high number of {{dead link}}. I am beginning to reprocess those at a slower pace. -- GreenC 15:35, 5 October 2024 (UTC)
- Round 2: Checked 1,921 pages and edited 1,426 pages. Added 2,388 archive URLs (2,388 Wayback).
- Reprocessed the "Added 4,124 {{dead link}}" from above, due to Wayback Machine timeouts. Converted 2,388 {{dead link}} to archive URLs. -- GreenC 17:59, 6 October 2024 (UTC)
IABot DB
- Checked and updated about 30,000 links which propagate to 300+ wikis
Done -- GreenC 14:14, 8 October 2024 (UTC)
dnd.wizards.com
https://dnd.wizards.com now mostly redirects to https://www.dndbeyond.com; the website was used as a primary source for various D&D articles. It looks like links that start with https://dnd.wizards.com/news/, https://dnd.wizards.com/articles/, https://dnd.wizards.com/dndstudioblog, https://dnd.wizards.com/dungeons-and-dragons, etc. redirect to the D&D Beyond home page or change log. Some (like https://dnd.wizards.com/products/) redirect to similar pages on D&D Beyond but the D&D Beyond page often contains less information (such as not having the ISBN, author credits or other production info) so I think the whole lot should be marked as dead. Thanks! Sariel Xilo (talk) 22:29, 20 September 2024 (UTC)
159 pages -- GreenC 04:01, 21 September 2024 (UTC)
Enwiki
- Checked 172 pages and edited 150 pages. Added 3 {{dead link}}. Switched 65 |url-status=live to dead. Added 169 archive URLs (159 Wayback). Changed 413 citation metadata.
IABot DB
- Checked and fixed about 500 links which propagate to 300+ wikis
Done -- GreenC 01:37, 7 October 2024 (UTC)
Some Vietnamese newspapers
RFI Vietnamese, VTC News and Zing News changed their domain names:
- vi.rfi.fr and viet.rfi.fr -> rfi.fr/vi
- vtc.vn -> vtcnews.vn
- news.zing.vn and zingnews.vn -> znews.vn
Billboard Vietnam website (billboardvn.vn) has been closed. Cherry Cotton Candy (talk) 09:05, 22 September 2024 (UTC)
vi.rfi.fr
12 pages - Preceding unsigned comment added by GreenC (talk • contribs)
- Tried this to that; it doesn't work. -- GreenC 01:41, 7 October 2024 (UTC)
- @GreenC Can you skip the above link and continue with the others? For example, http://vi.rfi.fr/viet-nam/20191111-nhung-nguoi-linh-viet-nam-hy-sinh-vi-nuoc-phap-trong-the-chien-i -> https://www.rfi.fr/vi/viet-nam/20191111-nhung-nguoi-linh-viet-nam-hy-sinh-vi-nuoc-phap-trong-the-chien-i Cherry Cotton Candy (talk) 13:09, 7 October 2024 (UTC)
- Cherry, there are only 12. Could you do this manually? It will be less work than me programming the bot and working through the issues. -- GreenC 15:31, 7 October 2024 (UTC)
vtc.vn
197 pages - Preceding unsigned comment added by GreenC (talk • contribs)
- Checked 198 pages and edited 184 pages. Moved 248 links to a new URL. Added 2 {{dead link}}. Switched 3 |url-status=dead to live. Switched 2 |url-status=live to dead. Added 15 archive URLs (11 Wayback). -- GreenC 04:23, 7 October 2024 (UTC)
- Checked and fixed about 2,000 URLs in the IABot DB which will propagate to 300+ wikis. -- GreenC 21:12, 7 October 2024 (UTC)
zingnews.vn
246 pages - Preceding unsigned comment added by GreenC (talk • contribs)
- Checked 246 pages and edited 244 pages. Moved 478 links to a new URL. Removed 2 {{dead link}}. Switched 113 |url-status=dead to live. Added 9 archive URLs (4 Wayback). -- GreenC 21:08, 7 October 2024 (UTC)
- Checked and fixed about 2,600 links in the IABot DB which propagate to 300+ wikis -- GreenC 01:42, 8 October 2024 (UTC)
billboardvn.vn and thanhniennews.com
Billboard 130 pages - Preceding unsigned comment added by GreenC (talk • contribs)
Thanhniennews 261 pages. These websites have been closed. Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)
- Checked 389 pages and edited 261 pages. Added 61 {{dead link}}. Switched 92 |url-status=live to dead. Added 178 archive URLs (139 Wayback). -- GreenC 16:25, 8 October 2024 (UTC)
- Checked and fixed about 1,000 URLs in the IABot DB which propagate to 300+ wikis -- GreenC 18:18, 8 October 2024 (UTC)
tuoitre.com.vn
41 pages. Some articles can be found manually on tuoitre.vn, for example:
- http://www.tuoitre.com.vn/Tianyon/Index.aspx?ArticleID=159942 -> https://tuoitre.vn/ky-nam-va-tram-huong-159942.htm
- http://bauchon.tuoitre.com.vn/Tianyon/Index.aspx?ArticleID=226165&ChannelID=458 -> https://tuoitre.vn/bao-ton-da-dang-sinh-hoc-vinh-ha-long-226165.htm
- http://www.tuoitre.com.vn/Tianyon/Index.aspx?ArticleID=14965&ChannelID=124 -> https://cuoituan.tuoitre.vn/thap-da-vinh-nghiem-14965.htm
Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)
- Unable to do by bot. -- GreenC 23:58, 7 October 2024 (UTC)
thanhnien.com.vn
124 pages. Some articles can be found manually on thanhnien.vn, for example:
- http://www.thanhnien.com.vn/van-hoa-nghe-thuat/tri-thuc-cung-nghe-nhac-vang-181742.html -> https://thanhnien.vn/tri-thuc-cung-nghe-nhac-vang-185171992.htm
- http://www.thanhnien.com.vn/pages/20120819/tu-bat-pho-khong-nguoi-lai.aspx -> https://thanhnien.vn/tu-bat-pho-khong-nguoi-lai-185406399.htm
- http://www.thanhnien.com.vn/Pages/20110626/Phim-Hollywood-thang-lon-tai-Viet-Nam.aspx -> https://thanhnien.vn/phim-hollywood-thang-lon-tai-viet-nam-185293878.htm
- http://web.thanhnien.com.vn/Khoahoc/2005/9/6/121442.tno -> https://thanhnien.vn/nhung-phat-hien-chan-dong-ve-phong-nha-ke-bang-185148509.htm
- http://www.thanhnien.com.vn/News/Pages/201006/20100204224010.aspx -> https://thanhnien.vn/khanh-hoa-khong-to-chuc-miss-world-2010-185296179.htm
Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)
- Unable by bot. -- GreenC 23:58, 7 October 2024 (UTC)
laodong.com.vn
49 pages. A few articles can be found manually on laodong.vn, for example:
- http://tamlongvang.laodong.com.vn/the-gioi/banh-mi-viet-nam-va-hanh-trinh-chinh-phuc-ca-the-gioi-591142.bld -> https://laodong.vn/an/banh-mi-viet-nam-va-hanh-trinh-chinh-phuc-ca-the-gioi-515042.ldo
- http://laodong.com.vn/giai-tri/lan-khue-thu-suc-lam-mc-365442.bld -> https://laodong.vn/archived/lan-khue-thu-suc-lam-mc-693704.ldo
Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)
- Unable by bot. -- GreenC 23:58, 7 October 2024 (UTC)
Done -- GreenC 18:19, 8 October 2024 (UTC)
aviation-safety.net
These (currently) 299 results ought to have "/operator/airline.php?var=" replaced by "/operators/". Updating the redirected domain "aviation-safety.net" to "asn.flightsafety.org" could be done along the way as well. 1234qwer1234qwer4 16:02, 24 September 2024 (UTC)
- User:1234qwer1234qwer4, given http://aviation-safety.net/database/operator/airline.php?var=6345 can you tell me the new URL? -- GreenC 16:07, 24 September 2024 (UTC)
- http://aviation-safety.net/database/operators/6345 works, though it is a redirect to https://asn.flightsafety.org/database/operators/6345. 1234qwer1234qwer4 16:13, 24 September 2024 (UTC)
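The requested rewrite can be sketched as a small function, using the example URL from this thread. This is a minimal illustration under assumptions, not the bot's implementation: the name `rewrite_operator_url` is hypothetical, and switching to https and dropping a leading "www." are assumptions based on the redirect target quoted above.

```python
import re

OLD_PATH = "/operator/airline.php?var="
NEW_PATH = "/operators/"
# Assumption: the old host may appear with or without "www.".
OLD_HOST = re.compile(r"^https?://(?:www\.)?aviation-safety\.net")
NEW_HOST = "https://asn.flightsafety.org"


def rewrite_operator_url(url: str) -> str:
    """Rewrite an old operator URL; return other URLs unchanged."""
    if OLD_PATH not in url:
        return url
    url = url.replace(OLD_PATH, NEW_PATH, 1)
    return OLD_HOST.sub(NEW_HOST, url, count=1)


print(rewrite_operator_url(
    "http://aviation-safety.net/database/operator/airline.php?var=6345"
))
# https://asn.flightsafety.org/database/operators/6345
```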
Enwiki
- Checked 298 pages and edited 298 pages. Moved 1,073 links to a new URL. Resolved 8 ghost redirects. Switched 7 |url-status=dead to live. Switched 2 |url-status=live to dead. Added 22 archive URLs (21 Wayback).
IABot DB
- Checked and fixed about 800 links which propagate across 300+ wikis.
Done -- GreenC 22:52, 8 October 2024 (UTC)
planespotters.net
260 pages that should have "planespotters.net/Airline/" changed to "planespotters.net/airline/". 1234qwer1234qwer4 17:16, 24 September 2024 (UTC)
- Checked 241 pages and edited 231 pages. Moved 251 links to a new URL. Removed 1 {{dead link}}. Added 1 {{dead link}}. Switched 99 |url-status=dead to live. Added 22 archive URLs (13 Wayback).
Done -- GreenC 23:13, 8 October 2024 (UTC)
articles.newspaper.com
Newspapers that follow the same process as Tribune Publishing. Only migrate if new links are not behind paywall, otherwise archive. -- GreenC 18:01, 24 September 2024 (UTC)
articles.baltimoresun.com
- Enwiki
- Checked 4,150 pages and edited 3,977 pages. Converted 1 templates. Moved 4,833 links to a new URL. Resolved 51 ghost redirects. Resolved 11 soft-404s. Removed 3 {{dead link}}. Added 64 {{dead link}}. Switched 791 |url-status=dead to live. Switched 39 |url-status=live to dead. Added 688 archive URLs (494 Wayback). Changed 50 citation metadata.
- IABot DB
- Checked and updated about 8,000 URLs which propagate to 300+ wikis
- Done -- GreenC 03:35, 2 November 2024 (UTC)
articles.timesofindia.indiatimes.com
- Enwiki
- Pass 1: Checked 9,455 pages and edited 1,877 pages. Moved 1,745 links to a new URL. Resolved 1,663 ghost redirects. Resolved 10 soft-404s. Removed 10 {{dead link}}. Added 133 {{dead link}}. Switched 1,621 |url-status=dead to live. Switched 80 |url-status=live to dead. Added 92 archive URLs (92 Wayback). Changed 334 citation metadata.
- Pass 2 after improvements to Ghost redirect code: Checked 9,455 pages and edited 3,488 pages. Moved 5,186 links to a new URL. Discovered 5,126 ghost redirects. Resolved 229 soft-404s. Removed 7 {{dead link}}. Switched 5,061 |url-status=dead to live. Switched 1 |url-status=live to dead. Added 16 archive URLs (0 Wayback). Changed 1 citation metadata. -- GreenC 18:24, 15 October 2024 (UTC)
- Pass 3: Checked 9,455 pages and edited 1,359 pages. Moved 1,839 links to a new URL. Discovered 1,780 ghost redirects. Removed 2 {{dead link}}. Switched 1,762 |url-status=dead to live. Added 4 archive URLs (0 Wayback).
- Pass 4: Checked 9,455 pages and edited 276 pages. Moved 378 links to a new URL. Discovered 331 ghost redirects. Removed 1 {{dead link}}. Switched 323 |url-status=dead to live. Added 7 archive URLs (0 Wayback).
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.economictimes.indiatimes.com
- Enwiki
- Checked 2,654 pages and edited 2,422 pages. Moved 86 links to a new URL. Discovered 3 ghost redirects. Added 115 {{dead link}}. Switched 587 |url-status=live to dead. Added 2,505 archive URLs (2,237 Wayback). Changed 23 citation metadata.
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.philly.com
- Enwiki
- Checked 4,702 pages and edited 4,055 pages. Resolved 2,032 soft-404s. Added 75 {{dead link}}. Switched 550 |url-status=live to dead. Added 5,215 archive URLs (4,721 Wayback). Changed 160 citation metadata.
- Analysis: the URLs could not be converted via the |title= method used elsewhere for the other articles.* domains. The domain has ghost redirects, but they are all soft-404s pointing to the home page. The last option was archive URLs, which the bot was mostly able to add, except for 75 {{dead link}}.
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.sfgate.com
1,300 pages GreenC 18:01, 24 September 2024 (UTC)
- Enwiki
- Pass 1: Checked 1,365 pages and edited 1,328 pages. Moved 1,236 links to a new URL. Discovered 1,236 ghost redirects. Resolved 4 soft-404s. Removed 76 {{dead link}}. Added 26 {{dead link}}. Switched 102 |url-status=dead to live. Switched 23 |url-status=live to dead. Added 291 archive URLs (122 Wayback). Changed 7 citation metadata.
- Pass 2: Checked 1,365 pages and edited 241 pages. Moved 266 links to a new URL. Discovered 266 ghost redirects. Removed 25 {{dead link}}. Switched 234 |url-status=dead to live. Added 1 archive URL (1 Wayback).
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.washingtonpost.com
- Enwiki
- Pass 1: Checked 774 pages and edited 444 pages. Moved 426 links to a new URL. Discovered 426 ghost redirects. Removed 13 {{dead link}}. Added 28 {{dead link}}. Switched 396 |url-status=dead to live. Switched 1 |url-status=live to dead. Added 12 archive URLs (4 Wayback). Changed 8 citation metadata.
- Pass 2: Checked 774 pages and edited 141 pages. Moved 126 links to a new URL. Discovered 126 ghost redirects. Switched 124 |url-status=dead to live.
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.boston.com
- Enwiki
- Pass 1: Checked 622 pages and edited 263 pages. Moved 103 links to a new URL. Discovered 103 ghost redirects. Removed 1 {{dead link}}. Added 63 {{dead link}}. Switched 76 |url-status=dead to live. Switched 6 |url-status=live to dead. Added 101 archive URLs (50 Wayback). Changed 7 citation metadata.
- Pass 2: Checked 622 pages and edited 26 pages. Moved 26 links to a new URL. Discovered 26 ghost redirects. Switched 24 |url-status=dead to live.
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.herald-mail.com
- Enwiki
- Checked 123 pages and edited 75 pages. Moved 4 links to a new URL. Added 4 {{dead link}}. Switched 2 |url-status=live to dead. Added 88 archive URLs (78 Wayback).
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.businessinsider.com
"has a paywall"
- Enwiki
- Checked 133 pages and edited 23 pages. Added 1 {{dead link}}. Switched 1 |url-status=live to dead. Added 18 archive URLs (9 Wayback). Changed 1 citation metadata.
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
articles.dailypilot.com
"redirect to latimes and don't appear to have easy conversion rules"
- Enwiki
- Checked 109 pages and edited 88 pages. Added 2 {{dead link}}. Switched 10 |url-status=live to dead. Added 113 archive URLs (110 Wayback).
- IABot DB
- Checked and updated
- Done -- GreenC 18:31, 4 November 2024 (UTC)
singapore-elections.com
website is dead. hostile takeover by the usual.. casino suspects. – robertsky (talk) 02:55, 26 September 2024 (UTC)
Done in WP:JUDI batch #19 -- GreenC 17:56, 5 November 2024 (UTC)
ittiofauna.org
Has been WP:JUDI usurped by a Thai site. Redirects to Gbo5000 - Mainkan Slot Gacor Server Thailand Resmi (dacres.org)
It used to contain photos of European fish, and there are ~18 occurrences in the articlespace. Big Blue Cray(fish) Twins (talk) 12:51, 27 September 2024 (UTC)
Done in WP:JUDI batch #19 -- GreenC 17:56, 5 November 2024 (UTC)
usemod.com
Domain is usurped, new domain is usemod.org. Paths should work the same. * Pppery * it has begun... 04:44, 30 September 2024 (UTC)
3 pages. I edited them manually.
Done -- GreenC 18:03, 5 November 2024 (UTC)
ctv.ca
Hello. Old CTV links don't work anymore. I did not find any that were now at ctvnews.ca. Therefore, I request archives for these links only. ~1500 articles. Some of these already have archives added in the article. Thanks! MrLinkinPark333 (talk) 22:47, 30 September 2024 (UTC)
- I did this domain in 2021. Developed code to move links to ctvnews.ca .. example diff: Special:Diff/1029587503/1033596569 .. converted about 1,000 links. But that code won't work anymore as the redirect information no longer exists. Currently there are 1,847 pages. I'll try to find ghost redirects otherwise convert to archive. -- GreenC 23:45, 5 November 2024 (UTC)
- That's too bad that the conversion doesn't work anymore. Hopefully some more can be changed over if possible. MrLinkinPark333 (talk) 23:50, 5 November 2024 (UTC)
- Enwiki
- Checked 1,854 pages and edited 459 pages. Moved 2 links to a new URL. Resolved 6 soft-404s. Switched 4 |url-status=live to dead. Added 133 archive URLs (116 Wayback). Changed 184 citation metadata.
- Analysis - It found only 2 conversions. The 133 archive URLs might be links that were missed in 2021 due to improvements in code, new archives at the service provider, or new links (re)added to Wikipedia since 2021. The citation metadata is a new feature not available in 2021. Overall, it looks about expected. -- GreenC 16:42, 6 November 2024 (UTC)
- IABot DB
- Previously done and for any stragglers I changed the status to permadead in the DB
Done -- GreenC 16:42, 6 November 2024 (UTC)
citynews.ca
Hello again. citynews.ca links are mostly redirecting to toronto.citynews.ca:
- Links with dates: this is now here for The Christmas Shoes (song).
- Links without dates: Other article links need to be converted to toronto.citynews.ca/year/month/day/name-of-article/ - For example this is now here for First Canadian Place. Unfortunately, the old URL does not have the date already listed, so it either has to be extracted from the citation or archived copy. Any punctuation marks are removed
~290 links. If any of these new links do not work, it is possible that it's under a different subdomain like calgary.citynews.ca. As Toronto is the main domain, it might be easier to test if they convert to toronto.citynews.ca, then archive the ones that don't work. Please let me know if any of these don't convert to new links. Thanks! MrLinkinPark333 (talk) 23:28, 30 September 2024 (UTC)
- MrLinkinPark333: Let me know what you want to do with the below 37. If not too complicated. At some point it's easier to fix small numbers by hand. Overall it found most of them successfully Special:Diff/1255156660/1255809021. It scraped the title from the citation |title=, and scraped the date from the archive URL page content; reformatted and assembled into a new URL. -- GreenC 20:02, 6 November 2024 (UTC)
- Were any of them converted to calgary? I used that as an example as there are 9 subdomains. However, I think that the rest of them would be at toronto and just need manually converting. Let me know if that's the case, and I'll swap the rest manually. MrLinkinPark333 (talk) 20:47, 6 November 2024 (UTC)
- Converted only 1 calgary Special:Diff/1255149041/1255811790 -- GreenC 21:08, 6 November 2024 (UTC)
Enwiki
- Pass 1: toronto and calgary: Checked 298 pages and edited 266 pages. Moved 291 links to a new URL. Removed 1 {{dead link}}. Added 1 {{dead link}}. Switched 39 |url-status=dead to live. Added 10 archive URLs (5 Wayback). Changed 4 citation metadata.
Articles that still have www.citynews.ca links after trying conversion to toronto or calgary.citynews.ca
IABot DB
- Checked and updated.
Done -- GreenC 01:56, 8 November 2024 (UTC)
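For readers following the citynews.ca conversion, the assembly step described above (title from the citation, date from the archived copy, punctuation removed) can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function names, the lowercasing, the hyphen joining and the zero-padded month/day are assumptions, and any URL built this way would still need to be checked against the live site.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Turn a citation title into a URL slug.

    Assumption: lowercase, punctuation dropped, words joined by hyphens.
    """
    cleaned = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"[\s_-]+", "-", cleaned).strip("-")


def build_citynews_url(title: str, published: date, subdomain: str = "toronto") -> str:
    """Assemble <subdomain>.citynews.ca/year/month/day/name-of-article/.

    The subdomain defaults to toronto; others (e.g. calgary) can be tried
    if the toronto guess does not resolve.
    """
    return f"https://{subdomain}.citynews.ca/{published:%Y/%m/%d}/{slugify(title)}/"


# Hypothetical title and date, for illustration only.
print(build_citynews_url("Hello, World: A Test!", date(2009, 3, 5)))
```

Trying toronto first and falling back to another subdomain only on failure matches the order suggested in the request.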
deseretnews.com
Almost all links here are soft redirects to articles at deseret.com, but conversion seems to be intractable, so the links should be archived. The converted links are of the form www.deseret.com/year/month/day/<id>/title-of-article, where the <id> seems to be unrelated to anything in the old link. Example: link [8] in 2012 United States presidential election is a soft redirect to [9].
5,446 pages. Helpful Raccoon (talk) 02:48, 6 October 2024 (UTC)
Enwiki
- Checked 5,456 pages and edited 4,934 pages. Moved 741 links to a new URL, of which 736 are ghost redirects. Resolved 19 soft-404s. Removed 1 {{dead link}}. Added 240 {{dead link}}. Switched 60 |url-status=dead to live. Switched 628 |url-status=live to dead. Added 6,336 archive URLs (5,748 Wayback). Changed 342 citation metadata.
IABot
- Checked and updated
Done -- GreenC 15:47, 8 November 2024 (UTC)
foxnews.com/section/year/
Fox News articles of the form foxnews.com/<section>/yyyy/mm/dd/.... are soft redirects to articles of the form foxnews.com/<section>/title-of-article. Example: [10] in "Weird Al" Yankovic is a soft redirect to [11] (note that the text at the end of the first URL differs from that of the second, with "adapting" apparently misspelled in the first). Conversion is usually tractable so long as the article title is known, as it is similar to the Chicago Tribune conversion.
7,259 pages. Helpful Raccoon (talk) 03:14, 6 October 2024 (UTC)
- Looks like two types of conversions: a simple URL transform by removing the date; and the harder "Chicago method", of extracting the title from the citation. I guess the best way is to try the simple method first and, if that fails, the Chicago method; if those do not work then check for ghost redirects; and finally add an archive. -- GreenC 15:59, 8 November 2024 (UTC)
- It's working, but took a while to code as this is the first time I've attempted sequencing all the methods at once. The "Chicago" method is still pretty custom, I need to integrate it as part of the boilerplate code as a standard feature. Also with all these methods it's slow, 7,000 pages will take a while. -- GreenC 19:58, 8 November 2024 (UTC)
- I added two new concepts to the glossary: ruled soft-redirect, and inferred soft-redirect. In this case, the removal of the date from the URL is a 'ruled soft-redirect' ie. a hard-coded rule to transform the URL. The parsing of the title is an 'inferred soft-redirect' because it is inferring (guessing) what the new URL might be, and could generate multiple guesses into an 'inference table', from which the bot checks each guess, until it finds a match. The inferred soft-redirect code is now incorporated as a feature that can be enabled/disabled for each project. -- GreenC 06:26, 9 November 2024 (UTC)
- Helpful Raccoon, thanks for finding and reporting Fox News, it was helpful on a couple levels. Fixing the links, improving the bot's general code for future domains, and helping to distinguish (or at least name) the concepts of 'ruled soft-redirects' and 'inferred soft-redirects'. -- GreenC 15:14, 10 November 2024 (UTC)
Enwiki
- Checked 7,259 pages and edited 6,849 pages. Moved 7,986 links to a new URL. Of which 477 were ghost redirects; 4,276 were inferred soft-redirects; 2,894 were ruled soft-redirects; 539 were regular redirects. Added 565 archive URLs (462 Wayback).
IABot DB
- Checked and updated about 15,000 URLs which propagate to 300+ wikis
Done -- GreenC 15:14, 10 November 2024 (UTC)
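As a rough illustration of the 'ruled' and 'inferred' soft-redirect concepts discussed in this section, the sketch below builds a small inference table and checks each guess in order. It is not WaybackMedic's code: the function names, the slug rules and the is_live callback (standing in for a real HTTP check) are all assumptions made for the example.

```python
import re
from typing import Callable, Optional

# Matches a /yyyy/mm/dd path segment that is followed by more path.
DATE_PATH = re.compile(r"/\d{4}/\d{2}/\d{2}(?=/)")


def ruled_candidate(url: str) -> Optional[str]:
    """Ruled soft-redirect: apply the hard-coded rule 'remove the date'."""
    new_url, count = DATE_PATH.subn("", url, count=1)
    return new_url if count else None


def inferred_candidates(url: str, title: str) -> list[str]:
    """Inferred soft-redirect: guess new URLs from the citation title.

    Assumption: the slug is the lowercased title with punctuation either
    dropped or treated as a word break; both variants are tried.
    """
    match = DATE_PATH.search(url)
    if not match:
        return []
    base = url[: match.start()]  # e.g. https://www.foxnews.com/<section>
    dropped = "-".join(re.sub(r"[^\w\s-]", "", title.lower()).split())
    broken = "-".join(re.sub(r"[^\w\s]", " ", title.lower()).split())
    guesses = []
    for slug in (dropped, broken):
        guess = f"{base}/{slug}"
        if slug and guess not in guesses:
            guesses.append(guess)
    return guesses


def resolve(url: str, title: str, is_live: Callable[[str], bool]) -> Optional[tuple[str, str]]:
    """Check the ruled guess first, then each inferred guess, until one is live."""
    table = []
    ruled = ruled_candidate(url)
    if ruled:
        table.append(("ruled", ruled))
    table.extend(("inferred", guess) for guess in inferred_candidates(url, title))
    for method, guess in table:
        if is_live(guess):
            return method, guess
    return None  # caller would fall back to ghost redirects, then an archive URL
```

Passing the liveness check in as a function keeps the sketch testable without network access; in practice it would wrap whatever HTTP and soft-404 detection the bot already has.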
cnbc.com/id/number/title
Articles of the form cnbc.com/id/<eight digit id>/<article title> can be converted to live articles or redirects by simply removing everything after the 8-digit id. Example: https://www.cnbc.com/id/37207942/Could_Italy_Be_Better_Off_than_its_Peers in Italy can be converted to https://www.cnbc.com/id/37207942, which redirects to the live article https://www.cnbc.com/2010/05/18/could-italy-be-better-off-than-its-peers.html.
A different example: https://www.cnbc.com/id/47387334/Jim_Breyer_via_Accel_Partners from Facebook can be converted to https://www.cnbc.com/id/47387334, which is a live article.
1,644 pages. Helpful Raccoon (talk) 08:23, 6 October 2024 (UTC)
- OK. Some redirect some do not. I'll test them all and migrate the ones that redirect. It increased the search size, since it's also including anything with only an ID number. -- GreenC 16:24, 10 November 2024 (UTC)
Enwiki
- Checked 1,654 pages and edited 1,491 pages. Moved 1,492 links to a new URL: 1,389 ruled soft-redirects, 103 ghost soft-redirects. Resolved 22 soft-404s. Removed 1 {{dead link}}. Added 140 {{dead link}}. Switched 107 |url-status=dead to live. Switched 10 |url-status=live to dead. Added 142 archive URLs (114 Wayback). Changed 305 citation metadata.
Done -- GreenC 01:18, 11 November 2024 (UTC)
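The CNBC rule from the request (drop everything after the 8-digit id) is simple enough to sketch. The function name and regular expression below are illustrative assumptions, not the bot's code, and as noted above the shortened URL still has to be tested since some redirect and some do not.

```python
import re
from typing import Optional

# Scheme, optional www, /id/, exactly eight digits, then an optional tail.
CNBC_ID = re.compile(r"^(https?://(?:www\.)?cnbc\.com/id/\d{8})(?:[/?#].*)?$")


def truncate_cnbc(url: str) -> Optional[str]:
    """Return the URL cut back to its 8-digit id, or None if it does not fit the pattern."""
    match = CNBC_ID.match(url)
    return match.group(1) if match else None


print(truncate_cnbc("https://www.cnbc.com/id/37207942/Could_Italy_Be_Better_Off_than_its_Peers"))
```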
newamericamedia.org
217 pages. New American Media has ceased operations. Links to its website no longer work and its domain name may have been taken over. Cherry Cotton Candy (talk) 03:11, 8 October 2024 (UTC)
- Hijacked. I added it to WP:JUDI. thanks!
Done in batch #20 -- GreenC 16:59, 23 December 2024 (UTC)
en.rsf.org
567 pages. This website always returns the error code 521. Cherry Cotton Candy (talk) 03:25, 8 October 2024 (UTC)
Enwiki
- Checked 565 pages and edited 257 pages. Added 3 {{dead link}}. Switched 67 |url-status=live to dead. Added 286 archive URLs (246 Wayback). Changed 2 citation metadata.
IABot DB
- Checked and done a few thousand.
Done -- GreenC 15:51, 11 November 2024 (UTC)
variety.com
Links with parameters do not work. If parameters are removed, some links will become redirect links.
- https://www.variety.com/article/VR1118016497?refCatId=16 does not work.
- https://www.variety.com/article/VR1118016497 redirects to https://variety.com/2010/film/markets-festivals/willie-nelson-launches-luck-films-1118016497/
- https://variety.com/article/VR102012.html?categoryid=4&cs=1&query=garth+brooks does not work.
- https://variety.com/article/VR102012 redirects to https://variety.com/1992/voices/columns/that-was-the-year-that-was-a-wrap-song-for-92-102012/
Cherry Cotton Candy (talk) 04:28, 8 October 2024 (UTC)
Enwiki
- Checked 2,852 pages and edited 2,681 pages. Moved 4,468 links to a new URL: 4,468 ruled soft-redirects. Removed 24 {{dead link}}. Added 6 {{dead link}}. Switched 554 |url-status=dead to live. Switched 18 |url-status=live to dead. Added 106 archive URLs (53 Wayback). Changed 178 citation metadata.
IABot DB
- Checked and updated about 14,000 links which propagate to 300+ wikis
Done -- GreenC 15:59, 12 November 2024 (UTC)
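The Variety examples in the request suggest a rule of dropping the query string and, where present, a trailing ".html". A minimal sketch of that rule, using only the Python standard library, is below. The function name is made up for illustration, and the result is only a candidate: the request itself says that only some of the stripped links redirect.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_variety_params(url: str) -> str:
    """Drop the query string and fragment, and a trailing '.html', from a URL."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(strip_variety_params("https://www.variety.com/article/VR1118016497?refCatId=16"))
```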
kotaku.com.au
1357 pages for https://www.kotaku.com.au
- Kotaku Australia is now redirecting to Kotaku's front page (see update on Aftermath). Sariel Xilo (talk) 23:55, 15 October 2024 (UTC)
Enwiki
- Checked 1,370 pages and edited 1,303 pages. Added 8 {{dead link}}. Switched 825 |url-status=live to dead. Added 772 archive URLs (749 Wayback). Changed 89 citation metadata.
IABot DB
- Checked and updated about 2,000 links which propagate to 300+ wikis
Done -- GreenC 23:15, 12 November 2024 (UTC)
community.seattletimes.nwsource.com
All of the "http://community.seattletimes.nwsource.com" links seem to be dead, but can be substituted with "https://archive.seattletimes.com" as seen in Special:Diff/1253654883
There are 2,943 articles that match this description: per this search result.
I tried this with several links and it seemed to work fine. I'm not sure how many failed the transfer, but testing a bunch and it being fine seems to me like a lot of them still exist.
Take for instance, the one provided in the Gulf War page: http://community.seattletimes.nwsource.com/archive/?date=19910912&slug=1305069
An archive does exist, and it shows what is shown with the url replacement: Archived old link vs Live updated link Chewsterchew (talk) 04:59, 27 October 2024 (UTC)
Enwiki
- Checked 2,951 pages and edited 2,905 pages. Moved 4,195 links to a new URL: 3,954 ruled soft-redirects. Removed 5 {{dead link}}. Switched 287 |url-status=dead to live. Added 33 archive URLs (20 Wayback). Changed 255 citation metadata.
IABot DB
- Checked and updated about 1,000 links
Done -- GreenC 04:04, 13 November 2024 (UTC)
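The Seattle Times substitution described in the request amounts to swapping the scheme and host while keeping the path and query. A minimal sketch is below. The function name is hypothetical, and it assumes the path and query carry over unchanged, which the request reports for the links tested but does not guarantee for all of them.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "community.seattletimes.nwsource.com"
NEW_HOST = "archive.seattletimes.com"


def to_archive_host(url: str) -> str:
    """Rewrite an old community.seattletimes.nwsource.com URL to the archive host.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != OLD_HOST:
        return url
    return urlunsplit(("https", NEW_HOST, parts.path, parts.query, parts.fragment))


print(to_archive_host("http://community.seattletimes.nwsource.com/archive/?date=19910912&slug=1305069"))
```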
disneyparks.disney.go.com/blog/
"disneyparks.disney.go.com/blog/" redirects to https://disneyparksblog.com/, with none of the articles/post still active/archived. I've tried to {{dead link}} many of them and have submitted for InternetArchiveBot to run on many of the pages, but I'm sure I missed a bunch of them as well. Elisfkc (talk) 02:55, 28 October 2024 (UTC)
Enwiki
- Checked 461 pages and edited 390 pages. Removed 1 {{dead link}}. Added 7 {{dead link}}. Switched 156 |url-status=live to dead. Added 544 archive URLs (520 Wayback). Changed 3 citation metadata.
IABot DB
- Checked and updated about 900 URLs that will propagate to 300+ wikis
Done -- GreenC 15:18, 13 November 2024 (UTC)
avclub.com/articles
Seems like a lot of their music reviews have dead links. How can we fix this? Cahlin29 (talk) 03:58, 30 October 2024 (UTC)
- Is there an example? -- GreenC 04:30, 30 October 2024 (UTC)
- The link on Drake's Take Care is dead: https://www.avclub.com/articles/drake-take-care,65046
- Same with Mac & Devin Go to High School (soundtrack): https://www.avclub.com/articles/snoop-dogg-and-wiz-khalifa-mac-and-devin-go-to-hig,66410
- Also with Curtis (50 Cent album): https://www.avclub.com/articles/50-cent-curtis,7557
- I'm presuming a pattern. Cahlin29 (talk) 17:22, 30 October 2024 (UTC)
- The Drake link was moved here. The number "1798170489" is the key. I was able to find it in a ghost redirect as seen here (the old URL redirects to the new URL). It will be a while, I need to get through everything else above first. Looks like about 4,600 pages. -- GreenC 17:55, 30 October 2024 (UTC)
- No worries, take your time, I assume the Internet Archive outage delayed things. Cahlin29 (talk) 20:45, 30 October 2024 (UTC)
Enwiki
- First pass: Checked 4,601 pages and edited 2,924 pages. Moved 3,133 links to a new URL: 3,133 ghost soft-redirects. Switched 120 |url-status=dead to live. Added 73 archive URLs (26 Wayback). Changed 770 citation metadata.
- Second pass: Checked 2,607 pages and edited 1,751 pages. Moved 3,493 links to a new URL: 468 inferred CDX soft-redirects, 3,025 ghost soft-redirects. Added 9 {{dead link}}. Switched 32 |url-status=dead to live. Switched 115 |url-status=live to dead. Added 1,199 archive URLs (1,067 Wayback). Changed 213 citation metadata.
- Analysis: created a new method for discovery: inferred CDX soft-redirects. Converted domain names *.xvclub.com to www.avclub.com. Improved ghost redirect detection.
IABot DB
- Updated about 11,000 links that propagate to 300+ wikis
Done - GreenC 05:01, 15 November 2024 (UTC)
empoweringindia.org
655 pages. This domain was sold to a gambling website, and Citation bot changed the titles of these links. Cherry Cotton Candy (talk) 04:11, 3 November 2024 (UTC)
Done in WP:JUDI batch #19 -- GreenC 17:56, 5 November 2024 (UTC)
michmarkers.com
262 pages. It has been usurped by a gambling website. Cherry Cotton Candy (talk) 09:14, 3 November 2024 (UTC)
Done in WP:JUDI batch #19 -- GreenC 17:56, 5 November 2024 (UTC)
ouramericanrevolution.org
Colonial Williamsburg site. 10 pages -- GreenC 19:36, 6 November 2024 (UTC)
- Done - via IABot job. -- GreenC 05:07, 15 November 2024 (UTC)
southdreamz.com
Website has been usurped. Doesn't look like JUDI but it redirects to a completely different website such as the link at Naan Mahaan Alla (2010 film). 73 articles. MrLinkinPark333 (talk) 20:50, 7 November 2024 (UTC)
Done in batch #20 -- GreenC 16:59, 23 December 2024 (UTC)
screenindia.com
This website soft redirects to indianexpress.com but has no equivalent text. Therefore, this needs archives only. 810 articles. Some of them already have archives added, such as at Vakkalathu Narayanankutty. Thanks! MrLinkinPark333 (talk) 02:24, 8 November 2024 (UTC)
- Technically soft 404 (vs. soft redirect). Corollary concepts. A soft 404 redirects when it shouldn't. A soft redirect doesn't redirect when it should. -- GreenC 05:19, 15 November 2024 (UTC)
Enwiki
- Checked 823 pages and edited 340 pages. Added 184 {{dead link}}. Switched 23 |url-status=live to dead. Added 136 archive URLs (104 Wayback). Changed 88 citation metadata.
IABot DB
- Checked and fixed about 400 links which propagate to 300+ wikis
Done -- GreenC 16:33, 15 November 2024 (UTC)
time.com
Time.com has moved their links to new URLs. Unfortunately, they are not easy to convert. For example, this is now here for Paul McCartney. Therefore, I request archive URLs instead. ~20k articles. Some of them already have archives added. Thanks! MrLinkinPark333 (talk) 15:53, 9 November 2024 (UTC)
- I processed time.com in July 2021. It was large, took three days to process. Added 25,000 archive URLs. You can read my strategy in the link. Do you still see a lot of broken links without archive URLs? -- GreenC 01:07, 11 November 2024 (UTC)
- Of the first 500 in the above link, 194 don't show archives. If you could filter out the ones without archive URLs for time, it'll help a lot. MrLinkinPark333 (talk) 01:11, 11 November 2024 (UTC)
- How are you checking for archives? 194 is about 40%. I just manually checked 50 pages, every one has an archive (need to open the page and search on the link, the search result page doesn't provide enough information to determine). Except 3 cases that have a live link. Of those 50, in no cases would the bot add an archive URL. I could do this, but it will take a while to process, and I'm not sure how much it will accomplish. BTW the Paul McCartney example link no longer exists in the article, but it does exist in two others. Both have archives. -- GreenC 19:36, 11 November 2024 (UTC)
- I only checked the results page and not manually checked each individual article. Is it possible to adjust the search result link above to calculate how many articles don't have archives first for time? Then, we could decide what to do next. MrLinkinPark333 (talk) 19:43, 11 November 2024 (UTC)
- There is no easy way for this search. But recall Wikipedia:Link_rot/URL_change_requests#ctv.ca, which was also previously done in 2021, and it found 133 more archives. Maybe it's worth trying again. I'll need to build a list of target articles by searching a dump file, since the online search tops out at 10,000 results. -- GreenC 05:06, 12 November 2024 (UTC)
- If you believe this is easier, feel free to check all of them. Since this request is big, I don't mind if it gets done later after the smaller requests are done. MrLinkinPark333 (talk) 02:16, 13 November 2024 (UTC)
- Extracting all the page names that contain time.com requires searching a dump file which can take 6-8 hours to complete. This is required when the number of results is > 10,000 because Cirrus search (eg. "insource:..") won't return more than 10k results, due to resource constraints on their search server. Cirrus can return how many results there are > 10k, but won't display the actual results beyond 10k. I'll need to do the same with deadline.com below which has 40k results. -- GreenC 19:46, 15 November 2024 (UTC)
Enwiki
- Checked 44,901 pages and edited 13,920 pages. Moved 14,455 links to a new URL: 14,074 ruled soft-redirects, 381 ghost soft-redirects. Resolved 7,124 soft-404s. Removed 9 {{dead link}}. Switched 660 |url-status=dead to live. Added 740 archive URLs (446 Wayback). Changed 2,281 citation metadata.
- Analysis: almost all 'ruled soft redirects' are http -> https. Since 'ghost redirects' were not available in 2021, they were discovered this time. Most of the archive URLs were non-Time.com domains that had a {{dead link}} tag and were repaired incidentally. It was able to convert many |work=time.com to |work=Time, because this feature did not exist in 2021.
Done -- GreenC 02:19, 17 November 2024 (UTC)
- Nice to see many fixes! MrLinkinPark333 (talk) 03:37, 17 November 2024 (UTC)
deadline.com
Deadline.com redirects to new URLs with numeric IDs at the end. Any punctuation marks are removed like at this link to go here for Robert Pattinson. Any links that already have a numeric ID at the end can be skipped. ~1300 articles. Thank you! MrLinkinPark333 (talk) 16:06, 9 November 2024 (UTC)
- There are over 40,000 pages with deadline.com .. limit to www.deadline.com there are 4,780. This is what I am checking on "Pass 1". -- GreenC 17:16, 15 November 2024 (UTC)
Enwiki
- Pass 1: Checked 4,784 pages and edited 4,364 pages. Moved 6,575 links to a new URL: 6,575 ruled soft-redirects. Added 24 {{dead link}}. Switched 98 |url-status=dead to live. Switched 143 |url-status=live to dead. Added 442 archive URLs (401 Wayback). Changed 1,295 citation metadata.
- Pass 2: Checked 39,245 pages and edited 5,808 pages. Moved 2,278 links to a new URL: 2,278 ruled soft-redirects. Added 95 {{dead link}}. Switched 32 |url-status=dead to live. Switched 1,119 |url-status=live to dead. Added 2,126 archive URLs (2,018 Wayback). Changed 2,399 citation metadata.
Done -- GreenC 00:22, 18 November 2024 (UTC)
passport.weibo.com
Weibo is a Chinese social media platform with a lot of official information disseminated through the official accounts. Some editors tend to use the visitor landing url with prefix when citing a specific post. So a url cited on Death of Li Keqiang goes like:
Am hoping to clean all the citations, just to take what goes after the 'url=' with '%3A → :' and '%2F → /' so the url becomes https://weibo.com/1938487875/NpL26wys2. NoCringe (talk) 02:24, 11 November 2024 (UTC)
- Hi NoCringe: 134 pages. The links are not dead, but I can still process them for link normalization. And if it finds any are dead it will add an archive. -- GreenC 05:19, 12 November 2024 (UTC)
- Thank you! Please process them. It will make archiving easier since the IABot gets stuck on some of these landing pages. NoCringe (talk) 06:57, 12 November 2024 (UTC)
- NoCringe: This is not working. For example in Liu Duan Duan there is this which translates to that. Which is a soft-404 which I don't know how to detect due to foreign script. Furthermore, if you follow the redirect headers, it actually should go to here, which doesn't even work. There are some legitimate working pages, also many not. Without a way to detect the non-working pages, I can't remove existing URLs. -- GreenC 02:47, 18 November 2024 (UTC)
- Ugh I didn't realise there are variations. I will try to clean them up manually when time permits. Thanks for processing by the way. NoCringe (talk) 08:57, 18 November 2024 (UTC)
- User:NoCringe, ok good luck. I wrote the code to extract and decode the sub-URL, if you need me to run it and log the results I can post a three column table: <wikipage> <origiurl> <extracted url> -- GreenC 15:02, 19 November 2024 (UTC)
- If it's not too much trouble please run it. It will be useful. NoCringe (talk) 03:15, 20 November 2024 (UTC)
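For anyone doing the cleanup by hand, extracting and decoding the sub-URL as described in this thread can be sketched with the Python standard library. This is not GreenC's code. The function name is illustrative, and the landing-page path and entry parameter in the sample URL are made up; only the url= parameter and the decoded result come from the request. As noted above, a decoded URL may still be a soft-404, so the output is a candidate to check, not a guaranteed replacement.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_weibo_target(url: str) -> Optional[str]:
    """Return the percent-decoded value of the url= parameter of a passport.weibo.com link."""
    parts = urlsplit(url)
    if parts.netloc.lower() != "passport.weibo.com":
        return None
    values = parse_qs(parts.query).get("url")  # parse_qs decodes %3A, %2F, etc.
    return values[0] if values else None


# Hypothetical landing URL; the path and 'entry' parameter are assumptions.
sample = "https://passport.weibo.com/visitor/visitor?entry=miniblog&url=https%3A%2F%2Fweibo.com%2F1938487875%2FNpL26wys2"
print(extract_weibo_target(sample))
```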
paleobiodb.org
Their former URLs paleodb.org and fossilworks.org have been taken over by The Ecological Register; a seemingly well-meaning site. The old URLs such as:
http://paleodb.org/cgi-bin/bridge.pl?a=checkTaxonInfo&taxon_no=34738
http://www.fossilworks.org/cgi-bin/bridge.pl?a=taxonInfo&taxon_no=64541
have now become:
https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=34738
https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=64541
Can you fix/redirect these, please?
Big Blue Cray(fish) Twins (talk) 12:20, 12 November 2024 (UTC)
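The mapping in the two examples above can be sketched as a small rewrite rule that carries taxon_no over to the paleobiodb.org classic checkTaxonInfo URL. This only covers links that have a numeric taxon_no, and the function name is an assumption for illustration. Other bridge.pl actions would need their own rules.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

OLD_HOSTS = {"paleodb.org", "fossilworks.org"}


def to_paleobiodb(url: str) -> Optional[str]:
    """Map an old bridge.pl taxon link to paleobiodb.org/classic/checkTaxonInfo.

    Returns None when the URL is not an old-host bridge.pl link with a numeric taxon_no.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host not in OLD_HOSTS or not parts.path.endswith("/bridge.pl"):
        return None
    taxon = parse_qs(parts.query).get("taxon_no")
    if not taxon or not taxon[0].isdigit():
        return None
    return f"https://paleobiodb.org/classic/checkTaxonInfo?taxon_no={taxon[0]}"


print(to_paleobiodb("http://paleodb.org/cgi-bin/bridge.pl?a=checkTaxonInfo&taxon_no=34738"))
```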
paleodb.org
- Enwiki
- Checked 2,000 pages and edited 1,990 pages. Moved 1,957 links to a new URL: 1,957 ruled soft-redirects. Removed 14 {{dead link}}. Added 9 {{dead link}}. Switched 71 |url-status=dead to live. Added 30 archive URLs (29 Wayback). Changed 22 citation metadata.
- Done -- GreenC 02:24, 19 November 2024 (UTC)
fossilworks.org
Big Blue Cray(fish) Twins: From Midshipman fish, there are a lot like this but I couldn't find an equivalent at paleobiodb -- GreenC 05:34, 18 November 2024 (UTC)
- Manually fixed that one. For some reason, they don't match the standard profile, but do still retain the same numbers:
http://www.fossilworks.org/cgi-bin/bridge.pl?a=collectionSearch&collection_no=135043
- became
https://paleobiodb.org/classic/displayCollResults?collection_no=col:135043
- and:
http://www.fossilworks.org/cgi-bin/bridge.pl?a=taxonInfo&taxon_no=361425
- became
https://paleobiodb.org/classic/basicTaxonInfo?taxon_no=txn:361425
- Thanks for your expert ministrations, but I am afraid I have given you/your bot thousands more!!
- Big Blue Cray(fish) Twins (talk) 09:20, 18 November 2024 (UTC)
- Thank you. Done in Pass 2. More varieties on the margins, if they exist:
- Bohío Formation: displayStrata (603)
- Ashorocetus: displayReference (8)
- Serra da Galga Formation: collectionSearch (156)
- displayStrata has most instances. -- GreenC 15:49, 18 November 2024 (UTC)
- Looks like displayStrata is: http://www.fossilworks.org/cgi-bin/bridge.pl?action=displayStrata&geological_group=&formation=Bohio&group_formation_member=Bohio ==> https://paleobiodb.org/classic/displayStrata?geological_group=&formation=Bohio&group_formation_member=Bohio
- I'll rerun a Pass 3 with this update -- GreenC 21:49, 18 November 2024 (UTC)
- The References are: http://www.fossilworks.org/cgi-bin/bridge.pl?a=displayReference&reference_no=12130 ==> https://paleobiodb.org/classic/displayRefResults?reference_no=ref:12130
- Big Blue Cray(fish) Twins (talk) 22:24, 18 November 2024 (UTC)
- And collectionSearch may be: http://www.fossilworks.org/cgi-bin/bridge.pl?action=collectionSearch&geological_group=Bauru&formation=Mar%EDlia ==> https://paleobiodb.org/classic/displayCollResults?&geologicalgroup=Bauru&formation=Marília
- BUT will need checking against other results to be sure due to Unicode clouding the issue on the example provided
- Big Blue Cray(fish) Twins (talk) 23:34, 18 November 2024 (UTC)
- Running Pass 4 with the new rules, and a larger set of articles. -- GreenC 03:25, 19 November 2024 (UTC)
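The variants worked out in this thread can be written as a small rule table. This is a sketch under the assumption that the examples quoted above are representative; `ID_RULES` and `rewrite_fossilworks` are names invented here, and the unconfirmed collectionSearch-by-formation form is deliberately not handled.

```python
from urllib.parse import urlsplit, parse_qsl

BASE = "https://paleobiodb.org/classic/"

# (old action, id parameter) -> (new endpoint, id prefix), taken from the examples in the thread.
ID_RULES = {
    ("collectionSearch", "collection_no"): ("displayCollResults", "col:"),
    ("taxonInfo", "taxon_no"): ("basicTaxonInfo", "txn:"),
    ("displayReference", "reference_no"): ("displayRefResults", "ref:"),
}


def rewrite_fossilworks(old_url):
    """Return the new URL for a known fossilworks.org variant, or None."""
    query = urlsplit(old_url).query
    params = dict(parse_qsl(query, keep_blank_values=True))
    # The old site used both "a=" and "action=" for the same purpose.
    action = params.get("a") or params.get("action")
    if action == "displayStrata":
        # Keep the remaining query string verbatim, including empty parameters.
        rest = "&".join(p for p in query.split("&") if not p.startswith(("a=", "action=")))
        return f"{BASE}displayStrata?{rest}"
    for (old_action, id_param), (endpoint, prefix) in ID_RULES.items():
        if action == old_action and params.get(id_param, "").isdigit():
            return f"{BASE}{endpoint}?{id_param}={prefix}{params[id_param]}"
    return None
```

Keeping the rules in a table means a newly discovered variant is one more entry rather than another branch.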
- Enwiki
- Pass 1: Checked 7,269 pages and edited 6,391 pages. Moved 3,089 links to a new URL: 3,089 ruled soft-redirects. Removed 17 {{dead link}}. Switched 678 |url-status=dead to live. Added 6 archive URLs (6 Wayback). Changed 186 citation metadata.
- Pass 2: Checked 590 pages and edited 525 pages. Moved 1,645 links to a new URL: 1,645 ruled soft-redirects. Removed 2 {{dead link}}. Added 4 {{dead link}}. Switched 5 |url-status=dead to live. Added 2 archive URLs (2 Wayback).
- Pass 3: Checked 590 pages and edited 67 pages. Moved 603 links to a new URL: 603 ruled soft-redirects.
- Pass 4: Checked 914 pages and edited 423 pages. Moved 687 links to a new URL. Added 20 archive URLs (20 Wayback).
Done -- GreenC 04:41, 19 November 2024 (UTC)
avclub.com
Dead sub-domains. Can be made live again by converting hostname to "www." .. the hostname might be: origin|games|music|film|news|aux|tv|mobile .. 4,732 pages -- GreenC 21:22, 13 November 2024 (UTC)
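A minimal sketch of the hostname conversion described above, assuming only the sub-domains listed need changing. The function name `to_www` is made up for illustration, and the sketch ignores ports and credentials in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Sub-domains listed in the request.
DEAD_SUBDOMAINS = {"origin", "games", "music", "film", "news", "aux", "tv", "mobile"}


def to_www(url):
    """Rewrite a dead avclub.com sub-domain to www; return other URLs unchanged."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    sub, _, domain = host.partition(".")
    if domain != "avclub.com" or sub not in DEAD_SUBDOMAINS:
        return url
    return urlunsplit((parts.scheme, "www.avclub.com", parts.path, parts.query, parts.fragment))
```

The result would still need a liveness check before replacing a citation, since a converted link is not guaranteed to resolve.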
Enwiki
- Checked 4,742 pages and edited 4,546 pages. Moved 5,181 links to a new URL: 5,156 ruled soft-redirects, 25 ghost soft-redirects. Removed 3 {{dead link}}. Switched 278 |url-status=dead to live. Added 31 archive URLs (22 Wayback). Changed 60 citation metadata.
IABot DB
- Checked and updated about 3,500 links that propagate to 300+ wikis
Done -- GreenC 14:55, 19 November 2024 (UTC)
nztop40.co.nz
I'm reposting a request I made at WP:BOTREQ and was directed here.
Dead citations occur due to the website changing the URL format. For example https://nztop40.co.nz/chart/albums?chart=3467 is now https://aotearoamusiccharts.co.nz/archive/albums/1991-08-09.
Case 1: 9,025 pages that are using these URLs found through search. Some may already be archived.
Case 2: 4,133 citations using {{cite certification}}.
An ideal transition seems difficult as it would require the following steps:
- Find an archived version through the Wayback Machine, e.g., https://web.archive.org/web/20240713231341/https://nztop40.co.nz/chart/albums?chart=3467 for the above. For case 2 this requires inferring the URL first (https://nztop40.co.nz/chart/{{#switch:{{{type|}}}|album={{#if:{{{domestic|}}}|nzalbums|albums}}|compilation=compilations|single={{#if:{{{domestic|}}}|nzsingles|singles}}}}?chart={{{id|}}}).
- Harvest the date 11 August 1991 either from the rendered archived page or from the archived page source, <p id="p_calendar_heading">11 August 1991</p>
- For case 1, translate the URL accordingly to https://aotearoamusiccharts.co.nz/archive/albums/1991-08-11.
- For case 2, add |source=newchart and replace the existing |id= value with the date, giving |id=1991-08-11.
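For case 2, the URL inference in the first step can be sketched in Python. This mirrors the template expression quoted above and is not the template's or the bot's real code; `infer_old_chart_url` and `wayback_url` are hypothetical helper names, and the timestamp would in practice come from a Wayback Machine lookup that is not shown.

```python
def infer_old_chart_url(chart_type, chart_id, domestic=False):
    """Rebuild the old nztop40.co.nz URL from |type=, |id= and |domestic=."""
    if chart_type == "album":
        segment = "nzalbums" if domestic else "albums"
    elif chart_type == "compilation":
        segment = "compilations"
    elif chart_type == "single":
        segment = "nzsingles" if domestic else "singles"
    else:
        # The quoted #switch has no default branch, so it yields an empty string.
        segment = ""
    return f"https://nztop40.co.nz/chart/{segment}?chart={chart_id}"


def wayback_url(timestamp, original_url):
    """Format a Wayback Machine snapshot URL from a 14-digit timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"
```

For example, `wayback_url("20240713231341", infer_old_chart_url("album", "3467"))` reproduces the archive URL given in the first step.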
Note that for case 1, the word after "/archive/" changed according to the following incomplete table. For case 2 this is handled by the template so no need to worry about it.
Old text | New text |
---|---|
albums | albums |
singles | singles |
nzalbums | aotearoa-albums |
nzsingles | aotearoa-singles |
tereosingles | te-reo-singles |
hotsingles | hot-singles |
hotnzsingles | hot-aotearoa-singles |
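The date harvesting and case 1 translation can be sketched as follows, using the table above as the segment mapping. This is an illustration rather than the bot's code: `harvest_date` and `translate` are invented names, the archived HTML is assumed to be already downloaded, and `strptime` with `%B` assumes an English locale.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

# Old path segment -> new path segment, copied from the table above.
CHART_SEGMENTS = {
    "albums": "albums",
    "singles": "singles",
    "nzalbums": "aotearoa-albums",
    "nzsingles": "aotearoa-singles",
    "tereosingles": "te-reo-singles",
    "hotsingles": "hot-singles",
    "hotnzsingles": "hot-aotearoa-singles",
}

HEADING = re.compile(r'<p id="p_calendar_heading">\s*([^<]+?)\s*</p>')


def harvest_date(archived_html):
    """Return the chart date as YYYY-MM-DD, or None if it cannot be read."""
    match = HEADING.search(archived_html)
    if not match:
        return None
    try:
        # Library month parsing avoids a hand-typed month table.
        return datetime.strptime(match.group(1), "%d %B %Y").strftime("%Y-%m-%d")
    except ValueError:
        return None


def translate(old_url, archived_html):
    """Return the aotearoamusiccharts.co.nz URL for an old chart URL, or None."""
    segment = urlsplit(old_url).path.rsplit("/", 1)[-1]
    iso_date = harvest_date(archived_html)
    if segment not in CHART_SEGMENTS or iso_date is None:
        return None
    return f"https://aotearoamusiccharts.co.nz/archive/{CHART_SEGMENTS[segment]}/{iso_date}"
```

Returning None when either the segment or the date is unknown leaves the citation for the archiving fallback instead of producing a guessed URL.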
If someone is willing to go through the above, at least for simple cases, I think it is the ideal solution, especially for case 2. Failing that, a simpler archiving procedure can be taken.
- For case 1: add |archive-url= and |archive-date= per usual archiving procedure. Add |url-status=deviated. If no archive exists (which should be a minority), add {{dead link}}.
- For case 2: add |archive-url= and |archive-date= per usual archiving procedure as they are supported by the templates. Add |source=oldchart (even if no archive is found).
I will be happy to support any technical assistance. Muhandes (talk) 22:55, 14 November 2024 (UTC)
- Muhandes, I don't see any major hurdles with your ideal solution. It's a lot of citations, worth doing. I'm working through requests on this page chronologically. Might get to here in a week or less. -- GreenC 00:51, 15 November 2024 (UTC)
- @GreenC: I'm happy to hear that. In the meanwhile I added records to the table above which should make it complete, to the best of my knowledge. I also noticed some of the URLs (53 of them to be accurate) add an additional #all_records_extra to the URL, e.g., https://nztop40.co.nz/chart/albums?chart=4413#all_records_extra. I will have a look at them individually and perhaps, since it's only 53, do them manually. --Muhandes (talk) 08:18, 15 November 2024 (UTC)
- The pages using #all_records_extra are all referring to the Heatseeker charts, which don't seem to be available on the new website. As such, they should be archived, not translated to the new format. --Muhandes (talk) 10:32, 15 November 2024 (UTC)
- Case 1 and 2 are different code bases. I have a separate code file for working with external link templates. So I'll initially focus on case 1, then likely some of that code can be reused with case 2. -- GreenC 15:07, 19 November 2024 (UTC)
- To document an additional variation: the "End of Year" charts, like this, have new URLs like https://aotearoamusiccharts.co.nz/archive/annual-{newcode}/{e}-12-31, where {newcode} is found by searching the HTML for "<h1>Top Selling [name]", where [name] could be Singles, Albums, NZ Singles, NZ Albums, Compilations - then extrapolate from the table above. The "{e}" is the year taken from <p id="p_calendar_heading">...</p> -- GreenC 21:11, 19 November 2024 (UTC)
- Muhandes: I need help translating the "discover" code as here. I tried this but it does not work. -- GreenC 21:48, 19 November 2024 (UTC)
- @GreenC: the "discover" charts are the same Heatseeker charts as the #all_records_extra ones. As far as I can tell they are no longer available. The only way to handle it is to find an archive-url. Note that in these cases the oldest archive-url is the best. I have found several cases where a new archive exists but it does not include the chart itself. Muhandes (talk) 23:45, 19 November 2024 (UTC)
- OK. It defaults to oldest. In the end there were only 3 cases. -- GreenC 00:44, 20 November 2024 (UTC)
- @GreenC: Is there a way to identify those 142 dead links and 259 archive URLs in the log? I would like to give them a manual sweep. Muhandes (talk) 07:51, 20 November 2024 (UTC)
- Logs: Wikipedia:Link_rot/Cases/nztop40.co.nz. The templates from Case 2 will show up in the tracking category. If there is no archive URL available it won't be able to make the conversion, and likewise won't be able to add an archive URL. Some archive URLs are available, but are soft-404s, or the original URL was not a valid chart page, or the template is malformed. I'll provide a list of the templates that didn't convert, so you can scan for syntax errors; the process is still running. -- GreenC 16:17, 20 November 2024 (UTC)
- @GreenC Thank you. I guess I have my next project, fixing those references manually. Muhandes (talk) 12:51, 21 November 2024 (UTC)
- @GreenC Can you please check why it failed on 3:15 (Breathe) case 2? The URL is https://nztop40.co.nz/chart/singles?chart=5565 archive exists at https://web.archive.org/web/20230428222709/https://nztop40.co.nz/chart/singles?chart=5565 (it was the first on the log). Muhandes (talk) 14:56, 21 November 2024 (UTC)
- The logs show network failure .. likely a Wayback Machine timeout (I check for timeouts and have retries but at some point it gives up). I just tried it again, worked first try. I'll rerun the cases that didn't convert. -- GreenC 20:24, 21 November 2024 (UTC)
- For case 2: Re-ran the 305 pages in Category:Cite certification used for New Zealand with missing archive plus the pages with an |archive-url= - it fixed 220 templates in 200 pages. Example -- GreenC 22:02, 21 November 2024 (UTC)
- For case 1: Re-ran the 249 pages in Wikipedia:Link_rot/Cases/nztop40.co.nz (first two lists combined) and had only 1 new result. This leads me to believe that while running case 2 originally, there were intermittent problems with the Wayback Machine during that period. If you see anything else it missed let me know and I'll investigate. -- GreenC 22:20, 21 November 2024 (UTC)
- @GreenC Thanks again. I'll have a look later on the remaining pages and see if there is anything left to do. Muhandes (talk) 08:06, 22 November 2024 (UTC)
- @GreenC I'm sorry but, again, the first entry in the category is 6lack discography where there is an unexplained case 2 failure https://nztop40.co.nz/chart/singles?chart=4494 where https://web.archive.org/web/20180629074435/https://nztop40.co.nz/chart/singles?chart=4494 exists. I'd appreciate it if you can check it. --Muhandes (talk) 10:21, 22 November 2024 (UTC)
- Problems found and fixed:
- '&' character in the template not percent encoded, which caused an API request to return incorrect results.
- Certain difficult citations: Grease (1978 soundtrack):
{{Certification Table Entry|region=New Zealand|type=album|title=Grease Soundtrack|artist=Various|award=Platinum|number=6|id=5383|salesamount=250,000|certyear=2022|relyear=1978|access-date=21 August 2022|salesref=<ref>{{cite web|url=https://www.americanradiohistory.com/Archive-Billboard/70s/1979/Billboard%201979-03-17.pdf|title=Tax Clouds Growth And Dampens Local Talent Development|publisher=Billboard|page=SA-6|first=Phil|last=Gifford|date=17 March 1979|access-date=31 July 2019}}</ref>}}
- Down to below 80. -- GreenC 00:11, 23 November 2024 (UTC)
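The first problem in the list above is a general one: a raw '&' inside a parameter value splits the query string. A minimal sketch of the effect and the fix follows; the thread does not name the API or the parameter, so `title`, the sample value and `encode_param` are placeholders.

```python
from urllib.parse import urlencode, parse_qs


def encode_param(name, value):
    """Percent-encode a single query parameter so '&' in the value survives."""
    return urlencode({name: value})


value = "Rock & Roll"            # hypothetical template text containing '&'
naive = "title=" + value         # unencoded: the '&' is read as a parameter separator
safe = encode_param("title", value)

# The server-side view of each query string:
print(parse_qs(naive))
print(parse_qs(safe))
```

With the naive form the value is cut off at the '&', which is consistent with an API returning results for the wrong string.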
- Thanks again. Going through the remaining certifications is a painstaking task but I'm going to do it. Can you please have a look at this edit? The url is https://nztop40.co.nz/chart/albums?chart=4736 and archive-url is https://web.archive.org/web/20190816231216/https://nztop40.co.nz/chart/albums?chart=4736 which shows date 19 August 2019. This should have been translated to |id=2019-08-19, but as you can see, it wasn't.
A second thing I just realized is that case 2 also includes rare calls from {{Certification Cite Ref}} which is, sadly, still widely used (Category:Certification Cite Ref usages outside Certification Table Entry (1,265)), especially in discographies. For example, BTS albums discography, Eagles discography, Cobra Starship discography. Muhandes (talk) 09:57, 24 November 2024 (UTC)
- Well, Wikipedia follows the 80/20 Rule. It's sort of like climbing Mt. Everest without oxygen. The first 80% is easy. The next 10% is hard. The last 10% is as hard as the previous 90% combined. This is why many people give up once it gets to 90% (or around there) without reaching 100%. The work gets exponentially difficult.
- To answer your question about the date offset, I'm embarrassed to say there is a typo in the code, converting "August" to "09", instead of "08". The site then redirected the bogus date page to a working page nearby, September 13. So I never caught it. This is unfortunately the case for everything with an August month. There are about 840 citations in 750 pages that would possibly be a problem.. probably about half that since some are legitimate September dates. This will be tricky to fix. I keep logs with old -> new template data that make it possible, for this sort of regression situation.
- 'Certification Cite Ref', if you can give me the template format it seems to use different parameters. -- GreenC 03:17, 25 November 2024 (UTC)
- I'm a perfectionist. It may take me years but I aim to reach 100%.
{{Certification Cite Ref}} uses the same format as {{cite certification}} when it comes to |id= and |source=. Muhandes (talk) 08:05, 25 November 2024 (UTC)
- Found and fixed the August error: 343 citations in 320 pages. Example. The edit counts in the edit summary are not always accurate due to the way it was done.
Ran the CCR template. It only edited 15 pages, but it got the three pages you mentioned, so I suspect it's probably accurate.
-- GreenC 19:53, 25 November 2024 (UTC)
- I finished cleaning up the category. I may deal with the rest of the cases at a later date. Anyway, I believe the bot's work is done. Thank you! Muhandes (talk) 19:12, 3 December 2024 (UTC)
- User:Muhandes: Congrats! Nice to see your dedication to reach 100%. This project required new bespoke code that of course had some bugs on the first/second try but you kept error checking it and narrowed the numbers down to something manageable so the rest could be done manually, which is admirable work. My boilerplate code is well tested, but novel situations like this are often how the boilerplate code gets new features added. Although I've never seen anything like this before, I'll keep it in mind in case the pattern comes up again. -- GreenC 17:57, 10 December 2024 (UTC)
Enwiki
- Case 1: Checked 8,904 pages and edited 8,870 pages. Moved 17,224 links to a new URL: 17,224 ruled inferred soft redirect. Removed 1 {{dead link}}. Added 142 {{dead link}}. Switched 313 |url-status=dead to live. Switched 38 |url-status=live to dead. Added 269 archive URLs (219 Wayback). Changed 8 citation metadata.
- Case 2: Converted 4,861 templates. Example diff.
Unable to convert: see Wikipedia:Link_rot/Cases/nztop40.co.nz (224) and Category:Cite certification used for New Zealand with missing archive (305) (outdated)
IABot DB
- Checked and updated about 3,000 links which propagate to 300+ wikis
Done (pending further edge cases above) -- GreenC 01:27, 22 November 2024 (UTC)
iassrt.org
JUDI. See Special:Diff/1257685967. – robertsky (talk) 04:47, 16 November 2024 (UTC)
Done in batch #20 -- GreenC 16:59, 23 December 2024 (UTC)
kcchiefs.com
kcchiefs.com redirects to chiefs.com without having an archive of the articles. There are 386 articles on English Wikipedia linking to kcchiefs.com. Elisfkc (talk) 19:38, 17 November 2024 (UTC)
Enwiki
- Checked 384 pages and edited 87 pages. Added 10 {{dead link}}. Switched 1 |url-status=live to dead. Added 109 archive URLs (101 Wayback). Changed 1 citation metadata.
IABot DB
- Checked and updated 168 URLs which propagate to 300+ wikis
Done -- GreenC 02:26, 22 November 2024 (UTC)
health.gov
Looks like the original site at health.gov was moved to https://odphp.health.gov and a new health.gov was created. Some links might need updating. --Nintendofan885T&Cs apply 19:39, 18 November 2024 (UTC)
- There are 127 pages. Will convert to archive URLs, unless there is a working redirect. -- GreenC 05:20, 19 November 2024 (UTC)
Enwiki
- Checked 127 pages and edited 92 pages. Moved 130 links to a new URL: 115 redirects, 15 ghost soft-redirects. Switched 6 |url-status=dead to live. Added 3 archive URLs (3 Wayback). Changed 28 citation metadata.
Done -- GreenC 02:48, 22 November 2024 (UTC)
HugeDomains
Possibly a similar modus operandi as detailed in WP:JUDI but in this case identifying domains for sale and changing text to show this as here [12]. Small at present with only about 20 pages affected. Lyndaship (talk) 13:01, 23 November 2024 (UTC)
- Hi User:Lyndaship, good to hear from you, thanks for the report. That was done automatically by the user-run bot ReFill which checks the page title header and adds it to Wikipedia. I don't think it's malicious intent, just a side effect of how reFill "works". Probably the best solution is to report it to the two places that (I think) monitor for title string spam: WP:CITATIONBOT, and Help talk:Citation Style 1. I just reported it to the latter. -- GreenC 17:06, 23 November 2024 (UTC)
- There is also WP:CYBERSQUATTER, a page to document squatters like this. -- GreenC 17:27, 23 November 2024 (UTC)
currentaffairs.org
The links to currentaffairs.org have been changed. They used to be just:
currentaffairs.org/yyyy/mm/article-name
But have now changed to:
currentaffairs.org/news/yyyy/mm/article-name
At the moment most of the links are being redirected from the old URLs to the new ones. -- LCU ActivelyDisinterested «@» °∆t° 18:43, 30 November 2024 (UTC)
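The path change described above amounts to inserting "/news" before the year. A rough sketch, assuming every old article path starts with a four-digit year and two-digit month; the pattern and the name `add_news_prefix` are illustrative only.

```python
import re

# Old form: currentaffairs.org/yyyy/mm/article-name (optionally with "www.").
OLD_PATH = re.compile(r"^(https?://(?:www\.)?currentaffairs\.org)/(\d{4}/\d{2}/.+)$")


def add_news_prefix(url):
    """Insert /news/ into an old-style article URL; return other URLs unchanged."""
    match = OLD_PATH.match(url)
    if not match:
        return url
    return f"{match.group(1)}/news/{match.group(2)}"
```

URLs that already contain /news/ do not match the pattern, so running the function twice is harmless.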
Enwiki
- Checked 125 pages and edited 123 pages. Moved 145 links to a new URL: 145 ruled soft-redirects. Added 2 archive URLs (0 Wayback). Changed 11 citation metadata.
- (and manually repaired 3 pages with typos in the URLs)
Done -- GreenC 16:24, 16 December 2024 (UTC)
patents.com lapsed
patents.com lapsed and is for sale. Searching for insource:"patents.com" shows 25 pages affected. For the one rescue I did, {Cite patent|...} seemed not to work, so I used {cite web|...} to the Google Patents page. (diff)
Before: (URL without {cite...}):
<ref>[http://www.patents.com/Heat-transfer-initiator/US20020035945/en-US/ Heat transfer initiator - US20020035945]. Patents.com. Retrieved on 2010-02-08.</ref>
After: ({cite web|...}):
<ref>{{cite web|title=US patent 20020035945A1, Heat transfer initiator|url=https://patents.google.com/patent/US20020035945A1/en}}</ref>
A876 (talk) 21:00, 5 December 2024 (UTC)
- There are 4 pages that need repair. Can you do it? It is not suitable for a bot request, thank you. -- GreenC 17:49, 16 December 2024 (UTC)
people.com
Hello. Old urls with only numeric IDs don't work, such as this link for Bruno Mars. I haven't seen replacement URLs on the website. Therefore, I request archives for these URLs, unless new links can be found. ~2500 pages. Some already have archive urls added. Thanks! MrLinkinPark333 (talk) 19:21, 6 December 2024 (UTC)
Enwiki
- (Pass 1): Checked 2,577 pages and edited 1,007 pages. Moved 724 links to a new URL: 624 ruled soft-redirects, 100 ghost soft-redirects. Resolved 58 soft-404s. Switched 36 |url-status=dead to live. Added 49 archive URLs (16 Wayback). Changed 784 citation metadata.
- (The 624 represent URLs that were http:// converted to https:// (a ruled soft-redirect) and at the same time a normal redirect was found and followed and converted to a live link.
Because so few archives were added it appears the domain was previously processed converting to archives, though not by WaybackMedic)
IABot DB
- Updated about 6,000 URLs which propagate to 300+ wikis
MrLinkinPark333: other domains have a similar URL structure eg [13] in 2001 Marsh Harbour Cessna 402 crash .. probably the same Content Management System (CMS). -- GreenC 02:32, 17 December 2024 (UTC)
- That one could be fixed with a similar URL that keeps the numeric ID. If you want to go through The Observer dead links, feel free to. MrLinkinPark333 (talk) 02:38, 17 December 2024 (UTC)
- There are only about 240, probably worth doing, but I bet this pattern: name.com/string/string/0, can be found throughout. -- GreenC 17:52, 17 December 2024 (UTC)
While doing Sports Illustrated below, I found a recently introduced bug that would explain why so few archive URLs were added. I'll need to reprocess the enwiki of people.com -- GreenC 18:41, 17 December 2024 (UTC)
- Pass 2 (bug fix): Checked 2,577 pages and edited 1,493 pages. Moved 19 links to a new URL: 1 normal redirects, 8 ruled soft-redirects, 10 ghost soft-redirects. Resolved 60 soft-404s. Added 12 {{dead link}}. Switched 5 |url-status=dead to live. Switched 449 |url-status=live to dead. Added 1,349 archive URLs (1,215 Wayback). Changed 9 citation metadata.
Done -- GreenC 01:18, 18 December 2024 (UTC)
vault.sportsillustrated.cnn.com
These links might be able to convert to new links at si.com. The new URL format is vault.si.com/vault/year/month/day/name-of-article/ - For example: this link is now here for Kenny Anderson (basketball). However, it won't always work as this is now here for Guus Hiddink. As some new URLs also have the subtitle, I suggest trying to convert with the headline only first, then add the subtitle if that doesn't work. Otherwise, I request regular archives if converted URLs aren't found. ~800 articles. Thanks! MrLinkinPark333 (talk) 20:07, 6 December 2024 (UTC)
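The headline-first, subtitle-second strategy suggested above could look roughly like this. Everything here is an assumption made for illustration: the slug rules in `slugify`, the zero-padded date, the missing trailing slash, and the `is_live` callback that stands in for a real link check.

```python
import re
from datetime import date


def slugify(text):
    """Lower-case the text and join alphanumeric runs with hyphens (assumed slug rule)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def candidate_urls(published, headline, subtitle=""):
    """Return candidate vault.si.com URLs: headline only first, then headline plus subtitle."""
    base = f"https://vault.si.com/vault/{published:%Y/%m/%d}/"
    candidates = [base + slugify(headline)]
    if subtitle:
        candidates.append(base + slugify(f"{headline} {subtitle}"))
    return candidates


def first_live(candidates, is_live):
    """Return the first candidate the caller-supplied check accepts, else None."""
    for url in candidates:
        if is_live(url):
            return url
    return None
```

When `first_live` returns None, the citation falls through to regular archiving, as the request asks.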
Enwiki
- Checked 829 pages and edited 251 pages. Moved 254 links to a new URL: 254 inferred soft-redirects. Resolved 496 soft-404s. Removed 1 {{dead link}}. Added 5 {{dead link}}. Switched 195 |url-status=dead to live. Added 16 archive URLs (4 Wayback). Changed 40 citation metadata.
Done -- GreenC 04:47, 19 December 2024 (UTC)
vault.si.com/vault/
I've also discovered that some of these links are broken. For example, this link doesn't work for Kraus–Weber test and there isn't a replacement URL. However, this is working for Dick Donovan. As this would conflict with the cnn.com ones above, I think these ones should be checked first. ~2,200. Thank you! MrLinkinPark333 (talk) 21:02, 6 December 2024 (UTC)
Enwiki
- Checked 2,264 pages and edited 259 pages. Moved 9 links to a new URL: 2 ruled soft-redirects, 7 ghost soft-redirects. Resolved 14 soft-404s. Added 11 {{dead link}}. Switched 9 |url-status=live to dead. Added 174 archive URLs (165 Wayback). Changed 79 citation metadata.
IABot DB
- Updated 5 links
Done -- GreenC 18:27, 18 December 2024 (UTC)
AquariumWiki
Entire domains https://www.theaquariumwiki.com/ and https://www.theaquariumwiki.org are dead. Also has an interwiki at aquariumwiki: but I cleaned that up manually. Since it's an open wiki (and hence not a reliable source) maybe delete without archiving. * Pppery * it has begun... 00:34, 11 December 2024 (UTC)
- 15 pages. I will leave citation deletion to anyone who wants to go through manually, there are so few it would be better, deleting citations by bot is error prone. I'll add archive URLs for now. -- GreenC 04:57, 19 December 2024 (UTC)
- BTW I couldn't find any .org on enwiki or in the IABot database
Enwiki
- Checked 15 pages and edited 15 pages. Added 3 {{dead link}}. Added 12 archive URLs (11 Wayback).
IABot DB
- Updated 102 links which will propagate to 300+ wikis
Done -- GreenC 05:22, 19 December 2024 (UTC)
archive.fwweekly.com
Old to new form via Inferred soft-redirect method to determine date and title. Example. 36 pages -- GreenC 18:58, 12 December 2024 (UTC)
Enwiki
- Checked 37 pages and edited 32 pages. Moved 15 links to a new URL: 15 inferred soft-redirects. Resolved 1 soft-404s. Switched 3 |url-status=dead to live. Switched 1 |url-status=live to dead. Added 17 archive URLs (17 Wayback).
Done -- GreenC 03:41, 19 December 2024 (UTC)
www.military-today.com
The entire domain seems to have been usurped: http://www.military-today.com/ has been replaced by some Indonesian gambling advertisement. Seems like there's a lot of citations that reference it. laptop bird talkcontribs 05:31, 18 December 2024 (UTC)
Done in batch #20 -- GreenC 17:00, 23 December 2024 (UTC)
observer.theguardian.com
These links are now redirecting to new URLs. For example, this is now here for Andrew Lincoln. Any new URLs without /observer/ need to be swapped to The Guardian. For example, this is now here for Tony Blair. However, some redirect to 404s like this one for The Stone Roses (album). 76 articles. Thank you! MrLinkinPark333 (talk) 18:52, 19 December 2024 (UTC)
domainname.theguardian.com
Most of the domain names for The Guardian redirect to new links. For example, this goes here for Art criticism. However not all of them work. Here's what I've found so far:
- Broken: witness.theguardian.com, blogs.theguardian.com
- Working redirects: film.theguardian.com, politics.theguardian.com, business.theguardian.com, arts.theguardian.com, careers.theguardian.com
There's probably more, but I'm not sure how to search for it. MrLinkinPark333 (talk) 19:29, 19 December 2024 (UTC)
- It will likely be > 10k (Cyrus maxes at 10k). I can search a dump file for pages that contain *.theguardian, and the bot will internally skip www and <none> links. Also there are over 2000 pages with amp (mobile optimized) to be converted to www -- GreenC 03:03, 20 December 2024 (UTC)
- If you want to do the mobile ones first, that might be easier. MrLinkinPark333 (talk) 03:09, 20 December 2024 (UTC)
- OK. The original request will be redirects, and archived soft-redirects (ghost). The amp will be ruled soft-redirects. Maybe some ruled inferred soft-redirects are possible. Anyway I need to finish WP:JUDI batch #20 first, it's larger than all previous JUDI batches combined, will require a bunch of runs due to size limits. -- GreenC 03:52, 20 December 2024 (UTC)
- No worries! These requests can wait :) MrLinkinPark333 (talk) 19:58, 20 December 2024 (UTC)
247sports.com
- https://247sports.com/nfl/new-york-giants/Bolt/NFL-Free-Agency-Josh-Mauro-signs-with-New-York-Giants-116458092
- https://247sports.com/nfl/new-york-giants/Article/NFL-Free-Agency-Josh-Mauro-signs-with-New-York-Giants-116458092/
- https://247sports.com/nfl/green-bay-packers/Bolt/Report-Packers-tender-OL-Adam-Pankey--116192054
- https://247sports.com/nfl/green-bay-packers/article/nfl-free-agency-packers-tender-offensive-lineman-adam-pankey--116192054/
- https://247sports.com/nfl/green-bay-packers/Bolt/Green-Bay-Packers-to-wear-color-rush-uniforms-vs-Chicago-Bears--108117457/
- https://247sports.com/nfl/green-bay-packers/article/green-bay-packers-to-wear-color-rush-uniforms-vs-chicago-bears--108117457/
- https://247sports.com/nfl/green-bay-packers/Bolt/Green-Bay-Packers-sign-LB-Ahmad-Thomas-to-practice-squad--111380990/
- https://247sports.com/nfl/green-bay-packers/article/green-bay-packers-sign-lb-ahmad-thomas-to-practice-squad--111380990/
This was done in April but for some reason many did not work. -- GreenC 02:31, 23 December 2024 (UTC)
Tag: FABLE-1224
-- GreenC 02:31, 23 December 2024 (UTC)
frank.mif.pg.gda.pl/sheets
http://www.frank.mif.pg.gda.pl/sheets/*
Defunct for some time. Was only a mirror to
https://frank.pocnet.net/sheets/*
which is alive and kicking. IAbot has already put in some unnecessary links to Wayback.
— Preceding unsigned comment added by 2001:8A0:5E5D:D200:8CFD:3F84:7C2D:F066 (talk • contribs)
- Too few to program a bot for. Please fix manually. Thanks. -- GreenC 16:36, 23 December 2024 (UTC)
www.ukzn.ac.za
Tag: FABLE-1224
-- GreenC 16:13, 23 December 2024 (UTC)
ufc.com/fighter
Tag: FABLE-1224
-- GreenC 16:18, 23 December 2024 (UTC)
uctv.tv
- https://www.uctv.tv/search-details.asp?showID=5048
- https://www.uctv.tv/search-details.aspx?showID=5048
Tag: FABLE-1224
-- GreenC 16:21, 23 December 2024 (UTC)
torontofc.ca
- http://www.torontofc.ca/news/2015/06/dwayne-de-rosario-calls-it-career
- http://www.torontofc.ca/news/dwayne-de-rosario-calls-it-career
Tag: FABLE-1224
-- GreenC 16:32, 23 December 2024 (UTC)
topspeed.com/cars
Normal redirects, but some of the redirects go to a 404 page:
- https://www.topspeed.com/cars/mercedes/2015-mercedes-cls63-amg-ar164010.html
- https://www.topspeed.com/cars/mercedes/2015-mercedes-cls63-amg/
Tag: FABLE-1224
-- GreenC 16:45, 23 December 2024 (UTC)
timbers.com
- http://www.timbers.com/t2/2015/06/usl-match-recap-seattle-sounders-2-2-portland-timbers-2-0
- https://www.timbers.com/news/usl-match-recap-seattle-sounders-2-2-portland-timbers-2-0
Tag: FABLE-1224
-- GreenC 16:55, 23 December 2024 (UTC)
Variety.com
https://variety.com/2007/digital/news/chipmunks-befriend-earl-star-1117960746/?jwsource=cl
This page 2601:601:D37F:3C50:8C3D:9F83:F3FF:5BFC (talk) 18:44, 24 December 2024 (UTC)
- Fixed. Special:Diff/1264338613/1265041546. Done -- GreenC 19:57, 24 December 2024 (UTC)