User Details
- User Since
- Aug 4 2016, 3:57 PM (434 w, 2 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Basilicofresco
Jun 30 2022
Holy cow. You are absolutely right.
Jun 29 2022
Jun 28 2022
Apr 14 2022
Ok, thanks. And keep in mind that speed matters when you have to check ns:0 monthly with hundreds of regexes. Many active bots on Wikipedia, I believe a good part of them, actually use the dumps. So the efficiency should be as good as possible... we are talking about days of CPU at 100%. Thanks for understanding!
Well, I probably did not express myself well.
The whole point of using the dump with replace.py is to quickly filter the XML by the replacements, in order to speed up the process of replacing something with something else across the whole of ns:0. replace.py has worked this way for at least 15 years.
At the moment it just lists every page in the dump, even pages that do not contain the word "meteorite". That is a problem, because it makes using a big dump pointless. I could do the same without the dump by just executing something like
python pwb.py replace.py -start:* -ns:0 -lang:it "meteorite" ""
These articles never contained the word "meteorite".
Moreover, "Organo a pompa" is the very first article in the current itwiki-20220401-pages-articles.xml dump, "Antropologia" is the second one, "Agricoltura" the third one, etc.
The problem is that it is not filtering the XML by the replacements at all; it is just listing, one by one, every single page present in the dump.
None of the skipped pages contains that word. The point of running replace.py on a dump should be to load only the pages that contain that word, not just any page.
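The filtering behaviour described above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not replace.py's actual code: the function name `filter_by_replacements`, the `(title, text)` tuples, and the sample page texts are all made up for the example, and a real run would read the pages from the XML dump instead.

```python
import re
from typing import Iterable, Iterator, List, Pattern, Tuple

# Hypothetical stand-in for a dump entry: (title, wikitext).
Page = Tuple[str, str]


def filter_by_replacements(
    pages: Iterable[Page], patterns: List[Pattern[str]]
) -> Iterator[Page]:
    """Yield only the pages whose text matches at least one pattern."""
    for title, text in pages:
        if any(pattern.search(text) for pattern in patterns):
            yield title, text


# Invented sample texts; only the titles come from the discussion above.
dump = [
    ("Organo a pompa", "Strumento musicale a tastiera."),
    ("Antropologia", "Studio dell'essere umano."),
    ("Meteorite", "Una meteorite è un frammento di un corpo celeste."),
]

matches = list(filter_by_replacements(dump, [re.compile("meteorite")]))
print([title for title, _ in matches])  # ['Meteorite']
```

With a filter like this, pages such as "Organo a pompa" never reach the replacement step, which is what makes working from a large dump worthwhile.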
Apr 13 2022
Sep 24 2017
Hi! I tested it again with the latest version and it still doesn't work as expected. The problem is now slightly different: the text-contains exceptions in user-fixes.py are always treated as non-regex. Other exceptions, like inside and title, work as regex as expected. Oddly, passing the -regex parameter on the command line solves the problem. Apparently these exceptions are precompiled by the call precompile_exceptions(exceptions, regex, flags) in main, and therefore they use the CLI regex parameter instead of the flag set in user-fixes.py.
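To make the reported effect concrete, here is a small sketch of what happens when one exception pattern is compiled with the wrong regex flag. It is an assumption-laden illustration, not pywikibot's code: `compile_exception` is a hypothetical helper, the `fix` dictionary is a simplified stand-in for a user-fixes.py entry, and `cli_regex` stands for the command-line setting.

```python
import re


def compile_exception(pattern: str, regex: bool, flags: int = 0) -> "re.Pattern[str]":
    """Hypothetical helper: compile as a regex, or as a literal string if regex is False."""
    return re.compile(pattern if regex else re.escape(pattern), flags)


# Simplified stand-in for a fix defined in user-fixes.py with regex enabled.
fix = {"regex": True, "exceptions": {"text-contains": [r"\d{4}"]}}
cli_regex = False  # -regex was not given on the command line

text = "See the 2016 report."
pattern = fix["exceptions"]["text-contains"][0]

# Compiled with the fix's own flag: treated as a regex, so it matches "2016".
as_regex = compile_exception(pattern, fix["regex"])
# Compiled with the CLI flag: treated as the literal string "\d{4}", so no match.
as_literal = compile_exception(pattern, cli_regex)

print(as_regex.search(text) is not None)    # True
print(as_literal.search(text) is not None)  # False
```

If the exception is compiled with the command-line value, a page that should be skipped is not skipped, which matches the behaviour described in the comment.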
Sep 25 2016
Can I triage this as high?
In my opinion it is a sneaky trap for bot operators that can lead to subtle errors across the encyclopedia.
Sep 4 2016
Aug 7 2016
Aug 5 2016
This morning I had the same problem. I'm running Python 3.5.1 under Win7. Interestingly, logging in on Commons works flawlessly.