Agenda
- Location: #wikimedia-office IRC channel
- Topic: T128602: Create and deploy an extension that implements an authenticated key-value store
- Meeting type: Problem definition
Meeting summary
- LINK: https://phabricator.wikimedia.org/E237 Phab event link (robla, 21:01:38)
- LINK: https://phabricator.wikimedia.org/T128602 authenticated key-value store RFC (robla, 21:02:09)
- Question discussed: does the store need to be authenticated? (robla, 21:05:38)
- Conversation turned to use cases for the mobile app (robla, 21:08:47)
- Discussion turned to authentication possibilities, and then to using user_props (robla, 21:24:36)
- LINK: https://www.mediawiki.org/wiki/Manual:User_properties_table (Marybelle, 21:26:03)
- LINK: https://remotestorage.io/ (DanielK_WMDE__, 21:27:11)
- DanielK_WMDE__ makes the case for using an implementation of the remoteStorage IETF draft; discussion about that will likely continue in the RFC (robla, 21:37:08)
- ACTION: ArchCom needs to bump the priority on a watchlist-specific RFC (robla, 21:56:18)
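
The user_props variant raised in the discussion (keys with a reserved prefix are skipped in the bundle loaded on every view, while individual keys can still be fetched on request) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not something proposed or implemented in the meeting: the class `UserKeyValueStore`, its method names, the `_` prefix constant, and the in-memory dictionary standing in for a database table are all hypothetical. The 65,535-byte cap echoes the per-value limit mentioned in the log.

```python
from typing import Dict, Optional, Tuple

# Assumed per-value cap, echoing the roughly 64 KB figure from the discussion.
MAX_VALUE_BYTES = 65535
# Hypothetical marker for keys that are left out of the per-view bundle.
PRIVATE_PREFIX = "_"


class UserKeyValueStore:
    """In-memory stand-in for a per-user key-value table (illustration only)."""

    def __init__(self) -> None:
        # (user_id, key) -> value; a real backend would be a database table.
        self._rows: Dict[Tuple[int, str], str] = {}

    def set(self, user_id: int, key: str, value: str) -> None:
        # Reject values larger than the assumed column limit.
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("value exceeds the per-key size limit")
        self._rows[(user_id, key)] = value

    def get(self, user_id: int, key: str) -> Optional[str]:
        # Fetch a single key for one user without loading the user's other keys.
        return self._rows.get((user_id, key))

    def bulk_props(self, user_id: int) -> Dict[str, str]:
        # The bundle loaded on every view: all of the user's keys except prefixed ones.
        return {
            key: value
            for (uid, key), value in self._rows.items()
            if uid == user_id and not key.startswith(PRIVATE_PREFIX)
        }


store = UserKeyValueStore()
store.set(1, "skin", "vector")
store.set(1, "_readinglist:favorites", '["Earth", "Moon"]')
print(store.bulk_props(1))                      # only the unprefixed key
print(store.get(1, "_readinglist:favorites"))   # prefixed key, fetched on request
```

In this sketch the store is only ever addressed with the caller's own `user_id`, which corresponds to the "give me my data" access model from the discussion; authentication itself is assumed to happen in the API layer in front of the store.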
People present (lines said)
- brion (78)
- tgr (37)
- Marybelle (36)
- robla (31)
- DanielK_WMDE__ (30)
- anomie (28)
- SMalyshev (25)
- gwicke (22)
- dbrant (14)
- niedzielski (12)
- Scott_WUaS (7)
- wm-labs-meetbot` (3)
- TimStarling (3)
- mdholloway (2)
- Pchelolo (1)
Full log
1 | 21:00:40 <robla> #startmeeting ArchCom 2016W30: authenticated key-value store |
---|---|
2 | 21:00:40 <wm-labs-meetbot`> Meeting started Wed Jul 27 21:00:40 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot. |
3 | 21:00:40 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. |
4 | 21:00:40 <wm-labs-meetbot`> The meeting name has been set to 'archcom_2016w30__authenticated_key_value_store' |
5 | 21:01:01 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ |
6 | 21:01:25 <Marybelle> tgr: When you say "large amounts of global data which are needed infrequently", what do you mean specifically? |
7 | 21:01:38 <robla> #link https://phabricator.wikimedia.org/E237 Phab event link |
8 | 21:02:09 <robla> #link https://phabricator.wikimedia.org/T128602 authenticated key-value store RFC |
9 | 21:02:13 <tgr> the specific use case that resulted in this RfC was reading lists |
10 | 21:02:42 <tgr> ie lists of favorite articles which are synchronized across devices |
11 | 21:03:03 <Marybelle> > While "reading lists" is one of the reasons the app teams want this, this isn't an implementation of page lists. |
12 | 21:03:05 <robla> dbrant and anomie: thanks for your work on this! |
13 | 21:03:07 <Marybelle> This task makes no sense to me. |
14 | 21:03:28 <robla> I think maybe we can start with the list of questions in the task |
15 | 21:03:37 <Marybelle> If you want a private list of favorite articles, that sounds like a watchlist. |
16 | 21:03:41 <SMalyshev> question: does the store itself needs to be authenticated? I mean, we store sessions relying just on long random ID to be secure. Can't we use another long random ID to secure prefs? |
17 | 21:04:06 <Scott_WUaS> Hi All! |
18 | 21:04:16 <niedzielski> o/ |
19 | 21:04:18 <brion> SMalyshev: by default you'd do it behind the API which is authenticated |
20 | 21:04:23 <anomie> SMalyshev: Sessions are that way because of the bootstrapping problem: you have to start *somewhere*. |
21 | 21:04:32 <brion> No need to invent new auth methods |
22 | 21:04:47 <robla> alright, I guess we'll start with SMalyshev 's question, and then move to "Should this be implemented as a MediaWiki action API endpoint or a restbase service?" |
23 | 21:04:51 <mdholloway> SMalyshev: at least for the apps' use case, the reading lists should be private and therefore we'd need authentication |
24 | 21:04:54 <mdholloway> hi all, btw |
25 | 21:04:55 <tgr> SMalyshev: not sure what you mean by the store itself being authenticated |
26 | 21:05:06 <SMalyshev> brion: well, yes, if it's mediawiki API then sure. but understand it's one of the options? |
27 | 21:05:10 <Marybelle> robla: I think focusing on implementation before defining use-cases is silly. |
28 | 21:05:11 <tgr> the API should authenticate, just like sessions do |
29 | 21:05:38 <robla> #info Question discussed: does the store need to be authenticated? |
30 | 21:05:56 <brion> Marybelle: the implementation of the store? Or the client side app feature that uses it? |
31 | 21:06:06 <anomie> If it's in the action API, it'll be authenticated just like anything else. If it's in restbase, I think that has support for the same sort of thing (but I don't know details). |
32 | 21:06:16 <tgr> how that's represented in the backend is an implementation details that is IMO low relevance compared to other factors for choosing backends |
33 | 21:06:20 <brion> The app is a thing that already exists and already has a reading list feature. |
34 | 21:06:23 <Marybelle> brion: The use-cases of having another authenticated store. |
35 | 21:06:43 <Marybelle> If this is for private lists of articles, that sounds like a watchlist. |
36 | 21:07:02 <brion> Marybelle ok. The use case is to store per user private data on the server and be able to retrieve and update it, with no versioning. |
37 | 21:07:23 <Marybelle> We do that already in several places. :-) |
38 | 21:07:47 <Marybelle> I'm not sure making yet another private place is needed. |
39 | 21:07:55 <brion> Marybelle: yes, in ways that don't match this use case as already replied on the bug. |
40 | 21:08:08 <Marybelle> Okay. I guess I'm the only person not getting it. |
41 | 21:08:09 <brion> Either we can change one of those methods to match, or make a new one. |
42 | 21:08:13 <tgr> note that the apps already store global authenticate private data, but they are currently using user_props for that, which is not terrible but suboptimal |
43 | 21:08:13 <dbrant> Marybelle: this is a sort of generalization of watchlists, where the user can have multiple lists, each with its own attributes (name, description, whether it's saved for offline reading, etc) |
44 | 21:08:26 <tgr> so reading lists are not in fact the only current use case |
45 | 21:08:47 <robla> #info conversation turned to discussion of use cases for Mobile App |
46 | 21:08:56 <niedzielski> tgr: we currently use user prefs to store theme info. we'd probably migrate that to this new store |
47 | 21:09:02 <Marybelle> brion: Right. I'd really like to avoid yet another special thing to maintain indefinitely. |
48 | 21:09:05 <niedzielski> (just as an example) |
49 | 21:09:21 <brion> *nod* keep em simple when possible :) |
50 | 21:09:22 <SMalyshev> reading https://phabricator.wikimedia.org/T128602#2499662 it looks like the delta vs. user_properties is mostly size/performance concerns? |
51 | 21:09:30 <Marybelle> niedzielski: You think MediaWiki's database should store a user's app theme preference? |
52 | 21:09:32 <gwicke> dbrant: do you see this to eventually covering collection (book) functionality as well? |
53 | 21:10:22 <tgr> SMalyshev: and keeping complexity down |
54 | 21:10:36 <tgr> user prefs need to be loaded as a bundle, for perf reasons |
55 | 21:10:50 <niedzielski> Marybelle: what i mean to say is that we currently use user options to store a theme preference. if we had a general purpose key / value store, that would be a more appropriate spot |
56 | 21:10:52 <tgr> reading lists would have to be loaded one by one, again for perf reasons |
57 | 21:11:02 <anomie> SMalyshev: On the back end, anyway, although there's a question of whether it should be cross-wiki or if users should just pick one "central" wiki. On the front end it'll support fetching individual keys instead of having to fetch the whole thing at once, and possibly some other bits. |
58 | 21:11:14 <Marybelle> niedzielski: I'm not sure I get why a mobile app/client gets to use MediaWiki's database to store its preferences. |
59 | 21:11:18 <tgr> mixing the two would probably result in more awkward code then having different services for them |
60 | 21:11:25 <Marybelle> That seems weird. |
61 | 21:11:28 <DanielK_WMDE__> from the RFC, it seems to me that one premise was "re-doing watchlists is hard, let's just a do a key/value store, that's easy". |
62 | 21:11:30 <DanielK_WMDE__> But perhaps it's not that easy to do right, and so we should perhaps re-do watchlists instead? After all, we discussed global watchlists last week! Perhaps we'll be doing that anyway... |
63 | 21:11:52 <niedzielski> Marybelle: do you mean why store preferences on a server vs on device? or mediawiki specifically? |
64 | 21:11:57 <dbrant> Marybelle: the MediaWiki db already stores plenty of user settings. Whether these settings apply to the desktop browsing experience or the app experience shouldn't matter. |
65 | 21:12:00 <brion> Marybelle: because it's an app for the site which you log in with your site credentials, and it's that or invent a new storage service ? |
66 | 21:12:15 <DanielK_WMDE__> We have several needs that drive changes to watchlists: global watchlists, multiple watchlists, automatic expiry... |
67 | 21:12:25 <anomie> DanielK_WMDE__: Part of the idea was prototyping their reading lists on top of a basic key-value store, instead of designing something then having to redo it when they find out they need different behavior. |
68 | 21:13:01 <SMalyshev> DanielK_WMDE__: I think it's bigger than just watchlists, watchlists would be one usecase for this? |
69 | 21:13:24 <tgr> DanielK_WMDE__: actually the original proposal was to the k-v story as a temporary solution that makes sense on its own as well and then eventually migrate to a dedicated lists API based on some sort of lists concept in core |
70 | 21:13:35 <brion> Right thisll be used for the other user data that's currently shoehorned into userjs prefs as I understand? |
71 | 21:13:36 <tgr> we might have given up on that by now, not sure |
72 | 21:13:39 <Marybelle> brion: I guess I think about non-Wikimedia Foundation apps. Would those clients also be using MediaWiki's/Wikimedia's database to store their user preferences and data? |
73 | 21:13:52 <robla> anomie: having a flexible solution sounds really nice. what if we find out it didn't work the way we wanted it to? How does this not become tech debt? |
74 | 21:13:53 <dbrant> gwicke: i wouldn't see why not; do you mean something like "turn my reading list into a pdf"? |
75 | 21:13:55 <brion> Marybelle: sure, why not? |
76 | 21:14:02 <anomie> DanielK_WMDE__: As for watchlists in particular, the apps want the ability to have multiple lists that each aren't limited to a single wiki, and extra metadata, but not (I think?) actually the recentchanges-filtering functionality watchlists have. |
77 | 21:14:05 <DanielK_WMDE__> SMalyshev: the driving use cases is (named) reading lists (aka bookmarks). which is very similar to watchlists. a k/v store could be used to cover this to some degree, as long as the lists don't become very large. |
78 | 21:14:13 <gwicke> dbrant: yes |
79 | 21:14:13 <Marybelle> It seems outside the scope and responsibility of MediaWiki a bit. |
80 | 21:14:57 <brion> Marybelle: no more than watchlist and user prefs for the web UI surely |
81 | 21:15:00 <anomie> robla: Are you asking about the key-value store, or a specialized reading-list service? If the former, that's the nice thing about a simple, generic key-value service. |
82 | 21:15:05 <DanielK_WMDE__> anomie: cross-wiki watchlists are a lot of fun, as discussed here last week (or was it the week before)? |
83 | 21:15:28 <Marybelle> brion: MediaWiki the application using the MediaWiki database isn't so crazy. Any random client application using the MediaWiki database seems a lot zanier. |
84 | 21:15:29 <tgr> DanielK_WMDE__: apart from lists in core being a big and long project, I think it would be much more sane to go into it *after* we have a good understanding of the use cases, and a key-value store is great for prototyping |
85 | 21:15:32 <brion> Those can all be implemented separately and could invent separate places to store their data, but I think it would not be super practical |
86 | 21:16:03 <brion> Depends how narrowly or broadly you view MediaWiki IMO |
87 | 21:16:15 <DanielK_WMDE__> i see two questions here, and we should perhaps pick one to discuss. a) do we want/need a generic k/v store, what needs does it address, what features does it need? and b) how do we best implement (or prototype) reading lists for mobile? |
88 | 21:16:20 <brion> And how narrowly or broadly you view Wikipedia as a site or product or place |
89 | 21:16:22 <DanielK_WMDE__> which of the two should we discuss? |
90 | 21:16:27 <tgr> Gather tried to build its own API from the start and maintain it across use-case changes, and pretty much ended up with a key-value store (JSON blobs in an SQL table) with lots of cruft around it |
91 | 21:16:33 <robla> anomie: more the latter. I'll simplify to 3 options: 1) wild success 2) questionable success 3) obvious failure. outcome 2 is where tech debt accrues |
92 | 21:16:42 <gwicke> one issue with a generic key-value store without schema enforcement is that any client side app could write any kind of blob to any key |
93 | 21:16:45 <anomie> And really, one of the open questions here is whether the key-value store should actually be in MediaWiki (action API) or should be a separate service for WMF app use (restbase). |
94 | 21:17:04 <gwicke> this would put the burden of schema checking / validation squarely on the client |
95 | 21:17:13 <Marybelle> brion: Do other big sites let client applications use their databases for arbitrary private data? Like Twitter and Facebook and friends? |
96 | 21:17:51 <anomie> gwicke: You say that like it's a disadvantage. It could as well be an advantage. |
97 | 21:17:57 <brion> Marybelle: are they platforms for sharing free knowledge? |
98 | 21:18:02 <brion> :) |
99 | 21:18:11 <gwicke> anomie: it's a trade-off |
100 | 21:18:21 <brion> Anyway, we already can store tons of arbitrary data as you point out Marybelle |
101 | 21:18:37 <brion> The question is can we do it in a way that's efficient and meets the needs of users |
102 | 21:18:41 <Marybelle> I mean, if I were making a regular bookmark application, I wouldn't expect MediaWiki to be my back-end off-hand. |
103 | 21:18:45 <gwicke> there is the related issue of schema migrations |
104 | 21:18:58 <anomie> robla: Yeah, a specialized reading-list service would certainly have the danger of falling into #2. That's why I personally don't want to build one, at least not without decent planning to make it more likely to hit #1. |
105 | 21:18:59 <SMalyshev> given that you can just create a wiki page and dump the data there I don't think it changes a lot |
106 | 21:19:06 <gwicke> and format versioning |
107 | 21:19:20 <brion> Schemas are out of scope for now, imo |
108 | 21:19:32 <Scott_WUaS> (gwicke: can you please clarify what "collection (book) functionality" is? Thanks) |
109 | 21:19:54 <anomie> Scott_WUaS: I'm guessing https://www.mediawiki.org/wiki/Extension:Collection |
110 | 21:19:58 <gwicke> brion: they are necessarily in scope, the question is just where you handle them |
111 | 21:20:03 <brion> Marybelle: maybe, or you might store it in one of the several places in the application servers user database that it makes available for that sort of thing |
112 | 21:20:04 <Marybelle> brion: This use-case seems a bit against sharing free knowledge, if these are per-user and private, FWIW. |
113 | 21:20:15 <Scott_WUaS> anomie: thnx |
114 | 21:20:20 <gwicke> if you say it's out of scope on the server, then that implicitly means that they will need to be handled on the client |
115 | 21:20:32 <Marybelle> Or a separate server. |
116 | 21:20:34 <DanielK_WMDE__> would it be an option to just expose an existing K/V system to the public (with the necesssary auth in place)? |
117 | 21:20:39 <brion> gwicke: yes it's explicitly in the clients sphere of responsibility |
118 | 21:21:00 <Marybelle> DanielK_WMDE__: Existing like Redis or something? |
119 | 21:21:06 <anomie> gwicke: Once you start shoving schemas and stuff into it, it's no longer a generic key-value store. The client is free to implement a schema on top of a generic key-value store if it wants, which makes the store itself more flexible. |
120 | 21:21:15 <gwicke> with multiple clients, this might be tricky to support |
121 | 21:21:19 <brion> DanielK_WMDE__: if we can query individual items and not send them to every view, user props would work. |
122 | 21:21:26 <SMalyshev> DanielK_WMDE__: I think that would be one of the solutions. If we have a suitable one |
123 | 21:21:27 <DanielK_WMDE__> Marybelle: yes. though redis explicitly says it's designed to be accessed by trusted clients only (i just checked) http://redis.io/topics/security |
124 | 21:21:31 <brion> That's basically the difference |
125 | 21:21:46 <robla> Marybelle is not likely to be convinced about use cases this hour, but other folks seem more interested in talking implementation, so let's focus on implementation |
126 | 21:21:53 <brion> :) |
127 | 21:21:58 <SMalyshev> DanielK_WMDE__: it doesn't have to be redis directly, can be redis (or other non-auth k/v) behind Mediawiki API front |
128 | 21:22:03 <tgr> anomie: I guess something similar to how EventLogging schemas are handled could be done, I doubt it's worth the effort though |
129 | 21:22:05 <Marybelle> I asked on the task about just using a separate key/namespace in user_props and just filtering. |
130 | 21:22:14 <Marybelle> You don't have to send every user option on every page load. |
131 | 21:22:17 <Marybelle> I'm not sure why we do. |
132 | 21:22:17 <DanielK_WMDE__> brion: ok, new plan: hack user props that keys starting with an underscore will be skipped when writing props into jsconfig. |
133 | 21:22:20 <SMalyshev> DanielK_WMDE__: which supports only API like "give me my data", not "give me her data" |
134 | 21:22:21 <brion> Marybelle might work |
135 | 21:22:29 <Marybelle> DanielK_WMDE__: +1 |
136 | 21:22:47 <brion> :) |
137 | 21:23:05 <anomie> DanielK_WMDE__: Disadvantage: every existing user of user_props has to be updated to deal with the filtering, and we have to make sure the additional data doesn't have negative performance impact on the existing uses. |
138 | 21:23:10 <DanielK_WMDE__> filtering user_props by prefix seems the simples solution by far... |
139 | 21:23:21 <brion> Are there any low level probs with how that's stored? |
140 | 21:23:21 <niedzielski> would user options be able to hold thousands of arbitrary keys ok? |
141 | 21:23:30 <DanielK_WMDE__> anomie: yes. existing uses need to be surveyed |
142 | 21:23:35 <tgr> see https://phabricator.wikimedia.org/T128602#2499662 for a list of problems with user_props |
143 | 21:23:39 <SMalyshev> DanielK_WMDE__: if there's separate keys for up_property that would work I think |
144 | 21:24:01 <gwicke> setting up key-value storage is fairly easy, I would say quite a bit easier than handling format versioning & migrations correctly |
145 | 21:24:08 <anomie> Also, jcrespo didn't like the idea of putting it in the main database much in https://phabricator.wikimedia.org/T128602#2476545 |
146 | 21:24:10 <brion> Ok there's a byte limit I think on those, is that a problem? |
147 | 21:24:16 <brion> Could be lifted with a schema tweak |
148 | 21:24:17 <SMalyshev> anomie: does existing API right now just dumps all keys for the user, or it chooses specific ones? |
149 | 21:24:32 <brion> Ah yes, and we did have req to move to separate db cluster |
150 | 21:24:36 <robla> #info Discussion turned to authentication possibilities, and then to using user_props |
151 | 21:24:38 <brion> Which is easy to do per table iirc |
152 | 21:24:45 <Marybelle> tgr: First three bullets seem trivially solvable. |
153 | 21:24:46 <anomie> SMalyshev: The existing API query only supports fetching all data for the user, unless I'm completely mistaken. |
154 | 21:25:01 <DanielK_WMDE__> hm, i just found this: https://remotestorage.io/ |
155 | 21:25:02 <dbrant> brion: what's the limit, roughly? |
156 | 21:25:03 <brion> Yeah API needs enhancement for query |
157 | 21:25:07 <niedzielski> maybe it would be easier to drop a new generic key value store if the feature is unpopular than to clear user options |
158 | 21:25:15 <DanielK_WMDE__> no idea if it's good, but it seems worth a look. |
159 | 21:25:16 <brion> dbrant: 65534 bytes iirc |
160 | 21:25:17 <SMalyshev> anomie: right, but that's only one API. I don't think it should be too hard to make this API skip certain keys in DB? |
161 | 21:25:29 <brion> Should be a matter of changing column type |
162 | 21:25:54 <gwicke> niedzielski: if there are clear patterns in how keys are structured, or there is only a single use case using this service, yes |
163 | 21:25:56 <SMalyshev> anomie: also, if main DB is bad, we could make it two-stage - store opaque id in main db, store actual data in better storage |
164 | 21:26:03 <Marybelle> https://www.mediawiki.org/wiki/Manual:User_properties_table |
165 | 21:26:12 <anomie> SMalyshev: Then you're **really** making things complicated. |
166 | 21:26:30 <SMalyshev> it's not *that* complicated I think... just one more call |
167 | 21:26:31 <anomie> At that point, why not just use the actual-data storage? |
168 | 21:26:33 <Marybelle> How is a filter complicated? |
169 | 21:26:35 <anomie> directly |
170 | 21:26:41 <SMalyshev> anomie: because auth, etc. |
171 | 21:26:54 <tgr> again, if the goal is keeping things simple then having two separate systems with a very simple mode of operation seems better than having one that tries to be the mix of the two |
172 | 21:26:56 <SMalyshev> I think maintaining two auth systems in sync is worse |
173 | 21:26:58 <anomie> SMalyshev: No, not direct access to the backend. |
174 | 21:27:00 <DanielK_WMDE__> Huh. interesting. https://remotestorage.io/ is sponsored by the Wau Holland Stiftung? that indicates it's not industry bullshit. doesn't ell me if it's any good for our use case, but |
175 | 21:27:11 <DanielK_WMDE__> #link https://remotestorage.io/ |
176 | 21:27:22 <brion> Ooh that's neat |
177 | 21:27:30 <anomie> SMalyshev: But the frontend directly accessing the better-storage backend instead of looking up in user_properties then in the better-storage. |
178 | 21:27:32 * brion bookmarks for later |
179 | 21:27:33 <SMalyshev> anomie: that means two APIs instead of one, otherwise the same. I'd rather have users learn one API :) |
180 | 21:28:05 <anomie> SMalyshev: Ah, the Gather approach? |
181 | 21:28:19 <TimStarling> my vote is to just add a table |
182 | 21:28:35 <gwicke> data like reading lists would likely be interesting to multiple applications; I would be surprised if the app would end up being the only user |
183 | 21:28:46 <TimStarling> avoid joins so that you can hack up some cross-server thing later if need be, query groups or something |
184 | 21:29:01 * anomie is actually more interested in "BagOStuff vs some custom abstraction" over "where exactly do we hide the database table" |
185 | 21:29:06 <TimStarling> this whole project seems like something that could be done in hardly any more time than it takes to have this meeting |
186 | 21:29:16 <brion> Hehe |
187 | 21:29:32 <gwicke> simple key-value starage, sure |
188 | 21:29:34 <SMalyshev> defining what to do is often longer than doing it... it's normal :) |
189 | 21:29:42 <gwicke> but that's hardly a solution |
190 | 21:29:47 <dbrant> +1 to gwicke -- i can easily see mobile web and/or desktop surfacing reading lists for logged-in users. |
191 | 21:29:50 <robla> anomie: should we try to answer that question now? |
192 | 21:29:52 <Scott_WUaS> :) |
193 | 21:30:25 <DanielK_WMDE__> Is this spec in line with what we need? https://datatracker.ietf.org/doc/draft-dejong-remotestorage/ |
194 | 21:30:29 <brion> That might be a more interesting question for the watchlist cross wiki magic future discussion :) |
195 | 21:30:39 <anomie> robla: Wouldn't hurt. Although "action API vs restbase" is an even more interesting question, since it's the difference between me writing it and me not ;) |
196 | 21:30:49 <DanielK_WMDE__> if so, perhaps we can rely on some ready-to-go solution. |
197 | 21:31:07 <brion> I just don't want mobile apps to get bogged down in the meantime while we bikeshed reading lists |
198 | 21:31:17 <gwicke> dbrant: once we have multiple clients, it might make more sense to provide an API that gives a bit more guarantees than just "it's a blob of bytes" |
199 | 21:31:25 <SMalyshev> DanielK_WMDE__: does it have TLDR description? |
200 | 21:31:47 <DanielK_WMDE__> SMalyshev: https://remotestorage.io/ |
201 | 21:32:48 <gwicke> updating the schema when all your schema handling is implemented in $n clients sounds hard |
202 | 21:32:58 <SMalyshev> DanielK_WMDE__: this seems to be client-side storage? |
203 | 21:33:26 <tgr> DanielK_WMDE__: how would auth work? |
204 | 21:33:30 <SMalyshev> or the picture is misleading |
205 | 21:33:41 <DanielK_WMDE__> SMalyshev: no, it's not. |
206 | 21:33:50 <tgr> DanielK_WMDE__: "If <auth-dialog> is a URL, the user can supply their credentials for accessing the account (how, is out of scope) |
207 | 21:33:58 <tgr> - not so promising |
208 | 21:33:59 <DanielK_WMDE__> SMalyshev: the client picks it's storage provider. local is one option. all providers implement the same protocol |
209 | 21:34:00 <anomie> DanielK_WMDE__: At a glance, that looks like it would serve the purpose. It wouldn't be able to be done in the context of the action API, but that's not a requirement (assuming it's not me writing it, anyway). |
210 | 21:34:00 <Marybelle> Is 65,535 bytes really insufficient? |
211 | 21:34:25 <brion> Long names add up and it's hard to make guarantees :) |
212 | 21:34:25 <DanielK_WMDE__> tgr: they seem to use oauth |
213 | 21:34:29 <Marybelle> That sounds like a lot of page IDs. |
214 | 21:34:43 <brion> 64k would likely be enough in practice until it breaks on some extreme case |
215 | 21:34:43 <robla> SMalyshev: it looks like the storage is OAuth protected server storage (that basically provides similar constraints to most client storage solutions) |
216 | 21:35:00 <tgr> forcing all apps to go through an extra oauth authorization screen might be suboptimal |
217 | 21:35:14 <DanielK_WMDE__> using an actual standard to implement this would be nice. |
218 | 21:35:27 <DanielK_WMDE__> forhtering the idea of "clients pick where they store their data" is even nicer |
219 | 21:35:29 <robla> DanielK_WMDE__ agreed |
220 | 21:35:32 <tgr> anyway, that protocol seems like it might be a better solution but evaluating it within this IRC meeting is not realistic |
221 | 21:35:49 <brion> Well ideally you'd probably want the same login for login and storage in a case like this |
222 | 21:35:52 <DanielK_WMDE__> even if we implement this ourselves, perhaps we should implement the protocol defined there. |
223 | 21:36:20 <DanielK_WMDE__> tgr: i agree. i'll link to it from the ticket |
224 | 21:36:23 <brion> I am curious about it and it smells useful for unofficial apps and such, do bring it up in future :) |
225 | 21:36:24 <tgr> and worth mentioning again that building a k-v store that relies on the action API is super simple |
226 | 21:36:27 <dbrant> Marybelle: brion: for reference, the current record holder for the most pages in reading lists in our app is over 7000. |
227 | 21:36:50 <tgr> bulding a new API that should match a draft protocol with OAuth 2 and whatnot is no |
228 | 21:36:52 <niedzielski> i don't think 64k would be too small personally. it would encourage favoring distinct keys instead of json blobs |
229 | 21:36:54 <tgr> *not* |
230 | 21:37:08 <robla> #info DanielK_WMDE__ makes the case for using remoteStorage IETF draft implementation; discussion about that will likely continue in the RFC |
231 | 21:37:12 <brion> 7000 dang! Might fit in int IDs bit maybe not titles yeah :) |
232 | 21:37:38 <brion> Anyway that bits changeable if we need it |
233 | 21:38:01 <brion> Easy to bump the column type up one |
234 | 21:38:32 <SMalyshev> I understand that if we use IETF one we'd need a client for remoteStorage protocol and a backend. which should rely on some kind of k'v storage inside mediawiki |
235 | 21:39:07 <SMalyshev> so we're kind of back to sq. 1? maybe with more standard client API |
236 | 21:39:20 <robla> so...my understanding is that the Mobile Apps team plans to prototype *something* this quarter; should they use one of our existing tech, or should they explore something new? |
237 | 21:39:38 <brion> So we have some interest in that protocol; and some talk about tweaking user props with a modified MW API method, and some talk about just adding a table still. Any other current main alternatives? |
238 | 21:39:56 <brion> :) |
239 | 21:40:11 <tgr> restbase I suppose? |
240 | 21:40:15 * robla notes that the "IETF one" is a draft that hasn't made it to "Proposed Standard", and that submitting an IETF draft is trivial |
241 | 21:40:28 <SMalyshev> tgr: does restbase authenticate? |
242 | 21:41:35 <Pchelolo> SMalyshev: restbase can authenticate |
243 | 21:42:02 <brion> Ok so that's another possibility yes :) |
244 | 21:42:24 <brion> Though I think gwicke would prefer restbase services to be better specified for their data schemas ? |
245 | 21:42:41 <brion> And this is still a young feature likely to change in details |
246 | 21:42:46 <Marybelle> dbrant: How common is a reading list of 7000 pages? Just one user? A dozen users? |
247 | 21:42:52 <dbrant> robla: to be clear, this isn't a blocker for our current goals this quarter. it would simply make our reading lists become a "complete" feature, the way we intended it to be (technically, two quarters ago :) ). |
248 | 21:42:55 <brion> Eventually hopefully merging into fancy watchlists |
249 | 21:43:25 <robla> dbrant: thanks for the clarification! |
250 | 21:43:34 <brion> That helps us set timelines :) |
251 | 21:43:38 <Marybelle> brion: I get pretty frustrated at how bad watchlists are, especially when I see people basically working around them instead of investing resources to fix them. :-/ |
252 | 21:43:51 <brion> Understandable! |
253 | 21:44:13 <brion> I think we need to bring some more focus on that I agree |
254 | 21:44:46 <gwicke> added a comment re versioning & API stability at https://phabricator.wikimedia.org/T128602#2500338 |
255 | 21:44:48 <Marybelle> For the general idea of private reading lists, it's kind of maddening that watchlists won't work. |
256 | 21:44:54 <Marybelle> Sigh. |
257 | 21:45:07 * anomie wouldn't mind working on watchlists, but there are currently enough other cooks in that kitchen and that's not the problem at hand here. |
258 | 21:45:10 * robla tries to remember what ori and Steven Walling were pushing for a few years ago on the watchlist front |
259 | 21:45:14 <DanielK_WMDE__> SMalyshev: the point was that client libraries and backend implementations exist, we wouldn't have to write them (if the current ones are good - which i don't know) |
260 | 21:45:39 <SMalyshev> DanielK_WMDE__: I don't think backend which we need (i.e. with mediawiki auth) exists? |
261 | 21:45:42 <DanielK_WMDE__> (sorry for jumping back to this, don't let me distract you9 |
262 | 21:45:53 <brion> dbrant: does the enhanced user pref model sound like a good short term for you as a sync mechanism? I think we all still agree fancier watchlist integration would be great for future |
263 | 21:45:55 <DanielK_WMDE__> SMalyshev: backends with oath exist |
264 | 21:46:25 <tgr> with a page lists API it really helps if you have a good idea what features you'll need exactly |
265 | 21:46:37 <dbrant> Marybelle: there are over 100 users with at least 1000 pages in lists. Over 4000 users with at least 100 pages, etc... |
266 | 21:46:42 <tgr> that's one place where Gather dug itself into the ground |
267 | 21:46:55 <tgr> a k-v store is ideal for prototyping |
268 | 21:47:21 <SMalyshev> DanielK_WMDE__: yeah but oauth against what? we need something on mediawiki side to do actual r/w... even if oauth plugs seamlessly there. Maybe I just don't understand yet what that API does :) |
269 | 21:47:24 <tgr> for reading lists, and for a number of (non-list-related) future features I'd imagine |
270 | 21:47:39 <dbrant> brion: i think that can definitely work, as long as it can handle a large number of keys. |
271 | 21:48:07 <brion> Large number of keys should work with that model yes. You'd need a bulk lookup though too? |
272 | 21:48:12 <tgr> DanielK_WMDE__: note that this seems to be using OAuth 2 which MediaWiki does not support |
273 | 21:48:18 <anomie> dbrant, brion: "as long as it can handle a large number of keys" is a good question. Again, https://phabricator.wikimedia.org/T128602#2476545 |
274 | 21:48:22 <DanielK_WMDE__> tgr: hm, right... |
275 | 21:48:24 <brion> Heh |
276 | 21:48:37 <tgr> probably not a huge undertaking to fix but way larger than the one proposed in this RfC |
277 | 21:48:44 <Marybelle> robla: https://www.mediawiki.org/wiki/Requests_for_comment/Support_for_user-specific_page_lists_in_core |
278 | 21:49:18 <robla> ah, that's the one, thanks Marybelle |
279 | 21:49:25 <dbrant> brion: right, lookup too. |
280 | 21:50:16 <dbrant> but then, if we're talking about a short term solution, we can limit things on the client end, too. |
281 | 21:50:32 <Scott_WUaS> (anomie: just noticed that you're mentioned in this BBC article "Meet the 'bots' that edit Wikipedia" - http://www.bbc.com/news/magazine-18892510 :) |
282 | 21:51:26 <tgr> one thing that hasn't been discussed is how much effort it would take to prevent abuse / how afraid we are it would happen |
283 | 21:51:38 <brion> dbrant: ok let's maybe model a couple variants. Large blob, vs row per title? Then confirm whether they make sense on a tweaked user props table, and decide whether to look more at the alternatives? |
284 | 21:51:53 <tgr> pirates using it for movie distribution or whatnot |
285 | 21:52:22 <gwicke> a quota can take care of that |
286 | 21:52:36 <dbrant> brion: +1 niedzielski: what do you think of that? & |
287 | 21:52:45 <brion> tgr: good question. There's little we can do to prevent use of user props as a file sharing or DoS space usage against us, other than "it's inconvenient and there are probably easier ways to abuse the system" |
288 | 21:53:01 <niedzielski> brion dbrant: one of the problems we considered with row vs blob was race conditions between clients. user options didn't seem well designed to handle that |
289 | 21:53:07 <robla> tgr: the security considerations section of https://datatracker.ietf.org/doc/draft-dejong-remotestorage/?include_text=1 looks like a good start on a list |
290 | 21:53:13 <brion> Changing API might make it a bit easier to abuse but not much |
291 | 21:53:49 <niedzielski> brion dbrant: for example in the blob scenario, if two clients try to update the same list, the last client wins. there's also bandwidth concerns for the 7000 title person |
292 | 21:53:53 <brion> niedzielski: yeah, you'd have to detect conflicts through some other means like a signal value |
293 | 21:54:10 <brion> Goes smoother with smaller bits, but that complicates the filtering |
294 | 21:54:15 <tgr> brion: we could do all kinds of usage tracking, user agent filtering etc |
295 | 21:54:19 <gwicke> conflict resolution is somewhat orthogonal from storage strategy |
296 | 21:54:30 <tgr> but building it is little effort and those things probably aren't |
297 | 21:54:33 <brion> Ok we're low on time :) |
298 | 21:54:48 <brion> robla: shall we plan next steps? |
299 | 21:55:06 * robla ponders what that would be |
300 | 21:55:25 <gwicke> overall, I'm honestly sceptical about the value of using a generic key-blob storage service for use cases like reading lists |
301 | 21:55:39 <brion> I think we want to bump pruoity on the more watchlist specific rfc! |
302 | 21:55:44 <gwicke> if the use case is so well defined, then I think it deserves a real API |
303 | 21:55:47 <brion> Priority |
304 | 21:56:00 <anomie> gwicke: "if" |
305 | 21:56:02 <brion> gwicke: agreed, medium to long term |
306 | 21:56:18 <robla> #action ArchCom needs to bump the priority on a watchlist specific RFC |
307 | 21:56:30 <Scott_WUaS> "if" |
308 | 21:56:35 <niedzielski> brion dbrant: IIRC, we also had concerns with a list of page IDs vs a page ID with a list of lists. i don't think we came up with a great way to handle that and had to use the list title as the ID |
309 | 21:56:58 <brion> Short term: I'll follow up with dbrant and niedzielski on using user prefs modified and see if that still makes sense |
310 | 21:57:29 <brion> And anyone else want to do more research on general user data storage with that protocol? |
311 | 21:57:45 <brion> It sounds potentially very useful for unofficial third-party tools and such |
312 | 21:57:52 <dbrant> brion: sounds good |
313 | 21:57:55 <anomie> brion: I'd suggest to run it by jcrespo too |
314 | 21:58:02 <brion> Ah yes good |
315 | 21:58:21 <robla> brion: I should probably take some of those action items from you, but yes, this all looks good |
316 | 21:58:28 <brion> Hehe ok |
317 | 21:58:30 <robla> (thank you for spelling this out!) |
318 | 21:58:45 <brion> :) |
319 | 21:59:09 <robla> anomie: dbrant - any last comments questions before we close this out? |
320 | 21:59:18 <brion> Good discussion folks! |
321 | 21:59:28 <Scott_WUaS> Yes! |
322 | 21:59:29 <dbrant> robla: nope, really glad to see this moving forward |
323 | 21:59:29 * anomie has no more comments at the moment |
324 | 21:59:40 <robla> great discussion indeed....thanks everyone! |
325 | 21:59:43 <niedzielski> \o |
326 | 21:59:45 <brion> Woohoo |
327 | 21:59:48 <robla> o/ |
328 | 21:59:57 <brion> Ok I gotta run, catch y'all later |
329 | 22:00:02 <robla> #endmeeting |
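
Editor's sketch (not part of the log): the blob vs. row-per-title trade-off and the "last client wins" race raised around 21:53 can be illustrated with a small in-memory model. Everything here is hypothetical: `UserStore`, `ConflictError`, the key names, and the version counter standing in for the "signal value" brion mentions are assumptions made for illustration, not MediaWiki or user_properties APIs.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a write is based on a stale version of a key."""


@dataclass
class UserStore:
    """Hypothetical per-user key-value store; NOT a MediaWiki API."""
    # Maps key -> (version, value). Version 0 means "key does not exist yet".
    _rows: dict = field(default_factory=dict)

    def get(self, key):
        """Return (version, value) for a key, or (0, None) if absent."""
        return self._rows.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Write only if the caller saw the latest version (compare-and-set)."""
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            raise ConflictError(key)
        self._rows[key] = (current_version + 1, value)
        return current_version + 1

    def keys(self, prefix):
        """Return the sorted keys that start with the given prefix."""
        return sorted(k for k in self._rows if k.startswith(prefix))


def demo():
    store = UserStore()

    # Blob model: the whole reading list lives under one key.
    version, _ = store.get("readinglist")
    store.put("readinglist", ["Cat"], version)

    # Two clients read the same version, then both try to append a title.
    version_a, blob_a = store.get("readinglist")
    version_b, blob_b = store.get("readinglist")
    store.put("readinglist", blob_a + ["Dog"], version_a)
    try:
        store.put("readinglist", blob_b + ["Emu"], version_b)
        blob_conflict = False
    except ConflictError:
        # Without the version check, client B would silently drop "Dog".
        blob_conflict = True

    # Row-per-title model: each title is its own key, so the same two
    # additions touch different rows and do not collide.
    for title in ("Dog", "Emu"):
        key = "list/" + title
        store.put(key, True, store.get(key)[0])

    return blob_conflict, store.get("readinglist")[1], store.keys("list/")


if __name__ == "__main__":
    print(demo())
```

In this model the blob write from the second client is rejected rather than overwriting the first, while the row-per-title writes both succeed; as gwicke notes in the log, the conflict check itself is independent of which storage layout is chosen.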