
MaxMind seems to be mapping the same IP to different countries
Closed, ResolvedPublic

Description

year=2024
month=5
day=10
hour=12

The webrequest table has multiple countries for the same IP address, and multiple ISPs for the same country:

select * from webrequest
 where year = 2024
   and month = 5
   and day = 10
   and hour = 12
   and ip = ''

Event Timeline

Hypothesis so far: maybe some workers are getting MaxMind updates on a staggered schedule relative to others, so there's always some variation?

Indeed, different versions of the database seem to be present on cluster hosts.

#%%
from pyspark.sql import functions as F  # `spark` is the active SparkSession

@F.udf()
def maxmind() -> str:
    # Build timestamp of the local MaxMind City database on the executor.
    import maxminddb
    from datetime import datetime
    reader = maxminddb.open_database('/usr/share/GeoIP/GeoIP2-City.mmdb')
    ts_epoch = reader.metadata().build_epoch
    return datetime.fromtimestamp(ts_epoch).strftime('%Y-%m-%d %H:%M:%S')

@F.udf()
def host() -> str:
    # Hostname of the executor evaluating this row.
    import socket
    return socket.gethostname()

#%%
out = (
    spark.range(1, 10000)
    .repartition("id")  # spread rows across executors
    .withColumn("maxmind", maxmind())
    .withColumn("host", host())
).cache()
#%%
out.groupBy("maxmind").agg(F.collect_set("host").alias("hosts")).show(truncate=False)

Returns

+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|maxmind            |hosts                                                                                                                                                                                                             |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|2024-05-28 15:24:59|[an-worker1142, an-worker1104, an-worker1107, an-worker1139, an-worker1125, an-worker1120, analytics1076, an-worker1165, an-worker1159, an-worker1161, an-worker1118, an-worker1103, an-worker1098, an-worker1126]|
|2024-04-23 12:31:09|[an-worker1132, an-worker1168, an-worker1141, an-worker1144, analytics1071, an-worker1111, an-worker1160, an-worker1102, an-worker1122]                                                                           |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Note this is only looking at the City DB, and only for the random subset of hosts that Spark happens to run partitions on.
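For spot-checking a single host without spinning up Spark, the same metadata read can be done standalone. A minimal sketch, assuming `maxminddb` is installed and the DB lives at the standard GeoIP path (`db_build_date` and `format_build_epoch` are hypothetical helper names, not part of any library):

```python
from datetime import datetime, timezone

def format_build_epoch(ts_epoch: int) -> str:
    # build_epoch is seconds since the Unix epoch; render it as UTC.
    return datetime.fromtimestamp(ts_epoch, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')

def db_build_date(path: str = '/usr/share/GeoIP/GeoIP2-City.mmdb') -> str:
    # Open the database read-only and report when MaxMind built it.
    import maxminddb
    reader = maxminddb.open_database(path)
    try:
        return format_build_epoch(reader.metadata().build_epoch)
    finally:
        reader.close()
```

Running this on two workers and diffing the output gives the same signal as the Spark job above, one host at a time.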

Ottomata added subscribers: BTullis, Ottomata.

@BTullis I added DPE-SRE. Can you look into this and see if the version of the MaxMind DBs is the same on all Hadoop workers?

With the exception of one worker that seemingly can't be reached, the DB seems to be exactly the same everywhere:

brouberol@cumin2002:~$ sudo cumin 'an-worker*.eqiad.wmnet' 'md5sum /usr/share/GeoIP/GeoIP2-City.mmdb'
100 hosts will be targeted:
an-worker[1078-1177].eqiad.wmnet
OK to proceed on 100 hosts? Enter the number of affected hosts to confirm or "q" to quit: 100
===== NODE GROUP =====
(1) an-worker1165.eqiad.wmnet
----- OUTPUT of 'md5sum /usr/shar...GeoIP2-City.mmdb' -----
ssh: connect to host an-worker1165.eqiad.wmnet port 22: Connection timed out
===== NODE GROUP =====
(99) an-worker[1078-1164,1166-1177].eqiad.wmnet
----- OUTPUT of 'md5sum /usr/shar...GeoIP2-City.mmdb' -----
784d268a38b213e9fafa572b193d1e1a  /usr/share/GeoIP/GeoIP2-City.mmdb
================
PASS |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████  |  99% (99/100) [00:21<00:00,  4.66hosts/s]
FAIL |█▉

@fkaelin Are we still observing the same discrepancy on your end?

The snippet pasted above now returns the same MaxMind metadata for all hosts the job ran on: 2024-10-29 19:58:23.
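To keep this from regressing silently, the grouped result could be folded into a small assertion. A sketch under stated assumptions: `assert_consistent_build_dates` is a hypothetical helper fed the collected values of the `maxmind` column, not existing tooling:

```python
def assert_consistent_build_dates(build_dates):
    """Raise if workers report more than one distinct MaxMind build date."""
    distinct = sorted(set(build_dates))
    if len(distinct) != 1:
        raise RuntimeError(f"MaxMind DB version skew across workers: {distinct}")
    return distinct[0]
```

Wired into a scheduled job, a skew like the one in this task would fail loudly instead of surfacing as mismatched geocoding downstream.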

brouberol claimed this task.

Cool. I'll close then.