
Re: index and table corruption

From: Jerry Sievers <gsievers19(at)comcast(dot)net>
To: "Anand Kumar\, Karthik" <Karthik(dot)AnandKumar(at)classmates(dot)com>
Cc: "pgsql-general\(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: index and table corruption
Date: 2013-12-19 20:42:49
Message-ID: 864n64373q.fsf@jerry.enova.com
Lists: pgsql-general

"Anand Kumar, Karthik" <Karthik(dot)AnandKumar(at)classmates(dot)com> writes:

> Thanks Shaun!
>
> Yes, we're getting synchronous_commit on right now.
>
> The log_min_duration was briefly set to 0 at the time I sent out the post,
> just to see what statements were logged right before everything went to
> hell. Didn't yield much since we very quickly realized we couldn't cope
> with the volume of logs.
>
> We also noticed that when trying to recover from a snapshot and replay
> archived wal logs, it would corrupt right away, in under an hour. When
> recovering from snapshots *without* replaying wal logs, we go on for a day
> or two without the problem, so it does seem like wal logs are probably not
> being flushed to disk as expected.

Make sure your snapshots are atomic as you probably assume they are and
in fact must be if you expect a consistent cluster after startup and
crash recovery.

That is, if you are doing snaps at random times and not wrapping them with
pg_start_backup()/pg_stop_backup() *and* replaying WAL up to the consistent
recovery point, then a snap that is not atomic cannot be trusted.
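
For illustration, here is a minimal Python sketch of wrapping a snap in
backup mode. It assumes the exclusive-mode pg_start_backup()/pg_stop_backup()
functions of the 9.x releases current for this thread. The run_sql and
take_snapshot callables and the default label are hypothetical placeholders
for however you reach the database and trigger the storage snapshot.

```python
from typing import Callable


def snapshot_with_backup_mode(
    run_sql: Callable[[str], None],
    take_snapshot: Callable[[], None],
    label: str = "storage_snap",
) -> None:
    """Bracket a storage snapshot with pg_start_backup()/pg_stop_backup().

    run_sql and take_snapshot are hypothetical hooks supplied by the caller.
    """
    # The label is interpolated into SQL, so allow only letters, digits
    # and underscores.
    if not label.replace("_", "").isalnum():
        raise ValueError("label must be alphanumeric or underscore")

    # Second argument true requests an immediate checkpoint.
    run_sql("SELECT pg_start_backup('%s', true)" % label)
    try:
        take_snapshot()
    finally:
        # Always leave backup mode, even when the snapshot step fails.
        run_sql("SELECT pg_stop_backup()")
```

Restoring such a snap still requires replaying the archived WAL generated
between the two calls before the cluster can be considered consistent.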

If you're snapping something like a remote-site mirror running SAN
block-level replication, the image you're snapping may not be consistent
unless the snap is taken right after all blocks changed since the last
tick have been flushed.

I say that because I came into a company that had been doing snaps this
way for eons and thought that, since the clusters would start up
and could pass trivial checks, things were OK.

As soon as you subjected an instance derived this way, however, to
something wide-ranging such as an all-table vac/analyze, dumpall, etc.,
corruption was observed soon after launch.
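
A rough Python sketch of that kind of wide-ranging check against an instance
restored from a snap follows. It assumes the standard vacuumdb and pg_dumpall
client tools are on PATH and that connection settings come from the usual
PG* environment variables. The function names are hypothetical.

```python
import os
import subprocess
from typing import List


def verification_commands() -> List[List[str]]:
    """Commands that read every table in every database of the cluster."""
    return [
        # Vacuum and analyze all databases.
        ["vacuumdb", "--all", "--analyze"],
        # Dump the whole cluster and discard the output.
        ["pg_dumpall", "--file", os.devnull],
    ]


def run_checks() -> None:
    """Run each command in turn; raises CalledProcessError on failure."""
    for cmd in verification_commands():
        subprocess.run(cmd, check=True)
```

A clean run does not prove the instance is free of corruption. It only
exercises far more pages than a startup and a few trivial queries would.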

FWIW

>
> Will update once we get onto the new h/w to see if that fixes it.
>
> Thanks,
> Karthik

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres(dot)consulting(at)comcast(dot)net
p: 312.241.7800
