[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table lock - concurrency problem #120

Closed
volker-raschek opened this issue Dec 18, 2020 · 10 comments
Closed

table lock - concurrency problem #120

volker-raschek opened this issue Dec 18, 2020 · 10 comments

Comments

@volker-raschek
Copy link
volker-raschek commented Dec 18, 2020

Hello folks,
I have written an oracle driver for https://github.com/golang-migrate/migrate using godror as the driver. To prevent multiple instances from migrating an oracle database the schema_lock table is locked and a row is inserted. Other instances should then not start the migration.

Locally on my system and inside a container, the tests run successfully. In our internal CI the test fails with the same container image.

Information:

  • godror v0.22.2
  • go version 1.15.6
  • oracle client: 19.8.0.0.0
  • oracle db version: 12c

Under the project https://github.com/volker-raschek/locker I have implemented everything to reproduce the error. Again, local on my system the test is successfully, but in our internal CI the test fails.

Unfortunately I can't catch the error with an ORA error code, because it doesn't contain one. Sometimes I get an error message like this one dpiStmt_execute(mode=0 arrLen=-1): or nothing. I am not sure if there is something wrong with my source code or the SQL statements or godror has a bug.

Please have a look at my sample code and the logs and reproduce the error if necessary.

Volker

To Reproduce

git clone https://github.com/volker-raschek/locker
make start-oracle-xe
sleep 300 # wait until db is ready
make container-run/test/unit # maybe you must adapt DB_URL in Makefile

Expected behavior

The intergration tests are successfully on both systems (local and ci)

logs are attached as seperate files

ci.log
local-system.log

@tgulacsi
Copy link
Contributor
PASS
ok      github.com/volker-raschek/locker        1.479s

I'd use context.Context (for timeouts) and defer tx.Rollback().

@tgulacsi
Copy link
Contributor
tgulacsi commented Dec 18, 2020

What do you want to test?

In a Go program, you won't create more sql.DB pools per database, only one.
Here you create 25.

Maybe I've found a fix for the empty errors - see 7c96931

For your tests, I'd use context for timeouts:
x.patch.txt

@volker-raschek
Copy link
Author

Hey @tgulacsi ,

I should give you more information. I am writing a backend application in go
which will be packaged as container image. The container image is a base image
for other containerized applications which run in future in a kubernetes
cluster. One application, based on my container image, share the used database
scheme with other instances of the same application.

I don't know how ofter the application will be started in the cluster, but to
avoid that all instances of the application starts to migrate the database
scheme from one version to another an entry in table scheme_lock should avoid to
start the migration process multiple times.

This is also the reason for 25 instances of the sql.DB pools. Each sql.DB pool
for one running application. It can be possible, that in a cluster more than 25
instances of an application is running. Maybe 100 or 1000. I just do not know it
at the moment.

I applied your patch into my example project. I saw you implemented a mutex. I
think the mutex makes no sense for the same reason. I have n applications and
the mutex limits only simultaneous access exclusively for one instance.

Next to the context. I try to change the code to use a context. I can pass the
context over the Config struct for the oracle driver of the go-migrate package.
Ofter the Lock() and Unlock() function it is not possible. It is an
implementation of the interface.

Now again to your patch. The test is successfully on my local system. On our
internal CI I received the following error message:

    locker_test.go:103: 
        	Error Trace:	locker_test.go:103
        	Error:      	Should be true
        	Test:       	TestLocker/Concurrency
        	Messages:   	Failed to create table schema_lock: dpiStmt_execute(mode=0 arrLen=-1): 
--- FAIL: TestLocker (0.79s)
    --- PASS: TestLocker/lock-unlock (0.57s)
    --- FAIL: TestLocker/Concurrency (0.22s)

Complete CI-Log

@tgulacsi
Copy link
Contributor

"Maybe I've found a fix for the empty errors - see 7c96931"

@sudarshan12s
Copy link
Collaborator
sudarshan12s commented Dec 21, 2020 via email

@volker-raschek
Copy link
Author
volker-raschek commented Dec 30, 2020

@sudarshan12s , thanks a lot for the hint with runtime.LockOSThread(). I added LockOSThread() into my test and I have updated godror to the latest version (currently v0.22.3). The test is than successfully on the CI. The logs are attached - ci.log

Since there is a dependency on a C library by godror it seems that it must be ensured that the C code runs in the same thread as the Go routines. Some additional links:

  1. benefits-of-runtime-lockosthread-in-golang
  2. LockOSThread - example

Now I am wondering how far this could have implications on existing code where runtime.LockOSThread() is not called.

I think this is only necessary when an application opens multiple DB pools or did I misunderstand something?

@tgulacsi
Copy link
Contributor

I still does not understand what's happening.

@volker-raschek , can you post us a failing CI log of yout test using v0.22.3, but WITHOUT runtime.LockOSThread() ? That should not have empty error message.
If it still does, than that suggests that the error-retrieving logic must be protected with LockOSThread(), which is more than a hundred places in the source, so I'd like to support this with evidence.

tgulacsi added a commit that referenced this issue Dec 31, 2020
Encapsulate execution of OCI function and getting error in
runtime.LockOSThread and runtime.UnlockOSThread.

For #120.
@tgulacsi
Copy link
Contributor

@volker-raschek can you try 8411931 without runtime.LockOSThread ?
I've tried encapsulating all OCI execution and error return in runtime.LockOSThread.

@volker-raschek
Copy link
Author

@tgulacsi, I updated godror to 8411931 and removed runtime.LockOSThread() from my test - commit

The test is now successfully on the CI - ci.log. I would like to update all of our internal projects to the latest version of godror if we have fixed the error.

@sudarshan12s
Copy link
Collaborator
sudarshan12s commented Jan 5, 2021

@sudarshan12s , thanks a lot for the hint with runtime.LockOSThread(). I added LockOSThread() into my test and I have updated godror to the latest version (currently v0.22.3). The test is than successfully on the CI. The logs are attached - ci.log

Since there is a dependency on a C library by godror it seems that it must be ensured that the C code runs in the same thread as the Go routines. Some additional links:

  1. benefits-of-runtime-lockosthread-in-golang
  2. LockOSThread - example

Now I am wondering how far this could have implications on existing code where runtime.LockOSThread() is not called.

I think this is only necessary when an application opens multiple DB pools or did I misunderstand something?

Yes i was indicating TLS used for storing error context in C libs in this case. Other thing I wanted to clarify is on application instance. By application instance, you mean a seperate process , right? which creates a pool.
I got confused because in your test case, you create multiple pools in a single process .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants