RPostgreSQL connections are expired as soon as they are initiated with doParallel clusterEvalQ

I was able to reproduce your problem locally. I am not entirely sure, but I think the problem is related to the way clusterEvalQ works internally. For example, you say that dbGetQuery(con, "select inet_client_port()")
gave you the client port output. If the query had actually been evaluated/executed on the cluster nodes, you would not be able to see this output (in the same way that you cannot directly read any other output or print statements executed on the external cluster nodes).

Hence, it is my understanding that the evaluation is somehow first performed in the local environment, and the relevant functions and variables are subsequently copied/exported to the individual cluster nodes. This would work for any other kind of function or variable, but obviously not for database connections. If the connections/port mappings are tied to the master R instance, then the connections will not work from the slave instances. You would get exactly the same error if you tried to use the clusterExport function to export connections created on the master instance.
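To make this concrete, here is a minimal sketch of that failure mode (assuming a local PostgreSQL database named "ISO", as in the example further down): the connection object gets copied to the workers, but the underlying socket only exists in the master process, so queries from the workers fail.

library(doParallel)
library(RPostgreSQL)

cl <- makeCluster(2)
clusterEvalQ(cl, library(RPostgreSQL))

# Create the connection on the master instance
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, user="user", password="password", dbname="ISO", host="localhost")

# Copy the connection object to the workers; the socket itself stays behind
clusterExport(cl, "con")

# Expected to fail with an expired/invalid connection error from the workers
try(clusterEvalQ(cl, dbGetQuery(con, "select 1")))

dbDisconnect(con)
stopCluster(cl)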

As an alternative, you can create separate connections inside the individual foreach tasks. I have verified against a local database that the following works:

library(doParallel)

nrCores <- detectCores()
cl <- makeCluster(nrCores)
registerDoParallel(cl)

# Load the database packages on every worker
clusterEvalQ(cl, library(RPostgreSQL))
clusterEvalQ(cl, library(DBI))

result <- foreach(i = 1:nrCores) %dopar% {
  # Each task opens its own connection on the worker it runs on
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, user="user", password="password", dbname="ISO", host="localhost")
  queryResult <- dbGetQuery(con, "fetch something...")
  dbDisconnect(con)
  # The last expression is the task's return value
  queryResult
}
stopCluster(cl)

However, bear in mind that this creates and tears down a new connection on every foreach iteration, which may incur some performance overhead. You can circumvent this by splitting up your queries/data intelligently so that a lot of work gets done within the same iteration. Ideally, you should split the work into exactly as many chunks as you have cores available, as sketched below.
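For example, here is one way to do that chunking (the ids vector and the query text are hypothetical placeholders; adapt them to your schema): each chunk gets a single connection that serves all of its queries.

library(doParallel)

nrCores <- detectCores()
cl <- makeCluster(nrCores)
registerDoParallel(cl)
clusterEvalQ(cl, library(RPostgreSQL))

ids <- 1:1000  # hypothetical units of work, e.g. row ids to fetch
chunks <- split(ids, cut(seq_along(ids), nrCores, labels = FALSE))

result <- foreach(chunk = chunks) %dopar% {
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, user="user", password="password", dbname="ISO", host="localhost")
  # One connection serves every query in this chunk
  res <- lapply(chunk, function(id) {
    dbGetQuery(con, paste("fetch something where id =", id))  # placeholder query
  })
  dbDisconnect(con)
  res
}
stopCluster(cl)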
