Problem Description
Jdbc update statement in Spark
I connect to a database over JDBC and am trying to run an update query. First I build the query, then I execute it (the same way I execute a SELECT, which works perfectly fine).
caseoutputUpdateQuery = "(UPDATE dbo.CASEOUTPUT_TEST SET NOTIFIED = 'YES') alias_output "
spark.read.jdbc(url=jdbcUrl, table=caseoutputUpdateQuery, properties=connectionProperties)
When I run it, I get the following error:
A nested INSERT, UPDATE, DELETE, or MERGE statement must have an OUTPUT clause.
I have tried to solve this in different ways, but there is always another error. For example, I tried rewriting the query as:
caseoutputUpdateQuery = "(UPDATE dbo.CASEOUTPUT_TEST SET NOTIFIED = 'YES' OUTPUT DELETED.*, INSERTED.* FROM dbo.CASEOUTPUT_TEST) alias_output "
But I ran into this error: A nested INSERT, UPDATE, DELETE, or MERGE statement is not allowed in a SELECT statement that is not the immediate source of rows for an INSERT statement.
Another way I tried rewriting it was:
caseoutputUpdateQuery = "(INSERT INTO dbo.UpdateOutput(OldCaseID,NotifiedOld) SELECT * FROM( UPDATE dbo.CASEOUTPUT_TEST SET NOTIFIED = 'YES' OUTPUT deleted.OldCaseID,DELETED.NotifiedOld ) AS tbl) alias_output "
But I ran into this error:
A nested INSERT, UPDATE, DELETE, or MERGE statement is not allowed inside another nested INSERT, UPDATE, DELETE, or MERGE statement.
I really have tried everything I found on the internet, but with no luck. Do you have any suggestions on how I can solve this and run my update statement?
Reference Solutions
Approach 1:
I think Spark is not designed for that UPDATE statement use case. That's not a scenario where Spark can help you deal with an RDBMS. I suggest using a direct JDBC connection from the code you are writing (I mean calling JDBC directly). If you are using Scala you can do it as suggested here (for example; there are multiple other ways), or from Python as explained here. Those samples target an Oracle engine, but please change the driver/connector if you are using MySQL, SQL Server, Postgres, or any other RDBMS.
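A minimal sketch of the direct-connection idea. Since no SQL Server is available here, it uses Python's built-in sqlite3 as a stand-in (an assumption for illustration); with SQL Server you would open the connection via pyodbc or the Microsoft JDBC driver instead, keeping the same pattern of executing the UPDATE directly:

```python
import sqlite3

# Stand-in for a direct database connection: with SQL Server you would
# connect via pyodbc / a JDBC driver instead of sqlite3 (assumption here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CASEOUTPUT_TEST (CaseID INTEGER, NOTIFIED TEXT)")
conn.executemany("INSERT INTO CASEOUTPUT_TEST VALUES (?, ?)",
                 [(1, "NO"), (2, "NO")])

# The UPDATE runs directly against the database -- no Spark involved,
# so no "nested UPDATE inside a SELECT" restriction applies.
cursor = conn.execute("UPDATE CASEOUTPUT_TEST SET NOTIFIED = 'YES'")
print(cursor.rowcount)  # number of rows updated

rows = conn.execute("SELECT NOTIFIED FROM CASEOUTPUT_TEST").fetchall()
conn.close()
```

The key point is that the statement is sent as a plain UPDATE, not wrapped inside a SELECT the way spark.read.jdbc wraps its `table` argument.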
Approach 2:
spark.read under the covers does a select * from the source JDBC table. If you pass a query, Spark translates it to
select your query from (their query select *)
SQL complains because you are trying to do an UPDATE inside a view ("select * from").
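Roughly what that wrapping looks like (a sketch; the exact text Spark emits can differ by version): the `table` argument is pasted into the FROM clause of a SELECT, so the database receives a SELECT around the UPDATE and rejects it.

```python
# Sketch of how spark.read.jdbc treats the `table` argument: it is embedded
# in the FROM clause of a SELECT, so an UPDATE there becomes a nested
# UPDATE inside a SELECT -- exactly what SQL Server refuses to run.
table_arg = "(UPDATE dbo.CASEOUTPUT_TEST SET NOTIFIED = 'YES') alias_output"

# Approximation of the schema-probe query Spark issues first:
issued = f"SELECT * FROM {table_arg} WHERE 1=0"
print(issued)
```

This is why the error appears before any row is touched: the failure happens when the database parses the wrapped query, not when data is read.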
(by xavier, Oscar Lopez M., user3486439)