Classification problem with the gpt-3.5-turbo model in RStudio (sentiment analysis) – API

Hi!

I'm currently trying to do a sentiment analysis with the gpt-3.5-turbo model in RStudio, where my goal is to categorise each data row as either positive, negative or neutral. I have created a limited dataset with 50 rows, and I have tried to write a loop that categorises 10 rows at a time. However, I can't get the loop to work. Any ideas on what I'm doing wrong? Is my prompt sufficient? (The examples are in Danish, but that shouldn't be a problem.) Here's the code:

# My OpenAI API key
Sys.setenv(
  OPENAI_API_KEY = ""
)

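# Evaluate GPT classifier on 50 random articles

library(dplyr)    # %>% and sample_n()
library(httr)     # POST(), add_headers(), content()
library(jsonlite) # fromJSON()
library(stringr)  # str_extract_all()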

set.seed(1801)
eval_data <- df %>% sample_n(size = 50)

eval_data$gpt_label <- NA

base_url <- "(cannot include links in a post apparently)"
# Content-Type needs backticks because the name contains a hyphen
headers <- c(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")),
             `Content-Type` = "application/json")
body <- list()
body[["model"]] <- "gpt-3.5-turbo"
body[["max_tokens"]] <- 4
body[["temperature"]] <- 0.7

batch_start_row <- seq(1, 50, by = 10)

pb <- txtProgressBar(min = min(batch_start_row),
                     max = max(batch_start_row), initial = 1, char = "-", width = 60, style = 3)

token_usage <- 0

for (row in batch_start_row) {

  # Update progress bar
  setTxtProgressBar(pb, row)

  Sys.sleep(3.1)

  prompt <- "Classify the sentiment in these civic proposals as either \"positive\" or \"negative\" or \"neutral\". Here are three examples:
Example[1]: Skat stjæler dine penge
Example[2]: Superligafodbold på DR
Example[3]: Gratis offentlig transport for studerende med SU
Answer[1]: negative
Answer[2]: neutral
Answer[3]: positive \n"

  indices <- row:(row + 9)

  texts <- paste("Text", seq_along(indices), ": ", eval_data$titel[indices], " \n", collapse = "", sep = "")

  prompt <- paste(prompt, texts, "Answers:", collapse = "")

  messages <- list(list(role = "user", content = prompt))

  body[["messages"]] <- messages

  response <- POST(url = base_url,
                   add_headers(.headers = headers),
                   body = body, encode = "json")

  completion <- response %>%
    content(as = "text", encoding = "UTF-8") %>%
    fromJSON(flatten = TRUE)

  labels <- unlist(str_extract_all(completion$choices["message.content"]))

  eval_data$gpt_label[indices] <- labels

  token_usage <- token_usage + completion$usage$total_tokens
}
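
One part that may be worth isolating is the label-parsing step. In the loop above, str_extract_all() is called without a pattern, and max_tokens is set to 4, which caps the reply well below what ten labels need. Below is a minimal sketch of a parser using base R only. The helper extract_labels(), its arguments and the sample reply are hypothetical, made up for illustration; they are not part of the original code or of any package. The helper pads the result with NA so that a short or malformed reply does not break the assignment into eval_data$gpt_label[indices].

```r
# Hypothetical helper (not from the original post): pull sentiment labels
# out of one reply string and force the result to the expected length.
extract_labels <- function(reply, n_expected) {
  allowed <- "positive|negative|neutral"
  reply_lower <- tolower(reply)
  # Find every allowed label in the lower-cased reply, in order of appearance
  found <- regmatches(reply_lower, gregexpr(allowed, reply_lower))[[1]]
  # Pad with NA (or truncate) so the result always has n_expected entries
  length(found) <- n_expected
  found
}

# Made-up reply text, used only to illustrate the helper
reply <- "1: negative\n2: Neutral\n3: positive"
labels <- extract_labels(reply, 4)
print(labels)
```

Inside the loop this could be called as extract_labels(completion$choices$message.content, length(indices)), assuming fromJSON(flatten = TRUE) returns choices as a data frame with a message.content column. Rows that come back as NA can then be inspected or retried instead of stopping the loop.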

I’m hoping someone can help clear up what the issues are, so I can get each row in my dataset categorised as either positive, negative or neutral.

Best regards,
Nina
