Hi!
I'm currently trying to run a sentiment analysis with the gpt-3.5-turbo model in RStudio, where my goal is to categorise each data row as positive, negative or neutral. I have created a limited dataset of 50 rows and tried to write a loop that categorises 10 rows at a time. However, I can't get the loop to work. Any ideas on what I'm doing wrong? Is my prompt sufficient? (The examples are in Danish, but that shouldn't be a problem.) Here's the code:
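# Packages used below
library(dplyr)     # %>% and sample_n()
library(httr)      # POST(), add_headers(), content()
library(jsonlite)  # fromJSON()
library(stringr)   # str_extract_all()
# My OpenAI API key
Sys.setenv(
  OPENAI_API_KEY = ""
)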
# Evaluate GPT classifier on 50 random articles
set.seed(1801)
eval_data <- df %>% sample_n(size = 50)
eval_data$gpt_label <- NA
base_url <- "(cannot include links in a post apparently)"
# Content-Type contains a hyphen, so the name has to be quoted
headers <- c(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")),
             "Content-Type" = "application/json")
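body <- list()
body[["model"]] <- "gpt-3.5-turbo"
# 4 tokens would cut the reply off after the first label or two;
# ten labels plus separators need more room
body[["max_tokens"]] <- 60
body[["temperature"]] <- 0.7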
batch_start_row <- seq(1, 50, by = 10)
pb <- txtProgressBar(min = min(batch_start_row),
                     max = max(batch_start_row), initial = 1, char = "-", width = 60, style = 3)
token_usage <- 0
for(row in batch_start_row){
  # Update progress bar
  setTxtProgressBar(pb, row)
  # Pause between requests
  Sys.sleep(3.1)
  prompt <- "Classify the sentiment in these civic proposals as either \"positive\" or \"negative\" or \"neutral\". Here are three examples:
Example[1]: Skat stjæler dine penge
Example[2]: Superligafodbold på DR
Example[3]: Gratis offentlig transport for studerende med SU
Answer[1]: negative
Answer[2]: neutral
Answer[3]: positive \n"
  indices <- row:(row + 9)
  texts <- paste("Text", seq_along(indices), ": ", eval_data$titel[indices], " \n", collapse = "", sep = "")
  prompt <- paste(prompt, texts, "Answers:", collapse = "")
  messages <- list(list(role = "user", content = prompt))
  body[["messages"]] <- messages
  response <- POST(url = base_url,
                   add_headers(.headers = headers),
                   body = body, encode = "json")
  completion <- response %>%
    content(as = "text", encoding = "UTF-8") %>%
    fromJSON(flatten = TRUE)
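  # With flatten = TRUE, choices is a data frame with a message.content column
  reply <- completion$choices$message.content[1]
  # str_extract_all() needs a pattern; match the three allowed labels
  labels <- unlist(str_extract_all(reply, "positive|negative|neutral"))
  # Assign only when one label per row came back; otherwise the rows stay NA
  if (length(labels) == length(indices)) eval_data$gpt_label[indices] <- labels
  token_usage <- token_usage + completion$usage$total_tokens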
}
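For the label-extraction step at the end of the loop, here is a minimal sketch of a more defensive version. It is only an illustration: `parse_labels()` is a hypothetical helper name, the reply strings are made up, and it uses base R regex functions instead of stringr so that it runs on its own. The idea is that the result always has exactly one entry per row in the batch, so the assignment into `gpt_label` cannot fail on a length mismatch.

```r
# Hypothetical helper (not part of the loop above): extract sentiment labels
# from a reply string and always return a character vector of length n.
parse_labels <- function(reply, n) {
  # No usable reply (e.g. the request failed): return NAs for the whole batch
  if (is.null(reply) || length(reply) == 0 || is.na(reply[1])) {
    return(rep(NA_character_, n))
  }
  text <- tolower(reply[1])
  found <- regmatches(text, gregexpr("positive|negative|neutral", text))[[1]]
  if (length(found) != n) {
    # Wrong number of labels: return NAs so the batch can be inspected or retried
    return(rep(NA_character_, n))
  }
  found
}

# Made-up replies, only to show the behaviour
two_ok  <- parse_labels("Answer[1]: Negative\nAnswer[2]: neutral", 2)
too_few <- parse_labels("negative", 2)
```

Inside the loop this would be used as `eval_data$gpt_label[indices] <- parse_labels(reply, length(indices))`, assuming `reply` holds the content of the first choice.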
I’m hoping someone can help clear up what the issues are, so I can get each row in my dataset categorised as either positive, negative or neutral.
Best regards,
Nina