The wizrd package exists to test the hypothesis that Large Language Models (LLMs) can be programmed as functions and integrated with data science tools and workflows implemented in any programming language. To accomplish this, wizrd defines a grammar and implements a fluent API for programming with LLMs.
The examples below use `pretty_rmd()`, a small utility for pretty-printing output in Rmd block quotes.
To start analyzing data with an agent:
library(wizrd)
agent <- llamafile_llama()
predict(agent, "Describe the mtcars dataset") |> pretty_rmd()
The mtcars dataset is a built-in dataset in R, a popular programming language for statistical computing and graphics. It was created by John Fox and Alan S. Blume in 1973 and is often used as a benchmark dataset in machine learning and data analysis.
Here’s an overview of the mtcars dataset:
Description: The mtcars dataset contains information about 32 cars, including their performance characteristics, engine specifications, and fuel efficiency. The dataset is divided into two main categories:
1. Car characteristics: This includes variables such as:
   * mpg: miles per gallon (fuel efficiency)
   * cyl: number of cylinders in the engine
   * disp: engine displacement (in cubic inches)
   * hp: horsepower of the engine
   * drat: gear ratio
   * wt: weight of the car (in thousands of pounds)
   * qsec: quarter mile time
   * vs: vehicle type (0 = automatic, 1 = manual)
   * am: transmission type (0 = automatic, 1 = manual)
   * gear: number of gears in the transmission
   * carb: number of carburetors
2. Car performance: This includes variables such as:
   * vs: vehicle type (0 = automatic, 1 = manual)
   * am: transmission type (0 = automatic, 1 = manual)
   * gear: number of gears in the transmission
   * carb: number of carburetors

Data structure: The mtcars dataset is a data frame with 32 rows (one row per car) and 11 columns (variables).
Example: Here’s a sample of the first few rows of the mtcars dataset:
r mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108 3.85 2.320 18.61 1 1 4 1
I hope this helps! Let me know if you have any questions or need further clarification.<|eot_id|>
The call to `llamafile_llama()` will download, cache, and run the self-contained, cross-platform llamafile binary for the Llama 3.2 3B model. Behind the scenes, it starts a local HTTP server (based on llama.cpp) through which it communicates with the R process. For more general use of local LLMs, we recommend installing Ollama and using `ollama_agent()` to pull (if necessary) and run agents with Ollama. The `llama()` function is a convenience for running Llama 3.2 3B with Ollama. Convenience functions exist for some other common models.
The `predict()` function is the most convenient way to execute single exchanges with an agent. To maintain a context over multiple exchanges, use the `chat()` function:
#> <Chat>: 5 messages
#> - - - - - - - - - - - - - - - Latest messages - - - - - - - - - - - - - - -
#> ┌───────────────────────────────────┐
#> │ │
#> │ How can I manipulate it in R? │
#> │ │
#> └───────────────────────────────────┘
#>
#> You can manipulate the mtcars dataset in R using various functions and
#> techniques. Here are a few examples:
#>
#> **1. View the first few rows of the dataset**
#> ```r
#> head(mtcars)
#> ```
#> This will display the first few rows of the dataset.
#>
#> **2. Get the summary statistics of the dataset**
#> ```r
#> summary(mtcars)
#> ```
#> This will display the summary statistics of the dataset, including the mean,
#> median, and standard deviation of each variable.
#>
#> **3. Filter the dataset to include only cars with a specific transmission
#> type**
#> ```r
#> mtcars[mtcars$am == 1, ]
#> ```
#> This will return a subset of the dataset that includes only cars with a
#> manual transmission.
#>
#> **4. Group the dataset by transmission type and calculate the mean mpg**
#> ```r
#> library(dplyr)
#> mtcars %>%
#> group_by(am) %>%
#> summarise(mean_mpg = mean(mpg))
#> ```
#> This will group the dataset by transmission type (am) and calculate the mean
#> mpg for each group.
#>
#> **5. Plot a scatterplot of mpg vs. wt**
#> ```r
#> plot(mpg ~ wt, data = mtcars)
#> ```
#> This will create a scatterplot of mpg vs. wt.
#>
#> **6. Calculate the correlation between mpg and wt**
#> ```r
#> cor(mpg, wt, mtcars)
#> ```
#> This will calculate the correlation between mpg and wt.
#>
#> **7. Create a new column for the horsepower**
#> ```r
#> mtcars$hp <- mtcars$hp
#> ```
#> This will create a new column called hp in the dataset.
#>
#> **8. Sort the dataset by mpg in descending order**
#> ```r
#> mtcars[order(mtcars$mpg, decreasing = TRUE), ]
#> ```
#> This will sort the dataset by mpg in descending order.
#>
#> These are just a few examples of what you can do with the mtcars dataset in
#> R. I hope this helps! Let me know if you have any questions or need further
#> assistance.<|eot_id|>
A useful feature of the returned Chat object is that it prints the context in a readable format.
To extract the last output from the Chat object (stored here as `ctx`), call `last_output()`:
last_output(ctx) |> pretty_rmd()
You can manipulate the mtcars dataset in R using various functions and techniques. Here are a few examples:

**1. View the first few rows of the dataset**

    head(mtcars)

This will display the first few rows of the dataset.

**2. Get the summary statistics of the dataset**

    summary(mtcars)

This will display the summary statistics of the dataset, including the mean, median, and standard deviation of each variable.

**3. Filter the dataset to include only cars with a specific transmission type**

    mtcars[mtcars$am == 1, ]

This will return a subset of the dataset that includes only cars with a manual transmission.

**4. Group the dataset by transmission type and calculate the mean mpg**

    library(dplyr)
    mtcars %>%
      group_by(am) %>%
      summarise(mean_mpg = mean(mpg))

This will group the dataset by transmission type (am) and calculate the mean mpg for each group.

**5. Plot a scatterplot of mpg vs. wt**

    plot(mpg ~ wt, data = mtcars)

This will create a scatterplot of mpg vs. wt.

**6. Calculate the correlation between mpg and wt**

    cor(mpg, wt, mtcars)

This will calculate the correlation between mpg and wt.

**7. Create a new column for the horsepower**

    mtcars$hp <- mtcars$hp

This will create a new column called hp in the dataset.

**8. Sort the dataset by mpg in descending order**

    mtcars[order(mtcars$mpg, decreasing = TRUE), ]

This will sort the dataset by mpg in descending order.

These are just a few examples of what you can do with the mtcars dataset in R. I hope this helps! Let me know if you have any questions or need further assistance.<|eot_id|>

The returned value can then be used in further computations.
As an exercise, try to create a readline-based chatbot interface. See
wizrd:::readline_chat
for one answer.
There are three requirements for LLMs to act as functions:
1. Accept a list of input parameters, each of arbitrary type,
2. Implement a series of logical operations, potentially delegating to R functions, including those based on an LLM, and
3. Return an R object of a specified type and structure.
We will demonstrate how the wizrd package meets each of these requirements in turn. We will use gpt-4o-mini for this example, in order to support constrained output. Set the `OPENAI_API_KEY` environment variable to your OpenAI key before running this.
agent <- openai_agent("gpt-4o-mini", temperature = 0) |>
    instruct("Answer questions about this dataset:", mtcars)
For reproducibility reasons, it is best to explicitly specify the underlying model, as above, because the default will change as new models are released.
The `instruct()` function configures the agent with a system prompt, instructing the agent and providing basic context. In this case, we insert the mtcars dataset verbatim as context, which the agent can reference in its responses. The agent will now be able to answer questions about the dataset’s characteristics.
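To get a feel for what inserting a dataset verbatim means, the sketch below renders `mtcars` as plain text and prepends the instruction. It is only an illustration in base R: how `instruct()` actually serializes its arguments is not shown here, and `build_system_prompt` is a hypothetical helper.

```r
# Hypothetical helper: combine an instruction with a printed data frame.
# This is NOT wizrd's implementation, only an illustration of the idea.
build_system_prompt <- function(instruction, data) {
    # capture.output() records what print() would show on the console
    rendered <- paste(capture.output(print(data)), collapse = "\n")
    paste(instruction, rendered, sep = "\n\n")
}

prompt <- build_system_prompt("Answer questions about this dataset:", mtcars)

# Show the first few lines of the resulting text
cat(head(strsplit(prompt, "\n")[[1]], 4), sep = "\n")

# Every row ends up in the prompt, so its length grows with the dataset
nchar(prompt)
```

A dataset as small as mtcars fits comfortably in a prompt; for larger data, passing a summary or a tool is likely the better choice.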
Let’s extend the above example so that it can analyze the relationship between any two variables in the mtcars dataset.
parameterized_agent <- agent |>
    prompt_as("Analyze the relationship between {var1} and {var2}.")
parameterized_agent |> chat(list(var1 = "mpg", var2 = "wt"))
#> <Chat>: 3 messages
#> - - - - - - - - - - - - - - - Latest messages - - - - - - - - - - - - - - -
#> ┌──────────────────────────────────────────┐
#> │ │
#> │ Analyze the relationship between mpg │
#> │ and wt. │
#> │ │
#> └──────────────────────────────────────────┘
#>
#> To analyze the relationship between miles per gallon (mpg) and weight (wt) in
#> the provided dataset, we can consider the following points:
#>
#> 1. **General Trend**: Typically, in automotive datasets, there is an inverse
#> relationship between mpg and weight. As the weight of a vehicle increases,
#> the fuel efficiency (mpg) tends to decrease. Heavier vehicles require more
#> energy to move, which can lead to lower fuel efficiency.
#>
#> 2. **Visual Representation**: A scatter plot of mpg against wt would help
#> visualize this relationship. In such a plot, we would expect to see a
#> downward trend, indicating that as weight increases, mpg decreases.
#>
#> 3. **Statistical Analysis**: To quantify the relationship, we could calculate
#> the correlation coefficient between mpg and wt. A negative correlation
#> coefficient would suggest that as weight increases, mpg decreases.
#>
#> 4. **Regression Analysis**: Performing a linear regression analysis could
#> provide a more detailed understanding of the relationship. The regression
#> equation would help predict mpg based on weight and provide insights into the
#> strength of the relationship.
#>
#> 5. **Outliers**: It is also important to check for outliers in the dataset
#> that may skew the results. For instance, very heavy vehicles with low mpg
#> could significantly affect the overall trend.
#>
#> 6. **Categorical Factors**: Other factors such as the number of cylinders
#> (cyl), horsepower (hp), and type of transmission (am) could also influence
#> mpg. It may be useful to control for these variables in a multivariate
#> analysis to isolate the effect of weight on mpg.
#>
#> In summary, while we expect to see a negative relationship between mpg and
#> wt, further analysis through visualization, correlation, and regression would
#> provide a clearer picture of this relationship in the dataset.
By calling `prompt_as()`, we parameterized the agent using a glue template to accept parameters named `var1` and `var2`. By passing `"mpg"` and `"wt"` as the variables, we get an analysis of the relationship between fuel efficiency and weight.
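The template substitution can be tried on its own with the glue package. This sketch only shows how a glue template fills named placeholders from a list; whether `prompt_as()` calls glue in exactly this way is an assumption.

```r
library(glue)

template <- "Analyze the relationship between {var1} and {var2}."
params <- list(var1 = "mpg", var2 = "wt")

# glue_data() looks up the names inside {} in the supplied list
prompt <- glue_data(params, template)
print(prompt)
#> Analyze the relationship between mpg and wt.
```

The result matches the user message shown in the chat transcript above.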
We can define the logic of the LLM function using natural language instructions, inserted into the system prompt, using the `instruct()` function:
instructed_agent <- parameterized_agent |>
    instruct("Answer questions about this dataset:", mtcars,
             "When comparing variables, calculate their correlation.")
chat(instructed_agent, list(var1 = "mpg", var2 = "wt"))
#> <Chat>: 3 messages
#> - - - - - - - - - - - - - - - Latest messages - - - - - - - - - - - - - - -
#> ┌──────────────────────────────────────────┐
#> │ │
#> │ Analyze the relationship between mpg │
#> │ and wt. │
#> │ │
#> └──────────────────────────────────────────┘
#>
#> To analyze the relationship between miles per gallon (mpg) and weight (wt),
#> we can calculate the correlation coefficient between these two variables. The
#> correlation coefficient (often denoted as "r") measures the strength and
#> direction of a linear relationship between two variables.
#>
#> 1. **Data Extraction**: We will extract the mpg and wt values from the
#> dataset.
#>
#> 2. **Calculation of Correlation**: The correlation coefficient can be
#> calculated using the formula:
#> \[
#> r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum
#> y^2 - (\sum y)^2]}}
#> \]
#> where:
#> - \( n \) is the number of data points,
#> - \( x \) represents mpg values,
#> - \( y \) represents wt values.
#>
#> 3. **Interpretation**: The value of r ranges from -1 to 1:
#> - r = 1 indicates a perfect positive correlation,
#> - r = -1 indicates a perfect negative correlation,
#> - r = 0 indicates no correlation.
#>
#> Let's calculate the correlation coefficient for mpg and wt using the provided
#> dataset.
#>
#> ### Data Points
#> Here are the mpg and wt values extracted from the dataset:
#>
#> | mpg | wt |
#> |-------|-------|
#> | 21.0 | 2.62 |
#> | 21.0 | 2.875 |
#> | 22.8 | 2.32 |
#> | 21.4 | 3.215 |
#> | 18.7 | 3.44 |
#> | 18.1 | 3.46 |
#> | 14.3 | 3.57 |
#> | 24.4 | 3.19 |
#> | 22.8 | 3.15 |
#> | 19.2 | 3.44 |
#> | 17.8 | 3.44 |
#> | 16.4 | 4.07 |
#> | 17.3 | 3.73 |
#> | 15.2 | 3.78 |
#> | 10.4 | 5.25 |
#> | 10.4 | 5.424 |
#> | 14.7 | 5.345 |
#> | 32.4 | 2.2 |
#> | 30.4 | 1.615 |
#> | 33.9 | 1.835 |
#> | 21.5 | 2.465 |
#> | 15.5 | 3.52 |
#> | 15.2 | 3.435 |
#> | 13.3 | 3.84 |
#> | 19.2 | 3.845 |
#> | 27.3 | 1.935 |
#> | 26.0 | 2.14 |
#> | 30.4 | 1.513 |
#> | 15.8 | 3.17 |
#> | 19.7 | 2.77 |
#> | 15.0 | 3.57 |
#> | 21.4 | 2.78 |
#>
#> ### Calculation
#> Using statistical software or a programming language like Python or R, we can
#> compute the correlation coefficient.
#>
#> For example, in Python, you could use:
#> ```python
#> import pandas as pd
#>
#> data = {
#> 'mpg': [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4,
#> 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2,
#> 27.3, 26, 30.4, 15.8, 19.7, 15, 21.4],
#> 'wt': [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44,
#> 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435,
#> 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78]
#> }
#>
#> df = pd.DataFrame(data)
#> correlation = df['mpg'].corr(df['wt'])
#> print(correlation)
#> ```
#>
#> ### Result
#> After performing the calculation, you would find that the correlation
#> coefficient between mpg and wt is approximately -0.87.
#>
#> ### Conclusion
#> This indicates a strong negative correlation between mpg and weight. As the
#> weight of the vehicle increases, the miles per gallon (fuel efficiency) tends
#> to decrease. This is a common finding in automotive data, as heavier vehicles
#> generally require more fuel to operate.
The agent now provides a more structured analysis, including the correlation formula, its interpretation, and code for computing it. However, it is not able to carry out the correlation calculation itself; it only states the approximate value one would find.
To solve this, we can provide a tool that performs the actual correlation calculation:
# Tool: Pearson correlation between two mtcars columns, given by name
calculate_correlation <- function(var1, var2) {
    cor(mtcars[[var1]], mtcars[[var2]])
}
equipped_agent <- instructed_agent |> equip(calculate_correlation)
equipped_agent |> chat(list(var1 = "mpg", var2 = "wt"))
#> <Chat>: 5 messages
#> - - - - - - - - - - - - - - - Latest messages - - - - - - - - - - - - - - -
#> + call_2XNUw6iErqvV… +
#> | |
#> | [1] -0.8676594 |
#> | |
#> +--------------------+
#>
#> The correlation between miles per gallon (mpg) and weight (wt) is
#> approximately -0.87. This indicates a strong negative correlation, meaning
#> that as the weight of the vehicle increases, the miles per gallon tends to
#> decrease.
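Since a tool is an ordinary R function, it can be exercised directly before it is handed to the agent. The sketch below repeats the definition and adds a hypothetical variant, `calculate_correlation_checked`, that rejects unknown column names with a readable message. Whether the agent makes use of such error messages is an assumption.

```r
calculate_correlation <- function(var1, var2) {
    cor(mtcars[[var1]], mtcars[[var2]])
}

# Hypothetical defensive variant: fail early with a readable message
calculate_correlation_checked <- function(var1, var2) {
    unknown <- setdiff(c(var1, var2), names(mtcars))
    if (length(unknown) > 0) {
        stop("Unknown variable(s): ", paste(unknown, collapse = ", "))
    }
    cor(mtcars[[var1]], mtcars[[var2]])
}

direct <- calculate_correlation("mpg", "wt")
direct
#> [1] -0.8676594

# A misspelled column name produces an explicit error message
result <- tryCatch(
    calculate_correlation_checked("mpg", "weight"),
    error = function(e) conditionMessage(e)
)
result
#> [1] "Unknown variable(s): weight"
```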
In order to incorporate the output into a larger program, it is often necessary to convert the output to a more standardized and computable object. We can use the `output_as()` function to structure our analysis results:
# Return the correlation as a single number
equipped_agent |>
    output_as(S7::class_numeric) |>
    predict(list(var1 = "mpg", var2 = "wt"))
#> [1] -0.8676594
# Return a filtered subset of mtcars, which serves as the prototype
filtered_agent <- agent |>
    output_as(mtcars)
filtered_agent |>
    predict("Cars with mpg > 20")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 5 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 6 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 7 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 8 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> 9 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Return a summary data.frame
summary_agent <- agent |>
    output_as(data.frame(
        cyl = integer(),
        avg_mpg = numeric(),
        avg_hp = numeric()
    ))
summary_agent |>
    predict("Average mpg and hp by number of cylinders.")
#> cyl avg_mpg avg_hp
#> 1 4 24.1 91.5
#> 2 6 18.9 130.5
#> 3 8 15.1 209.5
# Using class_data.frame for more general output
agent |>
    output_as(S7::class_data.frame) |>
    predict("Top 5 most fuel efficient cars including mpg, hp, and wt.")
#> mpg hp wt
#> 1 33.9 65 1.835
#> 2 32.4 66 2.200
#> 3 30.4 52 1.615
#> 4 30.4 113 1.513
#> 5 27.3 66 1.935
In the above examples, we demonstrate different ways to structure the output:
1. Using an existing data.frame (`mtcars`) as a template
2. Using a data.frame stub with specific columns
3. Using `class_data.frame` for more general output
The agent will return properly structured data.frames that can be used directly in further analysis or visualization.
Note that `output_as()` supports any S7 class as a constraint, not just data.frames. This allows for type-safe conversion of agent outputs into any R object structure defined using S7.
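Because the returned objects are ordinary data.frames, they can be checked with ordinary R code. The sketch below is independent of wizrd: `matches_prototype` is a hypothetical helper that compares column names and types against a prototype, `candidate` is a hand-written stand-in for an agent result, and the final lines compute the mpg filter directly so that an agent's answer has a reference to be compared against.

```r
# Hypothetical helper: does `x` have the same columns and column types
# as the prototype data.frame?
matches_prototype <- function(x, prototype) {
    is.data.frame(x) &&
        identical(names(x), names(prototype)) &&
        identical(unname(vapply(x, typeof, character(1))),
                  unname(vapply(prototype, typeof, character(1))))
}

prototype <- data.frame(cyl = integer(), avg_mpg = numeric(), avg_hp = numeric())

# Stand-in for an agent result with the expected structure
candidate <- data.frame(cyl = c(4L, 6L, 8L),
                        avg_mpg = c(24.1, 18.9, 15.1),
                        avg_hp = c(91.5, 130.5, 209.5))
ok <- matches_prototype(candidate, prototype)
ok
#> [1] TRUE

# A column of the wrong type is rejected
bad <- candidate
bad$cyl <- as.character(bad$cyl)
not_ok <- matches_prototype(bad, prototype)
not_ok
#> [1] FALSE

# Reference computation for "Cars with mpg > 20"
reference <- mtcars[mtcars$mpg > 20, ]
nrow(reference)
#> [1] 14
```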
Since the agent is already behaving like a function, it is relatively straightforward to convert it into an actual R function using the `convert()` generic from S7:
# Convert the filtered agent to a function
filter_cars <- S7::convert(filtered_agent, S7::class_function)
filter_cars("Cars with mpg > 30")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 3 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 4 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
The wizrd package implements a client for the Model Context Protocol (MCP), which enables constructing agents from shared ingredients, including tools, prompts, and resources.
Here’s an example of using MCP to interact with a data analysis server:
# Start the data analysis server
data_server.py <- system.file("mcp", "data_server.py",
                              package = "wizrd")
server <- start_mcp(data_server.py)
# Create an MCP session
session <- connect_mcp(server)
# List available tools
tools <- tools(session)
# Tools are just ordinary R functions
tools$get_mean(mtcars$mpg)
#> [1] "20.090625"
# Equip an agent with the MCP tools and analyze the data
mcp_agent <- agent |>
    output_as(S7::class_numeric) |>
    equip(tools)
predict(mcp_agent, "What is the mean fuel efficiency in the mtcars dataset?")
#> [1] 20.09062
MCP provides a standardized way to:
1. Discover available tools, prompts, and resources
2. Call tools with structured arguments
3. Access and use predefined prompts
4. Handle resources and templates
This makes it easier to work with different agent implementations while maintaining a consistent interface in R.
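In the transcript above, `tools$get_mean()` returns its result as a character string rather than a number. When calling such a tool directly, the value may need converting before it is used in further computation. A minimal sketch in base R, using the string from the transcript as a stand-in for a live MCP call:

```r
# Stand-in for the value returned by tools$get_mean(mtcars$mpg) above
tool_result <- "20.090625"

# Convert the text to a number before using it in arithmetic
mean_mpg <- as.numeric(tool_result)

# It agrees with the mean computed directly in R
agrees <- isTRUE(all.equal(mean_mpg, mean(mtcars$mpg)))
agrees
#> [1] TRUE
```

When the tool is called through an agent configured with `output_as(S7::class_numeric)`, as in the example above, the result already arrives as a number.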
The wizrd package implements experimental functionality for Retrieval Augmented Generation (RAG). One potentially useful application is in the querying of R manual pages.
The code below uses the `chunk()` generic to generate text chunks from the S7 man pages. Next, it creates a TextStore that indexes those chunks using the nomic text embedding model. It then configures the prompt generator to query the text store for chunks that are similar to the query. Finally, it sends the query for an example of the `S7::new_property()` function.
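The retrieval step can be pictured as a nearest-neighbour search over embedding vectors. The sketch below uses made-up three-dimensional vectors and cosine similarity in base R. The chunk names, the vectors, and the helper `top_k` are illustrative, and whether the TextStore uses cosine similarity specifically is an assumption.

```r
# Toy embeddings: one row per chunk (real embeddings have many more dimensions)
embeddings <- rbind(
    new_property = c(0.9, 0.1, 0.0),
    new_class    = c(0.6, 0.6, 0.1),
    convert      = c(0.0, 0.2, 0.9)
)

cosine_similarity <- function(m, q) {
    # Dot product of each row with the query, divided by both norms
    as.vector(m %*% q) / (sqrt(rowSums(m^2)) * sqrt(sum(q^2)))
}

# Hypothetical helper: names of the k chunks most similar to the query
top_k <- function(m, q, k = 2) {
    scores <- cosine_similarity(m, q)
    rownames(m)[order(scores, decreasing = TRUE)[seq_len(k)]]
}

query <- c(1.0, 0.2, 0.0)  # pretend embedding of "new_property example"
hits <- top_k(embeddings, query)
hits
#> [1] "new_property" "new_class"
```

The retrieved chunks would then be placed in the prompt alongside the user's query.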
The `chunk()` utility has basic support for a number of formats, including markdown derivatives, HTML and PDF.
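As a rough picture of what chunking involves, the sketch below splits a markdown string into paragraph-sized chunks at blank lines. It is not the `chunk()` implementation, whose strategy is not described here; `split_paragraphs` is a hypothetical helper.

```r
# Hypothetical helper: split text into chunks at blank lines
split_paragraphs <- function(text) {
    # A chunk boundary is a newline, optional whitespace, then another newline
    parts <- strsplit(text, "\n[[:space:]]*\n")[[1]]
    parts <- trimws(parts)
    # Drop any empty pieces left over from leading or trailing blank lines
    parts[nzchar(parts)]
}

doc <- "# Title\n\nFirst paragraph.\n\n\nSecond paragraph,\nstill the same chunk."
chunks <- split_paragraphs(doc)
length(chunks)
#> [1] 3
```

A single newline inside a paragraph does not start a new chunk, so wrapped text stays together.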
Since the output is markdown, we embed it directly in this document.
chunks <- chunk(tools::Rd_db("S7"))
store <- text_store(nomic(), chunks)
agent <- llama() |> prompt_as(rag_with(store))
cat("#### new_property example\n")
#> #### new_property example
last_message(chat(agent, "new_property example"))
#> Here's an example of using the `new_property` function in R:
#>
#> ```r
#> # Define a new property for a class called "Person"
#> person_class <- new_class("Person", properties = list(
#> name = new_property(class_character, default = "John"),
#> age = new_property(class_numeric)
#> ))
#>
#> # Create an instance of the Person class with default values
#> john <- person_class()
#>
#> # Print the initial values of the properties
#> print(john$name) # prints: John
#> print(john$age) # prints: NA
#>
#> # Update the name property and print the result
#> john$name <- "Jane"
#> print(john$name) # prints: Jane
#>
#> # Create a new instance with custom values for both properties
#> jane <- person_class(name = "Jane", age = 30)
#>
#> # Print the initial values of the properties
#> print(jane$name) # prints: Jane
#> print(jane$age) # prints: 30
#>
#> # Update the age property and print the result
#> jane$age <- 31
#> print(jane$age) # prints: 31
#>
#> # Use a dynamic property to compute on demand
#> clock_class <- new_class("Clock", properties = list(
#> now = new_property(getter = function(self) Sys.time())
#> ))
#>
#> my_clock <- clock_class()
#>
#> # Print the initial value of the now property
#> print(my_clock$now) # prints: current time
#>
#> # Wait for a second and print the updated value of the now property
#> Sys.sleep(1)
#> print(my_clock$now) # prints: new current time
#> ```
#>
#> In this example, we define two classes: `Person` with properties `name` and
#> `age`, and `Clock` with a dynamic property `now`. We create instances of
#> these classes with default values or custom values for the properties. We
#> also demonstrate how to use dynamic properties to compute on demand.