This week I finished the book Unmasking AI by Joy Buolamwini, a look into the issues that can arise when AI systems are trained and deployed without thinking about their wider implications and without taking a moment to consider how the data was gathered, who is represented and who isn't.
It took me a good third of the book to really get into it, but then I really enjoyed reading it and can recommend it to anybody who is building AI systems that could really impact somebody's life.
News
WWDC, Apple's big developer conference, happened this week. Of course there was a bunch of news around AI, so I picked a summary article that goes into more detail:
The Foundation Models framework to access on-device models, Xcode gets AI enhancements, and a lot more from Apple.
Magistral is the first reasoning model released by Mistral and the small version (24B parameters) has been released under the Apache 2.0 license.
V-JEPA 2 is a world model that has been released by Meta under the MIT license. Along with that we also saw the release of additional physical reasoning benchmarks.
In addition to releasing V-JEPA 2, Meta also invested over $14 billion in Scale AI, with reports stating that its CEO will be joining Meta's new Superintelligence Lab. Also of note is the new o3-pro release from OpenAI, along with an 80% cost reduction for their o3 model.
Articles
The week was characterized by some fascinating papers:
Frontier LLMs like GPT-4 and Gemini-2.5 can often detect when they're being evaluated and may behave differently in those scenarios. This compromises the validity of AI safety and performance benchmarks. As models get smarter, eval-awareness could lead to deceptive behavior, making it critical to develop better tools to detect and mitigate this awareness.
Internal Coherence Maximization (ICM) lets language models teach themselves by labeling data in ways that are internally consistent and predictable.
New attack vectors against MCP servers are being discovered; these largely revolve around injecting into tool descriptions to exfiltrate credentials and keys or to manipulate how tools behave and are selected.
Deep Dive
Last week we took a look at the output from Gemini and gained an understanding of what we see there. The week before that we took a look at the system prompt that we added to our request. This time we are going to look at additional options that we can add to our request to change the behavior of the model. As always, this is also available as a YouTube video here.
Here is the JSON that we are going to send as our request to the model; afterwards we will walk through all of these options:
{
  "system_instruction": {
    "parts": [
      {
        "text": "system_prompt"
      }
    ]
  },
  "contents": [
    {
      "parts": [
        {
          "text": "prompt"
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 1,
    "maxOutputTokens": 800,
    "topP": 0.8,
    "topK": 10,
    "candidateCount": 1
  }
}
As you might expect, we are going to skip over the system_instruction and contents sections as we covered these two in previous editions. Instead we will focus on generationConfig and what its options are.
Temperature controls the randomness of the output. It ranges from 0 to 2; you can think of 0 as producing the output with the least randomness and 2 with the most.
Max Output Tokens limits the maximum length of the model's response.
Top P sets the maximum cumulative probability of tokens to consider when sampling: tokens are taken from most to least probable until their combined probability reaches the threshold set here.
Top K sets the maximum number of tokens to consider when sampling, which puts a much more restrictive cap on how many tokens are even in the running.
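To make it a bit more concrete how Temperature, Top K and Top P interact, here is a toy Python sketch of a single sampling step. This is purely illustrative and not how Gemini implements sampling internally; the function name and the example logits are made up.

```python
import math
import random

# Toy illustration of how temperature, topK and topP shape the distribution
# that one next token is drawn from. Not the real Gemini sampling code.
def sample_next_token(logits, temperature=1.0, top_k=10, top_p=0.8):
    # 1. Temperature scales the logits: lower -> sharper (less random),
    #    higher -> flatter (more random).
    scaled = {tok: logit / max(temperature, 1e-6) for tok, logit in logits.items()}

    # 2. Softmax into probabilities.
    max_logit = max(scaled.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: v / total for tok, v in exps.items()}

    # 3. Top K: keep only the K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # 4. Top P: keep tokens until their cumulative probability reaches the threshold.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 5. Renormalize what is left and draw one token.
    tokens, weights = zip(*kept)
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token({"the": 2.0, "a": 1.5, "cat": 0.5, "dog": 0.2}))
```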
Candidate Count: if you were paying very close attention last week as we destructured the output, you might have noticed that we always use ['candidates'][0], meaning we use the first response. That is because the default is to generate only one candidate, but you can increase this count and Gemini will respond with more response candidates, each containing its own full response. I'd recommend always leaving this at one, as it makes working with the results easier. If you have giant inputs it can be cheaper to use this option, but do remember that the output is the much more expensive part of LLM usage.
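As a small sketch of what handling more than one candidate could look like, assuming the response has the same shape we walked through last week (the `response` dict and the helper name below are mine, not from the linked code):

```python
# Assumes `response` is the parsed JSON reply from Gemini, with the structure
# we looked at last week. With candidateCount > 1 the "candidates" list simply
# contains more than one entry.
def extract_texts(response):
    texts = []
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        texts.append("".join(part.get("text", "") for part in parts))
    return texts

# With candidateCount = 1 this is equivalent to the ['candidates'][0] access
# from last week, just wrapped in a list.
```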
These are the most important generation options from my perspective and experience. Other LLMs do not support the full list or might name these options slightly differently, so I also tried to focus on some of the more universal ones. If you want to see all of the options that are available with Gemini, check out this documentation section from Google.
For both the Python and SAS code we are going to add these new options to our inputs. In the SAS code we move the JSON into a filename, which has the advantage that we can more easily use macro variables to drive the input to the model.
As usual, I have provided SAS and Python code to walk through this.
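If you want a rough idea of how the Python side fits together, here is a minimal sketch of sending the request from above with the requests library. The model name, the API key handling and the exact endpoint are assumptions on my part; the linked code is the authoritative version.

```python
import os
import requests

# Minimal sketch: send the request JSON from above to the Gemini REST API.
# Endpoint, model name and API key handling are assumptions; adjust to match
# the code linked in this post.
API_KEY = os.environ["GEMINI_API_KEY"]  # hypothetical environment variable
MODEL = "gemini-2.0-flash"              # example model name
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

payload = {
    "system_instruction": {"parts": [{"text": "system_prompt"}]},
    "contents": [{"parts": [{"text": "prompt"}]}],
    "generationConfig": {
        "temperature": 1,
        "maxOutputTokens": 800,
        "topP": 0.8,
        "topK": 10,
        "candidateCount": 1,
    },
}

response = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["candidates"][0]["content"]["parts"][0]["text"])
```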
The output remains unchanged, so it isn't repeated here. Next week we will focus on making our code more reusable and enabling faster iteration in the future.