What a week it was, after the release of an MIT report that claims that 95% of corporations do not see any return on their AI investments. This and other news, like the comment by Sam Altman on an AI bubble, led to a stock market downturn of around $1 trillion.
In addition, we saw the news out of Meta that they have a hiring freeze on AI talent after their massive spending spree. To me, none of this news is particularly concerning, and I’m still a big believer in the current AI trend, because this news follows the normal Gartner Hype Cycle. A trough of disillusionment with the technology had to come. And the reality is that it takes more to make an investment successful than just giving every employee access to an LLM.
With that, let us dive into News & Articles, and in the Deep Dive lay the foundation for our next adventure.
News
DeepSeek releases V3.1, an updated model version that combines thinking and non-thinking modes, together with a focus on agentic tool use.
Google rolls out AI Mode in their Search app. This new mode makes use of more agentic behavior and also brings collaboration features to the table.
Chroma introduces Chroma Cloud, an on-demand, scalable version of their popular Vector DB. We might take advantage of this.
I also came across LL3M, a model that helps to turn prompts into 3D objects inside of Blender. The weights have not been released yet, but as a 3D-printing enthusiast this is on my watchlist.
Articles
Almost everybody has heard about MCP at this point, but have you heard about UTCP? The Universal Tool Calling Protocol is an alternative to MCP that describes how to call existing tools rather than proxying those calls through a new server. After discovery, the agent speaks directly to the tool’s native endpoint.
Apple developed a new foundation model called Wearable health Behavior Model (WBM) that uses high-level behavioral data (like steps and exercise) from wearables to make health predictions. Using data from over 160,000 people, the model proved to be highly effective on its own and even better when combined with a model that uses raw sensor data. This shows that analyzing a person's daily activities can be as important as tracking their raw physiological signals for predicting health states like sleep, injury, and pregnancy.
Avengers-Pro is a system that acts like a smart traffic controller for LLMs. It analyzes each user request and sends it to the best-fit model from a group of different LLMs to balance performance and cost. The system can either provide a 7% boost in accuracy over the best single model at a similar cost, or it can match that model's performance while cutting the cost by 27%. The core idea is that you don't need one giant, expensive model for every task; you can get better results and save money by intelligently using a mix of models.
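To make the routing idea a bit more tangible, here is a minimal sketch of a cost-aware router. This is not the Avengers-Pro implementation: the keyword-based `classify` function, the two model profiles, their accuracy and cost numbers, and the `alpha` trade-off knob are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    accuracy: dict  # hypothetical accuracy per request category, in [0, 1]
    cost: float     # hypothetical relative cost per request, in [0, 1]


def classify(request: str) -> str:
    """Toy stand-in for request analysis: a simple keyword match."""
    text = request.lower()
    if any(word in text for word in ("prove", "integral", "equation")):
        return "math"
    if any(word in text for word in ("code", "function", "bug")):
        return "code"
    return "chat"


def route(request: str, models: list, alpha: float) -> str:
    """Pick the model with the best accuracy/cost trade-off.

    alpha close to 1.0 favors accuracy, alpha close to 0.0 favors low cost.
    """
    category = classify(request)

    def score(model: ModelProfile) -> float:
        return alpha * model.accuracy[category] - (1 - alpha) * model.cost

    return max(models, key=score).name


# Made-up profiles, for illustration only.
MODELS = [
    ModelProfile("large-model", {"math": 0.90, "code": 0.88, "chat": 0.92}, cost=1.0),
    ModelProfile("small-model", {"math": 0.55, "code": 0.70, "chat": 0.90}, cost=0.1),
]

hard_choice = route("Please prove this equation", MODELS, alpha=0.8)
easy_choice = route("Hello, how are you?", MODELS, alpha=0.8)
```

With these made-up numbers the math request goes to the large model, while the small talk goes to the small one, because the small model is almost as good at chat and far cheaper.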
I also want to share a tool that I have started to use more. Often I find myself wanting process diagrams and then ask LLMs to write Mermaid code for me, but this week I came across D2, and I have to say I like its look and feel, and the ability to animate diagrams is super powerful.
Deep Dive
Last week we wrapped up our first steps into building things with LLMs by creating our own Q/A benchmark + evaluations. This week we will embark on our next adventure where we will be building a chat bot to help users discover new features in SAS Viya.
This week we won’t be writing code but rather describe at a high level the different components of this tool. So let us talk about what we will need:
Data, we will need a list of the new features of SAS Viya, one that we can also refresh as updates are released every month.
VectorDB, as the number of features will quickly add up we do not want to add them all to our prompt all of the time - though we will do some comparison here.
Embedding model & chunking strategy, in order to fill our VectorDB we will need to make use of an embedding model to turn our text data into vectors, and we will also have to think about whether it makes sense to split features into different chunks.
LLM & Prompt/Context Engineering, this one is obvious but it has to be said.
A chat interface, we don’t want to be spending a lot of time here so we will be using something that comes pre-built.
Evaluations, since we are building an application that should serve real users we want it to be useful to them and for that we need to turn this vibe of usefulness into something we can measure and track.
A way to serve this to users, we need to make this application available to others so we will take a look at how to do that.
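The real code comes in the following weeks, but to give a feel for how the pieces above fit together, here is a toy sketch in plain Python. Everything in it is a placeholder: `embed` is a bag-of-words stand-in for a real embedding model, `TinyVectorStore` is a hypothetical in-memory substitute for a proper VectorDB, and the feature descriptions are made up rather than taken from SAS Viya release notes.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a VectorDB."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list:
        query_vector = embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list) -> str:
    """Put only the retrieved features into the prompt, not the full list."""
    bullets = "\n".join(f"- {entry}" for entry in context)
    return f"Answer using only these release notes:\n{bullets}\n\nQuestion: {question}"


# Made-up feature descriptions, for illustration only.
FEATURES = [
    "Feature A: additional chart type for time series dashboards",
    "Feature B: faster data import from cloud storage",
    "Feature C: improved access controls for shared reports",
]

store = TinyVectorStore()
for feature in FEATURES:
    store.add(feature)

question = "What is new for data import?"
top_hits = store.search(question, k=1)
prompt = build_prompt(question, top_hits)
```

The resulting `prompt` is what would be sent to the LLM. The chat interface, the evaluations and the serving layer would wrap around this core loop.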
Below you can find a diagram of how this application will roughly be set up, drawn of course using D2.
I hope you are looking forward to next week, when we will start building this application.