AI Model Development

Explore top LinkedIn content from expert professionals.

  • Brij kishore Pandey (Influencer)

    AI Architect | AI Engineer | Generative AI | Agentic AI | Tech, Data & AI Content Creator | 1M+ followers

    704,470 followers

    LLMs vs. SLMs — What’s the real difference?

    If you’re working with AI models, you’ve probably heard these two terms thrown around a lot lately:
    • LLMs — Large Language Models
    • SLMs — Small (or Specialized) Language Models

    But what’s the actual difference? And when should you use one over the other?

    LLMs: One model, many tasks

    LLMs are generalists. You train them once, and they can do a bit of everything:
    • Answer questions
    • Summarize content
    • Write code
    • Hold conversations
    • Generate insights

    They’re flexible, powerful, and capable of handling complex, multi-purpose tasks. But they’re also… heavy. Bigger infrastructure, higher cost, and not always the best at specific things.

    SLMs: One model, one job

    SLMs are specialists. They’re trained for a narrow, well-defined task — and they do it really well. Think:
    • Faster response times
    • Lower compute cost
    • Better accuracy on the specific task they’re built for

    They don’t try to do everything — just the one thing they’re designed for.

    So, which should you choose? That depends on what you’re building. If you need versatility, go with an LLM. If you need efficiency and precision, an SLM might be a better fit. And in many real-world scenarios, the best solution is a hybrid: use LLMs where flexibility matters, and SLMs where accuracy and speed are key (a routing sketch follows below).
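    A minimal sketch of that hybrid routing pattern, assuming two hypothetical model endpoints (call_slm, call_llm) and a hand-written task map; a production router would use a trained classifier or confidence scores instead:

    ```python
    # Hypothetical hybrid router: narrow, well-defined tasks go to the SLM,
    # open-ended requests fall through to a general-purpose LLM.
    SLM_TASKS = {"classify", "extract", "summarize_ticket"}  # tasks the SLM was tuned for

    def call_slm(prompt: str) -> str:
        # Placeholder for a fine-tuned small model (e.g. a local 7B endpoint).
        return f"[SLM] {prompt[:40]}"

    def call_llm(prompt: str) -> str:
        # Placeholder for a general-purpose hosted LLM.
        return f"[LLM] {prompt[:40]}"

    def answer(task: str, prompt: str) -> str:
        # Route to the cheap specialist when the task is in its lane;
        # otherwise use the generalist.
        return call_slm(prompt) if task in SLM_TASKS else call_llm(prompt)

    print(answer("classify", "Is this invoice overdue?"))
    print(answer("plan", "Draft a migration plan for our data warehouse."))
    ```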

  • Sarthak Rastogi

    AI engineer | Posts on agents + advanced RAG | Experienced in LLM research, ML engineering, Software Engineering

    23,272 followers

    5 steps that Amazon Finance took to improve their RAG pipeline's accuracy from 49% to 86% 📈

    - They started by fixing document chunking problems. They saw that the original fixed-size chunks were causing inaccuracies because they didn’t capture complete context. Using the QUILL Editor, they turned unstructured text into HTML and then identified logical structures based on HTML tags. Just chunking the docs differently raised the accuracy from 49% to 64%. 😦 (A structure-aware chunking sketch follows below.)

    - Next, prompt engineering. They aimed to: 1. stop hallucinations when there wasn’t relevant context, 2. support both concise and detailed answers, and 3. give citations. They also implemented chain-of-thought reasoning to improve how the LLM structured its answers. This got the accuracy to 76%.

    - Finally, they optimised their embedding models. They tested different first-party and third-party models and found that models like bge-base-en-v1.5 offered better performance on their dataset. Ultimately, they settled on Amazon Titan Embeddings G1. Better retrieval finally got them to 86% accuracy.

    These targeted improvements to the RAG pipeline all added up. Link to the article from AWS: https://lnkd.in/gFDBfhJm

    #AI #LLMs #RAG
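    The post doesn’t show Amazon’s exact chunking rules, but a minimal sketch of the idea (splitting HTML at heading tags so each chunk keeps a complete logical section) might look like this, with BeautifulSoup as an assumed stand-in for parsing the QUILL-produced HTML:

    ```python
    # Structure-aware chunking sketch: split HTML into sections at heading
    # tags so each chunk keeps its full logical context, instead of using
    # fixed-size windows that cut context mid-section.
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    def chunk_by_headings(html: str) -> list[str]:
        soup = BeautifulSoup(html, "html.parser")
        chunks, current = [], []
        for el in soup.find_all(["h1", "h2", "h3", "p", "li", "table"]):
            if el.name in ("h1", "h2", "h3") and current:
                chunks.append(" ".join(current))  # close the previous section
                current = []
            current.append(el.get_text(" ", strip=True))
        if current:
            chunks.append(" ".join(current))
        return chunks

    html = "<h2>Refunds</h2><p>Refunds post in 3 days.</p><h2>Fees</h2><p>Fees are monthly.</p>"
    print(chunk_by_headings(html))  # one chunk per logical section
    ```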

  • Aishwarya Srinivasan (Influencer)

    609,262 followers

    If you are an AI engineer wondering how to choose the right foundational model, this one is for you 👇

    Whether you’re building an internal AI assistant, a document summarization tool, or real-time analytics workflows, the model you pick will shape performance, cost, governance, and trust. Here’s a distilled framework that’s been helping me and many teams navigate this:

    1. Start with your use case, then work backwards. Craft your ideal prompt + answer combo first. Reverse-engineer what knowledge and behavior is needed. Ask:
    → What are the real prompts my team will use?
    → Are these retrieval-heavy, multilingual, highly specific, or fast-response tasks?
    → Can I break down the use case into reusable prompt patterns?

    2. Right-size the model. Bigger isn’t always better. A 70B parameter model may sound tempting, but an 8B specialized one could deliver comparable output, faster and cheaper, when paired with:
    → Prompt tuning
    → RAG (Retrieval-Augmented Generation)
    → Instruction tuning via InstructLab
    Try the best first, but always test if a smaller one can be tuned to reach the same quality.

    3. Evaluate performance across three dimensions (a small harness sketch follows below):
    → Accuracy: Use the right metric (BLEU, ROUGE, perplexity).
    → Reliability: Look for transparency into training data, consistency across inputs, and reduced hallucinations.
    → Speed: Does your use case need instant answers (chatbots, fraud detection) or precise outputs (financial forecasts)?

    4. Factor in governance and risk. Prioritize models that:
    → Offer training traceability and explainability
    → Align with your organization’s risk posture
    → Allow you to monitor for privacy, bias, and toxicity
    Responsible deployment begins with responsible selection.

    5. Balance performance, deployment, and ROI. Think about:
    → Total cost of ownership (TCO)
    → Where and how you’ll deploy (on-prem, hybrid, or cloud)
    → Whether smaller models reduce GPU costs while meeting performance
    Also, keep your ESG goals in mind: lighter models can be greener too.

    6. The model selection process isn’t linear, it’s cyclical. Revisit the decision as new models emerge, use cases evolve, or infra constraints shift. Governance isn’t a checklist, it’s a continuous layer.

    My 2 cents 🫰 You don’t need one perfect model. You need the right mix of models, tuned, tested, and aligned with your org’s AI maturity and business priorities.

    ------------
    If you found this insightful, share it with your network ♻️ Follow me (Aishwarya Srinivasan) for more AI insights and educational content ❤️
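    For step 3, a toy harness sketch that scores candidate models on accuracy and latency over a small eval set; the model callables are hypothetical stubs, and exact match stands in for task-appropriate metrics like BLEU or ROUGE:

    ```python
    # Compare candidate models on a shared eval set, tracking accuracy
    # (exact match here) and average latency per request.
    import time

    def eval_model(generate, eval_set):
        correct, latencies = 0, []
        for prompt, expected in eval_set:
            t0 = time.perf_counter()
            answer = generate(prompt)
            latencies.append(time.perf_counter() - t0)
            correct += int(answer.strip() == expected)
        return correct / len(eval_set), sum(latencies) / len(latencies)

    eval_set = [("2+2=", "4"), ("Capital of France?", "Paris")]
    candidates = {"slm-8b": lambda p: "4", "llm-70b": lambda p: "Paris"}  # toy stubs

    for name, fn in candidates.items():
        acc, lat = eval_model(fn, eval_set)
        print(f"{name}: accuracy={acc:.0%}, avg latency={lat * 1000:.2f} ms")
    ```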

  • Matt Wood (Influencer)

    CTIO, PwC

    76,443 followers

    LLM field notes: Where multiple models are stronger than the sum of their parts, an AI diaspora is emerging as a strategic strength...

    Combining the strengths of different LLMs in a thoughtful, combined architecture can enable capabilities beyond what any individual model can achieve alone, and gives more flexibility today (when new models are arriving virtually every day) and in the long term. Let's dive in.

    🌳 By combining multiple, specialized LLMs, the overall system is greater than the sum of its parts. More advanced functions can emerge from the combination and orchestration of customized models.

    🌻 Mixing and matching different LLMs allows creating solutions tailored to specific goals. The optimal ensemble can be designed for each use case; ready access to multiple models will make it easier to adopt and adapt to new use cases more quickly.

    🍄 With multiple redundant models, the system is not reliant on any one component. Failure of one LLM can be compensated for by others (see the failover sketch below).

    🌴 Different models have varying computational demands. A combined diasporic system makes it easier to allocate resources strategically and find the right price/performance balance per use case.

    🌵 As better models emerge, the diaspora can be updated by swapping out components without needing to retrain from scratch. This is going to be the new normal for the next few years as whole new models arrive.

    🎋 Accelerated development - Building on existing LLMs as modular components speeds up the development process vs monolithic architectures.

    🫛 Model diversity - Having an ecosystem of models creates more opportunities for innovation from many sources, not just a single provider.

    🌟 Perhaps the biggest benefit is scale - of operation and capability. Each model can focus on its specific capability rather than trying to do everything. This plays to the models' strengths. Models don't get bogged down trying to perform tasks outside their specialty, which avoids inefficient use of compute resources. The workload can be divided across models based on their capabilities and capacity for parallel processing.

    It takes a bit to build this way (planning and executing on multiple models, orchestration, model management, evaluation, etc.), but that upfront cost will pay off time and again, for every incremental capability you are able to add quickly. Plan accordingly.

    #genai #ai #aws #artificialintelligence
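    A minimal sketch of the redundancy point, with hypothetical model stubs: try models in priority order and fail over when one errors out:

    ```python
    # Redundancy sketch for a multi-model system: the first healthy model
    # answers; failures cascade to the next model instead of taking the
    # whole system down.
    def primary_model(prompt: str) -> str:
        raise TimeoutError("provider outage")  # simulate a failed component

    def backup_model(prompt: str) -> str:
        return f"[backup] {prompt}"

    def generate_with_failover(prompt: str, models) -> str:
        errors = []
        for model in models:
            try:
                return model(prompt)   # first healthy model wins
            except Exception as exc:
                errors.append(exc)     # record and fall through to the next
        raise RuntimeError(f"all models failed: {errors}")

    print(generate_with_failover("Summarize Q3 results", [primary_model, backup_model]))
    ```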

  • Sahar Mor

    I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor

    41,251 followers

    Researchers from Oxford University just achieved a 14% performance boost in mathematical reasoning by making LLMs work together like specialists in a company.

    In their new MALT (Multi-Agent LLM Training) paper, they introduced a novel approach where three specialized LLMs - a generator, verifier, and refinement model - collaborate to solve complex problems, similar to how a programmer, tester, and supervisor work together.

    The breakthrough lies in their training method:
    (1) Tree-based exploration - generating thousands of reasoning trajectories by having models interact
    (2) Credit attribution - identifying which model is responsible for successes or failures
    (3) Specialized training - using both correct and incorrect examples to train each model for its specific role

    Using this approach on 8B parameter models, MALT achieved relative improvements of 14% on the MATH dataset, 9% on CommonsenseQA, and 7% on GSM8K. This represents a significant step toward more efficient and capable AI systems, showing that well-coordinated smaller models can match the performance of much larger ones. (A sketch of the inference-time loop follows below.)

    Paper https://lnkd.in/g6ag9rP4

    —
    Join thousands of world-class researchers and engineers from Google, Stanford, OpenAI, and Meta staying ahead on AI http://aitidbits.ai
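    MALT’s contribution is the training method itself, but the three-role division of labor at inference time can be sketched with hypothetical model stubs like this:

    ```python
    # Generator → verifier → refiner loop (inference side only; the paper's
    # novelty is in how the three role-specialized models are trained).
    def generator(problem: str) -> str:
        return f"draft solution to: {problem}"          # hypothetical stub

    def verifier(problem: str, solution: str) -> bool:
        return "checked" in solution                    # hypothetical stub

    def refiner(problem: str, solution: str) -> str:
        return solution + " (checked and corrected)"    # hypothetical stub

    def solve(problem: str, max_rounds: int = 3) -> str:
        solution = generator(problem)
        for _ in range(max_rounds):
            if verifier(problem, solution):
                return solution                         # verifier is satisfied
            solution = refiner(problem, solution)       # otherwise refine and retry
        return solution

    print(solve("What is 17 * 24?"))
    ```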

  • Adam DeJans Jr.

    Decision Intelligence | Author

    24,434 followers

    I keep hearing “ChatGPT-5 will replace solver engines.” That’s not how this works. Let’s break it down:

    🔹 LLMs (like ChatGPT)
    LLMs are probabilistic pattern generators. They predict the next most likely word or token, which makes them great at writing text, drafting code, and summarizing knowledge. But they do not guarantee:
    • Feasibility → whether a solution satisfies all constraints.
    • Optimality → whether no better solution exists.
    • Correctness → whether the answer is even valid in a mathematical sense.
    Every output is, at best, a plausible guess.

    🔹 Optimization solvers (Gurobi, CPLEX, CBC, etc.)
    Solvers are deterministic engines. They take a mathematical model (variables, constraints, and an objective) and:
    • Explore massive search spaces (millions or billions of possibilities).
    • Use decades of algorithmic advances (branch-and-bound, cutting planes, decomposition).
    • Prove feasibility and, often, prove optimality.
    This difference is crucial.

    ⸻

    ✅ Example 1: Truck routing with time windows
    👉 LLM: can generate a MILP formulation or pseudocode.
    👉 Solver: systematically searches the combinatorial explosion of routes, ensuring trucks don’t violate capacity or timing rules, and finds the least-cost solution.

    ✅ Example 2: Portfolio optimization
    👉 LLM: can describe the model and constraints in plain English.
    👉 Solver: ensures budgets are not exceeded, risk constraints are respected, and returns the provably best allocation of capital. (A tiny solver example follows below.)

    ⸻

    ✅ The key distinction: LLMs are model assistants. Solvers are solution engines. LLMs help translate messy business problems into math. Solvers deliver mathematically rigorous answers.

    The future isn’t replacement, but rather synergy. Use AI to frame the problem, then let optimization engines do what they’re built for: find the best decision under constraints.
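    To make the distinction concrete, here is a tiny solver example using the open-source PuLP library with its bundled CBC solver; the asset returns are made-up illustration numbers:

    ```python
    # Tiny linear program (pip install pulp): allocate a budget across three
    # assets to maximize expected return, with a 50% cap per asset. The
    # solver returns a *proven* optimum, not a plausible guess.
    import pulp

    returns = {"bonds": 0.03, "equities": 0.08, "reits": 0.06}

    prob = pulp.LpProblem("portfolio", pulp.LpMaximize)
    w = {a: pulp.LpVariable(f"w_{a}", lowBound=0, upBound=0.5) for a in returns}

    prob += pulp.lpSum(returns[a] * w[a] for a in returns)  # objective: expected return
    prob += pulp.lpSum(w.values()) == 1                     # budget constraint

    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    print(pulp.LpStatus[prob.status])                       # "Optimal"
    for a in returns:
        print(a, w[a].value())                              # equities 0.5, reits 0.5
    ```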

  • TL;DR: For building enterprise #genai applications, consider doing RAG WITH fine-tuning to improve performance, lower cost, and reduce hallucinations.

    There are two common application engineering patterns for building GenAI applications: RAG and LLM fine-tuning.

    RAG: This involves an unmodified LLM, using various semantic retrieval techniques (like ANN) and then providing the results as context to help the LLM generate a response. How to RAG in Amazon Web Services (AWS): https://lnkd.in/eZC3FH_p

    Pros:
    -- Easy to get started
    -- Hallucinations can be reduced by a lot
    -- Will always get the freshest data

    Cons:
    -- Slower, as multiple hops are needed
    -- If using a commercial LLM, more tokens are passed around, and that means more $$$

    Fine-tuning: This involves updating an LLM (weights etc.) with enterprise data, now most commonly using techniques like PEFT. How to fine-tune in Amazon Web Services (AWS): https://lnkd.in/eRDg9X5M

    Pros:
    -- Higher performance, both latency- and accuracy-wise
    -- Lower cost, as the number of tokens passed into LLMs can be reduced significantly

    Cons:
    -- Even with PEFT, fine-tuning is a non-trivial task and costs $$
    -- Hallucinations will still happen

    Based on what we see with customers, they want the best of both worlds: do RAG with a fine-tuned LLM.

    How: Start by fine-tuning an LLM with enterprise "reference" data. This is data that does not change frequently or at all. This could also be data that you want to be consistent, like a brand voice. Then use that fine-tuned model as the base for your RAG. For the retrieval part, you store your "fast-moving" data for semantic searches. This way you lower costs (fewer token costs), improve latency and potentially accuracy (as the model is updated with your data), and reduce hallucinations (via RAG and prompt engineering). A minimal sketch of this pattern follows below.

    To unlock all this effectively you really need a solid data strategy. More on that in future posts.
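    A minimal sketch of the combined pattern, with hypothetical stand-ins for the fine-tuned model and the retriever: stable reference knowledge lives in the tuned weights, fresh facts arrive via retrieval at query time:

    ```python
    # RAG over a fine-tuned base model: retrieve fast-moving facts, keep the
    # prompt short because stable knowledge is already in the weights.
    FAST_MOVING_DOCS = [
        "2024-06 price list: Pro plan is $49/month.",
        "Maintenance window: Saturday 02:00-04:00 UTC.",
    ]

    def retrieve(query: str, k: int = 1) -> list[str]:
        # Stand-in for vector search (embeddings + ANN index) over fresh data;
        # here, a naive word-overlap score.
        words = set(query.lower().split())
        scored = sorted(FAST_MOVING_DOCS,
                        key=lambda d: -len(words & set(d.lower().split())))
        return scored[:k]

    def call_finetuned_model(prompt: str) -> str:
        # Stand-in for the model tuned on stable "reference" data
        # (brand voice, domain terminology).
        return f"[fine-tuned model] {prompt}"

    def answer(query: str) -> str:
        context = "\n".join(retrieve(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        return call_finetuned_model(prompt)

    print(answer("What does the Pro plan cost?"))
    ```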

  • Jonathan Alexander

    Manufacturing AI & Advanced Analytics | Digital Transformation | Keynote Speaker | Industry 4.0 | Operational Excellence | Change Management | People Empowerment

    8,482 followers

    Everyone assumes LLMs are the future. NVIDIA & Georgia Tech just made the case for the opposite.

    After digging into their new, provocative paper, "Small Language Models are the Future of Agentic AI," one message is clear: we do not always need massive LLMs to build effective AI agents.

    The paper makes three bold claims:
    1. Powerful enough: SLMs can handle tool use, instruction following, code generation, and reasoning, the core tasks for AI agents.
    2. Operational fit: Agents mostly need decision-making (e.g., “which tool to call”), not essays or poetry. SLMs are optimized for such focused tasks.
    3. Cheaper: A 7B model costs 10–30x less than a 70B model, consumes less energy, and can run locally.

    So how exactly do they define a Small Language Model (SLM)?
    → An SLM is compact enough to run on consumer devices while delivering low-latency responses to agentic requests.
    → An LLM is simply a model that is not an SLM.

    Supporting arguments from the paper:
    → Capability: Modern SLMs rival older LLMs in reasoning, instruction following, and tool use, and can be boosted further with verifier feedback or tool augmentation.
    → Economics: They are dramatically cheaper to run, fine-tune, and deploy, fitting naturally into modular, “Lego-like” architectures.
    → Flexibility: Lower costs make experimentation easier and broaden participation, reducing bias and encouraging innovation.
    → Practical fit: Agents only need narrow functionality like tool calls and structured outputs. LLMs’ broad conversational skills often go unused.
    → Alignment: SLMs can be tuned for consistent formats (like JSON), reducing hallucinations and errors. (A validation sketch follows below.)
    → Hybrid approach: LLMs are best for planning complex workflows; SLMs excel at execution.
    → Continuous improvement: Every agent interaction generates training data, allowing SLMs to steadily replace LLM reliance over time.

    Of course, skeptics argue that LLMs will always outperform due to scaling laws, economies of scale, and industry inertia. But the authors make a strong case that SLMs are cheaper, faster, specialized, and sustainable. And the best part? They openly invite critique and collaboration to accelerate the shift.
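    A minimal sketch of the alignment point, assuming a hypothetical SLM stub: validate the model’s JSON tool call and retry on malformed output (real agent stacks often add schema-constrained decoding on top):

    ```python
    # Enforce a consistent JSON tool-call format: parse, validate the keys,
    # and re-prompt on malformed responses.
    import json

    REQUIRED_KEYS = {"tool", "arguments"}

    def call_slm(prompt: str) -> str:
        return '{"tool": "get_invoice", "arguments": {"id": "INV-42"}}'  # stub

    def get_tool_call(prompt: str, max_retries: int = 2) -> dict:
        for _ in range(max_retries + 1):
            raw = call_slm(prompt)
            try:
                parsed = json.loads(raw)
                if isinstance(parsed, dict) and REQUIRED_KEYS <= parsed.keys():
                    return parsed                  # well-formed tool call
            except json.JSONDecodeError:
                pass                               # malformed: fall through and retry
            prompt += '\nRespond with JSON: {"tool": ..., "arguments": ...}'
        raise ValueError("model never produced a valid tool call")

    print(get_tool_call("Fetch invoice INV-42"))
    ```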

  • Agus Sudjianto

    A geek who can speak: Co-creator of PiML and MoDeVa, SVP Risk & Technology H2O.ai, former EVP-Head of Wells Fargo MRM

    26,552 followers

    Back to the Future: from LLM Hype to Agentic Reality

    With the progress of LLMs and the pivot to agentic AI, we’re finally acknowledging an important truth: LLMs are powerful but not reliable decision-makers. They excel at natural language interaction and orchestrating workflows, but when it comes to reliable, auditable, and repeatable decisions, we’re back to the fundamentals: machine learning models.

    It’s a return to balance:
    - LLMs for interaction and adaptability.
    - ML models for robust predictions, decision-making, and optimization.

    In practice, this means:
    - LLMs as agents that call models (see the sketch below).
    - ML models as the decision engines powering trustworthy outcomes.
    - A new generation of hybrid systems where interpretability and performance reliability can be engineered with rigor.

    The era of “just use LLM” is closing. The era of building with LLMs grounded in machine learning foundations is here to stay.
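    A minimal sketch of that division of labor, using scikit-learn for the decision engine and a hypothetical stub for the LLM routing layer:

    ```python
    # "LLM as agent, ML model as decision engine": the language model only
    # routes the request; a fitted classifier makes the auditable decision.
    # pip install scikit-learn; training data is made-up illustration.
    from sklearn.linear_model import LogisticRegression

    X = [[0.1, 0], [0.9, 5], [0.2, 1], [0.8, 4]]   # [debt_ratio, late_payments]
    y = [0, 1, 0, 1]                                # 0 = approve, 1 = refer
    risk_model = LogisticRegression().fit(X, y)

    def score_applicant(debt_ratio: float, late_payments: int) -> str:
        pred = risk_model.predict([[debt_ratio, late_payments]])[0]
        return "refer to review" if pred == 1 else "approve"

    def agent(user_request: str) -> str:
        # Stand-in for an LLM that parses the request and picks the tool;
        # the decision itself comes from the ML model, not the LLM.
        return score_applicant(debt_ratio=0.85, late_payments=4)

    print(agent("Should we approve this credit line increase?"))
    ```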

  • Arturo Ferreira

    Exhausted dad of three | Lucky husband to one | Everything else is AI

    5,563 followers

    The bottleneck isn't GPUs or architecture. It's your dataset.

    Three ways to customize an LLM:
    1. Fine-tuning: Teaches behavior. 1K-10K examples. Shows the model how to respond. Cheapest option.
    2. Continued pretraining: Adds knowledge. Large unlabeled corpus. Extends what the model knows. Medium cost.
    3. Training from scratch: Full control. Trillions of tokens. Only for national AI projects. Rarely necessary.

    Most companies only need fine-tuning.

    How to collect quality data:
    - For fine-tuning, start small. Support tickets with PII removed. Internal Q&A logs. Public instruction datasets.
    - For continued pretraining, go big. Domain archives. Technical standards. Mix 70% domain, 30% general text.

    The 5-step data pipeline (a sketch of steps 1-3 follows below):
    1. Normalize. Convert everything to UTF-8 plain text. Remove markup and headers.
    2. Filter. Drop short fragments. Remove repeated templates. Redact PII.
    3. Deduplicate. Hash for identical content. Find near-duplicates. Do this before splitting datasets.
    4. Tag with metadata. Language, domain, source. Makes the dataset searchable.
    5. Validate quality. Check perplexity. Track metrics. Run a small pilot first.

    When your dataset is ready: All sources documented. PII removed. Stats match targets. Splits balanced. Pilot converges cleanly. If any of these fail, fix the data first.

    What good data does: Models converge faster. Hallucinate less. Cost less to serve.

    The reality: Building LLMs is a data problem. Not a training problem. Most teams spend 80% of their time on data. That's the actual work. Your data is your differentiator. Not your model architecture.

    Found this helpful? Follow Arturo Ferreira.
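    A minimal sketch of pipeline steps 1-3 (normalize, filter, deduplicate), with illustrative thresholds; near-duplicate detection (e.g., MinHash) is omitted:

    ```python
    # Normalize to clean Unicode text, drop short fragments, and remove
    # exact duplicates by content hash, before any dataset splitting.
    import hashlib
    import unicodedata

    def normalize(text: str) -> str:
        text = unicodedata.normalize("NFKC", text)   # canonical Unicode form
        return " ".join(text.split())                # collapse whitespace

    def keep(text: str, min_words: int = 5) -> bool:
        return len(text.split()) >= min_words        # drop short fragments

    def dedupe(docs: list[str]) -> list[str]:
        seen, out = set(), []
        for doc in docs:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest not in seen:                   # exact-duplicate check
                seen.add(digest)
                out.append(doc)
        return out

    raw = ["Pump P-102   seal failure report, replaced gasket.",
           "Pump P-102 seal failure report, replaced gasket.",
           "ok."]
    clean = dedupe([normalize(d) for d in raw if keep(normalize(d))])
    print(clean)  # one document survives filtering and deduplication
    ```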
