
LLM Fundamentals You Must Know for AI Interviews

IdealResume Team · May 18, 2025 · 8 min read

LLM Basics Every AI Engineer Should Know

You don't need a PhD to work with LLMs, but you do need solid foundational knowledge. Here's what interviewers expect you to understand.

Architecture Fundamentals

Q: What is a transformer and why is it important?

Strong Answer: "The transformer architecture, introduced in 'Attention Is All You Need' (2017), revolutionized NLP by replacing recurrence with self-attention. Key benefits:

  • Parallel processing (unlike RNNs)
  • Captures long-range dependencies
  • Scales efficiently with data and compute
  • Basis for all modern LLMs (GPT, Claude, Llama)"

Q: Explain self-attention at a high level.

Strong Answer: "Self-attention allows each token to 'attend to' every other token in the sequence. For each token, we compute:

  • Query: what am I looking for?
  • Key: what do I contain?
  • Value: what information do I provide?

Attention weights are computed by matching queries with keys, then used to weight values. This lets the model dynamically focus on relevant parts of the input."
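
If the interviewer pushes for depth, showing the mechanics helps. Here's a minimal single-head NumPy sketch (real models add multi-head attention, masking, and learned positional information; shapes here are toy values):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no mask, no multi-head)."""
    Q = X @ Wq  # queries: what each token is looking for
    K = X @ Wk  # keys: what each token contains
    V = X @ Wv  # values: what information each token provides
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len) match scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Toy example: 4 tokens, embedding dim 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one context-aware vector per token
```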

Q: What's the difference between encoder-only, decoder-only, and encoder-decoder models?

Strong Answer:

  • **Encoder-only (BERT)**: Bidirectional, good for understanding/classification
  • **Decoder-only (GPT, Claude)**: Autoregressive, good for generation
  • **Encoder-decoder (T5)**: Both understanding and generation, good for translation

Most modern LLMs are decoder-only because generation is the primary use case.

Training Concepts

Q: Explain pre-training vs fine-tuning.

Strong Answer: "Pre-training trains on massive text corpora to learn general language understanding. It's expensive (millions of dollars, months of compute). Fine-tuning adapts a pre-trained model to specific tasks using much smaller datasets. It's cheaper and faster, leveraging knowledge from pre-training."

Q: What is RLHF?

Strong Answer: "Reinforcement Learning from Human Feedback trains models to align with human preferences. The process:

  1. Collect human comparisons of model outputs
  2. Train a reward model on these preferences
  3. Use RL (typically PPO) to optimize the model against the reward model
  4. Repeat with updated model

This is a large part of how models like ChatGPT and Claude became so conversational and helpful."
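
Step 2 is typically trained with a pairwise (Bradley-Terry) objective. A toy NumPy sketch with illustrative reward scores:

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected),
    averaged over comparison pairs. Lower when the reward model scores
    the human-preferred output higher."""
    return float(np.mean(np.log1p(np.exp(-(r_chosen - r_rejected)))))

# Toy reward scores for preferred vs. rejected completions
chosen = np.array([1.2, 0.3, 2.0])
rejected = np.array([0.1, 0.5, 1.0])
print(reward_model_loss(chosen, rejected))
```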

Q: What are the alternatives to RLHF?

Cover:

  • **DPO (Direct Preference Optimization)**: Simpler, no reward model needed
  • **Constitutional AI**: Self-critique against principles
  • **RLAIF**: AI feedback instead of human
  • **SFT alone**: Supervised fine-tuning on curated examples
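
For DPO specifically, a minimal sketch of the loss is worth knowing (beta and the log-probs below are toy values):

```python
import numpy as np

def dpo_loss(logp_pol_w, logp_pol_l, logp_ref_w, logp_ref_l, beta=0.1):
    """DPO: increase the policy's preference margin for the winning (w)
    response over the losing (l) one, relative to a frozen reference model.
    No separate reward model or RL loop is needed."""
    margin = beta * ((logp_pol_w - logp_ref_w) - (logp_pol_l - logp_ref_l))
    return float(np.mean(np.log1p(np.exp(-margin))))  # -log sigmoid(margin)

# Toy sequence log-probs under the policy and the reference model
print(dpo_loss(np.array([-10.0]), np.array([-12.0]),
               np.array([-11.0]), np.array([-11.5])))
```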

Inference Concepts

Q: What is temperature and how does it affect outputs?

Strong Answer: "Temperature controls randomness in token selection. Technically, it scales the logits before softmax:

  • Temperature 0: Effectively greedy decoding; always picks the highest-probability token
  • Temperature 0.7: Balanced creativity and coherence (common default)
  • Temperature 1+: More random, creative, but potentially incoherent

Use low temperature for factual tasks, higher for creative tasks."
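
A minimal sketch of how temperature scaling works (toy logits, illustrative only):

```python
import numpy as np

def sample_with_temperature(logits, temperature):
    """Scale logits by 1/T before softmax; T near 0 approaches greedy decoding."""
    if temperature == 0:
        return int(np.argmax(logits))      # greedy: always the top token
    scaled = logits / temperature
    scaled -= scaled.max()                 # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.5, -1.0])     # toy next-token logits
print(sample_with_temperature(logits, 0))    # always token 0
print(sample_with_temperature(logits, 0.7))  # usually token 0
print(sample_with_temperature(logits, 1.5))  # flatter distribution, more random
```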

Q: What are other sampling parameters?

Key parameters:

  • **Top-k**: Sample from top k most likely tokens
  • **Top-p (nucleus)**: Sample from smallest set whose probabilities sum to p
  • **Frequency penalty**: Penalizes tokens in proportion to how often they've already appeared, reducing repetition
  • **Presence penalty**: Flat penalty on any token that has appeared at all, encouraging new topics
  • **Max tokens**: Output length limit
  • **Stop sequences**: End generation at specific strings
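
A toy sketch of top-k and top-p filtering over a next-token distribution (real decoders operate on logits and vectorize this, but the logic is the same):

```python
import numpy as np

def top_k_top_p_filter(probs, k=50, p=0.9):
    """Keep only the top-k tokens AND the smallest set whose probability mass
    reaches p (the nucleus), then renormalize before sampling."""
    order = np.argsort(probs)[::-1]  # token indices, most likely first
    keep = np.zeros_like(probs, dtype=bool)
    cumulative = 0.0
    for rank, idx in enumerate(order):
        if rank >= k or cumulative >= p:
            break                    # nucleus already covers p, or k reached
        keep[idx] = True
        cumulative += probs[idx]
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
print(top_k_top_p_filter(probs, k=3, p=0.9))  # low-probability tail is zeroed out
```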

Q: What is the context window and why does it matter?

Strong Answer: "The context window is the maximum number of tokens the model can process in one inference. It includes both input and output. Larger windows enable:

  • Longer documents
  • More few-shot examples
  • Multi-turn conversations
  • Complex RAG contexts

But attention is O(n²) in sequence length, so longer contexts are more expensive."
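
A quick back-of-envelope illustration of that quadratic growth:

```python
# Doubling the context quadruples each (n x n) attention score matrix,
# per layer and per head - a rough illustration of the O(n^2) cost.
for n in (4_096, 8_192, 16_384):
    print(f"{n:>6} tokens -> {n * n / 1e6:,.0f}M attention score entries")
```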

Practical Knowledge

Q: How do you choose between different LLMs?

Factors:

  • **Task requirements**: Reasoning, coding, creative writing
  • **Context length needed**
  • **Latency requirements**
  • **Cost constraints**
  • **Privacy/data concerns**: API vs self-hosted
  • **Specific capabilities**: Function calling, vision, etc.

Q: What is quantization and when would you use it?

Strong Answer: "Quantization reduces model precision (e.g., from 16-bit to 4-bit) to decrease memory usage and increase speed. Trade-off is some quality degradation. Use cases:

  • Running larger models on limited hardware
  • Reducing inference costs
  • Edge deployment
  • When slight quality loss is acceptable

Common methods: GPTQ, AWQ; GGUF (successor to GGML) is the usual format for running quantized models with llama.cpp"
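
A toy symmetric int8 round-trip shows the core idea (real methods like GPTQ and AWQ add per-group scales and calibration data):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: one fp32 scale plus 8-bit integers."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
print(f"4x smaller than fp32, mean abs error: {np.abs(w - dequantize(q, scale)).mean():.5f}")
```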

Q: Explain tokens and tokenization.

Strong Answer: "Tokens are the units LLMs process - typically subwords, not whole words. 'Understanding' might be 2 tokens: 'Under' + 'standing'. Different models use different tokenizers (BPE, SentencePiece). Key implications:

  • Pricing is per token
  • Context limits are in tokens
  • Some languages are more token-efficient than others
  • Tokenization affects model behavior"
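
You can inspect this directly with OpenAI's tiktoken library (exact splits vary by tokenizer):

```python
# pip install tiktoken  (OpenAI's open-source BPE tokenizer)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding used by GPT-4-era models
tokens = enc.encode("Understanding tokenization")
print(len(tokens), tokens)                   # token count and ids
print([enc.decode([t]) for t in tokens])     # the actual subword pieces
```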

Limitations and Challenges

Q: What are the main limitations of current LLMs?

Critical limitations:

  • **Hallucinations**: Generating false information confidently
  • **Knowledge cutoff**: Training data has an end date
  • **Reasoning failures**: Especially multi-step logic
  • **Math/counting**: Surprisingly weak
  • **Context constraints**: Limited window size
  • **Inconsistency**: Same prompt can give different answers

Q: How do you mitigate hallucinations?

Strategies:

  • RAG with verified sources
  • Asking model to cite sources
  • Confidence calibration
  • Human review for high-stakes content
  • Prompting for uncertainty acknowledgment
  • Fact-checking pipelines
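
A common prompt pattern that combines several of the strategies above (wording and helper names are illustrative, not a standard):

```python
# Illustrative grounding prompt: restrict the model to retrieved context,
# require citations, and make uncertainty an explicit allowed answer.
SYSTEM_PROMPT = """Answer ONLY from the provided context.
Cite a source id for every claim, e.g. [doc-2].
If the context does not contain the answer, say "I don't know."
"""

def build_prompt(context_chunks, question):
    context = "\n".join(f"[doc-{i}] {c}" for i, c in enumerate(context_chunks, 1))
    return f"{SYSTEM_PROMPT}\nContext:\n{context}\n\nQuestion: {question}"

print(build_prompt(["Model X supports a 128k-token context window."],
                   "What is model X's context window?"))
```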

Evaluation

Q: How do you evaluate LLM quality?

Benchmark types:

  • **General**: MMLU, HellaSwag, ARC
  • **Reasoning**: GSM8K, MATH
  • **Coding**: HumanEval, MBPP
  • **Safety**: TruthfulQA, BBQ
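
Coding benchmarks like HumanEval and MBPP typically report pass@k; a minimal sketch of the unbiased estimator from the Codex paper:

```python
import math

def pass_at_k(n, c, k):
    """Unbiased pass@k (Chen et al., 2021): probability that at least one of
    k samples passes, given c of n generated samples passed the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Toy numbers: 200 samples per problem, 20 passed the unit tests
print(f"pass@1  = {pass_at_k(200, 20, 1):.3f}")   # 0.100
print(f"pass@10 = {pass_at_k(200, 20, 10):.3f}")
```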

Real-world evaluation:

  • Task-specific test sets
  • Human evaluation
  • A/B testing in production
  • Domain expert review

Current Landscape

Be prepared to discuss:

  • Major model families (GPT, Claude, Llama, Gemini)
  • Open vs closed source trade-offs
  • Recent developments and trends
  • Scaling laws and their implications

Understanding fundamentals helps you reason about new developments, troubleshoot issues, and make informed architectural decisions.
