
Terry Underwood

I’m not capable of working under the hood of a transformer, and I’m woefully inadequate because I’m not digital myself, but this idea of feature steering seems like a big deal. It means people can change how AI writes, reasons, elaborates, emotes. Humans can change one another’s minds by persuasion, but we can’t feature steer.

Claude, with a training cutoff of April 2024, says this:

“From research and observed patterns, here are the main types of behaviors you can steer:

1. Writing Style

- Formality level (casual to academic)

- Conciseness vs verbosity

- Simplicity vs complexity of language

- Tone (friendly, professional, technical)

2. Reasoning Patterns

- Step-by-step vs holistic explanations

- Depth of analysis (surface vs detailed)

- Degree of uncertainty expression

- Level of mathematical rigor

3. Domain Expertise

- Technical vocabulary density

- Field-specific conventions

- Citation frequency

- Jargon usage

4. Interaction Style

- Question frequency

- Empathy level

- Directiveness vs suggestiveness

- Tutorial vs peer discussion style

5. Output Structure

- List vs narrative format

- Use of examples/analogies

- Code vs prose ratio

- Visual/diagram suggestions

What's interesting is that these aren't binary switches - they're more like continuous spectrums you can adjust. Is there a particular spectrum here that interests you most?”

I assume Claude has simplified this for me but is on track. Am I on track? What does feature steering mean in practical terms for, say, a high school student?

Philosophically, does the human capacity to turn the dials and mess with artificial brains mean humans really are the boss of AI? Could HAL have been steered during his worst moments? Is feature steering really our fail-safe?
