LLMs and Language Learning

Anonymous

“Maybe because all LLMs are overhyped and much of what their creators call “learning” is just tokenizing their training data and recombining it in form of sentences to feign human understanding.”

I’m interested in alternatives to express the same thing, making minimum changes. Suggest several options with minimum changes (as always, fix anything that is wrong)

Syntax Sentry

Certainly! Here are a few alternatives to your sentence with minimal changes, along with corrections for improved clarity and grammar:

  1. “Perhaps because all LLMs are overhyped, and much of what their creators term ‘learning’ is simply tokenizing their training data and reassembling it into sentences to mimic human understanding.”

  2. “Maybe it's because all LLMs are overhyped, and a lot of what their creators call ‘learning’ is merely tokenizing their training data and recombining it into sentences to simulate human understanding.”

  3. “Possibly because all LLMs are overhyped, and what their creators refer to as ‘learning’ is just the tokenization of their training data, recombined into sentences to imitate human understanding.”

In these revisions, I've kept your original sentence structure and wording as much as possible, making only necessary grammatical adjustments and slight variations for clarity and flow.

Anonymous

Is there anything wrong in the original sentence?

Maybe because all LLMs are overhyped and much of what their creators call “learning” is just tokenizing their training data and recombining it in form of sentences to feign human understanding

Syntax Sentry

Your original sentence is mostly well-constructed, but there are a few minor adjustments that could enhance its clarity:

  1. Missing Comma: There should be a comma after "overhyped" for better readability. This separates the two independent clauses in the sentence.

  2. Phrase Adjustment: The phrase "in form of sentences" could be slightly rephrased for smoother flow. A more natural phrasing might be "into sentences" or "in the form of sentences".

So, the revised sentence would be:

"Maybe because all LLMs are overhyped, and much of what their creators call “learning” is just tokenizing their training data and recombining it into sentences to feign human understanding."

These changes are subtle but help maintain the flow and clarity of the sentence.

Anonymous

I’m making it more complex. Let me know if you see better ways of improving the language:

Maybe because all LLMs are overhyped, and much of what their creators call “learning” is just tokenizing their training data and probabilistically recombining it into sentences that feign human understanding and cognition.

Syntax Sentry

Your revised sentence is well-structured and clear, with only minor suggestions for improvement:

  1. Comma Usage: The sentence would benefit from a comma after "data" for better readability. This helps in separating different parts of the sentence and making it easier to follow.

  2. Optional Simplification: The phrase "probabilistically recombining" is accurate but somewhat complex. While it's a valid choice and depends on your desired level of technical specificity, you might consider simplifying it to "recombining" if your audience is not deeply familiar with AI concepts.

With these considerations, the revised sentence could be:

"Maybe because all LLMs are overhyped, and much of what their creators call 'learning' is just tokenizing their training data, and recombining it into sentences that feign human understanding and cognition."

However, if the technical specificity of "probabilistically recombining" is important for your context, you should retain it. The choice depends on the level of detail and technicality suitable for your audience.

Anonymous

Thanks for the help. I’m still keeping ‘probabilistic’ though. This was a response to the following post on social media:

QUESTION: Why doesn’t someone take all of the medical papers on Sci-Hub, dump them into an LLM, rip off the “safeguards,” and build the world’s best doctor?

RESPONSE: Maybe because all LLMs are overhyped, and much of what their creators call “learning” is just tokenizing their training data, and probabilistically recombining it into sentences that feign human understanding and cognition.

I authorize you to go beyond grammar and punctuation in analyzing my statement. You can rewrite it to make it as accurate as possible based on your training data, while maintaining the author's essential idea and opinion.

Syntax Sentry

Thank you for providing additional context and authorizing a more comprehensive rewrite. Considering the technical and opinionated nature of your response, here's a revised version:

"Perhaps this hasn't been done because all LLMs, despite their hype, fundamentally operate by tokenizing their training data and probabilistically reassembling it into sentences. This process, often misconstrued as 'learning,' only simulates human understanding and cognition rather than truly replicating it."

This revision maintains the essence of your original statement but clarifies the limitations of LLMs in terms of their ability to replicate human understanding, especially in complex fields like medicine. The use of "Perhaps" softens the tone slightly, making it more suitable for a speculative discussion, while "fundamentally operate by" emphasizes the basic mechanics of LLMs. The distinction between simulation and true replication is crucial in the context of replacing human doctors with AI, which is the underlying subject of the original question.