
Top K Frequent Words

Given a list of words, return the k most frequently occurring words.

Note: If two words occur with the same frequency, then the alphabetically smaller word should come first.

Example(s)

Example 1:

Input: words = ["the", "daily", "byte", "byte"], k = 1
Output: ["byte"]
Explanation:
- "byte" appears 2 times
- "the" appears 1 time
- "daily" appears 1 time
The most frequent word is "byte".

Example 2:

Input: words = ["coding", "is", "fun", "code", "coding", "fun"], k = 2
Output: ["coding", "fun"]
Explanation:
- "coding" appears 2 times
- "fun" appears 2 times
- "is" appears 1 time
- "code" appears 1 time
Top 2 are "coding" and "fun". Since they have the same frequency,
"coding" comes before "fun" alphabetically.

Example 3:

Input: words = ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation:
- "i" appears 2 times
- "love" appears 2 times
- "leetcode" appears 1 time
- "coding" appears 1 time
Top 2 are "i" and "love" (alphabetically ordered).

Example 4:

Input: words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation:
- "the" appears 4 times
- "is" appears 3 times
- "sunny" appears 2 times
- "day" appears 1 time

Solution

The solution uses a hash map and sorting:

  1. Count frequencies: Use a hash map to count occurrences of each word
  2. Sort words: Sort by frequency (descending), then alphabetically (ascending) for ties
  3. Return top k: Take the first k words from the sorted list

JavaScript Solution - Hash Map + Sort

/**
 * Find the top k frequent words.
 * @param {string[]} words - Array of words
 * @param {number} k - Number of top words to return
 * @return {string[]} - Top k frequent words
 */
function topKFrequent(words, k) {
  // Step 1: Count frequencies
  const frequency = new Map();
  for (const word of words) {
    frequency.set(word, (frequency.get(word) || 0) + 1);
  }

  // Step 2: Get all unique words and sort
  const uniqueWords = Array.from(frequency.keys());

  uniqueWords.sort((a, b) => {
    const freqA = frequency.get(a);
    const freqB = frequency.get(b);

    // First sort by frequency (descending)
    if (freqA !== freqB) {
      return freqB - freqA;
    }

    // If frequencies are equal, sort alphabetically (ascending);
    // use 'en' for stable test output
    return a.localeCompare(b, 'en');
  });

  // Step 3: Return top k words
  return uniqueWords.slice(0, k);
}

// Test cases
console.log('Example 1:', topKFrequent(["the", "daily", "byte", "byte"], 1)); 
// ["byte"]

console.log('Example 2:', topKFrequent(["coding", "is", "fun", "code", "coding", "fun"], 2)); 
// ["coding", "fun"]

console.log('Example 3:', topKFrequent(["i", "love", "leetcode", "i", "love", "coding"], 2)); 
// ["i", "love"]

console.log('Example 4:', topKFrequent(["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], 4)); 
// ["the", "is", "sunny", "day"]

Alternative Solution (Heap-Based)

For better performance when k is much smaller than the total number of unique words, we can use a min-heap:

/**
 * Heap-based approach (more efficient when k << n).
 */
function topKFrequentHeap(words, k) {
  // Count frequencies
  const frequency = new Map();
  for (const word of words) {
    frequency.set(word, (frequency.get(word) || 0) + 1);
  }

  // Use a min-heap of size k; the root is the "worst" word kept so far
  const heap = [];

  for (const [word, freq] of frequency.entries()) {
    if (heap.length < k) {
      heap.push({ word, freq });
      // Maintain the heap property
      heapifyUp(heap, heap.length - 1);
    } else {
      const top = heap[0];
      // Replace the root if this word ranks higher:
      // higher frequency, or same frequency and alphabetically smaller
      if (freq > top.freq || (freq === top.freq && word < top.word)) {
        heap[0] = { word, freq };
        heapifyDown(heap, 0);
      }
    }
  }

  // Extract and sort the results
  const result = heap.map(item => item.word);
  result.sort((a, b) => {
    const freqA = frequency.get(a);
    const freqB = frequency.get(b);
    if (freqA !== freqB) return freqB - freqA;
    return a.localeCompare(b);
  });

  return result;
}

// Helper functions for the min-heap
function heapifyUp(heap, index) {
  while (index > 0) {
    const parent = Math.floor((index - 1) / 2);
    if (compare(heap[index], heap[parent]) >= 0) break;
    [heap[index], heap[parent]] = [heap[parent], heap[index]];
    index = parent;
  }
}

function heapifyDown(heap, index) {
  while (true) {
    let smallest = index;
    const left = 2 * index + 1;
    const right = 2 * index + 2;

    if (left < heap.length && compare(heap[left], heap[smallest]) < 0) {
      smallest = left;
    }
    if (right < heap.length && compare(heap[right], heap[smallest]) < 0) {
      smallest = right;
    }
    if (smallest === index) break;
    [heap[index], heap[smallest]] = [heap[smallest], heap[index]];
    index = smallest;
  }
}

function compare(a, b) {
  // Order by frequency ascending; on ties, alphabetically larger words
  // count as "smaller" so the worst candidate stays at the heap root
  if (a.freq !== b.freq) return a.freq - b.freq;
  return b.word.localeCompare(a.word);
}

Alternative: Bucket Sort Approach

For cases where frequencies are bounded, we can use bucket sort:

/**
 * Bucket sort approach (when the maximum frequency is bounded).
 */
function topKFrequentBucket(words, k) {
  // Count frequencies
  const frequency = new Map();
  for (const word of words) {
    frequency.set(word, (frequency.get(word) || 0) + 1);
  }

  // Create buckets: index = frequency, value = array of words
  const buckets = [];
  const maxFreq = Math.max(...frequency.values());

  for (let i = 0; i <= maxFreq; i++) {
    buckets[i] = [];
  }

  // Put words into buckets
  for (const [word, freq] of frequency.entries()) {
    buckets[freq].push(word);
  }

  // Collect top k words from buckets (highest to lowest frequency)
  const result = [];
  for (let i = maxFreq; i >= 1 && result.length < k; i--) {
    if (buckets[i].length > 0) {
      // Sort words in this bucket alphabetically
      buckets[i].sort();
      // Add words until we have k
      for (const word of buckets[i]) {
        if (result.length < k) {
          result.push(word);
        } else {
          break;
        }
      }
    }
  }

  return result;
}

Complexity

  • Time Complexity:
    • Hash Map + Sort: O(m + n log n), where m is the total number of words and n is the number of unique words. Counting is O(m); sorting the unique words is O(n log n).
    • Heap-based: O(n log k), where n is the number of unique words and k is the result size. More efficient when k << n.
    • Bucket Sort: O(m + F) for counting and bucketing, where F is the maximum frequency, plus the per-bucket alphabetical sorts (O(n log n) in the worst case, when many words share one frequency). Effectively linear when each bucket stays small.
  • Space Complexity: O(n), for storing the frequencies and unique words.

Approach

The solution uses a hash map and sorting:

  1. Count frequencies:
     • Iterate through all words
     • Use a hash map to count occurrences of each word
  2. Sort words:
     • Sort by frequency in descending order
     • For words with the same frequency, sort alphabetically in ascending order
  3. Return top k:
     • Take the first k words from the sorted list

Key Insights

  • Hash map for counting: Efficient O(1) insertion and lookup
  • Custom comparator: Sort by frequency first, then alphabetically
  • Tie-breaking: Alphabetical order for words with same frequency
  • Heap optimization: When k is small, heap-based approach is more efficient
  • Bucket sort: Linear time when frequencies are bounded

Step-by-Step Example

Let's trace through Example 2: words = ["coding", "is", "fun", "code", "coding", "fun"], k = 2

Step 1: Count frequencies
"coding": 2
"is": 1
"fun": 2
"code": 1

Step 2: Get unique words
uniqueWords = ["coding", "is", "fun", "code"]

Step 3: Sort
Compare "coding" (freq=2) vs "is" (freq=1): 2 > 1 → "coding" comes first
Compare "coding" (freq=2) vs "fun" (freq=2): frequencies equal
Compare alphabetically: "coding" < "fun" → "coding" comes first
Compare "fun" (freq=2) vs "is" (freq=1): 2 > 1 → "fun" comes before "is"
Compare "is" (freq=1) vs "code" (freq=1): frequencies equal
Compare alphabetically: "code" < "is" → "code" comes before "is"

Sorted: ["coding", "fun", "code", "is"]

Step 4: Return top k = 2
Result: ["coding", "fun"]
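The trace above can be reproduced end to end with the sort-based approach (a compact sketch of the three steps, inlined for Example 2):

```javascript
// Reproduce the Example 2 trace: count, sort, then take the top k.
const words = ["coding", "is", "fun", "code", "coding", "fun"];
const k = 2;

// Step 1: count frequencies
const frequency = new Map();
for (const word of words) {
  frequency.set(word, (frequency.get(word) || 0) + 1);
}

// Step 2: sort unique words by frequency (desc), then alphabetically (asc)
const sorted = [...frequency.keys()].sort(
  (a, b) => frequency.get(b) - frequency.get(a) || a.localeCompare(b, 'en')
);
console.log(sorted);             // ["coding", "fun", "code", "is"]

// Step 3: take the first k
console.log(sorted.slice(0, k)); // ["coding", "fun"]
```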

Edge Cases

  • k equals number of unique words: Return all words
  • k = 1: Return the most frequent word
  • All words have same frequency: Return first k alphabetically
  • Empty words array: Return empty array
  • Single word repeated: Return that word
  • All words unique: Return first k alphabetically
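These edge cases can be exercised with a condensed restatement of the sort-based solution (the `topK` helper below is just `topKFrequent` compacted so the snippet runs on its own):

```javascript
// Compact restatement of the sort-based solution, for checking edge cases.
function topK(words, k) {
  const freq = new Map();
  for (const w of words) freq.set(w, (freq.get(w) || 0) + 1);
  return [...freq.keys()]
    .sort((a, b) => freq.get(b) - freq.get(a) || a.localeCompare(b, 'en'))
    .slice(0, k);
}

console.log(topK([], 1));              // [] (empty words array)
console.log(topK(["a", "a", "a"], 1)); // ["a"] (single word repeated)
console.log(topK(["b", "a", "c"], 2)); // ["a", "b"] (all unique, alphabetical)
console.log(topK(["x", "y"], 2));      // ["x", "y"] (k equals unique count)
```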

Important Notes

  • Tie-breaking rule: When frequencies are equal, sort alphabetically (ascending)
  • Comparator ties: The comparator breaks frequency ties alphabetically itself, so a stable sort is not required
  • Case sensitivity: Typically words are compared case-sensitively
  • Performance: For large datasets with small k, heap-based approach is better
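The case-sensitivity note is easy to see in isolation (a small sketch; the exact `localeCompare` ordering depends on the runtime's locale data):

```javascript
// Default sort compares UTF-16 code units, so uppercase letters
// ("A" = 65) sort before lowercase ("a" = 97): case matters.
console.log(["apple", "Apple", "Banana"].sort());
// ["Apple", "Banana", "apple"]

// localeCompare applies locale rules instead; case still distinguishes
// the words, but the ordering may differ from raw code-unit order.
console.log(["apple", "Apple", "Banana"].sort((a, b) => a.localeCompare(b, 'en')));
```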

Comparison of Approaches

Approach          Time        Space   Best For
Hash Map + Sort   O(n log n)  O(n)    General purpose, simple
Heap-based        O(n log k)  O(k)    When k << n
Bucket Sort       O(n + m)    O(n)    When frequencies are bounded

Related Problems

  • Top K Frequent Elements: Similar but for numbers, with no alphabetical ordering
  • Kth Largest Element: Find the kth largest element
  • Group Anagrams: Group words that are anagrams of each other
  • Word Frequency: Count word frequencies

Takeaways

  • Hash map + sort is the simplest and most intuitive approach
  • Custom comparator handles both frequency and alphabetical ordering
  • Heap optimization is better when k is much smaller than n
  • Bucket sort provides linear time when applicable
  • Tie-breaking is an important detail in the problem