一天一大 leet(恢复空格)难度:中等-Day20200709

题目:

哦，不！你不小心把一个长篇文章中的空格、标点都删掉了，并且大写也弄成了小写。像句子"I reset the computer. It still didn’t boot!"已经变成了"iresetthecomputeritstilldidntboot"。在处理标点符号和大小写之前，你得先把它断成词语。当然了，你有一本厚厚的词典 dictionary，不过，有些词没在词典里。假设文章用 sentence 表示，设计一个算法，把文章断开，要求未识别的字符最少，返回未识别的字符数。

注意：本题相对原题稍作改动，只需返回未识别的字符数

示例

输入：
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
输出：7
解释：断句后为"jess looked just like tim her brother"，共7个未识别字符

提示

0 <= len(sentence) <= 1000
dictionary 中总字符数不超过 150000。
你可以认为 dictionary 和 sentence 中只包含小写字母。

抛砖引玉

思路：

首先要把字符串所有可能存在的切分规则枚举出来对 sentence 进行双层遍历
另外，每次单词被分割出来之后，需要判断这个单词是否在 wordDict 中

存在，继续向后枚举
不存在，存放本轮枚举的结果+截取的不匹配的字符长度

逻辑：

存放本轮枚举的结果：

设传入的 sentence 的长为 len，那么声明一个数组 dp，dp[len] 来表示 sentence 中最少不匹配数
sentence 为空时，不匹配长度为 0，即:dp[0] = 0

参照上面存储逻辑，sentence 长 len，sentence[len]做本轮匹配的终点，依次向前移动指针给其添加字母：

如果 dictionary 包含分割出的字符，那不满足的值应该和该片段前一个存储的值相同
如果不包含，那说明本次截取的值不满足，记录分割的长度+该片段前一个存储的值

直到添加到字母首位。

注意:每次 i 增加时，其其实值应该为 dp[i-1] + x

dictionary 包含 sentence 的第 i 个字符则，x 为 0
不包含则为 1

/**
 * @param {string[]} dictionary
 * @param {string} sentence
 * @return {number}
 */
var respace = function (dictionary, sentence) {
  let len = sentence.length,
    dp = new Array(len + 1).fill(0)
  for (let i = 1; i <= len; i++) {
    dp[i] = dp[i - 1] + (dictionary.includes(sentence[i - 1]) ? 0 : 1)
    for (let j = i - 1; j >= 0; j--) {
      let word = sentence.substring(j, i)
      if (dictionary.includes(word)) {
        dp[i] = Math.min(dp[i], dp[j])
      } else {
        dp[i] = Math.min(dp[i], dp[j] + i - j)
      }
    }
  }
  return dp[len]
}

其他解法

哈希判断包含

上面判断分割的字符是否包含在 dictionary 中用了 includes includes 内部还有一次轮询，可以使用哈希来替代，优化时间

哈希方法可以用 set，map，object 实现

/**
 * @param {string[]} dictionary
 * @param {string} sentence
 * @return {number}
 */
var respace = function (dictionary, sentence) {
  let len = sentence.length,
    dp = new Array(len + 1).fill(0),
    map = new Map()

  // 字典生成哈希
  dictionary.forEach((i) => map.set(i))

  for (let i = 1; i <= len; i++) {
    dp[i] = dp[i - 1] + (map.has(sentence[i - 1]) ? 0 : 1)
    for (let j = i - 1; j >= 0; j--) {
      if (map.has(sentence.substring(j, i))) {
        dp[i] = Math.min(dp[i], dp[j])
      } else {
        dp[i] = Math.min(dp[i], dp[j] + i - j)
      }
    }
  }
  return dp[len]
}

指定分割长度

存储不匹配结果的逻辑不变
按照字典 dictionary 中单词的长度对 sentence，优化切分规则

/**
 * @param {string[]} dictionary
 * @param {string} sentence
 * @return {number}
 */
var respace = function (dictionary, sentence) {
  let len = sentence.length,
    dp = new Array(len + 1).fill(0)

  for (let i = 1; i <= len; i++) {
    // 初始值
    dp[i] = dp[i - 1] + (dictionary.includes(sentence[i - 1]) ? 0 : 1)

    // 遍历字典，分割字符
    for (let j = 0; j < dictionary.length; j++) {
      let item = dictionary[j],
        itemLen = item.length,
        strStart = i - ditemLen

      // 小于字段字符长度不分割
      if (i < itemLen) continue

      if (sentence.substring(strStart, i) === item) {
        // 存在则取该位置不匹配的最小值
        dp[i] = Math.min(dp[strStart], dp[i])
      }
    }
  }
  return dp[len]
}

优化：

使用按字段分割的方式，就可以省略 dp 的初始化了
不会再涉及向前查询是查到 undefind 的问题

则优化后为：

/**
 * @param {string[]} dictionary
 * @param {string} sentence
 * @return {number}
 */
var respace = function (dictionary, sentence) {
  let dp = [0],
    len = sentence.length
  for (let i = 1; i <= len; i++) {
    dp[i] = dp[i - 1] + (dictionary.includes(sentence[i - 1]) ? 0 : 1)
    for (let j = 0; j < dictionary.length; j++) {
      let item = dictionary[j]
      if (sentence.substring(i - item.length, i) === item) {
        dp[i] = Math.min(dp[i], dp[i - item.length])
      }
    }
  }
  return dp[dp.length - 1]
}