Data Transform

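Data transform has been supported since Apache ECharts™ 5. In ECharts, the term data transform means generating new data from user-provided source data and transform functions. This feature lets users process data declaratively and provides a number of common "transform functions" that work out of the box. (For consistency, we use the noun form "transform" rather than "transformation".)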

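The abstract formula of a data transform is: output data = f(input data), where the transform function f can be filter, sort, regression, boxplot, cluster, aggregate (todo), and so on. With the help of these transforms, users can: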

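Split data into multiple series.
Compute statistics and visualize the results.
Apply visualization algorithms to the data and display the results.
Sort data.
Remove or select certain kinds of empty or special data.
...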
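
To make the formula concrete, here is a minimal plain-JavaScript sketch of a transform as a function from input data to output data. The helper `filterByYear` and the sample rows are hypothetical and only model the idea; they are not part of the ECharts API.

```javascript
// Plain-JavaScript model of "output data = f(input data)"; not ECharts code.
// Hypothetical transform: keep the header row plus the rows of one year.
function filterByYear(source, year) {
  const [header, ...rows] = source;
  const yearIndex = header.indexOf('Year');
  return [header, ...rows.filter(row => row[yearIndex] === year)];
}

const input = [
  ['Product', 'Sales', 'Year'],
  ['Cake', 123, 2011],
  ['Tofu', 235, 2011],
  ['Cake', 143, 2012]
];

// The input is left untouched; the transform returns new data.
const output = filterByYear(input, 2011);
console.log(output.length - 1); // number of data rows kept
```

The input array is not modified, which mirrors the declarative style described above: the source data stays as declared and each transform yields a new result.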

Getting Started with Data Transform

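In ECharts, data transform is built on the concept of the dataset. A dataset.transform can be configured on a dataset instance to indicate that the dataset is generated from this transform. For example: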

var option = {
  dataset: [
    {
      // This dataset is on `datasetIndex: 0`.
      source: [
        ['Product', 'Sales', 'Price', 'Year'],
        ['Cake', 123, 32, 2011],
        ['Cereal', 231, 14, 2011],
        ['Tofu', 235, 5, 2011],
        ['Dumpling', 341, 25, 2011],
        ['Biscuit', 122, 29, 2011],
        ['Cake', 143, 30, 2012],
        ['Cereal', 201, 19, 2012],
        ['Tofu', 255, 7, 2012],
        ['Dumpling', 241, 27, 2012],
        ['Biscuit', 102, 34, 2012],
        ['Cake', 153, 28, 2013],
        ['Cereal', 181, 21, 2013],
        ['Tofu', 395, 4, 2013],
        ['Dumpling', 281, 31, 2013],
        ['Biscuit', 92, 39, 2013],
        ['Cake', 223, 29, 2014],
        ['Cereal', 211, 17, 2014],
        ['Tofu', 345, 3, 2014],
        ['Dumpling', 211, 35, 2014],
        ['Biscuit', 72, 24, 2014]
      ]
      // id: 'a'
    },
    {
      // This dataset is on `datasetIndex: 1`.
      // A `transform` is configured to indicate that the
      // final data of this dataset is transformed via this
      // transform function.
      transform: {
        type: 'filter',
        config: { dimension: 'Year', value: 2011 }
      }
      // The optional properties `fromDatasetIndex` and `fromDatasetId`
      // indicate where the input data of the transform comes from.
      // For example, `fromDatasetIndex: 0` specifies that the input data is
      // from the dataset at `datasetIndex: 0`, and `fromDatasetId: 'a'`
      // specifies that the input data is from the dataset with `id: 'a'`.
      // [DEFAULT_RULE]
      // If both `fromDatasetIndex` and `fromDatasetId` are omitted,
      // `fromDatasetIndex: 0` is used by default.
    },
    {
      // This dataset is on `datasetIndex: 2`.
      // Similarly, if neither `fromDatasetIndex` nor `fromDatasetId` is
      // specified, `fromDatasetIndex: 0` is used by default
      transform: {
        // The "filter" transform keeps only the data items that match
        // the condition given in the `config` property.
        type: 'filter',
        // A transform has a `config` property. In this "filter" transform,
        // `config` specifies the condition that each resulting data item
        // must satisfy. Here, the transform keeps all data items whose
        // value on dimension "Year" equals 2012.
        config: { dimension: 'Year', value: 2012 }
      }
    },
    {
      // This dataset is on `datasetIndex: 3`
      transform: {
        type: 'filter',
        config: { dimension: 'Year', value: 2013 }
      }
    }
  ],
  series: [
    {
      type: 'pie',
      radius: 50,
      center: ['25%', '50%'],
      // In this case, each "pie" series references a dataset that holds
      // the result of its "filter" transform.
      datasetIndex: 1
    },
    {
      type: 'pie',
      radius: 50,
      center: ['50%', '50%'],
      datasetIndex: 2
    },
    {
      type: 'pie',
      radius: 50,
      center: ['75%', '50%'],
      datasetIndex: 3
    }
  ]
};


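Let's summarize the key points of using data transform:

Generate new data from existing declared data by declaring transform and fromDatasetIndex/fromDatasetId in some otherwise blank datasets.
Series reference these datasets to display the results.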
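
As an illustration of these two points, the following sketch references the input dataset by id rather than relying on the default `fromDatasetIndex: 0`. The id `'raw'` and the shortened sample data are assumptions made for this sketch only.

```javascript
// A sketch of an option that names its input dataset explicitly.
const option = {
  dataset: [
    {
      id: 'raw', // hypothetical id chosen for this sketch
      source: [
        ['Product', 'Sales', 'Year'],
        ['Cake', 123, 2011],
        ['Tofu', 235, 2011],
        ['Cake', 143, 2012]
      ]
    },
    {
      // An otherwise blank dataset: its data comes from the transform,
      // and the transform reads its input from the dataset with id 'raw'.
      fromDatasetId: 'raw',
      transform: {
        type: 'filter',
        config: { dimension: 'Year', '=': 2011 }
      }
    }
  ],
  series: [
    // The series displays the transform result (dataset at index 1).
    { type: 'pie', datasetIndex: 1 }
  ]
};
```

Referencing by id can be more robust than referencing by index when datasets are added or reordered later.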

Advanced Usage

Piped Transforms

There is a syntactic sugar for chaining transforms like a pipe. For example:

option = {
  dataset: [
    {
      source: [] // The original data
    },
    {
      // Declare transforms in an array to pipe multiple transforms,
      // which makes them execute one by one and take the output of
      // the previous transform as the input of the next transform.
      transform: [
        {
          type: 'filter',
          config: { dimension: 'Product', value: 'Tofu' }
        },
        {
          type: 'sort',
          config: { dimension: 'Year', order: 'desc' }
        }
      ]
    }
  ],
  series: {
    type: 'pie',
    // Display the result of the piped transform.
    datasetIndex: 1
  }
};

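Note: In theory, any type of transform can take multiple input data and produce multiple output data. However, when a transform is piped, it can take only one input (unless it is the first transform in the pipe) and produce only one output (unless it is the last transform in the pipe).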
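
The execution order of a pipe can be modeled in plain JavaScript as follows. This is only a sketch of the "output of one stage is the input of the next" idea; the helpers `keepTofu`, `sortByYearDesc`, and `pipe` are hypothetical and do not reflect ECharts internals.

```javascript
// Plain-JavaScript model of piping; not the ECharts implementation.
const rows = [
  { Product: 'Tofu', Sales: 235, Year: 2011 },
  { Product: 'Cake', Sales: 143, Year: 2012 },
  { Product: 'Tofu', Sales: 395, Year: 2013 },
  { Product: 'Tofu', Sales: 255, Year: 2012 }
];

// Two single-input, single-output stages.
const keepTofu = data => data.filter(row => row.Product === 'Tofu');
const sortByYearDesc = data => [...data].sort((a, b) => b.Year - a.Year);

// Run the stages in order; each stage receives the previous stage's output.
function pipe(stages, input) {
  return stages.reduce((data, stage) => stage(data), input);
}

const piped = pipe([keepTofu, sortByYearDesc], rows);
console.log(piped.map(row => row.Year)); // [2013, 2012, 2011]
```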

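Outputting Multiple Data

In most cases, a transform function only needs to produce one data. However, there are cases where a transform function needs to produce multiple data, each of which may be used by a different series.

For example, the built-in boxplot transform produces outlier data in addition to the boxplot data, and the outlier data can be used in a scatter series. See the example.

We use the property dataset.fromTransformResult to meet this requirement. For example: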

option = {
  dataset: [
    {
      // Original source data.
      source: []
    },
    {
      transform: {
        type: 'boxplot'
      }
      // This "boxplot" transform generates two result data:
      // result[0]: The boxplot data
      // result[1]: The outlier data
      // By default, when a series or another dataset references this
      // dataset, only result[0] can be accessed.
      // To access result[1], we have to use another dataset
      // as follows:
    },
    {
      // This extra dataset references the dataset above, and retrieves
      // the result[1] as its own data. Thus series or other dataset can
      // reference this dataset to get the data from result[1].
      fromDatasetIndex: 1,
      fromTransformResult: 1
    }
  ],
  xAxis: {
    type: 'category'
  },
  yAxis: {},
  series: [
    {
      name: 'boxplot',
      type: 'boxplot',
      // Reference the data from result[0].
      datasetIndex: 1
    },
    {
      name: 'outlier',
      type: 'scatter',
      // Reference the data from result[1].
      datasetIndex: 2
    }
  ]
};

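What is more, dataset.fromTransformResult and dataset.transform can both appear in one dataset, which means that the input of the transform is retrieved from the upstream result specified by fromTransformResult. For example: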

{
  fromDatasetIndex: 1,
  fromTransformResult: 1,
  transform: {
    type: 'sort',
    config: { dimension: 2, order: 'desc' }
  }
}

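Debugging in the Development Environment

When using data transform, we may find that the final chart does not display correctly without knowing where the configuration went wrong. In that case, the property transform.print may help. (transform.print is only available in the development environment.)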

option = {
  dataset: [
    {
      source: []
    },
    {
      transform: {
        type: 'filter',
        config: {},
        // The result of this transform will be printed
        // in dev tool via `console.log`.
        print: true
      }
    }
  ]
};

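Filter Transform

The transform type "filter" is a built-in transform that filters data according to specified conditions. The basic options are as follows: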

option = {
  dataset: [
    {
      source: [
        ['Product', 'Sales', 'Price', 'Year'],
        ['Cake', 123, 32, 2011],
        ['Latte', 231, 14, 2011],
        ['Tofu', 235, 5, 2011],
        ['Milk Tea', 341, 25, 2011],
        ['Porridge', 122, 29, 2011],
        ['Cake', 143, 30, 2012],
        ['Latte', 201, 19, 2012],
        ['Tofu', 255, 7, 2012],
        ['Milk Tea', 241, 27, 2012],
        ['Porridge', 102, 34, 2012],
        ['Cake', 153, 28, 2013],
        ['Latte', 181, 21, 2013],
        ['Tofu', 395, 4, 2013],
        ['Milk Tea', 281, 31, 2013],
        ['Porridge', 92, 39, 2013],
        ['Cake', 223, 29, 2014],
        ['Latte', 211, 17, 2014],
        ['Tofu', 345, 3, 2014],
        ['Milk Tea', 211, 35, 2014],
        ['Porridge', 72, 24, 2014]
      ]
    },
    {
      transform: {
        type: 'filter',
        config: { dimension: 'Year', '=': 2011 }
        // The config is the "condition" of this filter.
        // This transform traverses the source data and
        // retrieves all the items whose "Year" is `2011`.
      }
    }
  ],
  series: {
    type: 'pie',
    datasetIndex: 1
  }
};



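About Dimensions

config.dimension can be:

A dimension name declared in the dataset, such as config: { dimension: 'Year', '=': 2011 }. Declaring dimension names is not mandatory.
A dimension index (starting from 0), such as config: { dimension: 3, '=': 2011 }.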
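
The two forms are interchangeable when the dimension names are declared. A small sketch, using a hypothetical helper `resolveDimension` that is not part of ECharts, shows how a name and an index can point at the same column:

```javascript
// Hypothetical helper: map a dimension given by name or by index to a column index.
function resolveDimension(header, dimension) {
  if (typeof dimension === 'number') return dimension; // already an index (from 0)
  return header.indexOf(dimension); // -1 when the name is not declared
}

const header = ['Product', 'Sales', 'Price', 'Year'];
const byName = { dimension: 'Year', '=': 2011 };
const byIndex = { dimension: 3, '=': 2011 };

console.log(resolveDimension(header, byName.dimension));  // 3
console.log(resolveDimension(header, byIndex.dimension)); // 3
```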

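About Relational Operators

The relational operators can be: > (gt), >= (gte), < (lt), <= (lte), = (eq), != (ne, <>), and reg. (The names in parentheses are aliases.) They follow the usual semantics. Besides ordinary number comparison, there are some extra features:

Multiple operators can appear in one {} item, such as { dimension: 'Price', '>=': 20, '<': 30 }, which means logical "and" (Price >= 20 and Price < 30).
Data values can be "numeric strings". A numeric string is a string that can be converted to a number, such as ' 123 '. Whitespace and line breaks are trimmed automatically during conversion.
To compare "JS Date instances" or date strings (such as '2012-05-12'), we need to specify parser: 'time' manually, such as config: { dimension: 3, lt: '2012-05-12', parser: 'time' }.
Pure string comparison is supported, but only with = and !=. The operators >, >=, <, and <= do not support pure string comparison (the "right value" of these four operators cannot be a "string").
The operator reg can be used to perform a regular expression test. For example, { dimension: 'Name', reg: /\s+Müller\s*$/ } selects all data items whose "Name" dimension contains the family name Müller.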
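
The "and" semantics of multiple operators and the handling of numeric strings can be approximated with the following plain-JavaScript sketch. It is a simplified model under stated assumptions (only six operators, no aliases, no parser, no reg) and not the ECharts implementation; `OPERATORS`, `normalize`, and `matches` are hypothetical names.

```javascript
// Simplified model of a relational condition; not the ECharts implementation.
const OPERATORS = {
  '>': (a, b) => a > b,
  '>=': (a, b) => a >= b,
  '<': (a, b) => a < b,
  '<=': (a, b) => a <= b,
  '=': (a, b) => a === b,
  '!=': (a, b) => a !== b
};

// Convert numeric strings such as ' 123 ' to numbers; leave other values as they are.
function normalize(value) {
  if (typeof value === 'string') {
    const trimmed = value.trim();
    if (trimmed !== '' && !Number.isNaN(Number(trimmed))) return Number(trimmed);
  }
  return value;
}

// Every operator present in the condition must hold (logical "and").
function matches(row, condition) {
  const value = normalize(row[condition.dimension]);
  return Object.keys(condition)
    .filter(key => key in OPERATORS)
    .every(key => OPERATORS[key](value, normalize(condition[key])));
}

const condition = { dimension: 'Price', '>=': 20, '<': 30 };
console.log(matches({ Price: 25 }, condition));     // true
console.log(matches({ Price: ' 29 ' }, condition)); // true
console.log(matches({ Price: 30 }, condition));     // false
```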

About Logical Relations

Sometimes we also need to express logical relations (and / or / not):

option = {
  dataset: [
    {
      source: [
        // ...
      ]
    },
    {
      transform: {
        type: 'filter',
        config: {
          // Use operator "and".
          // Similarly, we can also use "or", "not" in the same place.
          // But "not" should be followed with a {...} rather than `[...]`.
          and: [
            { dimension: 'Year', '=': 2011 },
            { dimension: 'Price', '>=': 20, '<': 30 }
          ]
        }
        // The condition is "Year" is 2011 and "Price" is greater
        // or equal to 20 but less than 30.
      }
    }
  ],
  series: {
    type: 'pie',
    datasetIndex: 1
  }
};

and / or / not can be nested. For example:

transform: {
  type: 'filter',
  config: {
    or: [{
      and: [{
        dimension: 'Price', '>=': 10, '<': 20
      }, {
        dimension: 'Sales', '<': 100
      }, {
        not: { dimension: 'Product', '=': 'Tofu' }
      }]
    }, {
      and: [{
        dimension: 'Price', '>=': 10, '<': 20
      }, {
        dimension: 'Sales', '<': 100
      }, {
        not: { dimension: 'Product', '=': 'Cake' }
      }]
    }]
  }
}
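
How such a nested condition is evaluated can be sketched with a small recursive function. This is an illustrative model only: the function `evaluate` is hypothetical, and its leaf conditions support just '=' and '<' to keep the sketch short.

```javascript
// Simplified model of and/or/not evaluation; not the ECharts implementation.
function evaluate(row, config) {
  if (typeof config === 'boolean') return config;
  if ('and' in config) return config.and.every(child => evaluate(row, child));
  if ('or' in config) return config.or.some(child => evaluate(row, child));
  if ('not' in config) return !evaluate(row, config.not);
  // Leaf: a relational condition on one dimension.
  const value = row[config.dimension];
  if ('=' in config && !(value === config['='])) return false;
  if ('<' in config && !(value < config['<'])) return false;
  return true;
}

const config = {
  or: [
    {
      and: [
        { dimension: 'Sales', '<': 100 },
        { not: { dimension: 'Product', '=': 'Tofu' } }
      ]
    },
    { dimension: 'Year', '=': 2014 }
  ]
};

console.log(evaluate({ Product: 'Biscuit', Sales: 92, Year: 2013 }, config)); // true
console.log(evaluate({ Product: 'Tofu', Sales: 345, Year: 2014 }, config));   // true
console.log(evaluate({ Product: 'Tofu', Sales: 50, Year: 2011 }, config));    // false
```

Note that "and" and "or" take an array of conditions while "not" takes a single condition, matching the shape used in the examples above.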

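About Parsers

Some "parsers" can be specified when comparing values. Currently only the following are supported:

parser: 'time': Parses the value to a date time before comparison. The parsing rule is the same as echarts.time.parse: JS Date instances, timestamp numbers (in milliseconds), and time strings (such as '2012-05-12 03:11:22') can be parsed to timestamp numbers, while other values are parsed to NaN.
parser: 'trim': Trims a string before comparison. For non-string values, the original value is returned.
parser: 'number': Forces the value to be converted to a number before comparison. If it cannot be converted to a meaningful number, it is converted to NaN. In most cases this is not necessary, because by default the value is automatically converted to a number before comparison if possible. However, the default conversion is strict, while this parser provides a loose strategy. When dealing with numeric strings that have a unit suffix (such as '33%' or '12px'), use parser: 'number' to convert them to numbers before comparison.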
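
The effect of the three parsers can be approximated in plain JavaScript as below. These are rough stand-ins, not ECharts code: in particular, the built-in Date is assumed here in place of echarts.time.parse, and the two may differ in details such as time zone handling.

```javascript
// Rough approximations of the three parsers; assumptions, not ECharts code.
const parsers = {
  // Assumption: the built-in Date stands in for echarts.time.parse.
  time: value => new Date(value).getTime(),
  // Trim strings; return non-string values unchanged.
  trim: value => (typeof value === 'string' ? value.trim() : value),
  // parseFloat reads a leading number and ignores a trailing unit.
  number: value => parseFloat(value)
};

console.log(parsers.number('33%'));   // 33
console.log(parsers.number('12px'));  // 12
console.log(parsers.trim('  Tofu ')); // 'Tofu'
console.log(Number.isNaN(parsers.time('not a date'))); // true

// Compare dates as timestamps, as a condition with parser: 'time' would.
const dates = ['2012-05-12', '2012-05-22', '2012-06-02'];
const from = parsers.time('2012-05');
const to = parsers.time('2012-06');
const inMay = dates.filter(d => parsers.time(d) >= from && parsers.time(d) < to);
console.log(inMay); // ['2012-05-12', '2012-05-22']
```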

Here is an example showing parser: 'time':

option = {
  dataset: [
    {
      source: [
        ['Product', 'Sales', 'Price', 'Date'],
        ['Milk Tea', 311, 21, '2012-05-12'],
        ['Cake', 135, 28, '2012-05-22'],
        ['Latte', 262, 36, '2012-06-02'],
        ['Milk Tea', 359, 21, '2012-06-22'],
        ['Cake', 121, 28, '2012-07-02'],
        ['Latte', 271, 36, '2012-06-22']
        // ...
      ]
    },
    {
      transform: {
        type: 'filter',
        config: {
          dimension: 'Date',
          '>=': '2012-05',
          '<': '2012-06',
          parser: 'time'
        }
      }
    }
  ]
};

Formal Definition

Finally, here is the formal definition of the filter transform config:

type FilterTransform = {
  type: 'filter';
  config: ConditionalExpressionOption;
};
type ConditionalExpressionOption =
  | true
  | false
  | RelationalExpressionOption
  | LogicalExpressionOption;
type RelationalExpressionOption = {
  dimension: DimensionName | DimensionIndex;
  parser?: 'time' | 'trim' | 'number';
  lt?: DataValue; // less than
  lte?: DataValue; // less than or equal
  gt?: DataValue; // greater than
  gte?: DataValue; // greater than or equal
  eq?: DataValue; // equal
  ne?: DataValue; // not equal
  '<'?: DataValue; // lt
  '<='?: DataValue; // lte
  '>'?: DataValue; // gt
  '>='?: DataValue; // gte
  '='?: DataValue; // eq
  '!='?: DataValue; // ne
  '<>'?: DataValue; // ne (SQL style)
  reg?: RegExp | string; // RegExp
};
type LogicalExpressionOption = {
  and?: ConditionalExpressionOption[];
  or?: ConditionalExpressionOption[];
  not?: ConditionalExpressionOption;
};
type DataValue = string | number | Date;
type DimensionName = string;
type DimensionIndex = number;

Note that when using the minimal bundle, the Transform component must be imported in addition to the Dataset component in order to use this built-in transform.

import * as echarts from 'echarts/core';
import {
  DatasetComponent,
  TransformComponent
} from 'echarts/components';

echarts.use([
  DatasetComponent,
  TransformComponent
]);

Sort Transform

Another built-in transform is "sort".

option = {
  dataset: [
    {
      dimensions: ['name', 'age', 'profession', 'score', 'date'],
      source: [
        [' Hannah Krause ', 41, 'Engineer', 314, '2011-02-12'],
        ['Zhao Qian ', 20, 'Teacher', 351, '2011-03-01'],
        [' Jasmin Krause ', 52, 'Musician', 287, '2011-02-14'],
        ['Li Lei', 37, 'Teacher', 219, '2011-02-18'],
        [' Karle Neumann ', 25, 'Engineer', 253, '2011-04-02'],
        [' Adrian Groß', 19, 'Teacher', null, '2011-01-16'],
        ['Mia Neumann', 71, 'Engineer', 165, '2011-03-19'],
        [' Böhm Fuchs', 36, 'Musician', 318, '2011-02-24'],
        ['Han Meimei ', 67, 'Engineer', 366, '2011-03-12']
      ]
    },
    {
      transform: {
        type: 'sort',
        // Sort by score.
        config: { dimension: 'score', order: 'asc' }
      }
    }
  ],
  series: {
    type: 'bar',
    datasetIndex: 1
  }
  // ...
};

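Some extra features of the "sort transform":

Sorting by multiple dimensions is supported. See the example below.
Sorting rules:
- By default, "numeric" values (that is, numbers and numeric strings such as ' 123 ') are sorted in numeric order.
- Otherwise, "non-numeric strings" can also be ordered among themselves. This helps in cases such as grouping data items that share the same label, especially when multiple dimensions participate in the sort (see the example below).
- When a "numeric" value is compared with a "non-numeric string", or either of them is compared with a value of another type, they are not comparable. We call the latter "incomparable" and treat it as the "min value" or the "max value" according to the property incomparable: 'min' | 'max'. This feature usually helps to decide whether to place empty values (such as null, undefined, NaN, '', '-') or other illegal values at the head or the tail.
parser: 'time' | 'trim' | 'number' can be used, as in the "filter transform".
- To sort time values (JS Date instances or time strings such as '2012-03-12 11:13:54'), specify parser: 'time'. For example, config: { dimension: 'date', order: 'desc', parser: 'time' }.
- To sort values with a unit suffix (such as '33%' or '16px'), use parser: 'number'.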
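
The role of incomparable can be illustrated with a simplified plain-JavaScript model. It is a sketch under assumptions, not the ECharts implementation: the helpers `toNumberOrNull` and `sortNumeric` are hypothetical, and this model treats every non-numeric value as incomparable, whereas ECharts can also order non-numeric strings among themselves.

```javascript
// Simplified model of sorting with `incomparable`; not the ECharts implementation.
function toNumberOrNull(value) {
  if (typeof value === 'number' && !Number.isNaN(value)) return value;
  if (typeof value === 'string' && value.trim() !== '' && !Number.isNaN(Number(value))) {
    return Number(value); // numeric string such as ' 287 '
  }
  return null; // treated as incomparable in this sketch
}

function sortNumeric(values, order, incomparable) {
  // Incomparable values act as the smallest or the largest possible value.
  const fallback = incomparable === 'min' ? -Infinity : Infinity;
  const direction = order === 'desc' ? -1 : 1;
  return [...values].sort((a, b) => {
    const left = toNumberOrNull(a) ?? fallback;
    const right = toNumberOrNull(b) ?? fallback;
    if (left === right) return 0; // also avoids subtracting infinities
    return left < right ? -direction : direction;
  });
}

const scores = [314, null, ' 287 ', '-', 165];
console.log(sortNumeric(scores, 'asc', 'min')); // [null, '-', 165, ' 287 ', 314]
console.log(sortNumeric(scores, 'asc', 'max')); // [165, ' 287 ', 314, null, '-']
```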

See the example of sorting by multiple dimensions:

option = {
  dataset: [
    {
      dimensions: ['name', 'age', 'profession', 'score', 'date'],
      source: [
        [' Hannah Krause ', 41, 'Engineer', 314, '2011-02-12'],
        ['Zhao Qian ', 20, 'Teacher', 351, '2011-03-01'],
        [' Jasmin Krause ', 52, 'Musician', 287, '2011-02-14'],
        ['Li Lei', 37, 'Teacher', 219, '2011-02-18'],
        [' Karle Neumann ', 25, 'Engineer', 253, '2011-04-02'],
        [' Adrian Groß', 19, 'Teacher', null, '2011-01-16'],
        ['Mia Neumann', 71, 'Engineer', 165, '2011-03-19'],
        [' Böhm Fuchs', 36, 'Musician', 318, '2011-02-24'],
        ['Han Meimei ', 67, 'Engineer', 366, '2011-03-12']
      ]
    },
    {
      transform: {
        type: 'sort',
        config: [
          // Sort by the two dimensions.
          { dimension: 'profession', order: 'desc' },
          { dimension: 'score', order: 'desc' }
        ]
      }
    }
  ],
  series: {
    type: 'bar',
    datasetIndex: 1
  }
  // ...
};
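
The effect of listing several order expressions can be modeled with a multi-key comparator in plain JavaScript. This is a sketch, not the ECharts implementation: `compareBy` is a hypothetical helper, and the sample rows are a shortened, simplified version of the data above.

```javascript
// Plain-JavaScript model of sorting by several dimensions; not ECharts code.
const people = [
  { name: 'Hannah', profession: 'Engineer', score: 314 },
  { name: 'Zhao', profession: 'Teacher', score: 351 },
  { name: 'Jasmin', profession: 'Musician', score: 287 },
  { name: 'Li', profession: 'Teacher', score: 219 },
  { name: 'Mia', profession: 'Engineer', score: 165 }
];

const orderExpressions = [
  { dimension: 'profession', order: 'desc' },
  { dimension: 'score', order: 'desc' }
];

// Compare on the first dimension; fall through to the next one only on a tie.
function compareBy(expressions) {
  return (a, b) => {
    for (const { dimension, order } of expressions) {
      const left = a[dimension];
      const right = b[dimension];
      if (left === right) continue;
      const result = left < right ? -1 : 1;
      return order === 'desc' ? -result : result;
    }
    return 0;
  };
}

const sorted = [...people].sort(compareBy(orderExpressions));
console.log(sorted.map(p => p.name)); // ['Zhao', 'Li', 'Jasmin', 'Hannah', 'Mia']
```

Items with the same profession end up next to each other, and within each profession the higher score comes first.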

Finally, here is the formal definition of the sort transform config:

type SortTransform = {
  type: 'sort';
  config: OrderExpression | OrderExpression[];
};
type OrderExpression = {
  dimension: DimensionName | DimensionIndex;
  order: 'asc' | 'desc';
  incomparable?: 'min' | 'max';
  parser?: 'time' | 'trim' | 'number';
};
type DimensionName = string;
type DimensionIndex = number;

Note that when using the minimal bundle, the Transform component must be imported in addition to the Dataset component in order to use this built-in transform.

import * as echarts from 'echarts/core';
import {
  DatasetComponent,
  TransformComponent
} from 'echarts/components';

echarts.use([
  DatasetComponent,
  TransformComponent
]);

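Using External Transforms

Besides built-in transforms (such as "filter" and "sort"), we can also use external transforms to provide more powerful functionality. Here we take the third-party library ecStat as an example.

This example shows how to generate a regression line with ecStat.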

// Register the external transform first.
echarts.registerTransform(ecStat.transform.regression);

option = {
  dataset: [
    {
      source: rawData
    },
    {
      transform: {
        // Reference the registered external transform.
        // Note that an external transform has a namespace (for example,
        // 'ecStat:xxx' has the namespace 'ecStat'), while a built-in
        // transform (such as 'filter' or 'sort') does not.
        type: 'ecStat:regression',
        config: {
          // Parameters needed by the external transform.
          method: 'exponential'
        }
      }
    }
  ],
  xAxis: { type: 'category' },
  yAxis: {},
  series: [
    {
      name: 'scatter',
      type: 'scatter',
      datasetIndex: 0
    },
    {
      name: 'regression',
      type: 'line',
      symbol: 'none',
      datasetIndex: 1
    }
  ]
};