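Data Mining Query Language

The Data Mining Query Language (DMQL) was proposed by Han, Fu, Wang, et al. for the DBMiner data mining system. DMQL is based on the Structured Query Language (SQL). It is designed to support ad hoc and interactive data mining, and it provides commands for specifying data mining primitives. DMQL can work with both databases and data warehouses, and it can be used to define data mining tasks. In particular, we examine how to define data warehouses and data marts in DMQL.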

Syntax for Task-Relevant Data Specification

Here is the DMQL syntax for specifying task-relevant data:

use database database_name
or 
use data warehouse data_warehouse_name
in relevance to att_or_dim_list
from relation(s)/cube(s) [where condition]
order by order_list
group by grouping_list
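
The clause order above can be illustrated with a small Python sketch that assembles the clauses as plain text. The function name task_relevant_clause and its parameters are hypothetical, introduced only for illustration; they are not part of DMQL or DBMiner. The sketch also assumes that the order by and group by clauses may be omitted, although the grammar above brackets only the where clause.

```python
from typing import Optional, Sequence


def task_relevant_clause(
    source: str,
    attributes: Sequence[str],
    relations: Sequence[str],
    where: Optional[str] = None,
    order_by: Optional[Sequence[str]] = None,
    group_by: Optional[Sequence[str]] = None,
    warehouse: bool = False,
) -> str:
    """Assemble a DMQL task-relevant data specification as plain text."""
    kind = "data warehouse" if warehouse else "database"
    lines = [
        f"use {kind} {source}",
        "in relevance to " + ", ".join(attributes),
        "from " + ", ".join(relations),
    ]
    if where:
        # The where condition belongs to the from clause in the grammar.
        lines[-1] += f" where {where}"
    if order_by:
        lines.append("order by " + ", ".join(order_by))
    if group_by:
        lines.append("group by " + ", ".join(group_by))
    return "\n".join(lines)


# Illustrative values only; the relation and attribute names are assumptions.
query = task_relevant_clause(
    source="AllElectronics_db",
    attributes=["C.age", "I.type"],
    relations=["customer C", "item I"],
    where="I.price >= 100",
)
print(query)
```

The helper only builds text; it does not validate the statement or send it to any mining system.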

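Syntax for Specifying the Kind of Knowledge

Here we discuss the syntax for characterization, discrimination, association, classification, and prediction.

Characterization

The syntax for characterization is: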

mine characteristics [as pattern_name]
  analyze {measure(s)}

The analyze clause specifies aggregate measures, such as count, sum, or count%.

For example, a description of customer purchasing habits:

mine characteristics as customerPurchasing
analyze count%
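
To make the count% measure concrete, the following Python sketch computes the share of tuples that falls into each group. The sample data and the function name count_percent are assumptions made for illustration; in DMQL the measure is computed by the mining system itself.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def count_percent(groups: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return each group's share of all tuples, as a percentage."""
    counts = Counter(groups)
    total = sum(counts.values())
    # An empty input yields an empty result, so no division by zero occurs.
    return {group: 100.0 * n / total for group, n in counts.items()}


# Hypothetical generalized tuples: one age group per purchasing customer.
age_groups = ["young", "young", "middle_aged", "senior"]
shares = count_percent(age_groups)
print(shares)
```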

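Discrimination

The syntax for discrimination is:

mine comparison [as {pattern_name}]
for {target_class} where {target_condition}
{versus {contrast_class_i}
where {contrast_condition_i}}
analyze {measure(s)}

For example, a user may define bigSpenders as customers who purchase items that cost $100 or more on average, and budgetSpenders as customers who purchase items that cost less than $100 on average. The mining of discriminant descriptions for customers from each of these categories can be specified in DMQL as: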

mine comparison as purchaseGroups
for bigSpenders where avg(I.price) ≥ $100
versus budgetSpenders where avg(I.price) < $100
analyze count
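
The target and contrast classes in this comparison can be sketched in Python as follows. The purchase records, the customer names, and the function name split_spenders are hypothetical; the sketch only shows how the two conditions on avg(I.price) partition customers before a count is taken per class.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def split_spenders(
    purchases: Iterable[Tuple[str, float]], cutoff: float = 100.0
) -> Dict[str, List[str]]:
    """Group customers by the average price of the items they bought."""
    prices = defaultdict(list)
    for customer, price in purchases:
        prices[customer].append(price)
    classes: Dict[str, List[str]] = {"bigSpenders": [], "budgetSpenders": []}
    for customer, values in sorted(prices.items()):
        # avg(I.price) >= cutoff is the target class; below it is the contrast class.
        label = "bigSpenders" if mean(values) >= cutoff else "budgetSpenders"
        classes[label].append(customer)
    return classes


# Hypothetical (customer, item price) pairs.
purchases = [("ann", 150.0), ("ann", 90.0), ("bob", 40.0), ("bob", 60.0)]
classes = split_spenders(purchases)
counts = {name: len(members) for name, members in classes.items()}
print(counts)
```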

Association

The syntax for association is:

mine associations [ as {pattern_name} ]
{matching {metapattern} }

For example:

mine associations as buyingHabits
matching P(X:customer,W) ^ Q(X,Y) ⇒ buys(X,Z)

Note: X is the key of the customer relation, P and Q are predicate variables, and W, Y, and Z are object variables.

Classification

The syntax for classification is:

mine classification [as pattern_name]
analyze classifying_attribute_or_dimension

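For example, to mine patterns that classify customer credit rating, where the classes are determined by the attribute credit_rating and the resulting pattern is named classifyCustomerCreditRating:

mine classification as classifyCustomerCreditRating
analyze credit_rating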

Prediction

The syntax for prediction is:

mine prediction [as pattern_name]
analyze prediction_attribute_or_dimension
{set {attribute_or_dimension_i= value_i}}

Syntax for Concept Hierarchy Specification

To specify which concept hierarchy to use:

use hierarchy <hierarchy> for <attribute_or_dimension>

Different syntaxes are used to define the different types of hierarchies, such as:

-schema hierarchies
define hierarchy time_hierarchy on date as [date, month, quarter, year]
-set-grouping hierarchies
define hierarchy age_hierarchy for age on customer as
level1: {young, middle_aged, senior} < level0: all
level2: {20, ..., 39} < level1: young
level2: {40, ..., 59} < level1: middle_aged
level2: {60, ..., 89} < level1: senior
-operation-derived hierarchies
define hierarchy age_hierarchy for age on customer as
{age_category(1), ..., age_category(5)}
:= cluster(default, age, 5) < all(age)
-rule-based hierarchies
define hierarchy profit_margin_hierarchy on item as
level_1: low_profit_margin < level_0: all
   if (price - cost) < $50
level_1: medium_profit_margin < level_0: all
   if ((price - cost) > $50) and ((price - cost) ≤ $250)
level_1: high_profit_margin < level_0: all
   if (price - cost) > $250
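
As a rough illustration of how the set-grouping and rule-based hierarchies generalize values, the Python sketch below maps an age and a profit margin to their level-1 concepts. The function names are hypothetical, and the sketch assumes that the high profit margin class covers margins above $250. It follows the rules as written, so a margin of exactly $50 matches no class.

```python
from typing import Optional


def age_level1(age: int) -> Optional[str]:
    """Map an age to its level-1 concept in the set-grouping hierarchy."""
    if 20 <= age <= 39:
        return "young"
    if 40 <= age <= 59:
        return "middle_aged"
    if 60 <= age <= 89:
        return "senior"
    return None  # ages outside 20..89 are not covered by the hierarchy


def profit_margin_level1(price: float, cost: float) -> Optional[str]:
    """Map an item to its level-1 concept in the rule-based hierarchy."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:
        return "medium_profit_margin"
    if margin > 250:  # assumed rule for the high class
        return "high_profit_margin"
    return None  # a margin of exactly 50 matches none of the rules as written


print(age_level1(45), profit_margin_level1(300.0, 100.0))
```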

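Syntax for Interestingness Measure Specification

Interestingness measures and thresholds can be specified by the user with the statement: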

with <interest_measure_name>  threshold = threshold_value

For example:

with support threshold = 0.05
with confidence threshold = 0.7
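
The effect of these two thresholds can be sketched in Python with the usual definitions of support and confidence for an association rule. The transactions, item names, and function names below are made up for illustration and are not part of DMQL.

```python
from typing import List, Set


def support(transactions: List[Set[str]], items: Set[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Support of lhs and rhs together, divided by the support of lhs."""
    base = support(transactions, lhs)
    return support(transactions, lhs | rhs) / base if base else 0.0


# Hypothetical transactions for the rule computer => software.
transactions = [
    {"computer", "software"},
    {"computer", "software"},
    {"computer", "software"},
    {"computer"},
    {"printer"},
]
lhs, rhs = {"computer"}, {"software"}
s = support(transactions, lhs | rhs)
c = confidence(transactions, lhs, rhs)

# Keep the rule only if it meets both thresholds from the example above.
is_interesting = s >= 0.05 and c >= 0.7
print(s, c, is_interesting)
```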

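Syntax for Pattern Presentation and Visualization Specification

DMQL also has a syntax that allows users to specify the display of discovered patterns in one or more forms: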

display as <result_form>

For example:

display as table

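Full Specification of DMQL

As the market manager of a company, you would like to characterize the buying habits of customers who purchase items priced at no less than $100, with respect to the customer's age, the type of item purchased, and the place where the item was made. You would like to know the percentage of customers having that characteristic. In particular, you are only interested in purchases made in Canada and paid for with an American Express ("AmEx") credit card. You would like to view the resulting descriptions in the form of a table.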

use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age,I.type,I.place_made
from customer C, item I, purchase P, items_sold S,  branch B
where I.item_ID = S.item_ID and P.cust_ID = C.cust_ID and
P.method_paid = "AmEx" and B.address = "Canada" and I.price ≥ 100
with noise threshold = 5%
display as table

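Standardization of Data Mining Languages

Standardizing data mining languages would serve the following purposes:

Help the systematic development of data mining solutions.
Improve interoperability among multiple data mining systems and functions.
Promote education.
Promote the use of data mining systems in industry and society.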
