Looking for some key points about the Teradata database management system

Posted by a site user. Posted: 2022-04-29 17:47

3 answers in total

懂视网  Time: 2022-04-08 01:16

Insert into table_name values ('20141125', 'CA', 0.57, 0.87)

Insert into table_name_1(flight_dt,airline_cd)

select flight_dt, airline_cd from table_name_2

Insert into table_name_1

select * from table_name_2

32. Updating data

Update table_name set flight='20141125',airline_cd='CA' Where sum_dt='20141023'

33. Deleting data

Delete from table_name where flight_dt='20141125'

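34. Transaction integrity

Teradata preserves transaction integrity. In the default mode, each SQL statement terminated by a semicolon is a complete transaction; a transaction can also be defined explicitly with BT and ET.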

Select * from table_name_1 ;

Select * from table_name_2 ;

These are two transactions; the failure of one does not affect the execution of the other.

BT ;

Select * from table_name_1 ;

Select * from table_name_2 ;

ET ;

This is a single transaction: if any SQL statement fails, the whole transaction fails and the system automatically rolls it back.
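The difference between separate statements and one explicit transaction can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for Teradata, so BEGIN/COMMIT play the role of BT/ET; the primary key on flight_dt is an assumption added only to force a failure. In Teradata the rollback is automatic, whereas this sketch issues ROLLBACK explicitly.

```python
import sqlite3

# SQLite stands in for Teradata; BEGIN/COMMIT correspond to BT/ET.
# isolation_level=None means statements autocommit unless we open a transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE table_name_1 (flight_dt TEXT PRIMARY KEY, airline_cd TEXT)")

# Two separate statements: each one is its own transaction, so the first
# insert survives even though the second fails on the duplicate key.
conn.execute("INSERT INTO table_name_1 VALUES ('20141125', 'CA')")
try:
    conn.execute("INSERT INTO table_name_1 VALUES ('20141125', 'MU')")
except sqlite3.IntegrityError:
    pass
rows_after_separate = conn.execute("SELECT COUNT(*) FROM table_name_1").fetchone()[0]

# One explicit transaction: the second insert fails, so the whole unit is
# rolled back and the first insert of this transaction is undone as well.
try:
    conn.execute("BEGIN")
    conn.execute("INSERT INTO table_name_1 VALUES ('20141126', 'CA')")
    conn.execute("INSERT INTO table_name_1 VALUES ('20141125', 'MU')")  # duplicate key
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
rows_after_transaction = conn.execute("SELECT COUNT(*) FROM table_name_1").fetchone()[0]

print(rows_after_separate, rows_after_transaction)
```

Both counts are 1: the failed stand-alone insert left the earlier row in place, and the failed transaction left nothing behind.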

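35. When group by and where are used together, group by aggregates only the records that satisfy the where condition (where removes the non-matching records before aggregation).

To restrict the aggregated results, use having (having filters the results after aggregation).

Select avg(pax_nb) as pax_1 from table_name having pax_1>999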
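The order in which where and having are applied can be checked with a small experiment. The sketch below uses Python's sqlite3 module in place of Teradata; the table contents and the airline_cd and flight_dt columns are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (airline_cd TEXT, flight_dt TEXT, pax_nb INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("CA", "20141125", 1200),
        ("CA", "20141126", 1000),
        ("MU", "20141125", 500),
        ("MU", "20141126", 2000),
        ("UA", "20141125", 100),
    ],
)

# WHERE removes rows first, so the averages are computed over one day only;
# HAVING then filters the per-airline averages.
where_then_having = conn.execute(
    """
    SELECT airline_cd, AVG(pax_nb) AS pax_1
    FROM table_name
    WHERE flight_dt = '20141125'
    GROUP BY airline_cd
    HAVING AVG(pax_nb) > 999
    ORDER BY airline_cd
    """
).fetchall()

# Without WHERE, the averages cover every row, and a different set of
# airlines passes the same HAVING condition.
having_only = conn.execute(
    """
    SELECT airline_cd, AVG(pax_nb) AS pax_1
    FROM table_name
    GROUP BY airline_cd
    HAVING AVG(pax_nb) > 999
    ORDER BY airline_cd
    """
).fetchall()

print(where_then_having)
print(having_only)
```

MU passes the having filter only when its 2000-passenger day is included, which shows that where had already discarded that row before the average was taken.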

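36. Set operations (set operations cannot be used in subqueries)

Union ------- the number of columns must match and the domains of corresponding columns must be compatible. Union removes duplicate records automatically; union all keeps them.

Intersect ----- intersection

Except, minus ---- difference (rows of the first result that do not appear in the second)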
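The four operators can be compared side by side on two tiny tables. This is a sketch run through Python's sqlite3 module rather than Teradata; SQLite accepts EXCEPT but not the MINUS spelling, and the table contents and the helper run() are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_name_1 (airline_cd TEXT);
    CREATE TABLE table_name_2 (airline_cd TEXT);
    INSERT INTO table_name_1 VALUES ('CA'), ('MU'), ('UA');
    INSERT INTO table_name_2 VALUES ('MU'), ('UA'), ('CZ');
    """
)

def run(op):
    """Combine the two single-column selects with the given set operator."""
    sql = f"SELECT airline_cd FROM table_name_1 {op} SELECT airline_cd FROM table_name_2"
    return sorted(row[0] for row in conn.execute(sql))

union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # duplicates kept
intersect_rows = run("INTERSECT")  # rows present in both tables
except_rows = run("EXCEPT")        # rows only in the first table

print(union_rows, union_all_rows, intersect_rows, except_rows)
```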

37. Views ----- order by cannot be used in a view definition

Create view view_name as select * from table_name

Replace view view_name(flight_dt,rpk) as Select flight_dt , sum(rpk) from table_name group by 1

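In a relational database, not all views are updatable, because updates to some views cannot be translated uniquely and meaningfully into updates on the underlying base tables.

(1) A view derived from two or more base tables cannot be updated.

(2) If a view column comes from an expression or a constant, INSERT and UPDATE are not allowed on the view, but DELETE is.

(3) If a view column comes from an aggregate function, the view cannot be updated.

(4) If the view definition contains a GROUP BY clause, the view cannot be updated.

(5) If the view definition contains DISTINCT, the view cannot be updated.

(6) If the view definition contains a nested query whose inner FROM clause references a base table that the view itself is derived from, the view cannot be updated.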

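38. System calendar

System calendar base table: Sys_calendar.Caldates (the columns listed below are exposed through the Sys_calendar.Calendar view)

Current date: current_date (select current_date ----- 2014/11/27)

Columns:

calendar_date DATE UNIQUE (standard Teradata date)

day_of_week BYTEINT, (1-7, day of the week; 1 is Sunday)

day_of_month BYTEINT, (1-31, day of the month)

day_of_year SMALLINT, (1-366, day of the year)

weekday_of_month BYTEINT, (which occurrence of this weekday within the month)

week_of_month BYTEINT, (week of the month, with weeks running Sunday to Saturday; 0 is the first partial week of the month, 1 is the first full week)

week_of_year BYTEINT, (0-53) (week of the year; 0 is the first partial week)

month_of_quarter BYTEINT, (1-3, month of the quarter)

month_of_year BYTEINT, (1-12, month of the year)

month_of_calendar INTEGER, (1-n, month of the calendar, counted from January 1900)

quarter_of_year BYTEINT, (1-4, quarter of the year)

quarter_of_calendar INTEGER, (quarter of the calendar, counted from January 1900)

year_of_calendar SMALLINT, (year, starting from 1900)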

39. Cumulative sum: csum( flight_number ,flight_dt)

Accumulates flight_number in flight_dt order.

To accumulate flight_number by flight_dt separately on each route, so that the total restarts for every route, add group by:

Select csum( flight_number ,flight_dt) from table_name group by flight_class_name_cn
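The restart-per-group behaviour of a cumulative sum can be reproduced with an ANSI window function. The sketch below runs in Python's sqlite3 module (SQLite 3.25 or later is assumed for window-function support) and is only an approximation of csum with group by, not Teradata's own implementation; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (flight_class_name_cn TEXT, flight_dt TEXT, flight_number INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("A", "20141125", 2),
        ("A", "20141126", 3),
        ("A", "20141127", 1),
        ("B", "20141125", 4),
        ("B", "20141126", 5),
    ],
)

# PARTITION BY restarts the running total for each group;
# ORDER BY plus the explicit frame accumulates in flight_dt order.
running_totals = conn.execute(
    """
    SELECT flight_class_name_cn,
           flight_dt,
           SUM(flight_number) OVER (
               PARTITION BY flight_class_name_cn
               ORDER BY flight_dt
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM table_name
    ORDER BY flight_class_name_cn, flight_dt
    """
).fetchall()

for row in running_totals:
    print(row)
```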

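40. Ranking function: rank(flight_profit) ---- ranks by flight_profit in descending order; the highest flight_profit gets rank 1

To rank within each airline, use group by to define the ranking groups:

Select airline_cd , route_cd , rank(flight_profit) from table_name where flight_dt='20141127' group by airline_cd

------ ranks route revenue within each airline

To filter on the result of rank, use qualify

Qualify filters rows on the computed rank, not on the underlying column value: qualify rank_1<=10 keeps the rows ranked 1 to 10.

With ascending order, rank(flight_profit ASC), qualify rank_1<=10 returns the first 10 rows of the ascending list (the 10 lowest flight_profit values).

Select rank(flight_profit) as rank_1 from table_name

where flight_dt='20141127' group by airline_cd

qualify rank_1<=10

Selects the rows ranked in the top 10 within each airline.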
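Qualify is Teradata syntax; in an engine without it, the same top-N-per-group filter can be written with a window function inside a subquery. The sketch below does this in Python's sqlite3 module (SQLite 3.25 or later assumed), keeps the top 2 instead of the top 10 to stay short, and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (airline_cd TEXT, route_cd TEXT, flight_profit REAL)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("CA", "R1", 300.0),
        ("CA", "R2", 200.0),
        ("CA", "R3", 100.0),
        ("MU", "R4", 50.0),
        ("MU", "R5", 500.0),
    ],
)

# The inner query ranks routes by profit within each airline (highest first);
# the outer WHERE plays the role of QUALIFY rank_1 <= 2.
top_routes = conn.execute(
    """
    SELECT airline_cd, route_cd, rank_1
    FROM (
        SELECT airline_cd,
               route_cd,
               RANK() OVER (PARTITION BY airline_cd ORDER BY flight_profit DESC) AS rank_1
        FROM table_name
    ) AS ranked
    WHERE rank_1 <= 2
    ORDER BY airline_cd, rank_1
    """
).fetchall()

print(top_routes)
```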

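41. Quantiles (rows are sorted in ascending order of order_list, and each row is assigned the quantile it falls in)

Quantile (quantile_constant , order_list)

quantile_constant --> constant defining the number of quantiles, e.g. 100 for percentiles, 4 for quartiles

order_list --> column(s) used for partitioning and sorting

Quantile (quantile_constant , order_list ASC): sorts by order_list in descending order, so the largest value comes first and gets the lowest quantile (0)

Quantile (quantile_constant , order_list_1 ,order_list_2): when two rows have the same order_list_1 value, they are ordered by order_list_2 in ascending order and the quantile is then determined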

42. Data sampling

Select * from table_name sample 1000 ------- samples 1000 rows

Select * from table_name sample 0.25 ------- samples 25% of the rows

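43. Importing data into the database

Save the external data file from Excel in CSV format, then save the CSV file as a txt text file

Create the new table that will receive the data

File ---> import data

Insert into new_table_name (column_1, column_2, column_3) values(?,?,?)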

44. Finding duplicate rows in a table

Select flight_dt, airline_cd,rpk,ask from table_name where flight_dt='20141128' group by 1,2,3,4 having count(*)>1
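The duplicate check works because grouping by every selected column collapses identical rows into one group, and having count(*)>1 keeps only the groups that held more than one row. A small sketch in Python's sqlite3 module, with invented sample data and an extra count column added for visibility:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (flight_dt TEXT, airline_cd TEXT, rpk REAL, ask REAL)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        ("20141128", "CA", 10.0, 20.0),
        ("20141128", "CA", 10.0, 20.0),  # exact duplicate of the row above
        ("20141128", "MU", 5.0, 8.0),    # unique row
        ("20141127", "CA", 10.0, 20.0),  # duplicated, but on another date
        ("20141127", "CA", 10.0, 20.0),
    ],
)

# Group by all selected columns; only groups with more than one row are duplicates.
duplicates = conn.execute(
    """
    SELECT flight_dt, airline_cd, rpk, ask, COUNT(*) AS copies
    FROM table_name
    WHERE flight_dt = '20141128'
    GROUP BY flight_dt, airline_cd, rpk, ask
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)
```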

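45. Volatile temporary tables

The HELP VOLATILE TABLE command lists all volatile temporary tables that exist in the session. (Note: HELP DATABASE does not show volatile temporary tables, because the data dictionary does not record them.)

Volatile temporary tables cannot: use access logging, be renamed, or be loaded with the MultiLoad or FastLoad utilities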

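46. Non-unique secondary index (NUSI)

A non-unique secondary index (NUSI) is a Teradata index that is not the primary index and whose column values need not be unique. Typically, using the indexed column in a WHERE clause improves query performance. A NUSI can be created together with the table using CREATE TABLE syntax, or after the table exists using CREATE INDEX syntax. An index that is no longer needed can be removed with DROP INDEX.

When a NUSI is created, a subtable is built on every AMP. The subtable stores rows that contain each index value and the row-ids of the matching base-table rows, ordered by the hash of the index value. This makes lookups by index value very efficient, but the index is of no use for range searches. For example, with an index on the job code, a query for employees with job code 122100 uses the index; a query for employees with job codes between 122000 and 123000 does not.

Creating a non-unique secondary index:

(1) Append index (column_1, column_2, ...) directly to the create table statement

(2) Create the index on a table that already exists

Create index (column_name) on table_name

Dropping a non-unique secondary index from a table:

Drop index (column_name) on table_name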

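-------- Value-ordered non-unique secondary index

The index subtable of a value-ordered NUSI stores its rows in order of data value rather than hash value. This kind of index is very useful for range queries.

(1) Append index (column_name) order by values (column_name) directly to the create table statement

(2) On a table that already exists

Create index (column_name) order by values (column_name) on table_name

The ordering column of a value-ordered NUSI must be:

• a single column

• one of the columns in the index definition

• a numeric column; non-numeric columns are not allowed

• no longer than 4 bytes; INT, SMALLINT, BYTEINT, DATE and DEC are valid.

Note: the DECIMAL data type is allowed, but it must not exceed 4 bytes and must have no fractional digits.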
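Why a hash-ordered index serves equality lookups but not range searches, while a value-ordered one handles ranges, can be illustrated with plain Python data structures. This is a conceptual sketch only, not how Teradata stores its subtables; the row-ids, job codes and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical base table: row-id -> job code.
rows = {1: 122100, 2: 122500, 3: 123400, 4: 122100, 5: 121000}

# Hash-ordered index: index value -> row-ids. An equality lookup goes straight
# to one entry, but entries are not kept in value order, so a range of values
# cannot be read as one contiguous slice.
hash_index = {}
for row_id, job_code in rows.items():
    hash_index.setdefault(job_code, []).append(row_id)

# Value-ordered index: (value, row-id) pairs kept sorted by value.
value_index = sorted((job_code, row_id) for row_id, job_code in rows.items())

def equality_lookup(job_code):
    """Find row-ids for one exact value via the hash-ordered index."""
    return sorted(hash_index.get(job_code, []))

def range_lookup(low, high):
    """Find row-ids with low <= value <= high via the value-ordered index."""
    keys = [value for value, _ in value_index]
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return sorted(row_id for _, row_id in value_index[start:end])

print(equality_lookup(122100))
print(range_lookup(122000, 123000))
```

The range lookup needs only two binary searches and one slice because neighbouring values sit next to each other, which is the property the hash-ordered layout lacks.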

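-------- Join index

A join index is an indexing technique that improves the performance of certain types of queries and can contain columns from one or more tables. Once it is created, the optimizer decides whether to use it; users cannot access it directly.

The purpose of a join index is to supply data from the index subtable and avoid accessing the base tables.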

CREATE JOIN INDEX cust_ord_ix AS

SELECT (c.cust_id, cust_name) , (order_id, order_status, order_date)

FROM customer c INNER JOIN orders o

ON c.cust_id = o.cust_id

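PRIMARY INDEX (cust_id); --------- primary index assigned to the join index (defaults to the first column if omitted)

The join index has two parts: the fixed part (inside the first pair of parentheses) and the repeating part (inside the second pair).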

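47. Loading external data (small data volumes, few columns)

(1) Save the external CSV data file as a txt text file

------ make sure the delimiter recognized by Teradata SQL Assistant is a comma

Tools---Options---Export/Import Data ---- select comma

(2) Create the empty table that will receive the data ---- Import Data ----- load statement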

Insert into ptest.corp_name (sort_num,corp_name) values (?,?) ;

48. partition by order by

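rank() over (partition by class order by age )

(ranks rows within each group) Partitions by class, then ranks by age within each class

row_number() over (partition by class order by age )

(numbers rows within each group) Partitions by class, then numbers the rows by age within each class

sum(score) over (partition by class order by score )

(aggregates rows within each group) Partitions by class, then sums score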
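The three window functions can be compared on one small table. The sketch below runs in Python's sqlite3 module (SQLite 3.25 or later assumed) instead of Teradata, and the students table with its rows is invented. The sum is written without order by so that it covers the whole class in any engine, since the default window frame applied when order by is present differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, class TEXT, age INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        ("Ann", "A", 15, 90),
        ("Bob", "A", 15, 80),
        ("Cid", "A", 16, 70),
        ("Dee", "B", 14, 60),
        ("Eve", "B", 17, 50),
    ],
)

# RANK gives tied ages the same rank and then skips; ROW_NUMBER always counts
# 1, 2, 3 (name is added to its ORDER BY to make ties deterministic);
# SUM over the partition repeats the class total on every row.
window_rows = conn.execute(
    """
    SELECT name,
           class,
           RANK()       OVER (PARTITION BY class ORDER BY age)       AS age_rank,
           ROW_NUMBER() OVER (PARTITION BY class ORDER BY age, name) AS age_row,
           SUM(score)   OVER (PARTITION BY class)                    AS class_total
    FROM students
    ORDER BY class, age, name
    """
).fetchall()

for row in window_rows:
    print(row)
```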

49. Explain select ...... returns the execution steps that the optimizer has planned for a SQL statement; it only shows the steps and does not actually execute the statement

explain select airline_cd , dep_airport_cd,arr_airport_cd,sum(pax_num) AS PAX ,sum( gross_pax_rev )

AS REV from PMART.APP_OTH_AIR_BILL

where substr(summ_dt,1,4)='2013' and airline_cd='UA' group by 1,2,3 ORDER BY 2,3

1) First, we lock a distinct PMART."pseudo table" for read on a

RowHash to prevent global deadlock for PMART.APP_OTH_AIR_BILL.

2) Next, we lock PMART.APP_OTH_AIR_BILL for read.

3) We do an all-AMPs SUM step to aggregate from

PMART.APP_OTH_AIR_BILL by way of an all-rows scan with a condition

of ("(PMART.APP_OTH_AIR_BILL.Airline_Cd = 'UA') AND

((SUBSTR(PMART.APP_OTH_AIR_BILL.summ_dt ,1 ,4 ))= '2013')")

, grouping by field1 ( PMART.APP_OTH_AIR_BILL.Airline_Cd

,PMART.APP_OTH_AIR_BILL.DEP_AIRPORT_CD

,PMART.APP_OTH_AIR_BILL.ARR_AIRPORT_CD). Aggregate Intermediate

Results are computed globally, then placed in Spool 3. The size

of Spool 3 is estimated with no confidence to be 283,494 rows (

12,757,230 bytes). The estimated time for this step is 0.11

seconds.

4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of

an all-rows scan into Spool 1 (group_amps), which is built locally

on the AMPs. Then we do a SORT to order Spool 1 by the sort key

in spool field1 (PMART.APP_OTH_AIR_BILL.DEP_AIRPORT_CD,

PMART.APP_OTH_AIR_BILL.ARR_AIRPORT_CD). The size of Spool 1 is

estimated with no confidence to be 283,494 rows (14,741,688 bytes).

The estimated time for this step is 0.03 seconds.

5) Finally, we send out an END TRANSACTION step to all AMPs involved

in processing the request.

-> The contents of Spool 1 are sent back to the user as the result of

statement 1. The total estimated time is 0.14 seconds.

SQL-Teradata basics

Helpful user  Time: 2022-04-07 22:24

Company: A well-known American e-business
Position: Senior Data Engineer
Work region: Shanghai
Work content:

POSITION: Senior Data Engineer, Risk Infrastructure/Operations/Tech Support

JOB DUTIES:
Design, implement, maintain and give on-going production support on large scale data-driven platforms and processes through analysis, creative solution design, integration, optimization, automation, monitoring, and trouble-shooting.
Help modelers/analysts/scientists/statisticians/biz-rule-writers convert ideas/logics to manageable operations and production jobs with metrics.
Work with Modeling Team, Rule Team, Biz-strategy Team, Product Management, and Engineering Team, etc, in supporting at all phases of risk/fraud models/rules projects.
Provide on-going production support and monitoring to make all production jobs reliable and smooth; be able to quickly identify and fix root causes of production problems.
Utilize SQL and database programming in analyzing massive and highly complex data sets, performing ad-hoc analysis and data manipulation.
Role is script heavy with emphasis on automation. Strong coding background preferred.
Work independently as well as in a team environment.

Other requirements:
Degree: Bachelor
Above 3 years related experience;

MINIMUM REQUIREMENTS:
• BS / MS degree, or foreign equivalent, in Computer Science or a closely

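Helpful user  Time: 2022-04-07 23:42

TD has a certification program that you can study for; the second exam covers SQL. The people who get to work with TD are usually staff at large companies such as banks and telecoms, so get your company to pay for your registration.
There are many special syntax features and functions, far too many to list one by one; look them up in the manual when you need them.
Personally, I think you should get standard SQL down first. The inner join and outer join concepts you mention are basic SQL concepts, not anything specific to TD.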