Data-Analytic Thinking (1)

Introduction, Introduction to Data Science (22-03-07)

by lawtech

Mar 11. 2023

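| Table of Contents

I. The Meaning of Data Science

1. What is data science?

2. Who is a data scientist?

3. The current state of data science and an example

II. The Nature of Data Science

1. Data science vs. data mining

2. Data science vs. data engineering

III. Applying Data Science

1. Data science and big data

2. Data-driven decision making

3. Data-analytic thinking and its use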

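*It is best to understand the English text as it stands. The goal here is to convey the information accurately; I plan to write about English conversation and composition in later posts. Most of this post is written in English without translation.

I. The Meaning of Data Science

1. What is data science?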

An 'interdisciplinary' field
to discover and extract knowledge or insights
from data (» data mining)

Data science is an 'interdisciplinary' field that discovers and extracts knowledge or insights from data.

As the figure shows, it sits at the core where three fields meet: computer science, mathematics and statistics, and business/domain expertise.

-

2. Who is a data scientist?

Data Scientist: someone who uses analytical and technical capabilities to extract insights from data.

Data Engineer: someone who designs and develops software and systems for managing data.

Statistician: someone who applies statistical theories and methods to solve real-life problems.

Because data scientists use analytical and technical skills to extract insights from data, they are distinct both from statisticians, who simply apply statistical theories and methods, and from data engineers, who develop software and systems.

-

3. The current state of data science and an example

In the past, statisticians and analysts processed batches of data by hand; today the volume of data far exceeds what manual analysis can handle.

Now every aspect of business is open to data collection, and companies increasingly shape their business infrastructure so they can exploit data for an advantage over competitors.

As a result, hardly anyone can deny the growing spread of data science principles and techniques.

Example 1. Hurricane Frances

This comes from a 2004 New York Times article.

In the story, Wal-Mart's chief information officer came up with forecasts based on what had happened when Hurricane Charley struck several weeks earlier. She found that shoppers' purchase histories could be used to predict what would happen as Hurricane Frances approached.

Specifically, it is an example of data-driven prediction, which is most valuable when it uncovers patterns that are not obvious. Here are some predictions, with a note on whether each is valuable.

1) People in the hurricane's path would buy more bottled water

Not useful, because it is a bit obvious.

2) A particular DVD sold out in the hurricane's path

Might not be useful, if it sold out at many Wal-Marts across the country and not only in the hurricane's path.

3) The sales of bottled water would increase by 20%

Useful, because local Wal-Marts can use the figure to stock the right amount.

4) Sales of Strawberry Pop-Tarts would increase to 7 times their normal rate

Very useful, because it is an unusual demand that is hard to anticipate.

In my view, the more specific the prediction, the more clearly useful it is.
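The idea of looking for non-obvious demand patterns can be sketched as a simple "sales lift" calculation. This is only a minimal illustration, not Wal-Mart's actual method: the product list, the unit counts, and the names `SALES`, `sales_lift`, and `ranked_lift` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical unit sales as (product, period, units). The numbers are
# invented so that the ratios match the examples in the text.
SALES = [
    ("bottled water", "baseline", 100),
    ("bottled water", "pre_hurricane", 120),
    ("strawberry pop-tarts", "baseline", 10),
    ("strawberry pop-tarts", "pre_hurricane", 70),
    ("dvd", "baseline", 50),
    ("dvd", "pre_hurricane", 50),
]


def sales_lift(rows):
    """Return {product: pre-hurricane units / baseline units}."""
    totals = defaultdict(lambda: defaultdict(int))
    for product, period, units in rows:
        totals[product][period] += units
    return {
        product: periods["pre_hurricane"] / periods["baseline"]
        for product, periods in totals.items()
        if periods["baseline"] > 0  # skip products with no baseline sales
    }


def ranked_lift(rows):
    """Sort products from the largest lift to the smallest."""
    return sorted(sales_lift(rows).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for product, lift in ranked_lift(SALES):
        print(f"{product}: {lift:.1f}x")
```

Ranking by lift instead of by raw volume is what brings the unusual item to the top: bottled water sells more units overall, but its relative change is small.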

II. The Nature of Data Science

1. Data science vs. data mining

Data mining is the process of discovering patterns in large data sets, and it is generally used to analyze customer behavior. Its applications range from targeted marketing, online advertising, and cross-selling recommendations to credit scoring and fraud detection.

The two terms are sometimes used interchangeably.

Broadly speaking, data mining is one part of data science: it is the extraction of knowledge from data via technologies that incorporate the principles of data science. Data science, however, also takes in the broader notion of data analysis, which is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

In this context, data science can be defined as a set of fundamental principles that guide the extraction of knowledge from data, and the term is used more broadly than the traditional sense of data mining.
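The four steps of data analysis named above (inspecting, cleansing, transforming, and modeling) can be illustrated with a toy pipeline. This is a rough sketch under invented assumptions: the records in `RAW`, the function names, and the deliberately trivial "model" (an average) are all hypothetical.

```python
from statistics import mean

# Hypothetical raw records; "spend" arrives as text and may be unusable.
RAW = [
    {"customer": "a", "spend": "120"},
    {"customer": "b", "spend": ""},     # missing value
    {"customer": "c", "spend": "80"},
    {"customer": "d", "spend": "n/a"},  # unparseable value
]


def _is_number(text):
    """True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def inspect_rows(rows):
    """Inspecting: count the rows whose spend is not a valid number."""
    return sum(1 for r in rows if not _is_number(r["spend"]))


def cleanse(rows):
    """Cleansing: drop the rows that cannot be used."""
    return [r for r in rows if _is_number(r["spend"])]


def transform(rows):
    """Transforming: convert spend from text to a number."""
    return [{"customer": r["customer"], "spend": float(r["spend"])} for r in rows]


def fit_model(rows):
    """Modeling: a deliberately trivial 'model', the average spend."""
    return mean(r["spend"] for r in rows)


if __name__ == "__main__":
    print("invalid rows:", inspect_rows(RAW))
    print("average spend:", fit_model(transform(cleanse(RAW))))
```

In real work each step is far more involved, but the order is the same: the model only sees data that has already been inspected, cleaned, and converted.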

-

2. Data science vs. data engineering

Data science is also distinct from data engineering.

Data engineering supports data science by designing, developing, and maintaining data processing systems such as databases. Data science, in turn, collects, explores, and analyzes data, using data engineering technologies to access it.

III. Applying Data Science

1. Data science and big data

Big data refers to data sets that are too large for traditional data processing systems and therefore require new processing technologies.

Big data technologies such as Hadoop, HBase, MongoDB, and Spark are used for many tasks, including the data engineering that supports data science or data mining.

2. Data-driven decision making

Example 1. Predicting Customer Churn

: How would you solve the churn problem of a telecom company, MegaTelCo?

[conditions]

1) Generally 20% of cell phone customers leave when their contracts expire.

2) Since the cell phone market is now saturated, telecom companies are trying to retain their customers.

3) Customers switching from one company to another is called churn.

4) The company is planning to send a special retention offer to some customers prior to the expiration of their contracts.

How should we choose a set of customers
to send the offers in order to reduce churn
for a particular incentive budget?
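One simple way to frame this question in code is to rank customers by how much revenue the company expects to lose and make offers until the budget runs out. This is only one possible heuristic, not the answer given in the lecture: the `Customer` fields, the churn probabilities (which would come from some predictive model), and the cost figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # hypothetical model output between 0 and 1
    monthly_value: float      # hypothetical revenue kept if the customer stays


def choose_offer_targets(customers, offer_cost, budget):
    """Pick the customers with the highest expected loss that the budget allows.

    Expected loss is approximated as churn_probability * monthly_value.
    """
    max_offers = int(budget // offer_cost)
    ranked = sorted(
        customers,
        key=lambda c: c.churn_probability * c.monthly_value,
        reverse=True,
    )
    return [c.customer_id for c in ranked[:max_offers]]


CUSTOMERS = [
    Customer("A", 0.90, 30.0),   # expected loss 27.0
    Customer("B", 0.20, 100.0),  # expected loss 20.0
    Customer("C", 0.50, 80.0),   # expected loss 40.0
    Customer("D", 0.05, 50.0),   # expected loss 2.5
]

if __name__ == "__main__":
    # A budget of 100 with offers costing 40 each pays for two offers.
    print(choose_offer_targets(CUSTOMERS, offer_cost=40, budget=100))
```

Note that the customer most likely to leave (A) is not the first choice here, because the ranking also weighs how valuable each customer is.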

That is why we need to discuss data-driven decisions.

Data science enables Data-Driven Decision Making (DDD): basing decisions on the analysis of data rather than purely on human intuition. Here are some examples of DDD.

ex 1. Advertisement selection

If you choose an advertisement based on your own experience and what you happen to see, that is not DDD. DDD bases the choice on an analysis of data about how consumers react to different ads.
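As a small illustration of choosing an ad from data rather than intuition, the sketch below compares click-through rates. The ad names, the click and impression counts, and the function `best_ad` are invented for the example, and a real analysis would also check whether the differences are statistically meaningful.

```python
# Hypothetical results per ad: (clicks, impressions).
ADS = {
    "ad_a": (30, 1000),
    "ad_b": (45, 1000),
    "ad_c": (5, 200),
}


def click_through_rates(stats):
    """Return {ad: clicks / impressions} for ads that were shown at least once."""
    return {ad: clicks / shown for ad, (clicks, shown) in stats.items() if shown > 0}


def best_ad(stats):
    """Return the ad with the highest observed click-through rate."""
    rates = click_through_rates(stats)
    return max(rates, key=rates.get)


if __name__ == "__main__":
    print(best_ad(ADS))
```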

ex 2. Target vs. Wal-Mart

: Who gains the advantage first? That comes down to a data-driven decision.

Customers have shopping habits that are difficult to change. When a new baby arrives, however, those habits can change. For instance, once new parents start buying diapers at a store, they tend to buy everything else there too, in contrast to their earlier purchase lists.

Most retailers already know this and send special offers to new parents.

So the retailer Target analyzes historical customer data (such as who went on to buy diapers) to predict which customers are expecting a baby. It can then gain an advantage by sending offers before its competitors do.

In fact, it has been shown statistically that
the more data-driven a company is,
the more productive it is,
with about a 4% to 6% increase in productivity.

ex 3. Automated DDD

: With today's computer systems, decisions can be made automatically through automated DDD.

Examples of automated DDD

1) Banks and telecommunication companies use it to build fraud detection systems.

2) Retail companies use it for merchandising decision systems.

3) Amazon and Netflix use automated recommendation systems.

4) Advertising companies use automated real-time advertising decision systems.
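To make the idea of an automated decision concrete, here is a very small fraud-style rule: flag a transaction whose amount is far above the customer's usual spending. This is only a sketch, not how any real bank's system works. The threshold of 3 standard deviations, the sample history, and the function name `is_suspicious` are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations above the mean.

    `history` is a list of the customer's past transaction amounts.
    """
    if len(history) < 2:
        return False  # not enough data to decide automatically
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount > mu  # every past amount was identical
    return (amount - mu) / sigma > threshold


if __name__ == "__main__":
    past = [20, 25, 22, 23, 21]  # hypothetical past amounts
    for amount in (24, 500):
        print(amount, is_suspicious(past, amount))
```

The decision is made by the rule itself, with no person reviewing each transaction, which is what makes it automated DDD.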

This post has gotten too long, so I will upload the rest in the next post.
