by Hyunjung Kang Mar 10. 2018

#1d1a 2018-02

2/1

Amazon Health

https://stratechery.com/2018/amazon-health/

Amazon Health

Amazon Health doesn’t seem like much now, but there are hints it could be the ultimate application of Aggregation Theory.

stratechery.com

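Amazon is teaming up with Berkshire Hathaway and JPMorgan Chase to build a health care system for their U.S. employees.

Is it just an employee benefit? Probably not only that; in the end it will likely set a technical standard. If Amazon is the one doing it, hospitals, pharmacies, and insurers will all fall in line.

In Korea, Samsung is probably the only company that could do this. I'm not sure it has the motivation to take the lead (Korea's National Health Insurance works very well, and coverage doesn't differ between companies), but I do hope the standard doesn't end up being government-led.

One fun fact I learned from this piece: World War 2 played a part in why U.S. employer health benefits are so enormous.

So many young people were mobilized for the war that labor ran short -> employers tried to solve it by raising wages -> the government, fearing inflation, blocked wage increases -> so instead of wages, pile on the benefits!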

2/2

How Booking.com increases the power of online experiments with CUPED

https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d?source=linkShare-2275f266a1e-1517531715

How Booking.com increases the power of online experiments with CUPED

Simon Jackson |Data Scientist at Booking.com

booking.ai

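Booking.com analyzes the effects of its online experiments using CUPED, a method that came out of Microsoft (the paper is linked in the post). CUPED uses pre-experiment data to reduce the variance of the experiment data.
For an experiment result to count as significant, either (1) the mean has to shift by a lot relative to the variance, or (2) the mean shift is small but the variance is so small that the change is clearly real. Reducing the variance with pre-experiment data helps with case (2): small mean shifts become easier to detect.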
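
To make the variance-reduction idea concrete, here is a minimal sketch of the CUPED adjustment in Python. It assumes the standard form of the adjustment, y_adj = y - theta * (x - mean(x)) with theta = cov(x, y) / var(x), where x is the pre-experiment value of the metric. The data is synthetic, and the function name `cuped_adjust` is made up for this illustration; it is not Booking.com's implementation.

```python
import random
import statistics


def cuped_adjust(y, x):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    y: metric observed during the experiment (one value per user)
    x: the same metric for the same users before the experiment
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # Sample covariance between the pre-experiment and in-experiment metric.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / statistics.variance(x)
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# Synthetic data (an assumption for illustration): the in-experiment metric
# is strongly correlated with the pre-experiment metric.
random.seed(0)
pre = [random.gauss(100, 20) for _ in range(2000)]
post = [p + random.gauss(0, 5) for p in pre]

adjusted = cuped_adjust(post, pre)
var_raw = statistics.variance(post)
var_cuped = statistics.variance(adjusted)
print(f"variance before: {var_raw:.1f}, after CUPED: {var_cuped:.1f}")
```

The adjustment leaves the mean of the metric unchanged and only shrinks its variance, which is why a small difference between groups becomes easier to detect. In a real experiment theta would typically be estimated once on the pooled data and applied to both groups.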

The Microsoft paper

http://www.exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf

Netflix's validation paper

http://www.kdd.org/kdd2016/papers/files/adp0945-xieA.pdf

Website of the team(?) that wrote the paper

http://exp-platform.com/

Their HBR article

https://hbr.org/2017/09/the-surprising-power-of-online-experiments

2/8

How to use ML to predict the quality of wines

https://medium.freecodecamp.org/using-machine-learning-to-predict-the-quality-of-wines-9e2e13d7480d?source=linkShare-2275f266a1e-1518101442

How to Use Machine Learning to Predict the Quality of Wines

In the Game of Machine Learning, you Regress or you Classify

medium.freecodecamp.org

It explains ML through examples and also walks through some simple code you can actually run. My study sessions had stalled and I was starting to forget things, so I read it as a refresher.

2/9

What Amazon does to wages

https://medium.com/@the_economist/what-amazon-does-to-wages-d1ae19d6d3ba?source=linkShare-2275f266a1e-1518134575

What Amazon does to wages – The Economist – Medium

Is the world’s largest online retailer underpaying its employees?

medium.com

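Workers at Amazon warehouses reportedly earn less than workers elsewhere. Various technologies keep them working with no idle time, yet pay is on the low side..

Some argue that the areas where Amazon puts its warehouses were low-wage to begin with, but the article says that's not the case..

The average age is on the low side, since even unskilled people can quickly be brought up to the same level of work.

Some say wages aren't high because workers get generous benefits such as health insurance,

and others say it's because the areas Amazon moves into become heavily dependent on Amazon for employment.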

2/12

The Art of Effective Visualization of Multi-dimensional Data

https://towardsdatascience.com/the-art-of-effective-visualization-of-multi-dimensional-data-6c7202990c57?source=linkShare-2275f266a1e-1518443650

The Art of Effective Visualization of Multi-dimensional Data

Strategies for Effective Data Visualization

towardsdatascience.com

A post of visualization tips that come in handy for exploratory analysis. I've read several similar posts, and this one is among the more detailed and better ones.

2/13

The big losers at the Olympics are the host nations

https://capx.co/the-big-losers-at-the-olympics-are-the-host-nations/

The big losers at the Olympics are the host nations - CapX

The average Olympic Games exceeds its budget by 156 per cent

capx.co

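The Olympics lose money!!! That's the gist of this piece.

Tourism boost? Nope, no real increase.

Employment boost? Nope, not many jobs, and most are temporary or go to people already employed.

The facilities built? Meh.. they can't be reused.

It took Canada 30 years to pay off the cost of the 1976 Olympics.

Almost the only Olympics that worked out economically was the 1984 LA Games. Most of the financing came from private sources, and venues were built at universities so they could be reused afterward.

There's also this article.. http://www.huffingtonpost.kr/entry/zimbalist_kr_5a83d043e4b02b66c513385c

The Pyeongchang Olympics deficit could be beyond imagination

"Investing this kind of money in a place two hours from Seoul."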

www.huffingtonpost.kr

2/14

The stages of the data organization

https://towardsdatascience.com/the-stages-of-the-data-organization-b3f4f0589716?source=linkShare-2275f266a1e-1518569302

The stages of the data organization – Towards Data Science

Background

towardsdatascience.com

A post on how the data organization should be structured depending on the size of the company. I agree that a decentralized data team is inefficient and hard to keep aligned.

At the young startup stage:

The overall objective is to build the strong infrastructure as well as a shared knowledge base within the team, while staying relevant to the business.

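Build the infrastructure and work as a single team; once that team can no longer keep up with the business teams' requests, hire dedicated data people into the business teams and turn the original data team into a data platform team.

As the company grows further, dedicated data people may no longer be enough, and you may need to run a data team per business function (eng, DS, analyst). Even then, it's important for people to share knowledge across the data vertical. And of course the platform team has to help everyone use a unified infra and language.

The giant stage.. that seems worth worrying about only at places like Facebook or Google that run worldwide services, so I'll skip the summary. The linked Airbnb and Uber posts are worth reading.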

2/19

10 things I learned from Jason Fried about Building Products

https://uxplanet.org/10-things-i-learned-from-jason-fried-about-building-products-5b6694ff02aa?source=linkShare-2275f266a1e-1519015203

10 things I learned from Jason Fried about Building Products

Jason is a savage.

uxplanet.org

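A write-up of an interview with Jason Fried. The first half covers product, and the second half covers the work culture that lets you focus on product.

- Testing before you ship and actually run the thing is meaningless, so ship and iterate (iteration is only possible after shipping)

- Testing doesn't prevent failure anyway (what you can verify in a test differs from live results)

- Changing software is easy! Unlike in the past, you can ship updates often these days... (phew, it's still hard though..)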

- Work in the way that works for you

Tools, testing methods, and so on: find what works for you. Don't try to compare yourself to the likes of Apple, Google, or Facebook..

- Protect the time and attention of your team

No overtime work / let people manage their own time / let them recharge without guilt

- Make a call and move on

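- When a decision keeps ping-ponging back and forth without getting made, the person in charge just decides. Disagree and commit.

That's the gist.. The podcast (the product breakfast club) sounds fun, I should give it a listen haha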

2/20

What is the most effective way to structure a data science team?

https://towardsdatascience.com/what-is-the-most-effective-way-to-structure-a-data-science-team-498041b88dae?source=linkShare-2275f266a1e-1519089067

What is the most effective way to structure a data science team?

From 2012 to 2017, I had the privilege to build the Data and Analytics organization at Coursera from scratch. Over that period of time, we…

towardsdatascience.com

Similar to the Uber data scientist post I read last week, but more detailed in places, so I'm sharing it. The author was Head of Data and Analytics at Coursera.

Pros and cons of decentralized vs. centralized DS teams

For startups, centralized teams tend to be more efficient headcount-wise due to flexibility in resourcing allocation.

Structurally, centralization also simplifies hiring and recruiting, creates agency to drive company-wide analytical initiatives, and reduces knowledge silos.

In some cases, this can lead to an unhealthy dynamic where data science is treated as a support function, answering questions from product managers rather than operating as true thought partners and proactively driving conversations from a data-informed perspective.

There's also a hybrid structure.

Most smaller companies tend to rely on a hybrid centralized/decentralized strategy that combines elements of the two strategies above. Generally, data scientists report centrally since recruiting and retaining talent is generally the primary bottleneck in building a data science team at the early stage. However, to ensure that data scientists are empowered to succeed, startups will often position data scientists to work closely with business units, a practice known as embedding.

It also covers the support that DS needs from the rest of the company.

Data infrastructure engineering support.

Obvious enough.. it says to hire data engineers before data scientists.

Product and engineering managers who understand the complexities of building data products.

Product & engineering need to understand the characteristics of data products, and both sides need to help each other.

Strong executive buy-in

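(I actually think this is the most important one.) It's not enough to just say data matters.. the organization, and the executive team in particular, has to genuinely treat it as important and provide the time & resources to do the work. That's what ultimately builds a strong culture.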

2/21

A Beginner’s Guide to Data Engineering — Part II

https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-ii-47c4e7cbda71?source=linkShare-2275f266a1e-1519172297

A Beginner’s Guide to Data Engineering — Part II – Towards Data Science

Data Modeling, Data Partitioning, Airflow, and ETL Best Practices

towardsdatascience.com

It explains the star schema (https://en.wikipedia.org/wiki/Star_schema)..

and also covers data partitioning.

I didn't know you could use Jinja templates in .sql files..

In a star schema, a fact table sits at the center holding reference IDs to the dimensions (plus the measured values), and the descriptive information goes into dimension tables.

There's also the snowflake schema, the difference being that its dimension tables are all normalized.. (http://www.vertabelo.com/blog/technical-articles/data-warehouse-modeling-star-schema-vs-snowflake-schema)
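
As a rough illustration of the star schema layout, the sketch below uses Python's built-in sqlite3 module. The table names (`fact_booking`, `dim_user`, `dim_listing`), the columns, and the rows are all made up for this example and do not come from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per entity.
CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_listing (listing_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one row per event, holding references to the dimensions
-- plus the measured value.
CREATE TABLE fact_booking (
    booking_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES dim_user(user_id),
    listing_id INTEGER REFERENCES dim_listing(listing_id),
    amount REAL
);

INSERT INTO dim_user VALUES (1, 'KR'), (2, 'US');
INSERT INTO dim_listing VALUES (10, 'Seoul'), (11, 'Busan');
INSERT INTO fact_booking VALUES
    (100, 1, 10, 80.0), (101, 2, 10, 120.0), (102, 1, 11, 50.0);
""")

# A typical query joins the fact table to a dimension and aggregates.
rows = conn.execute("""
    SELECT l.city, SUM(f.amount)
    FROM fact_booking AS f
    JOIN dim_listing AS l ON f.listing_id = l.listing_id
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
print(rows)
```

In a snowflake variant, `dim_listing` would itself be split further, for instance into a separate city table referenced by ID, at the cost of an extra join.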

2/22

South Korea’s Most Dangerous Enemy: Demographics

https://www.nytimes.com/2018/02/20/magazine/south-koreas-most-dangerous-enemy-demographics.html

South Korea’s Most Dangerous Enemy: Demographics

The countries of East Asia may need to overcome their ethno-nationalistic resistance to immigration if they want to remain economic powerhouses.

www.nytimes.com

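A New York Times piece on how serious Korea's demographic problem is.

In Gangwon Province, the median age is projected to reach 60 by 2045;;

The national median age: 19.6 in 1975 -> 41.2 in 2015

Foreign population: 50,000 in 1990, more than 2 million now (x40)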

more than 5 percent of the children born in South Korea are now multiethnic.

By one estimate, foreign-born or multiethnic Koreans will make up an estimated 10 percent of the entire population by 2030.

Most South Koreans, Moon says, “are still very reluctant to entertain the possibility that immigration can be a dynamic, innovative force.”

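Most immigration consists of low-skilled workers from China and Southeast Asia and of marriage migration, and institutional support is lacking. (Ethnic Koreans from China have it somewhat easier.)

For example, the government's E.P.S. program, which lets foreigners work for up to five years without granting citizenship, puts foreign workers at a disadvantage and effectively turns them into their employer's slaves (lose the job and you have to go home).

Schools still keep emphasizing the Dangun myth, one people, one bloodline.. Given that more than 5% of children born are already multiethnic, this needs to change soon..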

2/23

Four Questions Every Marketplace Startup Should Be Able to Answer

https://medium.com/@jgolden/four-questions-every-marketplace-startup-should-be-able-to-answer-defb0590e049

Four Questions Every Marketplace Startup Should Be Able to Answer

The six years I spent building products at Airbnb, scaling the marketplace over 100X, transformed how I understand the work of starting a…

medium.com

A bit long, but clearly a good read, so I recommend it.

Written by Jonathan Golden, former Director of Product at Airbnb. He also worked at Dropbox and Greylock.

It covers the factors every marketplace startup must consider: network effects, type of supply, incentives, and size and frequency of interaction.

General advantages of marketplaces:

- They involve low capital costs, as inventory is brought to the market from the suppliers.

- The market can self-correct by offering more of the good or service as buyers demand it more, making each marketplace function like a mini economy.

Disadvantages:

- Because marketplaces need a certain amount of supply from day one, they are super hard to start.

- It is hard to control the quality of inventory.

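For example, if a platform like Auction (옥션) has nothing to buy, shoppers won't come, and if Airbnb has no rooms, guests looking for a place to stay see no appeal.. Just bringing in enough supply to attract buyers is hard in itself. That's why a marketplace is not easy to start, and it's even harder for latecomers.

Even in Korea, 11번가 (11st) was an unusual case of a late entrant doing well. I remember doing a case study back then on whether to pour more money into the supply side or the demand side. They gave the more aggressive perks to the demand side: 11% discounts, free refunds, and so on. On the supply side, sellers already running on several open markets had no reason not to list on 11번가 as well..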

You may be able to solve that same problem, and build a great business, by controlling supply instead of providing a platform where supply is self-managed. On the other hand, while controlling supply might be easier at the outset, costs often prove more challenging at scale.

If you control supply yourself instead of leaving it self-managed, you can solve the quality control problem, but the catch is that costs grow as you scale up.

2/27

The WeWork Manifesto: First, Office Space. Next, the World.

https://medium.com/the-new-york-times/dea65ee90cb5?source=linkShare-2275f266a1e-1519666424

The WeWork Manifesto: First, Office Space. Next, the World.

The brash, ambitious founders of WeWork, a global network of shared office spaces, want nothing less than to transform the way we work…

medium.com

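WeWork runs a variety of services beyond WeWork (offices): WeLive (housing), Rise (gym), WeGrow (childcare, launch in preparation), and more. IWG (a publicly listed company) has more members and more real estate than WeWork, so why is WeWork valued at 10x IWG..

The founder insists this is not a real estate business but “a generation of interconnected emotionally intelligent entrepreneur”.

They have the grand ambition of changing not just how we work but how we live.

(For reference, tuition at the WeGrow childcare facility is $36k a year lol)

Just as companies outsource payroll, general affairs, security, and so on, they outsource office design and upkeep.. It's a fairly widespread idea by now, but it never took off like this before WeWork.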

Innovating Faster on Personalization Algorithms at Netflix Using Interleaving

https://medium.com/netflix-techblog/interleaving-in-online-experiments-at-netflix-a04ee392ec55?source=linkShare-2275f266a1e-1519719254

Innovating Faster on Personalization Algorithms at Netflix Using Interleaving

By Juliette Aurisset, Michael Ramm, Joshua Parks

medium.com

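Sharing a blog post related to something that came up in a talk by a Netflix engineer.

Netflix uses a method called interleaving when testing algorithms. Unlike the conventional approach of splitting users into one group that sees A and another that sees B, it alternates items from A and B in a single list and shows that list to every test participant.

A1, B1, A2, B2, ... (whether it starts with A or B is random)

The sample size needed to see a meaningful result reportedly dropped to about 1/100.

It also covers how they verified that the results agree with traditional A/B metrics, and so on.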
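
The alternating scheme summarized above can be sketched in a few lines of Python. This is only a simplified illustration of the "A1, B1, A2, B2, random start" idea; the function names `interleave` and `credit`, the duplicate-skipping rule, and the sample titles are assumptions for the example, not Netflix's actual implementation.

```python
import random


def interleave(ranking_a, ranking_b, rng=random):
    """Alternate items from two rankings, skipping items already placed.

    Returns a list of (item, source) pairs, where source is "A" or "B".
    Which ranking goes first is decided by a coin flip.
    """
    sources = [("A", iter(ranking_a)), ("B", iter(ranking_b))]
    if rng.random() < 0.5:
        sources.reverse()

    merged, seen, exhausted = [], set(), set()
    turn = 0
    while len(exhausted) < 2:
        label, items = sources[turn % 2]
        turn += 1
        if label in exhausted:
            continue
        for item in items:
            if item not in seen:  # skip items the other ranking already placed
                seen.add(item)
                merged.append((item, label))
                break
        else:
            exhausted.add(label)  # this ranking has no new items left
    return merged


def credit(merged, played):
    """Count how many played items each ranking contributed."""
    wins = {"A": 0, "B": 0}
    for item, label in merged:
        if item in played:
            wins[label] += 1
    return wins


merged = interleave(["x1", "x2", "x3"], ["x2", "x4", "x5"], random.Random(0))
print(merged)
print(credit(merged, played={"x1", "x4"}))
```

Because every participant sees items from both rankings, each user's behavior yields a direct comparison between A and B, which is the intuition behind the much smaller sample size.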

2/28

Direct Traffic in Google Analytics & Last Non-direct Attribution

https://www.lunametrics.com/blog/2017/12/04/direct-traffic-gotchas-last-non-direct-attribution/

Direct Traffic in Google Analytics & Last Non-Direct Attribution

Understand how Google Analytics uses last non-direct click attribution for all default reports, and how it may affect your Acquisition Reports.

www.lunametrics.com

Last non-direct click attribution gives you credit for the long-term effects of your marketing campaigns. Many organizations, particularly those with long buying cycles, like to see these long-term effects.

In addition, last non-direct click attribution decreases the amount of traffic that is labeled “Direct” in your reports while increasing traffic to known sources of traffic. This can give you more actionable data when you are putting together your marketing strategy. It can be much easier to react to data about the performance of your email, social, or cpc campaigns than to the performance of direct traffic.
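
The rule described in the quotes can be sketched as a small Python function. This is a simplified illustration: the function name and the source labels are made up, and it ignores any lookback window that Google Analytics applies when searching for an earlier source.

```python
def last_non_direct_source(sessions):
    """Return the source credited under last non-direct click attribution.

    sessions: traffic sources of one user's visits, oldest first.
    The most recent source that is not "direct" gets the credit;
    "direct" is credited only when no other source exists.
    """
    for source in reversed(sessions):
        if source != "direct":
            return source
    return "direct"


# The user first arrived from an email campaign, then came back twice by
# typing the URL. The conversion on the last visit is credited to email.
print(last_non_direct_source(["email", "direct", "direct"]))
```

This is why the rule shrinks the "Direct" bucket in reports: a direct visit only counts as direct when the user has no earlier known source.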
