Common Presto Snippets

Parse JSON:

    SELECT CAST(json_extract(field, '$.json_path') AS VARCHAR)

Regex matching:

    SELECT regexp_like('text', 'regex')

Define query variables:

    WITH VARIABLES AS (
        SELECT
            '2022-11-01' AS mdt,
            '2022-11-30' AS edt
    )
    SELECT a.column
    FROM app.table a
    INNER JOIN VARIABLES
        ON a.dt >= VARIABLES.mdt
        AND a.dt <= VARIABLES.edt……


Common OpenCV Snippets

Read and write images whose file names contain Chinese characters:

    import cv2
    import glob
    import numpy as np

    def read_image(path):
        # Read the raw bytes with NumPy so non-ASCII paths work, then decode
        return cv2.imdecode(np.fromfile(path, dtype=np.uint8), cv2.IMREAD_UNCHANGED)

    def save_image(path, img):
        # Encode in memory, then write the bytes to the target path
        cv2.imencode(".jpg", img)[1].tofile(path)

Rotate an image:

    import cv2
    import numpy as np

    def rotateAndScale(img, scaleFactor=1.0, degreesCCW=30, borderValue=(255, 255, 255)):
        oldY, oldX = img.shape[:2]
        # Rotation about the image center, counter-clockwise
        M = cv2.getRotationMatrix2D(center=(oldX / 2, oldY / 2), angle=degreesCCW, scale=scaleFactor)
        newX, newY = oldX * scaleFactor, oldY * scaleFactor
        r = np.deg2rad(degreesCCW)
        # Size of the bounding box that encloses the rotated image
        newX, newY = (abs(np.sin(r) * newY) + abs(np.cos(r) * newX),
                      abs(np.sin(r) * newX) + abs(np.cos(r) * newY))
        # Shift the result so it is centered in the enlarged canvas
        (tx, ty) = ((newX - oldX) / 2, (newY - oldY) / 2)
        M[0, 2] += tx
        M[1, 2] += ty
        return cv2.warpAffine(img, M, dsize=(int(newX), int(newY)), borderValue=borderValue)……
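The canvas-size arithmetic used in the rotation snippet can be checked without OpenCV. The following is a minimal sketch using only the standard `math` module; the helper name `rotated_canvas_size` is hypothetical and not part of the original post.

```python
import math


def rotated_canvas_size(width, height, degrees_ccw, scale=1.0):
    """Return (new_width, new_height) of the box enclosing the rotated image."""
    w, h = width * scale, height * scale
    r = math.radians(degrees_ccw)
    # Same formula as the rotation snippet: project both sides onto each axis
    new_w = abs(math.sin(r) * h) + abs(math.cos(r) * w)
    new_h = abs(math.sin(r) * w) + abs(math.cos(r) * h)
    return new_w, new_h


if __name__ == '__main__':
    # A 400x300 image rotated by 90 degrees swaps its width and height
    new_w, new_h = rotated_canvas_size(400, 300, 90)
    print(round(new_w), round(new_h))
```

Rounding is used here because the trigonometric functions return floating-point values that can be off by a tiny amount.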


Common Pickle Snippets

Write data:

    import pickle, gzip

    def save_zipped_pickle(obj, fname, protocol=-1):
        # protocol=-1 selects the highest pickle protocol available
        with gzip.open(fname, 'wb') as f:
            pickle.dump(obj, f, protocol)

Read data:

    import pickle, gzip

    def load_zipped_pickle(fname):
        with gzip.open(fname, 'rb') as f:
            loaded_object = pickle.load(f)
            return loaded_object……
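A quick round trip shows the two helpers working together. This is a minimal sketch: the helpers are repeated so the block is self-contained, while the `round_trip` function and the `sample.pkl.gz` file name are assumptions added for illustration.

```python
import gzip
import os
import pickle
import tempfile


def save_zipped_pickle(obj, fname, protocol=-1):
    # protocol=-1 selects the highest pickle protocol available
    with gzip.open(fname, 'wb') as f:
        pickle.dump(obj, f, protocol)


def load_zipped_pickle(fname):
    with gzip.open(fname, 'rb') as f:
        return pickle.load(f)


def round_trip(obj):
    """Save obj to a temporary gzip file, load it back, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'sample.pkl.gz')  # hypothetical file name
        save_zipped_pickle(obj, path)
        return load_zipped_pickle(path)


if __name__ == '__main__':
    data = {'name': 'example', 'values': [1, 2, 3]}
    print(round_trip(data) == data)
```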


Common JUnit Snippets

Unit test template:

    import org.junit.After;
    import org.junit.AfterClass;
    import org.junit.Assert;
    import org.junit.Before;
    import org.junit.BeforeClass;
    import org.junit.Test;
    import org.mockito.InjectMocks;
    import org.mockito.Mock;
    import org.mockito.Mockito;
    import org.mockito.MockitoAnnotations;

    // Named InstanceTest rather than Test so it does not clash with org.junit.Test
    public class InstanceTest {

        // Mocked object
        @Mock
        private ClassA a;

        // Class under test
        @InjectMocks
        private Instance instance;

        // Initialization
        @BeforeClass
        public static void setUpClass() {
        }

        @AfterClass
        public static void tearDownClass() {
        }

        @Before
        public void setUp() throws Exception {
            // Initialize, in the test class, the Mockito-annotated……


Common Pandas Snippets

Best way to iterate over rows:

    import pandas as pd
    from tqdm import tqdm

    for row in tqdm(df.to_dict(orient="records")):
        pass  # do something

Get the number of rows and columns:

    import pandas as pd

    # Option 1: via the axes
    rows = len(df.axes[0])
    cols = len(df.axes[1])

    # Option 2: via the shape tuple
    rows = df.shape[0]
    cols = df.shape[1]

Read a very large file in chunks:

    import pandas as pd
    from tqdm import tqdm

    # chunksize makes read_csv return an iterator of DataFrames
    data = pd.read_csv('dataset.csv', chunksize=1000)

    for chunk in data:
        for row in tqdm(chunk.to_dict(orient="records")):
            pass  # do something

Derive a new column from a single existing column:

    import pandas as pd

    def valFunc(val):
        return val + 1

    df['D'] = df['C'].apply(valFunc)

Filter with a custom function:

    import pandas as……


Common XGBoost Snippets

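Show feature importance:

    import pandas as pd

    # xgb_reg is a fitted XGBoost regressor
    f_importance = xgb_reg.get_booster().get_score(importance_type='gain')
    importance_df = pd.DataFrame.from_dict(data=f_importance, orient='index')
    importance_df.plot.bar()

Auxiliary data file: group input. Used for pairwise and listwise ranking tasks. The group file is loaded automatically: if the data file is train.txt, the group file is train.txt.group. The group file is a single column of numbers that gives, in order, the number of instances in each group of the data file. Auxiliary data file……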
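The group file format described above can be produced with a few lines of standard-library Python. This is a minimal sketch, assuming that each data row has a query id and that rows of the same group are contiguous in the data file; the helper names and the sample ids are hypothetical.

```python
from itertools import groupby


def group_sizes(query_ids):
    """Count consecutive runs of the same query id, in data-file order."""
    return [sum(1 for _ in run) for _, run in groupby(query_ids)]


def write_group_file(data_path, query_ids):
    """Write one group size per line to '<data_path>.group'."""
    group_path = data_path + '.group'
    with open(group_path, 'w') as f:
        for size in group_sizes(query_ids):
            f.write(f'{size}\n')
    return group_path


if __name__ == '__main__':
    qids = ['q1', 'q1', 'q1', 'q2', 'q2', 'q3']  # one id per data row
    print(group_sizes(qids))
```

For the sample ids the sizes are 3, 2 and 1, so the group file would contain those three numbers, one per line.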


Common TQDM Snippets

Combined with Pandas:

    from tqdm.autonotebook import tqdm

    # itertuples() has no length, so pass total explicitly for the progress bar
    for row in tqdm(df.itertuples(), total=df.shape[0], ncols=128):
        pass  # do something……


Common Python Snippets

Sort a dict by value:

    sorted_obj = {k: v for k, v in sorted(obj.items(), key=lambda item: item[1], reverse=True)}

Run in multiple processes (note how the results are collected):

    from multiprocessing import Pool
    import os

    def f(x):
        print('Child process id:', os.getpid())
        return x*2

    if __name__ == '__main__':
        with Pool(5) as p:
            # map blocks until all workers finish and returns results in input order
            results = p.map(f, [1, 2, 3])
        print(results)……
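`multiprocessing.Pool` runs the work in separate processes. For I/O-bound work a thread-based variant may be a better fit. The following is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`, which is swapped in here and is not from the original post; the function names `run_in_threads` and `run_with_futures` are hypothetical. It shows two ways of collecting the results.

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    return x * 2


def run_in_threads(values, max_workers=5):
    """Apply f to each value in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns a lazy iterator, so materialize it as a list
        return list(executor.map(f, values))


def run_with_futures(values, max_workers=5):
    """Same work via submit(); each result is read with Future.result()."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(f, v) for v in values]
        return [future.result() for future in futures]


if __name__ == '__main__':
    print(run_in_threads([1, 2, 3]))
    print(run_with_futures([1, 2, 3]))
```

Both functions return the results in the same order as the input values, because the futures are read back in submission order.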
